path: root/linguistics/src/main/java/com/yahoo/language/simple
Commit message (Author, Age, Files, Lines)
* Trace no stemming due to language=UNKNOWN (Jon Bratseth, 2024-05-12, files: 1, lines: -1/+1)
|
* add comment for intention in determineScript function (MariusArhaug, 2024-04-04, files: 1, lines: -0/+1)
|
* Add SimpleTokenScript to SimpleTokenizer (MariusArhaug, 2024-04-03, files: 2, lines: -1/+85)
|     When parsing datasets such as WikiDumps to a significance model, we want to only keep characters of that language script within our model. So when adding the script value to our tokenizer we are able to use this to filter out non-latin words when creating an english significance model for example.
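The script-based filtering described in this commit message can be illustrated with a minimal sketch. This is not the code added by the commit: the class `ScriptFilterSketch` and its methods are hypothetical names, and the sketch relies only on the JDK's `Character.UnicodeScript` rather than on `SimpleTokenScript`.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; not the SimpleTokenizer implementation.
public class ScriptFilterSketch {

    /**
     * Returns true if every letter in the token belongs to the given script.
     * Tokens without any letters (for example "123") pass trivially.
     */
    public static boolean isInScript(String token, Character.UnicodeScript script) {
        return token.codePoints()
                    .filter(Character::isLetter)
                    .allMatch(cp -> Character.UnicodeScript.of(cp) == script);
    }

    /** Keeps only the tokens whose letters are all in the given script. */
    public static List<String> keepScript(List<String> tokens, Character.UnicodeScript script) {
        return tokens.stream()
                     .filter(token -> isInScript(token, script))
                     .collect(Collectors.toList());
    }
}
```

Under these assumptions, calling `keepScript` with `Character.UnicodeScript.LATIN` on a mixed token list would drop Cyrillic or Han tokens before they reach an English significance model.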
* Revert "Merge pull request #29328 from vespa-engine/revert-29314-bratseth/casing-take-2" (Jon Bratseth, 2023-11-14, files: 2, lines: -11/+28)
|     This reverts commit a72e949533a46d665440a9c72ca2b8fb58f3a9c3, reversing changes made to 944d635d00e165166508ef23399e9ed65a87a9c8.
* Revert "Bratseth/casing take 2" (Harald Musum, 2023-11-13, files: 2, lines: -28/+11)
|
* Prefer first stem to original if non equal (Jon Bratseth, 2023-11-10, files: 2, lines: -11/+28)
|
* Update copyright (Jon Bratseth, 2023-10-09, files: 21, lines: -21/+21)
|
* Don't remove indexable symbols when stemming (Jon Bratseth, 2023-06-02, files: 2, lines: -2/+9)
|
* Use dollar and hour base units (Jon Bratseth, 2023-05-19, files: 1, lines: -2/+2)
|
* No functional changes (Jon Bratseth, 2022-09-11, files: 1, lines: -11/+9)
|
* Determine token types considering all characters (Jon Bratseth, 2022-08-16, files: 3, lines: -85/+84)
|
* Use '@Inject' from 'annotations' in multiple bundles (Bjørn Christian Seime, 2022-05-06, files: 1, lines: -1/+1)
|
* annotate intentional switch fallthrough (Arne H Juul, 2022-01-06, files: 1, lines: -0/+1)
|
* Revert "Merge pull request #20578 from vespa-engine/revert-20568-jonmv/replace-optimaize-with-lingua" (Jon Marius Venstad, 2021-12-20, files: 2, lines: -4/+6)
|     This reverts commit 5476504932cd90eb2dad82dbab633e3ffa2034c3, reversing changes made to 235a78cc4707f78d18c6818a577de1b7507f5e40.
* Revert "Replace optimaize with OpenNLP language detector [run-systemtest]" (Jon Marius Venstad, 2021-12-18, files: 2, lines: -6/+4)
|
* Replace optimaize with OpenNLP language detector (Jon Marius Venstad, 2021-12-17, files: 2, lines: -4/+6)
|
* Update 2017 copyright notices. (gjoranv, 2021-10-07, files: 21, lines: -21/+21)
|
* Linguistics cleanup (Jon Bratseth, 2021-09-21, files: 3, lines: -9/+2)
|
* Pure Java sentencepiece implementation (Jon Bratseth, 2021-09-13, files: 1, lines: -2/+3)
|
* we want to compare Linguistics objects for equivalence (Arne Juul, 2021-08-04, files: 1, lines: -0/+2)
|
* Require replacements to be applied during tokenization (Jon Bratseth, 2021-06-15, files: 1, lines: -0/+4)
|
* Revert "Merge pull request #17754 from vespa-engine/revert-17747-bratseth/special-tokens-take-2" (Jon Bratseth, 2021-05-05, files: 2, lines: -11/+23)
|     This reverts commit a2c9cd4bc04f1a3eaa31524b3970b96be5c2eda9, reversing changes made to 8c61a373af0066fbdf1cca354c24b197c7347321.
* Revert "Reapply "Bratseth/special tokens"" (Jon Bratseth, 2021-05-05, files: 2, lines: -23/+11)
|
* Revert "Merge pull request #17746 from vespa-engine/revert-17738-revert-17737-revert-17736-bratseth/special-tokens" (Jon Bratseth, 2021-05-05, files: 2, lines: -11/+23)
|     This reverts commit 491856b396d003885e159345fe3f533f0fa35933, reversing changes made to 3720186303f4aef1d185525eaf61092097a64ec9.
* Revert "Revert "Revert "Bratseth/special tokens""" (Jon Bratseth, 2021-05-05, files: 2, lines: -23/+11)
|
* Revert "Revert "Bratseth/special tokens"" (Jon Bratseth, 2021-05-04, files: 2, lines: -11/+23)
|
* Revert "Bratseth/special tokens" (Jon Bratseth, 2021-05-04, files: 2, lines: -23/+11)
|
* Avoid config in simple tokenizer (Jon Bratseth, 2021-05-04, files: 1, lines: -7/+4)
|
* Wire in (but don't use) SpecialTokens (Jon Bratseth, 2021-05-04, files: 2, lines: -12/+27)
|
* No functional changes (Jon Bratseth, 2021-04-14, files: 1, lines: -1/+0)
|
* variables in lambdas must be final (Arne Juul, 2020-04-24, files: 1, lines: -5/+8)
|
* Apply suggestions from code review (Arne H Juul, 2020-04-24, files: 1, lines: -2/+2)
|     Co-Authored-By: Jon Bratseth <bratseth@oath.com>
* add more tracing and debug logging of stemming (Arne Juul, 2020-04-24, files: 1, lines: -1/+9)
|
* Remove deprecated method (again) (Jon Bratseth, 2019-01-21, files: 1, lines: -7/+0)
|
* Make SimpleLinguistics simple again (Jon Bratseth, 2019-01-21, files: 2, lines: -97/+3)
|     - Remove SimpleLinguistics config and optional use of Optimaize
|     - Add Optimaize to OpennlpLinguistics; on by default and config to disable
* Remove deprecated apis in linguistics. (gjoranv, 2019-01-21, files: 1, lines: -10/+0)
|
* Deprecated methods and add OptimaizeDetector (Jon Bratseth, 2018-11-01, files: 2, lines: -0/+13)
|
* Prepare for removal of deprecated members (Jon Bratseth, 2018-10-16, files: 1, lines: -1/+2)
|
* Reduce code duplication (Henning Baldersheim, 2018-10-05, files: 1, lines: -14/+8)
|
* Do not create huge optimaize structures when not necessary. (Henning Baldersheim, 2018-10-05, files: 2, lines: -1/+9)
|
* Defer loading the huge optimaize knowledgepool until you really need it. (Henning Baldersheim, 2018-09-10, files: 1, lines: -20/+32)
|     This cuts min memory footprint by 100MB+.
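The deferred loading this commit describes is commonly done with lazy initialization. The following is a minimal sketch of that general technique, not the code from the commit: `LazySketch` is a hypothetical name, and the sketch assumes the loader never returns null.

```java
import java.util.function.Supplier;

// Hypothetical illustration of lazy initialization; not the Vespa implementation.
public class LazySketch<T> implements Supplier<T> {

    private final Supplier<T> loader; // builds the expensive structure on demand
    private volatile T value;         // stays null until the first call to get()

    public LazySketch(Supplier<T> loader) {
        this.loader = loader;
    }

    /** Runs the loader at most once, on first use (double-checked locking). */
    @Override
    public T get() {
        T result = value;
        if (result == null) {
            synchronized (this) {
                result = value;
                if (result == null) {
                    result = loader.get(); // assumed to return a non-null value
                    value = result;
                }
            }
        }
        return result;
    }
}
```

With a wrapper like this, a process that never asks for language detection never pays the memory cost of the loaded structure.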
* Send global constants (Jon Bratseth, 2018-09-06, files: 1, lines: -0/+1)
|
* Add config for simple-linguistics (Bjørn Christian Seime, 2018-07-26, files: 2, lines: -8/+40)
|     Add a config parameter for enabling/disabling optimaize detector
* use com.optimaize.langdetect for lang detection (Jefim Matskin, 2018-07-24, files: 1, lines: -2/+52)
|
* Fix author tag for Simon (Bjørn Christian Seime, 2018-07-05, files: 2, lines: -2/+2)
|
* Merge pull request #6228 from vespa-engine/bratseth/nonfunctional-changes (gjoranv, 2018-06-19, files: 3, lines: -4/+4)
|\    Nonfunctional changes only
| * Nonfunctional changes only (Jon Bratseth, 2018-06-19, files: 3, lines: -4/+4)
| |
* | make function public (Jefim Matskin, 2018-06-18, files: 1, lines: -1/+1)
| |   make function public to facilitate other lingustics
* | Please make the class public (Jefim Matskin, 2018-06-18, files: 1, lines: -1/+1)
|/    Please make the class public to facilitate work on custom liguistics
* Update copyright headers (Jon Bratseth, 2017-06-14, files: 21, lines: -21/+21)
|