| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | Trace no stemming due to language=UNKNOWN | Jon Bratseth | 2024-05-12 | 1 | -1/+1 |
| | |||||
* | add comment for intention in determineScript function | MariusArhaug | 2024-04-04 | 1 | -0/+1 |
| | |||||
* | Add SimpleTokenScript to SimpleTokenizer | MariusArhaug | 2024-04-03 | 2 | -1/+85 |
| | When parsing datasets such as WikiDumps into a significance model, we want to keep only characters of that language's script in the model. Adding the script value to the tokenizer lets us filter out non-Latin words when, for example, creating an English significance model. | ||||
* | Revert "Merge pull request #29328 from vespa-engine/revert-29314-bratseth/casing-take-2" | Jon Bratseth | 2023-11-14 | 2 | -11/+28 |
| | This reverts commit a72e949533a46d665440a9c72ca2b8fb58f3a9c3, reversing changes made to 944d635d00e165166508ef23399e9ed65a87a9c8. | ||||
* | Revert "Bratseth/casing take 2" | Harald Musum | 2023-11-13 | 2 | -28/+11 |
| | |||||
* | Prefer first stem to original if non equal | Jon Bratseth | 2023-11-10 | 2 | -11/+28 |
| | |||||
* | Update copyright | Jon Bratseth | 2023-10-09 | 21 | -21/+21 |
| | |||||
* | Don't remove indexable symbols when stemming | Jon Bratseth | 2023-06-02 | 2 | -2/+9 |
| | |||||
* | Use dollar and hour base units | Jon Bratseth | 2023-05-19 | 1 | -2/+2 |
| | |||||
* | No functional changes | Jon Bratseth | 2022-09-11 | 1 | -11/+9 |
| | |||||
* | Determine token types considering all characters | Jon Bratseth | 2022-08-16 | 3 | -85/+84 |
| | |||||
* | Use '@Inject' from 'annotations' in multiple bundles | Bjørn Christian Seime | 2022-05-06 | 1 | -1/+1 |
| | |||||
* | annotate intentional switch fallthrough | Arne H Juul | 2022-01-06 | 1 | -0/+1 |
| | |||||
* | Revert "Merge pull request #20578 from vespa-engine/revert-20568-jonmv/replace-optimaize-with-lingua" | Jon Marius Venstad | 2021-12-20 | 2 | -4/+6 |
| | This reverts commit 5476504932cd90eb2dad82dbab633e3ffa2034c3, reversing changes made to 235a78cc4707f78d18c6818a577de1b7507f5e40. | ||||
* | Revert "Replace optimaize with OpenNLP language detector [run-systemtest]" | Jon Marius Venstad | 2021-12-18 | 2 | -6/+4 |
| | |||||
* | Replace optimaize with OpenNLP language detector | Jon Marius Venstad | 2021-12-17 | 2 | -4/+6 |
| | |||||
* | Update 2017 copyright notices. | gjoranv | 2021-10-07 | 21 | -21/+21 |
| | |||||
* | Linguistics cleanup | Jon Bratseth | 2021-09-21 | 3 | -9/+2 |
| | |||||
* | Pure Java sentencepiece implementation | Jon Bratseth | 2021-09-13 | 1 | -2/+3 |
| | |||||
* | we want to compare Linguistics objects for equivalence | Arne Juul | 2021-08-04 | 1 | -0/+2 |
| | |||||
* | Require replacements to be applied during tokenization | Jon Bratseth | 2021-06-15 | 1 | -0/+4 |
| | |||||
* | Revert "Merge pull request #17754 from vespa-engine/revert-17747-bratseth/special-tokens-take-2" | Jon Bratseth | 2021-05-05 | 2 | -11/+23 |
| | This reverts commit a2c9cd4bc04f1a3eaa31524b3970b96be5c2eda9, reversing changes made to 8c61a373af0066fbdf1cca354c24b197c7347321. | ||||
* | Revert "Reapply "Bratseth/special tokens"" | Jon Bratseth | 2021-05-05 | 2 | -23/+11 |
| | |||||
* | Revert "Merge pull request #17746 from vespa-engine/revert-17738-revert-17737-revert-17736-bratseth/special-tokens" | Jon Bratseth | 2021-05-05 | 2 | -11/+23 |
| | This reverts commit 491856b396d003885e159345fe3f533f0fa35933, reversing changes made to 3720186303f4aef1d185525eaf61092097a64ec9. | ||||
* | Revert "Revert "Revert "Bratseth/special tokens""" | Jon Bratseth | 2021-05-05 | 2 | -23/+11 |
| | |||||
* | Revert "Revert "Bratseth/special tokens"" | Jon Bratseth | 2021-05-04 | 2 | -11/+23 |
| | |||||
* | Revert "Bratseth/special tokens" | Jon Bratseth | 2021-05-04 | 2 | -23/+11 |
| | |||||
* | Avoid config in simple tokenizer | Jon Bratseth | 2021-05-04 | 1 | -7/+4 |
| | |||||
* | Wire in (but don't use) SpecialTokens | Jon Bratseth | 2021-05-04 | 2 | -12/+27 |
| | |||||
* | No functional changes | Jon Bratseth | 2021-04-14 | 1 | -1/+0 |
| | |||||
* | variables in lambdas must be final | Arne Juul | 2020-04-24 | 1 | -5/+8 |
| | |||||
* | Apply suggestions from code review | Arne H Juul | 2020-04-24 | 1 | -2/+2 |
| | Co-Authored-By: Jon Bratseth <bratseth@oath.com> | ||||
* | add more tracing and debug logging of stemming | Arne Juul | 2020-04-24 | 1 | -1/+9 |
| | |||||
* | Remove deprecated method (again) | Jon Bratseth | 2019-01-21 | 1 | -7/+0 |
| | |||||
* | Make SimpleLinguistics simple again | Jon Bratseth | 2019-01-21 | 2 | -97/+3 |
| | - Remove SimpleLinguistics config and optional use of Optimaize - Add Optimaize to OpennlpLinguistics; on by default and config to disable | ||||
* | Remove deprecated apis in linguistics. | gjoranv | 2019-01-21 | 1 | -10/+0 |
| | |||||
* | Deprecate methods and add OptimaizeDetector | Jon Bratseth | 2018-11-01 | 2 | -0/+13 |
| | |||||
* | Prepare for removal of deprecated members | Jon Bratseth | 2018-10-16 | 1 | -1/+2 |
| | |||||
* | Reduce code duplication | Henning Baldersheim | 2018-10-05 | 1 | -14/+8 |
| | |||||
* | Do not create huge optimaize structures when not necessary. | Henning Baldersheim | 2018-10-05 | 2 | -1/+9 |
| | |||||
* | Defer loading the huge optimaize knowledgepool until you really need it. | Henning Baldersheim | 2018-09-10 | 1 | -20/+32 |
| | This cuts min memory footprint by 100MB+. | ||||
* | Send global constants | Jon Bratseth | 2018-09-06 | 1 | -0/+1 |
| | |||||
* | Add config for simple-linguistics | Bjørn Christian Seime | 2018-07-26 | 2 | -8/+40 |
| | Add a config parameter for enabling/disabling optimaize detector | ||||
* | use com.optimaize.langdetect for lang detection | Jefim Matskin | 2018-07-24 | 1 | -2/+52 |
| | |||||
* | Fix author tag for Simon | Bjørn Christian Seime | 2018-07-05 | 2 | -2/+2 |
| | |||||
* | Merge pull request #6228 from vespa-engine/bratseth/nonfunctional-changes | gjoranv | 2018-06-19 | 3 | -4/+4 |
|\ | Nonfunctional changes only | ||||
| * | Nonfunctional changes only | Jon Bratseth | 2018-06-19 | 3 | -4/+4 |
| | | |||||
* | | make function public | Jefim Matskin | 2018-06-18 | 1 | -1/+1 |
| | | make function public to facilitate other linguistics | ||||
* | | Please make the class public | Jefim Matskin | 2018-06-18 | 1 | -1/+1 |
|/ | | Please make the class public to facilitate work on custom linguistics | ||||
* | Update copyright headers | Jon Bratseth | 2017-06-14 | 21 | -21/+21 |
| |