Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Update copyright | Jon Bratseth | 2023-10-09 | 20 | -20/+20 |
| | |||||
* | Don't remove indexable symbols when stemming | Jon Bratseth | 2023-06-02 | 1 | -2/+2 |
| | |||||
* | Always treat each symbol as a separate token | Jon Bratseth | 2023-05-22 | 2 | -3/+25 |
| | |||||
* | Treat 'other symbols' as letters | Jon Bratseth | 2023-05-22 | 1 | -0/+8 |
| | The unicode class 'other symbols' contains emojis, math symbols, etc. Treat these as letter characters to support searching for them. | | | | |
* | Compute code points in whole string only when needed | jonmv | 2022-12-06 | 1 | -1/+14 |
| | |||||
* | Split out opennlp-linguistics | Henning Baldersheim | 2022-11-26 | 3 | -365/+0 |
| | |||||
* | No functional changes | Jon Bratseth | 2022-09-11 | 1 | -2/+2 |
| | |||||
* | Determine token types considering all characters | Jon Bratseth | 2022-08-16 | 2 | -11/+44 |
| | |||||
* | Expand test case for language detection | Jon Marius Venstad | 2021-12-20 | 1 | -3/+28 |
| | |||||
* | Revert "Merge pull request #20578 from vespa-engine/revert-20568-jonmv/replace-optimaize-with-lingua" | Jon Marius Venstad | 2021-12-20 | 3 | -35/+82 |
| | This reverts commit 5476504932cd90eb2dad82dbab633e3ffa2034c3, reversing changes made to 235a78cc4707f78d18c6818a577de1b7507f5e40. | | | | |
* | Revert "Replace optimaize with OpenNLP language detector [run-systemtest]" | Jon Marius Venstad | 2021-12-18 | 3 | -82/+35 |
| | |||||
* | Re-add files | Jon Marius Venstad | 2021-12-18 | 2 | -0/+82 |
| | |||||
* | Replace optimaize with OpenNLP language detector | Jon Marius Venstad | 2021-12-17 | 1 | -35/+0 |
| | |||||
* | Time out requests after 200s | Jon Marius Venstad | 2021-12-13 | 1 | -1/+0 |
| | |||||
* | Update 2020 Oath copyrights. | gjoranv | 2021-10-27 | 1 | -1/+1 |
| | |||||
* | Update Verizon Media copyright notices. | gjoranv | 2021-10-07 | 1 | -1/+1 |
| | |||||
* | Update 2017 copyright notices. | gjoranv | 2021-10-07 | 20 | -20/+20 |
| | |||||
* | Separate component from linguistics | Jon Bratseth | 2021-09-25 | 3 | -197/+0 |
| | |||||
* | Refactor to separate classes | Jon Bratseth | 2021-09-17 | 1 | -2/+1 |
| | |||||
* | Encode to sparse tensor | Jon Bratseth | 2021-09-16 | 1 | -0/+6 |
| | |||||
* | Encode to dense tensor | Jon Bratseth | 2021-09-16 | 2 | -0/+23 |
| | |||||
* | Make SentencePieceEncoder configurable | Jon Bratseth | 2021-09-16 | 3 | -30/+102 |
| | |||||
* | More unit tests | Jon Bratseth | 2021-09-14 | 1 | -1/+20 |
| | |||||
* | Pure Java sentencepiece implementation | Jon Bratseth | 2021-09-13 | 1 | -0/+78 |
| | |||||
* | Revert "Merge pull request #17754 from vespa-engine/revert-17747-bratseth/special-tokens-take-2" | Jon Bratseth | 2021-05-05 | 1 | -0/+40 |
| | This reverts commit a2c9cd4bc04f1a3eaa31524b3970b96be5c2eda9, reversing changes made to 8c61a373af0066fbdf1cca354c24b197c7347321. | | | | |
* | Revert "Reapply "Bratseth/special tokens"" | Jon Bratseth | 2021-05-05 | 1 | -40/+0 |
| | |||||
* | Revert "Merge pull request #17746 from vespa-engine/revert-17738-revert-17737-revert-17736-bratseth/special-tokens" | Jon Bratseth | 2021-05-05 | 1 | -0/+40 |
| | This reverts commit 491856b396d003885e159345fe3f533f0fa35933, reversing changes made to 3720186303f4aef1d185525eaf61092097a64ec9. | | | | |
* | Revert "Revert "Revert "Bratseth/special tokens""" | Jon Bratseth | 2021-05-05 | 1 | -40/+0 |
| | |||||
* | Revert "Revert "Bratseth/special tokens"" | Jon Bratseth | 2021-05-04 | 1 | -0/+40 |
| | |||||
* | Revert "Bratseth/special tokens" | Jon Bratseth | 2021-05-04 | 1 | -40/+0 |
| | |||||
* | Expose tokens as map | Jon Bratseth | 2021-05-04 | 1 | -5/+3 |
| | |||||
* | Move specialtokens to linguistics | Jon Bratseth | 2021-05-04 | 1 | -0/+42 |
| | |||||
* | No functional changes | Jon Bratseth | 2021-04-14 | 1 | -37/+26 |
| | |||||
* | No functional changes | Jon Bratseth | 2021-04-14 | 8 | -19/+16 |
| | |||||
* | No functional changes | Jon Bratseth | 2021-02-03 | 1 | -0/+19 |
| | |||||
* | handle plugin tokenizer returning tokens with empty original string | Arne Juul | 2020-08-24 | 1 | -0/+51 |
| | |||||
* | Minor unification of tests. | Henning Baldersheim | 2020-08-12 | 2 | -20/+36 |
| | |||||
* | Surrogate aware gram splitting | Jon Bratseth | 2020-06-25 | 1 | -9/+37 |
| | |||||
* | Add/correct copyright headers | Jon Bratseth | 2020-01-03 | 1 | -0/+1 |
| | |||||
* | Remove deprecated apis in linguistics. | gjoranv | 2019-01-21 | 1 | -27/+0 |
| | |||||
* | Deprecated methods and add OptimaizeDetector | Jon Bratseth | 2018-11-01 | 2 | -6/+36 |
| | |||||
* | use com.optimaize.langdetect for lang detection | Jefim Matskin | 2018-07-24 | 1 | -0/+5 |
| | |||||
* | add opennlp stemmers - revert previous changes | Jefim Matskin | 2018-07-18 | 3 | -5/+238 |
| | https://github.com/vespa-engine/vespa/issues/6403 | | | | |
* | add lang detection and opennlp stemmers | Jefim Matskin | 2018-07-17 | 2 | -0/+6 |
| | https://github.com/vespa-engine/vespa/issues/6403 | | | | |
* | Fix author tag for Simon | Bjørn Christian Seime | 2018-07-05 | 12 | -12/+12 |
| | |||||
* | Update copyright headers | Jon Bratseth | 2017-06-14 | 20 | -20/+20 |
| | |||||
* | Revert "Update copyright headers" | Jon Bratseth | 2017-06-14 | 20 | -20/+20 |
| | |||||
* | Update copyright headers | Jon Bratseth | 2017-06-14 | 20 | -20/+20 |
| | |||||
* | Remove carriage return | Jon Bratseth | 2017-06-14 | 1 | -1/+1 |
| | |||||
* | Revert "Copyright header" | Jon Bratseth | 2017-06-13 | 20 | -21/+21 |
| |