path: root/linguistics/src/test
Commit message [Author, Age, Files, Lines]
* Replace all usages of Arrays.asList with List.of where possible. [Henning Baldersheim, 2024-04-12, 6 files, -30/+25]
* Merge pull request #30816 from vespa-engine/marius/add-significance-model-reg... [Marius Arhaug, 2024-04-09, 4 files, -0/+110]
|\
| * add illegal arg exception for languages not registered [MariusArhaug, 2024-04-09, 1 file, -0/+3]
| * fix cr failures [MariusArhaug, 2024-04-09, 4 files, -10/+17]
| * add significance model registry to linguistics [MariusArhaug, 2024-04-04, 4 files, -0/+100]
* | Add SimpleTokenScript to SimpleTokenizer [MariusArhaug, 2024-04-03, 2 files, -0/+39]
|/
* Update copyright [Jon Bratseth, 2023-10-09, 20 files, -20/+20]
* Don't remove indexable symbols when stemming [Jon Bratseth, 2023-06-02, 1 file, -2/+2]
* Always treat each symbol as a separate token [Jon Bratseth, 2023-05-22, 2 files, -3/+25]
* Threat 'other symbols' as letters [Jon Bratseth, 2023-05-22, 1 file, -0/+8]
* Compute code points in whole string only when needed [jonmv, 2022-12-06, 1 file, -1/+14]
* Split out opennlp-linguistics [Henning Baldersheim, 2022-11-26, 3 files, -365/+0]
* No functional changes [Jon Bratseth, 2022-09-11, 1 file, -2/+2]
* Determine token types considering all characters [Jon Bratseth, 2022-08-16, 2 files, -11/+44]
* Expand test case for language detection [Jon Marius Venstad, 2021-12-20, 1 file, -3/+28]
* Revert "Merge pull request #20578 from vespa-engine/revert-20568-jonmv/replac... [Jon Marius Venstad, 2021-12-20, 3 files, -35/+82]
* Revert "Replace optimaize with OpenNLP language detector [run-systemtest]" [Jon Marius Venstad, 2021-12-18, 3 files, -82/+35]
* Re-add files [Jon Marius Venstad, 2021-12-18, 2 files, -0/+82]
* Replace optimaize with OpenNLP language detector [Jon Marius Venstad, 2021-12-17, 1 file, -35/+0]
* Time out requests after 200s [Jon Marius Venstad, 2021-12-13, 1 file, -1/+0]
* Update 2020 Oath copyrights. [gjoranv, 2021-10-27, 1 file, -1/+1]
* Update Verizon Media copyright notices. [gjoranv, 2021-10-07, 1 file, -1/+1]
* Update 2017 copyright notices. [gjoranv, 2021-10-07, 20 files, -20/+20]
* Separate component from linguistics [Jon Bratseth, 2021-09-25, 5 files, -197/+0]
* Refactor to separate classes [Jon Bratseth, 2021-09-17, 1 file, -2/+1]
* Encode to sparse tensor [Jon Bratseth, 2021-09-16, 1 file, -0/+6]
* Encode to dense tensor [Jon Bratseth, 2021-09-16, 2 files, -0/+23]
* Make SentencePieceEncoder configurable [Jon Bratseth, 2021-09-16, 3 files, -30/+102]
* More unit tests [Jon Bratseth, 2021-09-14, 1 file, -1/+20]
* Pure Java sentencepiece implementation [Jon Bratseth, 2021-09-13, 3 files, -0/+78]
* Revert "Merge pull request #17754 from vespa-engine/revert-17747-bratseth/spe... [Jon Bratseth, 2021-05-05, 1 file, -0/+40]
* Revert "Reapply "Bratseth/special tokens"" [Jon Bratseth, 2021-05-05, 1 file, -40/+0]
* Revert "Merge pull request #17746 from vespa-engine/revert-17738-revert-17737... [Jon Bratseth, 2021-05-05, 1 file, -0/+40]
* Revert "Revert "Revert "Bratseth/special tokens""" [Jon Bratseth, 2021-05-05, 1 file, -40/+0]
* Revert "Revert "Bratseth/special tokens"" [Jon Bratseth, 2021-05-04, 1 file, -0/+40]
* Revert "Bratseth/special tokens" [Jon Bratseth, 2021-05-04, 1 file, -40/+0]
* Expose tokens as map [Jon Bratseth, 2021-05-04, 1 file, -5/+3]
* Move specialtokens to linguistics [Jon Bratseth, 2021-05-04, 1 file, -0/+42]
* No functional changes [Jon Bratseth, 2021-04-14, 1 file, -37/+26]
* No functional changes [Jon Bratseth, 2021-04-14, 8 files, -19/+16]
* No functional changes [Jon Bratseth, 2021-02-03, 1 file, -0/+19]
* handle plugin tokenizer returning tokens with empty original string [Arne Juul, 2020-08-24, 1 file, -0/+51]
* Minor unification of tests. [Henning Baldersheim, 2020-08-12, 2 files, -20/+36]
* Surrogate aware gram splitting [Jon Bratseth, 2020-06-25, 1 file, -9/+37]
* Add/corect copyright headers [Jon Bratseth, 2020-01-03, 1 file, -0/+1]
* Remove deprecated apis in linguistics. [gjoranv, 2019-01-21, 1 file, -27/+0]
* Deprecated methods and add OptimaizeDetector [Jon Bratseth, 2018-11-01, 2 files, -6/+36]
* use com.optimaize.langdetect for lang detection [Jefim Matskin, 2018-07-24, 1 file, -0/+5]
* add opennlp stemmers - revert previous changes [Jefim Matskin, 2018-07-18, 3 files, -5/+238]
* add lang detection and opennlp stemmers [Jefim Matskin, 2018-07-17, 2 files, -0/+6]