path: root/linguistics/src/test/java/com/yahoo
Commit message  Author  Age  Files  Lines
* Add multiple languages to significance model map  MariusArhaug  4 days  1  -1/+32
* Add zst support for significance cli and registry  MariusArhaug  7 days  1  -0/+21
* Fix CR comments  MariusArhaug  2024-04-30  1  -2/+8
* Update significance model field and logic from architect meeting  MariusArhaug  2024-04-24  1  -5/+38
* Merge pull request #30871 from vespa-engine/marius/add-significance-searcher  Marius Arhaug  2024-04-24  1  -6/+10
|\
| * fix cr failures  MariusArhaug  2024-04-16  1  -6/+10
* | Replace all usages of Arrays.asList with List.of where possible.  Henning Baldersheim  2024-04-12  6  -30/+25
|/
* Merge pull request #30816 from vespa-engine/marius/add-significance-model-reg...  Marius Arhaug  2024-04-09  2  -0/+79
|\
| * add illegal arg exception for languages not registered  MariusArhaug  2024-04-09  1  -0/+3
| * fix cr failures  MariusArhaug  2024-04-09  2  -4/+11
| * add significance model registry to linguistics  MariusArhaug  2024-04-04  2  -0/+69
* | Add SimpleTokenScript to SimpleTokenizer  MariusArhaug  2024-04-03  2  -0/+39
|/
* Update copyright  Jon Bratseth  2023-10-09  20  -20/+20
* Don't remove indexable symbols when stemming  Jon Bratseth  2023-06-02  1  -2/+2
* Always treat each symbol as a separate token  Jon Bratseth  2023-05-22  2  -3/+25
* Threat 'other symbols' as letters  Jon Bratseth  2023-05-22  1  -0/+8
* Compute code points in whole string only when needed  jonmv  2022-12-06  1  -1/+14
* Split out opennlp-linguistics  Henning Baldersheim  2022-11-26  3  -365/+0
* No functional changes  Jon Bratseth  2022-09-11  1  -2/+2
* Determine token types considering all characters  Jon Bratseth  2022-08-16  2  -11/+44
* Expand test case for language detection  Jon Marius Venstad  2021-12-20  1  -3/+28
* Revert "Merge pull request #20578 from vespa-engine/revert-20568-jonmv/replac...  Jon Marius Venstad  2021-12-20  3  -35/+82
* Revert "Replace optimaize with OpenNLP language detector [run-systemtest]"  Jon Marius Venstad  2021-12-18  3  -82/+35
* Re-add files  Jon Marius Venstad  2021-12-18  2  -0/+82
* Replace optimaize with OpenNLP language detector  Jon Marius Venstad  2021-12-17  1  -35/+0
* Time out requests after 200s  Jon Marius Venstad  2021-12-13  1  -1/+0
* Update 2020 Oath copyrights.  gjoranv  2021-10-27  1  -1/+1
* Update Verizon Media copyright notices.  gjoranv  2021-10-07  1  -1/+1
* Update 2017 copyright notices.  gjoranv  2021-10-07  20  -20/+20
* Separate component from linguistics  Jon Bratseth  2021-09-25  3  -197/+0
* Refactor to separate classes  Jon Bratseth  2021-09-17  1  -2/+1
* Encode to sparse tensor  Jon Bratseth  2021-09-16  1  -0/+6
* Encode to dense tensor  Jon Bratseth  2021-09-16  2  -0/+23
* Make SentencePieceEncoder configurable  Jon Bratseth  2021-09-16  3  -30/+102
* More unit tests  Jon Bratseth  2021-09-14  1  -1/+20
* Pure Java sentencepiece implementation  Jon Bratseth  2021-09-13  1  -0/+78
* Revert "Merge pull request #17754 from vespa-engine/revert-17747-bratseth/spe...  Jon Bratseth  2021-05-05  1  -0/+40
* Revert "Reapply "Bratseth/special tokens""  Jon Bratseth  2021-05-05  1  -40/+0
* Revert "Merge pull request #17746 from vespa-engine/revert-17738-revert-17737...  Jon Bratseth  2021-05-05  1  -0/+40
* Revert "Revert "Revert "Bratseth/special tokens"""  Jon Bratseth  2021-05-05  1  -40/+0
* Revert "Revert "Bratseth/special tokens""  Jon Bratseth  2021-05-04  1  -0/+40
* Revert "Bratseth/special tokens"  Jon Bratseth  2021-05-04  1  -40/+0
* Expose tokens as map  Jon Bratseth  2021-05-04  1  -5/+3
* Move specialtokens to linguistics  Jon Bratseth  2021-05-04  1  -0/+42
* No functional changes  Jon Bratseth  2021-04-14  1  -37/+26
* No functional changes  Jon Bratseth  2021-04-14  8  -19/+16
* No functional changes  Jon Bratseth  2021-02-03  1  -0/+19
* handle plugin tokenizer returning tokens with empty original string  Arne Juul  2020-08-24  1  -0/+51
* Minor unification of tests.  Henning Baldersheim  2020-08-12  2  -20/+36
* Surrogate aware gram splitting  Jon Bratseth  2020-06-25  1  -9/+37