path: root/linguistics/src/test/java/com
Commit message [Author, Age, Files, Lines]
* Update copyright [Jon Bratseth, 2023-10-09, 20, -20/+20]
|
* Don't remove indexable symbols when stemming [Jon Bratseth, 2023-06-02, 1, -2/+2]
|
* Always treat each symbol as a separate token [Jon Bratseth, 2023-05-22, 2, -3/+25]
|
* Threat 'other symbols' as letters [Jon Bratseth, 2023-05-22, 1, -0/+8]
|     The unicode class 'other symbols' contains emojis, math symbols, etc. Treat these as letter characters to support searching for them.
* Compute code points in whole string only when needed [jonmv, 2022-12-06, 1, -1/+14]
|
* Split out opennlp-linguistics [Henning Baldersheim, 2022-11-26, 3, -365/+0]
|
* No functional changes [Jon Bratseth, 2022-09-11, 1, -2/+2]
|
* Determine token types considering all characters [Jon Bratseth, 2022-08-16, 2, -11/+44]
|
* Expand test case for language detection [Jon Marius Venstad, 2021-12-20, 1, -3/+28]
|
* Revert "Merge pull request #20578 from vespa-engine/revert-20568-jonmv/replace-optimaize-with-lingua" [Jon Marius Venstad, 2021-12-20, 3, -35/+82]
|     This reverts commit 5476504932cd90eb2dad82dbab633e3ffa2034c3, reversing changes made to 235a78cc4707f78d18c6818a577de1b7507f5e40.
* Revert "Replace optimaize with OpenNLP language detector [run-systemtest]" [Jon Marius Venstad, 2021-12-18, 3, -82/+35]
|
* Re-add files [Jon Marius Venstad, 2021-12-18, 2, -0/+82]
|
* Replace optimaize with OpenNLP language detector [Jon Marius Venstad, 2021-12-17, 1, -35/+0]
|
* Time out requests after 200s [Jon Marius Venstad, 2021-12-13, 1, -1/+0]
|
* Update 2020 Oath copyrights. [gjoranv, 2021-10-27, 1, -1/+1]
|
* Update Verizon Media copyright notices. [gjoranv, 2021-10-07, 1, -1/+1]
|
* Update 2017 copyright notices. [gjoranv, 2021-10-07, 20, -20/+20]
|
* Separate component from linguistics [Jon Bratseth, 2021-09-25, 3, -197/+0]
|
* Refactor to separate classes [Jon Bratseth, 2021-09-17, 1, -2/+1]
|
* Encode to sparse tensor [Jon Bratseth, 2021-09-16, 1, -0/+6]
|
* Encode to dense tensor [Jon Bratseth, 2021-09-16, 2, -0/+23]
|
* Make SentencePieceEncoder configurable [Jon Bratseth, 2021-09-16, 3, -30/+102]
|
* More unit tests [Jon Bratseth, 2021-09-14, 1, -1/+20]
|
* Pure Java sentencepiece implementation [Jon Bratseth, 2021-09-13, 1, -0/+78]
|
* Revert "Merge pull request #17754 from vespa-engine/revert-17747-bratseth/special-tokens-take-2" [Jon Bratseth, 2021-05-05, 1, -0/+40]
|     This reverts commit a2c9cd4bc04f1a3eaa31524b3970b96be5c2eda9, reversing changes made to 8c61a373af0066fbdf1cca354c24b197c7347321.
* Revert "Reapply "Bratseth/special tokens"" [Jon Bratseth, 2021-05-05, 1, -40/+0]
|
* Revert "Merge pull request #17746 from vespa-engine/revert-17738-revert-17737-revert-17736-bratseth/special-tokens" [Jon Bratseth, 2021-05-05, 1, -0/+40]
|     This reverts commit 491856b396d003885e159345fe3f533f0fa35933, reversing changes made to 3720186303f4aef1d185525eaf61092097a64ec9.
* Revert "Revert "Revert "Bratseth/special tokens""" [Jon Bratseth, 2021-05-05, 1, -40/+0]
|
* Revert "Revert "Bratseth/special tokens"" [Jon Bratseth, 2021-05-04, 1, -0/+40]
|
* Revert "Bratseth/special tokens" [Jon Bratseth, 2021-05-04, 1, -40/+0]
|
* Expose tokens as map [Jon Bratseth, 2021-05-04, 1, -5/+3]
|
* Move specialtokens to linguistics [Jon Bratseth, 2021-05-04, 1, -0/+42]
|
* No functional changes [Jon Bratseth, 2021-04-14, 1, -37/+26]
|
* No functional changes [Jon Bratseth, 2021-04-14, 8, -19/+16]
|
* No functional changes [Jon Bratseth, 2021-02-03, 1, -0/+19]
|
* handle plugin tokenizer returning tokens with empty original string [Arne Juul, 2020-08-24, 1, -0/+51]
|
* Minor unification of tests. [Henning Baldersheim, 2020-08-12, 2, -20/+36]
|
* Surrogate aware gram splitting [Jon Bratseth, 2020-06-25, 1, -9/+37]
|
* Add/corect copyright headers [Jon Bratseth, 2020-01-03, 1, -0/+1]
|
* Remove deprecated apis in linguistics. [gjoranv, 2019-01-21, 1, -27/+0]
|
* Deprecated methods and add OptimaizeDetector [Jon Bratseth, 2018-11-01, 2, -6/+36]
|
* use com.optimaize.langdetect for lang detection [Jefim Matskin, 2018-07-24, 1, -0/+5]
|
* add opennlp stemmers - revert previous changes [Jefim Matskin, 2018-07-18, 3, -5/+238]
|     https://github.com/vespa-engine/vespa/issues/6403
* add lang detection and opennlp stemmers [Jefim Matskin, 2018-07-17, 2, -0/+6]
|     https://github.com/vespa-engine/vespa/issues/6403
* Fix author tag for Simon [Bjørn Christian Seime, 2018-07-05, 12, -12/+12]
|
* Update copyright headers [Jon Bratseth, 2017-06-14, 20, -20/+20]
|
* Revert "Update copyright headers" [Jon Bratseth, 2017-06-14, 20, -20/+20]
|
* Update copyright headers [Jon Bratseth, 2017-06-14, 20, -20/+20]
|
* Remove carriage return [Jon Bratseth, 2017-06-14, 1, -1/+1]
|
* Revert "Copyright header" [Jon Bratseth, 2017-06-13, 20, -21/+21]
|