path: root/linguistics-components/src/test/java/com/yahoo/language
Commit message | Author | Age | Files | Lines
* Update copyright | Jon Bratseth | 2023-10-09 | 5 | -6/+6
* Prefer truncation configuration from tokenizer model | Bjørn Christian Seime | 2023-06-12 | 1 | -1/+8
* Test padding with truncation | Bjørn Christian Seime | 2023-06-08 | 1 | -2/+3
* Verify presence of special token | Bjørn Christian Seime | 2023-06-08 | 1 | -2/+7
* Disable padding and make it configurable | Bjørn Christian Seime | 2023-06-08 | 1 | -9/+23
* Make truncation and max length configurable | Bjørn Christian Seime | 2023-05-26 | 1 | -2/+28
* Revert "Revert "Bjorncs/huggingface tokenizer"" | Bjørn Christian Seime | 2023-05-12 | 1 | -0/+88
* Revert "Bjorncs/huggingface tokenizer" | Arnstein Ressem | 2023-05-12 | 1 | -88/+0
* Disable special tokens by default | Bjørn Christian Seime | 2023-05-11 | 1 | -0/+1
* Make HF tokenizer a separate embedder | Bjørn Christian Seime | 2023-05-11 | 1 | -0/+87
* Add skipping of control tokens | Lester Solbakken | 2023-02-10 | 2 | -1/+12
* Add decoding of sentencepiece token sequence to text | Lester Solbakken | 2023-02-10 | 2 | -0/+16
* Test segmentation with subwords | Jon Bratseth | 2021-12-17 | 2 | -4/+13
* BERT -> WordPiece, make subword prefix configurable | Jon Bratseth | 2021-12-17 | 6 | -129/+119
* Add a BERT embedder | Jon Bratseth | 2021-12-16 | 2 | -6/+54
* Encapsulate in a context | Jon Bratseth | 2021-10-01 | 1 | -2/+3
* Update linguisticvs-components | Jon Bratseth | 2021-09-30 | 1 | -2/+2
* encode -> embed | Jon Bratseth | 2021-09-28 | 3 | -23/+23
* Separate component from linguistics | Jon Bratseth | 2021-09-25 | 3 | -0/+197