path: root/linguistics-components/src/test/java/com/yahoo/language
Commit message | Author | Date | Files changed | Lines (-removed/+added)
* Update copyright | Jon Bratseth | 2023-10-09 | 5 | -6/+6
* Prefer truncation configuration from tokenizer model | Bjørn Christian Seime | 2023-06-12 | 1 | -1/+8
  Only override truncation if it is not specified or if max length exceeds the max tokens accepted by the model. Use the JNI wrapper directly to determine the existing truncation configuration (the JSON format is not really documented). Simplify configuration for the pure tokenizer embedder. Disable DJL usage telemetry.
* Test padding with truncation | Bjørn Christian Seime | 2023-06-08 | 1 | -2/+3
* Verify presence of special token | Bjørn Christian Seime | 2023-06-08 | 1 | -2/+7
* Disable padding and make it configurable | Bjørn Christian Seime | 2023-06-08 | 1 | -9/+23
* Make truncation and max length configurable | Bjørn Christian Seime | 2023-05-26 | 1 | -2/+28
* Revert "Revert "Bjorncs/huggingface tokenizer"" | Bjørn Christian Seime | 2023-05-12 | 1 | -0/+88
  This reverts commit 2bb74878879b3acb1919fd658b8f2c476d8129d6.
* Revert "Bjorncs/huggingface tokenizer" | Arnstein Ressem | 2023-05-12 | 1 | -88/+0
* Disable special tokens by default | Bjørn Christian Seime | 2023-05-11 | 1 | -0/+1
* Make HF tokenizer a separate embedder | Bjørn Christian Seime | 2023-05-11 | 1 | -0/+87
* Add skipping of control tokens | Lester Solbakken | 2023-02-10 | 2 | -1/+12
* Add decoding of sentencepiece token sequence to text | Lester Solbakken | 2023-02-10 | 2 | -0/+16
* Test segmentation with subwords | Jon Bratseth | 2021-12-17 | 2 | -4/+13
* BERT -> WordPiece, make subword prefix configurable | Jon Bratseth | 2021-12-17 | 6 | -129/+119
* Add a BERT embedder | Jon Bratseth | 2021-12-16 | 2 | -6/+54
* Encapsulate in a context | Jon Bratseth | 2021-10-01 | 1 | -2/+3
* Update linguistics-components | Jon Bratseth | 2021-09-30 | 1 | -2/+2
* encode -> embed | Jon Bratseth | 2021-09-28 | 3 | -23/+23
* Separate component from linguistics | Jon Bratseth | 2021-09-25 | 3 | -0/+197