Commit message | Author | Date | Files | Lines (-/+)
---|---|---|---|---
Update copyright | Jon Bratseth | 2023-10-09 | 5 | -6/+6
Prefer truncation configuration from tokenizer model. Only override truncation if not specified or if max length exceeds the max tokens accepted by the model. Use JNI wrapper directly to determine existing truncation configuration (JSON format is not really documented). Simplify configuration for pure tokenizer embedder. Disable DJL usage telemetry. | Bjørn Christian Seime | 2023-06-12 | 1 | -1/+8
Test padding with truncation | Bjørn Christian Seime | 2023-06-08 | 1 | -2/+3
Verify presence of special token | Bjørn Christian Seime | 2023-06-08 | 1 | -2/+7
Disable padding and make it configurable | Bjørn Christian Seime | 2023-06-08 | 1 | -9/+23
Make truncation and max length configurable | Bjørn Christian Seime | 2023-05-26 | 1 | -2/+28
Revert "Revert "Bjorncs/huggingface tokenizer"". This reverts commit 2bb74878879b3acb1919fd658b8f2c476d8129d6. | Bjørn Christian Seime | 2023-05-12 | 1 | -0/+88
Revert "Bjorncs/huggingface tokenizer" | Arnstein Ressem | 2023-05-12 | 1 | -88/+0
Disable special tokens by default | Bjørn Christian Seime | 2023-05-11 | 1 | -0/+1
Make HF tokenizer a separate embedder | Bjørn Christian Seime | 2023-05-11 | 1 | -0/+87
Add skipping of control tokens | Lester Solbakken | 2023-02-10 | 2 | -1/+12
Add decoding of sentencepiece token sequence to text | Lester Solbakken | 2023-02-10 | 2 | -0/+16
Test segmentation with subwords | Jon Bratseth | 2021-12-17 | 2 | -4/+13
BERT -> WordPiece, make subword prefix configurable | Jon Bratseth | 2021-12-17 | 6 | -129/+119
Add a BERT embedder | Jon Bratseth | 2021-12-16 | 2 | -6/+54
Encapsulate in a context | Jon Bratseth | 2021-10-01 | 1 | -2/+3
Update linguistics-components | Jon Bratseth | 2021-09-30 | 1 | -2/+2
encode -> embed | Jon Bratseth | 2021-09-28 | 3 | -23/+23
Separate component from linguistics | Jon Bratseth | 2021-09-25 | 3 | -0/+197