Graph | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | Update copyright | Jon Bratseth | 2023-10-09 | 5 | -6/+6 |
* | Prefer truncation configuration from tokenizer model | Bjørn Christian Seime | 2023-06-12 | 1 | -1/+8 |
* | Test padding with truncation | Bjørn Christian Seime | 2023-06-08 | 1 | -2/+3 |
* | Verify presence of special token | Bjørn Christian Seime | 2023-06-08 | 1 | -2/+7 |
* | Disable padding and make it configurable | Bjørn Christian Seime | 2023-06-08 | 1 | -9/+23 |
* | Make truncation and max length configurable | Bjørn Christian Seime | 2023-05-26 | 1 | -2/+28 |
* | Revert "Revert "Bjorncs/huggingface tokenizer"" | Bjørn Christian Seime | 2023-05-12 | 1 | -0/+88 |
* | Revert "Bjorncs/huggingface tokenizer" | Arnstein Ressem | 2023-05-12 | 1 | -88/+0 |
* | Disable special tokens by default | Bjørn Christian Seime | 2023-05-11 | 1 | -0/+1 |
* | Make HF tokenizer a separate embedder | Bjørn Christian Seime | 2023-05-11 | 1 | -0/+87 |
* | Add skipping of control tokens | Lester Solbakken | 2023-02-10 | 2 | -1/+12 |
* | Add decoding of sentencepiece token sequence to text | Lester Solbakken | 2023-02-10 | 2 | -0/+16 |
* | Test segmentation with subwords | Jon Bratseth | 2021-12-17 | 2 | -4/+13 |
* | BERT -> WordPiece, make subword prefix configurable | Jon Bratseth | 2021-12-17 | 6 | -129/+119 |
* | Add a BERT embedder | Jon Bratseth | 2021-12-16 | 2 | -6/+54 |
* | Encapsulate in a context | Jon Bratseth | 2021-10-01 | 1 | -2/+3 |
* | Update linguisticvs-components | Jon Bratseth | 2021-09-30 | 1 | -2/+2 |
* | encode -> embed | Jon Bratseth | 2021-09-28 | 3 | -23/+23 |
* | Separate component from linguistics | Jon Bratseth | 2021-09-25 | 3 | -0/+197 |