| | Commit message | Author | Age | Files | Lines |
---|---|---|---|---|---|
* | Update copyright | Jon Bratseth | 2023-10-09 | 23 | -27/+27 |
* | HuggingFace Tokenizer expects path to be a directory | Bjørn Christian Seime | 2023-08-31 | 1 | -2/+18 |
* | Prefer truncation configuration from tokenizer model | Bjørn Christian Seime | 2023-06-12 | 3 | -14/+112 |
* | Test padding with truncation | Bjørn Christian Seime | 2023-06-08 | 2 | -3/+4 |
* | Verify presence of special token | Bjørn Christian Seime | 2023-06-08 | 1 | -2/+7 |
* | Disable padding and make it configurable | Bjørn Christian Seime | 2023-06-08 | 2 | -12/+30 |
* | Introduce services.xml syntax for configuring HuggingFace embedders | Bjørn Christian Seime | 2023-06-02 | 2 | -13/+1 |
* | Make truncation and max length configurable | Bjørn Christian Seime | 2023-05-26 | 3 | -7/+45 |
* | Implement deconstruct | Bjørn Christian Seime | 2023-05-16 | 1 | -0/+1 |
* | Change parameter type to 'model' | Bjørn Christian Seime | 2023-05-12 | 1 | -1/+1 |
* | Revert "Revert "Bjorncs/huggingface tokenizer"" | Bjørn Christian Seime | 2023-05-12 | 7 | -0/+271 |
* | Revert "Bjorncs/huggingface tokenizer" | Arnstein Ressem | 2023-05-12 | 7 | -271/+0 |
* | Disable special tokens by default | Bjørn Christian Seime | 2023-05-11 | 3 | -12/+10 |
* | Mark HF integration as beta | Bjørn Christian Seime | 2023-05-11 | 2 | -0/+5 |
* | Make HF tokenizer a separate embedder | Bjørn Christian Seime | 2023-05-11 | 7 | -0/+268 |
* | Add skipping of control tokens | Lester Solbakken | 2023-02-10 | 4 | -7/+34 |
* | Add decoding of sentencepiece token sequence to text | Lester Solbakken | 2023-02-10 | 5 | -2/+40 |
* | Revert "Revert collect(Collectors.toList())" | Henning Baldersheim | 2022-12-04 | 1 | -1/+1 |
* | Revert collect(Collectors.toList()) | Henning Baldersheim | 2022-12-04 | 1 | -1/+1 |
* | collect(Collectors.toList()) -> toList() | Henning Baldersheim | 2022-12-02 | 1 | -1/+1 |
* | Use '@Inject' from 'annotations' in multiple bundles | Bjørn Christian Seime | 2022-05-06 | 2 | -2/+2 |
* | Test segmentation with subwords | Jon Bratseth | 2021-12-17 | 2 | -4/+13 |
* | BERT -> WordPiece, make subword prefix configurable | Jon Bratseth | 2021-12-17 | 12 | -162/+184 |
* | Add a BERT embedder | Jon Bratseth | 2021-12-16 | 9 | -43/+30881 |
* | Add custom `@Beta` annotation | Bjørn Christian Seime | 2021-12-03 | 1 | -1/+1 |
* | Correct copyright headers | Jon Bratseth | 2021-10-20 | 1 | -1/+0 |
* | Add missiung copyrights | Jon Bratseth | 2021-10-20 | 1 | -0/+1 |
* | Encapsulate in a context | Jon Bratseth | 2021-10-01 | 2 | -11/+10 |
* | Update linguisticvs-components | Jon Bratseth | 2021-09-30 | 2 | -5/+11 |
* | encode -> embed | Jon Bratseth | 2021-09-28 | 5 | -39/+38 |
* | Use full filename | Jon Bratseth | 2021-09-27 | 1 | -0/+0 |
* | Separate component from linguistics | Jon Bratseth | 2021-09-25 | 15 | -0/+1015 |