Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Construct array right away instead of going via a single element list and the java stream api. | Henning Baldersheim | 2024-01-18 | 1 | -2/+3 |
* | Update copyright | Jon Bratseth | 2023-10-09 | 16 | -18/+18 |
| | |||||
* | HuggingFace Tokenizer expects path to be a directory | Bjørn Christian Seime | 2023-08-31 | 1 | -2/+18 |
| | |||||
* | Prefer truncation configuration from tokenizer model | Bjørn Christian Seime | 2023-06-12 | 2 | -13/+104 |
| | | | | Only override truncation if not specified or max length exceeds max tokens accepted by model. Use JNI wrapper directly to determine existing truncation configuration (JSON format is not really documented). Simplify configuration for pure tokenizer embedder. Disable DJL usage telemetry. | ||||
* | Test padding with truncation | Bjørn Christian Seime | 2023-06-08 | 1 | -1/+1 |
| | |||||
* | Disable padding and make it configurable | Bjørn Christian Seime | 2023-06-08 | 1 | -3/+7 |
| | |||||
* | Introduce services.xml syntax for configuring HuggingFace embedders | Bjørn Christian Seime | 2023-06-02 | 1 | -0/+1 |
| | |||||
* | Make truncation and max length configurable | Bjørn Christian Seime | 2023-05-26 | 1 | -4/+14 |
| | |||||
* | Implement deconstruct | Bjørn Christian Seime | 2023-05-16 | 1 | -0/+1 |
| | |||||
* | Revert "Revert "Bjorncs/huggingface tokenizer"" | Bjørn Christian Seime | 2023-05-12 | 3 | -0/+172 |
| | | | | This reverts commit 2bb74878879b3acb1919fd658b8f2c476d8129d6. | ||||
* | Revert "Bjorncs/huggingface tokenizer" | Arnstein Ressem | 2023-05-12 | 3 | -172/+0 |
| | |||||
* | Disable special tokens by default | Bjørn Christian Seime | 2023-05-11 | 1 | -12/+7 |
| | |||||
* | Mark HF integration as beta | Bjørn Christian Seime | 2023-05-11 | 2 | -0/+5 |
| | |||||
* | Make HF tokenizer a separate embedder | Bjørn Christian Seime | 2023-05-11 | 3 | -0/+172 |
| | |||||
* | Add skipping of control tokens | Lester Solbakken | 2023-02-10 | 2 | -6/+22 |
| | |||||
* | Add decoding of sentencepiece token sequence to text | Lester Solbakken | 2023-02-10 | 3 | -2/+24 |
| | |||||
* | Revert "Revert collect(Collectors.toList())" | Henning Baldersheim | 2022-12-04 | 1 | -1/+1 |
| | |||||
* | Revert collect(Collectors.toList()) | Henning Baldersheim | 2022-12-04 | 1 | -1/+1 |
| | |||||
* | collect(Collectors.toList()) -> toList() | Henning Baldersheim | 2022-12-02 | 1 | -1/+1 |
| | |||||
* | Use '@Inject' from 'annotations' in multiple bundles | Bjørn Christian Seime | 2022-05-06 | 2 | -2/+2 |
| | |||||
* | BERT -> WordPiece, make subword prefix configurable | Jon Bratseth | 2021-12-17 | 4 | -31/+60 |
| | |||||
* | Add a BERT embedder | Jon Bratseth | 2021-12-16 | 5 | -37/+294 |
| | |||||
* | Add custom `@Beta` annotation | Bjørn Christian Seime | 2021-12-03 | 1 | -1/+1 |
| | | | | Replace use of Guava's `com.google.common.annotations.Beta` with custom annotation. | ||||
* | Correct copyright headers | Jon Bratseth | 2021-10-20 | 1 | -1/+0 |
| | |||||
* | Add missing copyrights | Jon Bratseth | 2021-10-20 | 1 | -0/+1 |
| | |||||
* | Encapsulate in a context | Jon Bratseth | 2021-10-01 | 1 | -9/+7 |
| | |||||
* | Update linguistics-components | Jon Bratseth | 2021-09-30 | 1 | -3/+9 |
| | |||||
* | encode -> embed | Jon Bratseth | 2021-09-28 | 1 | -15/+14 |
| | |||||
* | Separate component from linguistics | Jon Bratseth | 2021-09-25 | 8 | -0/+490 |