Commit message | Author | Date | Files | Lines (-/+)
---|---|---|---|---
Construct array right away instead of going via a single element list and the java stream api. | Henning Baldersheim | 2024-01-18 | 1 | -2/+3
Update copyright | Jon Bratseth | 2023-10-09 | 25 | -29/+29
Use version tag | Henning Baldersheim | 2023-10-04 | 1 | -1/+0
Update dependency ai.djl.huggingface:tokenizers to v0.24.0 | renovate[bot] | 2023-10-04 | 1 | -1/+1
Use Guice 6.0 (https://github.com/google/guice/wiki/Guice600). We cannot upgrade to 7.x as we export javax.inject from container. 6.x supports both the old javax.inject and the new jakarta.inject replacement. | Bjørn Christian Seime | 2023-09-04 | 1 | -1/+1
HuggingFace Tokenizer expects path to be a directory | Bjørn Christian Seime | 2023-08-31 | 1 | -2/+18
Update dependency ai.djl.huggingface:tokenizers to v0.23.0 | renovate[bot] | 2023-08-30 | 1 | -1/+1
Update abi-specs after making config class Builders final | gjoranv | 2023-07-17 | 1 | -4/+8
Prefer truncation configuration from tokenizer model. Only override truncation if not specified or max length exceeds max tokens accepted by model. Use JNI wrapper directly to determine existing truncation configuration (JSON format is not really documented). Simply configuration for pure tokenizer embedder. Disable DJL usage telemetry. | Bjørn Christian Seime | 2023-06-12 | 3 | -14/+112
Test padding with truncation | Bjørn Christian Seime | 2023-06-08 | 2 | -3/+4
Verify presence of special token | Bjørn Christian Seime | 2023-06-08 | 1 | -2/+7
Disable padding and make it configurable | Bjørn Christian Seime | 2023-06-08 | 2 | -12/+30
Introduce services.xml syntax for configuring HuggingFace embedders | Bjørn Christian Seime | 2023-06-02 | 3 | -13/+7
Make truncation and max length configurable | Bjørn Christian Seime | 2023-05-26 | 3 | -7/+45
Implement deconstruct | Bjørn Christian Seime | 2023-05-16 | 1 | -0/+1
Change parameter type to 'model' | Bjørn Christian Seime | 2023-05-12 | 1 | -1/+1
Revert "Revert "Bjorncs/huggingface tokenizer"". This reverts commit 2bb74878879b3acb1919fd658b8f2c476d8129d6. | Bjørn Christian Seime | 2023-05-12 | 8 | -2/+303
Revert "Bjorncs/huggingface tokenizer" | Arnstein Ressem | 2023-05-12 | 8 | -303/+2
Disable special tokens by default | Bjørn Christian Seime | 2023-05-11 | 3 | -12/+10
Mark HF integration as beta | Bjørn Christian Seime | 2023-05-11 | 2 | -0/+5
Make HF tokenizer a separate embedder | Bjørn Christian Seime | 2023-05-11 | 8 | -2/+300
Add skipping of control tokens | Lester Solbakken | 2023-02-10 | 5 | -7/+35
Add abi spec | Lester Solbakken | 2023-02-10 | 1 | -1/+3
Add decoding of sentencepiece token sequence to text | Lester Solbakken | 2023-02-10 | 5 | -2/+40
Revert "Revert collect(Collectors.toList())" | Henning Baldersheim | 2022-12-04 | 1 | -1/+1
Revert collect(Collectors.toList()) | Henning Baldersheim | 2022-12-04 | 1 | -1/+1
collect(Collectors.toList()) -> toList() | Henning Baldersheim | 2022-12-02 | 1 | -1/+1
Split out opennlp-linguistics | Henning Baldersheim | 2022-11-26 | 1 | -0/+6
Update ABI spec format, and update all specs | jonmv | 2022-10-25 | 1 | -102/+102
Set project version to 8-SNAPSHOT | gjoranv | 2022-06-08 | 1 | -2/+2
Remove config version on Vespa 8 | Jon Bratseth | 2022-06-08 | 1 | -4/+0
install_jar CMake function | Håkon Hallingstad | 2022-05-20 | 1 | -1/+1
Use '@Inject' from 'annotations' in multiple bundles | Bjørn Christian Seime | 2022-05-06 | 2 | -2/+2
Don't embed annotations in osgi bundles | Bjørn Christian Seime | 2022-05-04 | 1 | -0/+6
unify java warnings (use compiler args from parent) | Arne H Juul | 2022-01-06 | 1 | -8/+0
Test segmentation with subwords | Jon Bratseth | 2021-12-17 | 2 | -4/+13
BERT -> WordPiece, make subword prefix configurable | Jon Bratseth | 2021-12-17 | 13 | -276/+306
Add a BERT embedder | Jon Bratseth | 2021-12-16 | 10 | -43/+31009
update ABI for generated builders | Arne H Juul | 2021-12-09 | 1 | -0/+1
Add custom `@Beta` annotation. Replace use of Guava's `com.google.common.annotations.Beta` with custom annotation. | Bjørn Christian Seime | 2021-12-03 | 1 | -1/+1
Update 2019 Oath copyrights. | gjoranv | 2021-10-27 | 1 | -1/+1
Correct copyright headers | Jon Bratseth | 2021-10-20 | 1 | -1/+0
Add missiung copyrights | Jon Bratseth | 2021-10-20 | 1 | -0/+1
Update 2017 copyright notices. | gjoranv | 2021-10-07 | 1 | -1/+1
Encapsulate in a context | Jon Bratseth | 2021-10-01 | 3 | -13/+12
Update linguisticvs-components | Jon Bratseth | 2021-09-30 | 3 | -7/+13
encode -> embed | Jon Bratseth | 2021-09-28 | 6 | -49/+48
Use full filename | Jon Bratseth | 2021-09-27 | 1 | -0/+0
Separate component from linguistics | Jon Bratseth | 2021-09-25 | 21 | -0/+1300