path: root/linguistics-components
Commit message | Author | Age | Files | Lines
* Construct array right away instead of going via a single element list and the java stream api. | Henning Baldersheim | 2024-01-18 | 1 | -2/+3
* Update copyright | Jon Bratseth | 2023-10-09 | 25 | -29/+29
* Use version tag | Henning Baldersheim | 2023-10-04 | 1 | -1/+0
* Update dependency ai.djl.huggingface:tokenizers to v0.24.0 | renovate[bot] | 2023-10-04 | 1 | -1/+1
* Use Guice 6.0 | Bjørn Christian Seime | 2023-09-04 | 1 | -1/+1
    https://github.com/google/guice/wiki/Guice600
    We cannot upgrade to 7.x as we export javax.inject from container.
    6.x supports both the old javax.inject and the new jakarta.inject replacement.
* HuggingFace Tokenizer expects path to be a directory | Bjørn Christian Seime | 2023-08-31 | 1 | -2/+18
* Update dependency ai.djl.huggingface:tokenizers to v0.23.0 | renovate[bot] | 2023-08-30 | 1 | -1/+1
* Update abi-specs after making config class Builders final | gjoranv | 2023-07-17 | 1 | -4/+8
* Prefer truncation configuration from tokenizer model | Bjørn Christian Seime | 2023-06-12 | 3 | -14/+112
    Only override truncation if not specified or max length exceeds max tokens accepted by model.
    Use JNI wrapper directly to determine existing truncation configuration (JSON format is not really documented).
    Simply configuration for pure tokenizer embedder.
    Disable DJL usage telemetry.
* Test padding with truncation | Bjørn Christian Seime | 2023-06-08 | 2 | -3/+4
* Verify presence of special token | Bjørn Christian Seime | 2023-06-08 | 1 | -2/+7
* Disable padding and make it configurable | Bjørn Christian Seime | 2023-06-08 | 2 | -12/+30
* Introduce services.xml syntax for configuring HuggingFace embedders | Bjørn Christian Seime | 2023-06-02 | 3 | -13/+7
* Make truncation and max length configurable | Bjørn Christian Seime | 2023-05-26 | 3 | -7/+45
* Implement deconstruct | Bjørn Christian Seime | 2023-05-16 | 1 | -0/+1
* Change parameter type to 'model' | Bjørn Christian Seime | 2023-05-12 | 1 | -1/+1
* Revert "Revert "Bjorncs/huggingface tokenizer"" | Bjørn Christian Seime | 2023-05-12 | 8 | -2/+303
    This reverts commit 2bb74878879b3acb1919fd658b8f2c476d8129d6.
* Revert "Bjorncs/huggingface tokenizer" | Arnstein Ressem | 2023-05-12 | 8 | -303/+2
* Disable special tokens by default | Bjørn Christian Seime | 2023-05-11 | 3 | -12/+10
* Mark HF integration as beta | Bjørn Christian Seime | 2023-05-11 | 2 | -0/+5
* Make HF tokenizer a separate embedder | Bjørn Christian Seime | 2023-05-11 | 8 | -2/+300
* Add skipping of control tokens | Lester Solbakken | 2023-02-10 | 5 | -7/+35
* Add abi spec | Lester Solbakken | 2023-02-10 | 1 | -1/+3
* Add decoding of sentencepiece token sequence to text | Lester Solbakken | 2023-02-10 | 5 | -2/+40
* Revert "Revert collect(Collectors.toList())" | Henning Baldersheim | 2022-12-04 | 1 | -1/+1
* Revert collect(Collectors.toList()) | Henning Baldersheim | 2022-12-04 | 1 | -1/+1
* collect(Collectors.toList()) -> toList() | Henning Baldersheim | 2022-12-02 | 1 | -1/+1
* Split out opennlp-linguistics | Henning Baldersheim | 2022-11-26 | 1 | -0/+6
* Update ABI spec format, and update all specs | jonmv | 2022-10-25 | 1 | -102/+102
* Set project version to 8-SNAPSHOT | gjoranv | 2022-06-08 | 1 | -2/+2
* Remove config version on Vespa 8 | Jon Bratseth | 2022-06-08 | 1 | -4/+0
* install_jar CMake function | Håkon Hallingstad | 2022-05-20 | 1 | -1/+1
* Use '@Inject' from 'annotations' in multiple bundles | Bjørn Christian Seime | 2022-05-06 | 2 | -2/+2
* Don't embed annotations in osgi bundles | Bjørn Christian Seime | 2022-05-04 | 1 | -0/+6
* unify java warnings (use compiler args from parent) | Arne H Juul | 2022-01-06 | 1 | -8/+0
* Test segmentation with subwords | Jon Bratseth | 2021-12-17 | 2 | -4/+13
* BERT -> WordPiece, make subword prefix configurable | Jon Bratseth | 2021-12-17 | 13 | -276/+306
* Add a BERT embedder | Jon Bratseth | 2021-12-16 | 10 | -43/+31009
* update ABI for generated builders | Arne H Juul | 2021-12-09 | 1 | -0/+1
* Add custom `@Beta` annotation | Bjørn Christian Seime | 2021-12-03 | 1 | -1/+1
    Replace use of Guava's `com.google.common.annotations.Beta` with custom annotation.
* Update 2019 Oath copyrights. | gjoranv | 2021-10-27 | 1 | -1/+1
* Correct copyright headers | Jon Bratseth | 2021-10-20 | 1 | -1/+0
* Add missiung copyrights | Jon Bratseth | 2021-10-20 | 1 | -0/+1
* Update 2017 copyright notices. | gjoranv | 2021-10-07 | 1 | -1/+1
* Encapsulate in a context | Jon Bratseth | 2021-10-01 | 3 | -13/+12
* Update linguisticvs-components | Jon Bratseth | 2021-09-30 | 3 | -7/+13
* encode -> embed | Jon Bratseth | 2021-09-28 | 6 | -49/+48
* Use full filename | Jon Bratseth | 2021-09-27 | 1 | -0/+0
* Separate component from linguistics | Jon Bratseth | 2021-09-25 | 21 | -0/+1300