path: root/linguistics-components/src/main
Commit message, Author, Age, Files, Lines
* Construct array right away instead of going via a single element list and the java stream api.Henning Baldersheim2024-01-181-2/+3
|
* Update copyrightJon Bratseth2023-10-0918-21/+21
|
* HuggingFace Tokenizer expects path to be a directoryBjørn Christian Seime2023-08-311-2/+18
|
* Prefer truncation configuration from tokenizer modelBjørn Christian Seime2023-06-122-13/+104
| | | | Only override truncation if not specified or max length exceeds max tokens accepted by model. Use JNI wrapper directly to determine existing truncation configuration (JSON format is not really documented). Simplify configuration for pure tokenizer embedder. Disable DJL usage telemetry.
* Test padding with truncationBjørn Christian Seime2023-06-081-1/+1
|
* Disable padding and make it configurableBjørn Christian Seime2023-06-081-3/+7
|
* Introduce services.xml syntax for configuring HuggingFace embeddersBjørn Christian Seime2023-06-022-13/+1
|
* Make truncation and max length configurableBjørn Christian Seime2023-05-262-5/+17
|
* Implement deconstructBjørn Christian Seime2023-05-161-0/+1
|
* Change parameter type to 'model'Bjørn Christian Seime2023-05-121-1/+1
|
* Revert "Revert "Bjorncs/huggingface tokenizer""Bjørn Christian Seime2023-05-124-0/+183
| | | | This reverts commit 2bb74878879b3acb1919fd658b8f2c476d8129d6.
* Revert "Bjorncs/huggingface tokenizer"Arnstein Ressem2023-05-124-183/+0
|
* Disable special tokens by defaultBjørn Christian Seime2023-05-112-12/+9
|
* Mark HF integration as betaBjørn Christian Seime2023-05-112-0/+5
|
* Make HF tokenizer a separate embedderBjørn Christian Seime2023-05-114-0/+181
|
* Add skipping of control tokensLester Solbakken2023-02-102-6/+22
|
* Add decoding of sentencepiece token sequence to textLester Solbakken2023-02-103-2/+24
|
* Revert "Revert collect(Collectors.toList())"Henning Baldersheim2022-12-041-1/+1
|
* Revert collect(Collectors.toList())Henning Baldersheim2022-12-041-1/+1
|
* collect(Collectors.toList()) -> toList()Henning Baldersheim2022-12-021-1/+1
|
* Use '@Inject' from 'annotations' in multiple bundlesBjørn Christian Seime2022-05-062-2/+2
|
* BERT -> WordPiece, make subword prefix configurableJon Bratseth2021-12-175-33/+65
|
* Add a BERT embedderJon Bratseth2021-12-166-37/+305
|
* Add custom `@Beta` annotationBjørn Christian Seime2021-12-031-1/+1
| | | | Replace use of Guava's `com.google.common.annotations.Beta` with custom annotation.
* Correct copyright headersJon Bratseth2021-10-201-1/+0
|
* Add missing copyrightsJon Bratseth2021-10-201-0/+1
|
* Encapsulate in a contextJon Bratseth2021-10-011-9/+7
|
* Update linguistics-componentsJon Bratseth2021-09-301-3/+9
|
* encode -> embedJon Bratseth2021-09-282-16/+15
|
* Use full filenameJon Bratseth2021-09-271-0/+0
|
* Separate component from linguisticsJon Bratseth2021-09-2510-0/+818