path: root/linguistics-components/src
Commit message | Author | Age | Files | Lines
* Test padding with truncation | Bjørn Christian Seime | 2023-06-08 | 2 | -3/+4
* Verify presence of special token | Bjørn Christian Seime | 2023-06-08 | 1 | -2/+7
* Disable padding and make it configurable | Bjørn Christian Seime | 2023-06-08 | 2 | -12/+30
* Introduce services.xml syntax for configuring HuggingFace embedders | Bjørn Christian Seime | 2023-06-02 | 2 | -13/+1
* Make truncation and max length configurable | Bjørn Christian Seime | 2023-05-26 | 3 | -7/+45
* Implement deconstruct | Bjørn Christian Seime | 2023-05-16 | 1 | -0/+1
* Change parameter type to 'model' | Bjørn Christian Seime | 2023-05-12 | 1 | -1/+1
* Revert "Revert "Bjorncs/huggingface tokenizer"" | Bjørn Christian Seime | 2023-05-12 | 7 | -0/+271
    This reverts commit 2bb74878879b3acb1919fd658b8f2c476d8129d6.
* Revert "Bjorncs/huggingface tokenizer" | Arnstein Ressem | 2023-05-12 | 7 | -271/+0
* Disable special tokens by default | Bjørn Christian Seime | 2023-05-11 | 3 | -12/+10
* Mark HF integration as beta | Bjørn Christian Seime | 2023-05-11 | 2 | -0/+5
* Make HF tokenizer a separate embedder | Bjørn Christian Seime | 2023-05-11 | 7 | -0/+268
* Add skipping of control tokens | Lester Solbakken | 2023-02-10 | 4 | -7/+34
* Add decoding of sentencepiece token sequence to text | Lester Solbakken | 2023-02-10 | 5 | -2/+40
* Revert "Revert collect(Collectors.toList())" | Henning Baldersheim | 2022-12-04 | 1 | -1/+1
* Revert collect(Collectors.toList()) | Henning Baldersheim | 2022-12-04 | 1 | -1/+1
* collect(Collectors.toList()) -> toList() | Henning Baldersheim | 2022-12-02 | 1 | -1/+1
* Use '@Inject' from 'annotations' in multiple bundles | Bjørn Christian Seime | 2022-05-06 | 2 | -2/+2
* Test segmentation with subwords | Jon Bratseth | 2021-12-17 | 2 | -4/+13
* BERT -> WordPiece, make subword prefix configurable | Jon Bratseth | 2021-12-17 | 12 | -162/+184
* Add a BERT embedder | Jon Bratseth | 2021-12-16 | 9 | -43/+30881
* Add custom `@Beta` annotation | Bjørn Christian Seime | 2021-12-03 | 1 | -1/+1
    Replace use of Guava's `com.google.common.annotations.Beta` with custom annotation.
* Correct copyright headers | Jon Bratseth | 2021-10-20 | 1 | -1/+0
* Add missiung copyrights | Jon Bratseth | 2021-10-20 | 1 | -0/+1
* Encapsulate in a context | Jon Bratseth | 2021-10-01 | 2 | -11/+10
* Update linguisticvs-components | Jon Bratseth | 2021-09-30 | 2 | -5/+11
* encode -> embed | Jon Bratseth | 2021-09-28 | 5 | -39/+38
* Use full filename | Jon Bratseth | 2021-09-27 | 1 | -0/+0
* Separate component from linguistics | Jon Bratseth | 2021-09-25 | 15 | -0/+1015