author | Bjørn Christian Seime <bjorncs@yahooinc.com> | 2023-06-12 16:41:37 +0200 |
---|---|---|
committer | Bjørn Christian Seime <bjorncs@yahooinc.com> | 2023-06-12 16:51:26 +0200 |
commit | 4f722322cc9f8df5146ffb27d74239b3b4f2d634 | |
tree | dad0f0a70513a861844d10a35ba93c1901b48057 /configdefinitions | |
parent | 838f918baf2f64b5cb737a59e624f20773d95baa |
Prefer truncation configuration from tokenizer model
Only override truncation if not specified or max length exceeds max tokens accepted by model.
Use JNI wrapper directly to determine existing truncation configuration (JSON format is not really documented).
Simplify configuration for pure tokenizer embedder.
Disable DJL usage telemetry.
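The override rule in the message above can be sketched as follows. This is only an illustration of one reading of that rule: the class and method names are hypothetical and not taken from the Vespa code base, and it assumes the truncation length found in the tokenizer model is available as an `OptionalInt` that is empty when the model specifies none.

```java
import java.util.OptionalInt;

/** Hypothetical helper illustrating the override rule; not the actual Vespa implementation. */
final class TruncationOverride {

    /**
     * @param modelMaxLength truncation length found in the tokenizer model, empty if unspecified (assumed representation)
     * @param maxTokens      maximum number of tokens accepted by the model
     * @return true if the caller should replace the tokenizer model's truncation configuration
     */
    static boolean shouldOverride(OptionalInt modelMaxLength, int maxTokens) {
        if (!modelMaxLength.isPresent()) {
            return true; // truncation not specified by the tokenizer model
        }
        return modelMaxLength.getAsInt() > maxTokens; // specified, but longer than the model accepts
    }

    /** Returns the truncation length that would be in effect after applying the rule. */
    static int effectiveMaxLength(OptionalInt modelMaxLength, int maxTokens) {
        return shouldOverride(modelMaxLength, maxTokens) ? maxTokens : modelMaxLength.getAsInt();
    }

    public static void main(String[] args) {
        System.out.println(effectiveMaxLength(OptionalInt.empty(), 512));   // unspecified: use model limit
        System.out.println(effectiveMaxLength(OptionalInt.of(128), 512));   // within limit: keep tokenizer setting
        System.out.println(effectiveMaxLength(OptionalInt.of(1024), 512));  // exceeds limit: clamp to model limit
    }
}
```

Under these assumptions, a tokenizer model that already truncates at or below the model limit is left untouched, which matches the commit title's preference for the tokenizer model's own configuration.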
Diffstat (limited to 'configdefinitions')
-rw-r--r-- | configdefinitions/src/vespa/hugging-face-tokenizer.def | 13 |
1 files changed, 10 insertions, 3 deletions
diff --git a/configdefinitions/src/vespa/hugging-face-tokenizer.def b/configdefinitions/src/vespa/hugging-face-tokenizer.def
index bc0d5300de5..896a7b03234 100644
--- a/configdefinitions/src/vespa/hugging-face-tokenizer.def
+++ b/configdefinitions/src/vespa/hugging-face-tokenizer.def
@@ -8,7 +8,14 @@ model[].language string
 # The path to the model relative to the application package root
 model[].path model
 
+# Include special tokens in output
 addSpecialTokens bool default=true
-maxLength int default=512
-truncation bool default=true
-padding bool default=false
+
+# Used for truncation/padding. Use -1 for model default.
+maxLength int default=-1
+
+# Truncation strategy. Use NOTSET for model default.
+truncation enum { ON, OFF, NOTSET } default=NOTSET
+
+# Padding strategy. Use NOTSET for model default.
+padding enum { ON, OFF, NOTSET } default=NOTSET
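The new defaults turn `truncation` and `padding` into tri-state settings and make `-1` a sentinel for `maxLength`. A consumer of this config might map the sentinels to "no explicit value" roughly as in the sketch below. The `Tristate` enum and both helper methods are hypothetical stand-ins for illustration, not the classes generated from the config definition.

```java
import java.util.Optional;
import java.util.OptionalInt;

/** Hypothetical sketch of interpreting the NOTSET and -1 sentinels from the config definition. */
final class TokenizerConfigDefaults {

    /** Stand-in for the ON/OFF/NOTSET enum declared in the .def file. */
    enum Tristate { ON, OFF, NOTSET }

    /** Returns the explicitly configured flag, or empty when the model default should apply. */
    static Optional<Boolean> explicitFlag(Tristate value) {
        switch (value) {
            case ON:  return Optional.of(true);
            case OFF: return Optional.of(false);
            default:  return Optional.empty(); // NOTSET: defer to the tokenizer model
        }
    }

    /** Returns the explicitly configured max length, or empty when -1 (model default) is given. */
    static OptionalInt explicitMaxLength(int configured) {
        return configured == -1 ? OptionalInt.empty() : OptionalInt.of(configured);
    }

    public static void main(String[] args) {
        System.out.println(explicitFlag(Tristate.NOTSET).isPresent()); // false
        System.out.println(explicitFlag(Tristate.OFF).get());          // false
        System.out.println(explicitMaxLength(-1).isPresent());         // false
        System.out.println(explicitMaxLength(256).getAsInt());         // 256
    }
}
```

Keeping "not set" distinct from `OFF` is what lets an unconfigured tokenizer fall back to whatever the tokenizer model itself declares, rather than to a hard-coded default such as the previous `maxLength` of 512.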