diff options
author | Bjørn Christian Seime <bjorncs@yahooinc.com> | 2023-06-12 16:41:37 +0200 |
---|---|---|
committer | Bjørn Christian Seime <bjorncs@yahooinc.com> | 2023-06-12 16:51:26 +0200 |
commit | 4f722322cc9f8df5146ffb27d74239b3b4f2d634 (patch) | |
tree | dad0f0a70513a861844d10a35ba93c1901b48057 /config-model/src/main/resources/schema/common.rnc | |
parent | 838f918baf2f64b5cb737a59e624f20773d95baa (diff) |
Prefer truncation configuration from tokenizer model
Only override truncation if not specified or max length exceeds max tokens accepted by model.
Use JNI wrapper directly to determine existing truncation configuration (JSON format is not really documented).
Simply configuration for pure tokenizer embedder.
Disable DJL usage telemetry.
Diffstat (limited to 'config-model/src/main/resources/schema/common.rnc')
-rw-r--r-- | config-model/src/main/resources/schema/common.rnc | 8 |
1 files changed, 2 insertions, 6 deletions
diff --git a/config-model/src/main/resources/schema/common.rnc b/config-model/src/main/resources/schema/common.rnc index e130bed0297..ba7e2b6674e 100644 --- a/config-model/src/main/resources/schema/common.rnc +++ b/config-model/src/main/resources/schema/common.rnc @@ -88,7 +88,7 @@ HuggingFaceEmbedder = attribute type { "hugging-face-embedder" } & element transformer-model { ModelReference } & element tokenizer-model { ModelReference }? & - element max-tokens { xsd:nonNegativeInteger }? & + element max-tokens { xsd:positiveInteger }? & element transformer-input-ids { xsd:string }? & element transformer-attention-mask { xsd:string }? & element transformer-token-type-ids { xsd:string }? & @@ -99,11 +99,7 @@ HuggingFaceEmbedder = HuggingFaceTokenizer = attribute type { "hugging-face-tokenizer" } & - element model { attribute language { xsd:string }? & ModelReference }+ & - element special-tokens { xsd:boolean }? & - element max-length { xsd:integer }? & - element truncation { xsd:boolean }? & - element padding { xsd:boolean }? + element model { attribute language { xsd:string }? & ModelReference }+ BertBaseEmbedder = attribute type { "bert-embedder" } & |