path: root/model-integration
Commit message | Author | Age | Files | Lines
* All embedders are the sameJon Bratseth2024-02-091-2/+2
| | | | | This is to avoid a validation override from changed indexing expression when embedder details are changed.
* Support embedding into rank 3 tensorsJon Bratseth2024-02-023-29/+42
|
* - Add alternative sparsify implementation using generic tensor.reduce/map.Henning Baldersheim2024-01-312-9/+52
| | | | | | | - Add options for specifying which one to use in tests and performance benchmark. Based on original implementation prior to custom reduce with the following improvements. - Apply Math.log after reduction, which is the same optimization as done in the custom implementation. - Join the 2 separate single dimension reduce statements into a single 2 dimensional reduce.
* - Put the inner loops in separate methods. This improves ability to inline.Henning Baldersheim2024-01-202-54/+52
| | | | | | | - Use Buffer.get(int index) instead of Buffer.get(). That avoids a write. - Use int as loop variable. - This brings the splade performance test down from 8s to 7s - TensorConverter.toVespaTensor more than doubled speed.
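The Buffer.get change in the commit above can be illustrated with a small sketch. The relative `get()` of a `java.nio` buffer reads at the current position and then advances that position, while the absolute `get(int index)` leaves the position untouched. The class and method names below (`BufferSum`, `sumRelative`, `sumAbsolute`) are hypothetical and are not taken from the Vespa code; this only shows the general pattern, not the actual change.

```java
import java.nio.FloatBuffer;

// Hypothetical helper class, for illustration only.
class BufferSum {

    // Relative access: every get() reads at the current position
    // and then advances it, so the buffer state is updated per element.
    static float sumRelative(FloatBuffer buffer) {
        float sum = 0;
        while (buffer.hasRemaining()) {
            sum += buffer.get();
        }
        return sum;
    }

    // Absolute access: get(int) reads at the given index and does not
    // modify the buffer position. The loop variable is a plain int.
    static float sumAbsolute(FloatBuffer buffer) {
        float sum = 0;
        int limit = buffer.limit();
        for (int i = 0; i < limit; i++) {
            sum += buffer.get(i);
        }
        return sum;
    }
}
```

After `sumAbsolute` the buffer can be read again without a rewind, whereas `sumRelative` leaves the position at the limit.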
* Rename getIndex => getDirectIndexHenning Baldersheim2024-01-201-1/+1
|
* Add a class to assist efficient traversal of dimensions in an IndexedTensor.Henning Baldersheim2024-01-192-4/+9
|
* Cache sizes.totalSize() in variable to prevent recomputation.Henning Baldersheim2024-01-181-20/+19
|
* Since both value and log(value) are monotonically increasing for value >= 1,Henning Baldersheim2024-01-181-8/+8
| | | | | we can just gather max(value) and do log at the end. Avoiding general Math.max which seems to have very costly NaN handling was quite beneficial.
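The reasoning in the commit above can be sketched as follows. Because log is monotonically increasing, the maximum of the logs equals the log of the maximum, so the logarithm only needs to be applied once after the reduction. The class and method names (`MaxThenLog`, `logThenMax`, `maxThenLog`) are hypothetical, and the input is assumed to be a non-empty array of values >= 1 without NaN; this is a minimal illustration, not the code from the commit.

```java
// Hypothetical helper class, for illustration only.
class MaxThenLog {

    // Baseline: take the log of every element and reduce with Math.max.
    static double logThenMax(double[] values) {
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < values.length; i++) {
            max = Math.max(max, Math.log(values[i]));
        }
        return max;
    }

    // Alternative: track the raw maximum with a plain comparison
    // (no NaN handling), then apply Math.log once at the end.
    // Assumes values is non-empty and contains no NaN.
    static double maxThenLog(double[] values) {
        double max = values[0];
        for (int i = 1; i < values.length; i++) {
            if (values[i] > max) {
                max = values[i];
            }
        }
        return Math.log(max);
    }
}
```

The second variant calls `Math.log` once instead of once per element, and the plain `>` comparison skips the NaN checks that `Math.max` performs.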
* Construct array right away instead of going via a single element list and ↵Henning Baldersheim2024-01-181-5/+15
| | | | the java stream api.
* Avoid generic reduce and keep PAD token embeddingJo Kristian Bergum2024-01-152-24/+47
|
* remove extra spaceJo Kristian Bergum2024-01-111-1/+1
|
* address reviewJo Kristian Bergum2024-01-112-43/+25
|
* Avoid generic reduce to reduce gc pressureJo Kristian Bergum2024-01-112-19/+61
|
* finalJo Kristian Bergum2024-01-061-1/+1
|
* handle multilingual models betterJo Kristian Bergum2024-01-063-65/+147
|
* Allow mapped 1d tensor for embed expressionsJo Kristian Bergum2023-12-172-13/+13
|
* Add a splade embedder implementationJo Kristian Bergum2023-12-155-0/+30962
|
* Move Jackson util from vespajlib to container-core.Henning Baldersheim2023-11-243-3/+3
|
* jackson 2.16 changes some of its default settings so we consolidate our use ↵Henning Baldersheim2023-11-233-8/+7
| | | | | | of the ObjectMapper. Unless special options are used, use a common instance, or create via factory method.
* unpack_bits_from_int8 -> unpack_bitsArne Juul2023-11-101-2/+2
|
* add simple expandBitTensor functionArne Juul2023-11-102-9/+35
|
* Add support and upgrade opsetJo Kristian Bergum2023-10-264-7/+31
|
* Add support for bfloat16 and float16Jo Kristian Bergum2023-10-264-0/+82
|
* Less verbose logging when failing to find CUDA and it is optionalJo Kristian Bergum2023-10-262-2/+53
|
* Disable CPU arena allocator for ONNXBjørn Christian Seime2023-10-191-0/+1
| | | | | The arena memory allocator pre-allocates an excessive amount of memory up front. Disabling matches the existing configuration in ONNX integration for backend.
* Update copyrightJon Bratseth2023-10-09122-122/+131
|
* Don't index PAD and re-factoringJo Kristian Bergum2023-09-262-41/+37
|
* Add config options + licenseJo Kristian Bergum2023-09-212-0/+2
|
* Ensure Onnx/Huggingface resources are cleaned up on deconstructionBjørn Christian Seime2023-09-211-0/+6
|
* Add ColBERT embedderJo Kristian Bergum2023-09-214-0/+599
|
* - Use equals when comparing Optional<Long>Henning Baldersheim2023-09-132-4/+4
| | | | - Minor cleanup
* Use thread safe hash mapBjørn Christian Seime2023-08-311-2/+2
|
* Merge pull request #27969 from vespa-engine/bjorncs/embedder-metricsJon Bratseth2023-08-315-8/+94
|\ | | | | Add generic metrics for embedders
| * Allow sampling of fractional millisBjørn Christian Seime2023-08-253-15/+10
| |
| * Add generic metrics for embeddersBjørn Christian Seime2023-08-045-8/+99
| |
* | Better error message when importing models with illegal namesLester Solbakken2023-08-291-0/+25
|/
* Log when GPU configuration is successfulMartin Polden2023-07-191-3/+8
|
* Log warning when failing to use GPUMartin Polden2023-07-191-1/+6
|
* update onnx.protoArne Juul2023-06-234-80/+453
| | | | | * use latest version from https://github.com/onnx/onnx/blob/main/onnx/onnx.proto * track API changes (enum -> int32)
* Prefer truncation configuration from tokenizer modelBjørn Christian Seime2023-06-121-6/+19
| | | | | | | Only override truncation if not specified or max length exceeds max tokens accepted by model. Use JNI wrapper directly to determine existing truncation configuration (JSON format is not really documented). Simplify configuration for pure tokenizer embedder. Disable DJL usage telemetry.
* Add missing wiring of pooling strategyBjørn Christian Seime2023-06-081-11/+1
|
* Disable padding and make it configurableBjørn Christian Seime2023-06-081-0/+1
|
* Merge pull request #27297 from vespa-engine/bjorncs/bert-embedder-services-xmlBjørn Christian Seime2023-06-064-49/+54
|\ | | | | Bjorncs/bert embedder services xml
| * Make pooling strategy configurable for Huggingface embedderBjørn Christian Seime2023-06-053-17/+54
| |
| * Move config definition to `configdefinitions`Bjørn Christian Seime2023-06-051-32/+0
| |
* | Add necessary options to use failOnWarningsgjoranv2023-06-051-0/+4
|/
* Introduce services.xml syntax for configuring HuggingFace embeddersBjørn Christian Seime2023-06-022-29/+6
|
* Properly ignore token type ids from tokenizer if disabledBjørn Christian Seime2023-05-301-2/+2
|
* Remove dead codeBjørn Christian Seime2023-05-262-43/+0
|
* Make truncation and max length configurableBjørn Christian Seime2023-05-261-12/+3
|