path: root/model-integration
Commit message (Author, Date; Files changed, Lines -/+)
* Merge pull request #31049 from vespa-engine/jobergum/add-prepend-embedder-support (Bjørn Christian Seime, 2024-04-26; 2 files, -1/+64)
      add prepend support to embedder
* add prepend support (Jo Kristian Bergum, 2024-04-25; 2 files, -1/+64)
* Update defaults for local LLM config (Lester Solbakken, 2024-04-24; 1 file, -3/+3)
* Revert "Specifically set number of threads to use in llama unit test" (Harald Musum, 2024-04-22; 1 file, -4/+5)
* Specifically set number of threads to use in llama unit test (Lester Solbakken, 2024-04-22; 1 file, -5/+4)
* Remove unnecessary import (Lester Solbakken, 2024-04-22; 1 file, -1/+0)
* Set minimum number of threads to 1 (Lester Solbakken, 2024-04-22; 1 file, -1/+1)
* Disable local LLM unit tests (Lester Solbakken, 2024-04-16; 1 file, -1/+6)
* Reapply "Lesters/add local llms 2" (Lester Solbakken, 2024-04-16; 13 files, -0/+957)
      This reverts commit ed62b750494822cc67a328390178754512baf032.
* Revert "Lesters/add local llms 2" (Harald Musum, 2024-04-15; 13 files, -957/+0)
* Reapply "Lesters/add local llms" (Lester Solbakken, 2024-04-15; 13 files, -0/+957)
      This reverts commit 7518d93961ac7c5c5da1cd41717d42f600dae647.
* Revert "Lesters/add local llms" (Lester Solbakken, 2024-04-15; 13 files, -957/+0)
* Merge branch 'master' into lesters/add-local-llms (Lester Solbakken, 2024-04-12; 9 files, -23/+15)
* Unify on List.of (Henning Baldersheim, 2024-04-11; 7 files, -17/+11)
* Unify on Map.of (Henning Baldersheim, 2024-04-11; 1 file, -3/+2)
* Move LLM client stuff from container-search to model-integration (Lester Solbakken, 2024-04-12; 13 files, -0/+958)
* cache more and re-factor (Jo Kristian Bergum, 2024-04-08; 2 files, -68/+109)
* Key by embedder id and don't recompute inputs (Jon Bratseth, 2024-04-07; 2 files, -65/+73)
* Add equivalent to `Map.computeIfAbsent()` to simplify typical usage of the cache (Bjørn Christian Seime, 2024-04-04; 2 files, -20/+3)
      Current interface requires a lot of boilerplate code.
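The cache commit above notes that the previous interface required boilerplate (get, null-check, compute, put) and adds a `Map.computeIfAbsent()`-style helper. A minimal sketch of that pattern follows; the class name, key/value types, and method shape are hypothetical, not Vespa's actual cache API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a computeIfAbsent-style cache helper, mirroring
// java.util.Map.computeIfAbsent; not the actual Vespa cache interface.
public class OnnxOutputCache {
    private final Map<String, float[]> cache = new HashMap<>();

    // Before: callers did get(), a null check, compute, then put() themselves.
    // After: one call that computes and stores the value only on a cache miss.
    public float[] computeIfAbsent(String key, Function<String, float[]> compute) {
        return cache.computeIfAbsent(key, compute);
    }

    public int size() { return cache.size(); }
}
```

The second lookup for the same key returns the cached array without invoking the supplied function again, which is exactly the boilerplate the commit removes from call sites.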
* Add caching of onnx inference output using Context cache (Jo Kristian Bergum, 2024-04-04; 2 files, -18/+55)
* Support for dimensionality flexibility and caching onnx inference output using Context cache (Jo Kristian Bergum, 2024-04-04; 2 files, -53/+131)
* Add some more tests on the binarization (Jo Kristian Bergum, 2024-03-30; 2 files, -2/+39)
* relax testing on float strings due to small inference differences in platforms (Jo Kristian Bergum, 2024-03-29; 1 file, -5/+10)
* fix unwanted import (Jo Kristian Bergum, 2024-03-29; 1 file, -1/+0)
* Add support for binarization and matryoshka for hf-embedder (Jo Kristian Bergum, 2024-03-29; 3 files, -5/+140)
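Binarization, as referenced in the hf-embedder commit above, typically keeps only the sign of each float dimension and packs 8 dimensions into each int8 value. The sketch below illustrates that general technique under those assumptions; it is not the hf-embedder's actual implementation:

```java
// Illustrative bit-packing binarization: one bit per embedding dimension,
// positive values map to 1 and the rest to 0, packed big-endian into bytes.
// A sketch of the general technique only, not Vespa's hf-embedder code.
public class Binarizer {
    public static byte[] binarize(float[] embedding) {
        byte[] packed = new byte[(embedding.length + 7) / 8];
        for (int i = 0; i < embedding.length; i++) {
            if (embedding[i] > 0) {
                packed[i / 8] |= (byte) (1 << (7 - (i % 8)));
            }
        }
        return packed;
    }
}
```

An 8-dimensional float embedding thus compresses to a single byte, an 8x reduction even before counting float width.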
* All embedders are the same (Jon Bratseth, 2024-02-09; 1 file, -2/+2)
      This is to avoid a validation override from changed indexing expression when embedder details are changed.
* Support embedding into rank 3 tensors (Jon Bratseth, 2024-02-02; 3 files, -29/+42)
* Add alternative sparsify implementation using generic tensor.reduce/map (Henning Baldersheim, 2024-01-31; 2 files, -9/+52)
      - Add options for specifying which one to use in tests and performance benchmark.
      Based on the original implementation prior to custom reduce, with the following improvements:
      - Apply Math.log after reduction, which is the same optimization as done in the custom implementation.
      - Join the 2 separate single-dimension reduce statements into a single 2-dimensional reduce.
* Put the inner loops in separate methods. This improves ability to inline. (Henning Baldersheim, 2024-01-20; 2 files, -54/+52)
      - Use Buffer.get(int index) instead of Buffer.get(). That avoids a write.
      - Use int as loop variable.
      - This brings the splade performance test down from 8s to 7s.
      - TensorConverter.toVespaTensor more than doubled speed.
* Rename getIndex => getDirectIndex (Henning Baldersheim, 2024-01-20; 1 file, -1/+1)
* Add a class to assist efficient traversal of dimensions in an IndexedTensor (Henning Baldersheim, 2024-01-19; 2 files, -4/+9)
* Cache sizes.totalSize() in a variable to prevent recomputation (Henning Baldersheim, 2024-01-18; 1 file, -20/+19)
* Since both value and log(value) are monotonically increasing for value >= 1, we can just gather max(value) and do log at the end (Henning Baldersheim, 2024-01-18; 1 file, -8/+8)
      Avoiding the general Math.max, which seems to have very costly NaN handling, was quite beneficial.
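The reasoning in the commit above is that log is monotonic for value >= 1, so max(log(v)) == log(max(v)): gather the maximum with a plain comparison (avoiding Math.max's NaN handling) and apply Math.log once at the end. A sketch of that optimization, with an illustrative method name rather than the actual Vespa code:

```java
// Illustrative sketch of the max-then-log optimization: because log(value) is
// monotonically increasing for value >= 1, the maximum of the logs equals the
// log of the maximum, so Math.log is applied only once. A plain '>' comparison
// also avoids the NaN handling inside Math.max. Not the actual implementation.
public class MaxLog {
    public static double maxThenLog(double[] values) {
        double max = 1.0; // inputs are assumed >= 1, per the commit message
        for (double v : values) {
            if (v > max) max = v; // plain comparison, no Math.max NaN checks
        }
        return Math.log(max);
    }
}
```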
* Construct array right away instead of going via a single element list and the java stream api (Henning Baldersheim, 2024-01-18; 1 file, -5/+15)
* Avoid generic reduce and keep PAD token embedding (Jo Kristian Bergum, 2024-01-15; 2 files, -24/+47)
* remove extra space (Jo Kristian Bergum, 2024-01-11; 1 file, -1/+1)
* address review (Jo Kristian Bergum, 2024-01-11; 2 files, -43/+25)
* Avoid generic reduce to reduce gc pressure (Jo Kristian Bergum, 2024-01-11; 2 files, -19/+61)
* final (Jo Kristian Bergum, 2024-01-06; 1 file, -1/+1)
* handle multilingual models better (Jo Kristian Bergum, 2024-01-06; 3 files, -65/+147)
* Allow mapped 1d tensor for embed expressions (Jo Kristian Bergum, 2023-12-17; 2 files, -13/+13)
* Add a splade embedder implementation (Jo Kristian Bergum, 2023-12-15; 5 files, -0/+30962)
* Move Jackson util from vespajlib to container-core (Henning Baldersheim, 2023-11-24; 3 files, -3/+3)
* jackson 2.16 changes some of its default settings, so we consolidate our use of the ObjectMapper (Henning Baldersheim, 2023-11-23; 3 files, -8/+7)
      Unless special options are used, use a common instance, or create via factory method.
* unpack_bits_from_int8 -> unpack_bits (Arne Juul, 2023-11-10; 1 file, -2/+2)
* add simple expandBitTensor function (Arne Juul, 2023-11-10; 2 files, -9/+35)
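The `expandBitTensor` function named above presumably performs the inverse of bit-packing: expanding each int8 value into 8 float dimensions. The sketch below is an assumption about that general unpacking operation, with an invented class and method name, not the actual Vespa code:

```java
// Illustrative unpacking of bit-packed int8 values back into one float per
// bit, big-endian within each byte. An assumed sketch of what a function like
// expandBitTensor might do, not Vespa's actual implementation.
public class BitExpander {
    public static float[] expandBits(byte[] packed) {
        float[] out = new float[packed.length * 8];
        for (int i = 0; i < packed.length; i++) {
            for (int bit = 0; bit < 8; bit++) {
                // Shift the target bit into position and mask it out.
                out[i * 8 + bit] = ((packed[i] >> (7 - bit)) & 1) == 1 ? 1f : 0f;
            }
        }
        return out;
    }
}
```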
* Add support and upgrade opset (Jo Kristian Bergum, 2023-10-26; 4 files, -7/+31)
* Add support for bfloat16 and float16 (Jo Kristian Bergum, 2023-10-26; 4 files, -0/+82)
* Less verbose logging when failing to find CUDA and it is optional (Jo Kristian Bergum, 2023-10-26; 2 files, -2/+53)
* Disable CPU arena allocator for ONNX (Bjørn Christian Seime, 2023-10-19; 1 file, -0/+1)
      The arena memory allocator pre-allocates excessive amounts of memory up front. Disabling it matches the existing configuration in the ONNX integration for the backend.