| Commit message | Author | Date | Files | Lines |
|---|---|---|---|---|
| Merge pull request #31049 from vespa-engine/jobergum/add-prepend-embedder-support (add prepend support to embedder) | Bjørn Christian Seime | 2024-04-26 | 2 | -1/+64 |
| Add prepend support | Jo Kristian Bergum | 2024-04-25 | 2 | -1/+64 |
| Update defaults for local LLM config | Lester Solbakken | 2024-04-24 | 1 | -3/+3 |
| Revert "Specifically set number of threads to use in llama unit test" | Harald Musum | 2024-04-22 | 1 | -4/+5 |
| Specifically set number of threads to use in llama unit test | Lester Solbakken | 2024-04-22 | 1 | -5/+4 |
| Remove unnecessary import | Lester Solbakken | 2024-04-22 | 1 | -1/+0 |
| Set minimum number of threads to 1 | Lester Solbakken | 2024-04-22 | 1 | -1/+1 |
| Disable local LLM unit tests | Lester Solbakken | 2024-04-16 | 1 | -1/+6 |
| Reapply "Lesters/add local llms 2" (reverts commit ed62b750494822cc67a328390178754512baf032) | Lester Solbakken | 2024-04-16 | 13 | -0/+957 |
| Revert "Lesters/add local llms 2" | Harald Musum | 2024-04-15 | 13 | -957/+0 |
| Reapply "Lesters/add local llms" (reverts commit 7518d93961ac7c5c5da1cd41717d42f600dae647) | Lester Solbakken | 2024-04-15 | 13 | -0/+957 |
| Revert "Lesters/add local llms" | Lester Solbakken | 2024-04-15 | 13 | -957/+0 |
| Merge branch 'master' into lesters/add-local-llms | Lester Solbakken | 2024-04-12 | 9 | -23/+15 |
| Unify on List.of | Henning Baldersheim | 2024-04-11 | 7 | -17/+11 |
| Unify on Map.of | Henning Baldersheim | 2024-04-11 | 1 | -3/+2 |
| Move LLM client code from container-search to model-integration | Lester Solbakken | 2024-04-12 | 13 | -0/+958 |
| Cache more and refactor | Jo Kristian Bergum | 2024-04-08 | 2 | -68/+109 |
| Key by embedder id and don't recompute inputs | Jon Bratseth | 2024-04-07 | 2 | -65/+73 |
| Add an equivalent to `Map.computeIfAbsent()` to simplify typical usage of the cache; the current interface requires a lot of boilerplate code | Bjørn Christian Seime | 2024-04-04 | 2 | -20/+3 |
| Add caching of ONNX inference output using Context cache | Jo Kristian Bergum | 2024-04-04 | 2 | -18/+55 |
| Support dimensionality flexibility and caching of ONNX inference output using Context cache | Jo Kristian Bergum | 2024-04-04 | 2 | -53/+131 |
| Add some more tests on the binarization | Jo Kristian Bergum | 2024-03-30 | 2 | -2/+39 |
| Relax testing on float strings due to small inference differences between platforms | Jo Kristian Bergum | 2024-03-29 | 1 | -5/+10 |
| Fix unwanted import | Jo Kristian Bergum | 2024-03-29 | 1 | -1/+0 |
| Add support for binarization and matryoshka for hf-embedder | Jo Kristian Bergum | 2024-03-29 | 3 | -5/+140 |
| All embedders are the same; this avoids a validation override from a changed indexing expression when embedder details change | Jon Bratseth | 2024-02-09 | 1 | -2/+2 |
| Support embedding into rank 3 tensors | Jon Bratseth | 2024-02-02 | 3 | -29/+42 |
| Add an alternative sparsify implementation using generic tensor.reduce/map, with options for choosing which one to use in tests and the performance benchmark. Based on the original implementation (prior to the custom reduce) with two improvements: apply Math.log after reduction, the same optimization as in the custom implementation, and join the two separate single-dimension reduce statements into a single two-dimensional reduce | Henning Baldersheim | 2024-01-31 | 2 | -9/+52 |
| Put the inner loops in separate methods to improve inlining; use Buffer.get(int index) instead of Buffer.get() to avoid a write; use int as the loop variable. Brings the splade performance test down from 8s to 7s; TensorConverter.toVespaTensor more than doubled in speed | Henning Baldersheim | 2024-01-20 | 2 | -54/+52 |
| Rename getIndex => getDirectIndex | Henning Baldersheim | 2024-01-20 | 1 | -1/+1 |
| Add a class to assist efficient traversal of dimensions in an IndexedTensor | Henning Baldersheim | 2024-01-19 | 2 | -4/+9 |
| Cache sizes.totalSize() in a variable to prevent recomputation | Henning Baldersheim | 2024-01-18 | 1 | -20/+19 |
| Since both value and log(value) are monotonically increasing for value >= 1, we can just gather max(value) and do the log at the end; avoiding generic Math.max, which has very costly NaN handling, was quite beneficial | Henning Baldersheim | 2024-01-18 | 1 | -8/+8 |
| Construct the array right away instead of going via a single-element list and the Java stream API | Henning Baldersheim | 2024-01-18 | 1 | -5/+15 |
| Avoid generic reduce and keep the PAD token embedding | Jo Kristian Bergum | 2024-01-15 | 2 | -24/+47 |
| Remove extra space | Jo Kristian Bergum | 2024-01-11 | 1 | -1/+1 |
| Address review | Jo Kristian Bergum | 2024-01-11 | 2 | -43/+25 |
| Avoid generic reduce to reduce GC pressure | Jo Kristian Bergum | 2024-01-11 | 2 | -19/+61 |
| final | Jo Kristian Bergum | 2024-01-06 | 1 | -1/+1 |
| Handle multilingual models better | Jo Kristian Bergum | 2024-01-06 | 3 | -65/+147 |
| Allow mapped 1d tensor for embed expressions | Jo Kristian Bergum | 2023-12-17 | 2 | -13/+13 |
| Add a splade embedder implementation | Jo Kristian Bergum | 2023-12-15 | 5 | -0/+30962 |
| Move Jackson util from vespajlib to container-core | Henning Baldersheim | 2023-11-24 | 3 | -3/+3 |
| Jackson 2.16 changes some of its default settings, so consolidate our use of the ObjectMapper: unless special options are needed, use a common instance or create one via a factory method | Henning Baldersheim | 2023-11-23 | 3 | -8/+7 |
| unpack_bits_from_int8 -> unpack_bits | Arne Juul | 2023-11-10 | 1 | -2/+2 |
| Add simple expandBitTensor function | Arne Juul | 2023-11-10 | 2 | -9/+35 |
| Add support and upgrade opset | Jo Kristian Bergum | 2023-10-26 | 4 | -7/+31 |
| Add support for bfloat16 and float16 | Jo Kristian Bergum | 2023-10-26 | 4 | -0/+82 |
| Less verbose logging when failing to find CUDA and it is optional | Jo Kristian Bergum | 2023-10-26 | 2 | -2/+53 |
| Disable CPU arena allocator for ONNX; the arena memory allocator pre-allocates excessive memory up front, and disabling it matches the existing configuration in the ONNX integration for the backend | Bjørn Christian Seime | 2023-10-19 | 1 | -0/+1 |
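The max-then-log change in the 2024-01-18 sparsify commit relies on log being monotonically increasing: for values >= 1, the maximum of the logs equals the log of the maximum, so one log call at the end replaces one per element. A minimal sketch of that equivalence in Python (function names are illustrative, not from the Vespa codebase):

```python
import math

def max_of_logs_naive(values):
    # Apply log to every element, then reduce with max: one log call per element.
    return max(math.log(v) for v in values)

def max_of_logs_deferred(values):
    # log is monotonically increasing, so for values >= 1 we can reduce with
    # max first and apply log once at the end; same result, far fewer log calls.
    return math.log(max(values))

vals = [1.0, 3.5, 2.0, 7.25]
assert math.isclose(max_of_logs_naive(vals), max_of_logs_deferred(vals))
```

The same reasoning applies to any monotone transform pulled outside a max (or min) reduction; the commit additionally avoided Math.max's NaN handling, which this sketch does not model.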