path: root/model-integration
Commit message (Author, Date; Files changed, Lines -/+)
* Merge pull request #31049 from vespa-engine/jobergum/add-prepend-embedder-support (Bjørn Christian Seime, 2024-04-26; 2 files, -1/+64)
      add prepend support to embedder
* add prepend support (Jo Kristian Bergum, 2024-04-25; 2 files, -1/+64)
* Update defaults for local LLM config (Lester Solbakken, 2024-04-24; 1 file, -3/+3)
* Revert "Specifically set number of threads to use in llama unit test" (Harald Musum, 2024-04-22; 1 file, -4/+5)
* Specifically set number of threads to use in llama unit test (Lester Solbakken, 2024-04-22; 1 file, -5/+4)
* Remove unnecessary import (Lester Solbakken, 2024-04-22; 1 file, -1/+0)
* Set minimum number of threads to 1 (Lester Solbakken, 2024-04-22; 1 file, -1/+1)
* Disable local LLM unit tests (Lester Solbakken, 2024-04-16; 1 file, -1/+6)
* Reapply "Lesters/add local llms 2" (Lester Solbakken, 2024-04-16; 13 files, -0/+957)
      This reverts commit ed62b750494822cc67a328390178754512baf032.
* Revert "Lesters/add local llms 2" (Harald Musum, 2024-04-15; 13 files, -957/+0)
* Reapply "Lesters/add local llms" (Lester Solbakken, 2024-04-15; 13 files, -0/+957)
      This reverts commit 7518d93961ac7c5c5da1cd41717d42f600dae647.
* Revert "Lesters/add local llms" (Lester Solbakken, 2024-04-15; 13 files, -957/+0)
* Merge branch 'master' into lesters/add-local-llms (Lester Solbakken, 2024-04-12; 9 files, -23/+15)
* Unify on List.of (Henning Baldersheim, 2024-04-11; 7 files, -17/+11)
* Unify on Map.of (Henning Baldersheim, 2024-04-11; 1 file, -3/+2)
* Move LLM client stuff from container-search to model-integration (Lester Solbakken, 2024-04-12; 13 files, -0/+958)
* cache more and re-factor (Jo Kristian Bergum, 2024-04-08; 2 files, -68/+109)
* Key by embedder id and don't recompute inputs (Jon Bratseth, 2024-04-07; 2 files, -65/+73)
* Add equivalent to `Map.computeIfAbsent()` to simplify typical usage of the cache (Bjørn Christian Seime, 2024-04-04; 2 files, -20/+3)
      Current interface requires a lot of boilerplate code.
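The cache commit above notes that the previous interface required boilerplate (get, null-check, compute, put) and adds a `Map.computeIfAbsent()`-style helper. A minimal sketch of that pattern follows; the class name, key/value types, and method shape are hypothetical, not Vespa's actual cache API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a computeIfAbsent-style cache helper, mirroring
// java.util.Map.computeIfAbsent; not the actual Vespa cache interface.
public class OnnxOutputCache {
    private final Map<String, float[]> cache = new HashMap<>();

    // Before: callers did get(), a null check, compute, then put() themselves.
    // After: one call that computes and stores the value only on a cache miss.
    public float[] computeIfAbsent(String key, Function<String, float[]> compute) {
        return cache.computeIfAbsent(key, compute);
    }

    public int size() { return cache.size(); }
}
```

The second lookup for the same key returns the cached array without invoking the supplied function again, which is exactly the boilerplate the commit removes from call sites.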
* Add caching of onnx inference output using Context cache (Jo Kristian Bergum, 2024-04-04; 2 files, -18/+55)
* Support for dimensionality flexibility and caching onnx inference output using Context cache (Jo Kristian Bergum, 2024-04-04; 2 files, -53/+131)
* Add some more tests on the binarization (Jo Kristian Bergum, 2024-03-30; 2 files, -2/+39)
* relax testing on float strings due to small inference differences in platforms (Jo Kristian Bergum, 2024-03-29; 1 file, -5/+10)
* fix unwanted import (Jo Kristian Bergum, 2024-03-29; 1 file, -1/+0)
* Add support for binarization and matryoshka for hf-embedder (Jo Kristian Bergum, 2024-03-29; 3 files, -5/+140)
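Binarization, as referenced in the hf-embedder commit above, typically keeps only the sign of each float dimension and packs 8 dimensions into each int8 value. The sketch below illustrates that general technique under those assumptions; it is not the hf-embedder's actual implementation:

```java
// Illustrative bit-packing binarization: one bit per embedding dimension,
// positive values map to 1 and the rest to 0, packed big-endian into bytes.
// A sketch of the general technique only, not Vespa's hf-embedder code.
public class Binarizer {
    public static byte[] binarize(float[] embedding) {
        byte[] packed = new byte[(embedding.length + 7) / 8];
        for (int i = 0; i < embedding.length; i++) {
            if (embedding[i] > 0) {
                packed[i / 8] |= (byte) (1 << (7 - (i % 8)));
            }
        }
        return packed;
    }
}
```

An 8-dimensional float embedding thus compresses to a single byte, an 8x reduction even before counting float width.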
* All embedders are the same (Jon Bratseth, 2024-02-09; 1 file, -2/+2)
      This is to avoid a validation override from changed indexing expression when embedder details are changed.
* Support embedding into rank 3 tensors (Jon Bratseth, 2024-02-02; 3 files, -29/+42)
* Add alternative sparsify implementation using generic tensor.reduce/map (Henning Baldersheim, 2024-01-31; 2 files, -9/+52)
      - Add options for specifying which one to use in tests and performance benchmark.
      Based on the original implementation prior to custom reduce, with the following improvements:
      - Apply Math.log after reduction, which is the same optimization as done in the custom implementation.
      - Join the 2 separate single-dimension reduce statements into a single 2-dimensional reduce.
* Put the inner loops in separate methods. This improves ability to inline. (Henning Baldersheim, 2024-01-20; 2 files, -54/+52)
      - Use Buffer.get(int index) instead of Buffer.get(). That avoids a write.
      - Use int as loop variable.
      - This brings the splade performance test down from 8s to 7s.
      - TensorConverter.toVespaTensor more than doubled speed.
* Rename getIndex => getDirectIndex (Henning Baldersheim, 2024-01-20; 1 file, -1/+1)
* Add a class to assist efficient traversal of dimensions in an IndexedTensor (Henning Baldersheim, 2024-01-19; 2 files, -4/+9)
* Cache sizes.totalSize() in a variable to prevent recomputation (Henning Baldersheim, 2024-01-18; 1 file, -20/+19)
* Since both value and log(value) are monotonically increasing for value >= 1, we can just gather max(value) and do log at the end (Henning Baldersheim, 2024-01-18; 1 file, -8/+8)
      Avoiding the general Math.max, which seems to have very costly NaN handling, was quite beneficial.
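The reasoning in the commit above is that log is monotonic for value >= 1, so max(log(v)) == log(max(v)): gather the maximum with a plain comparison (avoiding Math.max's NaN handling) and apply Math.log once at the end. A sketch of that optimization, with an illustrative method name rather than the actual Vespa code:

```java
// Illustrative sketch of the max-then-log optimization: because log(value) is
// monotonically increasing for value >= 1, the maximum of the logs equals the
// log of the maximum, so Math.log is applied only once. A plain '>' comparison
// also avoids the NaN handling inside Math.max. Not the actual implementation.
public class MaxLog {
    public static double maxThenLog(double[] values) {
        double max = 1.0; // inputs are assumed >= 1, per the commit message
        for (double v : values) {
            if (v > max) max = v; // plain comparison, no Math.max NaN checks
        }
        return Math.log(max);
    }
}
```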
* Construct array right away instead of going via a single element list and the java stream api (Henning Baldersheim, 2024-01-18; 1 file, -5/+15)
* Avoid generic reduce and keep PAD token embedding (Jo Kristian Bergum, 2024-01-15; 2 files, -24/+47)
* remove extra space (Jo Kristian Bergum, 2024-01-11; 1 file, -1/+1)
* address review (Jo Kristian Bergum, 2024-01-11; 2 files, -43/+25)
* Avoid generic reduce to reduce gc pressure (Jo Kristian Bergum, 2024-01-11; 2 files, -19/+61)
* final (Jo Kristian Bergum, 2024-01-06; 1 file, -1/+1)
* handle multilingual models better (Jo Kristian Bergum, 2024-01-06; 3 files, -65/+147)
* Allow mapped 1d tensor for embed expressions (Jo Kristian Bergum, 2023-12-17; 2 files, -13/+13)
* Add a splade embedder implementation (Jo Kristian Bergum, 2023-12-15; 5 files, -0/+30962)
* Move Jackson util from vespajlib to container-core (Henning Baldersheim, 2023-11-24; 3 files, -3/+3)
* jackson 2.16 changes some of its default settings, so we consolidate our use of the ObjectMapper (Henning Baldersheim, 2023-11-23; 3 files, -8/+7)
      Unless special options are used, use a common instance, or create via factory method.
* unpack_bits_from_int8 -> unpack_bits (Arne Juul, 2023-11-10; 1 file, -2/+2)
* add simple expandBitTensor function (Arne Juul, 2023-11-10; 2 files, -9/+35)
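The `expandBitTensor` function named above presumably performs the inverse of bit-packing: expanding each int8 value into 8 float dimensions. The sketch below is an assumption about that general unpacking operation, with an invented class and method name, not the actual Vespa code:

```java
// Illustrative unpacking of bit-packed int8 values back into one float per
// bit, big-endian within each byte. An assumed sketch of what a function like
// expandBitTensor might do, not Vespa's actual implementation.
public class BitExpander {
    public static float[] expandBits(byte[] packed) {
        float[] out = new float[packed.length * 8];
        for (int i = 0; i < packed.length; i++) {
            for (int bit = 0; bit < 8; bit++) {
                // Shift the target bit into position and mask it out.
                out[i * 8 + bit] = ((packed[i] >> (7 - bit)) & 1) == 1 ? 1f : 0f;
            }
        }
        return out;
    }
}
```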
* Add support and upgrade opset (Jo Kristian Bergum, 2023-10-26; 4 files, -7/+31)
* Add support for bfloat16 and float16 (Jo Kristian Bergum, 2023-10-26; 4 files, -0/+82)
* Less verbose logging when failing to find CUDA and it is optional (Jo Kristian Bergum, 2023-10-26; 2 files, -2/+53)
* Disable CPU arena allocator for ONNX (Bjørn Christian Seime, 2023-10-19; 1 file, -0/+1)
      The arena memory allocator pre-allocates excessive amounts of memory up front. Disabling it matches the existing configuration in the ONNX integration for the backend.