vespa - An engine for low-latency computation over large data sets

	Commit message	Author	Age	Files	Lines
*	Add a 5x faster handcoded detection of legal feature names that does not ↵	Henning Baldersheim	2021-09-05	1	-3/+0
\| \| \| \|	require quoting.
*	Merge pull request #18922 from ↵	Henning Baldersheim	2021-08-31	1	-1/+1
\|\ \| \| \| \| \| \| \| \|	vespa-engine/toregge/enable-alternate-visited-nodes-trackers-for-hnsw-index Enable alternate visited nodes trackers for HNSW index.
\| *	Enable alternate visited nodes trackers for HNSW index.	Tor Egge	2021-08-31	1	-1/+1
\| \|
* \|	Lower limit for selecting BitVectorVisistedTracker.	Tor Egge	2021-08-31	1	-1/+1
\|/
*	Add class comments. Fix typo.	Tor Egge	2021-08-31	3	-1/+12
\|
*	Prepare for alternate visited nodes trackers for HNSW index.	Tor Egge	2021-08-30	9	-9/+186
\|
*	Merge pull request #18898 from ↵	Henning Baldersheim	2021-08-30	2	-18/+45
\|\ \| \| \| \| \| \| \| \|	vespa-engine/geirst/avoid-global-filter-calculation-when-not-needed The global filter is only needed when having a nearest neighbor index…
\| *	The global filter is only needed when having a nearest neighbor index (hnsw) ↵	Geir Storli	2021-08-30	2	-18/+45
\| \| \| \| \| \| \| \| \| \| \| \|	and doing approximate calculation. This avoids costly calculation of the global filter in cases it is not needed.
* \|	Handle when priorityQ goes from not full to full.	Henning Baldersheim	2021-08-30	1	-1/+5
\| \|
* \|	As doSeek is called a lot more frequently than doUnpack, just use locking of the ↵	Henning Baldersheim	2021-08-30	1	-8/+6
\|/ \| \| \| \| \| \|	heap in unpack. In addition to adjusting the priority Q also update the distance_threshold with a relaxed store to an atomic variable. On read the distance threshold can be read cheaply with a relaxed load.
*	Report address space usage for shared string repo for non-dense tensor ↵	Geir Storli	2021-08-23	5	-2/+18
\| \| \| \|	attributes.
*	Report address space usage for components in tensor attributes.	Geir Storli	2021-08-20	12	-2/+67
\|
*	Merge pull request #18783 from vespa-engine/toregge/compact-hnsw-index	Geir Storli	2021-08-20	10	-6/+300
\|\ \| \| \| \|	Compact HNSW index when ratio of dead bytes / address space is too high
\| *	Factor out common code.	Tor Egge	2021-08-18	1	-17/+21
\| \|
\| *	Compact HNSW index when ratio of dead bytes / address space is too high	Tor Egge	2021-08-18	10	-6/+296
\| \| \| \| \| \| \| \|	relative to used bytes / address space.
* \|	Track max address space usage among components in attributes vectors in all ↵	Geir Storli	2021-08-20	1	-0/+1
\| \| \| \| \| \| \| \|	sub databases.
* \|	Include limits when needed.	Tor Egge	2021-08-18	1	-0/+1
\|/
*	Merge pull request #18755 from ↵	Håvard Pettersen	2021-08-16	7	-192/+4
\|\ \| \| \| \| \| \| \| \|	vespa-engine/havardpe/move-feature-name-symbol-extractor move FeatureNameExtractor
\| *	move FeatureNameExtractor	Håvard Pettersen	2021-08-16	7	-192/+4
\| \| \| \| \| \| \| \|	to make it available for use in vespa-eval-expr
* \|	Merge pull request #18752 from ↵	Henning Baldersheim	2021-08-16	2	-4/+7
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \|	vespa-engine/toregge/use-4096-buffers-for-hnsw-index-link-array-store Use 4096 buffers for HNSW link array store.
\| * \|	Use 4096 buffers for HNSW link array store.	Tor Egge	2021-08-16	2	-4/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Configure link array store to handle arrays of 193 elements or less without indirect storage.
* \| \|	Improve naming and readability	Henning Baldersheim	2021-08-16	1	-7/+8
\| \| \|
* \| \|	Instead of having one large array of individually allocated vectors use	Henning Baldersheim	2021-08-16	2	-15/+46
\|/ / \| \| \| \| \| \| \| \|	2 large, optionally mmapped, vectors where the first just points into the second. In order to avoid resizing, count first.
* \|	Minor code layout	Henning Baldersheim	2021-08-15	1	-2/+1
\| \|
* \|	Better naming.	Henning Baldersheim	2021-08-15	1	-2/+2
\| \|
* \|	Provide more details on memory usage.	Henning Baldersheim	2021-08-15	1	-1/+9
\| \|
* \|	Add a time budget of 100ms. If counting not complete by then, abort, and let ↵	Henning Baldersheim	2021-08-15	2	-7/+17
\| \| \| \| \| \| \| \|	the count be incomplete.
* \|	Use a simple std::vector<bool> for visited marking as most bits will be set.	Henning Baldersheim	2021-08-15	1	-4/+4
\| \|
* \|	Avoid starting a separate thread for completing index insert.	Henning Baldersheim	2021-08-13	1	-34/+73
\| \| \| \| \| \| \| \| \| \|	Use a queue and do completion in the foreground. That ensures only a single thread modifies the attribute.
* \|	Notify when _pending reaches zero.	Henning Baldersheim	2021-08-13	1	-3/+6
\| \|
* \|	Refactor for readability and maintenance.	Henning Baldersheim	2021-08-13	2	-30/+89
\| \|
* \|	Use the executor for the part that can be parallel when rebuilding index on ↵	Henning Baldersheim	2021-08-13	2	-7/+64
\|/ \| \| \|	load.
*	Add an executor to the AttributeVector::load/onLoad interface so attributes ↵	Henning Baldersheim	2021-08-12	32	-39/+46
\| \| \| \|	can use multithread load if feasible.
*	swappable -> paged	Henning Baldersheim	2021-08-12	4	-7/+7
\|
*	A swappable attribute will use a file backed memory allocator.	Henning Baldersheim	2021-08-12	6	-12/+55
\|
*	swapable -> swappable	Henning Baldersheim	2021-08-12	2	-3/+3
\|
*	Control swappable	Henning Baldersheim	2021-08-12	2	-3/+3
\|
*	Add swapable attribute option.	Henning Baldersheim	2021-08-12	3	-85/+42
\|
*	Merge pull request #18716 from ↵	Henning Baldersheim	2021-08-11	4	-10/+20
\|\ \| \| \| \| \| \| \| \|	vespa-engine/havardpe/avoid-crash-on-runtime-onnx-errors avoid crash on run-time onnx errors
\| *	avoid crash on run-time onnx errors	Håvard Pettersen	2021-08-11	4	-10/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- warn about onnx model dry-run being disabled - catch and report onnx errors during ranking - zero-fill failed results to avoid re-using previous results - use explicit output size in fragile model (output became float[2] instead of float[batch] anyways)
* \|	Unify on using hex for hash values.	Henning Baldersheim	2021-08-11	1	-1/+1
\| \|
* \|	Remove outdated comment	Henning Baldersheim	2021-08-11	1	-4/+4
\| \|
* \|	Properly access the feature name for hashed edges.	Henning Baldersheim	2021-08-11	2	-5/+7
\| \|
* \|	Add unit test with comment of what is incorrect with hashed partition edges ↵	Henning Baldersheim	2021-08-11	1	-2/+30
\| \| \| \| \| \| \| \|	and feature generation.
* \|	Refactor to avoid multiple hash lookups and code bloat.	Henning Baldersheim	2021-08-11	2	-22/+26
\| \|
* \|	Unify code layout.	Henning Baldersheim	2021-08-10	2	-40/+30
\| \|
* \|	Unify on 'using'	Henning Baldersheim	2021-08-10	2	-7/+5
\| \|
* \|	Minor cleanup.	Henning Baldersheim	2021-08-10	1	-24/+14
\| \|
* \|	Simplify	Henning Baldersheim	2021-08-10	2	-11/+6
\|/
*	Split current global_filter_limit into global_filter.lower_limit/upper_limit.	Henning Baldersheim	2021-08-04	4	-11/+47
\| \| \| \| \| \| \| \| \|	If estimated_hits < lower_limit, no filter is set, which causes fallback to brute force. If estimated_hits is in [lower_limit, upper_limit], the global filter is applied. If estimated_hits > upper_limit, an empty filter is set, which avoids the filter setup cost. So if the filter has a huge setup cost, you can reduce upper_limit to a number below 1.0 and instead increase target_num_hits similarly. Setting target_num_hits to 1.0/upper_limit * 1.2 should give similar recall. This adds a 20% safety margin to handle correlation between the filter and the NearestNeighbor calculation.
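
The three-way decision described in the last commit message can be sketched as follows. This is a minimal illustration, not the Vespa implementation: the names `FilterStrategy`, `choose_filter_strategy` and `target_num_hits_scale` are hypothetical, the hit estimate is assumed to be a fraction of the corpus (the message only says upper_limit can be set below 1.0), and reading `1.0/upper_limit * 1.2` as a multiplier on the original target_num_hits is an interpretation of the message.

```python
from enum import Enum


class FilterStrategy(Enum):
    """Hypothetical labels for the three outcomes named in the commit message."""
    BRUTE_FORCE = "no filter set; fall back to brute force"
    APPLY_GLOBAL_FILTER = "apply global filter"
    EMPTY_FILTER = "empty filter set; skip filter setup cost"


def choose_filter_strategy(estimated_hit_ratio: float,
                           lower_limit: float,
                           upper_limit: float) -> FilterStrategy:
    """Pick a strategy from the estimated hit ratio and the two limits.

    Assumption: all three arguments are fractions of the corpus.
    """
    if lower_limit > upper_limit:
        raise ValueError("lower_limit must not exceed upper_limit")
    if estimated_hit_ratio < lower_limit:
        # Below the lower limit: no filter is set.
        return FilterStrategy.BRUTE_FORCE
    if estimated_hit_ratio <= upper_limit:
        # Inside the closed interval [lower_limit, upper_limit].
        return FilterStrategy.APPLY_GLOBAL_FILTER
    # Above the upper limit: an empty filter avoids the setup cost.
    return FilterStrategy.EMPTY_FILTER


def target_num_hits_scale(upper_limit: float, safety: float = 1.2) -> float:
    """Factor 1.0/upper_limit * safety from the commit message.

    Assumption: this factor is applied to the original target_num_hits.
    """
    if not 0.0 < upper_limit <= 1.0:
        raise ValueError("upper_limit is assumed to lie in (0.0, 1.0]")
    return (1.0 / upper_limit) * safety
```

For example, with lower_limit 0.05 and upper_limit 0.5, an estimated hit ratio of 0.7 selects the empty filter, and the suggested compensation is to raise target_num_hits by a factor of about 2.4.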