vespa - An engine for low-latency computation over large data sets

	Commit message	Author	Age	Files	Lines
*	tag blueprints with strictness	Håvard Pettersen	2024-03-20	7	-44/+52
\| \| \| \| \| \|	The strict-aware sort function is responsible for propagating and tagging strictness throughout the blueprint tree. Use pre-tagged strictness in fetchPostings, createSearch and createFilterSearch.
*	Merge pull request #30611 from ↵	Tor Brede Vekterli	2024-03-13	2	-67/+112
\|\ \| \| \| \| \| \| \| \|	vespa-engine/vekterli/handle-imported-attributes-in-doc-select-fallback-path Use attributes when evaluating selection expression on full documents
\| *	Use attributes when evaluating selection expression on full documents	Tor Brede Vekterli	2024-03-12	2	-67/+112
\| \|	This addresses an unintended shortcoming in our handling of imported fields, as these are exposed _only_ through attributes. Document selection evaluation is automatically optimized in the backend by pre-filtering documents that can be fully evaluated by exclusively looking at attribute values (this goes for both selection matching and mismatching). This is done by cloning the selection AST and replacing all applicable field value nodes with corresponding attribute references. However, if a document _cannot_ be evaluated from attributes alone, we fall back to reading it fully from the doc store, after which the original selection is evaluated on it. This is the crux of the problem, and prior to this commit an expression using both an imported field and a non-attribute field would fail to be evaluated since the full document evaluation would not have any knowledge of the attribute. This commit makes it so that also the full document evaluation will use a "patched" AST with all possible field references replaced with attribute lookups. Since we reuse an existing patched AST that was not otherwise used in this code path, there is no added overhead with this approach.
* \|	Merge pull request #30580 from ↵	Geir Storli	2024-03-12	2	-69/+75
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \|	vespa-engine/toregge/rewrite-searchcore-fusion-runner-unit-test-to-gtest Rewrite searchcore fusion runner unit test to gtest.
\| * \|	Rewrite searchcore fusion runner unit test to gtest.	Tor Egge	2024-03-11	2	-69/+75
\| \|/
* \|	Merge pull request #30579 from ↵	Geir Storli	2024-03-12	2	-38/+35
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \|	vespa-engine/toregge/rewrite-searchcore-disk-index-cleaner-unit-test-to-gtest Rewrite searchcore DiskIndexCleaner unit test to gtest.
\| * \|	Rewrite searchcore DiskIndexCleaner unit test to gtest.	Tor Egge	2024-03-11	2	-38/+35
\| \|/
* \|	Merge pull request #30578 from ↵	Geir Storli	2024-03-12	2	-31/+9
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \|	vespa-engine/toregge/rewrite-searchcore-feed-token-unit-test-to-gtest Rewrite searchcore FeedToken unit test to gtest.
\| * \|	Rewrite searchcore FeedToken unit test to gtest.	Tor Egge	2024-03-11	2	-31/+9
\| \|/
* \|	Merge pull request #30577 from ↵	Geir Storli	2024-03-12	1	-42/+17
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \|	vespa-engine/toregge/rewrite-searchcore-feed-and-search-unit-test-to-gtest Rewrite searchcore feed and search unit test to gtest.
\| * \|	Rewrite searchcore feed and search unit test to gtest.	Tor Egge	2024-03-11	1	-42/+17
\| \|/
* /	Rewrite searchcore attribute flush unit test to gtest.	Tor Egge	2024-03-11	2	-151/+78
\|/
*	- Complete dumping of 1 index field before progressing to the next.	Henning Baldersheim	2024-02-08	2	-16/+17
\| \| \| \| \|	- This prevents allocating memory buffers, and file descriptors for all fields concurrently. - It will reduce memory footprint during flush if there are many fields.
*	Add low-level benchmark program for search iterators.	Geir Storli	2024-02-08	2	-7/+7
\| \| \| \| \|	Currently, it can benchmark the following query operators over an attribute vector: Single term, In, WeightedSet, DotProduct, Or.
*	make default flow stats more explicit	Håvard Pettersen	2024-02-06	1	-0/+3
\| \| \| \| \| \|	for both simple and complex leafs; account for number of inner children in complex leafs; account for seek nesting for complex leafs with children
*	Merge pull request #29976 from ↵	Arne H Juul	2024-01-24	1	-10/+59
\|\ \| \| \| \| \| \| \| \|	vespa-engine/arnej/unit-test-verify-ranksetup-streaming write vsmfields.cfg and add smoke test
\| *	unit test streaming mode where possible	Arne Juul	2024-01-19	1	-15/+22
\| \|
\| *	write vsmfields.cfg and add smoke test	Arne Juul	2024-01-19	1	-6/+48
\| \|
* \|	wire in strict flow analysis and strict-aware sorting	Håvard Pettersen	2024-01-22	1	-3/+4
\|/ \| \| \| \| \| \| \| \| \|	strict_cost added to all blueprints; separate top-down sort step after optimize; move relative estimate out of blueprint state; optimize all children, to calculate flow stats; leaf defaults: matching>0.9: est: 0.5, cost: 1.0, strict_cost: 1.0; matching<=0.9: est: rel_est, cost: 1.0, strict_cost: rel_est
*	Add feature flag to allow sorting blueprints by cost estimate instead of ↵	Henning Baldersheim	2023-12-19	1	-232/+119
\| \| \| \|	est_hits.
*	Remove most of the now void clock indirection.	Henning Baldersheim	2023-12-15	7	-20/+18
\|
*	Unify on using reference where possible.	Henning Baldersheim	2023-12-12	1	-3/+3
\|
*	Wire in thread bundle to execute info and request context.	Henning Baldersheim	2023-12-12	2	-9/+12
\|
*	Revert "Revert "relative estimate""	Henning Baldersheim	2023-12-11	1	-4/+19
\|
*	Revert "relative estimate"	Henning Baldersheim	2023-12-09	1	-19/+4
\|
*	relative estimate	Håvard Pettersen	2023-12-08	1	-4/+19
\|
*	Merge pull request #29551 from ↵	Henning Baldersheim	2023-12-05	2	-2/+2
\|\ \| \| \| \| \| \| \| \|	vespa-engine/balder/gc-use-shared-executor-for-warmup Use shared executor for warmup and GC warmup executor.
\| *	Use shared executor for warmup and GC warmup executor.	Henning Baldersheim	2023-12-05	2	-2/+2
\| \|
* \|	- Control creation of temporary postinglists during fetchPostings for ↵	Henning Baldersheim	2023-12-04	1	-3/+3
\|/ \| \| \|	non-strict iterators.
*	Avoid timeout during grouping leaving distributionKey unset. Populate it ↵	Henning Baldersheim	2023-11-30	1	-3/+3
\| \| \| \|	right after completing grouping.
*	Add InTerm to backend.	Tor Egge	2023-11-24	2	-0/+3
\|
*	Avoid dereferencing first item in an empty vector.	Henning Baldersheim	2023-11-20	1	-1/+1
\|
*	- We are now always nesting multivalue grouping for indexed search.	Henning Baldersheim	2023-11-20	1	-14/+15
\|
*	Add flag for marking phrase always expensive.	Henning Baldersheim	2023-11-19	1	-1/+33
\|
*	Merge pull request #29369 from vespa-engine/balder/gc-unused-split-parameter	Henning Baldersheim	2023-11-17	1	-24/+8
\|\ \| \| \| \|	Fully GC unused parameter as we now always split phrases.
\| *	Fully GC unused parameter as we now always split phrases.	Henning Baldersheim	2023-11-17	1	-24/+8
\| \|
* \|	If hit_rate is below 1% drop match phase limiting. It has too high fixed ↵	Henning Baldersheim	2023-11-16	1	-0/+1
\|/ \| \| \|	cost and will likely make things worse.
*	Merge pull request #29284 from ↵	Tor Brede Vekterli	2023-11-09	1	-10/+25
\|\ \| \| \| \| \| \| \| \|	vespa-engine/vekterli/include-doctype-and-gid-with-metadata-doc-entries Include doc type name and GID in metadata iteration results
\| *	Simplify by passing in and storing the `DocTypeName` verbatim	Tor Brede Vekterli	2023-11-08	1	-42/+4
\| \|
\| *	Include doc type name and GID in metadata iteration results	Tor Brede Vekterli	2023-11-08	1	-9/+62
\| \|	Document type is fetched from the associated `IPersistenceHandler` on-demand; it is assumed the lifetime of the pointer must be valid for the entire lifetime of the iterator itself, as the latter holds a valid handler snapshot. For simplicity, it's possible to _not_ pass in a handler, in which case the doc type name will be implicitly empty. Some expected `DocEntry` sizes have been adjusted, as we now report the size of the document type and GID alongside the base type size.
* \|	Sameelement behaves like an and with extra constraints.	Henning Baldersheim	2023-11-07	2	-6/+5
\|/ \| \| \|	So it should behave the same way during fetchPostings too.
*	Merge pull request #29269 from ↵	Geir Storli	2023-11-07	5	-47/+75
\|\ \| \| \| \| \| \| \| \|	vespa-engine/geirst/control-resource-usage-when-in-maintenance Control resource usage when node in maintenance
\| *	Also tune or turn off background jobs when content node is in maintenance.	Geir Storli	2023-11-07	4	-19/+56
\| \|	Previously the following has been adjusted when the node is retired: 1) Lid space compaction - turned off. 2) Flush engine strategy - tuned to reduce disk and CPU usage. 3) Attribute vector compaction - tuned to reduce memory allocations and CPU usage. In a node retirement scenario documents are being removed from the node, and eventually the node is deleted. Without the adjustments above a lot of resources are spent "fixing" the results of removing documents, and the process just takes a lot longer. A similar set of challenges can occur when a node is set in maintenance, especially if the node transitions from retired to maintenance. E.g. this happens when the Vespa version is upgraded in Vespa Cloud. With this change the resource usage of background jobs is kept in check for both a retired node and a node in maintenance.
\| *	Rewrite to use GTest.	Geir Storli	2023-11-07	2	-35/+26
\| \|
* \|	Merge pull request #29266 from ↵	Henning Baldersheim	2023-11-07	2	-4/+11
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \|	vespa-engine/vekterli/expose-doc-type-name-from-persistence-handler Expose document type name from `IPersistenceHandler` interface
\| * \|	Expose document type name from `IPersistenceHandler` interface	Tor Brede Vekterli	2023-11-07	2	-4/+11
\| \|/
* \|	Merge pull request #29264 from ↵	Henning Baldersheim	2023-11-07	1	-0/+3
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \|	vespa-engine/toregge/extend-persistence-conformance-test-wrt-remove-by-gid Test remove by gid for nonexisting gid and for gid with tombstone.
\| * \|	Test remove by gid for nonexisting gid and for gid with tombstone.	Tor Egge	2023-11-07	1	-0/+3
\| \|/
* /	If match-phase limiting has concluded that a post filter is most efficient,	Henning Baldersheim	2023-11-07	1	-2/+3
\|/ \| \| \|	we must only generate posting lists if it is actually beneficial. If not, the fixed cost is too high.
*	Add removeByGidAsync() to spi.	Tor Egge	2023-11-06	2	-0/+10
\|