vespa - An engine for low-latency computation over large data sets

	Commit message	Author	Age	Files	Lines
*	Add missing includes, avoid shadow warning and skip including file not	Tor Egge	2023-09-29	3	-1/+4
|	present in llvm 17. Issues detected when compiling with clang++ 17 / libc++ 17 / llvm 17.
*	Merge pull request #28714 from vespa-engine/havardpe/better-graphviz-for-table-dfa	Håvard Pettersen	2023-09-29	2	-41/+133
|\	dump table_dfa as actual dfa in graphviz
| *	dump table_dfa as actual dfa in graphviz	Håvard Pettersen	2023-09-28	2	-41/+133
| |	enumerate states based on best-edge-first. '*' means any character without its own edge.
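The commit above describes a Graphviz dump where `*` labels the edge taken for any character that has no edge of its own. A minimal sketch of that labeling convention follows, assuming a hypothetical `ToyState` representation rather than vespalib's actual table DFA layout. States are emitted in plain index order, not the best-edge-first enumeration the commit mentions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy DFA state; not the vespalib table_dfa layout.
struct ToyState {
    std::map<char, int> edges;  // explicit per-character transitions
    int default_target = -1;    // taken for any other character; -1 means none
    bool accepting = false;
};

// Render the DFA in Graphviz DOT syntax, states in index order.
// The default edge is labeled "*", i.e. "any character without its own edge".
// Assumes edge characters need no DOT escaping (no double quote or backslash).
std::string to_dot(const std::vector<ToyState>& states) {
    std::ostringstream out;
    out << "digraph dfa {\n";
    for (std::size_t i = 0; i < states.size(); ++i) {
        const ToyState& s = states[i];
        out << "  " << i << " [shape=" << (s.accepting ? "doublecircle" : "circle") << "];\n";
        for (const auto& [ch, target] : s.edges) {
            out << "  " << i << " -> " << target << " [label=\"" << ch << "\"];\n";
        }
        if (s.default_target >= 0) {
            out << "  " << i << " -> " << s.default_target << " [label=\"*\"];\n";
        }
    }
    out << "}\n";
    return out.str();
}
```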
* |	Preserve prefix of input DFA successor string	Tor Brede Vekterli	2023-09-27	3	-13/+51
|/	If a non-empty string is passed as a successor to the DFA, the contents of the string will be preserved, i.e. the successor will always be _appended_ to any existing data. This allows for less manual fiddling when implementing prefix locking by the caller (no need to concatenate a prefix with the generated successor string). Note: this has some added cognitive cost where the caller now has the entire responsibility of resetting the successor between calls. The existing fuzzy matcher has been updated to no longer require a separation between successor prefix and suffix; it can now safely reuse the successor prefix between calls.
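A minimal sketch of the append-only successor contract and the resulting caller-side reset. The names `append_toy_successor` and `successors_for` are hypothetical, and the successor computation is a trivial stand-in for real DFA successor generation.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for DFA successor generation. It only ever appends
// to `successor`; here it appends `candidate` plus a NUL byte, which is the
// smallest string strictly greater than `candidate` in byte-wise order.
void append_toy_successor(std::string_view candidate, std::string& successor) {
    successor.append(candidate.data(), candidate.size());
    successor.push_back('\0');
}

// Prefix-locked caller: `successor` permanently holds the locked prefix.
// Because the callee appends, the caller truncates back to the prefix before
// every call; the prefix itself is never rebuilt or re-concatenated.
std::vector<std::string> successors_for(std::string_view prefix,
                                        const std::vector<std::string>& candidates) {
    std::vector<std::string> result;
    std::string successor(prefix);
    for (const std::string& candidate : candidates) {
        successor.resize(prefix.size());  // resetting is the caller's job
        append_toy_successor(candidate, successor);
        result.push_back(successor);
    }
    return result;
}
```

Keeping the prefix in the buffer and only truncating with `resize()` avoids per-call concatenation; the flip side is that a forgotten reset silently accumulates output from earlier calls.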
*	Merge pull request #28677 from vespa-engine/havardpe/inline-table-dfa	Håvard Pettersen	2023-09-27	4	-44/+227
|\	use inline pre-generated tables
| *	Update vespalib/src/vespa/vespalib/fuzzy/inline_tfa.hpp	Håvard Pettersen	2023-09-27	1	-0/+1
| |	add copyright notice
| |	Co-authored-by: Tor Brede Vekterli <vekterli@yahooinc.com>
| *	use inline pre-generated tables	Håvard Pettersen	2023-09-26	4	-44/+226
| |
* |	Merge pull request #28674 from vespa-engine/balder/minor-code-health	Geir Storli	2023-09-26	2	-4/+5
|\ \	Minor code health
| * |	Minor code health	Henning Baldersheim	2023-09-26	2	-4/+5
| | |
* | |	Make DFA table algorithm selectable at query time.	Geir Storli	2023-09-26	2	-1/+7
| |/ |/|
* |	Merge pull request #28652 from vespa-engine/havardpe/table-dfa	Tor Brede Vekterli	2023-09-26	10	-8/+900
|\ \	table dfa
| *	table dfa	Håvard Pettersen	2023-09-25	10	-8/+900
| |
* |	Merge pull request #28654 from vespa-engine/balder/return-early-on-match	Henning Baldersheim	2023-09-26	3	-19/+17
|\ \	- Return early in doSeek if docId found.
| * |	Add noexcept	Henning Baldersheim	2023-09-25	3	-19/+17
| |/
* |	Merge pull request #28653 from vespa-engine/balder/use-stash	Henning Baldersheim	2023-09-26	12	-218/+64
|\ \	- Use stash instead of the single use of VariableSizeVector.
| * |	Reorder members to reflect required lifetime, and remove incorrect noexcept.	Henning Baldersheim	2023-09-26	3	-7/+9
| | |
| * |	- Use stash instead of the single use of VariableSizeVector.	Henning Baldersheim	2023-09-25	12	-217/+61
| |/
* |	Fix printing of sparse states	Tor Brede Vekterli	2023-09-25	1	-3/+3
| |
* |	Generate Levenshtein successor prefix "as we go" during match loop	Tor Brede Vekterli	2023-09-25	2	-75/+29
|/	Moving away from on-demand generation means that we only decode (and possibly normalize) UTF-8 characters _once_ instead of twice in the case of UTF-32 output or non-normalized input. These changes also make it much easier to add future support for preserving a caller-supplied successor prefix, which would be used for prefix-locked dictionary matching. Note: this subtly changes the API so that it may _always_ mutate the input successor string. The API documentation has been updated to reflect this. No current users of the API should be affected.
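The following sketch illustrates the "decode once, emit as we go" idea under simplifying assumptions: the matcher is a hypothetical exact-prefix comparison standing in for Levenshtein DFA stepping, the input is assumed to be valid UTF-8, and the successor prefix is collected as UTF-32. Each source code point is decoded a single time and appended to the prefix in the same loop iteration that matches it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Decode one UTF-8 code point starting at s[i] and advance i past it.
// Assumes valid UTF-8 input (no error handling in this sketch).
char32_t decode_utf8(std::string_view s, std::size_t& i) {
    unsigned char c = static_cast<unsigned char>(s[i++]);
    if (c < 0x80) return c;
    int extra = (c >= 0xF0) ? 3 : (c >= 0xE0) ? 2 : 1;
    char32_t cp = c & (0x3F >> extra);  // payload bits of the lead byte
    for (int k = 0; k < extra; ++k) {
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    }
    return cp;
}

// Toy match loop: each source code point is decoded exactly once, and every
// matching code point is appended to `successor_prefix` immediately, rather
// than re-decoding the source afterwards to build the prefix on demand.
std::size_t match_and_collect(std::string_view source, const std::u32string& target,
                              std::u32string& successor_prefix) {
    std::size_t i = 0, matched = 0;
    while (i < source.size() && matched < target.size()) {
        char32_t cp = decode_utf8(source, i);
        if (cp != target[matched]) break;
        successor_prefix.push_back(cp);  // emitted "as we go"
        ++matched;
    }
    return matched;
}
```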
*	Add prefix_size constructor argument to DfaFuzzyMatcher.	Tor Egge	2023-09-22	1	-0/+1
|
*	Merge pull request #28606 from vespa-engine/geirst/fuzzy-matching-algorithm-query-property	Geir Storli	2023-09-21	3	-0/+78
|\	Add query property to control fuzzy matching algorithm.
| *	Add query property to control fuzzy matching algorithm.	Geir Storli	2023-09-21	3	-0/+78
| |
* |	Split core DFA match loop into match-only and successor-emitting specializations	Tor Brede Vekterli	2023-09-21	7	-43/+65
|/	This allows for "hybrid" schemes where raw matching (without successor generation) is done with a dedicated matcher implementation that is faster for that particular purpose. Also gives a much tighter loop for the match-only case and removes some branches from the successor-emitting case.
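One common way to get two specialized loops from a single source is a `bool` template parameter combined with `if constexpr`. A sketch of that pattern, using hypothetical names and a toy exact matcher in which "successor emission" is reduced to recording the matched prefix:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// One loop body, instantiated twice. In match_impl<false> the output
// handling is discarded at compile time, leaving a tighter loop.
template <bool EmitSuccessor>
bool match_impl(std::string_view target, std::string_view source, std::string* successor) {
    std::size_t i = 0;
    for (; i < source.size() && i < target.size(); ++i) {
        if (source[i] != target[i]) break;
        if constexpr (EmitSuccessor) {
            successor->push_back(source[i]);  // stand-in for successor emission
        }
    }
    return i == source.size() && i == target.size();
}

// Match-only entry point: no successor parameter at all.
bool match(std::string_view target, std::string_view source) {
    return match_impl<false>(target, source, nullptr);
}

// Successor-emitting entry point.
bool match(std::string_view target, std::string_view source, std::string& successor) {
    return match_impl<true>(target, source, &successor);
}
```

Because the flag is a compile-time constant, neither instantiation carries a runtime branch on "should I emit?".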
*	Use the Guard when testing bundle pool	Henning Baldersheim	2023-09-20	3	-33/+31
|
*	Refactor code to make object lifetime easier to follow.	Henning Baldersheim	2023-09-20	1	-0/+12
|
*	Add UTF-32 exact suffix output to DFA concept	Tor Brede Vekterli	2023-09-18	1	-1/+6
|
*	Support raw UTF-32 successor string output	Tor Brede Vekterli	2023-09-18	11	-41/+154
|	Avoids the need to encode UTF-32 characters to UTF-8 internally, as this is likely to be subsequently reversed by a caller that itself operates on UTF-32 code points. Change the `match()` API by introducing a separate overload that does not produce a successor, and add two explicit successor string type overloads that take the string by ref, not pointer.
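A sketch of how separate by-reference overloads let one successor-emitting routine write either UTF-8 or raw UTF-32 without an intermediate conversion. The function names and the use of `std::u32string` for UTF-32 output are assumptions for illustration; the encoder assumes valid Unicode scalar values.

```cpp
#include <cassert>
#include <string>

// UTF-8 output: encode the code point (assumes a valid scalar value).
void append_code_point(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Raw UTF-32 output: no encoding step at all.
void append_code_point(std::u32string& out, char32_t cp) {
    out.push_back(cp);
}

// Shared emission logic; overload resolution on the by-reference output
// string picks the encoder, so UTF-32 callers never pay for UTF-8 encoding.
template <typename Str>
void emit_successor(const std::u32string& code_points, Str& successor) {
    for (char32_t cp : code_points) append_code_point(successor, cp);
}
```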
*	Optimize successor generation of exact match suffix	Tor Brede Vekterli	2023-09-18	8	-16/+93
|	Avoids explicitly stepping the DFA states and handling each transition character separately by detecting the case where the only way the generated suffix can possibly match is for it to _exactly_ match the remaining target string suffix. This is the case when the sparse cost matrix row only has 1 column remaining, and the cost in this column is equal to the max edit distance, i.e. no more edits can possibly be done. In local synthetic benchmarks this speeds up successor generation 4x when the target string is 64 characters.
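The shortcut condition can be expressed directly on a sparse row. A sketch, assuming a hypothetical row representation of (target index, cost) pairs where the index counts target characters already consumed:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sparse Levenshtein row entry.
struct SparseCell {
    std::size_t target_index;  // number of target characters consumed
    std::uint32_t cost;        // edits used to reach this position
};

// True if the only surviving column has already used every allowed edit:
// any match must then reproduce the rest of the target exactly.
bool only_exact_suffix_can_match(const std::vector<SparseCell>& row, std::uint32_t max_edits) {
    return row.size() == 1 && row[0].cost == max_edits;
}

// In the exact case, copy the target suffix in bulk instead of stepping the
// automaton one character at a time. Returns false when the general
// per-character path is still required.
bool try_emit_exact_suffix(const std::vector<SparseCell>& row, std::uint32_t max_edits,
                           const std::u32string& target, std::u32string& successor) {
    if (!only_exact_suffix_can_match(row, max_edits)) return false;
    successor.append(target, row[0].target_index, std::u32string::npos);
    return true;
}
```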
*	Use make_for_lookup() member function on existing comparator	Tor Egge	2023-09-18	5	-28/+32
|	to make a new comparator which is used for lookup.
*	Add comparator to unique store.	Tor Egge	2023-09-18	3	-55/+51
|
*	Rename fallback_value to lookup_value in UniqueStoreComparator,	Tor Egge	2023-09-18	2	-10/+10
|	UniqueStoreStringComparator, EnumStoreComparator and EnumStoreStringComparator.
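The comparator pattern in the two commits above (a lookup comparator derived from an existing one via `make_for_lookup()`, holding a `lookup_value`) can be sketched as follows. Everything here is hypothetical and far simpler than the real unique store and enum store comparators: entries are referenced by `uint32_t` index, and a sentinel reference stands for the value being looked up, which is not itself stored.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>
#include <string_view>
#include <vector>

class ToyStringComparator {
public:
    // Sentinel reference meaning "the lookup value" (hypothetical convention).
    static constexpr std::uint32_t lookup_ref = std::numeric_limits<std::uint32_t>::max();

    explicit ToyStringComparator(const std::vector<std::string>& store)
        : _store(store), _lookup_value() {}

    // Copy of this comparator bound to `value`; `value` must outlive the copy.
    ToyStringComparator make_for_lookup(std::string_view value) const {
        ToyStringComparator copy(*this);
        copy._lookup_value = value;
        return copy;
    }

    // Strict weak ordering over references, resolving the sentinel.
    bool operator()(std::uint32_t lhs, std::uint32_t rhs) const {
        return get(lhs) < get(rhs);
    }

private:
    std::string_view get(std::uint32_t ref) const {
        return ref == lookup_ref ? _lookup_value : std::string_view(_store[ref]);
    }

    const std::vector<std::string>& _store;
    std::string_view _lookup_value;
};
```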
*	Add DfaStringComparator.	Tor Egge	2023-09-15	1	-0/+1
|
*	Add support for case-insensitive matching to Levenshtein DFAs	Tor Brede Vekterli	2023-09-15	11	-104/+344
|	Adds matching modes `Cased` and `Uncased`. `Cased` requires UTF-32 code points to match exactly, and successor strings are guaranteed to be strictly higher than the source (candidate) string in `memcmp` order. This mirrors the behavior of the current DFA implementation. `Uncased` treats all characters as if they were lowercased, both for the target and source strings. The target (query) string is explicitly lowercased at DFA build-time to avoid duplicate work. Source strings are implicitly lowercased character by character on-demand during matching. Important ordering note: Successor strings for `Uncased` are generated _as if_ the source string was originally all in lowercase form. This requires some extra handling when emitting successor prefixes, as we can't just blindly copy UTF-8 bytes from the source string as we do when matching in `Cased` mode. A new casing dimension has been added to most parameterized unit tests.
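The casing rules can be illustrated with a plain dynamic-programming Levenshtein distance rather than a DFA. In this sketch the target is lowercased once up front, mirroring the build-time lowercasing described above, while each source character is lowercased only when it is consumed. Lowercasing is ASCII-only, an assumption made to keep the example self-contained; `Casing` and `levenshtein` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class Casing { Cased, Uncased };

// ASCII-only lowercasing; a real implementation needs Unicode-aware folding.
char32_t to_lower_ascii(char32_t c) {
    return (c >= U'A' && c <= U'Z') ? c + (U'a' - U'A') : c;
}

// Two-row Levenshtein distance between target and source.
std::size_t levenshtein(std::u32string target, const std::u32string& source, Casing casing) {
    if (casing == Casing::Uncased) {
        for (char32_t& c : target) c = to_lower_ascii(c);  // once, at "build time"
    }
    std::vector<std::size_t> prev(target.size() + 1), cur(target.size() + 1);
    for (std::size_t j = 0; j <= target.size(); ++j) prev[j] = j;
    for (std::size_t i = 1; i <= source.size(); ++i) {
        char32_t s = source[i - 1];
        if (casing == Casing::Uncased) s = to_lower_ascii(s);  // on demand, per character
        cur[0] = i;
        for (std::size_t j = 1; j <= target.size(); ++j) {
            std::size_t subst = prev[j - 1] + (s == target[j - 1] ? 0 : 1);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
        }
        std::swap(prev, cur);
    }
    return prev[target.size()];
}
```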
*	Use generated header for sanitizer detection macros	Tor Brede Vekterli	2023-09-06	1	-1/+1
|	Needed to properly detect UBSan-instrumented compilation.
*	Add detailed state explorer for field writer SequencedTaskExecutor.	Geir Storli	2023-09-05	2	-0/+18
|	This exposes the raw statistics for each underlying executor.
*	MADV_DONTDUMP is specific to Linux.	Tor Egge	2023-09-04	1	-0/+2
|
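A minimal sketch of guarding a Linux-only `madvise` flag. `MADV_DONTDUMP` is a real Linux `madvise(2)` advice value; the wrapper name `exclude_from_core_dump` and the choice to treat other platforms as a no-op are assumptions, not necessarily what the commit does.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Ask the kernel to leave [addr, addr + len) out of core dumps.
// MADV_DONTDUMP only exists on Linux, so it is compiled in only there;
// elsewhere this is a no-op. Returns 0 on success, -1 on failure (as madvise).
int exclude_from_core_dump(void* addr, std::size_t len) {
#ifdef __linux__
    return madvise(addr, len, MADV_DONTDUMP);
#else
    (void)addr;
    (void)len;
    return 0;
#endif
}
```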
*	Remove FastOS_DirectoryScan	Tor Egge	2023-09-01	6	-259/+3
|
*	Merge pull request #28321 from vespa-engine/toregge/bump-default-small-limit	Henning Baldersheim	2023-08-31	1	-1/+1
|\	Adjust limit for when mmap file allocator uses separate mmaps.
| *	Adjust limit for when mmap file allocator uses separate mmaps.	Tor Egge	2023-08-31	1	-1/+1
| |
* |	Add saturation metric for executors.	Geir Storli	2023-08-31	2	-2/+29
|/	This should make it easier to observe bottlenecks in one of the underlying executor threads used in the "field writer" SequencedTaskExecutor.
*	Let node info for cluster controller be explicit, and not a metric consumer.	Henning Baldersheim	2023-08-29	1	-1/+1
|
*	added pop_back function to SmallVector	Håvard Pettersen	2023-08-28	2	-0/+27
|	follow std::vector by making it undefined for empty vectors
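A sketch of the `pop_back()` contract on a hypothetical fixed-capacity vector (no heap fallback, unlike a real small vector). As with `std::vector::pop_back()`, calling it on an empty container is undefined behavior; the `assert` only documents and debug-checks the precondition.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>

template <typename T, std::size_t N>
class ToySmallVector {
    static_assert(N > 0);
public:
    ToySmallVector() = default;
    ToySmallVector(const ToySmallVector&) = delete;
    ToySmallVector& operator=(const ToySmallVector&) = delete;
    ~ToySmallVector() { while (!empty()) pop_back(); }

    void push_back(T value) {
        assert(_size < N);  // no heap fallback in this sketch
        ::new (raw(_size)) T(std::move(value));
        ++_size;
    }
    // Precondition: !empty(). Violating it is undefined behavior, as for
    // std::vector; the assert only catches it in debug builds.
    void pop_back() {
        assert(_size > 0);
        --_size;
        obj(_size)->~T();
    }
    T& back() { return *obj(_size - 1); }
    std::size_t size() const { return _size; }
    bool empty() const { return _size == 0; }

private:
    void* raw(std::size_t i) { return _storage + i * sizeof(T); }
    T* obj(std::size_t i) { return std::launder(reinterpret_cast<T*>(raw(i))); }

    alignas(T) unsigned char _storage[N * sizeof(T)];
    std::size_t _size = 0;
};
```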
*	Use 128 bytes alignment for small allocations in MmapFileAllocator.	Tor Egge	2023-08-25	2	-9/+9
|
*	Extend class comment.	Tor Egge	2023-08-24	1	-0/+3
|
*	Extend test for reusing file offset.	Tor Egge	2023-08-24	1	-2/+15
|
*	Use premmapped areas for smaller allocations than _small_limit.	Tor Egge	2023-08-24	3	-19/+159
|
*	Add premmapped areas to file area freelist.	Tor Egge	2023-08-24	3	-12/+89
|
*	Merge pull request #28116 from vespa-engine/balder/avoid-dynamic_cast	Henning Baldersheim	2023-08-23	2	-11/+14
|\	Avoid dynamic_cast by adding an interface to get allocated size
| *	Add final	Henning Baldersheim	2023-08-23	1	-2/+4
| |
| *	Avoid dynamic_cast by adding an interface to get allocated size	Henning Baldersheim	2023-08-22	2	-9/+10
| |
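A sketch of replacing a `dynamic_cast` probe with a virtual accessor on the base interface. The class and method names (`Allocation`, `allocated_size`, `FixedBuffer`, `GrowingBuffer`) are hypothetical; marking the concrete classes `final` mirrors the "Add final" commit.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// The base interface exposes the size directly, so callers no longer need
// to dynamic_cast to concrete types to find out how much was allocated.
struct Allocation {
    virtual ~Allocation() = default;
    virtual std::size_t allocated_size() const noexcept = 0;
};

class FixedBuffer final : public Allocation {
public:
    explicit FixedBuffer(std::size_t bytes) : _bytes(bytes) {}
    std::size_t allocated_size() const noexcept override { return _bytes; }
private:
    std::size_t _bytes;
};

class GrowingBuffer final : public Allocation {
public:
    void grow(std::size_t bytes) { _capacity += bytes; }
    std::size_t allocated_size() const noexcept override { return _capacity; }
private:
    std::size_t _capacity = 0;
};

std::size_t total_allocated(const std::vector<std::unique_ptr<Allocation>>& items) {
    std::size_t sum = 0;
    for (const auto& item : items) sum += item->allocated_size();  // virtual call, no RTTI probe
    return sum;
}
```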