vespa - An engine for low-latency computation over large data sets

	Commit message	Author	Age	Files	Lines
*	Simplify and avoid default arguments.	Henning Baldersheim	2023-11-04	3	-2/+8
\|
*	No need to specify your own namespace.	Henning Baldersheim	2023-11-04	4	-8/+9
\|
*	Revert "No need to specify your own namespace."	Henning Baldersheim	2023-11-04	7	-17/+10
\|
*	Simplify and avoid default arguments.	Henning Baldersheim	2023-11-03	3	-2/+8
\|
*	No need to specify your own namespace.	Henning Baldersheim	2023-11-03	4	-8/+9
\|
*	Deinline foreach also for internal nodes	Henning Baldersheim	2023-11-02	2	-46/+68
\|
*	- deinline foreach in btree leaf nodes.	Henning Baldersheim	2023-11-02	2	-30/+45
\|
*	Test that OpenSSL mTLS integration is not vulnerable to certificate stuffing	Tor Brede Vekterli	2023-11-02	1	-2/+45
\|	This adds a test that our OpenSSL mTLS integration is not vulnerable to CVE-2023-2422-style certificate credential stuffing. Spoiler alert: we're not, and never have been vulnerable. But this test shall help to ensure we also never accidentally will be in the future.
\|	If a server is vulnerable to certificate stuffing, a sneaky client may include both a valid certificate chain (containing credential set A) as well as a self-signed peer certificate (containing credential set B). The vulnerable server thinks the latter cert has been verified, even though the mTLS implementation only verifies the first (actual) client cert as being signed by the CA. The server may then wrongfully choose to include set B as the client's credentials.
\|	We explicitly only consider certificates in the chain at OpenSSL "error depth zero", which means the "end entity certificate", i.e. the client peer.
*	Add `noexcept` and minor cleanups	Tor Brede Vekterli	2023-11-01	1	-22/+22
\|
*	Move `HwInfo` from `proton` namespace to `vespalib`	Tor Brede Vekterli	2023-11-01	1	-0/+80
\|	This is information that is valuable to many different components, not just the search core internals.
*	Move xxh3_64 methods to vespalib. That also removes the need for workarounds for GCC false positives.	Henning Baldersheim	2023-10-17	4	-29/+27
\|
*	Relaxed store is sufficient.	Henning Baldersheim	2023-10-17	2	-6/+7
\|
*	Since the cached size can be updated by many threads, it must be an atomic since there can be many readers.	Henning Baldersheim	2023-10-17	2	-10/+13
\|
*	Avoid incorrect gcc warning compiling inlined XXH3 code. Also stick to including official interface.	Henning Baldersheim	2023-10-16	1	-1/+1
\|
*	Correct copyright headers	Jon Bratseth	2023-10-09	2	-10/+10
\|
*	Update copyright	Jon Bratseth	2023-10-09	1420	-1426/+1426
\|
*	Use ConstBufferRef and add some noexcept	Henning Baldersheim	2023-10-05	1	-17/+17
\|
*	Add missing includes, avoid shadow warning and skip including file not	Tor Egge	2023-09-29	3	-1/+4
\|	present in llvm 17. Issues detected when compiling with clang++ 17 / libc++ 17 / llvm 17.
*	Merge pull request #28714 from vespa-engine/havardpe/better-graphviz-for-table-dfa	Håvard Pettersen	2023-09-29	2	-41/+133
\|\	dump table_dfa as actual dfa in graphviz
\| *	dump table_dfa as actual dfa in graphviz	Håvard Pettersen	2023-09-28	2	-41/+133
\| \|	enumerate states based on best-edge-first
\| \|	'*' means any character without its own edge
* \|	Preserve prefix of input DFA successor string	Tor Brede Vekterli	2023-09-27	3	-13/+51
\|/	If a non-empty string is passed as a successor to the DFA, the contents of the string will be preserved, i.e. the successor will always be _appended_ to any existing data. This allows for less manual fiddling when implementing prefix locking by the caller (no need to concatenate a prefix with the generated successor string).
\|	Note: this has some added cognitive cost where the caller now has the entire responsibility of resetting the successor between calls.
\|	The existing fuzzy matcher has been updated to no longer require a separation between successor prefix and suffix; it can now safely reuse the successor prefix between calls.
*	Merge pull request #28677 from vespa-engine/havardpe/inline-table-dfa	Håvard Pettersen	2023-09-27	4	-44/+227
\|\	use inline pre-generated tables
\| *	Update vespalib/src/vespa/vespalib/fuzzy/inline_tfa.hpp	Håvard Pettersen	2023-09-27	1	-0/+1
\| \|	add copyright notice
\| \|	Co-authored-by: Tor Brede Vekterli <vekterli@yahooinc.com>
\| *	use inline pre-generated tables	Håvard Pettersen	2023-09-26	4	-44/+226
\| \|
* \|	Merge pull request #28674 from vespa-engine/balder/minor-code-health	Geir Storli	2023-09-26	2	-4/+5
\|\ \	Minor code health
\| * \|	Minor code health	Henning Baldersheim	2023-09-26	2	-4/+5
\| \| \|
* \| \|	Make DFA table algorithm selectable at query time.	Geir Storli	2023-09-26	2	-1/+7
\| \|/ \|/\|
* \|	Merge pull request #28652 from vespa-engine/havardpe/table-dfa	Tor Brede Vekterli	2023-09-26	10	-8/+900
\|\ \ \| \|/ \|/\|	table dfa
\| *	table dfa	Håvard Pettersen	2023-09-25	10	-8/+900
\| \|
* \|	Merge pull request #28654 from vespa-engine/balder/return-early-on-match	Henning Baldersheim	2023-09-26	3	-19/+17
\|\ \	- Return early in doSeek if docId found.
\| * \|	Add noexcept	Henning Baldersheim	2023-09-25	3	-19/+17
\| \|/
* \|	Merge pull request #28653 from vespa-engine/balder/use-stash	Henning Baldersheim	2023-09-26	12	-218/+64
\|\ \	- Use stash instead of the single use of VariableSizeVector.
\| * \|	Reorder members to reflect required lifetime, and remove incorrect noexcept.	Henning Baldersheim	2023-09-26	3	-7/+9
\| \| \|
\| * \|	- Use stash instead of the single use of VariableSizeVector.	Henning Baldersheim	2023-09-25	12	-217/+61
\| \|/
* \|	Fix printing of sparse states	Tor Brede Vekterli	2023-09-25	1	-3/+3
\| \|
* \|	Generate Levenshtein successor prefix "as we go" during match loop	Tor Brede Vekterli	2023-09-25	2	-75/+29
\|/	Moving from an on-demand generation means that we only decode (and possibly normalize) UTF-8 characters _once_ instead of twice in the case of UTF-32 output or non-normalized input.
\|	These changes also make it much more easy to add future support for preserving a caller-supplied successor prefix, which would be used for prefix-locked dictionary matching.
\|	Note: this subtly changes the API to potentially _always_ mutate the input successor string. The API documentation has been updated to reflect this. No current users of the API should be affected.
*	Add prefix_size constructor argument to DfaFuzzyMatcher.	Tor Egge	2023-09-22	1	-0/+1
\|
*	Merge pull request #28606 from vespa-engine/geirst/fuzzy-matching-algorithm-query-property	Geir Storli	2023-09-21	3	-0/+78
\|\	Add query property to control fuzzy matching algorithm.
\| *	Add query property to control fuzzy matching algorithm.	Geir Storli	2023-09-21	3	-0/+78
\| \|
* \|	Split core DFA match loop into match-only and successor-emitting specializations	Tor Brede Vekterli	2023-09-21	7	-43/+65
\|/	This allows for "hybrid" schemes where raw matching (without successor generation) is done with a dedicated matcher implementation that is faster for that particular purpose. Also gives a much tighter loop for the match-only case and removes some branches from the successor-emitting case.
*	Use the Guard when testing bundle pool	Henning Baldersheim	2023-09-20	3	-33/+31
\|
*	Refactor code to make object lifetime easier to follow.	Henning Baldersheim	2023-09-20	1	-0/+12
\|
*	Add UTF-32 exact suffix output to DFA concept	Tor Brede Vekterli	2023-09-18	1	-1/+6
\|
*	Support raw UTF-32 successor string output	Tor Brede Vekterli	2023-09-18	11	-41/+154
\|	Avoids need to encode UTF-32 characters to UTF-8 internally, as this is likely to be subsequently reversed by a caller that itself operates on UTF-32 code points.
\|	Change the `match()` API by introducing a separate overload that does not produce a successor, and add two explicit successor string type overloads that take the string by ref, not pointer.
*	Optimize successor generation of exact match suffix	Tor Brede Vekterli	2023-09-18	8	-16/+93
\|	Avoids explicitly stepping the DFA states and handling each transition character separately by detecting the case where the only way the generated suffix can possibly match is for it to _exactly_ match the remaining target string suffix.
\|	This is the case when the sparse cost matrix row only has 1 column remaining, and this column is equal to the max edit distance. I.e. no more edits can possibly be done.
\|	In local synthetic benchmarks this speeds up successor generation 4x when the target string is 64 characters.
*	Use make_for_lookup() member function on existing comparator	Tor Egge	2023-09-18	5	-28/+32
\|	to make a new comparator which is used for lookup.
*	Add comparator to unique store.	Tor Egge	2023-09-18	3	-55/+51
\|
*	Rename fallback_value to lookup_value in UniqueStoreComparator,	Tor Egge	2023-09-18	2	-10/+10
\|	UniqueStoreStringComparator, EnumStoreComparator and EnumStoreStringComparator.
*	Add DfaStringComparator.	Tor Egge	2023-09-15	1	-0/+1
\|
*	Add support for case-insensitive matching to Levenshtein DFAs	Tor Brede Vekterli	2023-09-15	11	-104/+344
\|	Adds matching modes `Cased` and `Uncased`.
\|	`Cased` requires UTF-32 code points to match exactly, and successor strings are guaranteed to be strictly higher than the source (candidate) string in `memcmp` order. This mirrors the behavior of the current DFA implementation.
\|	`Uncased` treats all characters as if they were lowercased, both for the target and source strings. The target (query) string is explicitly lowercased at DFA build-time to avoid duplicate work. Source strings are implicitly lowercased character by character on-demand during matching.
\|	Important ordering note: Successor strings for `Uncased` are generated _as if_ the source string was originally all in lowercase form. This requires some extra added handling when emitting successor prefixes, as we can't just blindly copy UTF-8 bytes from the source string as we do when matching in `Cased` mode.
\|	A new casing-dimension has been added to most parameterized unit tests.
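
The `Cased` / `Uncased` split described in "Add support for case-insensitive matching to Levenshtein DFAs" can be illustrated with a minimal sketch. This is not the vespalib API: `Casing`, `BoundedMatcher` and `lower_ascii` are hypothetical names, a plain dynamic-programming edit distance stands in for the DFA, successor generation is left out entirely, and lowercasing is ASCII-only where a real implementation would presumably lowercase arbitrary UTF-32 code points. It only shows the division of work: the target is lowercased once at build time, while each source character is lowercased on demand during matching.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical names throughout; this is not the vespalib API.
enum class Casing { Cased, Uncased };

// ASCII-only stand-in for Unicode lowercasing (an assumption for brevity).
inline char32_t lower_ascii(char32_t c) {
    return (c >= U'A' && c <= U'Z') ? static_cast<char32_t>(c - U'A' + U'a') : c;
}

class BoundedMatcher {
    std::u32string _target;
    uint32_t       _max_edits;
    Casing         _casing;
public:
    BoundedMatcher(std::u32string_view target, uint32_t max_edits, Casing casing)
        : _target(target), _max_edits(max_edits), _casing(casing)
    {
        if (_casing == Casing::Uncased) {
            // Lowercase the target exactly once, at build time.
            for (auto& c : _target) {
                c = lower_ascii(c);
            }
        }
    }

    // Returns the edit distance if it is <= max_edits, otherwise nullopt.
    // Uses a two-row Levenshtein table instead of a DFA (swapped-in technique).
    std::optional<uint32_t> match(std::u32string_view source) const {
        const size_t n = _target.size();
        std::vector<uint32_t> prev(n + 1), cur(n + 1);
        for (size_t j = 0; j <= n; ++j) {
            prev[j] = static_cast<uint32_t>(j);
        }
        for (size_t i = 0; i < source.size(); ++i) {
            // Lowercase the source character on demand; the source is never copied.
            const char32_t s = (_casing == Casing::Uncased) ? lower_ascii(source[i]) : source[i];
            cur[0] = static_cast<uint32_t>(i + 1);
            for (size_t j = 0; j < n; ++j) {
                const uint32_t subst  = prev[j] + (s == _target[j] ? 0u : 1u);
                const uint32_t remove = prev[j + 1] + 1u;
                const uint32_t insert = cur[j] + 1u;
                cur[j + 1] = std::min({subst, remove, insert});
            }
            std::swap(prev, cur);
        }
        const uint32_t dist = prev[n];
        if (dist <= _max_edits) {
            return dist;
        }
        return std::nullopt;
    }
};
```

With a target of `U"foo"` and at most one edit, the `Cased` matcher treats `U"Foo"` as one substitution away and rejects `U"FOO"`, whereas an `Uncased` matcher accepts `U"FOO"` with distance zero.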