vespa - An engine for low-latency computation over large data sets

	Commit message	Author	Age	Files	Lines
*	Fix format strings.	Tor Egge	2023-08-21	1	-1/+1
|
*	Merge pull request #28037 from ↵	Henning Baldersheim	2023-08-16	2	-2/+4
|\	vespa-engine/balder/use-interfaces-for-looking-up-index-from-node - Avoid going via a temporary IdealNodesList.
| *	Rename methods to follow style in class	Henning Baldersheim	2023-08-16	1	-2/+3
| |
| *	- Avoid going via a temporary IdealNodesList.	Henning Baldersheim	2023-08-14	1	-0/+1
| |	- Use ConstArrayRef to hide implementation.
| |	- Store all 3 node categories in a single vector.
| |	- Use a small_vector that can handle redundancy up to 5 without requiring extra memory allocation.
| |	- Build a hash_map if redundancy/groups > 32 for constant lookup time.
* |	GC unused direct IO support	Henning Baldersheim	2023-08-15	3	-73/+4
| |
* |	GC and clean up more unused code	Henning Baldersheim	2023-08-15	3	-194/+45
| |
* |	Assert that you actually got memory you allocated.	Henning Baldersheim	2023-08-15	1	-0/+4
| |
* |	GC unused File code and other fallout.	Henning Baldersheim	2023-08-15	5	-513/+75
|/
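The lookup strategy described in the #28037 commit message (keep all nodes in one small vector, and build a hash map only when redundancy/groups exceed 32 so lookups stay constant-time) can be sketched roughly as below. This is a hypothetical C++17 illustration, not vespa's code: `NodeIndexLookup`, `indexOf` and `kHashThreshold` are invented names, `std::vector` stands in for the small_vector named in the commit (which, per the commit, avoids extra allocation for up to 5 entries), and `std::unordered_map` stands in for the hash_map.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical lookup from a node index to its position in one flat list of ideal nodes.
// Short lists are scanned linearly; long lists get a hash index for O(1) lookups.
class NodeIndexLookup {
public:
    static constexpr std::size_t kHashThreshold = 32; // mirrors the "> 32" rule in the commit message

    explicit NodeIndexLookup(std::vector<uint16_t> nodes)
        : _nodes(std::move(nodes))
    {
        if (_nodes.size() > kHashThreshold) {
            _index.reserve(_nodes.size());
            for (std::size_t i = 0; i < _nodes.size(); ++i) {
                _index.emplace(_nodes[i], i); // emplace keeps the first position if a node repeats
            }
        }
    }

    // Returns the position of `node`, or std::nullopt if it is not in the list.
    std::optional<std::size_t> indexOf(uint16_t node) const {
        if (!_index.empty()) {
            auto it = _index.find(node);
            if (it == _index.end()) return std::nullopt;
            return it->second;
        }
        for (std::size_t i = 0; i < _nodes.size(); ++i) {
            if (_nodes[i] == node) return i;
        }
        return std::nullopt;
    }

private:
    std::vector<uint16_t> _nodes;                      // all node categories in a single vector
    std::unordered_map<uint16_t, std::size_t> _index;  // built only for large lists
};
```

For typical redundancy a linear scan over a handful of contiguous entries is usually cheaper than hashing, which is why the hash index is only built past the threshold.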
*	Avoid eating memory on repeated insert.	Henning Baldersheim	2023-08-10	1	-2/+1
|
*	Merge pull request #27989 from vespa-engine/balder/faster-bucketdb-metrics	Henning Baldersheim	2023-08-09	2	-0/+3
|\	Move where possible
| *	Unify on a single definition of MinReplicaMap	Henning Baldersheim	2023-08-08	2	-0/+3
| |
* |	Merge pull request #27988 from ↵	Henning Baldersheim	2023-08-09	2	-0/+25
|\ \	vespa-engine/balder/prepare-for-better-stats-reset-code Balder/prepare for better stats reset code
| * |	Add support for creating ConstArrayRef from std::array	Henning Baldersheim	2023-08-08	2	-0/+25
| |/
* /	Use vespalib hash_set since it is significantly faster than ↵	Henning Baldersheim	2023-08-08	1	-0/+2
|/	std::unordered_set
*	Provide more information when failing to mmap files	Henning Baldersheim	2023-08-02	1	-6/+10
|
*	Deinline BufferTypeBase move constructors.	Henning Baldersheim	2023-07-31	2	-2/+5
|
*	Reduce number of checks and asserts as a proper precondition check with ↵	Henning Baldersheim	2023-07-27	2	-8/+1
|	validFirstByte has always been conducted.
*	- Return double for computation.	Henning Baldersheim	2023-07-27	2	-4/+4
|	- Do not hide narrowing to 32 bit.
|	- Use enum class.
*	- Pack data closer to let config fit in 2 cache lines instead of 4.	Henning Baldersheim	2023-07-27	10	-95/+80
|	- Avoid plt indirection and allow more inlining of frequently called code.
|	- Reapplication of #27646
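Packing data so that a config fits in fewer cache lines, as in the first bullet of that commit, typically comes down to ordering members so that alignment padding is minimized. The sketch below is a hypothetical illustration, not the actual vespa config struct: the struct and field names are invented, the 64-byte cache line is an assumption, and the sizes it relies on (8-byte, 8-aligned `double`; 1-byte `bool`) hold on common 64-bit ABIs but are not mandated by the C++ standard.

```cpp
#include <cstddef>

// Interleaving small and large members forces padding after each bool.
struct LooseConfig {
    bool   enableA;     // 1 byte, then 7 bytes of padding before the next double
    double thresholdA;
    bool   enableB;
    double thresholdB;
    bool   enableC;
    double thresholdC;
};

// Placing the largest members first lets the bools share a single padded tail.
struct PackedConfig {
    double thresholdA;
    double thresholdB;
    double thresholdC;
    bool   enableA;
    bool   enableB;
    bool   enableC;
};

// Number of cache lines an object of `bytes` spans, assuming 64-byte lines
// and a line-aligned start address.
constexpr std::size_t kCacheLine = 64;
constexpr std::size_t cacheLines(std::size_t bytes) {
    return (bytes + kCacheLine - 1) / kCacheLine;
}

static_assert(sizeof(PackedConfig) <= sizeof(LooseConfig),
              "reordering members should not increase the size on common ABIs");
```

On x86-64 with GCC or Clang, `LooseConfig` is typically 48 bytes and `PackedConfig` 32 bytes; in a larger config the same reordering is what can bring the footprint down across a cache-line boundary.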
*	Merge pull request #27817 from ↵	Henning Baldersheim	2023-07-27	2	-10/+10
|\	vespa-engine/revert-27773-revert-27643-balder/use-direct-weighted-set-also-for-filter-fields Revert "Revert "- Consolidate on isFilter.""
| *	Revert "Revert "- Consolidate on isFilter.""	Henning Baldersheim	2023-07-19	2	-10/+10
| |
* |	Suppress GCC false positive compiler warning when compiling with sanitizers	Tor Brede Vekterli	2023-07-26	1	-0/+4
| |	Looks like GCC gets confused during compilation of inlined xxhash functions when the input buffer is backed by a `std::array`, even when the length argument is a runtime value. An xxhash branch that can only kick in when the input length is >= 9 bytes triggers a compilation warning (and thus error) that 8 bytes at the start of the buffer are being read, with GCC staunchly insisting that the buffer size is only 4 bytes. We only see this warning when compiling with UBSan instrumentation, so presumably it injects enough changes into the GCC intermediate representation to thoroughly confuse it. For now, suppress the warning when compiling with sanitizers. Revisit on GCC 13 to see if the warning is gone.
* |	Merge pull request #27883 from vespa-engine/balder/less-fastos-statinfo	Henning Baldersheim	2023-07-25	3	-31/+20
|\ \	Prefer std::filesystem::exists over FastOS_StatInfo
| * |	Prefer std::filesystem::exists over FastOS_StatInfo	Henning Baldersheim	2023-07-25	3	-31/+20
| | |
* | |	Use uint32_t as ucs4_t	Henning Baldersheim	2023-07-25	1	-1/+3
|/ /
* |	Use std::filesystem::current_path	Tor Egge	2023-07-21	5	-77/+0
| |
* |	Remove vespalib::stat and vespalib::getFileSize.	Tor Egge	2023-07-20	3	-41/+4
| |
* |	Remove declaration of vespalib::isDirectory.	Tor Egge	2023-07-20	1	-8/+0
| |
* |	Use std::filesystem::is_directory and std::filesystem::exists	Tor Egge	2023-07-20	7	-40/+19
| |
* |	Remove vespalib::pathExists, vespalib::isPlainFile and vespalib::isSymLink.	Tor Egge	2023-07-20	2	-36/+0
| |
* |	Remove vespalib::symlink and vespalib::readLink	Tor Egge	2023-07-20	3	-115/+0
| |
* |	Remove vespalib::unlink.	Tor Egge	2023-07-20	3	-59/+5
| |
* |	Remove vespalib::copy and vespalib::rename.	Tor Egge	2023-07-20	3	-248/+0
| |
* |	Use std::filesystem::rename instead of vespalib::rename.	Tor Egge	2023-07-19	1	-2/+3
| |
* |	- Add noexcept and some constexpr.	Henning Baldersheim	2023-07-19	8	-90/+88
| |	- Use BitWord as helper class instead of inheriting in many static methods.
* |	Backport to clang 15.	Tor Egge	2023-07-19	3	-7/+7
|/
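The commits above replace vespalib's own file helpers (`vespalib::pathExists`, `vespalib::isDirectory`, `vespalib::getFileSize`, `vespalib::rename`, `vespalib::unlink`) with the C++17 `<filesystem>` library. The sketch below shows the standard calls such code would typically use. It is a self-contained illustration working in a scratch directory, not code taken from vespa; the function name `filesystemRoundTrip` and the directory name are hypothetical, and the `std::error_code` overloads are just one way to avoid exceptions.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Creates a small file, inspects it, renames it and removes it again using only
// std::filesystem. Returns the file size observed before removal, or -1 on failure.
std::intmax_t filesystemRoundTrip() {
    std::error_code ec;
    fs::path dir = fs::temp_directory_path(ec) / "fs_sketch_dir"; // hypothetical scratch dir
    if (ec) return -1;
    fs::create_directories(dir, ec);
    if (ec || !fs::is_directory(dir)) return -1;            // instead of an isDirectory() helper

    fs::path original = dir / "a.txt";
    {
        std::ofstream out(original);
        out << "hello";                                     // 5 bytes, flushed when `out` closes
    }

    if (!fs::exists(original)) return -1;                   // instead of a pathExists() helper
    std::uintmax_t size = fs::file_size(original, ec);      // instead of a getFileSize() helper
    if (ec) return -1;

    fs::path renamed = dir / "b.txt";
    fs::rename(original, renamed, ec);                      // instead of a rename() helper
    if (ec || fs::exists(original) || !fs::exists(renamed)) return -1;

    fs::remove(renamed, ec);                                // instead of an unlink() helper
    fs::remove(dir, ec);                                    // succeeds once the directory is empty
    return static_cast<std::intmax_t>(size);
}
```

The overloads without `std::error_code` throw `std::filesystem::filesystem_error` instead, which may suit call sites that previously relied on exceptions from a wrapper.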
*	Drop non ancient non const GetSize/GetPosition	Henning Baldersheim	2023-07-18	9	-46/+37
|
*	GC unused OpenExisting	Henning Baldersheim	2023-07-18	3	-37/+1
|
*	GC unused SetFileName	Henning Baldersheim	2023-07-18	3	-32/+11
|
*	Merge pull request #27810 from vespa-engine/vekterli/levenshtein-dfa	Tor Egge	2023-07-18	18	-3/+2663
|\	Implement Levenshtein DFA with successor string generation
| *	Implement Levenshtein DFAs with successor string generation	Tor Brede Vekterli	2023-07-18	18	-3/+2663
| |	This implements code for building and evaluating Levenshtein Deterministic Finite Automata (DFAs), where the resulting DFA efficiently matches all possible source strings that can be transformed to the target string within at most k edits. This allows for O(n) matching of strings, where n is the length of the source string. We currently support k in {1, 2}.
| |
| |	Additionally, when matching using a DFA, in the case where the source string does _not_ match, we can generate the _successor_ string: the next matching string that is lexicographically _greater_ than the source string. This string has the invariant that there are no possibly matching strings within k edits ordered after the source string but before the successor. This lets us make potentially massive leaps forward in an ordered dictionary, turning a scan for matches into a sublinear operation.
| |
| |	Matching and successor generation are fully Unicode-aware. All input strings are expected to be in UTF-8 (without nulls), and the generated successor is also encoded as UTF-8. Internally, matching is done on UTF-32 code points and the DFA itself is built around UTF-32. UTF-8 decoding of source strings is done in a streaming fashion and does not require any allocations.
| |
| |	This commit includes a templated core Levenshtein DFA matching (and successor generation) algorithm and two separate DFA implementations that can be used: one explicit and one implicit.
| |
| |	The explicit DFA is an immutable DAG built up-front that represents all DFA states and transitions as explicit nodes and edges in a graph. This is currently the fastest to evaluate, but its build time and memory usage mean it should be preferred for shorter strings (up to a few hundred chars).
| |
| |	The implicit DFA does not build any graph up-front, but rather evaluates state transitions on demand for any given source string. This is currently slower than the explicit DFA, but its O(1) memory usage (aside from the memory used by the target string itself) means that it can be used for arbitrary string lengths.
| |
| |	This code currently exists as a freestanding vespalib utility, and is not yet wired to any production code (fuzzy matching or similar).
| |
| |	Future optimizations:
| |	* Redesign sparse state representation and stepping logic to be much less branching, in turn making the code much less likely to stall the CPU pipeline.
| |	* Emit as much as possible of the successor string suffix by copying directly from the target string UTF-8 representation instead of following the DFA and encoding UTF-32 to UTF-8 chars.
* |	Merge pull request #27809 from ↵	Tor Egge	2023-07-18	3	-78/+38
|\ \	vespa-engine/balder/gc-some-obscure-fastos-file-stdout-stderr-support GC some obscure support in FastOS_File for stderr and stdout.
| * |	GC some obscure support in FastOS_File for stderr and stdout.	Henning Baldersheim	2023-07-18	3	-78/+38
| |/
* /	GC unused Rename interface.	Henning Baldersheim	2023-07-18	4	-41/+0
|/
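The Levenshtein DFA commit above builds real automata (explicit and implicit) and adds successor generation, none of which is reproduced here. As a much simpler, hedged illustration of the underlying question, whether a source string is within k edits of the target, the sketch below steps one dynamic-programming row per source code point, similar in spirit to how an implicit automaton computes its next state on demand, and rejects early once every entry in the row exceeds k. It works on UTF-32 code points (`std::u32string`), in line with the commit's note that matching is done on code points. The function name `withinEdits` is hypothetical, and the sketch does not generate successor strings.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns true if `source` can be turned into `target` with at most `k`
// insertions, deletions or substitutions (classic Levenshtein distance).
// Each outer iteration consumes one source code point and computes the next
// DP row, which plays the role of the automaton's next state.
bool withinEdits(const std::u32string& source, const std::u32string& target, std::size_t k) {
    const std::size_t m = target.size();
    std::vector<std::size_t> row(m + 1), next(m + 1);
    for (std::size_t j = 0; j <= m; ++j) row[j] = j; // distance from "" to target[0..j)

    for (std::size_t i = 0; i < source.size(); ++i) {
        next[0] = i + 1; // delete all of source[0..i]
        std::size_t rowMin = next[0];
        for (std::size_t j = 1; j <= m; ++j) {
            const std::size_t cost = (source[i] == target[j - 1]) ? 0 : 1;
            next[j] = std::min({row[j] + 1,          // delete source[i]
                                next[j - 1] + 1,     // insert target[j - 1]
                                row[j - 1] + cost}); // match or substitute
            rowMin = std::min(rowMin, next[j]);
        }
        // Row minima never decrease, so once all entries exceed k no suffix can match.
        if (rowMin > k) return false;
        row.swap(next);
    }
    return row[m] <= k;
}
```

Unlike the DFAs in the commit, which match in O(n) time for a fixed k, this sketch costs O(n·m) time for a source of length n and a target of length m, and it offers no way to leap ahead in a sorted dictionary.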
*	Remove FastOS_File::Delete().	Tor Egge	2023-07-17	6	-32/+2
|
*	Use std::filesystem::remove in unit tests.	Tor Egge	2023-07-14	1	-12/+19
|
*	Use std::filesystem in buffered file unit test.	Tor Egge	2023-07-14	1	-10/+15
|
*	Revert "- Consolidate on isFilter."	Tor Egge	2023-07-14	2	-10/+10
|
*	Revert "- Pack data closer to let config fit in 2 cache lines instead of 4."	Tor Egge	2023-07-14	10	-80/+95
|
*	Fail when unable to open file.	Tor Egge	2023-07-13	1	-3/+3
|