vespa - An engine for low-latency computation over large data sets

	Commit message (Collapse)	Author	Age	Files	Lines
*	Remove vespalib::pathExists, vespalib::isPlainFile and vespalib::isSymLink.	Tor Egge	2023-07-20	1	-2/+0
\|
*	Remove vespalib::symlink and vespalib::readLink	Tor Egge	2023-07-20	1	-62/+0
\|
*	Remove vespalib::unlink.	Tor Egge	2023-07-20	1	-33/+4
\|
*	Remove vespalib::copy and vespalib::rename.	Tor Egge	2023-07-20	1	-127/+0
\|
*	Use std::filesystem::rename instead of vespalib::rename.	Tor Egge	2023-07-19	1	-2/+3
\|
*	Drop non ancient non const GetSize/GetPosition	Henning Baldersheim	2023-07-18	1	-13/+13
\|
*	GC unused OpenExisting	Henning Baldersheim	2023-07-18	1	-1/+1
\|
*	Implement Levenshtein DFAs with successor string generation	Tor Brede Vekterli	2023-07-18	2	-0/+516
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This implements code for building and evaluating Levenshtein Deterministic Finite Automata, where the resulting DFA efficiently matches all possible source strings that can be transformed to the target string within k max edits. This allows for O(n) matching of strings. We currently support k in {1, 2}. Additionally, when matching using a DFA, in the case where the source string does _not_ match, we can generate the _successor_ string; the next matching string that is lexicographically _greater_ than the source string. This string has the invariant that there are no possibly matching strings within k edits ordered after the source string but before the successor. This lets us do possibly massive leaps forward in an ordered dictionary, turning a scan for matches into a sublinear operation. Matching and successor generation is fully Unicode-aware. All input strings are expected to be in UTF-8 (without nulls), and the generated successor is also encoded as UTF-8. Internally, matching is done on UTF-32 code points and the DFA itself is built around UTF-32. UTF-8 decoding of source strings is done in a streaming fashion and does not require any allocations. This commit includes a templated core Levenshtein DFA matching (and successor generation) algorithm and two separate DFA implementations that can be used; one explicit and one implicit. The explicit DFA is an immutable DAG built up-front that represents all DFA states and transitions as explicit nodes and edges in a graph. This is currently the fastest to evaluate, but the build time and memory usage means its usage should be preferred for shorter strings (up to a few hundred chars). The implicit DFA does not build any graph up-front, but rather evaluates state transitions on-demand for any given source string. This is currently slower than the explicit DFA, but its O(1) memory usage (aside from the memory used by the target string itself) means that it can be used for arbitrary string lengths. This code currently exists as a freestanding vespalib utility, and is not yet wired to any production code (fuzzy matching or similar). Future optimizations: * Redesign sparse state representation and stepping logic to be much less branching, in turn making the code much less likely to stall the CPU pipeline. * Emit as much as possible of the successor string suffix by copying directly from the target string UTF-8 representation instead of following the DFA and encoding UTF-32 to UTF-8 chars.
*	Remove FastOS_File::Delete().	Tor Egge	2023-07-17	1	-2/+2
\|
*	Use std::filesystem::remove in unit tests.	Tor Egge	2023-07-14	1	-12/+19
\|
*	Use std::filesystem in buffered file unit test.	Tor Egge	2023-07-14	1	-10/+15
\|
*	Avoid livelock when running rcu vector unit test with valgrind.	Tor Egge	2023-07-10	1	-0/+17
\|
*	Use provided memory allocator for large arrays.	Tor Egge	2023-07-05	1	-0/+10
\|
*	Merge pull request #27547 from vespa-engine/havardpe/est-80-percentile-as-result	Henning Baldersheim	2023-06-26	1	-39/+84
\|\ \| \| \| \|	use estimated 80 percentile as benchmark result
\| *	add comment	Håvard Pettersen	2023-06-26	1	-0/+17
\| \|
\| *	use estimated 80 percentile as benchmark result	Håvard Pettersen	2023-06-26	1	-39/+67
\| \| \| \| \| \| \| \|	also simplify somewhat
* \|	Add max buffer size parameter to array store dynamic type mapper.	Tor Egge	2023-06-26	2	-18/+30
\| \|
* \|	Merge pull request #27542 from vespa-engine/toregge/limit-64-byte-alignment	Geir Storli	2023-06-26	2	-5/+13
\|\ \ \| \| \| \| \| \|	Limit 64-byte dynamic array buffer type alignment based on element type.
\| * \|	Limit 64-byte dynamic array buffer type alignment based on element type.	Tor Egge	2023-06-24	2	-5/+13
\| \| \|
* \| \|	Remove use of std::min.	Tor Egge	2023-06-23	1	-1/+1
\| \| \|
* \| \|	Cap number of entries in a buffer to avoid very large buffers.	Tor Egge	2023-06-23	2	-22/+53
\|/ /
* \|	Use 64 bytes alignment for large arrays.	Tor Egge	2023-06-22	1	-9/+9
\| \|
* \|	Avoid shadowing.	Tor Egge	2023-06-22	1	-1/+1
\| \|
* \|	Allocate space for allowed buffer underflow.	Tor Egge	2023-06-22	2	-28/+32
\|/
*	Merge pull request #27508 from ↵	Henning Baldersheim	2023-06-21	1	-17/+104
\|\ \| \| \| \| \| \| \| \|	vespa-engine/havardpe/benchmark-cmp-exch-vs-fetch-add benchmark compare exchange vs fetch add with contention
\| *	benchmark compare exchange vs fetch add with contention	Håvard Pettersen	2023-06-21	1	-17/+104
\| \|
* \|	Cleanup array store unit test.	Tor Egge	2023-06-21	1	-3/+3
\| \|
* \|	Merge pull request #27499 from ↵	Tor Egge	2023-06-21	1	-2/+2
\|\ \ \| \|/ \|/\| \| \| \| \|	vespa-engine/toregge/store-dynamic-array-size-at-start-of-entry Store dynamic array size size at start of entry.
\| *	Store dynamic array size size at start of entry.	Tor Egge	2023-06-21	1	-2/+2
\| \|
* \|	Nexus::run is now static	Håvard Pettersen	2023-06-20	2	-22/+17
\|/ \| \| \|	No change for thread lambdas, but they now get separate Nexus objects.
*	Merge pull request #27484 from ↵	Geir Storli	2023-06-19	3	-9/+9
\|\ \| \| \| \| \| \| \| \|	vespa-engine/toregge/rename-max-small-array-type-id-to-max-type-id Rename maxSmallArrayTypeId to max_type_id.
\| *	Rename maxSmallArrayTypeId to max_type_id.	Tor Egge	2023-06-19	3	-9/+9
\| \|
* \|	Merge pull request #27462 from vespa-engine/havardpe/rw-spin-lock-2	Håvard Pettersen	2023-06-19	6	-38/+438
\|\ \ \| \|/ \|/\|	rw spin lock
\| *	add comment and init atomics	Håvard Pettersen	2023-06-19	1	-3/+9
\| \|
\| *	Update vespalib/src/tests/nexus/nexus_test.cpp	Håvard Pettersen	2023-06-19	1	-1/+1
\| \| \| \| \| \|	Co-authored-by: Tor Brede Vekterli <vekterli@yahooinc.com>
\| *	fix typo	Håvard Pettersen	2023-06-19	1	-1/+1
\| \|
\| *	rw spin lock	Håvard Pettersen	2023-06-16	6	-38/+432
\| \| \| \| \| \| \| \| \| \|	still only experimental; both the lock itself and its benchmarking spin-off: Nexus utility for multi-threaded testing and benchmarking
* \|	Merge pull request #27453 from ↵	Tor Egge	2023-06-16	1	-117/+205
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \|	vespa-engine/toregge/wire-in-use-of-dynamic-buffer-type-as-needed-in-array-store Wire in use of dynamic array buffer type as needed in ArrayStore.
\| * \|	Wire in use of dynamic array buffer type as needed in ArrayStore.	Tor Egge	2023-06-16	1	-117/+205
\| \|/
* /	- Add explicit test that onInsert/onRemove is called correctly when cache is ↵	Henning Baldersheim	2023-06-16	1	-0/+60
\|/ \| \| \| \| \| \|	full, do not rely on monitoring cache size. - Call correct method for properly erasing an element, even if it is old :)
*	Adjust DynamicArrayBufferType constructor signature to match	Tor Egge	2023-06-15	1	-2/+3
\| \| \| \|	SmallArrayBufferType constructor signature.
*	Adjust local variable name in get_entry_sizes member function.	Tor Egge	2023-06-14	1	-3/+3
\|
*	Add ArrayStoreDynamicTypeMapper.	Tor Egge	2023-06-14	2	-0/+178
\|
*	Add DynamicArrayBufferType.	Tor Egge	2023-06-14	2	-0/+258
\|
*	Add get_entry_size member function in array store type mappers that maps	Tor Egge	2023-06-13	1	-2/+2
\| \| \| \|	from type id to entry size.
*	Merge pull request #27404 from ↵	Tor Egge	2023-06-13	3	-6/+6
\|\ \| \| \| \| \| \| \| \|	vespa-engine/toregge/store-entry-size-in-buffer-type Store entry size in BufferTypeBase.
\| *	Store entry size in BufferTypeBase.	Tor Egge	2023-06-13	3	-6/+6
\| \|
* \|	Revert "Pass array size to allocArray member function."	Tor Egge	2023-06-13	1	-2/+2
\|/
*	Revert "rw spin lock"	Arnstein Ressem	2023-06-12	3	-367/+37
\|
*	Merge pull request #27383 from vespa-engine/havardpe/rw-spin-lock	Henning Baldersheim	2023-06-12	3	-37/+367
\|\ \| \| \| \|	rw spin lock