aboutsummaryrefslogtreecommitdiffstats
path: root/vespalib/src/tests
Commit message (Collapse)AuthorAgeFilesLines
* Merge pull request #28714 from ↵Håvard Pettersen2023-09-291-8/+8
|\ | | | | | | | | vespa-engine/havardpe/better-graphviz-for-table-dfa dump table_dfa as actual dfa in graphviz
| * dump table_dfa as actual dfa in graphvizHåvard Pettersen2023-09-281-8/+8
| | | | | | | | | | enumerate states based on best-edge-first '*' means any character without its own edge
* | Preserve prefix of input DFA successor stringTor Brede Vekterli2023-09-271-4/+37
|/ | | | | | | | | | | | | | | | If a non-empty string is passed as a successor to the DFA, the contents of the string will be preserved, i.e. the successor will always be _appended_ to any existing data. This allows for less manual fiddling when implementing prefix locking by the caller (no need to concatenate a prefix with the generated successor string). Note: this has some added cognitive cost where the caller now has the entire responsibility of resetting the successor between calls. The existing fuzzy matcher has been updated to no longer require a separation between successor prefix and suffix; it can now safely reuse the successor prefix between calls.
* Merge pull request #28677 from vespa-engine/havardpe/inline-table-dfaHåvard Pettersen2023-09-271-6/+103
|\ | | | | use inline pre-generated tables
| * use inline pre-generated tablesHåvard Pettersen2023-09-261-6/+103
| |
* | Merge pull request #28674 from vespa-engine/balder/minor-code-healthGeir Storli2023-09-261-1/+1
|\ \ | |/ |/| Minor code health
| * Minor code healthHenning Baldersheim2023-09-261-1/+1
| |
* | Merge pull request #28652 from vespa-engine/havardpe/table-dfaTor Brede Vekterli2023-09-263-3/+313
|\ \ | |/ |/| table dfa
| * table dfaHåvard Pettersen2023-09-253-3/+313
| |
* | - Use stash instead of the single use of VariableSizeVector.Henning Baldersheim2023-09-252-50/+12
|/
* Use the Guard when testing bundle poolHenning Baldersheim2023-09-201-20/+15
|
* Support raw UTF-32 successor string outputTor Brede Vekterli2023-09-181-8/+19
| | | | | | | | | | Avoids need to encode UTF-32 characters to UTF-8 internally, as this is likely to be subsequently reversed by a caller that itself operates on UTF-32 code points. Change the `match()` API by introducing a separate overload that does not produce a successor, and add two explicit successor string type overloads that take the string by ref, not pointer.
* Use make_for_lookup() member function on existing comparatorTor Egge2023-09-182-15/+17
| | | | to make a new comparator which is used for lookup.
* Add comparator to unique store.Tor Egge2023-09-181-4/+4
|
* Add support for case-insensitive matching to Levenshtein DFAsTor Brede Vekterli2023-09-151-57/+165
| | | | | | | | | | | | | | | | | | | | | | | | Adds matching modes `Cased` and `Uncased`. `Cased` requires UTF-32 code points to match exactly, and successor strings are guaranteed to be strictly higher than the source (candidate) string in `memcmp` order. This mirrors the behavior of the current DFA implementation. `Uncased` treats all characters as if they were lowercased, both for the target and source strings. The target (query) string is explicitly lowercased at DFA build-time to avoid duplicate work. Source strings are implicitly lowercased character by character on-demand during matching. Important ordering note: Successor strings for `Uncased` are generated _as if_ the source string was originally all in lowercase form. This requires some extra added handling when emitting successor prefixes, as we can't just blindly copy UTF-8 bytes from the source string as we do when matching in `Cased` mode. A new casing-dimension has been added to most parameterized unit tests.
* Remove FastOS_DirectoryScanTor Egge2023-09-011-10/+0
|
* Add saturation metric for executors.Geir Storli2023-08-311-0/+21
| | | | | This should make it easier to observe bottlenecks in one of the underlying executor threads used in the "field writer" SequencedTaskExecutor.
* added pop_back function to SmallVectorHåvard Pettersen2023-08-281-0/+26
| | | | follow std::vector by making it undefined for empty vectors
* Use 128 bytes alignment for small allocations in MmapFileAllocator.Tor Egge2023-08-251-8/+8
|
* Extend test for reusing file offset.Tor Egge2023-08-241-2/+15
|
* Use premmapped areas for smaller allocations than _small_limit.Tor Egge2023-08-241-14/+39
|
* Add premmapped areas to file area freelist.Tor Egge2023-08-241-0/+29
|
* Revert "Sample datastore stash memory usage in write thread."Tor Egge2023-08-222-4/+4
|
* Sample datastore stash memory usage in write thread.Tor Egge2023-08-222-4/+4
|
* Disable two alloc unit tests when using any sanitizer.Tor Egge2023-08-211-3/+3
|
* GC unused direct IO supportHenning Baldersheim2023-08-151-9/+0
|
* GC and clean up more unused codeHenning Baldersheim2023-08-151-33/+0
|
* GC unused File code and other fallout.Henning Baldersheim2023-08-152-189/+4
|
* Add support for creating ConstArrayRef from std::arrayHenning Baldersheim2023-08-081-0/+20
|
* Prefer std::filesystem::exists over FastOS_StatInfoHenning Baldersheim2023-07-251-31/+11
|
* Use std::filesystem::current_pathTor Egge2023-07-211-10/+0
|
* Remove vespalib::stat and vespalib::getFileSize.Tor Egge2023-07-201-10/+4
|
* Use std::filesystem::is_directory and std::filesystem::existsTor Egge2023-07-203-7/+13
|
* Remove vespalib::pathExists, vespalib::isPlainFile and vespalib::isSymLink.Tor Egge2023-07-201-2/+0
|
* Remove vespalib::symlink and vespalib::readLinkTor Egge2023-07-201-62/+0
|
* Remove vespalib::unlink.Tor Egge2023-07-201-33/+4
|
* Remove vespalib::copy and vespalib::rename.Tor Egge2023-07-201-127/+0
|
* Use std::filesystem::rename instead of vespalib::rename.Tor Egge2023-07-191-2/+3
|
* Drop non ancient non const GetSize/GetPositionHenning Baldersheim2023-07-181-13/+13
|
* GC unused OpenExistingHenning Baldersheim2023-07-181-1/+1
|
* Implement Levenshtein DFAs with successor string generationTor Brede Vekterli2023-07-182-0/+516
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This implements code for building and evaluating Levenshtein Deterministic Finite Automata, where the resulting DFA efficiently matches all possible source strings that can be transformed to the target string within k max edits. This allows for O(n) matching of strings. We currently support k in {1, 2}. Additionally, when matching using a DFA, in the case where the source string does _not_ match, we can generate the _successor_ string; the next matching string that is lexicographically _greater_ than the source string. This string has the invariant that there are no possibly matching strings within k edits ordered after the source string but before the successor. This lets us do possibly massive leaps forward in an ordered dictionary, turning a scan for matches into a sublinear operation. Matching and successor generation is fully Unicode-aware. All input strings are expected to be in UTF-8 (without nulls), and the generated successor is also encoded as UTF-8. Internally, matching is done on UTF-32 code points and the DFA itself is built around UTF-32. UTF-8 decoding of source strings is done in a streaming fashion and does not require any allocations. This commit includes a templated core Levenshtein DFA matching (and successor generation) algorithm and two separate DFA implementations that can be used; one explicit and one implicit. The explicit DFA is an immutable DAG built up-front that represents all DFA states and transitions as explicit nodes and edges in a graph. This is currently the fastest to evaluate, but the build time and memory usage means its usage should be preferred for shorter strings (up to a few hundred chars). The implicit DFA does not build any graph up-front, but rather evaluates state transitions on-demand for any given source string. This is currently slower than the explicit DFA, but its O(1) memory usage (aside from the memory used by the target string itself) means that it can be used for arbitrary string lengths. This code currently exists as a freestanding vespalib utility, and is not yet wired to any production code (fuzzy matching or similar). Future optimizations: * Redesign sparse state representation and stepping logic to be much less branching, in turn making the code much less likely to stall the CPU pipeline. * Emit as much as possible of the successor string suffix by copying directly from the target string UTF-8 representation instead of following the DFA and encoding UTF-32 to UTF-8 chars.
* Remove FastOS_File::Delete().Tor Egge2023-07-171-2/+2
|
* Use std::filesystem::remove in unit tests.Tor Egge2023-07-141-12/+19
|
* Use std::filesystem in buffered file unit test.Tor Egge2023-07-141-10/+15
|
* Avoid livelock when running rcu vector unit test with valgrind.Tor Egge2023-07-101-0/+17
|
* Use provided memory allocator for large arrays.Tor Egge2023-07-051-0/+10
|
* Merge pull request #27547 from vespa-engine/havardpe/est-80-percentile-as-resultHenning Baldersheim2023-06-261-39/+84
|\ | | | | use estimated 80 percentile as benchmark result
| * add commentHåvard Pettersen2023-06-261-0/+17
| |
| * use estimated 80 percentile as benchmark resultHåvard Pettersen2023-06-261-39/+67
| | | | | | | | also simplify somewhat
* | Add max buffer size parameter to array store dynamic type mapper.Tor Egge2023-06-262-18/+30
| |