path: root/vespalib
Commit message [Author, Age, Files, Lines]
* Use std::filesystem::current_path [Tor Egge, 2023-07-21, 5, -77/+0]
|
* Remove vespalib::stat and vespalib::getFileSize. [Tor Egge, 2023-07-20, 3, -41/+4]
|
* Remove declaration of vespalib::isDirectory. [Tor Egge, 2023-07-20, 1, -8/+0]
|
* Use std::filesystem::is_directory and std::filesystem::exists [Tor Egge, 2023-07-20, 7, -40/+19]
|
* Remove vespalib::pathExists, vespalib::isPlainFile and vespalib::isSymLink. [Tor Egge, 2023-07-20, 2, -36/+0]
|
* Remove vespalib::symlink and vespalib::readLink [Tor Egge, 2023-07-20, 3, -115/+0]
|
* Remove vespalib::unlink. [Tor Egge, 2023-07-20, 3, -59/+5]
|
* Remove vespalib::copy and vespalib::rename. [Tor Egge, 2023-07-20, 3, -248/+0]
|
* Use std::filesystem::rename instead of vespalib::rename. [Tor Egge, 2023-07-19, 1, -2/+3]
|
* - Add noexcept and some constexpr. [Henning Baldersheim, 2023-07-19, 8, -90/+88]
|   - Use BitWord as helper class instead of inheriting in many static methods.
* Backport to clang 15. [Tor Egge, 2023-07-19, 3, -7/+7]
|
* Drop non ancient non const GetSize/GetPosition [Henning Baldersheim, 2023-07-18, 9, -46/+37]
|
* GC unused OpenExisting [Henning Baldersheim, 2023-07-18, 3, -37/+1]
|
* GC unused SetFileName [Henning Baldersheim, 2023-07-18, 3, -32/+11]
|
* Merge pull request #27810 from vespa-engine/vekterli/levenshtein-dfa [Tor Egge, 2023-07-18, 18, -3/+2663]
|\
| |   Implement Levenshtein DFA with successor string generation
| * Implement Levenshtein DFAs with successor string generation [Tor Brede Vekterli, 2023-07-18, 18, -3/+2663]
| |   This implements code for building and evaluating Levenshtein Deterministic Finite Automata, where the resulting DFA efficiently matches all possible source strings that can be transformed to the target string within k max edits. This allows for O(n) matching of strings. We currently support k in {1, 2}.
| |
| |   Additionally, when matching using a DFA, in the case where the source string does _not_ match, we can generate the _successor_ string; the next matching string that is lexicographically _greater_ than the source string. This string has the invariant that there are no possibly matching strings within k edits ordered after the source string but before the successor. This lets us do possibly massive leaps forward in an ordered dictionary, turning a scan for matches into a sublinear operation.
| |
| |   Matching and successor generation is fully Unicode-aware. All input strings are expected to be in UTF-8 (without nulls), and the generated successor is also encoded as UTF-8. Internally, matching is done on UTF-32 code points and the DFA itself is built around UTF-32. UTF-8 decoding of source strings is done in a streaming fashion and does not require any allocations.
| |
| |   This commit includes a templated core Levenshtein DFA matching (and successor generation) algorithm and two separate DFA implementations that can be used; one explicit and one implicit.
| |
| |   The explicit DFA is an immutable DAG built up-front that represents all DFA states and transitions as explicit nodes and edges in a graph. This is currently the fastest to evaluate, but the build time and memory usage means its usage should be preferred for shorter strings (up to a few hundred chars).
| |
| |   The implicit DFA does not build any graph up-front, but rather evaluates state transitions on-demand for any given source string. This is currently slower than the explicit DFA, but its O(1) memory usage (aside from the memory used by the target string itself) means that it can be used for arbitrary string lengths.
| |
| |   This code currently exists as a freestanding vespalib utility, and is not yet wired to any production code (fuzzy matching or similar).
| |
| |   Future optimizations:
| |
| |     * Redesign sparse state representation and stepping logic to be much less branching, in turn making the code much less likely to stall the CPU pipeline.
| |     * Emit as much as possible of the successor string suffix by copying directly from the target string UTF-8 representation instead of following the DFA and encoding UTF-32 to UTF-8 chars.
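The on-demand ("implicit") evaluation described in the commit message above can be illustrated with a small sketch. This is not the vespalib implementation: the class name `ImplicitLevenshteinMatcher`, its interface, and the helper `self_check` are hypothetical. As simplifying assumptions, the sketch keeps a dense row of capped edit distances as its state (so memory grows with the target length, unlike the sparse state the commit describes), takes UTF-32 input directly instead of streaming UTF-8, and does not attempt successor generation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical illustration only; not the vespalib API.
// The automaton state is one row of the edit-distance table, with every
// value capped at max_edits + 1 (all larger distances are equivalent).
class ImplicitLevenshteinMatcher {
public:
    ImplicitLevenshteinMatcher(std::u32string target, uint32_t max_edits)
        : _target(std::move(target)), _max_edits(max_edits) {}

    // True if `source` is within max_edits edits of the target.
    bool matches(std::u32string_view source) const {
        const uint32_t cap = _max_edits + 1;
        std::vector<uint32_t> row(_target.size() + 1);
        for (std::size_t i = 0; i < row.size(); ++i) {
            row[i] = std::min(static_cast<uint32_t>(i), cap);
        }
        for (char32_t ch : source) {
            // Compute the next state on demand from the current one.
            std::vector<uint32_t> next(row.size());
            next[0] = std::min(row[0] + 1, cap);
            bool alive = (next[0] <= _max_edits);
            for (std::size_t i = 1; i < row.size(); ++i) {
                uint32_t diag = row[i - 1] + ((_target[i - 1] == ch) ? 0u : 1u);
                uint32_t up   = row[i] + 1;
                uint32_t left = next[i - 1] + 1;
                next[i] = std::min({diag, up, left, cap});
                alive = alive || (next[i] <= _max_edits);
            }
            if (!alive) {
                return false; // dead state: no continuation can match
            }
            row = std::move(next);
        }
        return row.back() <= _max_edits;
    }

private:
    std::u32string _target;
    uint32_t       _max_edits;
};

// Small usage example with k = 1.
inline void self_check() {
    ImplicitLevenshteinMatcher m(U"food", 1);
    assert(m.matches(U"food"));  // exact match
    assert(m.matches(U"foo"));   // one character missing
    assert(m.matches(U"foxd"));  // one substitution
    assert(!m.matches(U"fo"));   // two edits away
}
```

Each source character advances the state once, so the work per character depends only on the target length and k. The early return on a dead state is where a successor-generating matcher would instead start building the next possible match.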
* | Merge pull request #27809 from vespa-engine/balder/gc-some-obscure-fastos-file-stdout-stderr-support [Tor Egge, 2023-07-18, 3, -78/+38]
|\ \
| | |   GC some obscure support in FastOS_File for stderr and stdout.
| * | GC some obscure support in FastOS_File for stderr and stdout. [Henning Baldersheim, 2023-07-18, 3, -78/+38]
| |/
* / GC unused Rename interface. [Henning Baldersheim, 2023-07-18, 4, -41/+0]
|/
* Remove FastOS_File::Delete(). [Tor Egge, 2023-07-17, 6, -32/+2]
|
* Use std::filesystem::remove in unit tests. [Tor Egge, 2023-07-14, 1, -12/+19]
|
* Use std::filesystem in buffered file unit test. [Tor Egge, 2023-07-14, 1, -10/+15]
|
* Revert "- Consolidate on isFilter." [Tor Egge, 2023-07-14, 2, -10/+10]
|
* Revert "- Pack data closer to let config fit in 2 cache lines instead of 4." [Tor Egge, 2023-07-14, 10, -80/+95]
|
* Fail when unable to open file. [Tor Egge, 2023-07-13, 1, -3/+3]
|
* Avoid livelock when running rcu vector unit test with valgrind. [Tor Egge, 2023-07-10, 1, -0/+17]
|
* - Pack data closer to let config fit in 2 cache lines instead of 4. [Henning Baldersheim, 2023-07-06, 10, -95/+80]
|   - Avoid plt indirection and allow more inlining of frequently called code.
* Merge pull request #27644 from vespa-engine/toregge/use-provided-memory-allocator-for-large-arrays [Henning Baldersheim, 2023-07-05, 3, -2/+15]
|\
| |   Use provided memory allocator for large arrays.
| * Use provided memory allocator for large arrays. [Tor Egge, 2023-07-05, 3, -2/+15]
| |
* | - Consolidate on isFilter. [Henning Baldersheim, 2023-07-05, 2, -10/+10]
|/
|   - Add has_weight_iterator to IDocumentWeightAttribute to allow fallback to bitvector.
|   - Allow filter attributes to enjoy IDirectWeightedSet optimization.
* Add sort blob writers. [Tor Egge, 2023-07-04, 1, -0/+1]
|
* Make address sanitizer happy [Henning Baldersheim, 2023-06-29, 2, -2/+3]
|
* Add noexcept [Henning Baldersheim, 2023-06-29, 9, -70/+67]
|
* Merge pull request #27547 from vespa-engine/havardpe/est-80-percentile-as-result [Henning Baldersheim, 2023-06-26, 3, -41/+129]
|\
| |   use estimated 80 percentile as benchmark result
| * add comment [Håvard Pettersen, 2023-06-26, 1, -0/+17]
| |
| * use estimated 80 percentile as benchmark result [Håvard Pettersen, 2023-06-26, 3, -41/+112]
| |   also simplify somewhat
* | Add max buffer size parameter to array store dynamic type mapper. [Tor Egge, 2023-06-26, 4, -24/+37]
| |
* | Merge pull request #27542 from vespa-engine/toregge/limit-64-byte-alignment [Geir Storli, 2023-06-26, 4, -6/+16]
|\ \
| | |   Limit 64-byte dynamic array buffer type alignment based on element type.
| * | Limit 64-byte dynamic array buffer type alignment based on element type. [Tor Egge, 2023-06-24, 4, -6/+16]
| | |
* | | Remove use of std::min. [Tor Egge, 2023-06-23, 1, -1/+1]
| | |
* | | Cap number of entries in a buffer to avoid very large buffers. [Tor Egge, 2023-06-23, 6, -31/+84]
|/ /
* | Use 64 bytes alignment for large arrays. [Tor Egge, 2023-06-22, 2, -9/+12]
| |
* | Avoid shadowing. [Tor Egge, 2023-06-22, 3, -3/+3]
| |
* | Allocate space for allowed buffer underflow. [Tor Egge, 2023-06-22, 10, -57/+88]
|/
* Merge pull request #27520 from vespa-engine/toregge/use-faster-way-to-get-entry-size [Henning Baldersheim, 2023-06-22, 9, -10/+33]
|\
| |   Use faster way to get entry size.
| * Use faster way to get entry size. [Tor Egge, 2023-06-22, 9, -10/+33]
| |
* | Merge pull request #27509 from vespa-engine/balder/move-count-internal-strucures-cache-structures-correctly [Henning Baldersheim, 2023-06-22, 2, -3/+1]
|\ \
| |/
|/|
| |   Balder/move count internal strucures cache structures correctly
| * The _lid2Id _id2KeySet structures are not static, they follow the size of the cache. [Henning Baldersheim, 2023-06-21, 1, -1/+1]
| * Only count static memory as static. [Henning Baldersheim, 2023-06-21, 1, -2/+0]
| |
* | Merge pull request #27508 from vespa-engine/havardpe/benchmark-cmp-exch-vs-fetch-add [Henning Baldersheim, 2023-06-21, 1, -17/+104]
|\ \
| | |   benchmark compare exchange vs fetch add with contention