path: root/vespalib
Commit message (author, date, files changed, lines -removed/+added)
* Add missing includes, avoid shadow warning and skip including file not present in llvm 17. (Tor Egge, 2023-09-29, 3 files, -1/+4)
|     Issues detected when compiling with clang++ 17 / libc++ 17 / llvm 17.
* Merge pull request #28714 from vespa-engine/havardpe/better-graphviz-for-table-dfa (Håvard Pettersen, 2023-09-29, 2 files, -41/+133)
|\    dump table_dfa as actual dfa in graphviz
| * dump table_dfa as actual dfa in graphviz (Håvard Pettersen, 2023-09-28, 2 files, -41/+133)
| |     enumerate states based on best-edge-first
| |     '*' means any character without its own edge
* | Preserve prefix of input DFA successor string (Tor Brede Vekterli, 2023-09-27, 3 files, -13/+51)
|/    If a non-empty string is passed as a successor to the DFA, the contents of the string will be preserved, i.e. the successor will always be _appended_ to any existing data. This allows for less manual fiddling when implementing prefix locking by the caller (no need to concatenate a prefix with the generated successor string).
|     Note: this has some added cognitive cost where the caller now has the entire responsibility of resetting the successor between calls.
|     The existing fuzzy matcher has been updated to no longer require a separation between successor prefix and suffix; it can now safely reuse the successor prefix between calls.
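The append-only contract described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the vespalib API: `emit_successor` and `PrefixLockedCaller` are hypothetical names, and the "generated" suffix is passed in directly instead of being produced by a real DFA.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for the DFA's successor output step.
// It only ever appends; it never clears what the caller already put there.
void emit_successor(std::string_view generated_suffix, std::string& successor) {
    successor.append(generated_suffix);
}

// Hypothetical caller that implements prefix locking on top of the append-only contract.
class PrefixLockedCaller {
    std::string _successor;   // holds the locked prefix followed by the latest generated suffix
    std::size_t _prefix_size; // number of leading bytes that belong to the locked prefix
public:
    explicit PrefixLockedCaller(std::string_view prefix)
        : _successor(prefix), _prefix_size(prefix.size()) {}

    const std::string& next(std::string_view generated_suffix) {
        // Resetting between calls is the caller's responsibility:
        // truncate back to the prefix before asking for a new successor.
        _successor.resize(_prefix_size);
        emit_successor(generated_suffix, _successor);
        return _successor;
    }
};
```

Truncating with `resize` keeps the prefix bytes in place, so the prefix is written once and reused across calls instead of being concatenated each time.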
* Merge pull request #28677 from vespa-engine/havardpe/inline-table-dfa (Håvard Pettersen, 2023-09-27, 4 files, -44/+227)
|\    use inline pre-generated tables
| * Update vespalib/src/vespa/vespalib/fuzzy/inline_tfa.hpp (Håvard Pettersen, 2023-09-27, 1 file, -0/+1)
| |     add copyright notice
| |     Co-authored-by: Tor Brede Vekterli <vekterli@yahooinc.com>
| * use inline pre-generated tables (Håvard Pettersen, 2023-09-26, 4 files, -44/+226)
| |
* | Merge pull request #28674 from vespa-engine/balder/minor-code-health (Geir Storli, 2023-09-26, 2 files, -4/+5)
|\ \    Minor code health
| * | Minor code health (Henning Baldersheim, 2023-09-26, 2 files, -4/+5)
| | |
* | | Make DFA table algorithm selectable at query time. (Geir Storli, 2023-09-26, 2 files, -1/+7)
| |/ |/|
* | Merge pull request #28652 from vespa-engine/havardpe/table-dfa (Tor Brede Vekterli, 2023-09-26, 10 files, -8/+900)
|\ \
| |/
|/|     table dfa
| * table dfa (Håvard Pettersen, 2023-09-25, 10 files, -8/+900)
| |
* | Merge pull request #28654 from vespa-engine/balder/return-early-on-match (Henning Baldersheim, 2023-09-26, 3 files, -19/+17)
|\ \    - Return early in doSeek if docId found.
| * | Add noexcept (Henning Baldersheim, 2023-09-25, 3 files, -19/+17)
| |/
* | Merge pull request #28653 from vespa-engine/balder/use-stash (Henning Baldersheim, 2023-09-26, 12 files, -218/+64)
|\ \    - Use stash instead of the single use of VariableSizeVector.
| * | Reorder members to reflect required lifetime, and remove incorrect noexcept. (Henning Baldersheim, 2023-09-26, 3 files, -7/+9)
| | |
| * | - Use stash instead of the single use of VariableSizeVector. (Henning Baldersheim, 2023-09-25, 12 files, -217/+61)
| |/
* | Fix printing of sparse states (Tor Brede Vekterli, 2023-09-25, 1 file, -3/+3)
| |
* | Generate Levenshtein successor prefix "as we go" during match loop (Tor Brede Vekterli, 2023-09-25, 2 files, -75/+29)
|/    Moving from an on-demand generation means that we only decode (and possibly normalize) UTF-8 characters _once_ instead of twice in the case of UTF-32 output or non-normalized input.
|     These changes also make it much easier to add future support for preserving a caller-supplied successor prefix, which would be used for prefix-locked dictionary matching.
|     Note: this subtly changes the API to potentially _always_ mutate the input successor string. The API documentation has been updated to reflect this. No current users of the API should be affected.
* Add prefix_size constructor argument to DfaFuzzyMatcher. (Tor Egge, 2023-09-22, 1 file, -0/+1)
|
* Merge pull request #28606 from vespa-engine/geirst/fuzzy-matching-algorithm-query-property (Geir Storli, 2023-09-21, 3 files, -0/+78)
|\    Add query property to control fuzzy matching algorithm.
| * Add query property to control fuzzy matching algorithm. (Geir Storli, 2023-09-21, 3 files, -0/+78)
| |
* | Split core DFA match loop into match-only and successor-emitting specializations (Tor Brede Vekterli, 2023-09-21, 7 files, -43/+65)
|/    This allows for "hybrid" schemes where raw matching (without successor generation) is done with a dedicated matcher implementation that is faster for that particular purpose.
|     Also gives a much tighter loop for the match-only case and removes some branches from the successor-emitting case.
* Use the Guard when testing bundle pool (Henning Baldersheim, 2023-09-20, 3 files, -33/+31)
|
* Refactor code to make object lifetime easier to follow. (Henning Baldersheim, 2023-09-20, 1 file, -0/+12)
|
* Add UTF-32 exact suffix output to DFA concept (Tor Brede Vekterli, 2023-09-18, 1 file, -1/+6)
|
* Support raw UTF-32 successor string output (Tor Brede Vekterli, 2023-09-18, 11 files, -41/+154)
|     Avoids need to encode UTF-32 characters to UTF-8 internally, as this is likely to be subsequently reversed by a caller that itself operates on UTF-32 code points.
|     Change the `match()` API by introducing a separate overload that does not produce a successor, and add two explicit successor string type overloads that take the string by ref, not pointer.
* Optimize successor generation of exact match suffix (Tor Brede Vekterli, 2023-09-18, 8 files, -16/+93)
|     Avoids explicitly stepping the DFA states and handling each transition character separately by detecting the case where the only way the generated suffix can possibly match is for it to _exactly_ match the remaining target string suffix.
|     This is the case when the sparse cost matrix row only has 1 column remaining, and this column is equal to the max edit distance. I.e. no more edits can possibly be done.
|     In local synthetic benchmarks this speeds up successor generation 4x when the target string is 64 characters.
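The detection condition in this commit message can be sketched as below. This is a simplified illustration under stated assumptions, not the vespalib implementation: `SparseCell`, `only_exact_suffix_can_match` and `try_append_exact_suffix` are hypothetical names, and a cell's `column` is assumed to be the number of target characters already consumed, so the remaining suffix starts at that index.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical sparse cost-matrix cell: only columns whose cost is still
// within the max edit distance are kept in a row.
struct SparseCell {
    std::size_t column; // assumed: number of target characters already consumed
    unsigned    cost;   // edits spent so far to reach this column
};

// True when exactly one column survives and its edit budget is exhausted,
// i.e. only an exact copy of the remaining target suffix can still match.
bool only_exact_suffix_can_match(const std::vector<SparseCell>& row, unsigned max_edits) {
    return row.size() == 1 && row.front().cost == max_edits;
}

// Appends the remaining target suffix in one step when the fast path applies.
// Returns false (leaving `successor` untouched) when the caller must fall
// back to stepping the DFA one transition at a time.
bool try_append_exact_suffix(std::u32string_view target,
                             const std::vector<SparseCell>& row,
                             unsigned max_edits,
                             std::u32string& successor) {
    if (!only_exact_suffix_can_match(row, max_edits)) {
        return false;
    }
    const std::size_t column = row.front().column;
    assert(column <= target.size());
    successor.append(target.substr(column));
    return true;
}
```

The fast path replaces a per-character loop with a single bulk append, which is consistent with the benefit growing with target length as the benchmark note suggests.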
* Use make_for_lookup() member function on existing comparator to make a new comparator which is used for lookup. (Tor Egge, 2023-09-18, 5 files, -28/+32)
* Add comparator to unique store. (Tor Egge, 2023-09-18, 3 files, -55/+51)
|
* Rename fallback_value to lookup_value in UniqueStoreComparator, UniqueStoreStringComparator, EnumStoreComparator and EnumStoreStringComparator. (Tor Egge, 2023-09-18, 2 files, -10/+10)
* Add DfaStringComparator. (Tor Egge, 2023-09-15, 1 file, -0/+1)
|
* Add support for case-insensitive matching to Levenshtein DFAs (Tor Brede Vekterli, 2023-09-15, 11 files, -104/+344)
|     Adds matching modes `Cased` and `Uncased`.
|     `Cased` requires UTF-32 code points to match exactly, and successor strings are guaranteed to be strictly higher than the source (candidate) string in `memcmp` order. This mirrors the behavior of the current DFA implementation.
|     `Uncased` treats all characters as if they were lowercased, both for the target and source strings. The target (query) string is explicitly lowercased at DFA build-time to avoid duplicate work. Source strings are implicitly lowercased character by character on-demand during matching.
|     Important ordering note: Successor strings for `Uncased` are generated _as if_ the source string was originally all in lowercase form. This requires some extra added handling when emitting successor prefixes, as we can't just blindly copy UTF-8 bytes from the source string as we do when matching in `Cased` mode.
|     A new casing-dimension has been added to most parameterized unit tests.
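The division of lowercasing work between build time and match time can be sketched as follows. This is an illustration only: `Casing`, `prepare_target` and `equals_under_casing` are hypothetical names, the comparison is exact (edit distance 0) to keep the focus on casing, and `to_lower_ascii` is an ASCII-only stand-in for whatever Unicode-aware lowercasing the real code uses.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

enum class Casing { Cased, Uncased };

// ASCII-only stand-in for Unicode-aware lowercasing (an assumption of this sketch).
constexpr char32_t to_lower_ascii(char32_t c) noexcept {
    return (c >= U'A' && c <= U'Z') ? static_cast<char32_t>(c + (U'a' - U'A')) : c;
}

// "Build time": the target (query) string is lowercased once up front in Uncased mode.
std::u32string prepare_target(std::u32string_view target, Casing casing) {
    std::u32string prepared(target);
    if (casing == Casing::Uncased) {
        for (char32_t& c : prepared) {
            c = to_lower_ascii(c);
        }
    }
    return prepared;
}

// "Match time": source characters are lowercased one by one, on demand,
// and compared against the already-prepared target.
bool equals_under_casing(std::u32string_view prepared_target,
                         std::u32string_view source,
                         Casing casing) {
    if (prepared_target.size() != source.size()) {
        return false;
    }
    for (std::size_t i = 0; i < source.size(); ++i) {
        const char32_t s = (casing == Casing::Uncased) ? to_lower_ascii(source[i]) : source[i];
        if (s != prepared_target[i]) {
            return false;
        }
    }
    return true;
}
```

Lowercasing the target once means each candidate source string pays only for its own characters, which matters when one query is matched against many dictionary entries.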
* Use generated header for sanitizer detection macros (Tor Brede Vekterli, 2023-09-06, 1 file, -1/+1)
|     Needed to properly detect UBSan-instrumented compilation.
* Add detailed state explorer for field writer SequencedTaskExecutor. (Geir Storli, 2023-09-05, 2 files, -0/+18)
|     This exposes the raw statistics for each underlying executor.
* MADV_DONTDUMP is specific for linux. (Tor Egge, 2023-09-04, 1 file, -0/+2)
|
* Remove FastOS_DirectoryScan (Tor Egge, 2023-09-01, 6 files, -259/+3)
|
* Merge pull request #28321 from vespa-engine/toregge/bump-default-small-limit (Henning Baldersheim, 2023-08-31, 1 file, -1/+1)
|\    Adjust limit for when mmap file allocator uses separate mmaps.
| * Adjust limit for when mmap file allocator uses separate mmaps. (Tor Egge, 2023-08-31, 1 file, -1/+1)
| |
* | Add saturation metric for executors. (Geir Storli, 2023-08-31, 2 files, -2/+29)
|/    This should make it easier to observe bottlenecks in one of the underlying executor threads used in the "field writer" SequencedTaskExecutor.
* Let node info for cluster controller be explicit, and not a metric consumer. (Henning Baldersheim, 2023-08-29, 1 file, -1/+1)
|
* added pop_back function to SmallVector (Håvard Pettersen, 2023-08-28, 2 files, -0/+27)
|     follow std::vector by making it undefined for empty vectors
* Use 128 bytes alignment for small allocations in MmapFileAllocator. (Tor Egge, 2023-08-25, 2 files, -9/+9)
|
* Extend class comment. (Tor Egge, 2023-08-24, 1 file, -0/+3)
|
* Extend test for reusing file offset. (Tor Egge, 2023-08-24, 1 file, -2/+15)
|
* Use premmapped areas for smaller allocations than _small_limit. (Tor Egge, 2023-08-24, 3 files, -19/+159)
|
* Add premmapped areas to file area freelist. (Tor Egge, 2023-08-24, 3 files, -12/+89)
|
* Merge pull request #28116 from vespa-engine/balder/avoid-dynamic_cast (Henning Baldersheim, 2023-08-23, 2 files, -11/+14)
|\    Avoid dynamic_cast by adding an interface to get allocated size
| * Add final (Henning Baldersheim, 2023-08-23, 1 file, -2/+4)
| |
| * Avoid dynamic_cast by adding an interface to get allocated size (Henning Baldersheim, 2023-08-22, 2 files, -9/+10)
| |