path: root/vespalib
Commit message (Author, Age, Files, Lines)
* Simplify and avoid default arguments. (Henning Baldersheim, 2023-11-04, 3 files, -2/+8)
|
* No need to specify your own namespace. (Henning Baldersheim, 2023-11-04, 4 files, -8/+9)
|
* Revert "No need to specify your own namespace." (Henning Baldersheim, 2023-11-04, 7 files, -17/+10)
|
* Simplify and avoid default arguments. (Henning Baldersheim, 2023-11-03, 3 files, -2/+8)
|
* No need to specify your own namespace. (Henning Baldersheim, 2023-11-03, 4 files, -8/+9)
|
* Deinline foreach also for internal nodes (Henning Baldersheim, 2023-11-02, 2 files, -46/+68)
|
* - deinline foreach in btree leaf nodes. (Henning Baldersheim, 2023-11-02, 2 files, -30/+45)
|
* Test that OpenSSL mTLS integration is not vulnerable to certificate stuffing (Tor Brede Vekterli, 2023-11-02, 1 file, -2/+45)
|
|     This adds a test that our OpenSSL mTLS integration is not vulnerable to CVE-2023-2422-style certificate credential stuffing. Spoiler alert: we're not, and never have been vulnerable. But this test shall help to ensure we also never accidentally will be in the future.
|
|     If a server is vulnerable to certificate stuffing, a sneaky client may include both a valid certificate chain (containing credential set A) as well as a self-signed peer certificate (containing credential set B). The vulnerable server thinks the latter cert has been verified, even though the mTLS implementation only verifies the first (actual) client cert as being signed by the CA. The server may then wrongfully choose to include set B as the client's credentials.
|
|     We explicitly only consider certificates in the chain at OpenSSL "error depth zero", which means the "end entity certificate", i.e. the client peer.
|
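As a rough illustration of the depth-zero rule described in the commit above, the sketch below models a verification hook that is invoked once per certificate together with its chain depth. Everything in it (`CertView`, `PeerCredentials`, `on_certificate_verified`) is a hypothetical stand-in, not vespalib or OpenSSL code; in a real OpenSSL verify callback the depth would presumably be obtained via `X509_STORE_CTX_get_error_depth()`.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified view of a certificate seen during verification.
struct CertView {
    int depth;                          // 0 = end-entity (peer) certificate
    std::vector<std::string> dns_sans;  // credentials carried by this certificate
};

// Hypothetical container for the credentials attributed to the peer.
struct PeerCredentials {
    std::vector<std::string> dns_sans;
};

// Only the depth-zero certificate may contribute credentials. Anything seen
// at another depth is ignored, no matter what it claims.
void on_certificate_verified(const CertView& cert, PeerCredentials& creds) {
    if (cert.depth != 0) {
        return;
    }
    creds.dns_sans.insert(creds.dns_sans.end(),
                          cert.dns_sans.begin(), cert.dns_sans.end());
}
```

With this shape, a stuffed certificate that surfaces at any depth other than zero cannot add to the peer's credential set.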
* Add `noexcept` and minor cleanups (Tor Brede Vekterli, 2023-11-01, 1 file, -22/+22)
|
* Move `HwInfo` from `proton` namespace to `vespalib` (Tor Brede Vekterli, 2023-11-01, 1 file, -0/+80)
|
|     This is information that is valuable to many different components, not just the search core internals.
|
* Move xxh3_64 methods to vespalib. That also removes the need for workarounds for GCC false positives. (Henning Baldersheim, 2023-10-17, 4 files, -29/+27)
|
* Relaxed store is sufficient. (Henning Baldersheim, 2023-10-17, 2 files, -6/+7)
|
* Since the cached size can be updated by many threads, it must be an atomic since there can be many readers. (Henning Baldersheim, 2023-10-17, 2 files, -10/+13)
|
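A minimal sketch of the pattern this commit and the follow-up "Relaxed store is sufficient." describe, assuming a lazily computed size that any thread may fill in. `CachedSizeHolder` and its members are hypothetical names, not the actual vespalib class. Relaxed ordering is plausible here because the cached value is self-contained: no other data is published through it, and every thread that recomputes it stores the same result.

```cpp
#include <atomic>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical holder that caches the result of a size computation.
class CachedSizeHolder {
public:
    explicit CachedSizeHolder(std::vector<int> values)
        : _values(std::move(values)) {}

    std::size_t size() const noexcept {
        std::size_t cached = _cached_size.load(std::memory_order_relaxed);
        if (cached == unknown) {
            cached = compute_size();
            // Racing writers all store the same value, so no ordering is needed.
            _cached_size.store(cached, std::memory_order_relaxed);
        }
        return cached;
    }

private:
    static constexpr std::size_t unknown = static_cast<std::size_t>(-1);

    // Stand-in for an expensive computation.
    std::size_t compute_size() const noexcept { return _values.size(); }

    std::vector<int> _values;
    mutable std::atomic<std::size_t> _cached_size{unknown};
};
```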
* Avoid incorrect gcc warning compiling inlined XXH3 code. Also stick to including official interface. (Henning Baldersheim, 2023-10-16, 1 file, -1/+1)
|
* Correct copyright headers (Jon Bratseth, 2023-10-09, 2 files, -10/+10)
|
* Update copyright (Jon Bratseth, 2023-10-09, 1420 files, -1426/+1426)
|
* Use ConstBufferRef and add some noexcept (Henning Baldersheim, 2023-10-05, 1 file, -17/+17)
|
* Add missing includes, avoid shadow warning and skip including file not present in llvm 17. (Tor Egge, 2023-09-29, 3 files, -1/+4)
|
|     Issues detected when compiling with clang++ 17 / libc++ 17 / llvm 17.
|
* Merge pull request #28714 from vespa-engine/havardpe/better-graphviz-for-table-dfa (Håvard Pettersen, 2023-09-29, 2 files, -41/+133)
|\
| |     dump table_dfa as actual dfa in graphviz
| |
| * dump table_dfa as actual dfa in graphviz (Håvard Pettersen, 2023-09-28, 2 files, -41/+133)
| |
| |     enumerate states based on best-edge-first
| |     '*' means any character without its own edge
| |
* | Preserve prefix of input DFA successor string (Tor Brede Vekterli, 2023-09-27, 3 files, -13/+51)
|/
|     If a non-empty string is passed as a successor to the DFA, the contents of the string will be preserved, i.e. the successor will always be _appended_ to any existing data. This allows for less manual fiddling when implementing prefix locking by the caller (no need to concatenate a prefix with the generated successor string).
|
|     Note: this has some added cognitive cost where the caller now has the entire responsibility of resetting the successor between calls.
|
|     The existing fuzzy matcher has been updated to no longer require a separation between successor prefix and suffix; it can now safely reuse the successor prefix between calls.
|
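The append-only contract and the caller's reset responsibility can be sketched as follows. `emit_successor` and `reset_to_prefix` are hypothetical helpers standing in for the DFA match call and the caller-side bookkeeping; they are not the vespalib API.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for a DFA match that emits a successor.
// The generated suffix is appended; existing content is left untouched.
void emit_successor(std::string_view generated_suffix, std::string& successor_out) {
    successor_out.append(generated_suffix);
}

// Caller-side reset between calls: keep the locked prefix and drop whatever
// suffix the previous call appended.
void reset_to_prefix(std::string& successor, std::size_t prefix_size) {
    if (prefix_size < successor.size()) {
        successor.resize(prefix_size);
    }
}
```

A caller doing prefix locking would seed the string with the prefix once, then call `reset_to_prefix` before each subsequent match instead of concatenating the prefix and the generated suffix by hand.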
* Merge pull request #28677 from vespa-engine/havardpe/inline-table-dfa (Håvard Pettersen, 2023-09-27, 4 files, -44/+227)
|\
| |     use inline pre-generated tables
| |
| * Update vespalib/src/vespa/vespalib/fuzzy/inline_tfa.hpp (Håvard Pettersen, 2023-09-27, 1 file, -0/+1)
| |
| |     add copyright notice
| |
| |     Co-authored-by: Tor Brede Vekterli <vekterli@yahooinc.com>
| |
| * use inline pre-generated tables (Håvard Pettersen, 2023-09-26, 4 files, -44/+226)
| |
* | Merge pull request #28674 from vespa-engine/balder/minor-code-health (Geir Storli, 2023-09-26, 2 files, -4/+5)
|\ \
| | |     Minor code health
| | |
| * | Minor code health (Henning Baldersheim, 2023-09-26, 2 files, -4/+5)
| | |
* | | Make DFA table algorithm selectable at query time. (Geir Storli, 2023-09-26, 2 files, -1/+7)
| |/
|/|
* | Merge pull request #28652 from vespa-engine/havardpe/table-dfa (Tor Brede Vekterli, 2023-09-26, 10 files, -8/+900)
|\ \
| |/
|/|     table dfa
| * table dfa (Håvard Pettersen, 2023-09-25, 10 files, -8/+900)
| |
* | Merge pull request #28654 from vespa-engine/balder/return-early-on-match (Henning Baldersheim, 2023-09-26, 3 files, -19/+17)
|\ \
| | |     - Return early in doSeek if docId found.
| | |
| * | Add noexcept (Henning Baldersheim, 2023-09-25, 3 files, -19/+17)
| |/
* | Merge pull request #28653 from vespa-engine/balder/use-stash (Henning Baldersheim, 2023-09-26, 12 files, -218/+64)
|\ \
| | |     - Use stash instead of the single use of VariableSizeVector.
| | |
| * | Reorder members to reflect required lifetime, and remove incorrect noexcept. (Henning Baldersheim, 2023-09-26, 3 files, -7/+9)
| | |
| * | - Use stash instead of the single use of VariableSizeVector. (Henning Baldersheim, 2023-09-25, 12 files, -217/+61)
| |/
* | Fix printing of sparse states (Tor Brede Vekterli, 2023-09-25, 1 file, -3/+3)
| |
* | Generate Levenshtein successor prefix "as we go" during match loop (Tor Brede Vekterli, 2023-09-25, 2 files, -75/+29)
|/
|     Moving away from on-demand generation means that we only decode (and possibly normalize) UTF-8 characters _once_ instead of twice in the case of UTF-32 output or non-normalized input.
|
|     These changes also make it much easier to add future support for preserving a caller-supplied successor prefix, which would be used for prefix-locked dictionary matching.
|
|     Note: this subtly changes the API to potentially _always_ mutate the input successor string. The API documentation has been updated to reflect this. No current users of the API should be affected.
|
* Add prefix_size constructor argument to DfaFuzzyMatcher. (Tor Egge, 2023-09-22, 1 file, -0/+1)
|
* Merge pull request #28606 from vespa-engine/geirst/fuzzy-matching-algorithm-query-property (Geir Storli, 2023-09-21, 3 files, -0/+78)
|\
| |     Add query property to control fuzzy matching algorithm.
| |
| * Add query property to control fuzzy matching algorithm. (Geir Storli, 2023-09-21, 3 files, -0/+78)
| |
* | Split core DFA match loop into match-only and successor-emitting specializations (Tor Brede Vekterli, 2023-09-21, 7 files, -43/+65)
|/
|     This allows for "hybrid" schemes where raw matching (without successor generation) is done with a dedicated matcher implementation that is faster for that particular purpose.
|
|     Also gives a much tighter loop for the match-only case and removes some branches from the successor-emitting case.
|
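One way such a split can be expressed in C++ is a single loop template parameterized on a compile-time policy, so the match-only instantiation contains no successor bookkeeping at all. The sketch below is a toy under stated assumptions: it performs exact matching only (not Levenshtein matching), and its "successor" work is simply recording the matched prefix. `NoSuccessor`, `PrefixRecorder` and `match_loop` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Policy for the match-only specialization: nothing to record.
struct NoSuccessor {
    static constexpr bool emits = false;
};

// Policy for the emitting specialization (toy: records the matched prefix).
struct PrefixRecorder {
    static constexpr bool emits = true;
    std::string& out;
    void append(char c) { out.push_back(c); }
};

// One loop, two instantiations. The bookkeeping branch is removed at compile
// time for NoSuccessor via `if constexpr` (requires C++17).
template <typename Successor>
bool match_loop(std::string_view target, std::string_view source,
                [[maybe_unused]] Successor successor) {
    for (std::size_t i = 0; i < source.size(); ++i) {
        if (i >= target.size() || source[i] != target[i]) {
            return false;
        }
        if constexpr (Successor::emits) {
            successor.append(source[i]);
        }
    }
    return source.size() == target.size();
}

bool match_only(std::string_view target, std::string_view source) {
    return match_loop(target, source, NoSuccessor{});
}

bool match_and_record_prefix(std::string_view target, std::string_view source,
                             std::string& prefix_out) {
    return match_loop(target, source, PrefixRecorder{prefix_out});
}
```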
* Use the Guard when testing bundle pool (Henning Baldersheim, 2023-09-20, 3 files, -33/+31)
|
* Refactor code to make object lifetime easier to follow. (Henning Baldersheim, 2023-09-20, 1 file, -0/+12)
|
* Add UTF-32 exact suffix output to DFA concept (Tor Brede Vekterli, 2023-09-18, 1 file, -1/+6)
|
* Support raw UTF-32 successor string output (Tor Brede Vekterli, 2023-09-18, 11 files, -41/+154)
|
|     Avoids need to encode UTF-32 characters to UTF-8 internally, as this is likely to be subsequently reversed by a caller that itself operates on UTF-32 code points.
|
|     Change the `match()` API by introducing a separate overload that does not produce a successor, and add two explicit successor string type overloads that take the string by ref, not pointer.
|
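To make the cost difference concrete, the sketch below shows two by-reference overloads of a hypothetical `emit_successor_char` helper: the UTF-32 sink stores the code point as is, while the UTF-8 sink has to encode it into one to four bytes. The helper name and signatures are assumptions for illustration, and the encoder does no validation of surrogates or out-of-range values.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// UTF-32 sink: the code point is stored as is, no encoding step.
void emit_successor_char(std::uint32_t cp, std::vector<std::uint32_t>& successor) {
    successor.push_back(cp);
}

// UTF-8 sink: the code point is encoded into 1-4 bytes (no validation).
void emit_successor_char(std::uint32_t cp, std::string& successor) {
    if (cp < 0x80) {
        successor.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        successor.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        successor.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        successor.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        successor.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        successor.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        successor.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        successor.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        successor.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        successor.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}
```

A caller that works on UTF-32 internally would pick the vector overload and skip both the encode here and the decode it would otherwise need afterwards.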
* Optimize successor generation of exact match suffix (Tor Brede Vekterli, 2023-09-18, 8 files, -16/+93)
|
|     Avoids explicitly stepping the DFA states and handling each transition character separately by detecting the case where the only way the generated suffix can possibly match is for it to _exactly_ match the remaining target string suffix.
|
|     This is the case when the sparse cost matrix row only has 1 column remaining, and this column is equal to the max edit distance. I.e. no more edits can possibly be done.
|
|     In local synthetic benchmarks this speeds up successor generation 4x when the target string is 64 characters.
|
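The detection condition in the commit message can be sketched as below. The layout of `SparseRow` (parallel arrays of target index and cost) and the reading of the index as "number of target characters already consumed" are assumptions made for illustration; the actual vespalib state representation may differ.

```cpp
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical sparse cost-matrix row: one entry per column still within the
// edit budget. target_index[i] is assumed to be the number of target
// characters already consumed for that column.
struct SparseRow {
    std::vector<std::uint32_t> target_index;
    std::vector<std::uint8_t>  cost;
};

// True when exactly one column remains and its cost has used up the whole
// edit budget, so no further edits are possible.
bool only_exact_suffix_can_match(const SparseRow& row, std::uint8_t max_edits) {
    return row.cost.size() == 1 && row.cost[0] == max_edits;
}

// If the fast path applies, append the remaining target suffix in one go
// instead of stepping the DFA once per character. Returns whether it applied.
// Assumes target_index[0] <= target.size().
bool try_emit_exact_suffix(std::u32string_view target, const SparseRow& row,
                           std::uint8_t max_edits, std::u32string& successor) {
    if (!only_exact_suffix_can_match(row, max_edits)) {
        return false;
    }
    successor.append(target.substr(row.target_index[0]));
    return true;
}
```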
* Use make_for_lookup() member function on existing comparator to make a new comparator which is used for lookup. (Tor Egge, 2023-09-18, 5 files, -28/+32)
|
* Add comparator to unique store. (Tor Egge, 2023-09-18, 3 files, -55/+51)
|
* Rename fallback_value to lookup_value in UniqueStoreComparator, UniqueStoreStringComparator, EnumStoreComparator and EnumStoreStringComparator. (Tor Egge, 2023-09-18, 2 files, -10/+10)
|
* Add DfaStringComparator. (Tor Egge, 2023-09-15, 1 file, -0/+1)
|
* Add support for case-insensitive matching to Levenshtein DFAs (Tor Brede Vekterli, 2023-09-15, 11 files, -104/+344)
|
|     Adds matching modes `Cased` and `Uncased`.
|
|     `Cased` requires UTF-32 code points to match exactly, and successor strings are guaranteed to be strictly higher than the source (candidate) string in `memcmp` order. This mirrors the behavior of the current DFA implementation.
|
|     `Uncased` treats all characters as if they were lowercased, both for the target and source strings. The target (query) string is explicitly lowercased at DFA build-time to avoid duplicate work. Source strings are implicitly lowercased character by character on-demand during matching.
|
|     Important ordering note: Successor strings for `Uncased` are generated _as if_ the source string was originally all in lowercase form. This requires some extra added handling when emitting successor prefixes, as we can't just blindly copy UTF-8 bytes from the source string as we do when matching in `Cased` mode.
|
|     A new casing-dimension has been added to most parameterized unit tests.
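The division of work between build time and match time can be sketched with a toy matcher. It is deliberately simplified: it accepts exact matches only (edit distance zero), and `lower_ascii` handles ASCII letters only, whereas the commit describes lowercasing of UTF-32 code points. `Casing`, `ToyExactMatcher` and `lower_ascii` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

enum class Casing { Cased, Uncased };

// ASCII-only lowercasing; a deliberate simplification for this sketch.
char32_t lower_ascii(char32_t c) {
    return (c >= U'A' && c <= U'Z') ? static_cast<char32_t>(c + 32) : c;
}

class ToyExactMatcher {
public:
    ToyExactMatcher(std::u32string target, Casing casing)
        : _target(std::move(target)), _casing(casing)
    {
        if (_casing == Casing::Uncased) {
            // Lowercase the target once, at build time.
            for (char32_t& c : _target) {
                c = lower_ascii(c);
            }
        }
    }

    bool match(std::u32string_view source) const {
        if (source.size() != _target.size()) {
            return false;
        }
        for (std::size_t i = 0; i < source.size(); ++i) {
            // Source characters are lowercased one by one, on demand.
            const char32_t c = (_casing == Casing::Uncased)
                                   ? lower_ascii(source[i])
                                   : source[i];
            if (c != _target[i]) {
                return false;
            }
        }
        return true;
    }

private:
    std::u32string _target;
    Casing         _casing;
};
```

Lowercasing the target in the constructor means each `match()` call only pays for the per-character work on the source side.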