| Commit message | Author | Age | Files | Lines |
|
This adds a test that our OpenSSL mTLS integration is not
vulnerable to CVE-2023-2422-style certificate credential
stuffing.
Spoiler alert: we're not, and never have been.
This test helps ensure that we never accidentally
become vulnerable in the future.
If a server is vulnerable to certificate stuffing, a sneaky
client may include both a valid certificate chain (containing
credential set A) as well as a self-signed peer certificate
(containing credential set B). The vulnerable server thinks
the latter cert has been verified, even though the mTLS
implementation only verifies the first (actual) client cert
as being signed by the CA. The server may then wrongfully
choose to include set B as the client's credentials.
We explicitly only consider certificates in the chain at
OpenSSL "error depth zero", which means the "end entity
certificate", i.e. the client peer.
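
A minimal sketch of the idea described above, in plain Python rather than against the OpenSSL API. The `Cert` type, `verify_chain`, and both credential helpers are hypothetical names made up for illustration; real chain verification checks signatures, not issuer names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    # Hypothetical, heavily simplified certificate model.
    subject: str
    issuer: str
    credentials: frozenset


def verify_chain(presented, trusted_ca):
    """Return [(depth, cert), ...] for the chain from the peer cert to the CA.

    Stand-in for real verification: follows issuer names only.
    Depth zero is the end entity (client peer) certificate.
    """
    by_subject = {c.subject: c for c in presented}
    chain, cert = [], presented[0]
    while True:
        chain.append((len(chain), cert))
        if cert.issuer == trusted_ca:
            return chain
        nxt = by_subject.get(cert.issuer)
        if nxt is None or nxt is cert or len(chain) > len(presented):
            raise ValueError("chain does not lead to the trusted CA")
        cert = nxt


def safe_credentials(verified_chain):
    # Only the depth-zero certificate may contribute credentials.
    return set().union(*(c.credentials for d, c in verified_chain if d == 0))


def vulnerable_credentials(presented):
    # Anti-pattern: trusts every certificate the client sent.
    return set().union(*(c.credentials for c in presented))


leaf = Cert("client", "intermediate", frozenset({"A"}))
inter = Cert("intermediate", "root-ca", frozenset())
rogue = Cert("rogue", "rogue", frozenset({"B"}))  # self-signed, stuffed in
presented = [leaf, inter, rogue]
verified = verify_chain(presented, "root-ca")
```

Here the stuffed self-signed certificate never appears in the verified chain, so a server that reads credentials only at depth zero sees set A alone, while one that scans everything the client sent also picks up set B.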
|
This is information that is valuable to many different components,
not just the search core internals.
|
|
|
|
| |
for GCC false positives.
|
since there can be many readers.
|
|
|
|
| |
including official interface.
|
present in llvm 17.
Issues detected when compiling with clang++ 17 / libc++ 17 / llvm 17.
|
|\
| |
| |
| |
| | |
vespa-engine/havardpe/better-graphviz-for-table-dfa
dump table_dfa as actual dfa in graphviz
|
| |
| |
| |
| |
| | |
enumerate states based on best-edge-first
'*' means any character without its own edge
|
|/
|
If a non-empty string is passed as a successor to the DFA,
the contents of the string will be preserved, i.e. the successor
will always be _appended_ to any existing data. This allows
for less manual fiddling when implementing prefix locking by the
caller (no need to concatenate a prefix with the generated successor
string).
Note: this adds some cognitive cost, since the caller is now solely
responsible for resetting the successor between calls.
The existing fuzzy matcher has been updated to no longer require
a separation between successor prefix and suffix; it can now
safely reuse the successor prefix between calls.
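
The append semantics and the caller's reset responsibility can be sketched roughly as follows. This is plain Python, not the actual DFA API; `append_successor` and `successors_with_prefix` are hypothetical names, and the generated suffixes are passed in directly instead of being produced by a matcher.

```python
def append_successor(generated_suffix: bytes, successor: bytearray) -> None:
    # Hypothetical stand-in for the matcher: existing contents are
    # preserved and the generated suffix is appended in place.
    successor.extend(generated_suffix)


def successors_with_prefix(prefix: bytes, generated_suffixes):
    """Reuse one buffer holding a locked prefix across several calls."""
    successor = bytearray(prefix)
    results = []
    for suffix in generated_suffixes:
        # Caller's responsibility: truncate back to the prefix before
        # each call, otherwise suffixes accumulate.
        del successor[len(prefix):]
        append_successor(suffix, successor)
        results.append(bytes(successor))
    return results


# Forgetting the reset shows the accumulated result.
forgotten = bytearray(b"foo")
append_successor(b"bar", forgotten)
append_successor(b"z", forgotten)
```

Truncating to the prefix length keeps the prefix bytes in the buffer, so no concatenation is needed per call.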
|
|\
| |
| | |
use inline pre-generated tables
|
| |
| |
| |
| |
| | |
add copyright notice
Co-authored-by: Tor Brede Vekterli <vekterli@yahooinc.com>
|
| | |
|
|\ \
| | |
| | | |
Minor code health
|
| | | |
|
| |/
|/| |
|
|\ \
| |/
|/| |
table dfa
|
| | |
|
|\ \
| | |
| | | |
- Return early in doSeek if docId found.
|
| |/ |
|
|\ \
| | |
| | | |
- Use stash instead of the single use of VariableSizeVector.
|
| | | |
|
| |/ |
|
| | |
|
|/
|
Moving from an on-demand generation means that we only decode
(and possibly normalize) UTF-8 characters _once_ instead of
twice in the case of UTF-32 output or non-normalized input.
These changes also make it much easier to add future support
for preserving a caller-supplied successor prefix, which would
be used for prefix-locked dictionary matching.
Note: this subtly changes the API to potentially _always_ mutate
the input successor string. The API documentation has been updated
to reflect this. No current users of the API should be affected.
|
| |
|
|\
| |
| |
| |
| | |
vespa-engine/geirst/fuzzy-matching-algorithm-query-property
Add query property to control fuzzy matching algorithm.
|
| | |
|
|/
|
This allows for "hybrid" schemes where raw matching (without
successor generation) is done with a dedicated matcher implementation
that is faster for that particular purpose.
Also gives a much tighter loop for the match-only case and
removes some branches from the successor-emitting case.
|
Avoids need to encode UTF-32 characters to UTF-8 internally, as
this is likely to be subsequently reversed by a caller that itself
operates on UTF-32 code points.
Change the `match()` API by introducing a separate overload that
does not produce a successor, and add two explicit successor string
type overloads that take the string by ref, not pointer.
|
Avoids explicitly stepping the DFA states and handling each transition
character separately by detecting the case where the only way the
generated suffix can possibly match is for it to _exactly_ match the
remaining target string suffix.
This is the case when the sparse cost matrix row only has 1 column
remaining, and this column is equal to the max edit distance. I.e.
no more edits can possibly be done.
In local synthetic benchmarks this speeds up successor generation
4x when the target string is 64 characters.
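
The condition above can be illustrated with a small sparse Levenshtein row in Python. This is only a sketch of the "one column left at max edits" observation applied to plain matching, not the successor generation code from the commit; all function names are hypothetical.

```python
def initial_row(target, max_edits):
    # Sparse row: column index -> cost, keeping only costs <= max_edits.
    return {i: i for i in range(min(len(target), max_edits) + 1)}


def step(row, target, ch, max_edits):
    """Advance the sparse cost row by one source character."""
    nxt = {}
    for i in range(len(target) + 1):
        best = max_edits + 1
        if i in row:  # extra source character
            best = min(best, row[i] + 1)
        if i > 0 and (i - 1) in row:  # match or substitution
            best = min(best, row[i - 1] + (target[i - 1] != ch))
        if i > 0 and (i - 1) in nxt:  # missing target character
            best = min(best, nxt[i - 1] + 1)
        if best <= max_edits:
            nxt[i] = best
    return nxt


def matches_full(target, source, max_edits):
    row = initial_row(target, max_edits)
    for ch in source:
        row = step(row, target, ch, max_edits)
    return len(target) in row


def matches_with_shortcut(target, source, max_edits):
    row = initial_row(target, max_edits)
    for pos, ch in enumerate(source):
        if len(row) == 1:
            ((col, cost),) = row.items()
            if cost == max_edits:
                # No edits left: the rest must equal the target suffix.
                return source[pos:] == target[col:]
        row = step(row, target, ch, max_edits)
    return len(target) in row
```

Once a single column with cost equal to the max edit distance remains, each further step can only keep the row alive by matching the next target character, which is why a direct suffix comparison gives the same answer as stepping character by character.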
|
|
|
|
| |
to make a new comparator which is used for lookup.
|
UniqueStoreStringComparator, EnumStoreComparator and
EnumStoreStringComparator.
|
Adds matching modes `Cased` and `Uncased`.
`Cased` requires UTF-32 code points to match exactly, and successor
strings are guaranteed to be strictly higher than the source (candidate)
string in `memcmp` order. This mirrors the behavior of the current
DFA implementation.
`Uncased` treats all characters as if they were lowercased, both
for the target and source strings. The target (query) string is
explicitly lowercased at DFA build-time to avoid duplicate work.
Source strings are implicitly lowercased character by character
on-demand during matching.
Important ordering note: Successor strings for `Uncased` are generated
_as if_ the source string was originally all in lowercase form.
This requires some extra handling when emitting successor
prefixes, as we can't just blindly copy UTF-8 bytes from the source
string as we do when matching in `Cased` mode.
A new casing-dimension has been added to most parameterized unit
tests.
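
A rough Python sketch of the two matching modes, assuming a simple full-matrix Levenshtein check in place of the DFA. The `Casing` enum and `FuzzyMatcher` class are hypothetical names; successor generation and its ordering guarantees are not modelled, and `str.lower()` on single characters is only a reasonable stand-in for ASCII input.

```python
from enum import Enum


class Casing(Enum):
    CASED = 1
    UNCASED = 2


class FuzzyMatcher:
    def __init__(self, target: str, max_edits: int, casing: Casing):
        self.casing = casing
        self.max_edits = max_edits
        # Uncased: lowercase the target (query) once, at build time.
        self.target = target.lower() if casing is Casing.UNCASED else target

    def _fold(self, ch: str) -> str:
        # Uncased: lowercase source characters one by one, on demand.
        return ch.lower() if self.casing is Casing.UNCASED else ch

    def matches(self, source: str) -> bool:
        prev = list(range(len(self.target) + 1))
        for raw in source:
            ch = self._fold(raw)
            cur = [prev[0] + 1]
            for i, t in enumerate(self.target, start=1):
                cur.append(min(prev[i] + 1,              # extra source char
                               cur[i - 1] + 1,           # missing target char
                               prev[i - 1] + (t != ch)))  # match/substitute
            prev = cur
        return prev[-1] <= self.max_edits
```

For example, `FuzzyMatcher("Foo", 0, Casing.UNCASED).matches("foo")` accepts the candidate, while the same call in `Casing.CASED` mode rejects it because the code points differ.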
|