path: root/streamingvisitors
Commit message (Author, Age, Files, Lines)
* Treat regex and fuzzy whole-field matching as 1 logical word (Tor Brede Vekterli, 2024-01-22, 2 files, -2/+18)
|     We have concluded that this is the most semantically correct way of reporting the count, and as a bonus it avoids having to do a separate pass over the string buffer.
* Adjust search::streaming::Hit to better match search::fef::TermFieldMatchDataPosition. (Tor Egge, 2024-01-22, 5 files, -6/+7)
* Merge pull request #29969 from vespa-engine/vekterli/support-fuzzy-matching-in-streaming-search (Tor Brede Vekterli, 2024-01-19, 6 files, -33/+215)
|\    Support fuzzy term matching in streaming search
| * Support fuzzy term matching in streaming search (Tor Brede Vekterli, 2024-01-18, 6 files, -33/+215)
| |   Uses a DFA-based matcher for max edits in {1, 2} and falls back to the legacy non-DFA matcher for all other values (including 0). Currently only supports fuzzy matching across the full field string, i.e. there's no implicit tokenization or whitespace removal. This matches the semantics we currently have for fuzzy search over attributes in a non-streaming case.
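The whole-field semantics described in the commit above can be illustrated with a minimal sketch. It is not the DFA-based matcher the commit refers to: it swaps in a plain dynamic-programming Levenshtein distance, compares bytes rather than UTF-8 code points, and the function name `fuzzy_matches_whole_field` is hypothetical, not part of the actual code base.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical helper (an assumption, not the real streaming search API).
// Returns true if the ENTIRE field string is within max_edits Levenshtein
// edits of the term. No tokenization or whitespace removal is applied.
bool fuzzy_matches_whole_field(std::string_view term, std::string_view field,
                               std::size_t max_edits) {
    const std::size_t n = term.size();
    const std::size_t m = field.size();
    // The length difference is a lower bound on the edit distance.
    if ((n > m ? n - m : m - n) > max_edits) {
        return false;
    }
    // Two-row dynamic programming table: prev = row i-1, cur = row i.
    std::vector<std::size_t> prev(m + 1);
    std::vector<std::size_t> cur(m + 1);
    for (std::size_t j = 0; j <= m; ++j) {
        prev[j] = j;
    }
    for (std::size_t i = 1; i <= n; ++i) {
        cur[0] = i;
        for (std::size_t j = 1; j <= m; ++j) {
            const std::size_t substitute = prev[j - 1] + (term[i - 1] != field[j - 1] ? 1 : 0);
            const std::size_t remove = prev[j] + 1;
            const std::size_t insert = cur[j - 1] + 1;
            cur[j] = std::min({substitute, remove, insert});
        }
        std::swap(prev, cur);
    }
    return prev[m] <= max_edits;
}
```

Under these semantics, `fuzzy_matches_whole_field("foo", "foo bar", 2)` is false, because the term is compared against the entire field rather than against the token `foo`.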
* | Rename search::streaming::Hit member function context() to field_id(). (Tor Egge, 2024-01-18, 1 file, -1/+1)
|/
* refactor for re-use (Arne Juul, 2024-01-17, 2 files, -16/+36)
|
* Propagate normalizing mode and max field length to new searcher (Tor Brede Vekterli, 2024-01-16, 3 files, -5/+25)
|     Needed to avoid default normalizing mode/max field length being used in the reconfigured searcher instance.
* Merge pull request #29913 from vespa-engine/vekterli/streaming-search-regex-support (Henning Baldersheim, 2024-01-16, 5 files, -9/+75)
|\    Add regular expression support to streaming search
| * Add regular expression support to streaming search (Tor Brede Vekterli, 2024-01-15, 5 files, -9/+75)
| |   Introduces an explicit regex query term node (which wraps an RE2 regex instance internally) and extends the existing UTF-8 flexible string searcher to use this query node. Regex matching is optionally case (in)sensitive depending on the normalization mode used.
| |   Note on `searcher/searcher_test.cpp`: this adds a magic sentinel `#` char prefix to query term parsing in the test to let a query term be interpreted as a regex rather than exact/prefix/suffix/substring match.
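As a rough illustration of a regex term whose case sensitivity follows the normalization mode, here is a minimal sketch. It uses `std::regex` from the standard library in place of RE2 so that it stays self-contained. The `NormalizeMode` enum, the `RegexTerm` class, and the choice of unanchored search over the field are all assumptions made for this example, not the actual query node or its API.

```cpp
#include <regex>
#include <string>

// Hypothetical normalization modes; the names are assumptions for this sketch.
enum class NormalizeMode { None, Lowercase };

// Hypothetical regex query term. The real node wraps an RE2 instance;
// std::regex is swapped in here to avoid an external dependency.
class RegexTerm {
public:
    RegexTerm(const std::string& pattern, NormalizeMode mode)
        : _regex(pattern, flags_for(mode)) {}

    // Assumption: an unanchored search over the whole field string.
    bool matches(const std::string& field) const {
        return std::regex_search(field, _regex);
    }

private:
    // Case-insensitive matching is enabled unless normalization is disabled.
    static std::regex_constants::syntax_option_type flags_for(NormalizeMode mode) {
        std::regex_constants::syntax_option_type flags = std::regex_constants::ECMAScript;
        if (mode != NormalizeMode::None) {
            flags = flags | std::regex_constants::icase;
        }
        return flags;
    }

    std::regex _regex;
};
```

For example, `RegexTerm("^foo.*bar$", NormalizeMode::Lowercase).matches("FOO and BAR")` is true, while the same pattern with `NormalizeMode::None` does not match that field.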
* | Support matched-elements-only for WeightedSetTerm. (Tor Egge, 2024-01-15, 1 file, -0/+9)
|/
* Just use normalize_mode directly from searcher. (Henning Baldersheim, 2024-01-12, 2 files, -5/+3)
|
* Also handle different normalization during query time. (Henning Baldersheim, 2024-01-12, 3 files, -14/+21)
|
* Revert "Revert "Balder/unify attributes over streaming indexed"" (Henning Baldersheim, 2024-01-12, 4 files, -6/+5)
|
* Revert "Balder/unify attributes over streaming indexed" (Henning Baldersheim, 2024-01-12, 4 files, -5/+6)
|
* local include (Henning Baldersheim, 2024-01-11, 4 files, -6/+5)
|
* Add brief class documentation. (Henning Baldersheim, 2024-01-11, 1 file, -0/+4)
|
* Split out tokenizer and test it explicitly. (Henning Baldersheim, 2024-01-11, 8 files, -57/+96)
|
* Use the normalize_mode config. (Henning Baldersheim, 2024-01-10, 8 files, -59/+45)
|
* Simplify ancient carefully hand optimized code in favour of simple readable code (Henning Baldersheim, 2024-01-10, 12 files, -186/+180)
|
* Code cleanup (Henning Baldersheim, 2024-01-10, 24 files, -90/+81)
|
* - Fold query for streaming search based on either query item type or field definition. (Henning Baldersheim, 2024-01-05, 6 files, -14/+43)
|     - This ensures that query processing and document processing are symmetric for streaming search. No longer rely on Java query processing being symmetric with the backend C++ variant.
|     - Indexed search does no normalization in the backend and uses the query as is.
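The symmetry described in the commit above can be sketched as applying one and the same normalization to both the query term and the document field before comparing them. This is only an illustration under simplifying assumptions: `normalize_ascii` and `term_matches_field` are hypothetical names, the normalization is ASCII lowercasing only (no accent folding), and the comparison is a plain substring search.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical normalization step: ASCII lowercasing only. Real folding
// (for example accent removal) is deliberately left out of this sketch.
std::string normalize_ascii(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Hypothetical matcher: the query term and the field value pass through the
// SAME normalization, so query-side and document-side processing cannot drift apart.
bool term_matches_field(const std::string& query_term, const std::string& field_value) {
    const std::string term = normalize_ascii(query_term);
    const std::string field = normalize_ascii(field_value);
    return field.find(term) != std::string::npos;
}
```

Because both sides share `normalize_ascii`, a query for `Streaming` matches a field containing `STREAMING` without relying on any separate query-side preprocessing.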
* GC unused data members. (Henning Baldersheim, 2024-01-04, 2 files, -56/+31)
|
* - Modernize code (Henning Baldersheim, 2024-01-04, 7 files, -155/+100)
|     - Unify some conversion tables.
* - Must resolve index and check all fields if any require text matching. (Henning Baldersheim, 2024-01-03, 4 files, -115/+90)
|     - Make methods const if possible.
|     - Return results instead of modifying a reference.
|     - Various code unification.
* Revert "Revert "Balder/only rewrite numeric terms for text fields"" (Henning Baldersheim, 2024-01-03, 5 files, -15/+48)
|
* Revert "Balder/only rewrite numeric terms for text fields" (Henning Baldersheim, 2024-01-03, 5 files, -48/+15)
|
* Only rewrite numeric terms when searching text fields. (Henning Baldersheim, 2024-01-02, 5 files, -15/+48)
|
* - Avoid inefficient generic template. (Henning Baldersheim, 2023-12-29, 1 file, -1/+1)
|     - Add explicit implementations for the types needed.
* - Separate methods for lowercasing, and lowercasing and folding. (Henning Baldersheim, 2023-12-21, 1 file, -13/+13)
|     - Hide implementations and use accessors.
|     - Minor code cleanup.
* Add MultiTerm and InTerm for streaming search. (Tor Egge, 2023-12-07, 2 files, -2/+19)
|
* Use emplace_back. (Tor Egge, 2023-12-04, 4 files, -4/+4)
|
* Use templated getRange() member function to get range. (Tor Egge, 2023-12-04, 2 files, -8/+4)
|
* Don't switch lower and upper bound. (Tor Egge, 2023-12-04, 2 files, -2/+2)
|
* Standard plural of leaf is leaves. (Tor Egge, 2023-11-30, 5 files, -10/+10)
|
* Add linguistics tokens document field writer. (Tor Egge, 2023-10-16, 1 file, -0/+7)
|
* Enable passing search::docsummary::IStringFieldConverter pointer to search::docsummary::IDocsumStoreDocument::insert_summary_field member function. (Tor Egge, 2023-10-12, 2 files, -7/+9)
* Update copyright (Jon Bratseth, 2023-10-09, 124 files, -122/+124)
|
* Use "_test" suffix for unit test cpp files. (Geir Storli, 2023-08-30, 8 files, -4/+4)
|
* Use uint32_t as ucs4_t (Henning Baldersheim, 2023-07-25, 1 file, -1/+1)
|
* Use WordFolder as helper instead of inheriting static stuff. (Henning Baldersheim, 2023-07-25, 9 files, -29/+31)
|
* Unpack interleaved features for streaming search. (Tor Egge, 2023-07-19, 3 files, -3/+78)
|
* Modernize C++ code with auto and range-based loops. (Geir Storli, 2023-07-06, 9 files, -37/+32)
|
* Handle sorting on multivalue attributes. (Tor Egge, 2023-07-04, 1 file, -14/+10)
|
* Add flag for controlling nested multivalue grouping. (Henning Baldersheim, 2023-06-28, 1 file, -1/+1)
|
* Setup distance metrics for streaming search. (Tor Egge, 2023-06-05, 4 files, -4/+27)
|     Add range checks when converting to internal distance threshold.
* Use DistanceMetricUtils for converting string value to distance metric. (Tor Egge, 2023-05-24, 1 file, -14/+14)
|
* GC unused assert includes (Henning Baldersheim, 2023-05-17, 1 file, -0/+1)
|
* Remove unused field/attribute access hinting. (Tor Egge, 2023-05-13, 2 files, -8/+0)
|
* Add attribute access recorder for streaming search mode. Use it to determine which attributes to populate during a streaming search. (Tor Egge, 2023-05-12, 10 files, -37/+121)
* Setup ranking assets repo for streaming search. (Tor Egge, 2023-05-10, 6 files, -40/+124)
|