path: root/indexinglanguage
Commit message  (author, date, files changed, lines removed/added)
* Enable setting max-token-length in field match.  (Tor Egge, 2024-05-07, 9 files, -14/+108)
|
* Rename max token length to max tokenize length in linguistics annotator  (Tor Egge, 2024-05-06, 4 files, -5/+5)
|     config.
* Replace all usages of Arrays.asList with List.of where possible.  (Henning Baldersheim, 2024-04-12, 11 files, -30/+27)
|
* Unify on List.of  (Henning Baldersheim, 2024-04-11, 1 file, -2/+1)
|
* Fix failing annotator test and singletonMap => Map.of  (Henning Baldersheim, 2024-04-11, 5 files, -41/+43)
|
* Unify on Map.of  (Henning Baldersheim, 2024-04-11, 1 file, -2/+2)
|
* Merge pull request #30809 from vespa-engine/jobergum/add-context-caching  (Jo Kristian Bergum, 2024-04-10, 1 file, -3/+3)
|\
| |     Add onnx output caching to embedder (allow different post-processing of model outputs)
| * Key by embedder id and don't recompute inputs  (Jon Bratseth, 2024-04-07, 1 file, -3/+3)
| |
* | - No need for final on locals and params.  (Henning Baldersheim, 2024-04-04, 1 file, -2/+2)
| |     - Log with lambda to avoid explicit guard.
| |     - Log old localMaxThroughput, not new value twice.
* | Add 'sleep' and 'busy_wait' as index expressions for testing of feed throttling.  (Henning Baldersheim, 2024-04-03, 5 files, -11/+118)
|/
* Expose cache to embedders  (Jon Bratseth, 2024-04-01, 3 files, -7/+13)
|
* Support values cached during execution of a script  (Jon Bratseth, 2024-03-31, 2 files, -0/+76)
|
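The caching commits above ("Key by embedder id and don't recompute inputs", "Expose cache to embedders", "Support values cached during execution of a script") can be illustrated with a small sketch. It assumes a per-execution map keyed by a hypothetical `CacheKey(embedderId, input)` record, and it uses a stand-in `Function<String, float[]>` as the embedder. None of these names come from Vespa, and the real cache API may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: CacheKey, ExecutionCache and embed() are illustrative
// names, not Vespa's API. The cache lives for one script execution.
public class ExecutionCacheSketch {

    // Key by embedder id as well as input, so two embedders never share a result.
    record CacheKey(String embedderId, String input) {}

    public static final class ExecutionCache {
        private final Map<CacheKey, float[]> values = new HashMap<>();
        private int computations = 0;

        // Returns the cached embedding, computing it only on the first request.
        public float[] embed(String embedderId, String input, Function<String, float[]> embedder) {
            return values.computeIfAbsent(new CacheKey(embedderId, input), key -> {
                computations++;
                return embedder.apply(key.input());
            });
        }

        public int computations() {
            return computations;
        }
    }

    public static void main(String[] args) {
        ExecutionCache cache = new ExecutionCache();
        // Stand-in embedder: a one-element vector holding the input length.
        Function<String, float[]> fakeEmbedder = s -> new float[] { s.length() };
        cache.embed("embedder-a", "hello", fakeEmbedder);
        cache.embed("embedder-a", "hello", fakeEmbedder); // served from the cache
        cache.embed("embedder-b", "hello", fakeEmbedder); // different id: computed again
        System.out.println(cache.computations()); // 2
    }
}
```

Including the embedder id in the key means two embedders can run on the same field text without returning each other's results. Repeated requests from the same embedder skip the model call.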
* Cleanup only  (Jon Bratseth, 2024-03-31, 2 files, -23/+16)
|
* for_each always needs its input  (Arne Juul, 2024-03-19, 1 file, -1/+1)
|
* fill variable types of VerificationContext  (Arne Juul, 2024-03-19, 3 files, -0/+8)
|
* Attempt of supporting mapping array to mapped 2d tensor for sparse models  (Jo Kristian Bergum, 2024-03-15, 2 files, -3/+109)
|
* preserve value type  (Jo Kristian Bergum, 2024-02-12, 1 file, -1/+1)
|
* All embedders are the same  (Jon Bratseth, 2024-02-09, 1 file, -6/+2)
|     This is to avoid a validation override from changed indexing expression when embedder details are changed.
* Pass embedder arguments  (Jon Bratseth, 2024-02-08, 2 files, -4/+12)
|
* Support embedding into rank 3 tensors  (Jon Bratseth, 2024-02-02, 3 files, -65/+270)
|
* - Use numericLabel over label for address manipulation.  (Henning Baldersheim, 2024-02-01, 1 file, -1/+1)
|     - Only use label when actual string representation is needed.
* - Require non-null inner expression in foreach.  (Henning Baldersheim, 2024-01-13, 1 file, -2/+5)
|     - Let null conversion bubble up.
* Revert "Revert "Drop tokenize expressions from ilscript for streaming mode.""  (Henning Baldersheim, 2024-01-12, 1 file, -2/+1)
|
* Revert "Drop tokenize expressions from ilscript for streaming mode."  (Henning Baldersheim, 2024-01-12, 1 file, -1/+2)
|
* Drop tokenize expressions from ilscript for streaming mode.  (Henning Baldersheim, 2024-01-12, 1 file, -2/+1)
|
* Merge pull request #29667 from vespa-engine/jobergum/splade-embedder  (Jo Kristian Bergum, 2024-01-04, 2 files, -11/+66)
|\
| |     Add SPLADE embedder
| * Add test coverage of mapped tensor in indexing embed  (Jo Kristian Bergum, 2023-12-19, 1 file, -6/+61)
| |
| * Allow mapped 1d tensor for embed expressions  (Jo Kristian Bergum, 2023-12-17, 1 file, -5/+5)
| |
* | Enable setting max-occurrences in field match.  (Tor Egge, 2024-01-04, 4 files, -0/+20)
|/
* If we index the original in addition to stems, lowercase it  (Jon Bratseth, 2023-11-20, 2 files, -5/+6)
|
* Revert "Merge pull request #29328 from vespa-engine/revert-29314-bratseth/casing-take-2"  (Jon Bratseth, 2023-11-14, 5 files, -68/+83)
|     This reverts commit a72e949533a46d665440a9c72ca2b8fb58f3a9c3, reversing
|     changes made to 944d635d00e165166508ef23399e9ed65a87a9c8.
* Revert "Bratseth/casing take 2"  (Harald Musum, 2023-11-13, 5 files, -83/+68)
|
* Cleanup  (Jon Bratseth, 2023-11-10, 2 files, -5/+1)
|
* Prefer first stem to original if non equal  (Jon Bratseth, 2023-11-10, 3 files, -47/+65)
|
* Revert "Revert "Don't lowercase linguistics annotations""  (Jon Bratseth, 2023-11-09, 4 files, -23/+24)
|     This reverts commit 0dfd4fe4c6ddbded490da36e71f27c4b70aa4226.
* Revert "Don't lowercase linguistics annotations"  (Jon Bratseth, 2023-11-09, 4 files, -24/+23)
|
* Test that casing is preserved  (Jon Bratseth, 2023-11-09, 1 file, -3/+3)
|
* Don't lowercase linguistics annotations  (Jon Bratseth, 2023-11-09, 4 files, -21/+22)
|     Tokens are already lowercased by our bundled linguistics components.
|     Lowercasing again when annotating precludes plugging in a linguistics
|     component which preserves casing.
* Take config to mean number of code points in match max length  (jonmv, 2023-10-20, 1 file, -1/+1)
|
* Avoid cutting surrogate pairs when tokenising  (jonmv, 2023-10-20, 1 file, -1/+2)
|
* Non-functional changes only  (Jon Bratseth, 2023-10-18, 1 file, -3/+3)
|
* Update copyright  (Jon Bratseth, 2023-10-09, 196 files, -195/+196)
|
* Repair parser  (Jon Bratseth, 2023-09-27, 1 file, -1/+0)
|
* Return the expected output  (Jon Bratseth, 2023-09-27, 89 files, -372/+369)
|     In if-else expressions, return the output of the executed branch rather
|     than the input. The current behavior was undocumented and quite
|     unexpected, so I suggest we treat that as a bug.
|     Also return the last executed expression in a script as its output
|     (rather than nothing). In addition, improve some error messages.
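The semantic change in the commit message just above can be pictured with a tiny, hypothetical expression model. The `Expr`, `Const`, `IfElse` and `Script` types below are illustrations, not Vespa's indexing-language classes. The sketch also assumes that every statement in a script receives the script's input, which the commit does not state.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical mini-model; Expr, Const, IfElse and Script are illustrations,
// not Vespa's indexing-language classes.
public class IfElseOutputSketch {

    interface Expr {
        Object execute(Object input);
    }

    record Const(Object value) implements Expr {
        public Object execute(Object input) {
            return value;
        }
    }

    record IfElse(Predicate<Object> condition, Expr ifTrue, Expr ifFalse) implements Expr {
        // The output is what the executed branch produced, not the input.
        public Object execute(Object input) {
            return condition.test(input) ? ifTrue.execute(input) : ifFalse.execute(input);
        }
    }

    record Script(List<Expr> statements) implements Expr {
        // Assumption: every statement sees the script's input.
        public Object execute(Object input) {
            Object output = null;
            for (Expr statement : statements) {
                output = statement.execute(input);
            }
            return output; // output of the last executed statement
        }
    }

    public static void main(String[] args) {
        Expr sign = new IfElse(x -> (Integer) x > 0, new Const("positive"), new Const("non-positive"));
        System.out.println(sign.execute(5));  // positive
        System.out.println(sign.execute(-1)); // non-positive
        Expr script = new Script(List.of(new Const("first"), sign));
        System.out.println(script.execute(3)); // positive
    }
}
```

Under the old behavior described in the commit, `sign.execute(5)` would have returned the input `5`, and the script would have produced no output at all.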
* - Add utility to do substring extraction by codepoints, instead of java char index.  (Henning Baldersheim, 2023-09-15, 2 files, -12/+6)
|     - Test and use it in SubstringExpression in indexing language.
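The difference between Java `char` indices and code points shows up with any character outside the Basic Multilingual Plane. The same distinction underlies the match max length and surrogate-pair commits above. The following is a minimal sketch built on the standard `String.codePointCount` and `String.offsetByCodePoints`. It assumes a hypothetical helper, `substringByCodePoints`, that clamps out-of-range bounds; the log does not say how the real utility handles them.

```java
// Hypothetical helper: substringByCodePoints is not the name of the Vespa utility.
public class CodePointSubstring {

    // Returns code points [from, to) of s. Out-of-range bounds are clamped
    // (an assumption of this sketch) instead of throwing.
    static String substringByCodePoints(String s, int from, int to) {
        int length = s.codePointCount(0, s.length());
        int start = Math.min(Math.max(from, 0), length);
        int end = Math.min(Math.max(to, start), length);
        int startIndex = s.offsetByCodePoints(0, start);
        int endIndex = s.offsetByCodePoints(startIndex, end - start);
        return s.substring(startIndex, endIndex);
    }

    public static void main(String[] args) {
        // "a", U+1F600 (two UTF-16 chars: a surrogate pair), "b"
        String s = "a\uD83D\uDE00b";
        System.out.println(s.length() + " chars, " + s.codePointCount(0, s.length()) + " code points"); // 4 chars, 3 code points
        // s.substring(0, 2) would end in a lone high surrogate; counting code points keeps the pair whole.
        System.out.println(substringByCodePoints(s, 0, 2).length()); // 3
        System.out.println(substringByCodePoints(s, 2, 3)); // b
    }
}
```

With this helper, truncating to a maximum length counted in code points is `substringByCodePoints(s, 0, max)`. That call never leaves half of a surrogate pair at the end of the result.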
* Merge pull request #27969 from vespa-engine/bjorncs/embedder-metrics  (Jon Bratseth, 2023-08-31, 1 file, -1/+1)
|\
| |     Add generic metrics for embedders
| * Add generic metrics for embedders  (Bjørn Christian Seime, 2023-08-04, 1 file, -1/+1)
| |
* | remove test duplicate  (Jo Kristian Bergum, 2023-08-16, 1 file, -6/+0)
| |
* | Add support for converting iso-8601 date strings to epoch time  (Jo Kristian Bergum, 2023-08-16, 3 files, -0/+116)
|/
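The conversion named in the ISO-8601 commit above can be done with `java.time`. This sketch covers only the conversion, not the indexing expression added in the commit. The helper name `toEpochSecond` is hypothetical, and returning seconds rather than milliseconds is a choice of the sketch. It also assumes the input carries an explicit offset such as `Z` or `+01:00`, because `OffsetDateTime.parse` rejects strings without one.

```java
import java.time.OffsetDateTime;

// Hypothetical helper: toEpochSecond is an illustrative name, not the
// indexing expression added in the commit.
public class IsoToEpoch {

    // Parses an ISO-8601 date-time with an explicit offset ("Z" or "+01:00")
    // and returns seconds since 1970-01-01T00:00:00Z.
    static long toEpochSecond(String iso8601) {
        return OffsetDateTime.parse(iso8601).toEpochSecond();
    }

    public static void main(String[] args) {
        System.out.println(toEpochSecond("2023-08-16T00:00:00Z"));      // 1692144000
        System.out.println(toEpochSecond("1970-01-01T01:00:00+01:00")); // 0
    }
}
```

A string without an offset, such as `2023-08-16T00:00:00`, throws `DateTimeParseException` here. The log does not say whether the real expression accepts such input.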
* Resolve parent before children  (Jon Bratseth, 2023-04-14, 3 files, -2/+27)
|