Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Fix CR comments | MariusArhaug | 2024-04-30 | 7 | -46/+34 |
| | |||||
* | Update significance model field and logic from architect meeting | MariusArhaug | 2024-04-24 | 11 | -117/+246 |
| | |||||
* | Merge pull request #30871 from vespa-engine/marius/add-significance-searcher | Marius Arhaug | 2024-04-24 | 4 | -11/+18 |
|\ | | | | | Add significance searcher | ||||
| * | update abi-spec | MariusArhaug | 2024-04-16 | 1 | -1/+1 |
| | | |||||
| * | fix cr failures | MariusArhaug | 2024-04-16 | 3 | -10/+17 |
| | | |||||
* | | Replace all usages of Arrays.asList with List.of where possible. | Henning Baldersheim | 2024-04-12 | 6 | -30/+25 |
| | | |||||
* | | Merge pull request #30809 from vespa-engine/jobergum/add-context-caching | Jo Kristian Bergum | 2024-04-10 | 2 | -9/+17 |
|\ \ | |/ |/| | Add onnx output caching to embedder (allow different post-processing of model outputs) | ||||
| * | Key by embedder id and don't recompute inputs | Jon Bratseth | 2024-04-07 | 2 | -10/+11 |
| | | |||||
| * | Add equivalent to `Map.computeIfAbsent()` to simplify typical usage of the cache | Bjørn Christian Seime | 2024-04-04 | 2 | -2/+9 |
| | | | | | | | | Current interface requires a lot of boilerplate code. | ||||
* | | Merge pull request #30816 from vespa-engine/marius/add-significance-model-registry | Marius Arhaug | 2024-04-09 | 12 | -1/+385 |
|\ \ | | | | | | Add significance model registry to linguistics | ||||
| * | | add missing beta annotation | MariusArhaug | 2024-04-09 | 1 | -0/+4 |
| | | | |||||
| * | | add illegal arg exception for languages not registered | MariusArhaug | 2024-04-09 | 2 | -1/+8 |
| | | | |||||
| * | | fix cr failures | MariusArhaug | 2024-04-09 | 12 | -52/+104 |
| | | | |||||
| * | | add significance model registry to linguistics | MariusArhaug | 2024-04-04 | 10 | -1/+322 |
| |/ | |||||
* | | add comment for intention in determineScript function | MariusArhaug | 2024-04-04 | 1 | -0/+1 |
| | | |||||
* | | Add SimpleTokenScript to SimpleTokenizer | MariusArhaug | 2024-04-03 | 4 | -1/+124 |
|/ | | | | | | | When parsing datasets such as WikiDumps into a significance model, we only want to keep characters of that language's script in the model. Adding the script value to the tokenizer lets us filter out non-Latin words when creating an English significance model, for example. | ||||
* | Expose cache to embedders | Jon Bratseth | 2024-04-01 | 2 | -1/+27 |
| | |||||
* | Update ABI spec | Jon Bratseth | 2024-02-16 | 1 | -3/+1 |
| | |||||
* | Pass context when resolving properties | Jon Bratseth | 2024-02-15 | 1 | -9/+0 |
| | |||||
* | ChainedMap can't be copied | Jon Bratseth | 2024-01-20 | 1 | -1/+1 |
| | |||||
* | Revert "Merge pull request #29905 from vespa-engine/revert-29884-bratseth/param-refs-in-embed" | Jon Bratseth | 2024-01-20 | 2 | -1/+13 |
| | | | | This reverts commit c6b547c0c2898a324983356aa677ea3082533f7d, reversing changes made to 8c7f8c17ad5e1de5adcc71ee34f2a3c1cd36d6bd. | ||||
* | Revert "Support parameter references in embed" | Henning Baldersheim | 2024-01-15 | 2 | -13/+1 |
| | |||||
* | Support parameter references in embed | Jon Bratseth | 2024-01-12 | 2 | -1/+13 |
| | | | | Support embed(@myParameter) in addition to embed('text to embed') | ||||
* | Revert "Merge pull request #29328 from vespa-engine/revert-29314-bratseth/casing-take-2" | Jon Bratseth | 2023-11-14 | 4 | -13/+30 |
| | | | | This reverts commit a72e949533a46d665440a9c72ca2b8fb58f3a9c3, reversing changes made to 944d635d00e165166508ef23399e9ed65a87a9c8. | ||||
* | Revert "Bratseth/casing take 2" | Harald Musum | 2023-11-13 | 4 | -30/+13 |
| | |||||
* | Prefer first stem to original if non equal | Jon Bratseth | 2023-11-10 | 2 | -11/+28 |
| | |||||
* | Revert "Revert "Don't lowercase linguistics annotations"" | Jon Bratseth | 2023-11-09 | 2 | -2/+2 |
| | | | | This reverts commit 0dfd4fe4c6ddbded490da36e71f27c4b70aa4226. | ||||
* | Revert "Don't lowercase linguistics annotations" | Jon Bratseth | 2023-11-09 | 2 | -2/+2 |
| | |||||
* | Don't lowercase linguistics annotations | Jon Bratseth | 2023-11-09 | 2 | -2/+2 |
| | | | | | | Tokens are already lowercased by our bundled linguistics components. Lowercasing again when annotating precludes plugging in a linguistics component which preserves casing. | ||||
* | Avoid cutting surrogate pairs when tokenising | jonmv | 2023-10-20 | 1 | -1/+1 |
| | |||||
* | Update copyright | Jon Bratseth | 2023-10-09 | 73 | -73/+73 |
| | |||||
* | Use Guice 6.0 | Bjørn Christian Seime | 2023-09-04 | 1 | -1/+1 |
| | | | | | | https://github.com/google/guice/wiki/Guice600 We cannot upgrade to 7.x as we export javax.inject from container. 6.x supports both the old javax.inject and the new jakarta.inject replacement. | ||||
* | Allow sampling of fractional millis | Bjørn Christian Seime | 2023-08-25 | 2 | -4/+3 |
| | |||||
* | Add generic metrics for embedders | Bjørn Christian Seime | 2023-08-04 | 2 | -1/+56 |
| | |||||
* | Add necessary options to use failOnWarnings | gjoranv | 2023-06-05 | 1 | -0/+1 |
| | |||||
* | Don't remove indexable symbols when stemming | Jon Bratseth | 2023-06-02 | 5 | -8/+17 |
| | |||||
* | Add bundle type to all CORE bundles. | gjoranv | 2023-05-25 | 1 | -0/+3 |
| | |||||
* | Update ABI spec | Jon Bratseth | 2023-05-22 | 1 | -0/+1 |
| | |||||
* | Always treat each symbol as a separate token | Jon Bratseth | 2023-05-22 | 4 | -20/+56 |
| | |||||
* | Treat 'other symbols' as letters | Jon Bratseth | 2023-05-22 | 2 | -2/+10 |
| | | | | | The unicode class 'other symbols' contains emojis, math symbols, etc. Treat these as letter characters to support searching for them. | ||||
* | Use dollar and hour base units | Jon Bratseth | 2023-05-19 | 1 | -2/+2 |
| | |||||
* | Use metric enums everywhere | Jon Bratseth | 2023-03-06 | 1 | -1/+1 |
| | |||||
* | Add abi spec | Lester Solbakken | 2023-02-10 | 1 | -0/+1 |
| | |||||
* | Add decoding of sentencepiece token sequence to text | Lester Solbakken | 2023-02-10 | 1 | -0/+11 |
| | |||||
* | Compute code points in whole string only when needed | jonmv | 2022-12-06 | 2 | -6/+17 |
| | |||||
* | Split out opennlp-linguistics | Henning Baldersheim | 2022-11-26 | 14 | -783/+0 |
| | |||||
* | Update ABI spec format, and update all specs | jonmv | 2022-10-25 | 1 | -198/+198 |
| | |||||
* | much simpler CharSequenceNormalizer | Arne Juul | 2022-10-06 | 3 | -9/+100 |
| | |||||
* | Merge pull request #24007 from vespa-engine/bratseth/cleanup-082 | Jon Bratseth | 2022-09-25 | 2 | -13/+11 |
|\ | | | | | No functional changes | ||||
| * | No functional changes | Jon Bratseth | 2022-09-11 | 2 | -13/+11 |
| | |