| | Commit message | Author | Date | Files | Lines (-/+) |
---|---|---|---|---|---|
* | Allow sampling of fractional millis | Bjørn Christian Seime | 2023-08-25 | 1 | -3/+2 |
* | Add generic metrics for embedders | Bjørn Christian Seime | 2023-08-04 | 1 | -0/+37 |
* | Don't remove indexable symbols when stemming | Jon Bratseth | 2023-06-02 | 4 | -8/+16 |
* | Always treat each symbol as a separate token | Jon Bratseth | 2023-05-22 | 4 | -20/+56 |
* | Treat 'other symbols' as letters | Jon Bratseth | 2023-05-22 | 2 | -2/+10 |
\| | The unicode class 'other symbols' contains emojis, math symbols, etc. Treat these as letter characters to support searching for them. | | | | |
* | Use dollar and hour base units | Jon Bratseth | 2023-05-19 | 1 | -2/+2 |
* | Use metric enums everywhere | Jon Bratseth | 2023-03-06 | 1 | -1/+1 |
* | Add decoding of sentencepiece token sequence to text | Lester Solbakken | 2023-02-10 | 1 | -0/+11 |
* | Compute code points in whole string only when needed | jonmv | 2022-12-06 | 2 | -6/+17 |
* | Split out opennlp-linguistics | Henning Baldersheim | 2022-11-26 | 13 | -779/+0 |
* | much simpler CharSequenceNormalizer | Arne Juul | 2022-10-06 | 3 | -9/+100 |
* | Merge pull request #24007 from vespa-engine/bratseth/cleanup-082 | Jon Bratseth | 2022-09-25 | 2 | -13/+11 |
\|\\ | No functional changes | | | | |
\| * | No functional changes | Jon Bratseth | 2022-09-11 | 2 | -13/+11 |
* \| | Make validation messages clearer given multiple instances | Jon Bratseth | 2022-09-15 | 1 | -2/+0 |
\|/ | | | | | |
* | Determine token types considering all characters | Jon Bratseth | 2022-08-16 | 6 | -119/+133 |
* | Remove on Vespa 8 | Jon Bratseth | 2022-06-08 | 1 | -8/+0 |
* | Use '@Inject' from 'annotations' in multiple bundles | Bjørn Christian Seime | 2022-05-06 | 2 | -2/+2 |
* | Resolve rank profile inputs | Jon Bratseth | 2022-04-21 | 1 | -1/+1 |
* | Rename defaultEmbedderName to defaultEmbedderId | Lester Solbakken | 2022-03-22 | 1 | -2/+2 |
* | Add convenience function to represent embedder as map | Lester Solbakken | 2022-03-21 | 1 | -3/+26 |
* | Stem by linguistics in rule bases | Jon Bratseth | 2022-01-10 | 1 | -3/+20 |
\| | Also add a @language directive to stem in other languages than english. | | | | |
* | annotate intentional switch fallthrough | Arne H Juul | 2022-01-06 | 1 | -0/+1 |
* | Specify how the class is actually loaded | Jon Marius Venstad | 2021-12-21 | 1 | -1/+1 |
* | Provide array of correct size. | Jon Marius Venstad | 2021-12-20 | 1 | -1/+1 |
* | Override ngram creation with something less silly | Jon Marius Venstad | 2021-12-20 | 2 | -1/+32 |
* | Use smaller chunks for faster detection | Jon Marius Venstad | 2021-12-20 | 1 | -2/+2 |
* | Expand test case for language detection | Jon Marius Venstad | 2021-12-20 | 1 | -3/+28 |
* | Upper bound on input size, and use opennlp before simple detector | Jon Marius Venstad | 2021-12-20 | 1 | -6/+3 |
* | Avoid putting nulls in language map | Jon Marius Venstad | 2021-12-20 | 1 | -2/+5 |
* | Revert "Merge pull request #20578 from vespa-engine/revert-20568-jonmv/replace-optimaize-with-lingua" | Jon Marius Venstad | 2021-12-20 | 12 | -172/+245 |
\| | This reverts commit 5476504932cd90eb2dad82dbab633e3ffa2034c3, reversing changes made to 235a78cc4707f78d18c6818a577de1b7507f5e40. | | | | |
* | Revert "Replace optimaize with OpenNLP language detector [run-systemtest]" | Jon Marius Venstad | 2021-12-18 | 12 | -245/+172 |
* | Re-add files | Jon Marius Venstad | 2021-12-18 | 5 | -0/+142 |
* | Move model to module where it is needed, to simplify, at the cost of larger bundles | Jon Marius Venstad | 2021-12-18 | 3 | -22/+21 |
* | Replace UrlcharSequenceNormalizer with one with an improved regex | Jon Marius Venstad | 2021-12-17 | 1 | -6/+0 |
* | Add some javadoc, and no need to handle null return for model | Jon Marius Venstad | 2021-12-17 | 2 | -2/+4 |
* | Replace optimaize with OpenNLP language detector | Jon Marius Venstad | 2021-12-17 | 7 | -166/+102 |
* | Add a BERT embedder | Jon Bratseth | 2021-12-16 | 1 | -2/+3 |
* | Time out requests after 200s | Jon Marius Venstad | 2021-12-13 | 1 | -1/+0 |
* | Update 2020 Oath copyrights. | gjoranv | 2021-10-27 | 2 | -2/+2 |
* | Update Verizon Media copyright notices. | gjoranv | 2021-10-07 | 3 | -3/+3 |
* | Update 2018 copyright notices. | gjoranv | 2021-10-07 | 3 | -3/+3 |
* | Update 2017 copyright notices. | gjoranv | 2021-10-07 | 69 | -69/+69 |
* | Encapsulate in a context | Jon Bratseth | 2021-10-01 | 1 | -12/+46 |
* | Pass destination | Jon Bratseth | 2021-09-30 | 1 | -4/+10 |
\| | This allows embedders to switch on it to enable bucket testing and similar. | | | | |
* | encode -> embed | Jon Bratseth | 2021-09-28 | 2 | -56/+56 |
* | Separate component from linguistics | Jon Bratseth | 2021-09-25 | 15 | -1015/+0 |
* | Linguistics cleanup | Jon Bratseth | 2021-09-21 | 17 | -34/+29 |
* | Add 'encode' expression | Jon Bratseth | 2021-09-19 | 1 | -0/+17 |
* | Provide a (non-working) encoder by default | Jon Bratseth | 2021-09-17 | 1 | -1/+1 |
* | Cleanup | Jon Bratseth | 2021-09-17 | 5 | -9/+2 |