path: root/linguistics/src/main
Commit message  Author  Age  Files  Lines
* Merge pull request #30871 from vespa-engine/marius/add-significance-searcher  Marius Arhaug  2024-04-24  2  -4/+7
|\
| * fix cr failures  MariusArhaug  2024-04-16  2  -4/+7
* | Merge pull request #30809 from vespa-engine/jobergum/add-context-caching  Jo Kristian Bergum  2024-04-10  1  -7/+14
|\ \
| |/
|/|
| * Key by embedder id and don't recompute inputs  Jon Bratseth  2024-04-07  1  -7/+8
| * Add equivalent to `Map.computeIfAbsent()` to simplify typical usage of the cache  Bjørn Christian Seime  2024-04-04  1  -1/+7
* | Merge pull request #30816 from vespa-engine/marius/add-significance-model-reg...  Marius Arhaug  2024-04-09  6  -0/+211
|\ \
| * | add missing beta annotation  MariusArhaug  2024-04-09  1  -0/+4
| * | add illegal arg exception for languages not registered  MariusArhaug  2024-04-09  1  -1/+5
| * | fix cr failures  MariusArhaug  2024-04-09  6  -38/+35
| * | add significance model registry to linguistics  MariusArhaug  2024-04-04  5  -0/+206
| |/
* | add comment for intention in determineScript function  MariusArhaug  2024-04-04  1  -0/+1
* | Add SimpleTokenScript to SimpleTokenizer  MariusArhaug  2024-04-03  2  -1/+85
|/
* Expose cache to embedders  Jon Bratseth  2024-04-01  1  -0/+23
* Pass context when resolving properties  Jon Bratseth  2024-02-15  1  -9/+0
* ChainedMap can't be copied  Jon Bratseth  2024-01-20  1  -1/+1
* Revert "Merge pull request #29905 from vespa-engine/revert-29884-bratseth/par...  Jon Bratseth  2024-01-20  1  -0/+10
* Revert "Support parameter references in embed"  Henning Baldersheim  2024-01-15  1  -10/+0
* Support parameter references in embed  Jon Bratseth  2024-01-12  1  -0/+10
* Revert "Merge pull request #29328 from vespa-engine/revert-29314-bratseth/cas...  Jon Bratseth  2023-11-14  4  -13/+30
* Revert "Bratseth/casing take 2"  Harald Musum  2023-11-13  4  -30/+13
* Prefer first stem to original if non equal  Jon Bratseth  2023-11-10  2  -11/+28
* Revert "Revert "Don't lowercase linguistics annotations""  Jon Bratseth  2023-11-09  2  -2/+2
* Revert "Don't lowercase linguistics annotations"  Jon Bratseth  2023-11-09  2  -2/+2
* Don't lowercase linguistics annotations  Jon Bratseth  2023-11-09  2  -2/+2
* Avoid cutting surrogate pairs when tokenising  jonmv  2023-10-20  1  -1/+1
* Update copyright  Jon Bratseth  2023-10-09  51  -51/+51
* Allow sampling of fractional millis  Bjørn Christian Seime  2023-08-25  1  -3/+2
* Add generic metrics for embedders  Bjørn Christian Seime  2023-08-04  1  -0/+37
* Don't remove indexable symbols when stemming  Jon Bratseth  2023-06-02  3  -6/+14
* Always treat each symbol as a separate token  Jon Bratseth  2023-05-22  2  -17/+31
* Threat 'other symbols' as letters  Jon Bratseth  2023-05-22  1  -2/+2
* Use dollar and hour base units  Jon Bratseth  2023-05-19  1  -2/+2
* Use metric enums everywhere  Jon Bratseth  2023-03-06  1  -1/+1
* Add decoding of sentencepiece token sequence to text  Lester Solbakken  2023-02-10  1  -0/+11
* Compute code points in whole string only when needed  jonmv  2022-12-06  1  -5/+3
* Split out opennlp-linguistics  Henning Baldersheim  2022-11-26  10  -414/+0
* much simpler CharSequenceNormalizer  Arne Juul  2022-10-06  3  -9/+100
* Merge pull request #24007 from vespa-engine/bratseth/cleanup-082  Jon Bratseth  2022-09-25  1  -11/+9
|\
| * No functional changes  Jon Bratseth  2022-09-11  1  -11/+9
* | Make validation messages clearer given multiple instances  Jon Bratseth  2022-09-15  1  -2/+0
|/
* Determine token types considering all characters  Jon Bratseth  2022-08-16  4  -108/+89
* Remove on Vespa 8  Jon Bratseth  2022-06-08  1  -8/+0
* Use '@Inject' from 'annotations' in multiple bundles  Bjørn Christian Seime  2022-05-06  2  -2/+2
* Resolve rank profile inputs  Jon Bratseth  2022-04-21  1  -1/+1
* Rename defaultEmbedderName to defaultEmbedderId  Lester Solbakken  2022-03-22  1  -2/+2
* Add convenience function to represent embedder as map  Lester Solbakken  2022-03-21  1  -3/+26
* Stem by linguistics in rule bases  Jon Bratseth  2022-01-10  1  -3/+20
* annotate intentional switch fallthrough  Arne H Juul  2022-01-06  1  -0/+1
* Specify how the class is actually loaded  Jon Marius Venstad  2021-12-21  1  -1/+1
* Provide array of correct size.  Jon Marius Venstad  2021-12-20  1  -1/+1