path: root/linguistics/src
Commit message [Author, Date, Files, Lines]
* Merge pull request #31194 from vespa-engine/bratseth/stemming-trace [Jon Bratseth, 2024-05-16, 1, -1/+1]
|\
| * Trace no stemming due to language=UNKNOWN [Jon Bratseth, 2024-05-12, 1, -1/+1]
* | Merge pull request #31098 from vespa-engine/marius/add-significance-model-tool [Bjørn Christian Seime, 2024-05-15, 2, -4/+7]
|\ \
| * | Add significance model generator cli [MariusArhaug, 2024-05-14, 2, -4/+7]
| |/
* | Update linguistics/src/main/java/com/yahoo/language/process/Segmenter.java [Jon Bratseth, 2024-05-15, 1, -1/+1]
* | Improve javadoc [Jon Bratseth, 2024-05-14, 1, -9/+10]
|/
* Fix CR comments [MariusArhaug, 2024-04-30, 7, -46/+34]
* Update significance model field and logic from architect meeting [MariusArhaug, 2024-04-24, 10, -116/+244]
* Merge pull request #30871 from vespa-engine/marius/add-significance-searcher [Marius Arhaug, 2024-04-24, 3, -10/+17]
|\
| * fix cr failures [MariusArhaug, 2024-04-16, 3, -10/+17]
* | Replace all usages of Arrays.asList with List.of where possible. [Henning Baldersheim, 2024-04-12, 6, -30/+25]
* | Merge pull request #30809 from vespa-engine/jobergum/add-context-caching [Jo Kristian Bergum, 2024-04-10, 1, -7/+14]
|\ \
| |/
|/|
| * Key by embedder id and don't recompute inputs [Jon Bratseth, 2024-04-07, 1, -7/+8]
| * Add equivalent to `Map.computeIfAbsent()` to simplify typical usage of the cache [Bjørn Christian Seime, 2024-04-04, 1, -1/+7]
* | Merge pull request #30816 from vespa-engine/marius/add-significance-model-reg... [Marius Arhaug, 2024-04-09, 10, -0/+321]
|\ \
| * | add missing beta annotation [MariusArhaug, 2024-04-09, 1, -0/+4]
| * | add illegal arg exception for languages not registered [MariusArhaug, 2024-04-09, 2, -1/+8]
| * | fix cr failures [MariusArhaug, 2024-04-09, 10, -48/+52]
| * | add significance model registry to linguistics [MariusArhaug, 2024-04-04, 9, -0/+306]
| |/
* | add comment for intention in determineScript function [MariusArhaug, 2024-04-04, 1, -0/+1]
* | Add SimpleTokenScript to SimpleTokenizer [MariusArhaug, 2024-04-03, 4, -1/+124]
|/
* Expose cache to embedders [Jon Bratseth, 2024-04-01, 1, -0/+23]
* Pass context when resolving properties [Jon Bratseth, 2024-02-15, 1, -9/+0]
* ChainedMap can't be copied [Jon Bratseth, 2024-01-20, 1, -1/+1]
* Revert "Merge pull request #29905 from vespa-engine/revert-29884-bratseth/par... [Jon Bratseth, 2024-01-20, 1, -0/+10]
* Revert "Support parameter references in embed" [Henning Baldersheim, 2024-01-15, 1, -10/+0]
* Support parameter references in embed [Jon Bratseth, 2024-01-12, 1, -0/+10]
* Revert "Merge pull request #29328 from vespa-engine/revert-29314-bratseth/cas... [Jon Bratseth, 2023-11-14, 4, -13/+30]
* Revert "Bratseth/casing take 2" [Harald Musum, 2023-11-13, 4, -30/+13]
* Prefer first stem to original if non equal [Jon Bratseth, 2023-11-10, 2, -11/+28]
* Revert "Revert "Don't lowercase linguistics annotations"" [Jon Bratseth, 2023-11-09, 2, -2/+2]
* Revert "Don't lowercase linguistics annotations" [Jon Bratseth, 2023-11-09, 2, -2/+2]
* Don't lowercase linguistics annotations [Jon Bratseth, 2023-11-09, 2, -2/+2]
* Avoid cutting surrogate pairs when tokenising [jonmv, 2023-10-20, 1, -1/+1]
* Update copyright [Jon Bratseth, 2023-10-09, 71, -71/+71]
* Allow sampling of fractional millis [Bjørn Christian Seime, 2023-08-25, 1, -3/+2]
* Add generic metrics for embedders [Bjørn Christian Seime, 2023-08-04, 1, -0/+37]
* Don't remove indexable symbols when stemming [Jon Bratseth, 2023-06-02, 4, -8/+16]
* Always treat each symbol as a separate token [Jon Bratseth, 2023-05-22, 4, -20/+56]
* Threat 'other symbols' as letters [Jon Bratseth, 2023-05-22, 2, -2/+10]
* Use dollar and hour base units [Jon Bratseth, 2023-05-19, 1, -2/+2]
* Use metric enums everywhere [Jon Bratseth, 2023-03-06, 1, -1/+1]
* Add decoding of sentencepiece token sequence to text [Lester Solbakken, 2023-02-10, 1, -0/+11]
* Compute code points in whole string only when needed [jonmv, 2022-12-06, 2, -6/+17]
* Split out opennlp-linguistics [Henning Baldersheim, 2022-11-26, 13, -779/+0]
* much simpler CharSequenceNormalizer [Arne Juul, 2022-10-06, 3, -9/+100]
* Merge pull request #24007 from vespa-engine/bratseth/cleanup-082 [Jon Bratseth, 2022-09-25, 2, -13/+11]
|\
| * No functional changes [Jon Bratseth, 2022-09-11, 2, -13/+11]
* | Make validation messages clearer given multiple instances [Jon Bratseth, 2022-09-15, 1, -2/+0]
|/
* Determine token types considering all characters [Jon Bratseth, 2022-08-16, 6, -119/+133]