path: root/linguistics
Commit message [Author, Age, Files, Lines]
* Update ABI spec [Jon Bratseth, 2024-02-16, 1, -3/+1]
|
* Pass context when resolving properties [Jon Bratseth, 2024-02-15, 1, -9/+0]
|
* ChainedMap can't be copied [Jon Bratseth, 2024-01-20, 1, -1/+1]
|
* Revert "Merge pull request #29905 from vespa-engine/revert-29884-bratseth/param-refs-in-embed" [Jon Bratseth, 2024-01-20, 2, -1/+13]
|     This reverts commit c6b547c0c2898a324983356aa677ea3082533f7d, reversing changes made to 8c7f8c17ad5e1de5adcc71ee34f2a3c1cd36d6bd.
* Revert "Support parameter references in embed" [Henning Baldersheim, 2024-01-15, 2, -13/+1]
|
* Support parameter references in embed [Jon Bratseth, 2024-01-12, 2, -1/+13]
|     Support embed(@myParameter) in addition to embed('text to embed')
* Revert "Merge pull request #29328 from vespa-engine/revert-29314-bratseth/casing-take-2" [Jon Bratseth, 2023-11-14, 4, -13/+30]
|     This reverts commit a72e949533a46d665440a9c72ca2b8fb58f3a9c3, reversing changes made to 944d635d00e165166508ef23399e9ed65a87a9c8.
* Revert "Bratseth/casing take 2" [Harald Musum, 2023-11-13, 4, -30/+13]
|
* Prefer first stem to original if non equal [Jon Bratseth, 2023-11-10, 2, -11/+28]
|
* Revert "Revert "Don't lowercase linguistics annotations"" [Jon Bratseth, 2023-11-09, 2, -2/+2]
|     This reverts commit 0dfd4fe4c6ddbded490da36e71f27c4b70aa4226.
* Revert "Don't lowercase linguistics annotations" [Jon Bratseth, 2023-11-09, 2, -2/+2]
|
* Don't lowercase linguistics annotations [Jon Bratseth, 2023-11-09, 2, -2/+2]
|     Tokens are already lowercased by our bundled linguistics components. Lowercasing again when annotating precludes plugging in a linguistics component which preserves casing.
* Avoid cutting surrogate pairs when tokenising [jonmv, 2023-10-20, 1, -1/+1]
|
* Update copyright [Jon Bratseth, 2023-10-09, 73, -73/+73]
|
* Use Guice 6.0 [Bjørn Christian Seime, 2023-09-04, 1, -1/+1]
|     https://github.com/google/guice/wiki/Guice600
|     We cannot upgrade to 7.x as we export javax.inject from container. 6.x supports both the old javax.inject and the new jakarta.inject replacement.
* Allow sampling of fractional millis [Bjørn Christian Seime, 2023-08-25, 2, -4/+3]
|
* Add generic metrics for embedders [Bjørn Christian Seime, 2023-08-04, 2, -1/+56]
|
* Add necessary options to use failOnWarnings [gjoranv, 2023-06-05, 1, -0/+1]
|
* Don't remove indexable symbols when stemming [Jon Bratseth, 2023-06-02, 5, -8/+17]
|
* Add bundle type to all CORE bundles. [gjoranv, 2023-05-25, 1, -0/+3]
|
* Update ABI spec [Jon Bratseth, 2023-05-22, 1, -0/+1]
|
* Always treat each symbol as a separate token [Jon Bratseth, 2023-05-22, 4, -20/+56]
|
* Treat 'other symbols' as letters [Jon Bratseth, 2023-05-22, 2, -2/+10]
|     The Unicode class 'other symbols' contains emojis, math symbols, etc. Treat these as letter characters to support searching for them.
* Use dollar and hour base units [Jon Bratseth, 2023-05-19, 1, -2/+2]
|
* Use metric enums everywhere [Jon Bratseth, 2023-03-06, 1, -1/+1]
|
* Add abi spec [Lester Solbakken, 2023-02-10, 1, -0/+1]
|
* Add decoding of sentencepiece token sequence to text [Lester Solbakken, 2023-02-10, 1, -0/+11]
|
* Compute code points in whole string only when needed [jonmv, 2022-12-06, 2, -6/+17]
|
* Split out opennlp-linguistics [Henning Baldersheim, 2022-11-26, 14, -783/+0]
|
* Update ABI spec format, and update all specs [jonmv, 2022-10-25, 1, -198/+198]
|
* much simpler CharSequenceNormalizer [Arne Juul, 2022-10-06, 3, -9/+100]
|
* Merge pull request #24007 from vespa-engine/bratseth/cleanup-082 [Jon Bratseth, 2022-09-25, 2, -13/+11]
|\
| |     No functional changes
| * No functional changes [Jon Bratseth, 2022-09-11, 2, -13/+11]
| |
* | Make validation messages clearer given multiple instances [Jon Bratseth, 2022-09-15, 1, -2/+0]
|/
* bump protoc version [Arne Juul, 2022-08-27, 1, -4/+0]
|
* Determine token types considering all characters [Jon Bratseth, 2022-08-16, 6, -119/+133]
|
* Set project version to 8-SNAPSHOT [gjoranv, 2022-06-08, 1, -2/+2]
|
* Remove on Vespa 8 [Jon Bratseth, 2022-06-08, 2, -10/+1]
|
* Use '@Inject' from 'annotations' in multiple bundles [Bjørn Christian Seime, 2022-05-06, 2, -2/+2]
|
* Resolve rank profile inputs [Jon Bratseth, 2022-04-21, 1, -1/+1]
|
* Update abi-spec [Lester Solbakken, 2022-03-22, 1, -1/+1]
|
* Rename defaultEmbedderName to defaultEmbedderId [Lester Solbakken, 2022-03-22, 1, -2/+2]
|
* Add convenience function to represent embedder as map [Lester Solbakken, 2022-03-21, 2, -3/+30]
|
* Stem by linguistics in rule bases [Jon Bratseth, 2022-01-10, 2, -3/+21]
|     Also add a @language directive to stem in languages other than English.
* unify java warnings (use compiler args from parent) [Arne H Juul, 2022-01-06, 1, -8/+0]
|
* annotate intentional switch fallthrough [Arne H Juul, 2022-01-06, 1, -0/+1]
|
* Specify how the class is actually loaded [Jon Marius Venstad, 2021-12-21, 1, -1/+1]
|
* Provide array of correct size. [Jon Marius Venstad, 2021-12-20, 1, -1/+1]
|
* Override ngram creation with something less silly [Jon Marius Venstad, 2021-12-20, 2, -1/+32]
|
* Use smaller chunks for faster detection [Jon Marius Venstad, 2021-12-20, 1, -2/+2]
|