vespa - An engine for low-latency computation over large data sets

	Commit message	Author	Age	Files	Lines
*	add comment for intention in determineScript function	MariusArhaug	2024-04-04	1	-0/+1
\|
*	Add SimpleTokenScript to SimpleTokenizer	MariusArhaug	2024-04-03	2	-1/+85
\|	When parsing datasets such as WikiDumps into a significance model, we want to keep only characters of that language's script in the model. Adding the script value to the tokenizer lets us filter out non-Latin words, for example when creating an English significance model.
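The script filtering described above can be illustrated with the JDK's `Character.UnicodeScript`. This is a minimal sketch assuming a word-level filter; `ScriptFilter`, `isInScript` and `keepScript` are hypothetical names and not the `SimpleTokenScript` API added in this commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not Vespa's SimpleTokenizer API): keeps only words whose
// letters all belong to one Unicode script, e.g. LATIN for an English model.
public class ScriptFilter {

    /** Returns true if the word has at least one letter and every letter is in the given script. */
    static boolean isInScript(String word, Character.UnicodeScript script) {
        boolean sawLetter = false;
        for (int i = 0; i < word.length(); ) {
            int cp = word.codePointAt(i);
            if (Character.isLetter(cp)) {
                if (Character.UnicodeScript.of(cp) != script) return false;
                sawLetter = true;
            }
            i += Character.charCount(cp); // step over whole code points
        }
        return sawLetter;
    }

    /** Returns the words written entirely in the given script, in input order. */
    static List<String> keepScript(List<String> words, Character.UnicodeScript script) {
        List<String> kept = new ArrayList<>();
        for (String w : words) {
            if (isInScript(w, script)) kept.add(w);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> words = List.of("search", "\u043f\u043e\u0438\u0441\u043a", "\u691c\u7d22", "engine");
        System.out.println(keepScript(words, Character.UnicodeScript.LATIN)); // prints [search, engine]
    }
}
```

Checking per code point rather than per `char` keeps supplementary-plane characters from being misclassified.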
*	Expose cache to embedders	Jon Bratseth	2024-04-01	1	-0/+23
\|
*	Pass context when resolving properties	Jon Bratseth	2024-02-15	1	-9/+0
\|
*	ChainedMap can't be copied	Jon Bratseth	2024-01-20	1	-1/+1
\|
*	Revert "Merge pull request #29905 from vespa-engine/revert-29884-bratseth/param-refs-in-embed"	Jon Bratseth	2024-01-20	1	-0/+10
\|	This reverts commit c6b547c0c2898a324983356aa677ea3082533f7d, reversing changes made to 8c7f8c17ad5e1de5adcc71ee34f2a3c1cd36d6bd.
*	Revert "Support parameter references in embed"	Henning Baldersheim	2024-01-15	1	-10/+0
\|
*	Support parameter references in embed	Jon Bratseth	2024-01-12	1	-0/+10
\|	Support embed(@myParameter) in addition to embed('text to embed')
*	Revert "Merge pull request #29328 from vespa-engine/revert-29314-bratseth/casing-take-2"	Jon Bratseth	2023-11-14	4	-13/+30
\|	This reverts commit a72e949533a46d665440a9c72ca2b8fb58f3a9c3, reversing changes made to 944d635d00e165166508ef23399e9ed65a87a9c8.
*	Revert "Bratseth/casing take 2"	Harald Musum	2023-11-13	4	-30/+13
\|
*	Prefer first stem to original if non equal	Jon Bratseth	2023-11-10	2	-11/+28
\|
*	Revert "Revert "Don't lowercase linguistics annotations""	Jon Bratseth	2023-11-09	2	-2/+2
\|	This reverts commit 0dfd4fe4c6ddbded490da36e71f27c4b70aa4226.
*	Revert "Don't lowercase linguistics annotations"	Jon Bratseth	2023-11-09	2	-2/+2
\|
*	Don't lowercase linguistics annotations	Jon Bratseth	2023-11-09	2	-2/+2
\|	Tokens are already lowercased by our bundled linguistics components. Lowercasing again when annotating precludes plugging in a linguistics component that preserves casing.
*	Avoid cutting surrogate pairs when tokenising	jonmv	2023-10-20	1	-1/+1
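Java strings are UTF-16, so a character outside the Basic Multilingual Plane (most emoji, for instance) occupies two `char`s, a high and a low surrogate. The sketch below shows the general technique of stepping by code points and never cutting between the two halves; `SurrogateSafe`, `codePoints` and `truncate` are hypothetical names, not the tokenizer code changed in this commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not Vespa's tokenizer): handle text by code point
// so a surrogate pair is never split into two invalid halves.
public class SurrogateSafe {

    /** Splits text into one string per code point, keeping surrogate pairs whole. */
    static List<String> codePoints(String text) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp); // 2 for supplementary characters
            out.add(text.substring(i, i + len));
            i += len;
        }
        return out;
    }

    /** Truncates to at most maxChars UTF-16 units without leaving a dangling high surrogate. */
    static String truncate(String text, int maxChars) {
        if (text.length() <= maxChars) return text;
        int end = maxChars;
        // If the cut would fall between a high and a low surrogate, cut before the pair.
        if (end > 0 && Character.isHighSurrogate(text.charAt(end - 1))) end--;
        return text.substring(0, end);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600, 'b': 3 code points, 4 chars
        System.out.println(codePoints(s).size()); // prints 3
        System.out.println(truncate(s, 2));       // prints a
    }
}
```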
\|
*	Update copyright	Jon Bratseth	2023-10-09	51	-51/+51
\|
*	Allow sampling of fractional millis	Bjørn Christian Seime	2023-08-25	1	-3/+2
\|
*	Add generic metrics for embedders	Bjørn Christian Seime	2023-08-04	1	-0/+37
\|
*	Don't remove indexable symbols when stemming	Jon Bratseth	2023-06-02	3	-6/+14
\|
*	Always treat each symbol as a separate token	Jon Bratseth	2023-05-22	2	-17/+31
\|
*	Treat 'other symbols' as letters	Jon Bratseth	2023-05-22	1	-2/+2
\|	The Unicode class 'other symbols' contains emojis, math symbols, etc. Treat these as letter characters to support searching for them.
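In Java the Unicode general category "other symbol" (So) is exposed as `Character.OTHER_SYMBOL`. A minimal sketch of treating it as letter-like might look as follows; `SymbolAsLetter` and `isLetterLike` are hypothetical names and not the classification code in this commit.

```java
// Hypothetical sketch (not Vespa's classification code): treat Unicode
// category So ("other symbol", e.g. emoji and the copyright sign) as
// letter-like so such characters end up in indexable tokens.
public class SymbolAsLetter {

    static boolean isLetterLike(int codePoint) {
        return Character.isLetter(codePoint)
                || Character.getType(codePoint) == Character.OTHER_SYMBOL;
    }

    public static void main(String[] args) {
        int emoji = "\uD83D\uDE00".codePointAt(0);    // U+1F600, category So
        System.out.println(isLetterLike(emoji));      // prints true
        System.out.println(isLetterLike('\u00A9'));   // copyright sign, So: prints true
        System.out.println(isLetterLike('x'));        // ordinary letter: prints true
        System.out.println(isLetterLike('!'));        // punctuation (Po): prints false
    }
}
```

Note that many common math operators such as `+` fall in the separate "math symbol" (Sm) category, so a check on `OTHER_SYMBOL` alone does not cover them.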
*	Use dollar and hour base units	Jon Bratseth	2023-05-19	1	-2/+2
\|
*	Use metric enums everywhere	Jon Bratseth	2023-03-06	1	-1/+1
\|
*	Add decoding of sentencepiece token sequence to text	Lester Solbakken	2023-02-10	1	-0/+11
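SentencePiece marks the space before a word with the meta symbol U+2581 ("▁"), so turning a piece sequence back into text amounts to concatenating the pieces and mapping that symbol back to a space. The sketch below assumes that default convention; `PieceDecoder` and `decode` are hypothetical names, and this is not the decoding method added in this commit.

```java
import java.util.List;

// Hypothetical sketch of SentencePiece-style detokenization, assuming pieces
// use U+2581 as the whitespace marker (SentencePiece's default).
public class PieceDecoder {

    static final char WORD_BOUNDARY = '\u2581';

    static String decode(List<String> pieces) {
        StringBuilder sb = new StringBuilder();
        for (String piece : pieces) sb.append(piece);
        String text = sb.toString().replace(WORD_BOUNDARY, ' ');
        // The first word also carries the marker, so drop the resulting leading space.
        return text.startsWith(" ") ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        System.out.println(decode(List.of("\u2581hello", "\u2581wor", "ld"))); // prints hello world
    }
}
```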
\|
*	Compute code points in whole string only when needed	jonmv	2022-12-06	1	-5/+3
\|
*	Split out opennlp-linguistics	Henning Baldersheim	2022-11-26	9	-414/+0
\|
*	much simpler CharSequenceNormalizer	Arne Juul	2022-10-06	3	-9/+100
\|
*	Merge pull request #24007 from vespa-engine/bratseth/cleanup-082	Jon Bratseth	2022-09-25	1	-11/+9
\|\	No functional changes
\| *	No functional changes	Jon Bratseth	2022-09-11	1	-11/+9
\| \|
* \|	Make validation messages clearer given multiple instances	Jon Bratseth	2022-09-15	1	-2/+0
\|/
*	Determine token types considering all characters	Jon Bratseth	2022-08-16	4	-108/+89
\|
*	Remove on Vespa 8	Jon Bratseth	2022-06-08	1	-8/+0
\|
*	Use '@Inject' from 'annotations' in multiple bundles	Bjørn Christian Seime	2022-05-06	2	-2/+2
\|
*	Resolve rank profile inputs	Jon Bratseth	2022-04-21	1	-1/+1
\|
*	Rename defaultEmbedderName to defaultEmbedderId	Lester Solbakken	2022-03-22	1	-2/+2
\|
*	Add convenience function to represent embedder as map	Lester Solbakken	2022-03-21	1	-3/+26
\|
*	Stem by linguistics in rule bases	Jon Bratseth	2022-01-10	1	-3/+20
\|	Also add a @language directive to stem in languages other than English.
*	annotate intentional switch fallthrough	Arne H Juul	2022-01-06	1	-0/+1
\|
*	Specify how the class is actually loaded	Jon Marius Venstad	2021-12-21	1	-1/+1
\|
*	Provide array of correct size.	Jon Marius Venstad	2021-12-20	1	-1/+1
\|
*	Override ngram creation with something less silly	Jon Marius Venstad	2021-12-20	2	-1/+32
\|
*	Use smaller chunks for faster detection	Jon Marius Venstad	2021-12-20	1	-2/+2
\|
*	Upper bound on input size, and use opennlp before simple detector	Jon Marius Venstad	2021-12-20	1	-6/+3
\|
*	Avoid putting nulls in language map	Jon Marius Venstad	2021-12-20	1	-2/+5
\|
*	Revert "Merge pull request #20578 from vespa-engine/revert-20568-jonmv/replace-optimaize-with-lingua"	Jon Marius Venstad	2021-12-20	7	-131/+163
\|	This reverts commit 5476504932cd90eb2dad82dbab633e3ffa2034c3, reversing changes made to 235a78cc4707f78d18c6818a577de1b7507f5e40.
*	Revert "Replace optimaize with OpenNLP language detector [run-systemtest]"	Jon Marius Venstad	2021-12-18	7	-163/+131
\|
*	Re-add files	Jon Marius Venstad	2021-12-18	2	-0/+60
\|
*	Move model to module where it is needed, to simplify, at the cost of larger bundles	Jon Marius Venstad	2021-12-18	3	-22/+21
*	Add some javadoc, and no need to handle null return for model	Jon Marius Venstad	2021-12-17	2	-2/+4
\|
*	Replace optimaize with OpenNLP language detector	Jon Marius Venstad	2021-12-17	6	-131/+102
\|