| | Commit message | Author | Date | Files | Lines |
---|---|---|---|---|---|
* | Always treat each symbol as a separate token | Jon Bratseth | 2023-05-22 | 1 | -1/+3 |
* | Revert "- HashMap over TreeMap when order does not matter." | Bjørn Christian Seime | 2023-04-11 | 1 | -5/+9 |
| | This reverts commit b1733875a7303d71abfe384da2d6589af742d779. | | | | |
* | - HashMap over TreeMap when order does not matter. | Henning Baldersheim | 2023-03-28 | 1 | -9/+5 |
| | - Avoid creating mutable maps when not necessary. - Modernize iteration for readability. - Unify on Set.of instead of Collections.emptySet. | | | | |
* | Cleanup only | Jon Bratseth | 2022-08-15 | 1 | -1/+0 |
* | Update 2017 copyright notices. | gjoranv | 2021-10-07 | 1 | -1/+1 |
* | Revert "Merge pull request #17754 from vespa-engine/revert-17747-bratseth/special-tokens-take-2" | Jon Bratseth | 2021-05-05 | 1 | -13/+21 |
| | This reverts commit a2c9cd4bc04f1a3eaa31524b3970b96be5c2eda9, reversing changes made to 8c61a373af0066fbdf1cca354c24b197c7347321. | | | | |
* | Revert "Reapply "Bratseth/special tokens"" | Jon Bratseth | 2021-05-05 | 1 | -21/+13 |
* | Revert "Merge pull request #17746 from vespa-engine/revert-17738-revert-17737-revert-17736-bratseth/special-tokens" | Jon Bratseth | 2021-05-05 | 1 | -13/+21 |
| | This reverts commit 491856b396d003885e159345fe3f533f0fa35933, reversing changes made to 3720186303f4aef1d185525eaf61092097a64ec9. | | | | |
* | Revert "Revert "Revert "Bratseth/special tokens""" | Jon Bratseth | 2021-05-05 | 1 | -21/+13 |
* | Revert "Revert "Bratseth/special tokens"" | Jon Bratseth | 2021-05-04 | 1 | -13/+21 |
* | Revert "Bratseth/special tokens" | Jon Bratseth | 2021-05-04 | 1 | -21/+13 |
* | Move specialtokens to linguistics | Jon Bratseth | 2021-05-04 | 1 | -0/+1 |
* | Make immutable | Jon Bratseth | 2021-05-04 | 1 | -12/+19 |
* | No functional changes | Jon Bratseth | 2021-05-04 | 1 | -4/+4 |
* | Non-functional changes only | Jon Bratseth | 2021-01-14 | 1 | -6/+3 |
* | Non-functional changes only | Jon Bratseth | 2020-03-09 | 1 | -18/+15 |
* | Update copyright headers | Jon Bratseth | 2017-06-14 | 1 | -1/+1 |
* | Revert "Update copyright headers" | Jon Bratseth | 2017-06-14 | 1 | -1/+1 |
* | Update copyright headers | Jon Bratseth | 2017-06-14 | 1 | -1/+1 |
* | Revert "Copyright header" | Jon Bratseth | 2017-06-13 | 1 | -1/+1 |
* | Copyright header | Jon Bratseth | 2017-06-13 | 1 | -1/+1 |
* | Correct heuristic for urls | Jon Bratseth | 2017-04-26 | 1 | -13/+9 |
* | suppress fallthrough warnings | Arne H Juul | 2017-04-26 | 1 | -0/+2 |
| | * add comments where they occurred, somebody should look at that | | | | |
* | Detect language after tokenization | Jon Bratseth | 2017-01-20 | 1 | -3/+3 |
| | This is a prerequisite for being smarter about which subset of the input text is used for language detection. However, it breaks functionality in one subtle way: if an application does not pass the language explicitly (so that it must be detected), the input is CJK, and special tokens are configured, those special tokens will not be detected when they are surrounded by word characters (instead of, e.g., spaces). | | | | |
* | Use github name in @author | Jon Bratseth | 2016-06-16 | 1 | -1/+1 |
* | Publish | Jon Bratseth | 2016-06-15 | 1 | -0/+550 |