author Jon Bratseth <bratseth@yahoo-inc.com> 2017-01-20 15:12:15 +0100
committer Jon Bratseth <bratseth@yahoo-inc.com> 2017-01-20 15:12:15 +0100
commit 09caf52b327f6a48af8acf02872a49e08d75c9c9
tree 7e3b4422ebe5a6ec71e2b2fbaeabb3cb4306f226 /container-search/src/main/java/com/yahoo/prelude/IndexFacts.java
parent 262d072c1ac996b34f6c70efc95853be699ca935
Detect language after tokenization
This is a prerequisite for being smarter about which subset of the input text is used for language detection. However, it breaks functionality in one subtle way: if an application does not pass the language explicitly (so that it must be detected), the input is CJK, and special tokens are configured, those special tokens will not be detected when they are surrounded by word characters (instead of, e.g., spaces).
Diffstat (limited to 'container-search/src/main/java/com/yahoo/prelude/IndexFacts.java')
-rw-r--r-- container-search/src/main/java/com/yahoo/prelude/IndexFacts.java 4
1 files changed, 2 insertions, 2 deletions
diff --git a/container-search/src/main/java/com/yahoo/prelude/IndexFacts.java b/container-search/src/main/java/com/yahoo/prelude/IndexFacts.java
index 3631dedeffc..3f931c92489 100644
--- a/container-search/src/main/java/com/yahoo/prelude/IndexFacts.java
+++ b/container-search/src/main/java/com/yahoo/prelude/IndexFacts.java
@@ -18,11 +18,11 @@ import static com.yahoo.text.Lowercase.toLowerCase;
* session.getIndex(indexName).[get index info]
* </code></pre>
*
- * @author <a href="mailto:steinar@yahoo-inc.com">Steinar Knutsen</a>
+ * @author Steinar Knutsen
*/
// TODO: We should replace this with a better representation of search definitions
// which is immutable, models clusters and search definitions inside clusters properly,
-// and uses better names.
+// and uses better names. -bratseth
public class IndexFacts {
private Map<String, List<String>> clusterByDocument;