Add code pointers

author: Jon Bratseth <bratseth@verizonmedia.com> 2019-02-18 15:19:46 +0100
committer: Jon Bratseth <bratseth@verizonmedia.com> 2019-02-18 15:19:46 +0100
commit: fbaf43ef6e0f1be149ad1a7dce11a541d485fb2c (patch)
tree: 49a589012bd76f6f8dd8bbbb3ddc3cd4ff6b16ea
parent: a2eb3041fbcdc53be8f04c190faccad62f20077b (diff)
1 files changed, 49 insertions, 13 deletions
diff --git a/TODO.md b/TODO.md
index 74cbdf7c6ca..7f6a17bf082 100644
--- a/TODO.md
+++ b/TODO.md
@@ -2,7 +2,8 @@
 # List of possible future enhancements and features
 
 This lists some possible improvements to Vespa which have been considered or requested, can be developed relatively 
-independently of other work and are not yet under development.
+independently of other work, and are not yet under development. For more information on the code structure in Vespa, see
+[Code-map.md](Code-map.md).
 
 ## Query tracing including content nodes
 
@@ -16,16 +17,9 @@ the container, not in the content nodes. This is to implement similar tracing ca
 integrating trace information from each content node into the container level trace. This would make it easier to 
 understand the execution and performance consequences of various query expressions.
 
-## Change search protocol from fnet to RPC
-
-**Effort:** Low<br/>
-**Difficulty:** Low<br/>
-**Skills:** Java, C++, networking
+**Code pointers:**
+- [Container-level tracing](https://github.com/vespa-engine/vespa/blob/master/processing/src/main/java/com/yahoo/processing/execution/Execution.java#L245=)
 
-Currently, search requests happens over a very old custom protocol called "fnet". While this is efficient, it is hard to extend. 
-We want to replace it by RPC calls. 
-An RPC alternative is already implemented for summary fetch requests, but not for search requests.
-The largest part of this work is to encode the Query object as a Slime structure in Java and decode that structure in C++.
 
 ## Support query profiles for document processors
 
@@ -38,6 +32,11 @@ bundles of parameters accessible to Searchers processing queries. Writes go thro
 Document Processors, but have no equivalent support for parametrization. This is to allow configuration of document 
 processor profiles by reusing the query profile support also for document processors.
 
+**Code pointers:**
+- [Query profiles](https://github.com/vespa-engine/vespa/blob/master/container-search/src/main/java/com/yahoo/search/query/profile/QueryProfile.java)
+- [Document processors](https://github.com/vespa-engine/vespa/blob/master/docproc/src/main/java/com/yahoo/docproc/DocumentProcessor.java)
+
+
 ## Background reindexing
 
 **Effort:** Medium<br/>
@@ -48,10 +47,32 @@ Some times there is a need to reindex existing data to refresh the set of tokens
 definition changes impacts the tokens produced, and changing versions of linguistics libraries also cause token changes. 
 As content clusters store the raw data of documents it should be possible to reindex locally inside clusters in the 
 background. However, today this is not supported and content need to be rewritten from the outside to refresh tokens, 
-which is inconvenient and suboptimal. This is to support (scheduled or triggered) backgroun reindexing from local data. 
+which is inconvenient and suboptimal. This is to support (scheduled or triggered) background reindexing from local data. 
 This can be achieved by configuring a message bus route which feeds content from a cluster back to itself through the 
 indexing container cluster and triggering a visiting job using this route.
 
+**Code pointers:**
+- Document API which can be used to receive a dump of documents: [DocumentAccess](https://github.com/vespa-engine/vespa/blob/master/documentapi/src/main/java/com/yahoo/documentapi/DocumentAccess.java)
+
+
+## Change search protocol from fnet to RPC
+
+**Effort:** Medium<br/>
+**Difficulty:** Low<br/>
+**Skills:** Java, C++, networking
+
+Currently, search requests happen over a very old custom protocol called "fnet". While this is efficient, it is hard to extend. 
+We want to replace it with RPC calls. 
+An RPC alternative is already implemented for summary fetch requests, but not for search requests.
+The largest part of this work is to encode the Query object as a Slime structure in Java and decode that structure in C++.
+
+**Code pointers:**
+- FS4 protocol search invokers (to be replaced by RPC): [FS4SearchInvoker](https://github.com/vespa-engine/vespa/blob/master/container-search/src/main/java/com/yahoo/prelude/fastsearch/FS4SearchInvoker.java), [FS4FillInvoker](https://github.com/vespa-engine/vespa/blob/master/container-search/src/main/java/com/yahoo/prelude/fastsearch/FS4FillInvoker.java)
+- Current Query encoding (to be replaced by Slime): [QueryPacket](https://github.com/vespa-engine/vespa/blob/master/container-search/src/main/java/com/yahoo/fs4/QueryPacket.java)
+- Slime: [Java](https://github.com/vespa-engine/vespa/blob/master/vespajlib/src/main/java/com/yahoo/slime/Slime.java), [C++](https://github.com/vespa-engine/vespa/blob/master/vespalib/src/vespa/vespalib/data/slime/slime.h)
+- [C++ query (to be constructed from Slime)](https://github.com/vespa-engine/vespa/blob/master/searchlib/src/vespa/searchlib/query/query.h)
+
+
 ## Java implementation of the content layer for testing
 
 **Effort:** Medium<br/>
@@ -65,6 +86,10 @@ functionality would enable developers to do more testing locally within their ID
 performance is not a concern and some components, such as ranking expressions and features are already available as 
 libraries (see the searchlib module).
 
+**Code pointers:**
+- Content cluster mock in Java (currently empty): [ContentCluster](https://github.com/vespa-engine/vespa/blob/master/application/src/main/java/com/yahoo/application/content/ContentCluster.java)
+- The model of a search definition this must consume config from: [Search](https://github.com/vespa-engine/vespa/blob/master/config-model/src/main/java/com/yahoo/searchdefinition/Search.java)
+
 ## Update where
 
 **Effort:** Medium<br/>
@@ -74,15 +99,22 @@ libraries (see the searchlib module).
 Support "update where" operations which changes/removes all documents matching some document selection expression. This 
 entails adding a new document API operation and probably supporting continuations similar to visiting.
 
+- Document operations: [Java](https://github.com/vespa-engine/vespa/tree/master/document/src/main/java/com/yahoo/document/update), [C++](https://github.com/vespa-engine/vespa/tree/master/document/src/vespa/document/update)
+
+
 ## Indexed search in maps
 
 **Effort:** Medium<br/>
 **Difficulty:** Medium<br/>
-**Skills:** C++, Java, multithreading, performance, indexing, data structures
+**Skills:** C++, multithreading, performance, indexing, data structures
 
 Vespa supports maps and and making them searchable in memory by declaring as an attribute. 
 However, maps cannot be indexed as text-search disk indexes. 
 
+**Code pointers:**
+- [Current text indexes](https://github.com/vespa-engine/vespa/tree/master/searchlib/src/vespa/searchlib/index)
+
+
 ## Global writes
 
 **Effort:** Large<br/>
@@ -104,6 +136,9 @@ rates on loss of connectivity to one (any) data center, re-establish global data
 recovered and support some degree of tradeoff between consistency and operation latency (although the exact modes to be 
 supported is part of the design and analysis needed).
 
+**Code pointers:**
+- [Document API](https://github.com/vespa-engine/vespa/tree/master/documentapi/src/main/java/com/yahoo/documentapi)
+
 ## Global dynamic tensors
 
 **Effort:** High
@@ -121,4 +156,5 @@ perfect consistency. This in turn must push updates to all content nodes in a be
 budget, such that query execution and document write traffic is prioritized over ensuring perfect consistency of global
 model updates.
 
-
+**Code pointers:**
+- Tensor modify operation (for document tensors): [Java](https://github.com/vespa-engine/vespa/blob/master/document/src/main/java/com/yahoo/document/update/TensorModifyUpdate.java), [C++](https://github.com/vespa-engine/vespa/blob/master/document/src/vespa/document/update/tensor_modify_update.h)