author    Jon Bratseth <bratseth@verizonmedia.com>  2019-02-18 13:58:02 +0100
committer Jon Bratseth <bratseth@verizonmedia.com>  2019-02-18 13:58:02 +0100
commit    a2eb3041fbcdc53be8f04c190faccad62f20077b (patch)
tree      73e5174e4f4a7e476f6e2f05130ba59e28938f0e /TODO.md
parent    6c83d19a6d60068d50fc952d358d023c5bcd2fdb (diff)
Sort by complexity
Diffstat (limited to 'TODO.md')
-rw-r--r--  TODO.md  112
1 file changed, 57 insertions(+), 55 deletions(-)
diff --git a/TODO.md b/TODO.md
index e9c8a76ae6e..74cbdf7c6ca 100644
--- a/TODO.md
+++ b/TODO.md
@@ -4,39 +4,21 @@
This lists some possible improvements to Vespa which have been considered or requested, which can be developed
relatively independently of other work, and which are not yet under development.
-## Global writes
-
-**Effort:** Large<br/>
-**Difficulty:** Master<br/>
-**Skills:** C++, Java, distributed systems, performance, multithreading, network, distributed consistency
-
-Vespa instances distribute data automatically within clusters, but these clusters are meant to consist of co-located
-machines - the distribution algorithm is not suitable for global distribution across datacenters because it cannot
-seamlessly tolerate datacenter-wide outages and does not attempt to minimize bandwidth usage between datacenters.
-Applications usually achieve global presence instead by setting up multiple independent instances in different
-datacenters and writing to all of them in parallel. This is robust and works well on average, but puts an additional burden on
-applications to achieve cross-datacenter data consistency on datacenter failures, and does not enable automatic
-data recovery across datacenters, such that data redundancy is effectively required within each datacenter.
-This is fine in most cases, but not in the case where storage space drives cost and intermittent loss of data coverage
-(completeness as seen from queries) is tolerable.
-
-A solution should sustain current write rates (tens of thousands of writes per node per second), sustain write and read
-rates on loss of connectivity to one (any) datacenter, re-establish global data consistency when a lost datacenter is
-recovered, and support some degree of tradeoff between consistency and operation latency (although the exact modes to be
-supported are part of the design and analysis needed).
-
-## Indexed search in maps
+## Query tracing including content nodes
-**Effort:** Medium<br/>
-**Difficulty:** Medium<br/>
-**Skills:** C++, Java, multithreading, performance, indexing, data structures
+**Effort:** Low<br/>
+**Difficulty:** Low<br/>
+**Skills:** Java, C++, multithreading
-Vespa supports maps, and making them searchable in memory by declaring them as an attribute.
-However, maps cannot be indexed as text-search disk indexes.
+Currently, trace information can be requested for a given query by adding tracelevel=N to the query. This is useful for
+debugging as well as for understanding performance bottlenecks. However, the trace information only includes execution in
+the container, not in the content nodes. This task is to implement similar tracing capabilities in the search core and
+integrate trace information from each content node into the container-level trace. This would make it easier to
+understand the execution and performance consequences of various query expressions.
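+
+For example, assuming a container running on localhost:8080 and a placeholder query term, a trace can be requested by
+adding the tracelevel parameter to any query (today the returned trace stops at the container; this task would extend
+it with execution and timing information from each content node):
+
+```
+http://localhost:8080/search/?query=foo&tracelevel=3
+```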
## Change search protocol from fnet to RPC
-**Effort:** Small<br/>
+**Effort:** Low<br/>
**Difficulty:** Low<br/>
**Skills:** Java, C++, networking
@@ -47,7 +29,7 @@ The largest part of this work is to encode the Query object as a Slime structure
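A minimal sketch of that encoding step using Vespa's Slime library; the field names chosen here are illustrative
assumptions, not a definition of the actual wire format:

```java
import com.yahoo.slime.BinaryFormat;
import com.yahoo.slime.Cursor;
import com.yahoo.slime.Slime;

public class QueryEncodingSketch {

    // Encodes a few representative query properties into a Slime structure and
    // serializes it to the compact binary form, suitable as an RPC payload.
    public static byte[] encode(String queryString, int hits, int offset) {
        Slime slime = new Slime();
        Cursor root = slime.setObject();
        root.setString("query", queryString); // illustrative field names; the real
        root.setLong("hits", hits);           // encoding would mirror the Query object model
        root.setLong("offset", offset);
        return BinaryFormat.encode(slime);
    }

}
```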
## Support query profiles for document processors
-**Effort:** Small<br/>
+**Effort:** Low<br/>
**Difficulty:** Low<br/>
**Skills:** Java
@@ -70,23 +52,6 @@ which is inconvenient and suboptimal. This is to support (scheduled or triggered
This can be achieved by configuring a message bus route which feeds content from a cluster back to itself through the
indexing container cluster and triggering a visiting job using this route.
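A sketch of the triggering side using the Java document API; the route name "reindex" and the catch-all selection are
assumptions here, and the route itself would need to be defined in the application's configuration:

```java
import com.yahoo.documentapi.DocumentAccess;
import com.yahoo.documentapi.VisitorParameters;
import com.yahoo.documentapi.VisitorSession;

public class ReindexingSketch {

    public static void main(String[] args) throws Exception {
        DocumentAccess access = DocumentAccess.createDefault();
        VisitorParameters parameters = new VisitorParameters("true"); // selection matching all documents
        parameters.setRoute("reindex"); // hypothetical route feeding visited documents back through the indexing cluster
        VisitorSession session = access.createVisitorSession(parameters);
        session.waitUntilDone(0); // wait for the visiting job to complete
        session.destroy();
        access.shutdown();
    }

}
```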
-## Global dynamic tensors
-
-**Effort:** High<br/>
-**Difficulty:** Master<br/>
-**Skills:** Java, C++, distributed systems, performance, networking, distributed consistency
-
-Tensors in ranking models may either be passed with the query, be part of the document or be configured as part of the
-application package (global tensors). This is fine for many kinds of models but does not support the case of really
-large tensors (which barely fit in memory) and/or dynamically changing tensors (online learning of global models).
-These use cases require support for global tensors (tensors available locally on all content nodes during execution
-but not sent with the query or residing in documents) which are not configured as part of the application package but
-which are written independently and are dynamically updateable at a high write rate. To support this at large scale with a
-high write rate, we need a small cluster of nodes that stores the source of truth of the global tensor with perfect
-consistency. This cluster must in turn push updates to all content nodes in a best-effort fashion given a fixed bandwidth
-budget, such that query execution and document write traffic are prioritized over ensuring perfect consistency of global
-model updates.
-
## Java implementation of the content layer for testing
**Effort:** Medium<br/>
@@ -109,14 +74,51 @@ libraries (see the searchlib module).
Support "update where" operations which changes/removes all documents matching some document selection expression. This
entails adding a new document API operation and probably supporting continuations similar to visiting.
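No such operation exists today; a hypothetical shape for it could look roughly like the sketch below, where both the
interface and the continuation type are invented purely for illustration:

```java
import com.yahoo.document.DocumentUpdate;

// Hypothetical API sketch: neither UpdateWhereSession nor Continuation exists in the
// document API today; both are invented here to illustrate the proposed operation.
public interface UpdateWhereSession {

    /** Opaque token for resuming a partially completed operation, similar to visiting continuations. */
    interface Continuation {}

    /**
     * Applies the given update to every document matching the selection expression,
     * e.g. "music.year < 2000". Returns a continuation if the operation was suspended.
     */
    Continuation updateWhere(String documentSelection, DocumentUpdate update, Continuation resumeFrom);

    /** Removes every document matching the selection expression. */
    Continuation removeWhere(String documentSelection, Continuation resumeFrom);
}
```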
-## Query tracing including content nodes
+## Indexed search in maps
+
+**Effort:** Medium<br/>
+**Difficulty:** Medium<br/>
+**Skills:** C++, Java, multithreading, performance, indexing, data structures
+
+Vespa supports maps, and making them searchable in memory by declaring them as an attribute.
+However, maps cannot be indexed as text-search disk indexes.
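+
+A sketch of what is possible today, with illustrative field names: each struct-field of the map can be declared as an
+in-memory attribute, while there is no corresponding way to request a disk index for the field:
+
+```
+field my_map type map<string, string> {
+    indexing: summary
+    struct-field key   { indexing: attribute }
+    struct-field value { indexing: attribute }
+}
+```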
+
+## Global writes
+
+**Effort:** Large<br/>
+**Difficulty:** Master<br/>
+**Skills:** C++, Java, distributed systems, performance, multithreading, network, distributed consistency
+
+Vespa instances distribute data automatically within clusters, but these clusters are meant to consist of co-located
+machines - the distribution algorithm is not suitable for global distribution across datacenters because it cannot
+seamlessly tolerate datacenter-wide outages and does not attempt to minimize bandwidth usage between datacenters.
+Applications usually achieve global presence instead by setting up multiple independent instances in different
+datacenters and writing to all of them in parallel. This is robust and works well on average, but puts an additional burden on
+applications to achieve cross-datacenter data consistency on datacenter failures, and does not enable automatic
+data recovery across datacenters, such that data redundancy is effectively required within each datacenter.
+This is fine in most cases, but not in the case where storage space drives cost and intermittent loss of data coverage
+(completeness as seen from queries) is tolerable.
+
+A solution should sustain current write rates (tens of thousands of writes per node per second), sustain write and read
+rates on loss of connectivity to one (any) datacenter, re-establish global data consistency when a lost datacenter is
+recovered, and support some degree of tradeoff between consistency and operation latency (although the exact modes to be
+supported are part of the design and analysis needed).
+
+## Global dynamic tensors
+
+**Effort:** High<br/>
+**Difficulty:** Master<br/>
+**Skills:** Java, C++, distributed systems, performance, networking, distributed consistency
+
+Tensors in ranking models may either be passed with the query, be part of the document or be configured as part of the
+application package (global tensors). This is fine for many kinds of models but does not support the case of really
+large tensors (which barely fit in memory) and/or dynamically changing tensors (online learning of global models).
+These use cases require support for global tensors (tensors available locally on all content nodes during execution
+but not sent with the query or residing in documents) which are not configured as part of the application package but
+which are written independently and are dynamically updateable at a high write rate. To support this at large scale with a
+high write rate, we need a small cluster of nodes that stores the source of truth of the global tensor with perfect
+consistency. This cluster must in turn push updates to all content nodes in a best-effort fashion given a fixed bandwidth
+budget, such that query execution and document write traffic are prioritized over ensuring perfect consistency of global
+model updates.
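+
+For context, a ranking expression can reference the three existing tensor sources like this (the names here are
+illustrative); the global dynamic tensors proposed above would become a fourth source with a similar lookup syntax:
+
+```
+rank-profile example {
+    first-phase {
+        # query(user_profile):     sent with the query
+        # attribute(embedding):    stored in each document
+        # constant(model_weights): configured in the application package
+        expression: sum(query(user_profile) * attribute(embedding) * constant(model_weights))
+    }
+}
+```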
-**Effort:** Low<br/>
-**Difficulty:** Low<br/>
-**Skills:** Java, C++, multithreading
-Currently, trace information can be requested for a given query by adding tracelevel=N to the query. This is useful for
-debugging as well as for understanding performance bottlenecks. However, the trace information only includes execution in
-the container, not in the content nodes. This task is to implement similar tracing capabilities in the search core and
-integrate trace information from each content node into the container-level trace. This would make it easier to
-understand the execution and performance consequences of various query expressions.