author    Jon Bratseth <bratseth@verizonmedia.com>  2019-02-18 13:58:02 +0100
committer Jon Bratseth <bratseth@verizonmedia.com>  2019-02-18 13:58:02 +0100
commit    a2eb3041fbcdc53be8f04c190faccad62f20077b (patch)
tree      73e5174e4f4a7e476f6e2f05130ba59e28938f0e /TODO.md
parent    6c83d19a6d60068d50fc952d358d023c5bcd2fdb (diff)
Sort by complexity
Diffstat (limited to 'TODO.md')
-rw-r--r--  TODO.md  112
1 file changed, 57 insertions(+), 55 deletions(-)
diff --git a/TODO.md b/TODO.md
index e9c8a76ae6e..74cbdf7c6ca 100644
--- a/TODO.md
+++ b/TODO.md
@@ -4,39 +4,21 @@
This lists some possible improvements to Vespa which have been considered or requested, which can be developed
relatively independently of other work, and which are not yet under development.
-## Global writes
-
-**Effort:** Large<br/>
-**Difficulty:** Master<br/>
-**Skills:** C++, Java, distributed systems, performance, multithreading, network, distributed consistency
-
-Vespa instances distribute data automatically within clusters, but these clusters are meant to consist of co-located
-machines - the distribution algorithm is not suitable for global distribution across datacenters because it cannot
-seamlessly tolerate datacenter-wide outages and does not attempt to minimize bandwidth usage between datacenters.
-Applications usually achieve global presence instead by setting up multiple independent instances in different
-datacenters and writing to all of them in parallel. This is robust and works well on average, but puts an additional burden on
-applications to achieve cross-datacenter data consistency on datacenter failures, and does not enable automatic
-data recovery across datacenters, such that data redundancy is effectively required within each datacenter.
-This is fine in most cases, but not in the case where storage space drives cost and intermittent loss of data coverage
-(completeness as seen from queries) is tolerable.
-
-A solution should sustain current write rates (tens of thousands of writes per node per second), sustain write and read
-rates on loss of connectivity to one (any) datacenter, re-establish global data consistency when a lost datacenter is
-recovered, and support some degree of tradeoff between consistency and operation latency (although the exact modes to be
-supported are part of the design and analysis needed).
-
-## Indexed search in maps
+## Query tracing including content nodes
-**Effort:** Medium<br/>
-**Difficulty:** Medium<br/>
-**Skills:** C++, Java, multithreading, performance, indexing, data structures
+**Effort:** Low<br/>
+**Difficulty:** Low<br/>
+**Skills:** Java, C++, multithreading
-Vespa supports maps, and making them searchable in memory by declaring them as an attribute.
-However, maps cannot be indexed as text-search disk indexes.
+Currently, trace information can be requested for a given query by adding tracelevel=N to the query. This is useful for
+debugging as well as for understanding performance bottlenecks. However, the trace information only includes execution in
+the container, not in the content nodes. This task is to implement similar tracing capabilities in the search core and
+integrate trace information from each content node into the container-level trace. This would make it easier to
+understand the execution and performance consequences of various query expressions.
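+
+For example, assuming a container running on localhost:8080 and a placeholder query term, a trace can be requested by
+adding the tracelevel parameter to any query (today the returned trace stops at the container; this task would extend
+it with execution and timing information from each content node):
+
+```
+http://localhost:8080/search/?query=foo&tracelevel=3
+```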
## Change search protocol from fnet to RPC
-**Effort:** Small<br/>
+**Effort:** Low<br/>
**Difficulty:** Low<br/>
**Skills:** Java, C++, networking
@@ -47,7 +29,7 @@ The largest part of this work is to encode the Query object as a Slime structure
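A minimal sketch of that encoding step using Vespa's Slime library; the field names chosen here are illustrative
assumptions, not a definition of the actual wire format:

```java
import com.yahoo.slime.BinaryFormat;
import com.yahoo.slime.Cursor;
import com.yahoo.slime.Slime;

public class QueryEncodingSketch {

    // Encodes a few representative query properties into a Slime structure and
    // serializes it to the compact binary form, suitable as an RPC payload.
    public static byte[] encode(String queryString, int hits, int offset) {
        Slime slime = new Slime();
        Cursor root = slime.setObject();
        root.setString("query", queryString); // illustrative field names; the real
        root.setLong("hits", hits);           // encoding would mirror the Query object model
        root.setLong("offset", offset);
        return BinaryFormat.encode(slime);
    }

}
```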
## Support query profiles for document processors
-**Effort:** Small<br/>
+**Effort:** Low<br/>
**Difficulty:** Low<br/>
**Skills:** Java
@@ -70,23 +52,6 @@ which is inconvenient and suboptimal. This is to support (scheduled or triggered
This can be achieved by configuring a message bus route which feeds content from a cluster back to itself through the
indexing container cluster and triggering a visiting job using this route.
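A sketch of the triggering side using the Java document API; the route name "reindex" and the catch-all selection are
assumptions here, and the route itself would need to be defined in the application's configuration:

```java
import com.yahoo.documentapi.DocumentAccess;
import com.yahoo.documentapi.VisitorParameters;
import com.yahoo.documentapi.VisitorSession;

public class ReindexingSketch {

    public static void main(String[] args) throws Exception {
        DocumentAccess access = DocumentAccess.createDefault();
        VisitorParameters parameters = new VisitorParameters("true"); // selection matching all documents
        parameters.setRoute("reindex"); // hypothetical route feeding visited documents back through the indexing cluster
        VisitorSession session = access.createVisitorSession(parameters);
        session.waitUntilDone(0); // wait for the visiting job to complete
        session.destroy();
        access.shutdown();
    }

}
```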
-## Global dynamic tensors
-
-**Effort:** High<br/>
-**Difficulty:** Master<br/>
-**Skills:** Java, C++, distributed systems, performance, networking, distributed consistency
-
-Tensors in ranking models may either be passed with the query, be part of the document or be configured as part of the
-application package (global tensors). This is fine for many kinds of models but does not support the case of really
-large tensors (which barely fit in memory) and/or dynamically changing tensors (online learning of global models).
-These use cases require support for global tensors (tensors available locally on all content nodes during execution
-but not sent with the query or residing in documents) which are not configured as part of the application package but
-which are written independently and are dynamically updateable at a high write rate. To support this at large scale with a
-high write rate, we need a small cluster of nodes that stores the source of truth of the global tensor with perfect
-consistency. This cluster must in turn push updates to all content nodes in a best-effort fashion given a fixed bandwidth
-budget, such that query execution and document write traffic are prioritized over ensuring perfect consistency of global
-model updates.
-
## Java implementation of the content layer for testing
**Effort:** Medium<br/>
@@ -109,14 +74,51 @@ libraries (see the searchlib module).
Support "update where" operations which changes/removes all documents matching some document selection expression. This
entails adding a new document API operation and probably supporting continuations similar to visiting.
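No such operation exists today; a hypothetical shape for it could look roughly like the sketch below, where both the
interface and the continuation type are invented purely for illustration:

```java
import com.yahoo.document.DocumentUpdate;

// Hypothetical API sketch: neither UpdateWhereSession nor Continuation exists in the
// document API today; both are invented here to illustrate the proposed operation.
public interface UpdateWhereSession {

    /** Opaque token for resuming a partially completed operation, similar to visiting continuations. */
    interface Continuation {}

    /**
     * Applies the given update to every document matching the selection expression,
     * e.g. "music.year < 2000". Returns a continuation if the operation was suspended.
     */
    Continuation updateWhere(String documentSelection, DocumentUpdate update, Continuation resumeFrom);

    /** Removes every document matching the selection expression. */
    Continuation removeWhere(String documentSelection, Continuation resumeFrom);
}
```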
-## Query tracing including content nodes
+## Indexed search in maps
+
+**Effort:** Medium<br/>
+**Difficulty:** Medium<br/>
+**Skills:** C++, Java, multithreading, performance, indexing, data structures
+
+Vespa supports maps, and making them searchable in memory by declaring them as an attribute.
+However, maps cannot be indexed as text-search disk indexes.
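+
+A sketch of what is possible today, with illustrative field names: each struct-field of the map can be declared as an
+in-memory attribute, while there is no corresponding way to request a disk index for the field:
+
+```
+field my_map type map<string, string> {
+    indexing: summary
+    struct-field key   { indexing: attribute }
+    struct-field value { indexing: attribute }
+}
+```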
+
+## Global writes
+
+**Effort:** Large<br/>
+**Difficulty:** Master<br/>
+**Skills:** C++, Java, distributed systems, performance, multithreading, network, distributed consistency
+
+Vespa instances distribute data automatically within clusters, but these clusters are meant to consist of co-located
+machines - the distribution algorithm is not suitable for global distribution across datacenters because it cannot
+seamlessly tolerate datacenter-wide outages and does not attempt to minimize bandwidth usage between datacenters.
+Applications usually achieve global presence instead by setting up multiple independent instances in different
+datacenters and writing to all of them in parallel. This is robust and works well on average, but puts an additional burden on
+applications to achieve cross-datacenter data consistency on datacenter failures, and does not enable automatic
+data recovery across datacenters, such that data redundancy is effectively required within each datacenter.
+This is fine in most cases, but not in the case where storage space drives cost and intermittent loss of data coverage
+(completeness as seen from queries) is tolerable.
+
+A solution should sustain current write rates (tens of thousands of writes per node per second), sustain write and read
+rates on loss of connectivity to one (any) datacenter, re-establish global data consistency when a lost datacenter is
+recovered, and support some degree of tradeoff between consistency and operation latency (although the exact modes to be
+supported are part of the design and analysis needed).
+
+## Global dynamic tensors
+
+**Effort:** High<br/>
+**Difficulty:** Master<br/>
+**Skills:** Java, C++, distributed systems, performance, networking, distributed consistency
+
+Tensors in ranking models may either be passed with the query, be part of the document or be configured as part of the
+application package (global tensors). This is fine for many kinds of models but does not support the case of really
+large tensors (which barely fit in memory) and/or dynamically changing tensors (online learning of global models).
+These use cases require support for global tensors (tensors available locally on all content nodes during execution
+but not sent with the query or residing in documents) which are not configured as part of the application package but
+which are written independently and are dynamically updateable at a high write rate. To support this at large scale with a
+high write rate, we need a small cluster of nodes that stores the source of truth of the global tensor with perfect
+consistency. This cluster must in turn push updates to all content nodes in a best-effort fashion given a fixed bandwidth
+budget, such that query execution and document write traffic are prioritized over ensuring perfect consistency of global
+model updates.
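+
+For context, a ranking expression can reference the three existing tensor sources like this (the names here are
+illustrative); the global dynamic tensors proposed above would become a fourth source with a similar lookup syntax:
+
+```
+rank-profile example {
+    first-phase {
+        # query(user_profile):     sent with the query
+        # attribute(embedding):    stored in each document
+        # constant(model_weights): configured in the application package
+        expression: sum(query(user_profile) * attribute(embedding) * constant(model_weights))
+    }
+}
+```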
-**Effort:** Low<br/>
-**Difficulty:** Low<br/>
-**Skills:** Java, C++, multithreading
-Currently, trace information can be requested for a given query by adding tracelevel=N to the query. This is useful for
-debugging as well as for understanding performance bottlenecks. However, the trace information only includes execution in
-the container, not in the content nodes. This task is to implement similar tracing capabilities in the search core and
-integrate trace information from each content node into the container-level trace. This would make it easier to
-understand the execution and performance consequences of various query expressions.