author Kristian Aune <kraune@verizonmedia.com> 2023-09-20 16:10:41 +0200
committer Kristian Aune <kraune@verizonmedia.com> 2023-09-20 16:10:41 +0200
commit 5c65fa5021d3fc3ed0c204473e2ef8c1c7f2cef4
tree 583218e074a4b328e7968d62b57d61fc1d8ac384
parent 09a2a7c21a1ca82a4ca56a449283217fedf8125b
Fix links and HTML errors
-rw-r--r-- CONTRIBUTING.md 40
-rw-r--r-- Code-map.md 35
-rw-r--r-- README.md 17
-rw-r--r-- TODO.md 75
-rw-r--r-- screwdriver.yaml 2
5 files changed, 79 insertions, 90 deletions
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 9e18ffbf487..b96c7ee8a60 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -2,41 +2,41 @@
# Contributing to Vespa
-Contributions to [Vespa](http://github.com/vespa-engine/vespa),
-[Vespa system tests](http://github.com/vespa-engine/system-test),
+Contributions to [Vespa](https://github.com/vespa-engine/vespa),
+[Vespa system tests](https://github.com/vespa-engine/system-test),
[Vespa samples](https://github.com/vespa-engine/sample-apps)
-and the [Vespa documentation](http://github.com/vespa-engine/documentation) are welcome.
+and the [Vespa documentation](https://github.com/vespa-engine/documentation) are welcome.
This documents tells you what you need to know to contribute.
## Open development
-All work on Vespa happens directly on Github,
-using the [Github flow model](https://guides.github.com/introduction/flow/).
-We release the master branch four times a week and you should expect it to always work.
+All work on Vespa happens directly on GitHub,
+using the [GitHub flow model](https://docs.github.com/en/get-started/quickstart/github-flow).
+We release the master branch four times a week, and you should expect it to always work.
The continuous build of Vespa is at [https://factory.vespa.oath.cloud](https://factory.vespa.oath.cloud).
You can follow the fate of each commit there.
-All pull requests must be approved by a
+All pull requests must be approved by a
[Vespa Committer](https://github.com/orgs/vespa-engine/people).
You can find a suitable reviewer in the OWNERS file upward in the source tree from
where you are making the change (OWNERS have a special responsibility for
ensuring the long-term integrity of a portion of the code).
-The way to become a committer (and OWNER) is to make some quality contributions
+The way to become a committer (and OWNER) is to make some quality contributions
to an area of the code. See [GOVERNANCE](GOVERNANCE.md) for more details.
### Creating a Pull Request
-Please follow
-[best practices](https://github.com/trein/dev-best-practices/wiki/Git-Commit-Best-Practices)
+Please follow
+[best practices](https://github.com/trein/dev-best-practices/wiki/Git-Commit-Best-Practices)
for creating git commits.
-When your code is ready to be submitted,
-[submit a pull request](https://help.github.com/articles/creating-a-pull-request/)
+When your code is ready to be submitted,
+[submit a pull request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request)
to request a code review.
-We only seek to accept code that you are authorized to contribute to the project.
-We have added a pull request template on our projects so that your contributions are made
+We only seek to accept code that you are authorized to contribute to the project.
+We have added a pull request template on our projects so that your contributions are made
with the following confirmation:
> I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.
@@ -44,15 +44,15 @@ with the following confirmation:
## Versioning
Vespa uses semantic versioning - see
-[vespa versions](http://docs.vespa.ai/en/vespa-versions.html).
+[vespa versions](https://vespa.ai/releases#versions).
Notice in particular that any Java API in a package having a @PublicAPI
annotation in the package-info, and no @Beta annotation on the class,
-cannot be changed in an incompatible way between major versions:
+cannot be changed in an incompatible way between major versions:
Existing types and method signatures must be preserved
(but can be marked deprecated).
We verify ABI compatibility during the regular Java build you'll run with Maven (mvn install).
-This build step will also fail if you *add* to public API's, which is fine if there's a good reason
+This build step will also fail if you _add_ to public APIs, which is fine if there's a good reason
to do it. In that case update the ABI spec as instructed in the error message.
## Issues
@@ -60,13 +60,13 @@ to do it. In that case update the ABI spec as instructed in the error message.
We track issues in [GitHub issues](https://github.com/vespa-engine/vespa/issues).
It is fine to submit issues also for feature requests and ideas, whether or not you intend to work on them.
-There is also a [ToDo list](TODO.md) for larger things nobody are working on yet.
+There is also a [ToDo list](TODO.md) for larger things nobody is working on yet.
## Community
-If you have questions, want to share your experience or help others,
+If you have questions, want to share your experience or help others,
join our [Slack channel](http://slack.vespa.ai).
-See also [Stack Overflow questions tagged Vespa](http://stackoverflow.com/questions/tagged/vespa),
+See also [Stack Overflow questions tagged Vespa](https://stackoverflow.com/questions/tagged/vespa),
and feel free to add your own.
### Getting started
diff --git a/Code-map.md b/Code-map.md
index 17c27327c5a..ffa72290094 100644
--- a/Code-map.md
+++ b/Code-map.md
@@ -5,11 +5,11 @@
You want to get familiar with the Vespa code base but don't know where to start?
Vespa consists of about 1.7 million lines of code, about equal parts Java and C++.
-Since it it's mostly written by a team of developers selected for their ability
-to do this kind of thing unusually well, who have been given time to dedicate
-themselves to it for a long time, it is mostly easily to work with. However, one
+Since it is mostly written by a team of developers selected for their ability
+to do this kind of thing unusually well, who have been given time to dedicate
+themselves to it for a long time, it is mostly easily to work with. However, one
thing we haven't done is to create a module structure friendly to newcomers - the code
-simply organized in a flat structure of about 150 modules.
+simply organized in a flat structure of about 150 modules.
This document aims to provide a map from the
[functional elements](https://docs.vespa.ai/en/overview.html)
@@ -18,22 +18,21 @@ of Vespa to the most important modules in the flat module structure in the
![Code map](Code-map.png)
-It covers the modules you are most likely to encounter as a developer.
-The rest are either small and needed for technical reasons or doing one thing
-which should be self-explanatory, or implementing the cloud service run by the
-Vespa team which we don't expect anybody else to run and therefore be interested
+It covers the modules you are most likely to encounter as a developer.
+The rest are either small and needed for technical reasons or doing one thing
+which should be self-explanatory, or implementing the cloud service run by the
+Vespa team which we don't expect anybody else to run and therefore be interested
in changing.
-
## The stateless container
When a request is made to Vespa it first enters some stateless container cluster,
called jDisc. This consists of:
-- a __jDisc core__ layer which provides a model of a running application, general protocol-independent request-response handling, with various protocol implementations,
-- a __jDisc container__ layer providing component management, configuration and similar.
-- a __search middleware__ layer containing query/result API's, query execution logic etc.
-- API's and modules for writing and processing document operations.
+- a **jDisc core** layer which provides a model of a running application, general protocol-independent request-response handling, with various protocol implementations,
+- a **jDisc container** layer providing component management, configuration and similar.
+- a **search middleware** layer containing query/result APIs, query execution logic etc.
+- APIs and modules for writing and processing document operations.
The stateless container is implemented in Java.
@@ -58,7 +57,7 @@ Document operation modules:
- [container-messagebus](https://github.com/vespa-engine/vespa/tree/master/container-messagebus) - MessageBus connector for jDisc.
- [documentapi](https://github.com/vespa-engine/vespa/tree/master/documentapi) - API for issuing document operations to Vespa over messagebus.
- [docproc](https://github.com/vespa-engine/vespa/tree/master/docproc) - chainable document (operation) processors: Document operations issued over messagebus to Vespa will usually be routed through a container running a document processor chain.
-- [indexinglanguage](https://github.com/vespa-engine/vespa/tree/master/indexinglanguage) - implementation of the "indexing" language which is used to express the statements prefixed by "indexing:" in the search definition.
+- [indexinglanguage](https://github.com/vespa-engine/vespa/tree/master/indexinglanguage) - implementation of the "indexing" language which is used to express the statements prefixed by "indexing:" in the search definition.
- [docprocs](https://github.com/vespa-engine/vespa/tree/master/docprocs) - document processor components bundled with Vespa. Notably the Indexingprocessor - a document processor invoking the indexing language statements configured for the document type in question on document operations.
- [vespaclient-container-plugin](https://github.com/vespa-engine/vespa/tree/master/vespaclient-container-plugin) - implements the document/v1 API and internal API used by the Java HTTP client on top of the jDisc container, forwarding to the Document API.
- [vespa-feed-client](https://github.com/vespa-engine/vespa/tree/master/vespa-feed-client) - client for fast writing to the internal API implemented by vespaclient-container-plugin.
@@ -72,9 +71,8 @@ This is written in C++.
- [searchlib](https://github.com/vespa-engine/vespa/tree/master/searchlib) - libraries invoked by searchcore: Ranking (feature execution framework (fef), rank feature implementations, ranking expressions), index and btree implementations, attributes (forward indexes) etc. In addition, this contains the Java libraries for ranking.
- [storage](https://github.com/vespa-engine/vespa/tree/master/storage/src/vespa/storage) - system for elastic and auto-recovering data storage over clusters of nodes.
- [eval](https://github.com/vespa-engine/vespa/tree/master/eval) - library for efficient evaluation of ranking expressions. Tensor API and implementation.
-- [storageapi](https://github.com/vespa-engine/vespa/tree/master/storageapi/src/vespa/storageapi) - message bus messages and implementation for the document API.
-- [clustercontroller-core](https://github.com/vespa-engine/vespa/tree/master/clustercontroller-core) - cluster controller for storage, implemented in Java. This provides singular node-level decision making for storage, based on ZooKeeper.
-
+- [storageapi](https://github.com/vespa-engine/vespa/tree/master/storage/src/vespa/storageapi) - message bus messages and implementation for the document API.
+- [clustercontroller-core](https://github.com/vespa-engine/vespa/tree/master/clustercontroller-core) - cluster controller for storage, implemented in Java. This provides singular node-level decision-making for storage, based on ZooKeeper.
## Configuration and administration
@@ -94,6 +92,3 @@ Libraries used throughput the code.
- [vespalib](https://github.com/vespa-engine/vespa/tree/master/vespalib) - general utility library for C++
- [vespajlib](https://github.com/vespa-engine/vespa/tree/master/vespajlib) - general utility library for Java. Includes the Java implementation of the tensor library.
-
-
-
diff --git a/README.md b/README.md
index df9e1109a2b..a684aae70dc 100644
--- a/README.md
+++ b/README.md
@@ -8,12 +8,12 @@ over big data at serving time.
This is the primary repository for Vespa where all development is happening.
New production releases from this repository's master branch are made each weekday from Monday through Thursday.
-* Home page: [https://vespa.ai](https://vespa.ai)
-* Documentation: [https://docs.vespa.ai](https://docs.vespa.ai)
-* Continuous build: [https://factory.vespa.oath.cloud](https://factory.vespa.oath.cloud)
-* Run applications in the cloud for free: [https://cloud.vespa.ai](https://cloud.vespa.ai)
+- Home page: [https://vespa.ai](https://vespa.ai)
+- Documentation: [https://docs.vespa.ai](https://docs.vespa.ai)
+- Continuous build: [https://factory.vespa.oath.cloud](https://factory.vespa.oath.cloud)
+- Run applications in the cloud for free: [https://cloud.vespa.ai](https://cloud.vespa.ai)
-Vespa build status: [![Vespa Build Status](https://cd.screwdriver.cd/pipelines/6386/build-vespa/badge)](https://cd.screwdriver.cd/pipelines/6386)
+Vespa build status: [![Vespa Build Status](https://api.screwdriver.cd/v4/pipelines/6386/build-vespa/badge)](https://cd.screwdriver.cd/pipelines/6386)
## Table of contents
@@ -31,17 +31,15 @@ evaluate machine-learned models over the selected data, organize and aggregate i
than 100 milliseconds, all while the data corpus is continuously changing.
This is hard to do, especially with large data sets that needs to be distributed over multiple nodes and evaluated in
-parallel. Vespa is a platform which performs these operations for you with high availability and performance.
+parallel. Vespa is a platform which performs these operations for you with high availability and performance.
It has been in development for many years and is used on a number of large internet services and apps which serve
hundreds of thousands of queries from Vespa per second.
-
## Install
Run your own Vespa instance: [https://docs.vespa.ai/en/getting-started.html](https://docs.vespa.ai/en/getting-started.html)
Or deploy your Vespa applications to the cloud service: [https://cloud.vespa.ai](https://cloud.vespa.ai)
-
## Usage
- The application created in the getting started guide is fully functional and production ready, but you may want to [add more nodes](https://docs.vespa.ai/en/multinode-systems.html) for redundancy.
@@ -52,7 +50,6 @@ Or deploy your Vespa applications to the cloud service: [https://cloud.vespa.ai]
Full documentation is at [https://docs.vespa.ai](https://docs.vespa.ai).
-
## Contribute
We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) to learn how to contribute.
@@ -60,7 +57,6 @@ We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) to learn how to
If you want to contribute to the documentation, see
[https://github.com/vespa-engine/documentation](https://github.com/vespa-engine/documentation)
-
## Building
You do not need to build Vespa to use it, but if you want to contribute you need to be able to build the code.
@@ -83,7 +79,6 @@ for building Vespa, running unit tests and running system tests:
Use this if you only need to build the Java modules, otherwise follow the complete development guide above.
-
## License
Code licensed under the Apache 2.0 license. See [LICENSE](LICENSE) for terms.
diff --git a/TODO.md b/TODO.md
index 34d69f598d5..392c17fe616 100644
--- a/TODO.md
+++ b/TODO.md
@@ -1,29 +1,27 @@
<!-- Copyright Yahoo. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root. -->
+
# List of possible future enhancements and features
-This lists some possible improvements to Vespa which have been considered or requested, can be developed relatively
+This lists some possible improvements to Vespa which have been considered or requested, can be developed relatively
independently of other work, and are not yet under development. For more information on the code structure in Vespa, see
[Code-map.md](Code-map.md).
-
## Support query profiles for document processors
**Effort:** Low<br/>
**Difficulty:** Low<br/>
**Skills:** Java
-Query profiles make it simple to support multiple buckets, behavior profiles for different use cases etc by providing
-bundles of parameters accessible to Searchers processing queries. Writes go through a similar chain of processors -
-Document Processors, but have no equivalent support for parametrization. This is to allow configuration of document
+Query profiles make it simple to support multiple buckets, behavior profiles for different use cases etc. by providing
+bundles of parameters accessible to Searchers processing queries. Writes go through a similar chain of processors -
+Document Processors, but have no equivalent support for parametrization. This is to allow configuration of document
processor profiles by reusing the query profile support also for document processors.
-See [slack discussion](https://vespatalk.slack.com/archives/C01QNBPPNT1/p1624176344102300) for more details.
-
**Code pointers:**
+
- [Query profiles](https://github.com/vespa-engine/vespa/blob/master/container-search/src/main/java/com/yahoo/search/query/profile/QueryProfile.java)
- [Document processors](https://github.com/vespa-engine/vespa/blob/master/docproc/src/main/java/com/yahoo/docproc/DocumentProcessor.java)
-
## Java implementation of the content layer for testing
**Effort:** Medium<br/>
@@ -31,16 +29,16 @@ See [slack discussion](https://vespatalk.slack.com/archives/C01QNBPPNT1/p1624176
**Skills:** Java
There is currently support for creating Application instances programmatically in Java to unit test application package
-functionality (see com.yahoo.application.Application). However, only Java component functionality can be tested in this
-way as the content layer is not available, being implemented in C++. A Java implementation, of some or all of the
-functionality would enable developers to do more testing locally within their IDE. This is medium effort because
-performance is not a concern and some components, such as ranking expressions and features are already available as
+functionality (see com.yahoo.application.Application). However, only Java component functionality can be tested in this
+way as the content layer is not available, being implemented in C++. A Java implementation, of some or all of the
+functionality would enable developers to do more testing locally within their IDE. This is medium effort because
+performance is not a concern and some components, such as ranking expressions and features are already available as
libraries (see the searchlib module).
**Code pointers:**
-- Content cluster mock in Java (currently empy): [ContentCluster](https://github.com/vespa-engine/vespa/blob/master/application/src/main/java/com/yahoo/application/content/ContentCluster.java)
-- The model of a search definition this must consume config from: [Search](https://github.com/vespa-engine/vespa/blob/master/config-model/src/main/java/com/yahoo/searchdefinition/Search.java)
+- Content cluster mock in Java (currently empy): [ContentCluster](https://github.com/vespa-engine/vespa/blob/master/application/src/main/java/com/yahoo/application/content/ContentCluster.java)
+- The model of a search definition this must consume config from: [Search](https://github.com/vespa-engine/vespa/blob/master/config-model/src/main/java/com/yahoo/searchdefinition/Search.java)
## Indexed search in maps
@@ -48,12 +46,12 @@ libraries (see the searchlib module).
**Difficulty:** Medium<br/>
**Skills:** C++, multithreading, performance, indexing, data structures
-Vespa supports maps and and making them searchable in memory by declaring as an attribute.
-However, maps cannot be indexed as text-search disk indexes.
+Vespa supports maps and making them searchable in memory by declaring as an attribute.
+However, maps cannot be indexed as text-search disk indexes.
**Code pointers:**
-- [Current text indexes](https://github.com/vespa-engine/vespa/tree/master/searchlib/src/vespa/searchlib/index)
+- [Current text indexes](https://github.com/vespa-engine/vespa/tree/master/searchlib/src/vespa/searchlib/index)
## Global writes
@@ -61,24 +59,24 @@ However, maps cannot be indexed as text-search disk indexes.
**Difficulty:** High<br/>
**Skills:** C++, Java, distributed systems, performance, multithreading, network, distributed consistency
-Vespa instances distribute data automatically within clusters, but these clusters are meant to consist of co-located
-machines - the distribution algorithm is not suitable for global distribution across datacenters because it cannot
+Vespa instances distribute data automatically within clusters, but these clusters are meant to consist of co-located
+machines - the distribution algorithm is not suitable for global distribution across datacenters because it cannot
seamlessly tolerate datacenter-wide outages and does not attempt to minimize bandwidth usage between datacenters.
-Application usually achieve global precense instead by setting up multiple independent instances in different
-datacenters and write to all in parallel. This is robust and works well on average, but puts additional burden on
-applications to achieve cross-datacenter data consistency on datacenter failures, and does not enable automatic
-data recovery across datacenters, such that data redundancy is effectively required within each datacenter.
-This is fine in most cases, but not in the case where storage space drives cost and intermittent loss of data coverage
+Application usually achieve global presence instead by setting up multiple independent instances in different
+datacenters and write to all in parallel. This is robust and works well on average, but puts additional burden on
+applications to achieve cross-datacenter data consistency on datacenter failures, and does not enable automatic
+data recovery across datacenters, such that data redundancy is effectively required within each datacenter.
+This is fine in most cases, but not in the case where storage space drives cost and intermittent loss of data coverage
(completeness as seen from queries) is tolerable.
-A solution should sustain current write rates (tens of thousands of writes per ndoe per second), sustain write and read
-rates on loss of connectivity to one (any) data center, re-establish global data consistency when a lost datacenter is
-recovered and support some degree of tradeoff between consistency and operation latency (although the exact modes to be
+A solution should sustain current write rates (tens of thousands of writes per node per second), sustain write and read
+rates on loss of connectivity to one (any) data center, re-establish global data consistency when a lost datacenter is
+recovered and support some degree of tradeoff between consistency and operation latency (although the exact modes to be
supported is part of the design and analysis needed).
**Code pointers:**
-- [Document API](https://github.com/vespa-engine/vespa/tree/master/documentapi/src/main/java/com/yahoo/documentapi)
+- [Document API](https://github.com/vespa-engine/vespa/tree/master/documentapi/src/main/java/com/yahoo/documentapi)
## Global dynamic tensors
@@ -86,20 +84,20 @@ supported is part of the design and analysis needed).
**Difficulty:** High<br/>
**Skills:** Java, C++, distributed systems, performance, networking, distributed consistency
-Tensors in ranking models may either be passed with the query, be part of the document or be configured as part of the
-application package (global tensors). This is fine for many kinds of models but does not support the case of really
-large tensors (which barely fit in memory) and/or dynamically changing tensors (online learning of global models).
-These use cases require support for global tensors (tensors available locally on all content nodes during execution
-but not sent with the query or residing in documents) which are not configured as part of the application package but
-which are written independently and dynamically updateable at a high write rate. To support this at large scale, with a
-high write rate, we need a small cluster of nodes storing the source of truth of the global tensor and which have
-perfect consistency. This in turn must push updates to all content nodes in a best effort fashion given a fixed bandwidth
+Tensors in ranking models may either be passed with the query, be part of the document or be configured as part of the
+application package (global tensors). This is fine for many kinds of models but does not support the case of really
+large tensors (which barely fit in memory) and/or dynamically changing tensors (online learning of global models).
+These use cases require support for global tensors (tensors available locally on all content nodes during execution
+but not sent with the query or residing in documents) which are not configured as part of the application package but
+which are written independently and dynamically update-able at a high write rate. To support this at large scale, with a
+high write rate, we need a small cluster of nodes storing the source of truth of the global tensor and which have
+perfect consistency. This in turn must push updates to all content nodes in a best-effort fashion given a fixed bandwidth
budget, such that query execution and document write traffic is prioritized over ensuring perfect consistency of global
model updates.
**Code pointers:**
-- Tensor modify operation (for document tensors): [Java](https://github.com/vespa-engine/vespa/blob/master/document/src/main/java/com/yahoo/document/update/TensorModifyUpdate.java), [C++](https://github.com/vespa-engine/vespa/blob/master/document/src/vespa/document/update/tensor_modify_update.h)
+- Tensor modify operation (for document tensors): [Java](https://github.com/vespa-engine/vespa/blob/master/document/src/main/java/com/yahoo/document/update/TensorModifyUpdate.java), [C++](https://github.com/vespa-engine/vespa/blob/master/document/src/vespa/document/update/tensor_modify_update.h)
## Feed clients in different languages
@@ -115,7 +113,7 @@ throughput using this API to what the undocumented, custom-protocol /feedapi off
this changed with HTTP/2 support in Vespa. The clean design of /document/v1 makes it
easy to interface with from any language and runtime that support HTTP/2.
An implementation currently only exists for Java, and requires a JDK8+ runtime,
-and implementations in other languages are very welcome. The below psuedo-code could
+and implementations in other languages are very welcome. The below pseudocode could
be a starting point for an asynchronous implementation with futures and promises.
Let `http` be an asynchronous HTTP/2 client, which returns a `future` for each request.
@@ -154,4 +152,5 @@ dependents may be added, while the queue is emptied from the head one entry at a
a dependency (`previous`) completes computation. `enqueue` blocks until there is room in the client.
**Code pointers:**
+
- [Java feed client](https://github.com/vespa-engine/vespa/blob/master/vespa-feed-client-api/src/main/java/ai/vespa/feed/client/FeedClient.java)
diff --git a/screwdriver.yaml b/screwdriver.yaml
index b7afeaa72a2..9a291eff813 100644
--- a/screwdriver.yaml
+++ b/screwdriver.yaml
@@ -532,6 +532,6 @@ jobs:
--assume-extension --check-html --check-external-hash --no-enforce-http \
--typhoeus '{"connecttimeout": 10, "timeout": 30, "followlocation": false}' \
--hydra '{"max_concurrency": 1}' \
- --ignore-urls '/localhost:8080/,/127.0.0.1:3000/' \
+ --ignore-urls '/localhost:8080/,/127.0.0.1:3000/,/favicon.svg/' \
--swap-urls '(.*).md:\1.html' \
_site