author Kristian Aune <kkraune@users.noreply.github.com> 2023-09-21 08:42:20 +0200
committer GitHub <noreply@github.com> 2023-09-21 08:42:20 +0200
commit 92d656cb14e33c4aea1677241aa687bdc70d5bc1
tree 87933ae86f7facd0925971906e982d5a7f3e440b
parent 3dc7da716e69046a2504aa59bb944bc33460e71a
parent ddd41ffd14b897d31dc259ff88586ef907628e32
Merge pull request #28586 from vespa-engine/kkraune/fix-links
Kkraune/fix links
-rw-r--r-- CONTRIBUTING.md 40
-rw-r--r-- Code-map.md 35
-rw-r--r-- README.md 17
-rw-r--r-- TODO.md 75
-rw-r--r-- client/README.md 50
-rw-r--r-- document/doc/document-format.html 1247
-rw-r--r-- screwdriver.yaml 2
7 files changed, 835 insertions, 631 deletions
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 9e18ffbf487..b96c7ee8a60 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -2,41 +2,41 @@
# Contributing to Vespa
-Contributions to [Vespa](http://github.com/vespa-engine/vespa),
-[Vespa system tests](http://github.com/vespa-engine/system-test),
+Contributions to [Vespa](https://github.com/vespa-engine/vespa),
+[Vespa system tests](https://github.com/vespa-engine/system-test),
[Vespa samples](https://github.com/vespa-engine/sample-apps)
-and the [Vespa documentation](http://github.com/vespa-engine/documentation) are welcome.
+and the [Vespa documentation](https://github.com/vespa-engine/documentation) are welcome.
This documents tells you what you need to know to contribute.
## Open development
-All work on Vespa happens directly on Github,
-using the [Github flow model](https://guides.github.com/introduction/flow/).
-We release the master branch four times a week and you should expect it to always work.
+All work on Vespa happens directly on GitHub,
+using the [GitHub flow model](https://docs.github.com/en/get-started/quickstart/github-flow).
+We release the master branch four times a week, and you should expect it to always work.
The continuous build of Vespa is at [https://factory.vespa.oath.cloud](https://factory.vespa.oath.cloud).
You can follow the fate of each commit there.
-All pull requests must be approved by a
+All pull requests must be approved by a
[Vespa Committer](https://github.com/orgs/vespa-engine/people).
You can find a suitable reviewer in the OWNERS file upward in the source tree from
where you are making the change (OWNERS have a special responsibility for
ensuring the long-term integrity of a portion of the code).
-The way to become a committer (and OWNER) is to make some quality contributions
+The way to become a committer (and OWNER) is to make some quality contributions
to an area of the code. See [GOVERNANCE](GOVERNANCE.md) for more details.
### Creating a Pull Request
-Please follow
-[best practices](https://github.com/trein/dev-best-practices/wiki/Git-Commit-Best-Practices)
+Please follow
+[best practices](https://github.com/trein/dev-best-practices/wiki/Git-Commit-Best-Practices)
for creating git commits.
-When your code is ready to be submitted,
-[submit a pull request](https://help.github.com/articles/creating-a-pull-request/)
+When your code is ready to be submitted,
+[submit a pull request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request)
to request a code review.
-We only seek to accept code that you are authorized to contribute to the project.
-We have added a pull request template on our projects so that your contributions are made
+We only seek to accept code that you are authorized to contribute to the project.
+We have added a pull request template on our projects so that your contributions are made
with the following confirmation:
> I confirm that this contribution is made under the terms of the license found in the root directory of this repository's source tree and that I have the authority necessary to make this contribution on behalf of its copyright owner.
@@ -44,15 +44,15 @@ with the following confirmation:
## Versioning
Vespa uses semantic versioning - see
-[vespa versions](http://docs.vespa.ai/en/vespa-versions.html).
+[vespa versions](https://vespa.ai/releases#versions).
Notice in particular that any Java API in a package having a @PublicAPI
annotation in the package-info, and no @Beta annotation on the class,
-cannot be changed in an incompatible way between major versions:
+cannot be changed in an incompatible way between major versions:
Existing types and method signatures must be preserved
(but can be marked deprecated).
We verify ABI compatibility during the regular Java build you'll run with Maven (mvn install).
-This build step will also fail if you *add* to public API's, which is fine if there's a good reason
+This build step will also fail if you _add_ to public APIs, which is fine if there's a good reason
to do it. In that case update the ABI spec as instructed in the error message.
## Issues
@@ -60,13 +60,13 @@ to do it. In that case update the ABI spec as instructed in the error message.
We track issues in [GitHub issues](https://github.com/vespa-engine/vespa/issues).
It is fine to submit issues also for feature requests and ideas, whether or not you intend to work on them.
-There is also a [ToDo list](TODO.md) for larger things nobody are working on yet.
+There is also a [ToDo list](TODO.md) for larger things nobody is working on yet.
## Community
-If you have questions, want to share your experience or help others,
+If you have questions, want to share your experience or help others,
join our [Slack channel](http://slack.vespa.ai).
-See also [Stack Overflow questions tagged Vespa](http://stackoverflow.com/questions/tagged/vespa),
+See also [Stack Overflow questions tagged Vespa](https://stackoverflow.com/questions/tagged/vespa),
and feel free to add your own.
### Getting started
diff --git a/Code-map.md b/Code-map.md
index 17c27327c5a..ffa72290094 100644
--- a/Code-map.md
+++ b/Code-map.md
@@ -5,11 +5,11 @@
You want to get familiar with the Vespa code base but don't know where to start?
Vespa consists of about 1.7 million lines of code, about equal parts Java and C++.
-Since it it's mostly written by a team of developers selected for their ability
-to do this kind of thing unusually well, who have been given time to dedicate
-themselves to it for a long time, it is mostly easily to work with. However, one
+Since it is mostly written by a team of developers selected for their ability
+to do this kind of thing unusually well, who have been given time to dedicate
+themselves to it for a long time, it is mostly easily to work with. However, one
thing we haven't done is to create a module structure friendly to newcomers - the code
-simply organized in a flat structure of about 150 modules.
+simply organized in a flat structure of about 150 modules.
This document aims to provide a map from the
[functional elements](https://docs.vespa.ai/en/overview.html)
@@ -18,22 +18,21 @@ of Vespa to the most important modules in the flat module structure in the
![Code map](Code-map.png)
-It covers the modules you are most likely to encounter as a developer.
-The rest are either small and needed for technical reasons or doing one thing
-which should be self-explanatory, or implementing the cloud service run by the
-Vespa team which we don't expect anybody else to run and therefore be interested
+It covers the modules you are most likely to encounter as a developer.
+The rest are either small and needed for technical reasons or doing one thing
+which should be self-explanatory, or implementing the cloud service run by the
+Vespa team which we don't expect anybody else to run and therefore be interested
in changing.
-
## The stateless container
When a request is made to Vespa it first enters some stateless container cluster,
called jDisc. This consists of:
-- a __jDisc core__ layer which provides a model of a running application, general protocol-independent request-response handling, with various protocol implementations,
-- a __jDisc container__ layer providing component management, configuration and similar.
-- a __search middleware__ layer containing query/result API's, query execution logic etc.
-- API's and modules for writing and processing document operations.
+- a **jDisc core** layer which provides a model of a running application, general protocol-independent request-response handling, with various protocol implementations,
+- a **jDisc container** layer providing component management, configuration and similar.
+- a **search middleware** layer containing query/result APIs, query execution logic etc.
+- APIs and modules for writing and processing document operations.
The stateless container is implemented in Java.
@@ -58,7 +57,7 @@ Document operation modules:
- [container-messagebus](https://github.com/vespa-engine/vespa/tree/master/container-messagebus) - MessageBus connector for jDisc.
- [documentapi](https://github.com/vespa-engine/vespa/tree/master/documentapi) - API for issuing document operations to Vespa over messagebus.
- [docproc](https://github.com/vespa-engine/vespa/tree/master/docproc) - chainable document (operation) processors: Document operations issued over messagebus to Vespa will usually be routed through a container running a document processor chain.
-- [indexinglanguage](https://github.com/vespa-engine/vespa/tree/master/indexinglanguage) - implementation of the "indexing" language which is used to express the statements prefixed by "indexing:" in the search definition.
+- [indexinglanguage](https://github.com/vespa-engine/vespa/tree/master/indexinglanguage) - implementation of the "indexing" language which is used to express the statements prefixed by "indexing:" in the search definition.
- [docprocs](https://github.com/vespa-engine/vespa/tree/master/docprocs) - document processor components bundled with Vespa. Notably the Indexingprocessor - a document processor invoking the indexing language statements configured for the document type in question on document operations.
- [vespaclient-container-plugin](https://github.com/vespa-engine/vespa/tree/master/vespaclient-container-plugin) - implements the document/v1 API and internal API used by the Java HTTP client on top of the jDisc container, forwarding to the Document API.
- [vespa-feed-client](https://github.com/vespa-engine/vespa/tree/master/vespa-feed-client) - client for fast writing to the internal API implemented by vespaclient-container-plugin.
@@ -72,9 +71,8 @@ This is written in C++.
- [searchlib](https://github.com/vespa-engine/vespa/tree/master/searchlib) - libraries invoked by searchcore: Ranking (feature execution framework (fef), rank feature implementations, ranking expressions), index and btree implementations, attributes (forward indexes) etc. In addition, this contains the Java libraries for ranking.
- [storage](https://github.com/vespa-engine/vespa/tree/master/storage/src/vespa/storage) - system for elastic and auto-recovering data storage over clusters of nodes.
- [eval](https://github.com/vespa-engine/vespa/tree/master/eval) - library for efficient evaluation of ranking expressions. Tensor API and implementation.
-- [storageapi](https://github.com/vespa-engine/vespa/tree/master/storageapi/src/vespa/storageapi) - message bus messages and implementation for the document API.
-- [clustercontroller-core](https://github.com/vespa-engine/vespa/tree/master/clustercontroller-core) - cluster controller for storage, implemented in Java. This provides singular node-level decision making for storage, based on ZooKeeper.
-
+- [storageapi](https://github.com/vespa-engine/vespa/tree/master/storage/src/vespa/storageapi) - message bus messages and implementation for the document API.
+- [clustercontroller-core](https://github.com/vespa-engine/vespa/tree/master/clustercontroller-core) - cluster controller for storage, implemented in Java. This provides singular node-level decision-making for storage, based on ZooKeeper.
## Configuration and administration
@@ -94,6 +92,3 @@ Libraries used throughput the code.
- [vespalib](https://github.com/vespa-engine/vespa/tree/master/vespalib) - general utility library for C++
- [vespajlib](https://github.com/vespa-engine/vespa/tree/master/vespajlib) - general utility library for Java. Includes the Java implementation of the tensor library.
-
-
-
diff --git a/README.md b/README.md
index df9e1109a2b..a684aae70dc 100644
--- a/README.md
+++ b/README.md
@@ -8,12 +8,12 @@ over big data at serving time.
This is the primary repository for Vespa where all development is happening.
New production releases from this repository's master branch are made each weekday from Monday through Thursday.
-* Home page: [https://vespa.ai](https://vespa.ai)
-* Documentation: [https://docs.vespa.ai](https://docs.vespa.ai)
-* Continuous build: [https://factory.vespa.oath.cloud](https://factory.vespa.oath.cloud)
-* Run applications in the cloud for free: [https://cloud.vespa.ai](https://cloud.vespa.ai)
+- Home page: [https://vespa.ai](https://vespa.ai)
+- Documentation: [https://docs.vespa.ai](https://docs.vespa.ai)
+- Continuous build: [https://factory.vespa.oath.cloud](https://factory.vespa.oath.cloud)
+- Run applications in the cloud for free: [https://cloud.vespa.ai](https://cloud.vespa.ai)
-Vespa build status: [![Vespa Build Status](https://cd.screwdriver.cd/pipelines/6386/build-vespa/badge)](https://cd.screwdriver.cd/pipelines/6386)
+Vespa build status: [![Vespa Build Status](https://api.screwdriver.cd/v4/pipelines/6386/build-vespa/badge)](https://cd.screwdriver.cd/pipelines/6386)
## Table of contents
@@ -31,17 +31,15 @@ evaluate machine-learned models over the selected data, organize and aggregate i
than 100 milliseconds, all while the data corpus is continuously changing.
This is hard to do, especially with large data sets that needs to be distributed over multiple nodes and evaluated in
-parallel. Vespa is a platform which performs these operations for you with high availability and performance.
+parallel. Vespa is a platform which performs these operations for you with high availability and performance.
It has been in development for many years and is used on a number of large internet services and apps which serve
hundreds of thousands of queries from Vespa per second.
-
## Install
Run your own Vespa instance: [https://docs.vespa.ai/en/getting-started.html](https://docs.vespa.ai/en/getting-started.html)
Or deploy your Vespa applications to the cloud service: [https://cloud.vespa.ai](https://cloud.vespa.ai)
-
## Usage
- The application created in the getting started guide is fully functional and production ready, but you may want to [add more nodes](https://docs.vespa.ai/en/multinode-systems.html) for redundancy.
@@ -52,7 +50,6 @@ Or deploy your Vespa applications to the cloud service: [https://cloud.vespa.ai]
Full documentation is at [https://docs.vespa.ai](https://docs.vespa.ai).
-
## Contribute
We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) to learn how to contribute.
@@ -60,7 +57,6 @@ We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) to learn how to
If you want to contribute to the documentation, see
[https://github.com/vespa-engine/documentation](https://github.com/vespa-engine/documentation)
-
## Building
You do not need to build Vespa to use it, but if you want to contribute you need to be able to build the code.
@@ -83,7 +79,6 @@ for building Vespa, running unit tests and running system tests:
Use this if you only need to build the Java modules, otherwise follow the complete development guide above.
-
## License
Code licensed under the Apache 2.0 license. See [LICENSE](LICENSE) for terms.
diff --git a/TODO.md b/TODO.md
index 34d69f598d5..392c17fe616 100644
--- a/TODO.md
+++ b/TODO.md
@@ -1,29 +1,27 @@
<!-- Copyright Yahoo. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root. -->
+
# List of possible future enhancements and features
-This lists some possible improvements to Vespa which have been considered or requested, can be developed relatively
+This lists some possible improvements to Vespa which have been considered or requested, can be developed relatively
independently of other work, and are not yet under development. For more information on the code structure in Vespa, see
[Code-map.md](Code-map.md).
-
## Support query profiles for document processors
**Effort:** Low<br/>
**Difficulty:** Low<br/>
**Skills:** Java
-Query profiles make it simple to support multiple buckets, behavior profiles for different use cases etc by providing
-bundles of parameters accessible to Searchers processing queries. Writes go through a similar chain of processors -
-Document Processors, but have no equivalent support for parametrization. This is to allow configuration of document
+Query profiles make it simple to support multiple buckets, behavior profiles for different use cases etc. by providing
+bundles of parameters accessible to Searchers processing queries. Writes go through a similar chain of processors -
+Document Processors, but have no equivalent support for parametrization. This is to allow configuration of document
processor profiles by reusing the query profile support also for document processors.
-See [slack discussion](https://vespatalk.slack.com/archives/C01QNBPPNT1/p1624176344102300) for more details.
-
**Code pointers:**
+
- [Query profiles](https://github.com/vespa-engine/vespa/blob/master/container-search/src/main/java/com/yahoo/search/query/profile/QueryProfile.java)
- [Document processors](https://github.com/vespa-engine/vespa/blob/master/docproc/src/main/java/com/yahoo/docproc/DocumentProcessor.java)
-
## Java implementation of the content layer for testing
**Effort:** Medium<br/>
@@ -31,16 +29,16 @@ See [slack discussion](https://vespatalk.slack.com/archives/C01QNBPPNT1/p1624176
**Skills:** Java
There is currently support for creating Application instances programmatically in Java to unit test application package
-functionality (see com.yahoo.application.Application). However, only Java component functionality can be tested in this
-way as the content layer is not available, being implemented in C++. A Java implementation, of some or all of the
-functionality would enable developers to do more testing locally within their IDE. This is medium effort because
-performance is not a concern and some components, such as ranking expressions and features are already available as
+functionality (see com.yahoo.application.Application). However, only Java component functionality can be tested in this
+way as the content layer is not available, being implemented in C++. A Java implementation, of some or all of the
+functionality would enable developers to do more testing locally within their IDE. This is medium effort because
+performance is not a concern and some components, such as ranking expressions and features are already available as
libraries (see the searchlib module).
**Code pointers:**
-- Content cluster mock in Java (currently empy): [ContentCluster](https://github.com/vespa-engine/vespa/blob/master/application/src/main/java/com/yahoo/application/content/ContentCluster.java)
-- The model of a search definition this must consume config from: [Search](https://github.com/vespa-engine/vespa/blob/master/config-model/src/main/java/com/yahoo/searchdefinition/Search.java)
+- Content cluster mock in Java (currently empy): [ContentCluster](https://github.com/vespa-engine/vespa/blob/master/application/src/main/java/com/yahoo/application/content/ContentCluster.java)
+- The model of a search definition this must consume config from: [Search](https://github.com/vespa-engine/vespa/blob/master/config-model/src/main/java/com/yahoo/searchdefinition/Search.java)
## Indexed search in maps
@@ -48,12 +46,12 @@ libraries (see the searchlib module).
**Difficulty:** Medium<br/>
**Skills:** C++, multithreading, performance, indexing, data structures
-Vespa supports maps and and making them searchable in memory by declaring as an attribute.
-However, maps cannot be indexed as text-search disk indexes.
+Vespa supports maps and making them searchable in memory by declaring as an attribute.
+However, maps cannot be indexed as text-search disk indexes.
**Code pointers:**
-- [Current text indexes](https://github.com/vespa-engine/vespa/tree/master/searchlib/src/vespa/searchlib/index)
+- [Current text indexes](https://github.com/vespa-engine/vespa/tree/master/searchlib/src/vespa/searchlib/index)
## Global writes
@@ -61,24 +59,24 @@ However, maps cannot be indexed as text-search disk indexes.
**Difficulty:** High<br/>
**Skills:** C++, Java, distributed systems, performance, multithreading, network, distributed consistency
-Vespa instances distribute data automatically within clusters, but these clusters are meant to consist of co-located
-machines - the distribution algorithm is not suitable for global distribution across datacenters because it cannot
+Vespa instances distribute data automatically within clusters, but these clusters are meant to consist of co-located
+machines - the distribution algorithm is not suitable for global distribution across datacenters because it cannot
seamlessly tolerate datacenter-wide outages and does not attempt to minimize bandwidth usage between datacenters.
-Application usually achieve global precense instead by setting up multiple independent instances in different
-datacenters and write to all in parallel. This is robust and works well on average, but puts additional burden on
-applications to achieve cross-datacenter data consistency on datacenter failures, and does not enable automatic
-data recovery across datacenters, such that data redundancy is effectively required within each datacenter.
-This is fine in most cases, but not in the case where storage space drives cost and intermittent loss of data coverage
+Application usually achieve global presence instead by setting up multiple independent instances in different
+datacenters and write to all in parallel. This is robust and works well on average, but puts additional burden on
+applications to achieve cross-datacenter data consistency on datacenter failures, and does not enable automatic
+data recovery across datacenters, such that data redundancy is effectively required within each datacenter.
+This is fine in most cases, but not in the case where storage space drives cost and intermittent loss of data coverage
(completeness as seen from queries) is tolerable.
-A solution should sustain current write rates (tens of thousands of writes per ndoe per second), sustain write and read
-rates on loss of connectivity to one (any) data center, re-establish global data consistency when a lost datacenter is
-recovered and support some degree of tradeoff between consistency and operation latency (although the exact modes to be
+A solution should sustain current write rates (tens of thousands of writes per node per second), sustain write and read
+rates on loss of connectivity to one (any) data center, re-establish global data consistency when a lost datacenter is
+recovered and support some degree of tradeoff between consistency and operation latency (although the exact modes to be
supported is part of the design and analysis needed).
**Code pointers:**
-- [Document API](https://github.com/vespa-engine/vespa/tree/master/documentapi/src/main/java/com/yahoo/documentapi)
+- [Document API](https://github.com/vespa-engine/vespa/tree/master/documentapi/src/main/java/com/yahoo/documentapi)
## Global dynamic tensors
@@ -86,20 +84,20 @@ supported is part of the design and analysis needed).
**Difficulty:** High<br/>
**Skills:** Java, C++, distributed systems, performance, networking, distributed consistency
-Tensors in ranking models may either be passed with the query, be part of the document or be configured as part of the
-application package (global tensors). This is fine for many kinds of models but does not support the case of really
-large tensors (which barely fit in memory) and/or dynamically changing tensors (online learning of global models).
-These use cases require support for global tensors (tensors available locally on all content nodes during execution
-but not sent with the query or residing in documents) which are not configured as part of the application package but
-which are written independently and dynamically updateable at a high write rate. To support this at large scale, with a
-high write rate, we need a small cluster of nodes storing the source of truth of the global tensor and which have
-perfect consistency. This in turn must push updates to all content nodes in a best effort fashion given a fixed bandwidth
+Tensors in ranking models may either be passed with the query, be part of the document or be configured as part of the
+application package (global tensors). This is fine for many kinds of models but does not support the case of really
+large tensors (which barely fit in memory) and/or dynamically changing tensors (online learning of global models).
+These use cases require support for global tensors (tensors available locally on all content nodes during execution
+but not sent with the query or residing in documents) which are not configured as part of the application package but
+which are written independently and dynamically update-able at a high write rate. To support this at large scale, with a
+high write rate, we need a small cluster of nodes storing the source of truth of the global tensor and which have
+perfect consistency. This in turn must push updates to all content nodes in a best-effort fashion given a fixed bandwidth
budget, such that query execution and document write traffic is prioritized over ensuring perfect consistency of global
model updates.
**Code pointers:**
-- Tensor modify operation (for document tensors): [Java](https://github.com/vespa-engine/vespa/blob/master/document/src/main/java/com/yahoo/document/update/TensorModifyUpdate.java), [C++](https://github.com/vespa-engine/vespa/blob/master/document/src/vespa/document/update/tensor_modify_update.h)
+- Tensor modify operation (for document tensors): [Java](https://github.com/vespa-engine/vespa/blob/master/document/src/main/java/com/yahoo/document/update/TensorModifyUpdate.java), [C++](https://github.com/vespa-engine/vespa/blob/master/document/src/vespa/document/update/tensor_modify_update.h)
## Feed clients in different languages
@@ -115,7 +113,7 @@ throughput using this API to what the undocumented, custom-protocol /feedapi off
this changed with HTTP/2 support in Vespa. The clean design of /document/v1 makes it
easy to interface with from any language and runtime that support HTTP/2.
An implementation currently only exists for Java, and requires a JDK8+ runtime,
-and implementations in other languages are very welcome. The below psuedo-code could
+and implementations in other languages are very welcome. The below pseudocode could
be a starting point for an asynchronous implementation with futures and promises.
Let `http` be an asynchronous HTTP/2 client, which returns a `future` for each request.
@@ -154,4 +152,5 @@ dependents may be added, while the queue is emptied from the head one entry at a
a dependency (`previous`) completes computation. `enqueue` blocks until there is room in the client.
**Code pointers:**
+
- [Java feed client](https://github.com/vespa-engine/vespa/blob/master/vespa-feed-client-api/src/main/java/ai/vespa/feed/client/FeedClient.java)
diff --git a/client/README.md b/client/README.md
index 8403e02a485..82fa32bb2d5 100644
--- a/client/README.md
+++ b/client/README.md
@@ -3,60 +3,60 @@
![Vespa logo](https://vespa.ai/assets/vespa-logo-color.png)
# Vespa clients
-This part of the Vespa repository got Vespa client implementations for operations like
-* deploy
-* read/write
-* query
-<!-- ToDo: illustration -->
+This part of the Vespa repository got Vespa client implementations for operations like
+- deploy
+- read/write
+- query
+<!-- ToDo: illustration -->
## Vespa CLI
+
The Vespa command-line tool, see the [README](go/README.md).
Use the Vespa CLI to deploy, feed and query a Vespa application,
for local, self-hosted or [Vespa Cloud](https://cloud.vespa.ai/) instances.
-
-
## pyvespa
-[pyvespa](https://pyvespa.readthedocs.io/) provides a python API to Vespa -
+
+[pyvespa](https://pyvespa.readthedocs.io/en/latest/) provides a python API to Vespa -
use it to create, modify, deploy and interact with running Vespa instances.
The main pyvespa goal is to allow for faster prototyping
and to facilitate Machine Learning experiments for Vespa applications.
-
-
## Vespa FE (fixme: better name and description here)
-This is a [work-in-progress javascript app](js/app) for querying a Vespa application.
-
+This is a [work-in-progress javascript app](js/app) for querying a Vespa application.
-----
+---
## Misc
<!-- ToDo: move this / demote this somehow -->
+
### vespa_query_dsl
+
This lib is used for composing Vespa
[YQL queries](https://docs.vespa.ai/en/reference/query-language-reference.html).
For usage, refer to the [QTest.java](src/test/java/ai/vespa/client/dsl/QTest.java) unit test.
ToDos:
+
- [ ] support `predicate` (https://docs.vespa.ai/en/predicate-fields.html)
- [ ] support methods for checking positive/negative conditions for specific field
-- [X] support order by annotation
-- [X] support order by
-- [X] support sub operators in contains (sameElement, phrase, near, onear, equiv)
-- [X] support group syntax
-- [X] support `nonEmpty`
-- [X] support `dotProduct`
-- [X] support `weightedSet`
-- [X] support `wand`
-- [X] support `weakAnd`
+- [x] support order by annotation
+- [x] support order by
+- [x] support sub operators in contains (sameElement, phrase, near, onear, equiv)
+- [x] support group syntax
+- [x] support `nonEmpty`
+- [x] support `dotProduct`
+- [x] support `weightedSet`
+- [x] support `wand`
+- [x] support `weakAnd`
- [x] support `userInput`
- [x] support `rank`
- [x] support filter annotation
-- [X] unit tests
-- [X] support other annotations
-- [X] handle edge cases (e.g. `Q.b("test").contains("a").build()`)
+- [x] unit tests
+- [x] support other annotations
+- [x] handle edge cases (e.g. `Q.b("test").contains("a").build()`)
diff --git a/document/doc/document-format.html b/document/doc/document-format.html
index ce985b8a10d..bf67f1723e8 100644
--- a/document/doc/document-format.html
+++ b/document/doc/document-format.html
@@ -1,546 +1,759 @@
<!-- Copyright Yahoo. Licensed under the terms of the Apache 2.0 license. See LICENSE in the project root. -->
-<html>
-<head>
-<title>Developers guide to the serialized document format</title>
-</head>
-<body>
-<h1>Developers guide to the serialized document format</h1>
+<html lang="en">
+ <head>
+ <title>Developers guide to the serialized document format</title>
+ </head>
+ <body>
+ <h1>Developers guide to the serialized document format</h1>
-<p>When a Vespa document is stored or transferred from one application to
-another, it is serialized. The serialization format tries to achieve
-serialization robustness and speed. The most important fields are kept in a
-header that is accessible at low cost. The other fields are located by table
-look-ups.</p>
+ <p>
+ When a Vespa document is stored or transferred from one application to
+ another, it is serialized. The serialization format tries to achieve
+ serialization robustness and speed. The most important fields are kept in
+ a header that is accessible at low cost. The other fields are located by
+ table look-ups.
+ </p>
-<h2>Purpose</h2>
+ <h2>Purpose</h2>
-<p>The purpose of the serialized format is
-<ul>
-<li><b>Robustness</b>. The format shall detect errors gracefully.</li>
-<li><b>Speed</b>. Deserialization shall be fast, especially for basic fields like <b>DocumentId</b>.</li>
-<li><b>Size</b>. The serialized format shall be compact and allow for efficient storage and transfer.
-</ul>
-</p>
+ <p>The purpose of the serialized format is</p>
+ <ul>
+ <li><b>Robustness</b>. The format shall detect errors gracefully.</li>
+ <li>
+ <b>Speed</b>. Deserialization shall be fast, especially for basic fields
+ like <b>DocumentId</b>.
+ </li>
+ <li>
+ <b>Size</b>. The serialized format shall be compact and allow for
+ efficient storage and transfer.
+ </li>
+ </ul>
-<p><strong>All fields are in network byte order.</strong></p>
+ <p><strong>All fields are in network byte order.</strong></p>
-<h2>Changelog</h2>
+ <h2>Changelog</h2>
-<h3>Current version: 8</h3>
+ <h3>Current version: 8</h3>
-<ul>
-<li>CRC removed from document format. There used to be a 4 byte CRC in the end
-of a header or header + body serialization, calculated as a crc32 of all the
-other data in the serialization. This CRC was included in the document length.
-</ul>
+ <ul>
+ <li>
+ CRC removed from document format. There used to be a 4 byte CRC in the
+ end of a header or header + body serialization, calculated as a crc32 of
+ all the other data in the serialization. This CRC was included in the
+ document length.
+ </li>
+ </ul>
+ <h3>Version: 7</h3>
-<h3>Version: 7</h3>
+ <ul>
+ <li>
+ The document length is now a static sized 4 byte value, instead of a
+ variable 2,4,8 byte value.
+ </li>
+ <li>
+ (Anything else? This changelog was written when bumping from 7 to 8,
+ so further changes in 7 may be missing.)
+ </li>
+ </ul>
-<ul>
-<li>The document length is now a static sized 4 byte value, instead of a variable 2,4,8 byte value.
-<li>(Anything else? I wrote this changelog when bopping from 7 to 8. Dunno if more was changed in 7.)
-</ul>
+ <h3>Version: 6</h3>
-<h3>Version: 6</h3>
+ This is the oldest version that we currently support. No known installation
+ stores documents with a version smaller than this.
-This is the oldest version that we currently support. No known installation stores documents with a version smaller than this.
+ <h2>Document Format</h2>
-<h2>Document Format</h2>
+ <p>This is the description of the serialized document format.</p>
-<p>This is the description of the serialized document format.</p>
+ <table border="1" cellspacing="0" cellpadding="1%" width="100%">
+ <caption>
+ <em>Document serialization format</em>
+ </caption>
+ <tr>
+ <td width="10%"><b>Field</b></td>
+ <td width="10%"><b>Type</b></td>
+ <td width="10%"><b>Length</b></td>
+ <td><b>Description</b></td>
+ </tr>
+ <tr>
+ <td>Version</td>
+ <td>Short integer</td>
+ <td>2</td>
+ <td>Version number. Current is 8.</td>
+ </tr>
+ <tr>
+ <td>Length</td>
+ <td>Integer</td>
+ <td>4 bytes</td>
+ <td>Total length of object (excluding this field and version).</td>
+ </tr>
+ <tr>
+ <td>Document ID</td>
+ <td>Bytes</td>
+ <td>&nbsp;</td>
+ <td>Unique ID for document. 0-terminated string, UTF-8 encoding.</td>
+ </tr>
+ <tr>
+ <td>Field Map</td>
+ <td>Bytes</td>
+ <td>See below</td>
+ <td>
+ Placeholder for fields. (Note: Fieldmaps may contain other fieldmaps)
+ </td>
+ </tr>
+ </table>
-<table border="1" cellspacing="0" cellpadding="1%" width="100%">
-<caption><em>Document serialization format</em></caption>
-<tr>
-<td width="10%"><b>Field</td>
-<td width="10%"><b>Type</td>
-<td width="10%"><b>Length</td>
-<td><b>Description</td>
-</tr>
-<tr><td>Version</td>
-<td>Short integer</td>
-<td>2</td>
-<td>Version number. Current is 6.</td>
-</tr>
-<tr><td>Length</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>Total length of object (excluding this field and version).</td>
-</tr>
-<tr><td>Document ID</td>
-<td>Bytes</td>
-<td>&nbsp;</td>
-<td>Unique ID for document. 0-terminated string, UTF-8 encoding.</td>
-</tr>
-<tr><td>Field Map</td>
-<td>Bytes</td>
-<td>See below</td><td>Placeholder for fields. (Note: Fieldmaps may contain other fieldmaps)</td>
-</tr>
-</table>
+ <p>Field maps are serialized like this</p>
+ <p></p>
-<p>Field maps are serialized like this</p></p><p>
+ <table border="1" cellspacing="0" cellpadding="1%" width="100%">
+ <caption>
+ <em>Fieldmap serialization format</em>
+ </caption>
+ <tr>
+ <td width="10%"><b>Field</b></td>
+ <td width="10%"><b>Type</b></td>
+ <td width="10%"><b>Length (bytes)</b></td>
+ <td><b>Description</b></td>
+ </tr>
+ <tr>
+ <td>Inventory bit mask</td>
+ <td>Byte</td>
+ <td>1</td>
+ <td>
+ Inventory bits describing the FieldMap element with data:<br />
+ &nbsp;Bit 0 set: FieldMap has document type <br />
+ &nbsp;Bit 1 set: FieldMap has header fields <br />
+ &nbsp;Bit 2 set: FieldMap has body fields <br />
+ &nbsp;Bit 3 set: FieldMap has external body fields<br />
+ </td>
+ </tr>
-<table border="1" cellspacing="0" cellpadding="1%" width="100%">
-<caption><em>Fieldmap serialization format</em></caption>
-<tr>
-<td width="10%"><b>Field</td>
-<td width="10%"><b>Type</td>
-<td width="10%"><b>Length (bytes)</td>
-<td><b>Description</td>
-</tr>
-<tr><td>Inventory bit mask</td>
-<td>Byte</td>
-<td>1</td>
-<td>
-Inventory bits describing the FieldMap element with data:<br>
-&nbsp;Bit 0 set: FieldMap has document type <br>
-&nbsp;Bit 1 set: FieldMap has header fields <br>
-&nbsp;Bit 2 set: FieldMap has body fields <br>
-&nbsp;Bit 3 set: FieldMap has external body fields<br>
-</tr>
-<tr><td colspan = "4"><b>Below section is present when bit 0 of inventory is set</b></td></tr>
-<tr><td>Document Type</td>
-<td>Bytes</td>
-<td>&nbsp;</td>
-<td>Document type. (0-terminated string, UTF-8 encoding.)</td>
-</tr>
-<tr><td>Version</td>
-<td>Short integer</td>
-<td>2</td>
-<td>Document type version number.</td></tr>
-<tr><td colspan = "4"><b>Below section is present when bit 1 of inventory is set</b></td></tr>
-<tr><td>Header data</td>
-<td>Data array</td>
-<td>See below</td>
-<td>Header data packed in data array</td></tr>
-<tr><td colspan = "4"><b>Below section is present when bit 2 of inventory is set</b></td></tr>
-<tr><td>Body data</td>
-<td>Data array</td>
-<td>See below</td>
-<td>Body data packed in data array</td></tr>
-</table>
+ <tr>
+ <td colspan="4">
+ <b>Below section is present when bit 0 of inventory is set</b>
+ </td>
+ </tr>
+ <tr>
+ <td>Document Type</td>
+ <td>Bytes</td>
+ <td>&nbsp;</td>
+ <td>Document type. (0-terminated string, UTF-8 encoding.)</td>
+ </tr>
+ <tr>
+ <td>Version</td>
+ <td>Short integer</td>
+ <td>2</td>
+ <td>Document type version number.</td>
+ </tr>
+ <tr>
+ <td colspan="4">
+ <b>Below section is present when bit 1 of inventory is set</b>
+ </td>
+ </tr>
+ <tr>
+ <td>Header data</td>
+ <td>Data array</td>
+ <td>See below</td>
+ <td>Header data packed in data array</td>
+ </tr>
+ <tr>
+ <td colspan="4">
+ <b>Below section is present when bit 2 of inventory is set</b>
+ </td>
+ </tr>
+ <tr>
+ <td>Body data</td>
+ <td>Data array</td>
+ <td>See below</td>
+ <td>Body data packed in data array</td>
+ </tr>
+ </table>
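A minimal sketch of how a reader might interpret the inventory bit mask described in the Fieldmap table. It assumes bit 0 is the least significant bit, which the document does not state explicitly. The function name and dictionary keys are hypothetical and not part of any Vespa API.

```python
def decode_inventory(mask: int) -> dict:
    """Map the one-byte inventory bit mask to the sections it announces.

    Assumption: bit 0 is the least significant bit of the byte.
    """
    return {
        "document_type": bool(mask & 0x01),         # bit 0
        "header_fields": bool(mask & 0x02),         # bit 1
        "body_fields": bool(mask & 0x04),           # bit 2
        "external_body_fields": bool(mask & 0x08),  # bit 3
    }


if __name__ == "__main__":
    # A field map carrying a document type and header data only.
    print(decode_inventory(0x03))
```

Under that assumption, a parser would read the optional sections in table order (document type, header data, body data) and skip any section whose bit is unset.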
+ <p></p>
+ <table border="1" cellspacing="0" cellpadding="1%" width="100%">
+ <caption>
+ <em>Data array serialization</em>
+ </caption>
+ <tr>
+ <td width="10%"><b>Field</b></td>
+ <td width="10%"><b>Type</b></td>
+ <td width="10%"><b>Length (bytes)</b></td>
+ <td><b>Description</b></td>
+ </tr>
+ <tr>
+ <td>Data length</td>
+ <td>Integer_2_4_8</td>
+ <td>2, 4 or 8</td>
+ <td>
+ Length of data block (see below). Note that this length includes
+ itself.
+ </td>
+ </tr>
+ <tr>
+ <td>Number of fields</td>
+ <td>Integer_1_4</td>
+ <td>1 or 4</td>
+ <td>Number of fields in data array</td>
+ </tr>
-<p>
-<table border="1" cellspacing="0" cellpadding="1%" width="100%">
-<caption><em>Data array serialization</em></caption>
-<tr>
-<td width="10%"><b>Field</td>
-<td width="10%"><b>Type</td>
-<td width="10%"><b>Length (bytes)</td>
-<td><b>Description</b></td>
-</tr>
-<tr><td>Data length</td>
-<td>Integer_2_4_8</td>
-<td>2, 4 or 8</td>
-<td>Length of data block (see below). NOTE THAT THIS LENGTH INCLUDE ITSELF.</td>
-</tr>
-<tr><td>Number of fields<td>Integer_1_4</td>
-<td>1 or 4</td>
-<td>Number of fields in data array</td>
-<tr><td colspan = "4"><b>Below block is repeated "Number of fields" times</b></td></tr>
-<tr><td>Field ID<td>Integer_1_4</td>
-<td>1 or 4</td>
-<td>ID of field.</td>
-<tr><td>Field Size<td>Integer_1_2_4</td>
-<td>1, 2 or 4</td>
-<td>Length of field.</td>
-</td>
-<tr><td colspan = "4"><b>End of repeated block </b></td></tr>
-<tr><td>Data block<td>Bytes</td>
-<td>&nbsp;</td>
-<td>The data block.<br>
-&nbsp; - Items are ordered the same way field array is sorted.<br>
-&nbsp; - Use lengths from field array above to find item offset and length.<br>
-&nbsp; - If the block is compressed, lengths refer to decompressed version</td>
-</table>
+ <tr>
+ <td colspan="4">
+ <b>Below block is repeated "Number of fields" times</b>
+ </td>
+ </tr>
+ <tr>
+ <td>Field ID</td>
+ <td>Integer_1_4</td>
+ <td>1 or 4</td>
+ <td>ID of field.</td>
+ </tr>
+ <tr>
+ <td>Field Size</td>
+ <td>Integer_1_2_4</td>
+ <td>1, 2 or 4</td>
+ <td>Length of field.</td>
+ </tr>
+ <tr>
+ <td colspan="4"><b>End of repeated block </b></td>
+ </tr>
+ <tr>
+ <td>Data block</td>
+ <td>Bytes</td>
+ <td>&nbsp;</td>
+ <td>
+ The data block.<br />
+ &nbsp; - Items are ordered the same way field array is sorted.<br />
+ &nbsp; - Use lengths from field array above to find item offset and
+ length.<br />
+ &nbsp; - If the block is compressed, lengths refer to decompressed
+ version
+ </td>
+ </tr>
+ </table>
+ <table border="1" cellspacing="0" cellpadding="1%" width="100%">
+ <caption>
+ <em>Data type serialization</em>
+ </caption>
+ <tr>
+ <td width="15%"><b>Data type</b></td>
+ <td width="10%"><b>Length</b></td>
+ <td><b>Serialization</b></td>
+ </tr>
+ <tr>
+ <td>Integer (ID 0)</td>
+ <td>4</td>
+ <td>Signed integer, two's complement notation, network byte order.</td>
+ </tr>
+ <tr>
+ <td>Floating point number (ID 1)</td>
+ <td>4</td>
+ <td>IEEE 754, single precision, network byte order.</td>
+ </tr>
+ <tr>
+ <td>String (ID 2)</td>
+ <td>1 + (1 or 4) + length + 1</td>
+ <td>
+ Strings are serialized in this format:<br />
+ &nbsp;- First byte represents coding. This has traditionally denoted
+ the maximum number of bits per character in the UTF-8 encoded string,
+ but has never been used in deserialization code.
+ <ul>
+ <li>Set to 32 if not used.</li>
+ <li>
+ Set to &lt;32 if you know the UTF-8 string uses fewer bits per
+ character; e.g. ASCII could use 8.
+ </li>
+ <li>
+ Set bit 6 (decimal 64) if the string has an
+ <a href="#annotations">annotation tree</a>.
+ </li>
+ </ul>
+ <br />
+ &nbsp;- Integer_1_4 with length of string. <br />
+ &nbsp;- The string (UTF-8 encoding), including 0-terminating byte.<br />
+ &nbsp;- An annotation tree, if bit 6 (decimal 64) of coding byte is
+ set:
+ <ul>
+ <li>
+ total length of all span trees excl. itself:
+ <strong>uint32</strong>
+ </li>
+ <li>number of span trees <strong>int_1_2_4</strong></li>
+ <li>
+ for each root node:
+ <ol>
+ <li>tree name serialized as String</li>
+ <li>
+ serialized SpanNode as given below, see
+ <a href="#annotations">annotation serialization</a>
+ </li>
+ </ol>
+ </li>
+ </ul>
+ </td>
+ </tr>
+ <tr>
+ <td>Raw bytes (ID 3)</td>
+ <td>Length of buffer</td>
+ <td>Byte for byte copy</td>
+ </tr>
+ <tr>
+ <td>Long integer (ID 4)</td>
+ <td>8</td>
+ <td>Signed integer, two's complement notation, network byte order.</td>
+ </tr>
+ <tr>
+ <td>Double floating point number (ID 5)</td>
+ <td>8</td>
+ <td>IEEE 754, double precision, network byte order.</td>
+ </tr>
+ <tr>
+ <td>Array (ID 6)</td>
+ <td>At least 8 bytes</td>
+ <td>
+ Arrays of any fields are serialized like this:<br />
+ &nbsp;- 4 bytes: Data type array consists of <br />
+ &nbsp;- 4 bytes: Number of elements in array<br />
+ &nbsp;Below sequence is repeated "number of elements" times<br />
+ &nbsp;- 4 bytes: Length of element<br />
+ &nbsp;- Serialized element<br />
+ </td>
+ </tr>
+ <tr>
+ <td>Fieldmap (ID 7)</td>
+ <td>&nbsp;</td>
+ <td>Field maps (embedded or not) are defined above</td>
+ </tr>
+ <tr>
+ <td>Document (ID 8)</td>
+ <td>&nbsp;</td>
+ <td>Document objects (embedded or not) are defined above</td>
+ </tr>
+ <tr>
+ <td>Timestamp (ID 9)</td>
+ <td>&nbsp;</td>
+ <td>Same as long integer</td>
+ </tr>
+ <tr>
+ <td>Uri (ID 10)</td>
+ <td>&nbsp;</td>
+ <td>Same as string</td>
+ </tr>
+ <tr>
+ <td>Exact string (ID 11)</td>
+ <td>&nbsp;</td>
+ <td>Same as string</td>
+ </tr>
+ <tr>
+ <td>Content (ID 12)</td>
+ <td>At least 11 bytes</td>
+ <td>
+ Content fields are serialized like this:<br />
+ &nbsp;- Content type length (1 byte)<br />
+ &nbsp;- Content type (0 terminated string, UTF-8 encoding.)<br />
+ &nbsp;- Content encoding length (1 byte)<br />
+ &nbsp;- Content encoding (0 terminated string, UTF-8 encoding.)<br />
+ &nbsp;- Content language length (1 byte)<br />
+ &nbsp;- Content language (0 terminated string, UTF-8 encoding.)<br />
+ &nbsp;- Content length (Integer, 4 bytes)<br />
+ &nbsp;- Content (including 0-terminating char)<br />
+ </td>
+ </tr>
+ <tr>
+ <td>Content meta (ID 13)</td>
+ <td>At least 12 bytes</td>
+ <td>
+ Content (attachment) meta data are serialized like this:<br />
+ &nbsp;- Attachment size (Integer, 4 bytes)<br />
+ &nbsp;- Attachment name (0 terminated string, UTF-8 encoding.)<br />
+ &nbsp;- Attachment encoding (0 terminated string, UTF-8 encoding.)<br />
+ &nbsp;- Attachment content type (0 terminated string, UTF-8
+ encoding.)<br />
+ &nbsp;- Attachment part (0 terminated string, UTF-8 encoding.)<br />
+ &nbsp;- Attachment flag (Integer, 4 bytes)<br />
+ </td>
+ </tr>
+ <tr>
+ <td>Term boost (ID 15)</td>
+ <td>&nbsp;</td>
+ <td>Same as string</td>
+ </tr>
+ <tr>
+ <td>Byte (ID 16)</td>
+ <td>1</td>
+ <td>One single byte</td>
+ </tr>
+ <tr>
+ <td>Set (ID 17)</td>
+ <td>At least 8 bytes</td>
+ <td>
+ Set of any fields are serialized like this:<br />
+ &nbsp;- Integer (4 bytes): Data type set is made up of<br />
+ &nbsp;- Integer (4 bytes): Number of elements in set<br />
+ &nbsp;Below sequence is repeated "number of elements" times<br />
+ &nbsp;- Serialized element<br />
+ </td>
+ </tr>
+ </table>
-<table border="1" cellspacing="0" cellpadding="1%" width="100%">
-<caption><em>Data type serialization</em></caption>
-<tr>
-<td width="15%"><b>Data type</td>
-<td width="10%"><b>Length</td>
-<td><b>Serialization</td>
-</tr>
-<tr><td>Integer (ID 0)</td>
-<td>4</td>
-<td>Signed integer, two's complement notation, network byte order.</td>
-</tr>
-<tr><td>Floating point number (ID 1)</td>
-<td>4</td>
-<td>IEEE 754, single precision, network byte order.</td>
-</tr>
-<tr><td>String (ID 2)</td>
-<td>1 + (1 or 4) + length + 1</td>
-<td>Strings are serialization format:<br />
-&nbsp;- First byte represents coding. This has traditionally denoted the maximum number of bits
- per character in the UTF-8 encoded string, but has never been used in deserialization code.
-<ul>
- <li>Set to 32 if not used.</li>
- <li>Set to &lt;32 if you know the UTF-8 string uses less bits per character; e.g. ASCII could use 8.</li>
- <li>Set bit 6 (decimal 64) if the string has an <a href="#annotations">annotation tree</a>.</li>
-</ul>
-<br />
-&nbsp;- Integer_1_4 with length of string. <br />
-&nbsp;- The string (UTF-8 encoding), including 0-terminating byte.<br>
-&nbsp;- An annotation tree, if bit 6 (decimal 64) of coding byte is set:
-<ul>
- <li>total length of all span trees excl. itself: <strong>uint32</strong></li>
- <li>number of span trees <strong>int_1_2_4</strong></li>
- <li>for each root node:
- <ol>
- <li>tree name serialized as String</li>
- <li>serialized SpanNode as given below, see <a href="#annotations">annotation serialization</a></li>
- </ol>
-</ul>
-</td>
-</tr>
-<tr><td>Raw bytes (ID 3)</td>
-<td>Length of buffer</td>
-<td>Byte for byte copy</td>
-</tr>
-<tr><td>Long integer (ID 4)</td>
-<td>8</td>
-<td>Signed integer, two's complement notation, network byte order.</td>
-</tr>
-<tr><td>Double floating point number (ID 5)</td>
-<td>8</td>
-<td>IEEE 754, double precision, network byte order.</td>
-</tr>
-<tr><td>Array (ID 6)</td>
-<td>At least 8 bytes</td>
-<td>Arrays of any fields are serialized like this:<br>
-&nbsp;- 4 bytes: Data type array consists of <br>
-&nbsp;- 4 bytes: Number of elements in array<br>
-&nbsp;Below sequence is repeated "number of element" times<br>
-&nbsp;- 4 bytes: Length of element<br>
-&nbsp;- Serialized element<br>
-</td></tr>
-<tr><td>Fieldmap (ID 7)</td>
-<td>&nbsp;</td>
-<td>Field maps (embedded or not) are defined above</td>
-</tr>
-<tr><td>Document (ID 8)</td>
-<td>&nbsp;</td>
-<td>Document objects (embedded or not) are defined above</td>
-</tr>
-<tr><td>Timestamp (ID 9)</td>
-<td>&nbsp;</td>
-<td>Same as long integer</td>
-</tr>
-<tr><td>Uri (ID 10)</td>
-<td>&nbsp;</td>
-<td>Same as string</td>
-</tr>
-<tr><td>Exact string (ID 11)</td>
-<td>&nbsp;</td>
-<td>Same as string</td>
-</tr>
-<tr><td>Content (ID 12)</td>
-<td>At least 11 bytes</td>
-<td>Content fields are serialized like this:<br>
-&nbsp;- Content type length (1 byte)<br>
-&nbsp;- Content type (0 terminated string, UTF-8 encoding.)<br>
-&nbsp;- Content encoding length (1 byte)<br>
-&nbsp;- Content encoding (0 terminated string, UTF-8 encoding.)<br>
-&nbsp;- Content language length (1 byte)<br>
-&nbsp;- Content language (0 terminated string, UTF-8 encoding.)<br>
-&nbsp;- Content length (Integer, 4 bytes)<br>
-&nbsp;- Content (including 0-terminating char)<br>
-</td>
-</tr>
-<tr><td>Content meta (ID 13)</td>
-<td>At least 12 bytes</td>
-<td>Content (attachment) meta data are serialized like this:<br>
-&nbsp;- Attachment size (Integer, 4 bytes)<br>
-&nbsp;- Attachment name (0 terminated string, UTF-8 encoding.)<br>
-&nbsp;- Attachment encoding (0 terminated string, UTF-8 encoding.)<br>
-&nbsp;- Attachment content type (0 terminated string, UTF-8 encoding.)<br>
-&nbsp;- Attachment part (0 terminated string, UTF-8 encoding.)<br>
-&nbsp;- Attachment flag (Integer, 4 bytes)<br>
-</td>
-</tr>
-<tr><td>Term boost (ID 15)</td>
-<td>&nbsp;</td>
-<td>Same as string</td>
-</tr>
-<tr><td>Byte (ID 16)</td>
-<td>1</td>
-<td>One single byte</td>
-</tr>
-<tr><td>Set (ID 17)</td>
-<td>At least 8 bytes</td>
-<td>Set of any fields are serialized like this:<br>
-&nbsp;- Integer (4 bytes): Data type set is made up of<br>
-&nbsp;- Integer (4 bytes): Number of elements in set<br>
-&nbsp;Below sequence is repeated "number of element" times<br>
-&nbsp;- Serialized element<br>
-</td></tr>
-</table>
+ <table border="1" cellspacing="0" cellpadding="1%" width="100%">
+ <caption id="annotations">
+ <em>Annotation tree serialization</em>
+ </caption>
+ <tr>
+ <td width="15%"><b>Data type</b></td>
+ <td width="10%"><b>Length</b></td>
+ <td><b>Serialization</b></td>
+ </tr>
+ <tr>
+ <td>SpanNode (base class)</td>
+ <td>1 + (1, 2 or 4) + Annotation serialization + subclass payload</td>
+ <td>
+ <ul>
+ <li>
+ type <strong>byte</strong> (1: Span, 2: SpanList, 4:
+ AlternateSpanList)
+ </li>
+ <li>number of annotations <strong>int_1_2_4</strong></li>
+ <li>each annotation as given below</li>
+ <li>
+ (remaining payload serialized as given below by subclasses Span,
+ SpanList and AlternateSpanList)
+ </li>
+ </ul>
+ </td>
+ </tr>
+ <tr>
+ <td>Annotation</td>
+ <td>4 + (1, 2 or 4) + (possibly 4 + FieldValue serialization)</td>
+ <td>
+ <ul>
+ <li>
+ MD5 name hash (4 LSBytes) <strong>uint32</strong> (NOTE: 0-127
+ reserved for internal Vespa usage.)
+ </li>
+ <li>length <strong>int_1_2_4</strong></li>
+ <li>
+ the following fields are <em>only</em> present if length &gt; 0:
+ <ul>
+ <li>data type id <strong>uint32</strong></li>
+ <li>
+ NOTE: no sequence id, as we will rely on annotations being
+ serialized/deserialized in particular order, so we don't need
+ to write this explicitly
+ </li>
+ <li>FieldValue as given by its own serialization</li>
+ </ul>
+ </li>
+ </ul>
+ </td>
+ </tr>
+ <tr>
+ <td>Span</td>
+ <td>SpanNode serialization + (1, 2 or 4) + (1, 2 or 4)</td>
+ <td>
+ <ul>
+ <li>serialization from SpanNode base class</li>
+ <li>
+ from index, as given by Java String (UTF-16)
+ <strong>int_1_2_4</strong>
+ </li>
+ <li>
+ length, as given by Java String (UTF-16)
+ <strong>int_1_2_4</strong>
+ </li>
+ </ul>
+ </td>
+ </tr>
+ <tr>
+ <td>SpanList</td>
+ <td>
+ SpanNode serialization + (1, 2 or 4) + n times SpanNode serialization
+ </td>
+ <td>
+ <ul>
+ <li>serialization from SpanNode base class</li>
+ <li>number of children <strong>int_1_2_4</strong></li>
+ <li>
+ each child node serialized as SpanNode (Span, SpanList,
+ AlternateSpanList)
+ </li>
+ </ul>
+ </td>
+ </tr>
+ <tr>
+ <td>AlternateSpanList</td>
+ <td>
+ SpanNode serialization + (1, 2 or 4) + n times (8 + SpanList
+ serialization)
+ </td>
+ <td>
+ <ul>
+ <li>serialization from SpanNode base class</li>
+ <li>number of child trees <strong>int_1_2_4</strong></li>
+ <li>
+ for each child tree:
+ <ul>
+ <li>probability <strong>double</strong></li>
+ <li>serialization as given by SpanList above</li>
+ </ul>
+ </li>
+ </ul>
+ </td>
+ </tr>
+ <tr>
+ <td>AnnotationRef</td>
+ <td>1, 2 or 4</td>
+ <td>
+ AnnotationRef serialization
+ <ul>
+ <li>
+ unique sequence id of annotation being referred to
+ <strong>int_1_2_4</strong>
+ </li>
+ </ul>
+ </td>
+ </tr>
+ </table>
+ <table border="1" cellspacing="0" cellpadding="1%" width="100%">
+ <caption>
+ <em>Data types used in serialized format</em>
+ </caption>
+ <tr>
+ <td width="15%"><b>Data type</b></td>
+ <td><b>Serialization</b></td>
+ </tr>
+ <tr>
+ <td>Integer_1_4</td>
+ <td>
+ If bit 7 of first byte is unset, coded using 1 byte.<br />
+ If bit 7 of first byte is set, coded using 4 bytes (bit 7 of first
+ byte must be masked away).<br />
+ <em>Range: 0 - 2**31-1.</em>
+ </td>
+ </tr>
+ <tr>
+ <td>Integer_1_2_4</td>
+ <td>
+ If bit 7 of first byte is unset, coded using 1 byte.<br />
+ If bit 7 of first byte is set and bit 6 of first byte is unset, coded
+ using 2 bytes (bit 7 and 6 of first byte must be masked away).<br />
+ If bit 7 and 6 of first byte are set, coded using 4 bytes (bit 7 and 6
+ of first byte must be masked away).<br />
+ <em>Range: 0 - 2**30-1.</em>
+ </td>
+ </tr>
+ <tr>
+ <td>Integer_2_4_8</td>
+ <td>
+ If bit 7 of first byte is unset, coded using 2 bytes.<br />
+ If bit 7 of first byte is set and bit 6 of first byte is unset, coded
+ using 4 bytes (bit 7 and 6 of first byte must be masked away).<br />
+ If bit 7 and 6 of first byte are set, coded using 8 bytes (bit 7 and 6
+ of first byte must be masked away).<br />
+ <em>Range: 0 - 2**62-1.</em>
+ </td>
+ </tr>
+ </table>
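The three variable-length integer types in the table above can be sketched as follows. This is an illustrative reading of the table, assuming network byte order (big-endian) as stated for all fields; the function names are hypothetical and do not correspond to Vespa's own deserialization code.

```python
def read_int_1_4(buf: bytes, pos: int = 0):
    """Decode Integer_1_4; returns (value, position after the integer)."""
    first = buf[pos]
    if (first & 0x80) == 0:
        return first, pos + 1
    # Bit 7 set: 4 bytes, with bit 7 of the first byte masked away.
    return int.from_bytes(buf[pos:pos + 4], "big") & 0x7FFFFFFF, pos + 4


def read_int_1_2_4(buf: bytes, pos: int = 0):
    """Decode Integer_1_2_4; returns (value, position after the integer)."""
    first = buf[pos]
    if (first & 0x80) == 0:
        return first, pos + 1
    if (first & 0x40) == 0:
        # Bit 7 set, bit 6 unset: 2 bytes, top two bits masked away.
        return int.from_bytes(buf[pos:pos + 2], "big") & 0x3FFF, pos + 2
    # Bits 7 and 6 set: 4 bytes, top two bits masked away.
    return int.from_bytes(buf[pos:pos + 4], "big") & 0x3FFFFFFF, pos + 4


def read_int_2_4_8(buf: bytes, pos: int = 0):
    """Decode Integer_2_4_8; returns (value, position after the integer)."""
    first = buf[pos]
    if (first & 0x80) == 0:
        return int.from_bytes(buf[pos:pos + 2], "big"), pos + 2
    if (first & 0x40) == 0:
        return int.from_bytes(buf[pos:pos + 4], "big") & 0x3FFFFFFF, pos + 4
    return (int.from_bytes(buf[pos:pos + 8], "big") & 0x3FFFFFFFFFFFFFFF,
            pos + 8)


if __name__ == "__main__":
    print(read_int_1_4(b"\x05"))
    print(read_int_1_2_4(b"\x81\x00"))
    print(read_int_2_4_8(b"\x01\x00"))
```

Each function returns the new read position along with the value, so a caller can chain reads when walking a data array (number of fields, then field ID and field size pairs).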
-<a name="annotations"><table border="1" cellspacing="0" cellpadding="1%" width="100%">
-<caption><em>Annotation tree serialization</em></caption>
-<tr>
-<td width="15%"><b>Data type</td>
-<td width="10%"><b>Length</td>
-<td><b>Serialization</td>
-</tr>
-<tr>
-<td>SpanNode (base class)</td>
-<td>1 + (1, 2 or 4) + Annotation serialization + subclass payload</td>
-<td>
- <ul>
-<li> type <strong>byte</strong> (1: Span, 2: SpanList, 4: AlternateSpanList)
-</li> <li> number of annotations <strong>int_1_2_4</strong>
-</li> <li> each annotation as given below
-</li> <li> (remaining payload serialized as given below by subclasses Span, SpanList and AlternateSpanList)
-</li></ul>
-</td>
-</tr>
-<tr>
-<td>Annotation</td>
-<td>4 + (1, 2 or 4) + (possibly 4 + FieldValue serialization)</td>
-<td>
-<ul>
-<li>MD5 name hash (4 LSBytes) <strong>uint32</strong> (NOTE: 0-127 reserved for internal Vespa usage.)
-</li> <li> length <strong>int_1_2_4</strong>
-</li> <li> the following fields are <em>only</em> present if length &gt; 0: <ul>
-<li> data type id <strong>uint32</strong>
-</li> <li> NOTE: no sequence id, as we will rely on annotations being serialized/deserialized in particular order, so we don't need to write this explicitly
-</li> <li> FieldValue as given by its own serialization
-</li></ul>
-</li></ul>
-</td>
-</tr>
-<tr>
-<td>Span</td>
-<td>SpanNode serialization + (1, 2 or 4) + (1, 2 or 4)</td>
-<td><ul>
-<li>serialization from SpanNode base class</li>
-<li> from index, as given by Java String (UTF-16) <strong>int_1_2_4</strong>
-</li> <li> length, as given by Java String (UTF-16) <strong>int_1_2_4</strong>
-</li></ul>
-</td>
-</tr>
-<tr>
-<td>SpanList</td>
-<td>SpanNode serialization + (1, 2 or 4) + n times SpanNode serialization</td>
-<td>
-<ul>
-<li>serialization from SpanNode base class</li>
-<li> number of children <strong>int_1_2_4</strong>
-</li> <li> each child node serialized as SpanNode (Span, SpanList, AlternateSpanList)
-</li></ul>
-</td>
-</tr>
-<tr>
-<td>AlternateSpanList</td>
-<td>SpanNode serialization + (1, 2 or 4) + n times (8 + SpanList serialization)</td>
-<td><ul>
-<li>serialization from SpanNode base class</li>
-<li> number of child trees <strong>int_1_2_4</strong>
-</li> <li> for each child tree: <ul>
-<li> probability <strong>double</strong>
-</li> <li> serialization as given by SpanList above
-</li></ul>
-</li></ul>
-</td>
-</tr>
-<tr>
-<td>AnnotationRef</td>
-<td>1, 2 or 4</td>
-<td>AnnotationRef serialization <ul>
-<li> unique sequence id of annotation being referred to <strong>int1_2_4</strong>
-</li></ul>
-</td>
-</tr>
-</table>
+ <h2>Document Update Format</h2>
-<table border="1" cellspacing="0" cellpadding="1%" width="100%">
-<caption><em>Data types used in serialized format</em></caption>
-<tr>
-<td width="15%"><b>Data type</td>
-<td><b>Serialization</b></td>
-</tr>
-<tr><td>Integer_1_4</td>
-<td>If bit 7 of first byte is unset, coded using 1 byte.<br />
- If bit 7 of first byte is set, coded using 4 bytes (bit 7 of first byte must be masked away).<br />
- <em>Range: 0 - 2**31-1.</em></td>
-</tr>
-<tr><td>Integer_1_2_4</td>
-<td>If bit 7 of first byte is unset, coded using 1 byte.<br />
- If bit 7 of first byte is set and bit 6 of first byte is unset, coded using 2 bytes (bit 7 and 6 of first byte must be masked away).<br />
- If bit 7 and 6 of first byte are set, coded using 4 bytes (bit 7 and 6 of first byte must be masked away).<br />
- <em>Range: 0 - 2**30-1.</em></td>
-</td>
-</tr>
-<tr><td>Integer_2_4_8</td>
-<td>If bit 7 of first byte is unset, coded using 2 byte.<br />
- If bit 7 of first byte is set and bit 6 of first byte is unset, coded using 4 bytes (bit 7 and 6 of first byte must be masked away).<br />
- If bit 7 and 6 of first byte are set, coded using 8 bytes (bit 7 and 6 of first byte must be masked away).<br />
- <em>Range: 0 - 2**62-1.</em></td>
-</td>
-</tr>
-</table>
+ <p>This is the description of the serialized document update format.</p>
+ <table border="1" cellspacing="0" cellpadding="1%" width="100%">
+ <caption>
+ <em>Document update serialization format</em>
+ </caption>
+ <tr>
+ <td width="10%"><b>Field</b></td>
+ <td width="10%"><b>Type</b></td>
+ <td width="10%"><b>Length</b></td>
+ <td><b>Description</b></td>
+ </tr>
+ <tr>
+ <td>Document ID</td>
+ <td>Bytes</td>
+ <td>&nbsp;</td>
+ <td>Unique ID for document. 0-terminated string, UTF-8 encoding.</td>
+ </tr>
+ <tr>
+ <td>Content byte</td>
+ <td>Byte</td>
+ <td>1 byte</td>
+ <td>Always set to 1</td>
+ </tr>
+ <tr>
+ <td>Document Type</td>
+ <td>Bytes</td>
+ <td>&nbsp;</td>
+ <td>Document type. (0-terminated string, UTF-8 encoding.)</td>
+ </tr>
+ <tr>
+ <td>Number of fields to update</td>
+ <td>Integer</td>
+ <td>4 bytes</td>
+ <td>The number of fields to update</td>
+ </tr>
+ <tr>
+ <td>Serialized field updates</td>
+ <td>Field Update</td>
+ <td>&nbsp;</td>
+ <td>The serialized field updates. See below.</td>
+ </tr>
+ </table>
-<h2>Document Update Format</h2>
+ <table border="1" cellspacing="0" cellpadding="1%" width="100%">
+ <caption>
+ <em>Field update serialization format</em>
+ </caption>
+ <tr>
+ <td width="10%"><b>Field</b></td>
+ <td width="10%"><b>Type</b></td>
+ <td width="10%"><b>Length</b></td>
+ <td><b>Description</b></td>
+ </tr>
+ <tr>
+ <td>Field Id</td>
+ <td>Integer</td>
+ <td>4 bytes</td>
+ <td>Field id within document type.</td>
+ </tr>
+ <tr>
+ <td>Number of value updates</td>
+ <td>Integer</td>
+ <td>4 bytes</td>
+ <td>Number of value updates to this field.</td>
+ </tr>
+ <tr>
+ <td>Serialized field update values</td>
+ <td>Bytes</td>
+ <td>&nbsp;</td>
+ <td>The serialized field update values. See below.</td>
+ </tr>
+ </table>
-<p>This is the description of the serialized document update format.</p>
-
-<table border="1" cellspacing="0" cellpadding="1%" width="100%">
-<caption><em>Document update serialization format</em></caption>
-<tr>
-<td width="10%"><b>Field</td>
-<td width="10%"><b>Type</td>
-<td width="10%"><b>Length</td>
-<td><b>Description</td>
-</tr>
-<tr><td>Document ID</td>
-<td>Bytes</td>
-<td>&nbsp;</td>
-<td>Unique ID for document. 0-terminated string, UTF-8 encoding.</td>
-</tr>
-<tr><td>Content byte</td>
-<td>Byte</td>
-<td>1 byte</td>
-<td>Always set to 1</td>
-</tr>
-<tr><td>Document Type</td>
-<td>Bytes</td>
-<td>&nbsp;</td>
-<td>Document type. (0-terminated string, UTF-8 encoding.)</td>
-</tr>
-<tr><td>Number of fields to update</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>The number of fields to update</td>
-</tr>
-<tr><td>Serialized field updates</td>
-<td>Field Update</td>
-<td>&nbsp;</td>
-<td>The serialized field updates. See below.</td>
-</tr>
-</table>
-
-<table border="1" cellspacing="0" cellpadding="1%" width="100%">
-<caption><em>Document update serialization format</em></caption>
-<tr>
-<td width="10%"><b>Field</td>
-<td width="10%"><b>Type</td>
-<td width="10%"><b>Length</td>
-<td><b>Description</td>
-</tr>
-<tr><td>Field Id</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>Field id within document type.</td>
-</tr>
-<tr><td>Number of value updates</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>Numer of value updates to this field.</td>
-</tr>
-<tr><td>Serialized field update values</td>
-<td>Bytes</td>
-<td>&nbsp;</td>
-<td>The serialized field update values. See below.</td>
-</tr>
-</table>
-
-<table border="1" cellspacing="0" cellpadding="1%" width="100%">
-<caption><em>Document update value serialization format</em></caption>
-<tr>
-<td width="10%"><b>Field</td>
-<td width="10%"><b>Type</td>
-<td width="10%"><b>Length</td>
-<td><b>Description</td>
-</tr>
-<tr><th colspan="4">Add Value Update</th></tr>
-<tr><td>Add Value Update ID</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>25 + 0x1000 for value updates.</td>
-</tr>
-<tr><td>Field serialization</td>
-<td>FieldValue</td>
-<td>&nbsp;</td>
-<td>Serialization of the field to add.</td>
-</tr>
-<tr><td>Weight</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>Weight. Used if update applies to weighted set.</td>
-</tr>
-<tr><th colspan="4">Arithmetic Update</th></tr>
-<tr><td>Arithmetic Update ID</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>26 + 0x1000 for arithmetic updates.</td>
-</tr>
-<tr><td>Operator ID</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>Identifies whether this does add, subtract, multiply or divide.</td>
-</tr>
-<tr><td>Operand</td>
-<td>Double</td>
-<td>8 bytes</td>
-<td>The right operand to use in the arithmetic operation.</td>
-</tr>
-<tr><th colspan="4">Assign Update</th></tr>
-<tr><td>Assign Update ID</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>27 + 0x1000 for assign updates.</td>
-</tr>
-<tr><td>Content flag</td>
-<td>Byte</td>
-<td>1 bytes</td>
-<td>Contains 1 if we have content, 0 if not.</td>
-</tr>
-<tr><td>Field serialization</td>
-<td>FieldValue</td>
-<td>&nbsp;</td>
-<td>Serialization of the field to assign.</td>
-</tr>
-<tr><th colspan="4">Clear Update</th></tr>
-<tr><td>Clear Update ID</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>28 + 0x1000 for clear updates.</td>
-</tr>
-<tr><th colspan="4">Map Value Update</th></tr>
-<tr><td>Map Value Update ID</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>29 + 0x1000 for map value updates.</td>
-</tr>
-<tr><td>Field serialization</td>
-<td>FieldValue</td>
-<td>4 bytes</td>
-<td>The field indicating what entry to update.</td>
-</tr>
-<tr><td>Value Update</td>
-<td>Document Value Update</td>
-<td>&nbsp;</td>
-<td>The update operation to apply to the field indicated above.</td>
-</tr>
-<tr><th colspan="4">Remove Value Update</th></tr>
-<tr><td>Remove Update ID</td>
-<td>Integer</td>
-<td>4 bytes</td>
-<td>30 + 0x1000 for remove updates.</td>
-</tr>
-<tr><td>Field serialization</td>
-<td>FieldValue</td>
-<td>&nbsp;</td>
-<td>The field indicating what entry to update.</td>
-</tr>
-</table>
-
-</body>
+ <table border="1" cellspacing="0" cellpadding="1%" width="100%">
+ <caption>
+ <em>Document update value serialization format</em>
+ </caption>
+ <tr>
+ <td width="10%"><b>Field</b></td>
+ <td width="10%"><b>Type</b></td>
+ <td width="10%"><b>Length</b></td>
+ <td><b>Description</b></td>
+ </tr>
+ <tr>
+ <th colspan="4">Add Value Update</th>
+ </tr>
+ <tr>
+ <td>Add Value Update ID</td>
+ <td>Integer</td>
+ <td>4 bytes</td>
+ <td>25 + 0x1000 for value updates.</td>
+ </tr>
+ <tr>
+ <td>Field serialization</td>
+ <td>FieldValue</td>
+ <td>&nbsp;</td>
+ <td>Serialization of the field to add.</td>
+ </tr>
+ <tr>
+ <td>Weight</td>
+ <td>Integer</td>
+ <td>4 bytes</td>
+ <td>Weight. Used if update applies to weighted set.</td>
+ </tr>
+ <tr>
+ <th colspan="4">Arithmetic Update</th>
+ </tr>
+ <tr>
+ <td>Arithmetic Update ID</td>
+ <td>Integer</td>
+ <td>4 bytes</td>
+ <td>26 + 0x1000 for arithmetic updates.</td>
+ </tr>
+ <tr>
+ <td>Operator ID</td>
+ <td>Integer</td>
+ <td>4 bytes</td>
+ <td>Identifies whether this does add, subtract, multiply or divide.</td>
+ </tr>
+ <tr>
+ <td>Operand</td>
+ <td>Double</td>
+ <td>8 bytes</td>
+ <td>The right operand to use in the arithmetic operation.</td>
+ </tr>
+ <tr>
+ <th colspan="4">Assign Update</th>
+ </tr>
+ <tr>
+ <td>Assign Update ID</td>
+ <td>Integer</td>
+ <td>4 bytes</td>
+ <td>27 + 0x1000 for assign updates.</td>
+ </tr>
+ <tr>
+ <td>Content flag</td>
+ <td>Byte</td>
+ <td>1 bytes</td>
+ <td>Contains 1 if we have content, 0 if not.</td>
+ </tr>
+ <tr>
+ <td>Field serialization</td>
+ <td>FieldValue</td>
+ <td>&nbsp;</td>
+ <td>Serialization of the field to assign.</td>
+ </tr>
+ <tr>
+ <th colspan="4">Clear Update</th>
+ </tr>
+ <tr>
+ <td>Clear Update ID</td>
+ <td>Integer</td>
+ <td>4 bytes</td>
+ <td>28 + 0x1000 for clear updates.</td>
+ </tr>
+ <tr>
+ <th colspan="4">Map Value Update</th>
+ </tr>
+ <tr>
+ <td>Map Value Update ID</td>
+ <td>Integer</td>
+ <td>4 bytes</td>
+ <td>29 + 0x1000 for map value updates.</td>
+ </tr>
+ <tr>
+ <td>Field serialization</td>
+ <td>FieldValue</td>
+ <td>4 bytes</td>
+ <td>The field indicating what entry to update.</td>
+ </tr>
+ <tr>
+ <td>Value Update</td>
+ <td>Document Value Update</td>
+ <td>&nbsp;</td>
+ <td>The update operation to apply to the field indicated above.</td>
+ </tr>
+ <tr>
+ <th colspan="4">Remove Value Update</th>
+ </tr>
+ <tr>
+ <td>Remove Update ID</td>
+ <td>Integer</td>
+ <td>4 bytes</td>
+ <td>30 + 0x1000 for remove updates.</td>
+ </tr>
+ <tr>
+ <td>Field serialization</td>
+ <td>FieldValue</td>
+ <td>&nbsp;</td>
+ <td>The field indicating what entry to update.</td>
+ </tr>
+ </table>
+ </body>
</html>
diff --git a/screwdriver.yaml b/screwdriver.yaml
index 801c32bb5b2..3893f6afe8c 100644
--- a/screwdriver.yaml
+++ b/screwdriver.yaml
@@ -531,5 +531,7 @@ jobs:
--assume-extension --check-html --check-external-hash --no-enforce-http \
--typhoeus '{"connecttimeout": 10, "timeout": 30, "followlocation": false}' \
--hydra '{"max_concurrency": 1}' \
+ --ignore-urls '/slack.vespa.ai/,/localhost:8080/,/127.0.0.1:3000/,/favicon.svg/,/main.jsx/' \
+ --ignore-files '/fnet/index.html/' \
--swap-urls '(.*).md:\1.html' \
_site