# A map to the Vespa code base

You want to get familiar with the Vespa code base but don't know where to start?

Vespa consists of about 1.7 million lines of code, about equal parts Java and C++.
Since it's mostly written by a team of developers selected for their ability
to do this kind of thing unusually well, who have been given time to dedicate
themselves to it for a long time, it is mostly easy to work with. However, one
thing we haven't done is create a module structure friendly to newcomers: the code
is simply organized in a flat structure of about 150 modules.

This document aims to provide a map from the
[functional elements](https://docs.vespa.ai/documentation/overview.html)
of Vespa to the most important modules in the flat module structure in the
[code base on GitHub](https://github.com/vespa-engine/vespa).


## The stateless container

When a request is made to Vespa it first enters a stateless container cluster,
called jDisc. This consists of:

- a __jDisc core__ layer which provides a model of a running application and general, protocol-independent request-response handling, with various protocol implementations.
- a __jDisc container__ layer providing component management, configuration and similar.
- a __search middleware__ layer containing query/result APIs, query execution logic etc.
- APIs and modules for writing and processing document operations.

The stateless container is implemented in Java.

jDisc core modules:

- [jdisc_core](https://github.com/vespa-engine/vespa/tree/master/jdisc_core) - The core jDisc functionality
- [jdisc_http_service](https://github.com/vespa-engine/vespa/tree/master/jdisc_http_service) - HTTP connector for jDisc, implemented using Jetty.

jDisc container modules, layered on jDisc core:

- [container-disc](https://github.com/vespa-engine/vespa/tree/master/container-disc) - Integration between the jDisc container and jDisc core layers.
- [container-core](https://github.com/vespa-engine/vespa/tree/master/container-core) - Core jDisc container functionality: Metrics, OSGi integration for component bundles, etc.
- [component](https://github.com/vespa-engine/vespa/tree/master/component) - the component model. Components (in Java) will implement or subclass a type from this module.
- [chain](https://github.com/vespa-engine/vespa/tree/master/chain) - generic support for chaining components in a Chain of Responsibility structure, which is a pattern used repeatedly in higher level modules.
- [container-di](https://github.com/vespa-engine/vespa/tree/master/container-di) - component dependency injection framework for the container, compatible with Guice annotations but a separate implementation, which handles injection of config and injection of component collections.
- [processing](https://github.com/vespa-engine/vespa/tree/master/processing) - generic, chainable request-response processing framework (Processors).
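Both chain and processing build on the Chain of Responsibility idea: each component receives the request, may change it, passes it on to the rest of the chain, and may then modify the response on its way back. The following is a minimal, self-contained Java sketch of that pattern only. The names `Request`, `Response`, `Processor`, `Execution` and `run` are hypothetical stand-ins chosen for this example; they are not the actual classes or signatures of the processing module.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Chain of Responsibility sketch: processors handle a request in turn and build the response on the way back. */
public class ChainSketch {

    /** Hypothetical request carrying just a query string. */
    record Request(String query) {}

    /** Hypothetical response holding an ordered list of data items. */
    static final class Response {
        final List<String> data = new ArrayList<>();
    }

    /** One link in the chain: may rewrite the request, call the rest of the chain, and edit the response. */
    interface Processor {
        Response process(Request request, Execution execution);
    }

    /** Remembers the position in the chain and hands the request to the next processor. */
    static final class Execution {
        private final List<Processor> chain;
        private final int next;

        Execution(List<Processor> chain, int next) {
            this.chain = chain;
            this.next = next;
        }

        Response process(Request request) {
            if (next >= chain.size()) return new Response(); // past the last processor: start from an empty response
            return chain.get(next).process(request, new Execution(chain, next + 1));
        }
    }

    /** Runs a fixed three-processor chain and returns the resulting data items. */
    static List<String> run(String query) {
        List<Processor> chain = List.of(
                // Rewrites the request before passing it on.
                (request, execution) -> execution.process(new Request(request.query().toLowerCase(Locale.ROOT))),
                // Post-processes whatever the rest of the chain returned.
                (request, execution) -> {
                    Response response = execution.process(request);
                    response.data.add("annotated");
                    return response;
                },
                // Produces the actual data at the end of the chain.
                (request, execution) -> {
                    Response response = execution.process(request);
                    response.data.add("result for " + request.query());
                    return response;
                });
        return new Execution(chain, 0).process(new Request(query)).data;
    }

    public static void main(String[] args) {
        System.out.println(run("Hello")); // [result for hello, annotated]
    }
}
```

Keeping the position in the chain in a per-request `Execution` object, rather than inside the processors, lets the same stateless processor instances serve many concurrent requests.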

Search container, layered on jDisc container:

- [container-search](https://github.com/vespa-engine/vespa/tree/master/container-search) - Query-Result processing framework (Searchers) layered over the generic processing framework, query profiles, and the global query execution logic: Dispatch (scatter-gather), grouping, etc.
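Dispatch in container-search fans a query out to content nodes and merges their replies. As a rough illustration of scatter-gather in general, the Java sketch below sends a query to several nodes in parallel, each returning its own best hits, and keeps the global top k. The `Hit` record and the modelling of a content node as a plain `Function` are assumptions made for this example; the real dispatcher must also handle timeouts, failing nodes and fetching document summaries, all of which are left out here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Scatter-gather sketch: fan a query out to several nodes and merge their local top-k hits. */
public class ScatterGatherSketch {

    /** Hypothetical hit: a document id with a relevance score. */
    record Hit(String id, double relevance) {}

    /** Queries all nodes in parallel and returns the k most relevant hits overall. */
    static List<Hit> search(List<Function<String, List<Hit>>> nodes, String query, int k) {
        // Scatter: one asynchronous call per node (run on the common fork-join pool).
        List<CompletableFuture<List<Hit>>> replies = new ArrayList<>();
        for (Function<String, List<Hit>> node : nodes) {
            replies.add(CompletableFuture.supplyAsync(() -> node.apply(query)));
        }

        // Gather: wait for every reply and pool the hits.
        List<Hit> merged = new ArrayList<>();
        for (CompletableFuture<List<Hit>> reply : replies) {
            merged.addAll(reply.join());
        }

        // Merge: order by descending relevance and keep the best k.
        merged.sort(Comparator.comparingDouble(Hit::relevance).reversed());
        return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
    }

    public static void main(String[] args) {
        // Two fake nodes, each returning its local hits.
        Function<String, List<Hit>> node1 = q -> List.of(new Hit("a", 0.9), new Hit("b", 0.4));
        Function<String, List<Hit>> node2 = q -> List.of(new Hit("c", 0.7), new Hit("d", 0.1));
        for (Hit hit : search(List.of(node1, node2), "hello", 3)) {
            System.out.println(hit.id() + " " + hit.relevance()); // a 0.9, then c 0.7, then b 0.4
        }
    }
}
```

Merging only the per-node top-k lists is enough for a correct global top k, provided each node returns its own k best hits: any hit in the global top k is necessarily among the k best on its own node.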

Document operation modules:

- [document](https://github.com/vespa-engine/vespa/tree/master/document) - The document model - documents, fields and document types, and operations on documents.
- [messagebus](https://github.com/vespa-engine/vespa/tree/master/messagebus) - Generic async, multi-hop message passing implemented in both Java and C++.
- [jdisc_messagebus_service](https://github.com/vespa-engine/vespa/tree/master/jdisc_messagebus_service) - MessageBus connector for jDisc.
- [documentapi](https://github.com/vespa-engine/vespa/tree/master/documentapi) - API for issuing document operations to Vespa over messagebus.
- [docproc](https://github.com/vespa-engine/vespa/tree/master/docproc) - Chainable document (operation) processors: Document operations issued over messagebus to Vespa will usually be routed through a container running a document processor chain.
- [indexinglanguage](https://github.com/vespa-engine/vespa/tree/master/indexinglanguage) - Implementation of the "indexing" language which is used to express the statements prefixed by "indexing:" in the search definition. 
- [docprocs](https://github.com/vespa-engine/vespa/tree/master/docprocs) - Document processor components bundled with Vespa. Notably the IndexingProcessor: a document processor which, on document operations, invokes the indexing language statements configured for the document type in question.
- [vespaclient-container-plugin](https://github.com/vespa-engine/vespa/tree/master/vespaclient-container-plugin) - Implements the document/v1 API and internal API used by the Java HTTP client on top of the jDisc container, forwarding to the Document API.
- [vespa-http-client](https://github.com/vespa-engine/vespa/tree/master/vespa-http-client) - client for fast writing to the internal API implemented by vespaclient-container-plugin.
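An indexing statement pipes a value through a sequence of steps, and the IndexingProcessor runs the configured statements on each document operation. The toy Java evaluator below only illustrates that pipe-and-sink idea on a document modelled as a map from field name to string. It is not the grammar or implementation in indexinglanguage: the supported steps (`input`, `trim`, `lowercase`, and `summary`/`attribute`/`index` with an explicit target field), their simplified semantics, and the `"sink:field"` output keys are all assumptions for this example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Toy evaluator for pipe-separated, indexing-style statements (not the real indexing language). */
public class IndexingPipelineSketch {

    /**
     * Runs one statement such as "input title | lowercase | index title_lc" against a document
     * (field name to value) and returns what the sinks wrote, keyed as "sink:field".
     */
    static Map<String, String> evaluate(String statement, Map<String, String> document) {
        Map<String, String> written = new HashMap<>();
        String value = null; // the value currently flowing through the pipe; assumes an input step comes first
        for (String step : statement.split("\\|")) {
            String[] parts = step.trim().split("\\s+");
            switch (parts[0]) {
                case "input" -> value = document.get(parts[1]);           // read a document field
                case "trim" -> value = value.trim();                       // transform the current value
                case "lowercase" -> value = value.toLowerCase(Locale.ROOT);
                case "summary", "attribute", "index" ->                    // write to a target, keep piping
                        written.put(parts[0] + ":" + parts[1], value);
                default -> throw new IllegalArgumentException("Unsupported step: " + parts[0]);
            }
        }
        return written;
    }

    public static void main(String[] args) {
        Map<String, String> document = Map.of("title", "  Hello World  ");
        System.out.println(evaluate("input title | trim | lowercase | summary title_lc | index title_lc", document));
    }
}
```

Here the statement in `main` leaves both `summary:title_lc` and `index:title_lc` set to `hello world`; each sink records the current value and lets it continue down the pipe.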

## Content nodes

Content nodes store all data in Vespa, maintain reverse indexes and perform the distributed parts of query execution: matching, ranking and grouping/aggregation.
This is written in C++.

- [searchcore](https://github.com/vespa-engine/vespa/tree/master/searchcore) - core functionality for maintaining indexes, matching, data storage, grouping, as well as the content node server itself (called proton).
- [searchlib](https://github.com/vespa-engine/vespa/tree/master/searchlib) - libraries invoked by searchcore: Ranking (feature execution framework (fef), rank feature implementations, ranking expressions), index and btree implementations etc. This also contains the Java libraries for ranking.
- [storage](https://github.com/vespa-engine/vespa/tree/master/storage/src/vespa/storage) - system for elastic and auto-recovering data storage over clusters of nodes.
- [eval](https://github.com/vespa-engine/vespa/tree/master/eval) - library for efficient evaluation of ranking expressions.
- [storageapi](https://github.com/vespa-engine/vespa/tree/master/storageapi/src/vespa/storageapi) - message bus messages and implementation for the document API.
- [clustercontroller-core](https://github.com/vespa-engine/vespa/tree/master/clustercontroller-core) - cluster controller for storage, implemented in Java. This provides a single point of node-level decision making for storage, based on ZooKeeper.
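Ranking on content nodes combines rank features into a score using ranking expressions. One simple way to picture evaluating such an expression is as a tree walked for each matched document. The Java sketch below assumes a tiny node set (constants, feature lookups, addition, multiplication) and a hypothetical attribute named `popularity`; it says nothing about how searchlib or eval actually represent or optimize expressions.

```java
import java.util.Map;

/** Expression-tree sketch: a ranking expression evaluated against a map of rank feature values. */
public class RankExpressionSketch {

    /** A node in the expression tree. */
    interface Node {
        double evaluate(Map<String, Double> features);
    }

    record Constant(double value) implements Node {
        public double evaluate(Map<String, Double> features) { return value; }
    }

    /** Looks up a rank feature value by name, failing loudly if it was not computed. */
    record Feature(String name) implements Node {
        public double evaluate(Map<String, Double> features) {
            Double value = features.get(name);
            if (value == null) throw new IllegalArgumentException("Missing feature: " + name);
            return value;
        }
    }

    record Add(Node left, Node right) implements Node {
        public double evaluate(Map<String, Double> features) {
            return left.evaluate(features) + right.evaluate(features);
        }
    }

    record Multiply(Node left, Node right) implements Node {
        public double evaluate(Map<String, Double> features) {
            return left.evaluate(features) * right.evaluate(features);
        }
    }

    public static void main(String[] args) {
        // Hand-built tree for: bm25(title) + 2 * attribute(popularity)
        Node expression = new Add(new Feature("bm25(title)"),
                new Multiply(new Constant(2), new Feature("attribute(popularity)")));
        // Feature values for one hypothetical matched document.
        double score = expression.evaluate(Map.of("bm25(title)", 1.5, "attribute(popularity)", 0.25));
        System.out.println(score); // 2.0
    }
}
```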


## Configuration and administration

The third major subsystem in Vespa is responsible for managing configuration, clusters, application deployment and similar.
It is implemented in Java.