vespa - An engine for low-latency computation over large data sets

	Commit message	Author	Age	Files	Lines
*	Propagate "create if missing"-flag outside binary Update payload in protocols	Tor Brede Vekterli	2024-04-26	21	-15/+168
\|	Avoids potentially having to deserialize the entire update just to get to a single bit of information that is technically metadata existing orthogonally to the document update itself.
\|	To ensure backwards/forwards compatibility, the flag is propagated as a Protobuf `enum` where the default value is a special "unspecified" sentinel, implying an old sender. Since the Java protocol implementation always eagerly deserializes messages, it unconditionally assigns the `create_if_missing` field when sending and completely ignores it when receiving. The C++ protocol implementation observes and propagates the field iff set. Otherwise the flag is deferred to the update object as before.
\|	This applies to both the DocumentAPI and StorageAPI protocols.
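A minimal C++ sketch of the tri-state idea described in this commit, not the actual Vespa code. The enum, struct and function names are hypothetical; only the behaviour (an "unspecified" default meaning "old sender, defer to the update object") comes from the commit message.

```cpp
#include <optional>

// Hypothetical stand-in for the Protobuf enum; value 0 is the default
// that an old sender (which never sets the field) would produce.
enum class CreateIfMissing { Unspecified = 0, False = 1, True = 2 };

// Hypothetical stand-in for a deserialized document update.
struct DocumentUpdate {
    bool create_if_missing = false;
};

// Maps the wire value to a decision, or nullopt when the sender did not set it.
inline std::optional<bool> flag_from_wire(CreateIfMissing wire) {
    switch (wire) {
    case CreateIfMissing::True:        return true;
    case CreateIfMissing::False:       return false;
    case CreateIfMissing::Unspecified: break;
    }
    return std::nullopt;
}

// Prefers the out-of-band flag; falls back to the update object for old senders.
inline bool resolve_create_if_missing(CreateIfMissing wire, const DocumentUpdate& update) {
    if (auto flag = flag_from_wire(wire)) {
        return *flag;
    }
    return update.create_if_missing;
}
```

With this shape, a receiver only needs to look inside the update payload when the wire value is `Unspecified`.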
*	Replace all usages of Arrays.asList with List.of where possible.	Henning Baldersheim	2024-04-12	6	-43/+36
\|
*	Unify on List.of	Henning Baldersheim	2024-04-11	1	-3/+2
\|
*	Update to protobuf 5.26.1 (C++ API).	Tor Egge	2024-04-05	2	-2/+2
\|
*	Merge pull request #30547 from vespa-engine/toregge/rewrite-documentapi-policies-unit-test-to-gtest	Tor Egge	2024-03-11	2	-182/+124
\|\	Rewrite documentapi policies unit test to gtest.
\| *	Fix typo in documentapi policies unit test.	Tor Egge	2024-03-11	1	-1/+1
\| \|
\| *	Rewrite documentapi policies unit test to gtest.	Tor Egge	2024-03-09	2	-182/+124
\| \|
* \|	Merge pull request #30552 from vespa-engine/toregge/rewrite-documentapi-replymerger-unit-test-to-gtest	Geir Storli	2024-03-11	2	-89/+33
\|\ \	Rewrite documentapi reply merger unit test to gtest.
\| * \|	Rewrite documentapi reply merger unit test to gtest.	Tor Egge	2024-03-09	2	-89/+33
\| \|/
* /	Rewrite documentapi messagebus unit test to gtest.	Tor Egge	2024-03-09	2	-36/+35
\|/
*	Use smaller buffers for Document(Update) serialization	Tor Brede Vekterli	2024-03-04	1	-3/+3
\|	Default buffer size is 64 KiB, which adds a lot of unnecessary GC pressure when operations are small (which is often the case). Now explicitly preallocate just 8 KiB for documents and 4 KiB for updates.
\|	Additionally, use explicit HEAD serializer for `Document` serialization, as `serialize()` for some reason uses v6 internally (this only has an observable effect for updates, but still good to use the most recent version).
*	Add Vespa 9 deprecation comment for lazy deserialization	Tor Brede Vekterli	2024-02-27	3	-3/+6
\|	Since the lazy deserialization captures the entire message payload, it's not independent of the protocol version used and it's therefore not safe to set or use it from any other protocol than the legacy version.
*	More graceful reporting of Protobuf protocol decode errors	Tor Brede Vekterli	2024-02-26	2	-9/+15
\|	If a message fails decoding due to e.g. a missing document type during document decode, that will be reported as a protocol-level decode error, even if that message itself is perfectly compliant. Report such exceptions as warnings instead, but add more context to the exception message to make it clear that it happened in a decode-context.
\|	Example of old (and rather confusing) message in logs:
\|	```
\|	WARNING distributor vds.documentprotocol Document type foo not found
\|	```
\|	Now becomes:
\|	```
\|	WARNING distributor vds.documentprotocol Failed decoding message of type PutDocumentRequest: Document type foo not found
\|	```
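The kind of wrapping this commit describes could look roughly like the C++ sketch below. `DecodeError` and `decode_with_context` are hypothetical names introduced for illustration; only the resulting message format is taken from the log example in the commit message.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical exception type; the real protocol code may use a different one.
struct DecodeError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Runs `decode` and rethrows any failure with the message type prepended,
// so the log line makes it clear that the error happened while decoding.
template <typename T>
T decode_with_context(const std::string& message_type, const std::function<T()>& decode) {
    try {
        return decode();
    } catch (const std::exception& e) {
        throw DecodeError("Failed decoding message of type " + message_type + ": " + e.what());
    }
}
```

The caller can then log the `DecodeError` text at warning level instead of treating it as a protocol-level failure.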
*	Don't do a hex dump of messages into the log upon protocol deserialization failures	Tor Brede Vekterli	2024-02-23	1	-4/+0
\|	... It's a tad excessive.
*	Optimize Java DocumentProtocol encoding memory usage	Tor Brede Vekterli	2024-02-23	4	-14/+52
\|	This commit allows protocol implementations to directly construct and return a payload byte array that contains both the message identifier and the serialized message itself _without_ having to go through a `DocumentSerializer` indirection.
\|	A new method has been added to the `RoutableFactory` whose default implementation defers to the legacy `DocumentSerializer`-accepting method. This means the v6 protocol has the same semantics and performance characteristics as before.
\|	The new Protobuf protocol implementation now allocates the result byte array once with the correct size and writes both the message ID header and the protobuf data into this.
\|	This has the following performance benefits for the new protocol:
\|	- Reduces the number of buffer _allocations_ from 3 to 1.
\|	- Avoids 2 buffer _copies_ since we now directly allocate and write into the resulting array.
\|	- Encoding allocates the exact number of required bytes instead of always allocating 8K at a minimum. This also avoids the need for growing (by realloc and copy) the buffer during encoding.
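The single-allocation idea can be sketched as follows. The commit itself concerns the Java implementation; this is a C++ illustration of the same technique under stated assumptions. `FakeMessage`, `encode`, and the 4-byte big-endian header layout are hypothetical and are not taken from the Vespa source.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a Protobuf message: it knows its encoded size
// up front and can write itself into caller-provided memory.
struct FakeMessage {
    std::vector<std::uint8_t> body;
    std::size_t byte_size() const { return body.size(); }
    void serialize_to(std::uint8_t* dst) const {
        if (!body.empty()) {
            std::memcpy(dst, body.data(), body.size());
        }
    }
};

// Writes an (assumed) 4-byte big-endian message ID followed by the payload
// into one buffer that is allocated once with the exact final size.
inline std::vector<std::uint8_t> encode(std::uint32_t message_id, const FakeMessage& msg) {
    constexpr std::size_t header_size = 4;
    std::vector<std::uint8_t> out(header_size + msg.byte_size());
    out[0] = static_cast<std::uint8_t>(message_id >> 24);
    out[1] = static_cast<std::uint8_t>(message_id >> 16);
    out[2] = static_cast<std::uint8_t>(message_id >> 8);
    out[3] = static_cast<std::uint8_t>(message_id);
    msg.serialize_to(out.data() + header_size);
    return out;
}
```

Because the size is computed before allocating, there is no intermediate buffer to copy from and no growth step during encoding.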
*	Explicitly use HEAD serializer for document update instances	Tor Brede Vekterli	2024-02-23	1	-1/+2
\|	For some assuredly exciting reason, `DocumentUpdate.serialize()` by default uses the v6 protocol version instead of the HEAD version. This caused tensor updates (which are only available on the HEAD version) to fail serialization.
*	Use temporary byte array when serializing Protobuf in Java	Tor Brede Vekterli	2024-02-22	1	-7/+4
\|	The previous attempt at avoiding unneeded memory allocation and copying was futile because--obviously in retrospect--the underlying byte buffer is fixed in size and not dynamically growable. This did not manifest itself in any unit tests (too little data) or the set of system tests that I ran manually.
\|	It seems likely that we want to reconsider the encode/decode APIs in the `DocumentProtocol` to allow for more optimal memory management.
*	Include file names in IO failure exceptions	Tor Brede Vekterli	2024-02-22	1	-2/+2
\|
*	Move C++ DocumentAPI message tests to GTest	Tor Brede Vekterli	2024-02-22	11	-1359/+867
\|	Message-specific test cases are no longer delegated to a quasi-framework in a parent class, but implemented with regular test case functions. Clean up and move existing `TestBase` into a dedicated `MessageFixture` class. Use `std::filesystem::path` instead of plain strings for file paths.
\|	This also merges 3 standalone test apps into 1 GTest runner.
*	Tag Protobuf protocol boundary version and add binary test files	Tor Brede Vekterli	2024-02-21	76	-4/+4
\|	All _reported_ versions >= 8.310 use Protobuf protocol, all lower versions use the legacy protocol. Reported version is controlled by an environment variable and defaults to 8.309, i.e. the legacy version.
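The version boundary described here amounts to a simple comparison. The sketch below is illustrative only; the type and function names are hypothetical, and only the 8.310 boundary and the 8.309 default come from the commit message.

```cpp
// Hypothetical names; only the boundary and default values come from the commit.
enum class WireProtocol { Legacy, Protobuf };

struct Version {
    int major_no;
    int minor_no;
};

constexpr Version protobuf_boundary{8, 310};  // first version that uses Protobuf
constexpr Version default_reported{8, 309};   // reported when nothing overrides it

// True when `v` is greater than or equal to `bound`.
constexpr bool at_least(Version v, Version bound) {
    return v.major_no > bound.major_no ||
           (v.major_no == bound.major_no && v.minor_no >= bound.minor_no);
}

// Selects the wire protocol from the version the peer reports.
constexpr WireProtocol select_protocol(Version reported) {
    return at_least(reported, protobuf_boundary) ? WireProtocol::Protobuf
                                                 : WireProtocol::Legacy;
}
```

With the default reported version, `select_protocol(default_reported)` yields the legacy protocol, which matches the "disabled by default" behaviour.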
*	Improve codec error reporting and use Protobuf type parsers directly	Tor Brede Vekterli	2024-02-20	3	-52/+93
\|	Errors during (de-)serialization will now be logged, alongside the message type in question. Due to differences in internal wiring, the Java version logs the Document API type whereas C++ logs the protobuf message type (and also has throttled logging).
*	Add new Protobuf-based MessageBus DocumentAPI protocol	Tor Brede Vekterli	2024-02-16	37	-172/+4093
\|	This adds an entirely new implementation of the internal MessageBus DocumentAPI protocol, which shall be functionally 1-to-1 compatible with the existing legacy protocol.
\|	New protobuf schemas have been added to the top-level documentapi module, which are separated into different domains of responsibility:
\|	* CRUD messages
\|	* Visiting messages
\|	* Data inspection messages
\|	As well as a schema for shared, common message types.
\|	Both C++ and Java protocol implementations separate serialization and deserialization into a codec abstraction per message type, which hides the boilerplate required for Protobuf buffer management. The Java version is a tad more verbose due to generics type-erasure.
\|	This protocol does _not_ currently support lazy (de-)serialization in Java, as the existing mechanisms for doing so are inherently tied to the legacy protocol version. Performance tests will decide if we need to introduce such functionality to the new protocol version.
\|	To avoid having the new protocol go live in production, this commit changes the semantics of how MessageBus version reporting works (at least for the near future); instead of reporting the current Vespa _release_ version, it reports the highest supported _protocol_ version. This lets us conditionally enable the new protocol by reporting a MessageBus version greater than or equal to the protocol version _iff_ the protocol should be active. The new protocol is disabled by default.
\|	Other changes:
\|	* Protocol tests have been moved up one package directory level to be aligned with the actual package of the classes they test. This allows for using package-protected constructors in the serialization tests.
\|	* `DocumentDeserializer` now exposes the underlying document type repo/manager. This is done to detangle `Document`/`DocumentUpdate` deserialization from the underlying wire buffer management.
\|	* `RemoveLocationMessage` at long last contains a bucket space, which was forgotten when we initially added this concept to the other messages, and where the pain of adding it in later was too big (not so anymore!).
\|	Unit tests for both C++ and Java have been hoisted from the legacy test suite, cleaned up and extended with additional cases. The C++ tests use the old unit test kit and should receive a good follow-up washing and GTest-rewrite.
\|	Important: due to how MessageBus protocol versioning works, the final protocol version is _not_ yet decided, as setting it requires syncing against our build systems. A follow-up commit will assign the final version as well as include all required binary test files.
*	Non-functional changes	jonmv	2023-11-30	4	-21/+18
\|
*	Use `shared_mutex` to allow non-contending reads (common case)	Tor Brede Vekterli	2023-10-17	2	-7/+7
\|
*	Improve thread safety of MessageBus ContentPolicy	Tor Brede Vekterli	2023-10-17	3	-43/+71
\|	Updates of distribution config and cached cluster state are now both thread safe. Move to `shared_ptr` to allow for taking immutable strong refs. Also remove pointless two-phased config switch-over in favor of directly updating value inside lock.
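The two ContentPolicy commits above combine a reader/writer lock with immutable shared references. A minimal sketch of that pattern, assuming hypothetical `ClusterState` and `StateCache` types rather than the real ContentPolicy members:

```cpp
#include <memory>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the cached cluster state; the real type is richer.
struct ClusterState {
    std::string text;
};

class StateCache {
public:
    // Readers take a shared lock, copy the pointer, and release the lock.
    // The returned reference stays valid even if a writer replaces the state.
    std::shared_ptr<const ClusterState> snapshot() const {
        std::shared_lock guard(_lock);
        return _state;
    }

    // Writers take an exclusive lock and replace the value directly.
    void update(std::shared_ptr<const ClusterState> next) {
        std::unique_lock guard(_lock);
        _state = std::move(next);
    }

private:
    mutable std::shared_mutex _lock;
    std::shared_ptr<const ClusterState> _state;
};
```

Readers never contend with each other, and a reader that already holds a snapshot is unaffected by later updates.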
*	Revert "Merge pull request #28879 from vespa-engine/revert-28869-jonmv/job-runner-thread-metrics"	jonmv	2023-10-11	2	-2/+2
\|	This reverts commit 67351aa3e2adbbb4872097ed799f1ca837f35e6d, reversing changes made to aed7902ee0371efb89747d467c4a2f8124ddc08d.
*	Revert "Jonmv/job runner thread metrics"	Harald Musum	2023-10-11	2	-2/+2
\|
*	Non-functional changes	jonmv	2023-10-11	2	-2/+2
\|
*	Update copyright	Jon Bratseth	2023-10-09	273	-273/+273
\|
*	Use std::filesystem::is_directory and std::filesystem::exists	Tor Egge	2023-07-20	1	-2/+2
\|
*	Update abi-specs after making config class Builders final	gjoranv	2023-07-17	1	-5/+10
\|
*	Deserialize match features in SearchResult used in streaming search.	Geir Storli	2023-05-02	3	-15/+48
\|
*	Serialize match features in vdslib::SearchResult.	Tor Egge	2023-04-28	2	-0/+48
\|
*	Merge pull request #26891 from vespa-engine/balder/unify-feed-operations	Håvard Pettersen	2023-04-27	1	-7/+0
\|\	Unify passing of all feed operations through the various feed apis.
\| *	Unify passing of all feed operations through the various feed apis.	Henning Baldersheim	2023-04-27	1	-7/+0
\| \|
* \|	Use timestamp from Jetty as creation time for Request/HttpRequest	Bjørn Christian Seime	2023-04-27	1	-1/+1
\|/
*	Restore EOL at EOF.	Tor Egge	2023-04-21	1	-1/+1
\|
*	Add comments describing that SearchResult and DocumentSummary messages and replies were replaced by QueryResult message and reply in 2010.	Tor Egge	2023-04-21	2	-0/+4
\|
*	Remove (SearchResult\|DocumentSummary)(Command\|Reply) storage and documentapi messages.	Tor Egge	2023-04-21	30	-804/+11
\|
*	Add condition support to distributor `GetOperation`	Tor Brede Vekterli	2023-04-19	1	-0/+4
\|	This involves two things:
\|	* Propagate input condition to sent Get requests when present
\|	* Add condition match status to newest replica metadata aggregation
*	add requested annotations	Håvard Pettersen	2023-04-19	3	-2/+7
\|
*	add create-if-non-existent flag for document put	Håvard Pettersen	2023-04-19	19	-14/+151
\|
*	Reduce creation of Document instances without DocumentTypeRepo.	Geir Storli	2023-03-13	2	-11/+14
\|
*	Backport visit slicing to `vespa-visit` CLI tool	Tor Brede Vekterli	2023-03-01	1	-0/+4
\|	Allows for efficient parallelization across multiple visitor instances, mirroring the existing support in Document V1. Also clean up some legacy option value parsing code.
\|	Note: changing the parsed type for `maxtotalhits` from `int` to `long` is intentional; the internal limit is already a `long` and a cluster may have a lot more than `INT32_MAX` documents.
*	re-apply "remove fastos"	Håvard Pettersen	2023-03-01	1	-1/+0
\|	This reverts commit 003f019d7579e49f4ec7609ef8eac26ada6ae753.
*	Revert "remove fastos"	Harald Musum	2023-02-28	1	-0/+1
\|
*	remove fastos	Håvard Pettersen	2023-02-28	1	-1/+0
\|
*	avoid using fastos thread in searchcore	Håvard Pettersen	2023-02-27	3	-3/+0
\|	also remove some left-behind includes
*	untangle fnet from fastos	Håvard Pettersen	2023-02-22	2	-7/+4
\|
*	- Use T && f() && to avoid moving temporaries.	Henning Baldersheim	2023-02-03	1	-2/+2
\|	- std::make_unique/make_shared