summaryrefslogtreecommitdiffstats
path: root/storage
Commit message (Collapse)AuthorAgeFilesLines
* Use underlying duration instead of going via floating pointTor Brede Vekterli2023-06-021-1/+2
|
* Explicitly detect changed bucket ownership between read and write phasesTor Brede Vekterli2023-06-013-1/+20
| | | | | | | | | | | | | | | | | | | | | It was possible for bucket ownership to change between the condition check and operation write phases in a way that was not implicitly detected by the "has the replica set changed?"-test. This could trigger an assertion failure caused by a daisy-chain of events: 1. Distributor tries to update the bucket DB with a bucket that it no longer owns. This is a no-op, as the DB insertion silently drops updates to non-owned buckets. This is to avoid reinsertion of stale state when replies arrive to operations started on the old side of a state transition edge. 2. Distributor reads back the bucket it previously tried to write, with the code assuming it must have a valid, present state (it just wrote it, after all). 3. The distributor sets a flag in the read state and tries to write it back. 4. An invariant violation is detected when invalid state is attempted written and the process aborts. This commit adds explicit checking of ownership in the current and, if present, pending cluster states iff the cluster state version has changed between phases.
* Add dedicated condition probe metrics for `PutOperation`/`RemoveOperation`Tor Brede Vekterli2023-05-1616-38/+74
| | | | | Follows the same pattern as that used for sub-operation metrics for write-repair during Update processing.
* Handle MessageBus trace propagation for conditional removesTor Brede Vekterli2023-05-154-7/+49
| | | | Pretty much a mirror image of the logic in `PutOperation`.
* Merge pull request #27096 from ↵Henning Baldersheim2023-05-123-16/+241
|\ | | | | | | | | vespa-engine/havardpe/write-repair-for-conditional-remove added write repair logic to RemoveOperation
| * added write repair logic to RemoveOperationHåvard Pettersen2023-05-123-16/+241
| |
* | Wire MessageBus reply traces through conditional Put pipelineTor Brede Vekterli2023-05-1212-58/+206
|/ | | | | | | | | | | | | | `CheckCondition::Outcome` now exposes the resulting trace (if any) of the operations that were sent as part of the condition probe. The outcome-accessor is changed to return a non-const reference to allow for moving away the trace payload and into a higher-level reply. Turns out GetOperation did not aggregate reply traces as expected, and PersistenceMessageTrackerImpl did not transfer trace state upon failures (only success case). This makes this commit bigger than initially expected, but trace coverage should now be improved on a general basis, not just for conditional Put operations.
* Handle create-flag during Put write repairTor Brede Vekterli2023-05-092-4/+47
|
* Merge pull request #27027 from ↵Tor Brede Vekterli2023-05-082-1/+4
|\ | | | | | | | | vespa-engine/vekterli/default-enable-fast-update-restart-when-consistent Default-enable fast path updates when document versions are consistent
| * Add deprecation comment for configTor Brede Vekterli2023-05-081-0/+1
| |
| * Default-enable fast path updates when document versions are consistentTor Brede Vekterli2023-05-082-1/+3
| | | | | | | | | | | | | | This feature is always _implicitly_ enabled when 3-phase updates are enabled (which is the case by default). Flip the config default to true as well to be more in line with what's actually the case in production.
* | wire create flag from document api to storage apiHåvard Pettersen2023-05-082-1/+31
|/
* Merge pull request #27015 from ↵Tor Brede Vekterli2023-05-083-0/+25
|\ | | | | | | | | vespa-engine/vekterli/add-and-wire-condition-probe-config Add and wire in condition probing configuration
| * Add and wire in condition probing configurationTor Brede Vekterli2023-05-083-0/+25
| |
* | Propagate create-flag from parent Put to generated Put commandsTor Brede Vekterli2023-05-082-0/+24
|/
* Merge pull request #26979 from ↵Tor Brede Vekterli2023-05-0825-130/+990
|\ | | | | | | | | vespa-engine/vekterli/write-repair-for-conditional-puts Support write-repair for conditional Put operations
| * Improve reply cardinality checkingTor Brede Vekterli2023-05-051-2/+2
| |
| * Address PR review commentsTor Brede Vekterli2023-05-053-2/+17
| |
| * Support write-repair for conditional Put operationsTor Brede Vekterli2023-05-0425-130/+975
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a generic `CheckCondition` utility class that can be explicitly invoked by operations requiring a distinct condition evaluation phase. For now, this is only used by `PutOperation`. Distributed condition checking is only used _iff_ the target bucket's replica set is inconsistent and all involved content nodes support condition probing. Write-repair is currently disabled by default, as there does not yet exist any config wired in that enables it (coming as part of a follow-up). Also currently missing is proper metric wiring and MBus reply trace propagation for condition-checking sub-operations.
* | Change return type for VisitorFactory::makeVisitorEnvironment member functionTor Egge2023-05-057-14/+12
|/ | | | from unique pointer to shared pointer.
* propagate create flag to test-and-set helperHåvard Pettersen2023-05-032-1/+23
|
* wire create flagHåvard Pettersen2023-05-034-2/+17
|
* Ensure process-internal message ID uniquenessTor Brede Vekterli2023-04-2810-81/+105
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a storage API command is created internally on a node it is always assigned a strictly increasing message ID that is guaranteed to be unique within the process. Some parts of the code use this as a way to distinguish messages from another. However, uniqueness (prior to this commit) did not necessarily hold, as the underlying wire protocol would inherit message IDs _from other nodes_ and override the generated ID with this. I.e. uniqueness no longer holds. This had exciting consequences when the stars aligned and a remote node sent the same ID as one generated at the same time internally on the receiver node. Luckily, in practice this would only be used in a potentially ambiguous context when sanity checking shared read lock sets for the _same bucket_ in the persistence threads. Invariant checks would detect this is as an attempted duplicate lock acquisition and abort the process. This has been latent for many, many years, but we've seen it happen exactly once. This commit introduces an explicit domain separation between the node-internal (locally unique) IDs and the ID used by the originator. The originator ID is maintained and returned over the wire to the caller when sending a response to the incoming request. Curiously, we don't actually need this originator ID at all since the caller maintains explicit state containing the sender command. Unfortunately we can't simply remove it, since versions prior to this commit will still use whatever's on the wire.
* Revert "Improve error message"Harald Musum2023-04-261-1/+1
|
* Merge pull request #26669 from vespa-engine/kkraune/error-msgKristian Aune2023-04-261-1/+1
|\ | | | | Improve error message
| * Improve error messageKristian Aune2023-04-031-1/+1
| |
* | Merge pull request #26807 from ↵Henning Baldersheim2023-04-219-200/+4
|\ \ | | | | | | | | | | | | vespa-engine/toregge/remove-searchresult-and-documentsummary-documentapi-messages Remove (SearchResult|DocumentSummary)(Command|Reply) storage and docu…
| * | Remove (SearchResult|DocumentSummary)(Command|Reply) storage and documentapi ↵Tor Egge2023-04-219-200/+4
| | | | | | | | | | | | messages.
* | | Merge pull request #26800 from vespa-engine/vekterli/minor-put-remove-cleanupsTor Brede Vekterli2023-04-213-30/+34
|\ \ \ | | | | | | | | Minor cleanups in Put/Remove operation code
| * | | Minor cleanups in Put/Remove operation codeTor Brede Vekterli2023-04-203-30/+34
| |/ / | | | | | | | | | | | | No changes in semantics, but removes an unneeded deep copy of the current cluster state for every processed Put.
* / / Add NewestReplica equality tests and gmock matcher for distinct elements in ↵Tor Brede Vekterli2023-04-203-10/+26
|/ / | | | | | | | | | | | | | | | | | | | | a range Add a new `matchers` directory in vespalib which can be used as a repository for generic, reusable GMock matchers. Move distributor tests from using an explicit gtest runner to using `GTest::gmock_main` which serves the same purpose. Need to depend on a gmock target (not just gtest) to be able to link with stuff required for matchers.
* | Merge pull request #26788 from ↵Tor Brede Vekterli2023-04-205-31/+139
|\ \ | | | | | | | | | | | | vespa-engine/vekterli/add-condition-match-metadata-aggregation-for-newest-replica Add condition support to distributor `GetOperation`
| * | Add condition support to distributor `GetOperation`Tor Brede Vekterli2023-04-195-31/+139
| | | | | | | | | | | | | | | | | | This involves two things: * Propagate input condition to sent Get requests when present * Add condition match status to newest replica metadata aggregation
* | | wire document condition probing through protobuf protocolHåvard Pettersen2023-04-198-1/+38
|/ /
* | Add backend support for distributed condition evaluationTor Brede Vekterli2023-04-1811-61/+245
| | | | | | | | | | | | | | | | | | | | | | | | Lets the "test" part of a test-and-set condition be evaluated locally on individual content nodes. Piggybacks on top of metadata-only Get operations, adding a new condition field to the request and a boolean match result to the response. Decouples the existing TaS utility code from being command-oriented, allowing it to be used in other contexts as well. Not yet wired through any protocols.
* | Change buffer state accounting from elements to entries.Tor Egge2023-04-041-1/+1
| |
* | Use timed waits on persistence queue condition variableTor Brede Vekterli2023-04-041-2/+2
|/ | | | | | | | | | This is a pragmatic workaround for our current optimistic signalling mechanisms seemingly being susceptible to lost wakeups under certain conditions. We should redesign how persistence queue signalling works on a more fundamental level to avoid this scenario altogether, but for now this will at least remove the possibility that a thread may be stalled if the persistence queues are completely quiescent over longer periods of time.
* Minor code cleanups; no changes in semanticsTor Brede Vekterli2023-03-285-12/+12
|
* Ensure proper memory visibility on distributor stripe flushTor Brede Vekterli2023-03-271-0/+6
| | | | | | | | | | | | | | | Flushing the distributor stripes happens as part of process shutdown and therefore takes place in the main thread. At this point we should no longer receive any messages from the RPC/messaging subsystems, but there may still be lingering replies to be purged. At this point, the stripe threads are joined but the communication manager thread is not (happens slightly after in the shutdown sequence). We therefore have to explicitly form a mutex acquire/release pair to ensure proper memory visibility between anything touched by the comm. manager thread (within the mutex) and our main thread. This should resolve a ThreadSanitizer error.
* Add capability checking to state API handlersTor Brede Vekterli2023-03-224-6/+25
| | | | | | | | | | | | | | | | | | This covers both the entry points from the `storagenode` and `searchnode` HTTP servers, though the former is mostly in the name of legacy support. Ideally, capability checking would exist as a property of the HTTP server (Portal) bindings, but the abstractions for the JSON request handling are sufficiently leaky that it ended up making more sense to push things further down the hierarchy. It's always a good thing to move away from using strings with implicit semantics as return types anyway. The `searchnode` state API handler mapping supports fine grained capabilities. The legacy `storagenode` state API forwarding does not; it uses a sledgehammer that expects the union of all possible API capability requirements.
* Include mutex to get definition of std::mutex.Tor Egge2023-03-161-0/+1
|
* Add locking for reported node state in TestNodeStateUpdater.Tor Egge2023-03-162-2/+5
|
* Merge pull request #26423 from ↵Geir Storli2023-03-138-17/+20
|\ | | | | | | | | vespa-engine/geirst/less-document-without-type-repo Reduce creation of Document instances without DocumentTypeRepo.
| * Reduce creation of Document instances without DocumentTypeRepo.Geir Storli2023-03-138-17/+20
| |
* | Be explicit about lbound/ubound for bucket DB iteration and add lbound variantTor Brede Vekterli2023-03-137-51/+90
|/ | | | | | The DB API was rather coy about whether `forEach` had lower or upper bound semantics with regards to the bucket ID passed in as a starting point. Be explicit and add a lower-bound variant.
* Use `optional` instead of `unique_ptr`Tor Brede Vekterli2023-03-071-41/+29
| | | | Plus some additional minor cleanup.
* use ref_counted in fnetHåvard Pettersen2023-03-065-8/+8
| | | | | | also get rid of some cleanup functions on reference counted classes enable specifying low-level parameters to addref/subref (cnt/reserve)
* Use matching duration for time_point.Tor Egge2023-03-066-8/+8
|
* typesafe getLastProcessedTime tooHenning Baldersheim2023-03-012-5/+5
|
* Use a typed period.Henning Baldersheim2023-03-013-13/+12
|