| Commit message (Collapse) | Author | Age | Files | Lines |
| |
If a maintenance operation reply comes from a node that went down after
its original request was sent _and_ cancelling is enabled, there is a
discrepancy where the distributor stripe message tracker does not know
about the operation, but the maintenance operation owner _does_. This
must be handled explicitly, as the message tracker returns a
default-constructed bucket when the message is not known. This default
bucket must not be propagated out of the function, or we would
transitively trigger an invariant check failure.
|
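The guard described above could be sketched roughly as follows. All names here are illustrative stand-ins, not the actual Vespa classes; the point is that an unknown reply yields `std::nullopt` rather than a default-constructed bucket that could leak out of the function:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical sketch: a tracker that maps in-flight message IDs to buckets.
// A reply for an unknown message (node went down after send + cancellation)
// must not surface a default-constructed bucket to callers.
struct BucketId {
    uint64_t raw = 0;                      // default-constructed => invalid sentinel
    bool valid() const { return raw != 0; }
};

class MessageTracker {
public:
    void track(uint64_t msg_id, BucketId bucket) { _in_flight[msg_id] = bucket; }

    // Returns the tracked bucket, or nullopt if the message is unknown,
    // so the caller handles the "unknown operation" case explicitly.
    std::optional<BucketId> bucket_for_reply(uint64_t msg_id) const {
        auto it = _in_flight.find(msg_id);
        if (it == _in_flight.end() || !it->second.valid()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::unordered_map<uint64_t, BucketId> _in_flight;
};
```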
| |
| |
The cluster state should specify N nodes and the distribution config
should specify fewer than N nodes, in order to properly check that
attempting to use node N fails due to the _config_ and not the _state_.
|
| |
If a cluster state bundle contains distribution config, this is
internally propagated via the `StateManager` component to all registered
state listeners. One such state listener is `FileStorManager`, which
updates the content node-internal bucket space repository.
`SetSystemStateCommand` handling and the internal config-aware
components (`StateManager` and `ChangedBucketOwnershipHandler`) now
explicitly track whether the cluster controller provides distribution
config, or whether the internally provided config should be used
(including falling back to internal config if necessary).
|
|\
| |
| |
| |
| | |
vespa-engine/vekterli/decode-cluster-state-bundle-distribution-config-cpp
Support decoding distribution config as part of cluster state bundles in C++
|
| |
| |
| |
| |
| |
| | |
Actually _encoding_ config in the same format as that used for
decoding config payloads is not directly supported, so we do our
own roundabout conversion as part of testing.
|
|\ \
| | |
| | |
| | |
| | | |
vespa-engine/toregge/adjust-storage-server-document-api-converter-unit-test-for-out-of-source-builds
Adjust storage server document api converter unit test for out of source builds.
|
|/ |
| |
| |
This has been nothing more than a no-op at best, as there is no (and,
to the best of my knowledge, never has been an) init state for
distributors.
|
| |
If a received cluster state bundle contains an embedded distribution
config, the distributor will act _as if_ it had atomically received
and processed a distribution config change followed by a new cluster
state, but where no bucket info requests were sent for the config
change.
If a distributor observes a state bundle containing distribution
config it will note this internally and explicitly ignore any further
received distribution configs _not_ arriving from the cluster
controller. To handle downgrades and rollbacks, it undoes this
toggle if it later observes a state bundle _without_ distribution
config, reverting to using internal node config instead.
|
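The toggle behavior described above (ignore non-controller config once a bundle carries distribution config; revert on a later bundle without it) could be sketched like this. Class and method names are hypothetical; config payloads are modeled as plain strings for brevity:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Illustrative sketch of the distribution config source toggle.
// Once a state bundle carries distribution config, internally provided
// configs are ignored; a bundle WITHOUT config reverts to internal
// config (handles downgrades and rollbacks).
class DistributionConfigSource {
public:
    void on_state_bundle(const std::optional<std::string>& bundle_config) {
        if (bundle_config) {
            _active = *bundle_config;
            _controller_provides_config = true;
        } else {
            _controller_provides_config = false;
            _active = _internal; // rollback path: use internal node config
        }
    }

    void on_internal_config(const std::string& config) {
        _internal = config;
        if (!_controller_provides_config) {
            _active = config; // only applied while the controller is silent
        }
    }

    const std::string& active() const { return _active; }

private:
    bool _controller_provides_config = false;
    std::string _internal;
    std::string _active;
};
```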
| |
Move distribution config transforms to vdslib so that the distribution
config bundle can contain derived configs for all bucket spaces in
one central place.
This is part of the prerequisite work needed before we can start
pushing distribution config from the cluster controller and rewiring
how distribution config is propagated and used in the backends.
Also, rename `Distribution::serialize()` to `Distribution::serialized()`
since it returns a const ref to a cached serialized form and does not do
on-demand serialization.
|
| |
| |
Adds a (live) config that specifies whether content nodes and
distributors shall reject cluster state versions that are lower than
the one currently active on the node. This prevents "last write wins"
ordering problems when multiple cluster controllers have partially
overlapping leadership periods.
In the name of pragmatism, we try to auto-detect the case where
ZooKeeper state must have been lost on the cluster controller cluster,
and accept the state even with lower version number. Otherwise,
the content cluster would be effectively stalled until all its
processes had been manually restarted.
Adds wiring of live config to the `StateManager` component on both
content and distributor nodes.
|
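The acceptance rule described above can be condensed to a small predicate. This is a hedged sketch: the actual heuristic for detecting lost ZooKeeper state is not spelled out in the commit message, so it is modeled here as a boolean supplied by the caller:

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the version-gating logic. `zk_state_likely_lost` stands in
// for whatever detection the real implementation performs.
bool should_accept_state(uint32_t active_version, uint32_t received_version,
                         bool reject_lower_versions, bool zk_state_likely_lost) {
    if (!reject_lower_versions) {
        return true; // legacy "last write wins" behavior
    }
    if (received_version >= active_version) {
        return true;
    }
    // Lower version: only accept if cluster controller state was likely
    // lost, so the content cluster is not stalled until manual restarts.
    return zk_state_likely_lost;
}
```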
| |
It's possible for a diff to contain multiple versions of the same
document, persisted at different timestamps. To avoid asynchronously
scheduling multiple operations per distinct document ID (technically a
violation of the SPI invariants), do a pre-pass over the diff to find
the highest timestamp present per document. Only schedule an operation
if it is the newest one for its document.
|
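The pre-pass described above could look roughly like this; the types are simplified stand-ins for the real diff entries:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Sketch: find the highest timestamp per document ID, then keep only the
// entries matching that timestamp, so at most one async operation is
// scheduled per distinct document.
struct DiffEntry {
    std::string doc_id;
    uint64_t timestamp;
};

std::vector<DiffEntry> newest_per_document(const std::vector<DiffEntry>& diff) {
    std::unordered_map<std::string, uint64_t> newest;
    for (const auto& e : diff) { // pre-pass: highest timestamp per doc ID
        auto& ts = newest[e.doc_id];
        if (e.timestamp > ts) {
            ts = e.timestamp;
        }
    }
    std::vector<DiffEntry> to_schedule;
    for (const auto& e : diff) { // second pass: keep only the newest version
        if (newest[e.doc_id] == e.timestamp) {
            to_schedule.push_back(e);
        }
    }
    return to_schedule;
}
```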
| |
The legacy Put replica selection behavior may route new versions
of a document to replicas that are not considered optimal for
activation. This is not normally an issue, but can manifest itself
as missing coverage when the system is in flux with replicas
moving away from Retired nodes containing ready replicas, as the
existing replicas on the Retired node would be preferred for
activation (and thus be used for searches) but incoming Puts would
instead be sent to non-retired nodes due to being in the ideal state.
The new replica ordering (and, transitively, selection) behavior is
identical between Puts and activation. This should help ensure
that new versions of a document are routed to the replica(s)
most likely to be visible as part of searches.
New selection behavior for Puts is config-gated and defaults to
the legacy behavior.
This also subtly changes the fallback ordering criteria for replica
activation to consider the replica's existing DB _entry_ order
instead of its node index. Since DB entries are always ordered by
their ideal state order (with Retired nodes included), this will
evenly distribute fallback activations rather than skewing them
towards lower indexes. It is not expected that this has any negative
effects in practice, and is therefore _not_ a config-gated change.
|
| |
Was only used by `DirConfig`.
|
| |
Introduce a distinct `StorageConfigSet` which wraps the actual
underlying config objects and exposes them through a unified
`ConfigUri`.
|
| |
Once upon a time, VDS roamed the lands. It used real disk IO as
part of tests. Then came the meteor and in-memory dummy persistence
took over. Now it is time for the fossils to be moved into a museum
where they belong.
Also make PID file writing conditional on a config that is set to
`false` during unit testing (but `true` by default).
|
| |
| |
Avoids potentially having to deserialize the entire update just
to get to a single bit of information that is technically
metadata existing orthogonally to the document update itself.
To ensure backwards/forwards compatibility, the flag is
propagated as a Protobuf `enum` where the default value is
a special "unspecified" sentinel, implying an old sender.
Since the Java protocol implementation always eagerly
deserializes messages, it unconditionally assigns the
`create_if_missing` field when sending and completely ignores
it when receiving.
The C++ protocol implementation observes and propagates the
field iff set. Otherwise the flag is deferred to the update
object as before. This applies to both the DocumentAPI and
StorageAPI protocols.
|
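The tri-state propagation described above can be illustrated with a plain enum. In the real protocol this is a Protobuf `enum`; the names and the resolution helper below are hypothetical:

```cpp
#include <cassert>

// Sketch of the sentinel semantics: the zero value means "field not set",
// implying an old sender, in which case the receiver defers to the
// (possibly expensively deserialized) document update object itself.
enum class CreateIfMissing {
    UNSPECIFIED = 0, // default value on the wire => old sender
    FALSE_      = 1,
    TRUE_       = 2,
};

// Receiver-side resolution: observe the wire flag iff set, otherwise
// fall back to the flag stored in the update object.
bool resolve_create_if_missing(CreateIfMissing wire_flag,
                               bool update_object_flag) {
    switch (wire_flag) {
        case CreateIfMissing::TRUE_:       return true;
        case CreateIfMissing::FALSE_:      return false;
        case CreateIfMissing::UNSPECIFIED: return update_object_flag;
    }
    return update_object_flag; // unreachable; keeps compilers happy
}
```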
| |
Can't initialize members in constructor that depend on objects
that are subsequently reset by the superclass' `SetUp()` method.
|
| |
| |
Avoids the need for barriers to keep from stepping on the thread's toes.
|
| |
| |
Bucket operations require either exclusive (single writer) or
shared (multiple readers) access. Prior to this commit, this
meant that many enqueued feed operations to the same bucket
introduced pipeline stalls, since each operation had to wait
for all prior operations to the bucket to complete entirely
(including fsync of the WAL append). This is a likely scenario when
feeding a document set that was previously acquired through
visiting, as such documents will inherently be output in
bucket order.
With this commit, a configurable number of feed operations
(put, remove and update) bound for the exact same bucket may
be sent asynchronously to the persistence provider in the
context of the _same_ write lock. This mirrors how merge
operations work for puts and removes.
Batching is fairly conservative, and will _not_ batch across
further messages when any of the following holds:
* A non-feed operation is encountered
* More than one mutating operation is encountered for the
same document ID
* No more persistence throttler tokens can be acquired
* Max batch size has been reached
Updating the bucket DB, assigning bucket info and sending
replies is deferred until _all_ batched operations complete.
Max batch size is (re-)configurable live and defaults to a
batch size of 1, which shall have the exact same semantics as
the legacy behavior.
Additionally, clock sampling for persistence threads has been
abstracted away to allow for mocking in tests (no need for sleep!).
|
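The batching stop-conditions listed above could be sketched as a single scan over the head of a bucket's queue. All names are illustrative, and the real implementation's token accounting is certainly richer than a plain counter:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Sketch: given a queue of operations bound for one bucket, count how
// many consecutive head operations may share a single write lock.
// Returns 0 if the head operation is not a batchable feed op.
struct Op {
    bool is_feed_op;    // put, remove or update
    std::string doc_id;
};

size_t batchable_prefix(const std::vector<Op>& queue,
                        size_t max_batch_size,
                        size_t available_throttle_tokens) {
    std::unordered_set<std::string> seen_doc_ids;
    size_t n = 0;
    for (const auto& op : queue) {
        if (!op.is_feed_op) {
            break; // non-feed operation stops batching
        }
        if (!seen_doc_ids.insert(op.doc_id).second) {
            break; // more than one mutating op for the same document ID
        }
        if (n >= available_throttle_tokens) {
            break; // no more persistence throttler tokens
        }
        if (n >= max_batch_size) {
            break; // max batch size reached
        }
        ++n;
    }
    return n;
}
```

With `max_batch_size == 1` this degenerates to one operation per lock, matching the legacy default semantics.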
| |
The document API has long since had a special field for update operations
where an optional expected _existing_ backend timestamp can be specified,
and where the update should only go through iff there is a timestamp
match.
This has been supported on the distributor all along, but only when
write-repair is taking place (i.e. rarely), but the actual backend
support has been lacking. No one has complained yet since this is
very much not an advertised feature, but if we want to e.g. use this
feature for improvements to batch updates we should ensure that it
works as expected.
With this commit, a non-zero "old timestamp" field is cross-checked
against the existing document, and the update is only applied if the
actual and expected timestamps match.
|
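The cross-check described above boils down to a small predicate; the function name is hypothetical, and zero denotes "field not set" as in the commit message:

```cpp
#include <cassert>
#include <cstdint>

// Sketch: a non-zero expected "old timestamp" must match the stored
// document's timestamp for the update to be applied; zero means the
// field was not set, so the update applies unconditionally as before.
bool update_applies(uint64_t expected_old_timestamp, uint64_t stored_timestamp) {
    if (expected_old_timestamp == 0) {
        return true; // no expected timestamp given
    }
    return expected_old_timestamp == stored_timestamp;
}
```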
| |
| |
tests.
- Reduce penetration of generated StorFilestorConfig.
|
|\
| |
| |
| |
| | |
vespa-engine/balder/hardcode-enable_metadata_only_fetch_phase_for_inconsistent_updates
- Hardcode enable_metadata_only_fetch_phase_for_inconsistent_updates …
|
| | |
|
| |
| |
| |
| |
| |
| | |
restart_with_fast_update_path_if_all_get_timestamps_are_consistent to true.
- The tests depending on these flags specify these values explicitly.
|
|\ \
| | |
| | | |
Balder/gc unused distribution config
|
| | | |
|
| |/ |
|
|\ \
| | |
| | |
| | |
| | | |
vespa-engine/balder/disable_queue_limits_for_chained_merges-always-true
disable_queue_limits_for_chained_merges has long been true, GC
|
| |/ |
|
|/ |
|
| |
|
| |
|
|\
| |
| | |
Balder/always unordered merging
|
| | |
|
|\ \
| |/
|/|
| |
| | |
vespa-engine/balder/gc-maxpendingidealstateoperations
GC maxpendingidealstateoperations which has not been wired in for a l…
|