Add guard against throwing exception out of ApplyBucketDiffState destructor.

vespa-engine/vekterli/add-metric-for-max-time-since-bucket-gc
Add metric for max time since bucket GC was last run

Max time is aggregated across all buckets. If this metric value grows
substantially larger than the configured GC period, it indicates that GC is
being starved.

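A minimal sketch of how such a max-aggregated "time since last GC" value could be derived, assuming a per-bucket last-GC timestamp is available; the names (BucketGcStats, observe_bucket) are made up for illustration and are not Vespa's actual metric classes.

    // Illustrative only: aggregate the maximum "time since last GC" across buckets.
    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <vector>

    using Clock = std::chrono::steady_clock;

    struct BucketGcStats {
        Clock::duration max_time_since_gc{0};

        void observe_bucket(Clock::time_point last_gc_run, Clock::time_point now) {
            max_time_since_gc = std::max(max_time_since_gc, now - last_gc_run);
        }
    };

    int main() {
        const auto now = Clock::now();
        std::vector<Clock::time_point> last_gc = {
            now - std::chrono::minutes(5),
            now - std::chrono::minutes(90),   // a starved bucket dominates the metric
            now - std::chrono::minutes(12),
        };
        BucketGcStats stats;
        for (auto t : last_gc) stats.observe_bucket(t, now);
        std::printf("max minutes since bucket GC: %lld\n",
                    (long long)std::chrono::duration_cast<std::chrono::minutes>(
                        stats.max_time_since_gc).count());
    }
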
vespa-engine/toregge/add-detailed-metrics-for-failed-merge-operations
Add detailed metrics for failed merge operations.

We consider bucket maintenance so latency critical that we'll prefer
to stall scheduling of subsequent buckets instead of risking having
to re-scan the DB to encounter the bucket again.

The old maintenance scheduler behavior is to only remove a bucket from the
priority DB if its maintenance operation was successfully started. Failing
to start an operation could happen both from max pending throttling and
from operation/bucket-specific blocking behavior. Since the scheduler would
encounter the same bucket as the one previously blocked upon its next tick
invocation, a single blocked bucket would run the risk of head-of-line
stalling the rest of the remaining maintenance queue (assuming the ongoing
DB scan did not encounter any higher priority buckets).
This commit changes the following aspects of maintenance scheduling (a
simplified sketch follows the list):
* Always clear entries from the priority DB before trying to start an
  operation. A blocked operation will be retried the next time the
  regular bucket DB scan encounters the bucket.
* Avoid trying to start (and clear) inherently doomed operations by
  _not_ trying to schedule any operations if they would be blocked due
  to too many pending maintenance operations anyway. Introduces a
  new `PendingWindowChecker` interface for this purpose.
* Explicitly inhibit all maintenance scheduling if a pending cluster
  state is present. Operations are already _implicitly_ blocked from
  starting if there's a pending cluster state, but the scheduling attempts
  would cause the priority DB to be pointlessly cleared.

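The sketch below condenses the scheduling flow described above. PendingWindowChecker is the interface the commit introduces, but its signature here, and all the surrounding types, are guesses for illustration rather than the actual distributor code.

    // Sketch of the scheduling decision described above; everything except the
    // PendingWindowChecker name is hypothetical.
    #include <cstdio>
    #include <deque>
    #include <string>

    struct Bucket { std::string id; };

    // Introduced by the commit: lets the scheduler ask up front whether an
    // operation could be started at all, before popping anything from the DB.
    struct PendingWindowChecker {
        int pending = 0;
        int max_pending = 2;
        bool may_allow_operation() const { return pending < max_pending; }
    };

    struct Scheduler {
        std::deque<Bucket> priority_db;   // stand-in for the real priority DB
        PendingWindowChecker window;
        bool has_pending_cluster_state = false;

        bool try_start(const Bucket& b) { // stub: the real code may still block per bucket
            std::printf("started maintenance for %s\n", b.id.c_str());
            ++window.pending;
            return true;
        }

        void tick() {
            // Don't even pop doomed operations: pending window full, or a pending
            // cluster state would implicitly block everything anyway.
            if (has_pending_cluster_state || !window.may_allow_operation() ||
                priority_db.empty()) {
                return;
            }
            // Always clear the entry first; a blocked bucket is simply rediscovered
            // by the next regular bucket DB scan instead of stalling the queue.
            Bucket b = priority_db.front();
            priority_db.pop_front();
            try_start(b);
        }
    };

    int main() {
        Scheduler s;
        s.priority_db = {{"bucket(a)"}, {"bucket(b)"}, {"bucket(c)"}};
        for (int i = 0; i < 4; ++i) s.tick();   // third tick is inhibited by the window
    }
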
vespa-engine/toregge/remove-dead-code-in-put-operation
Remove dead code in PutOperation.

Co-authored-by: Geir Storli <geirst@yahooinc.com>

Let any merge through that has already passed through at least one
other node's merge window, as it has already taken up a logical
resource slot on all of those nodes. Busy-bouncing a merge at that point
would throw away a great amount of time and work already expended.
The max number of enqueued merges is bounded by the number of nodes
in the system, as each node can still only accept a configurable number
of merges from distributors, and each distributor throttles its
maintenance operations based on priority-relative max pending limits.

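A rough sketch of the admission rule described above, under the assumption that a merge command carries (or implies) knowledge of how many nodes have already admitted it into their merge window; the names and the decision enum are illustrative, not the actual storage API.

    // Sketch: a merge already accepted into another node's window is queued
    // rather than busy-bounced, so the work invested there is not thrown away.
    #include <cstdio>

    struct MergeCommand {
        int nodes_already_in_window = 0;   // how many nodes have admitted this merge
    };

    enum class Decision { Execute, Enqueue, Bounce };

    Decision admit(const MergeCommand& cmd, int active_merges, int max_active_merges) {
        if (active_merges < max_active_merges) {
            return Decision::Execute;      // room in our own merge window
        }
        if (cmd.nodes_already_in_window > 0) {
            return Decision::Enqueue;      // don't waste work already done elsewhere
        }
        return Decision::Bounce;           // nothing invested yet; safe to busy-bounce
    }

    int main() {
        MergeCommand fresh;                // not yet in any node's window
        MergeCommand in_flight;            // already admitted by another node
        in_flight.nodes_already_in_window = 1;

        std::printf("fresh:     %d\n", (int)admit(fresh, 4, 4));      // Bounce
        std::printf("in-flight: %d\n", (int)admit(in_flight, 4, 4));  // Enqueue
    }
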
Adds metrics for the following (a small aggregation sketch follows below):
* Bucket replicas that should be moved out, e.g. the retirement case, or a
  node added to the cluster that has a higher ideal state priority.
* Bucket replicas that should be copied out, e.g. the node is in the ideal
  state but might have to provide data to other nodes in a merge.
* Bucket replicas that should be copied in, e.g. the node does not have a
  replica for a bucket that it is in the ideal state for.
* Bucket replicas that need syncing due to mismatching metadata.
These are aggregates across all bucket replicas, buckets and bucket spaces.
They should aid visibility into data movement during node retirements when
there are concurrent replicas-out-of-sync events.

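One possible shape for aggregating the four conditions above into cluster-wide counters; the classification logic and every name here are simplified assumptions for the sketch, not the distributor's actual ideal-state metric code.

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct ReplicaState {
        bool node_is_in_ideal_state = true;
        bool has_replica = true;
        bool metadata_in_sync = true;
    };

    struct IdealStateMetrics {
        uint64_t replicas_to_move_out = 0;   // retirement / better ideal node exists
        uint64_t replicas_to_copy_out = 0;   // we are ideal but must feed a merge
        uint64_t replicas_to_copy_in  = 0;   // ideal node missing its replica
        uint64_t replicas_syncing     = 0;   // metadata mismatch across replicas
    };

    IdealStateMetrics aggregate(const std::vector<ReplicaState>& replicas) {
        IdealStateMetrics m;
        for (const auto& r : replicas) {
            if (!r.node_is_in_ideal_state && r.has_replica) ++m.replicas_to_move_out;
            if (r.node_is_in_ideal_state && !r.has_replica) ++m.replicas_to_copy_in;
            if (r.node_is_in_ideal_state && r.has_replica && !r.metadata_in_sync) {
                ++m.replicas_to_copy_out;   // may need to provide data in a merge
                ++m.replicas_syncing;
            }
        }
        return m;
    }

    int main() {
        std::vector<ReplicaState> replicas = {
            {false, true,  true},   // move out (e.g. retirement)
            {true,  false, true},   // copy in
            {true,  true,  false},  // out of sync -> copy out + syncing
        };
        auto m = aggregate(replicas);
        std::printf("move-out %llu, copy-in %llu, copy-out %llu, syncing %llu\n",
                    (unsigned long long)m.replicas_to_move_out,
                    (unsigned long long)m.replicas_to_copy_in,
                    (unsigned long long)m.replicas_to_copy_out,
                    (unsigned long long)m.replicas_syncing);
    }
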
… ownership transfer

Avoids the case where different distributors can start merges with a max
timestamp that is lower than timestamps generated intra-second by other
distributors for feed bound to the same bucket.
This is analogous to the existing "safe time period" functionality used for
handling external feed, and uses the same max clock skew config.
Correctness of this grace period is therefore inherently dependent on the
actual cluster clock skew being less than the configured value.
Bucket activations are still allowed to take place during the grace period
window, as these do not mutate the bucket contents and are therefore safe.

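A minimal sketch of the grace-period check described above, assuming the ownership-transfer time is tracked and compared against the configured max clock skew; the config name and structure are assumptions, not the actual Vespa configuration.

    #include <chrono>
    #include <cstdio>

    using Clock = std::chrono::steady_clock;

    struct Config {
        std::chrono::seconds max_cluster_clock_skew{5};   // same knob as feed safe-time
    };

    bool may_start_mutating_maintenance(Clock::time_point ownership_transferred_at,
                                        Clock::time_point now, const Config& cfg) {
        return (now - ownership_transferred_at) >= cfg.max_cluster_clock_skew;
    }

    bool may_activate_bucket() {
        return true;   // activation does not mutate bucket contents, so no grace period
    }

    int main() {
        Config cfg;
        auto transferred = Clock::now();
        std::printf("merge right away: %d\n",
                    may_start_mutating_maintenance(transferred, Clock::now(), cfg));
        std::printf("merge after skew window: %d\n",
                    may_start_mutating_maintenance(
                        transferred, transferred + cfg.max_cluster_clock_skew, cfg));
        std::printf("activate: %d\n", may_activate_bucket());
    }
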
| |
transitions
Maintenance inhibition is already present, but it happens at a much lower level.
Add a high-level test to ensure that the wiring works as expected.
|
|\
| |
| |
| |
| | |
vespa-engine/vekterli/use-max-of-current-and-pending-distribution-bit-counts
Use max instead of min from current and pending cluster states' distribution bit counts [run-systemtest]

Use max instead of min from current and pending cluster states' distribution
bit counts

Using min() has an unfortunate (very rare) edge case if a cluster goes _down_
in distribution bit counts (whether we really want to support this at all is
a different discussion, since it has some other unfortunate implications).
If the current state has e.g. 14 bits and the pending state has 8 bits, using
8 bits for `_distribution_bits` will trigger a `TooFewBucketBitsInUse`
exception when computing a cached ideal state for a bucket in the bucket DB.
This is because the ideal state algorithm is not defined for buckets using
fewer bits than the state's distribution bit count.
The cluster controller shall never push a cluster state with a distribution
bit count higher than that of the least split bucket across all nodes in the
cluster, so the cache lookup code should theoretically(tm) never be invoked
with a bucket that has fewer used bits than what's present in the pending
state.

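In sketch form, the change amounts to taking the max rather than the min of the two states' distribution bit counts; the surrounding types are invented for the example.

    #include <algorithm>
    #include <cstdio>
    #include <optional>

    struct ClusterState { unsigned distribution_bits; };

    unsigned effective_distribution_bits(const ClusterState& current,
                                         const std::optional<ClusterState>& pending) {
        unsigned bits = current.distribution_bits;
        if (pending) {
            bits = std::max(bits, pending->distribution_bits);  // was min() before the fix
        }
        return bits;
    }

    int main() {
        ClusterState current{14};
        std::optional<ClusterState> pending{ClusterState{8}};   // cluster going down in bits
        // With min() this would yield 8, and a 14-bit bucket in the DB could trip the
        // TooFewBucketBitsInUse check; with max() the cached ideal state stays computable.
        std::printf("effective distribution bits: %u\n",
                    effective_distribution_bits(current, pending));
    }
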
… top-level distributor.

This replaces the previous hack (needed in legacy mode) that used
DistributorBucketSpaceRepo to achieve the same.

Some assorted legacy bits and pieces still remain on the factory floor;
these will be cleaned up in follow-ups.

… be done

vespa-engine/vekterli/aggregate-pending-operation-stats-across-stripes
Aggregate pending operation stats across all stripes in stripe guard

vespa-engine/geirst/flip-to-new-distributor-stripe-code-path
Flip to always use the new distributor stripe code path.

If the number of stripes is not configured, we tune it based on the sampled
number of CPU cores.

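A small sketch of that fallback, assuming hardware_concurrency() as the sampled core count; the rounding-to-a-power-of-two step is an assumption made for the example, not necessarily what the distributor actually does.

    #include <cstdio>
    #include <thread>

    unsigned tuned_num_stripes(unsigned configured_num_stripes) {
        if (configured_num_stripes > 0) {
            return configured_num_stripes;          // explicit config wins
        }
        unsigned cores = std::thread::hardware_concurrency();
        if (cores == 0) cores = 1;                  // hardware_concurrency() may return 0
        unsigned stripes = 1;
        while (stripes * 2 <= cores) stripes *= 2;  // largest power of two <= core count
        return stripes;
    }

    int main() {
        std::printf("configured=0 -> %u stripes\n", tuned_num_stripes(0));
        std::printf("configured=4 -> %u stripes\n", tuned_num_stripes(4));
    }
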
| |
Add a listener interface that lets the top-level distributor intercept
cluster state activations and use this for triggering the node init edge.
This happens when all stripes are paused so this is safe from data races.
Legacy code in the DistributorStripe remains for now.
|
| |
|
|\
| |
| |
| |
| | |
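A sketch of how such a listener interface might be wired, assuming the callback fires after activation while all stripes are paused; the interface and class names are illustrative, not the actual distributor classes.

    #include <cstdio>
    #include <string>
    #include <vector>

    struct ClusterStateBundle { std::string version; };

    // Listener interface: invoked after a state is activated, while all stripes
    // are paused, so the callback cannot race with the stripe threads.
    struct StateActivationListener {
        virtual ~StateActivationListener() = default;
        virtual void on_cluster_state_activated(const ClusterStateBundle& state) = 0;
    };

    struct TopLevelDistributor : StateActivationListener {
        bool initialized = false;
        void on_cluster_state_activated(const ClusterStateBundle& state) override {
            if (!initialized) {
                initialized = true;   // node init edge triggered on first activation
                std::printf("init edge on state %s\n", state.version.c_str());
            }
        }
    };

    void activate_state(const ClusterStateBundle& state,
                        std::vector<StateActivationListener*>& listeners) {
        // ... pause all stripes and apply the state to each of them ...
        for (auto* l : listeners) l->on_cluster_state_activated(state);
        // ... resume stripes ...
    }

    int main() {
        TopLevelDistributor distributor;
        std::vector<StateActivationListener*> listeners{&distributor};
        activate_state({"v2"}, listeners);
    }
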
vespa-engine/vekterli/port-more-bucketdbupdater-tests
Port more BucketDBUpdater tests from legacy to new code path

vespa-engine/vekterli/port-additional-tests-and-fix-regression
Port additional DB updater tests and fix delayed sending regression

Addresses a missing piece of functionality in the new code path: bucket
rechecks queued during a pending cluster state time window would not be
sent as expected once the pending state had been completed and activated.

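A condensed sketch of that deferred-recheck behavior, assuming a simple queue that is flushed on activation; all names are illustrative rather than the actual BucketDBUpdater API.

    #include <cstdio>
    #include <string>
    #include <vector>

    struct BucketDbUpdater {
        bool has_pending_cluster_state = false;
        std::vector<std::string> delayed_rechecks;

        void recheck_bucket(const std::string& bucket) {
            if (has_pending_cluster_state) {
                delayed_rechecks.push_back(bucket);   // defer; don't race the transition
                return;
            }
            send_request_bucket_info(bucket);
        }

        // The regression: this flush was missing in the new code path.
        void on_pending_state_activated() {
            has_pending_cluster_state = false;
            for (const auto& bucket : delayed_rechecks) send_request_bucket_info(bucket);
            delayed_rechecks.clear();
        }

        void send_request_bucket_info(const std::string& bucket) {
            std::printf("RequestBucketInfo(%s)\n", bucket.c_str());
        }
    };

    int main() {
        BucketDbUpdater updater;
        updater.has_pending_cluster_state = true;
        updater.recheck_bucket("bucket(1)");    // queued
        updater.on_pending_state_activated();   // now sent
    }
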
vespa-engine/geirst/main-distributor-thread-tick-wait-duration
Increase tick wait duration for main distributor thread when running …

Increase tick wait duration for main distributor thread when running multiple
stripes.

This is because it will no longer be running background maintenance jobs
(the non-event tick will instead be used primarily for resending full bucket
fetches etc.).