path: root/storage
* Add guard against lockstep wait in ApplyBucketDiffState::wait(). (Tor Egge, 2021-10-18, 1 file, -1/+7)

  Add guard against throwing exception out of ApplyBucketDiffState destructor.
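  A minimal sketch of the general pattern named in this message (not the actual ApplyBucketDiffState code; the class and method names below are invented for illustration): destructor completion work is wrapped so that no exception can escape during stack unwinding.

```cpp
class AsyncApplyState {
public:
    ~AsyncApplyState() {
        try {
            flush_pending_results(); // may throw in this sketch
        } catch (...) {
            // Swallow (or route to a non-throwing logger); never let an
            // exception propagate out of a destructor.
        }
    }
private:
    void flush_pending_results() { /* send replies, update metrics, ... */ }
};
```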
* Add class representing async state for applying bucket diff to local node. (Tor Egge, 2021-10-15, 9 files, -88/+301)
* Merge pull request #19556 from vespa-engine/vekterli/add-metric-for-max-time-since-bucket-gc (Tor Brede Vekterli, 2021-10-14, 9 files, -16/+89)

  Add metric for max time since bucket GC was last run
* Add metric for max time since bucket GC was last run (Tor Brede Vekterli, 2021-10-14, 9 files, -16/+89)

  Max time is aggregated across all buckets. If this metric value grows
  substantially larger than the configured GC period, it indicates that GC is
  being starved.
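  A hedged sketch of the aggregation described above (names and data layout are assumptions, not the actual metric wiring): take the maximum time since GC was last run across all buckets, sampled once per metric snapshot.

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

struct BucketGcInfo {
    std::chrono::system_clock::time_point last_gc_run;
};

// If the returned value keeps growing well past the configured GC period,
// garbage collection is being starved.
std::chrono::seconds max_time_since_gc(const std::vector<BucketGcInfo>& buckets,
                                       std::chrono::system_clock::time_point now) {
    std::chrono::seconds max_age{0};
    for (const auto& b : buckets) {
        auto age = std::chrono::duration_cast<std::chrono::seconds>(now - b.last_gc_run);
        max_age = std::max(max_age, age);
    }
    return max_age;
}
```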
* Merge pull request #19547 from vespa-engine/toregge/add-detailed-metrics-for-failed-merge-operations (Geir Storli, 2021-10-14, 5 files, -33/+111)

  Add detailed metrics for failed merge operations.
* Use ASSERT_NO_FATAL_FAILURE() to propagate fatal failures. (Tor Egge, 2021-10-14, 1 file, -6/+6)
* Add detailed metrics for failed merge operations. (Tor Egge, 2021-10-14, 5 files, -33/+111)
* Use blocking scheduling semantics for bucket activation maintenance (Tor Brede Vekterli, 2021-10-14, 3 files, -4/+23)

  We consider bucket maintenance so latency-critical that we'll prefer to stall
  scheduling of subsequent buckets instead of risking having to re-scan the DB
  to encounter the bucket again.
* Make implicit bucket priority DB clearing on scheduling configurable (Tor Brede Vekterli, 2021-10-14, 8 files, -16/+90)
* Don't let a blocked maintenance operation inhibit the remaining maintenance queue (Tor Brede Vekterli, 2021-10-14, 9 files, -72/+150)

  The old maintenance scheduler behavior is to only remove a bucket from the
  priority DB if its maintenance operation was successfully started. Failing to
  start an operation could happen from both max pending throttling as well as
  operation/bucket-specific blocking behavior. Since the scheduler would
  encounter the same bucket as the one previously blocked upon its next tick
  invocation, a single blocked bucket would run the risk of head-of-line
  stalling the rest of the remaining maintenance queue (assuming the ongoing DB
  scan did not encounter any higher priority buckets).

  This commit changes the following aspects of maintenance scheduling:

  * Always clear entries from the priority DB before trying to start an
    operation. A blocked operation will be retried the next time the regular
    bucket DB scan encounters the bucket.
  * Avoid trying to start (and clear) inherently doomed operations by _not_
    trying to schedule any operations if they would be blocked due to too many
    pending maintenance operations anyway. Introduces a new
    `PendingWindowChecker` interface for this purpose (sketched below).
  * Explicitly inhibit all maintenance scheduling if a pending cluster state is
    present. Operations are already _implicitly_ blocked from starting if
    there's a pending cluster state, but this would cause the priority DB to be
    pointlessly cleared.
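  A hypothetical sketch of a pending-window check in the spirit of the `PendingWindowChecker` interface mentioned above (the method and enum names are assumptions, not Vespa's actual declarations): the scheduler consults it before popping a bucket, so it never clears a priority DB entry for an operation that is doomed to be throttled.

```cpp
#include <cstdint>

// Assumed priority enum for illustration.
enum class MaintenancePriority : uint8_t { Low, Medium, High, VeryHigh };

struct PendingWindowChecker {
    virtual ~PendingWindowChecker() = default;
    // Returns true iff an operation at this priority could be started now
    // without exceeding the max pending maintenance operation window.
    virtual bool may_allow_operation_with_priority(MaintenancePriority pri) const noexcept = 0;
};

// Scheduler tick sketch: skip the scheduling attempt entirely (leaving the
// priority DB untouched) when the pending window is already full.
//
//   if (!_pending_window_checker.may_allow_operation_with_priority(next_priority())) {
//       return; // window full; retry on a later tick
//   }
//   // ...pop bucket, clear its DB entry, try to start the operation...
```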
* Add metrics for blocked and throttled operations. (Tor Egge, 2021-10-13, 12 files, -6/+117)
* Merge pull request #19517 from vespa-engine/toregge/remove-dead-code-in-put-operation (Tor Brede Vekterli, 2021-10-12, 3 files, -114/+0)

  Remove dead code in PutOperation.
* Remove dead code in PutOperation. (Tor Egge, 2021-10-12, 3 files, -114/+0)
* Update metrics when we don't have source-only bucket copies. (Tor Egge, 2021-10-12, 1 file, -2/+2)
* Hint about NTP (Jon Bratseth, 2021-10-07, 1 file, -1/+1)
* Update Verizon Media copyright notices. (gjoranv, 2021-10-07, 129 files, -129/+129)
* Update 2018 copyright notices. (gjoranv, 2021-10-07, 14 files, -14/+14)
* Update 2017 copyright notices. (gjoranv, 2021-10-07, 468 files, -468/+468)
* Minor MergeThrottler code cleanups. No functional changes. (Tor Brede Vekterli, 2021-10-06, 3 files, -185/+79)
* Fix typo in config description (Tor Brede Vekterli, 2021-10-05, 1 file, -1/+1)

  Co-authored-by: Geir Storli <geirst@yahooinc.com>
* Make ignoring queue limit for forwarded merges configurable (Tor Brede Vekterli, 2021-10-05, 4 files, -6/+46)
* Do not busy-bounce merges forwarded from other nodes (Tor Brede Vekterli, 2021-10-04, 2 files, -2/+34)

  Let any merge through that has already passed through at least one other
  node's merge window, as it has already taken up a logical resource slot on
  all those nodes. Busy-bouncing a merge at that point would undo a great
  amount of time already expended. The max number of enqueued merges is
  bounded by the number of nodes in the system, as each node can still only
  accept a configurable number of merges from distributors, and each
  distributor throttles its maintenance operations based on priority-relative
  max pending limits.
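  A hedged sketch of the admission logic this and the previous entry describe (function and field names are assumptions, not the actual MergeThrottler API): a merge that was forwarded from another node may be enqueued even when the local queue limit is reached, instead of being busy-bounced.

```cpp
#include <cstdint>

struct IncomingMerge {
    bool forwarded_from_other_node; // true when the merge chain is non-empty
};

enum class Admission { Accept, Enqueue, BusyBounce };

Admission admit(const IncomingMerge& merge, uint32_t active, uint32_t queued,
                uint32_t max_active, uint32_t queue_limit,
                bool ignore_queue_limit_for_forwarded) {
    if (active < max_active) {
        return Admission::Accept; // room in the local merge window
    }
    // A forwarded merge already occupies a slot on every node it has passed
    // through; bouncing it would waste that work, so enqueue it even if the
    // queue is nominally full (when so configured).
    if (queued < queue_limit ||
        (merge.forwarded_from_other_node && ignore_queue_limit_for_forwarded)) {
        return Admission::Enqueue;
    }
    return Admission::BusyBounce;
}
```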
* Add noexcept specifier to PendingOperationStats constructor. (Tor Egge, 2021-10-03, 1 file, -1/+1)
* Expose aggregated low-level data movement statistics as metrics (Tor Brede Vekterli, 2021-09-28, 7 files, -23/+62)

  Adds metrics for the following:

  * Bucket replicas that should be moved out, e.g. retirement case or node
    added to cluster that has higher ideal state priority.
  * Bucket replicas that should be copied out, e.g. node is in ideal state but
    might have to provide data to other nodes in a merge.
  * Bucket replicas that should be copied in, e.g. node does not have a
    replica for a bucket that it is in ideal state for.
  * Bucket replicas that need syncing due to mismatching metadata.

  These are aggregates across all bucket replicas, buckets and bucket spaces.
  Should aid in visibility for data movement during node retirements when
  there are concurrent replicas-out-of-sync events.
* Fix off-by-one assertion for intra-second timestamp overflow sanity check (Tor Brede Vekterli, 2021-09-27, 1 file, -1/+1)
* Add grace period inhibiting maintenance after state transitions with bucket ownership transfer (Tor Brede Vekterli, 2021-09-27, 11 files, -25/+149)

  Avoids the case where different distributors can start merges with a max
  timestamp that is lower than timestamps generated intra-second by other
  distributors used for feed bound to the same bucket. This is analogous to
  the existing "safe time period" functionality used for handling external
  feed, and uses the same max clock skew config as this. Correctness of this
  grace period is therefore inherently dependent on actual cluster clock skew
  being less than this configured number.

  Bucket activations are still allowed to take place during the grace period
  time window, as these do not mutate the bucket contents and are therefore
  safe.
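  A minimal sketch of the gating described above, under assumed names and a steady clock (not the distributor's actual types): mutating maintenance waits out the configured clock-skew window after an ownership-transferring state transition, while non-mutating bucket activations remain allowed.

```cpp
#include <chrono>

class MaintenanceGate {
    using clock = std::chrono::steady_clock;
    clock::time_point _last_ownership_transfer{};
    std::chrono::seconds _max_clock_skew; // same config as the external-feed "safe time period"
public:
    explicit MaintenanceGate(std::chrono::seconds max_clock_skew)
        : _max_clock_skew(max_clock_skew) {}

    void on_ownership_transfer(clock::time_point now) { _last_ownership_transfer = now; }

    // Mutating maintenance (e.g. merges) must wait out the skew window;
    // bucket activations do not mutate data and are always allowed.
    bool may_start_mutating_maintenance(clock::time_point now) const {
        return (now - _last_ownership_transfer) >= _max_clock_skew;
    }
};
```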
* Add high-level test that maintenance is inhibited during pending state transitions (Tor Brede Vekterli, 2021-09-24, 1 file, -0/+19)

  Maintenance inhibition is already present, but it happens at a much lower
  level. Add a high-level test to ensure that the wiring works as expected.
* Merge pull request #19267 from vespa-engine/vekterli/use-max-of-current-and-pending-distribution-bit-counts (Tor Brede Vekterli, 2021-09-23, 1 file, -1/+1)

  Use max instead of min from current and pending cluster states' distribution bit counts [run-systemtest]
* Use max instead of min from current and pending cluster states' distribution bit counts (Tor Brede Vekterli, 2021-09-23, 1 file, -1/+1)

  Using min() has an unfortunate (very rare) edge case if a cluster goes
  _down_ in distribution bit counts (whether we really want to support this at
  all is a different discussion, since it has some other unfortunate
  implications). If the current state has e.g. 14 bits and the pending state
  has 8 bits, using 8 bits for `_distribution_bits` will trigger a
  `TooFewBucketBitsInUse` exception when computing a cached ideal state for a
  bucket in the bucket DB. This is because the ideal state algorithm is not
  defined for buckets using fewer bits than the state's distribution bit
  count.

  The cluster controller shall never push a cluster state with a distribution
  bit count higher than the least split bucket across all nodes in the
  cluster, so the cache lookup code should theoretically(tm) never be invoked
  with a bucket that has fewer used bits than what's present in the pending
  state.
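  The change itself is tiny; a hedged sketch of its shape (the function name is invented for illustration): prefer the larger of the two states' distribution bit counts so cached ideal-state computation never sees a bucket with fewer used bits than the chosen count.

```cpp
#include <algorithm>
#include <cstdint>

uint8_t effective_distribution_bits(uint8_t current_state_bits,
                                    uint8_t pending_state_bits) {
    // Previously: std::min(...), which could select the lower pending count
    // and trip the TooFewBucketBitsInUse check for already-split buckets.
    return std::max(current_state_bits, pending_state_bits);
}
```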
* use path in config includes (Arne H Juul, 2021-09-22, 1 file, -2/+2)
* allow generated PB files outside source tree (Arne H Juul, 2021-09-22, 1 file, -1/+1)
* Remove TODO that we won't fix. (Geir Storli, 2021-09-21, 1 file, -1/+0)
* Remove unused use_bucket_db parameter. (Geir Storli, 2021-09-21, 4 files, -9/+7)
* Use BucketSpaceStateMap to track cluster state and distribution in the top-level distributor. (Geir Storli, 2021-09-20, 19 files, -128/+112)

  This replaces the previous hack (needed in legacy mode) that used
  DistributorBucketSpaceRepo to achieve the same.
* Remove most traces of distributor legacy mode. (Tor Brede Vekterli, 2021-09-20, 21 files, -6066/+151)

  Some assorted legacy bits and pieces still remain on the factory floor;
  these will be cleaned up in follow-ups.
* Address low-hanging TODO fruit and remove stuff that's either done or won't be done (Tor Brede Vekterli, 2021-09-16, 11 files, -33/+20)
* Merge pull request #19164 from vespa-engine/vekterli/aggregate-pending-operation-stats-across-stripes (Tor Brede Vekterli, 2021-09-16, 3 files, -3/+30)

  Aggregate pending operation stats across all stripes in stripe guard
* Aggregate pending operation stats across all stripes in stripe guard (Tor Brede Vekterli, 2021-09-16, 3 files, -3/+30)
* Merge pull request #19162 from vespa-engine/geirst/flip-to-new-distributor-stripe-code-path (Geir Storli, 2021-09-16, 3 files, -2/+33)

  Flip to always use the new distributor stripe code path.
* Flip to always use the new distributor stripe code path. (Geir Storli, 2021-09-16, 3 files, -2/+33)

  If the number of stripes is not configured, we tune it based on the sampled
  number of CPU cores.
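  A hedged sketch of such core-count-based tuning (the function, the power-of-two rounding, and the cap are assumptions made for illustration, not the actual tuning rule):

```cpp
#include <algorithm>
#include <bit>
#include <cstdint>
#include <thread>

uint32_t tuned_stripe_count(uint32_t configured /* 0 means "not configured" */) {
    if (configured != 0) {
        return configured; // explicit config always wins
    }
    uint32_t cores = std::max(1u, std::thread::hardware_concurrency());
    // Assumption for this sketch: round down to a power of two and cap it,
    // so stripe-to-bucket mapping stays simple on large hosts.
    return std::min(std::bit_floor(cores), 8u);
}
```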
* Use ideal state cache when populating StateChecker context (Tor Brede Vekterli, 2021-09-15, 1 file, -3/+2)
* Move initializing handling to top-level distributor (Tor Brede Vekterli, 2021-09-14, 11 files, -20/+98)

  Add a listener interface that lets the top-level distributor intercept
  cluster state activations and use this for triggering the node init edge.
  This happens when all stripes are paused, so this is safe from data races.
  Legacy code in the DistributorStripe remains for now.
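  A hypothetical sketch of such a listener interface (the namespace, struct, and method names are assumptions, not the actual declarations added by this commit):

```cpp
#include <cstdint>

namespace storage::distributor {

struct ClusterStateActivationListener {
    virtual ~ClusterStateActivationListener() = default;
    // Invoked after a new cluster state version has been activated. All
    // stripes are paused at this point, so mutating top-level state here
    // (e.g. triggering the node init edge) is safe from data races.
    virtual void on_cluster_state_activated(uint32_t state_version) = 0;
};

} // namespace storage::distributor
```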
* Port final batch of BucketDBUpdater tests from legacy to top-level code paths (Tor Brede Vekterli, 2021-09-13, 5 files, -19/+698)
* Merge pull request #19076 from vespa-engine/vekterli/port-more-bucketdbupdater-tests (Tor Brede Vekterli, 2021-09-13, 8 files, -38/+1201)

  Port more BucketDBUpdater tests from legacy to new code path
* Port more BucketDBUpdater tests from legacy to new code path (Tor Brede Vekterli, 2021-09-10, 8 files, -38/+1201)
* Rename test functions to be aligned with distributor stripe functionality. (Geir Storli, 2021-09-09, 1 file, -4/+4)
* Merge pull request #19023 from vespa-engine/vekterli/port-additional-tests-and-fix-regression (Geir Storli, 2021-09-09, 5 files, -0/+173)

  Port additional DB updater tests and fix delayed sending regression
* Port additional DB updater tests and fix delayed sending regression (Tor Brede Vekterli, 2021-09-08, 5 files, -0/+173)

  Addresses a missing piece of functionality in the new code path where queued
  bucket rechecks during a pending cluster state time window would not be sent
  as expected when the pending state has been completed and activated.
* Merge pull request #19022 from vespa-engine/geirst/main-distributor-thread-tick-wait-duration (Geir Storli, 2021-09-09, 1 file, -1/+3)

  Increase tick wait duration for main distributor thread when running …
* Increase tick wait duration for main distributor thread when running with multiple stripes. (Geir Storli, 2021-09-08, 1 file, -1/+3)

  This is because it will no longer be running background maintenance jobs
  (the non-event tick will instead be used primarily for resending full bucket
  fetches etc.).