- Wire in configuration of the number of RPC targets.
instance.
They are still present in LegacyDistributorTest as long as legacy mode exists.
vespa-engine/geirst/baseline-utils-and-tests-for-distributor-stripe
Prepare baseline utils and tests for a single distributor stripe.
This is copied from DistributorTestUtil and LegacyDistributorTest,
and adjusted to work with one distributor stripe.
We would previously check for the presence of pending null-bucket
`RequestBucketInfoCommand`s to determine if a pending cluster state
was present. We would also attempt to block all bucket delete operations
from starting if _any_ operation was pending towards that bucket on
a given node, including bucket info requests. The former was rewritten
to explicitly check for a pending cluster state instead, as checking
null buckets no longer works when using stripes.
Unfortunately, due to a long-standing bug with message tracking of
`RequestBucketInfoCommand`s, these would _always_ be marked as pending
towards the null bucket. Since all ideal state ops would be blocked
by null-bucket info requests, this prevented any ideal state op from
starting as long as _any_ other op had an info request pending for
the target node. This had the desirable (but not explicitly coded for)
side effect of inhibiting bucket deletions from racing with half-finished
merge operations. It also had the undesirable effect of needlessly
blocking ops for completely unrelated buckets.
With these changes, we now explicitly handle bucket info requests for
single buckets in the `PendingMessageTracker`, allowing inhibition
of deletions to work as expected. Also add an explicit check for
pending info requests for all ideal state ops to mirror the old
behavior (but now per-bucket instead of globally...!).
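A minimal sketch of per-bucket tracking, assuming a simplified model: `PendingTrackerSketch`, `MsgType`, `ideal_state_op_blocked` and `delete_blocked` are hypothetical names, not the actual `PendingMessageTracker` API. A single-bucket info request is recorded against its own bucket, so it blocks ideal state ops only for that bucket, while deletes are blocked by any pending op (such as a half-finished merge).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical, simplified stand-ins; not the actual Vespa types or API.
using BucketId = std::uint64_t;
using NodeIndex = std::uint16_t;
enum class MsgType { RequestBucketInfo, Merge, DeleteBucket };

// Tracks pending messages per (node, bucket). A RequestBucketInfo for a single
// bucket is recorded against that bucket instead of against the null bucket.
class PendingTrackerSketch {
public:
    void insert(NodeIndex node, BucketId bucket, MsgType type) {
        ++_pending[{node, bucket}][type];
    }
    void erase(NodeIndex node, BucketId bucket, MsgType type) {
        auto it = _pending.find({node, bucket});
        if (it == _pending.end()) return;
        auto t = it->second.find(type);
        if (t == it->second.end()) return;
        if (--t->second == 0) it->second.erase(t);
        if (it->second.empty()) _pending.erase(it);
    }
    bool has_pending(NodeIndex node, BucketId bucket, MsgType type) const {
        auto it = _pending.find({node, bucket});
        return it != _pending.end() && it->second.count(type) != 0;
    }
    bool has_any_pending(NodeIndex node, BucketId bucket) const {
        return _pending.count({node, bucket}) != 0;
    }
private:
    // (node, bucket) -> message type -> number of pending messages
    std::map<std::pair<NodeIndex, BucketId>, std::map<MsgType, int>> _pending;
};

// Mirrors the old behavior, but per bucket: any ideal state op is blocked
// while a bucket info request is pending for the same bucket on the node.
bool ideal_state_op_blocked(const PendingTrackerSketch& t, NodeIndex node, BucketId bucket) {
    return t.has_pending(node, bucket, MsgType::RequestBucketInfo);
}

// Deletes are blocked by *any* pending op on the bucket (e.g. a merge),
// so they cannot race with it.
bool delete_blocked(const PendingTrackerSketch& t, NodeIndex node, BucketId bucket) {
    return t.has_any_pending(node, bucket);
}
```

Unrelated buckets on the same node are no longer affected, which is the difference from the old null-bucket behavior described above.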
vespa-engine/geirst/fix-stripe-dispatch-of-request-bucket-info-reply
Dispatch RequestBucketInfoReply for non-existing buckets to correct d…
distributor stripe.
This is handled similarly to per-stripe distributor metrics.
vespa-engine/toregge/aggregate-metrics-on-the-fly-when-adding-to-snapshot
Aggregate distributor metrics when adding to snapshot.
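One possible reading of aggregating "on the fly" is sketched below, assuming a toy metric set: each stripe's counters are added directly into the snapshot values rather than first being summed into a separate copy. `StripeMetricsSketch` and `add_stripes_to_snapshot` are hypothetical and do not reflect the real metrics framework.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-stripe metric set; field names are illustrative only.
struct StripeMetricsSketch {
    std::uint64_t puts_ok = 0;
    std::uint64_t puts_failed = 0;
};

// Adds every stripe's counters straight into the snapshot, so no
// intermediate aggregated metric set has to be built first.
void add_stripes_to_snapshot(StripeMetricsSketch& snapshot,
                             const std::vector<StripeMetricsSketch>& stripes) {
    for (const auto& s : stripes) {
        snapshot.puts_ok += s.puts_ok;
        snapshot.puts_failed += s.puts_failed;
    }
}
```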
waiting for full Q
This function is only used by idealstatemanagertest in the context of testing a single stripe.
vespa-engine/toregge/aggregate-metrics-from-distributor-stripes-pass-2
Aggregate metrics from distributor stripes.
Reorder member variables.
vespa-engine/geirst/getnodestate-command-in-distributor-main-thread
Handle GetNodeStateCommand in distributor main thread when running in…
stripe mode.
instead of sum.
spaces, and pending maintenance stats.
Since distributor stripes no longer have access to the top-level
pending message tracking info, it's no longer possible to infer whether
a cluster state transition is pending by looking at the sent messages.
Instead, do this more generally (and efficiently) by looking at the
potential pending cluster state directly.
Rewire the `isBlocked` logic to take in an operation context instead
of just a `PendingMessageTracker`, giving it access to a lot more
relevant information.
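A sketch of the signature change, assuming minimal hypothetical interfaces: `OperationContextSketch`, `has_pending_cluster_state`, `has_pending_info_request` and `is_blocked` are illustrative names, not the real distributor classes. Passing a context instead of a tracker lets the check ask for the pending cluster state directly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, minimal stand-ins; not the real distributor interfaces.
using BucketId = std::uint64_t;

class OperationContextSketch {
public:
    virtual ~OperationContextSketch() = default;
    // Queried directly instead of scanning for null-bucket info requests.
    virtual bool has_pending_cluster_state() const = 0;
    virtual bool has_pending_info_request(BucketId bucket) const = 0;
};

// Conceptually replaces an isBlocked(const PendingMessageTracker&) shape.
bool is_blocked(const OperationContextSketch& ctx, BucketId bucket) {
    if (ctx.has_pending_cluster_state()) return true;
    return ctx.has_pending_info_request(bucket);
}

// Test double used to exercise is_blocked().
struct FakeContext : OperationContextSketch {
    bool pending_state = false;
    bool info_pending = false;
    BucketId info_bucket = 0;
    bool has_pending_cluster_state() const override { return pending_state; }
    bool has_pending_info_request(BucketId b) const override {
        return info_pending && b == info_bucket;
    }
};
```

Because the context is an interface, tests can substitute a fake like `FakeContext` without setting up message tracking.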
a random distributor stripe.
Such commands will eventually be bounced with WRONG_DISTRIBUTION when handled by the stripe.
|
| |
|
| |
|
|\
| |
| |
| |
| | |
vespa-engine/geirst/dispatch-get-and-visitor-messages-to-stripe
Dispatch get and visitor messages to correct distributor stripe.
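A sketch of stripe routing, under loudly stated assumptions: the stripe is taken from the low-order bits of a 64-bit bucket key (Vespa's actual mapping may use different bits), and the stripe count is a power of 2. `stripe_of_bucket_key` and `random_stripe` are hypothetical helpers; the latter models sending unroutable commands to a random stripe, which then bounces them with WRONG_DISTRIBUTION.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical routing helper. Assumes num_stripes is a power of 2 and that
// the stripe is chosen from the low bits of the bucket key.
std::uint32_t stripe_of_bucket_key(std::uint64_t bucket_key, std::uint32_t num_stripes) {
    assert(num_stripes > 0 && (num_stripes & (num_stripes - 1)) == 0);
    // With a power-of-2 stripe count, masking selects log2(num_stripes) bits.
    return static_cast<std::uint32_t>(bucket_key & (num_stripes - 1));
}

// Hypothetical fallback: pick any stripe uniformly; the receiving stripe is
// expected to bounce the command if it does not own the bucket.
std::uint32_t random_stripe(std::mt19937& rng, std::uint32_t num_stripes) {
    std::uniform_int_distribution<std::uint32_t> dist(0, num_stripes - 1);
    return dist(rng);
}
```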
hash lookup.
If it is a wildcard lookup, iterate as before.
Also use vespalib::stringref in the interface to avoid conversion.
Use vespalib::string in the hash map to locate the string in the object, as we are still on the old ABI.
The most basic functionality is now supported using multiple distributor stripes (and threads).
Note that the following is (at least) still missing:
* Stripe-separate metrics with top-level aggregation.
* Aggregation over all stripes in misc functions in Distributor that currently use the first stripe.
* Handling of messages without bucket id in the top-level Distributor instead of using the first stripe.
vespa-engine/geirst/validate-distributor-stripes-config
Add validation of the number of distributor stripes from config and a…
asserts.
This ensures the number of stripes is a power of 2 and within the MaxStripes boundary.
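A sketch of one way to enforce these invariants, assuming the configured value is adjusted (clamped to `[1, max]`, then rounded down to a power of 2) rather than rejected; whether the real code adjusts or rejects is not stated. `kMaxStripesSketch` is a hypothetical stand-in for MaxStripes, and its value of 256 is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bound standing in for MaxStripes (assumed to be a power of 2).
constexpr std::uint32_t kMaxStripesSketch = 256;

bool is_power_of_2(std::uint32_t n) { return n != 0 && (n & (n - 1)) == 0; }

// Clamps the configured count to [1, kMaxStripesSketch] and rounds it down
// to the nearest power of 2, then asserts the resulting invariant.
std::uint32_t adjusted_num_stripes(std::uint32_t configured) {
    std::uint32_t n = (configured == 0) ? 1 : configured;
    if (n > kMaxStripesSketch) n = kMaxStripesSketch;
    std::uint32_t p = 1;
    while (p * 2 <= n) p *= 2;  // largest power of 2 <= n
    assert(is_power_of_2(p) && p <= kMaxStripesSketch);
    return p;
}
```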
To avoid starvation of high-priority global bucket merges, we do not consider
these for blocking due to a node being "busy" (usually caused by a full merge
throttler queue).
This is for two reasons:
1. When an ideal state op is blocked, it is still removed from the internal
maintenance priority queue. This means a blocked high-priority operation will
not be retried until the next DB pass (at which point the node is likely
to still be marked as busy when there's heavy merge traffic).
2. Global bucket merges have high priority and will most likely be allowed
to enter the merge throttler queues, displacing lower-priority merges.
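A sketch of the exemption under a simplified, hypothetical model: `IdealStateOpSketch` and `blocked_by_busy_node` are illustrative names, and the exemption is keyed on "merge in the global bucket space" as an assumption about how "high-priority global bucket merges" are identified.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model; names are illustrative only.
struct IdealStateOpSketch {
    bool is_merge = false;
    bool global_bucket_space = false;  // targets the global bucket space
    std::vector<std::uint16_t> nodes;  // nodes involved in the operation
};

// node_busy[i] is true if node i is busy (e.g. its merge throttler queue is
// full). Global bucket merges skip the check so they are not starved.
bool blocked_by_busy_node(const IdealStateOpSketch& op,
                          const std::vector<bool>& node_busy) {
    if (op.is_merge && op.global_bucket_space) return false;
    for (auto n : op.nodes) {
        if (n < node_busy.size() && node_busy[n]) return true;
    }
    return false;
}
```

Skipping the check avoids the retry delay from reason 1. Reason 2 is why letting these merges through is expected to be safe: the throttler will likely admit them ahead of lower-priority merges.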
stripe in sequence.
stripe bits.
- Ideal state ops cannot look at null-bucket messages to determine whether
full bucket checks are pending when running in striped mode, as
these are not handled by stripes when not in legacy mode.
- State checker context should use ideal state cache instead of recomputing
for every checked bucket (observed via `perf` in production).
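A sketch of the caching idea, assuming a simple memoizing map keyed by bucket ID: `IdealStateCacheSketch` is hypothetical, and the injected `compute` function stands in for the costly ideal-state calculation. Such a cache must be cleared whenever the cluster state or distribution config changes.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical memoizing cache of ideal node sets, keyed by bucket ID.
class IdealStateCacheSketch {
public:
    using Nodes = std::vector<std::uint16_t>;

    explicit IdealStateCacheSketch(std::function<Nodes(std::uint64_t)> compute)
        : _compute(std::move(compute)) {}

    // Computes the ideal nodes once per bucket; later calls hit the cache.
    // References to unordered_map values stay valid across rehashing.
    const Nodes& get(std::uint64_t bucket) {
        auto it = _cache.find(bucket);
        if (it == _cache.end()) {
            it = _cache.emplace(bucket, _compute(bucket)).first;
        }
        return it->second;
    }

    // Must be called when the cluster state or distribution changes.
    void clear() { _cache.clear(); }

private:
    std::function<Nodes(std::uint64_t)> _compute;
    std::unordered_map<std::uint64_t, Nodes> _cache;
};
```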
|
| |
|
|\
| |
| | |
Minor cleanups in distributor maintenance handling code
No functional changes
New behavior:
- Only allow time to travel forwards within a given distributor process's
lifetime. This is a change from the old behavior, which would emit a
warning to the logs and happily continue from a previously used second,
possibly causing the distributor to reuse timestamps.
- Try to detect cases where the wall clock has been transiently set far
into the future (only to bounce back) by aborting the process if the
current observed time is more than 120 seconds older than the highest
observed wall clock time. This is an attempt to avoid generating _too_
many bogus future timestamps, as the distributor would otherwise continue
generating timestamps within the highest observed second.
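A sketch of the two rules, under stated assumptions: `TimestampSourceSketch` is hypothetical, timestamps are assumed to be `second * 1'000'000 + counter`, and instead of calling `abort()` the sketch returns a status so the caller decides. Counter overflow within a single second is not handled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical timestamp source; layout and names are assumptions.
class TimestampSourceSketch {
public:
    static constexpr std::uint64_t kMaxBackwardsSeconds = 120;
    static constexpr std::uint64_t kPerSecond = 1'000'000;

    enum class Status { Ok, ShouldAbort };

    // now_seconds is the current wall clock reading.
    Status next(std::uint64_t now_seconds, std::uint64_t& out) {
        // Clock is more than 120 s behind the highest observed time.
        if (now_seconds + kMaxBackwardsSeconds < _highest_second) {
            return Status::ShouldAbort;
        }
        if (now_seconds > _highest_second) {
            _highest_second = now_seconds;
            _counter = 0;
        }
        // Time never moves backwards: stay within the highest observed second.
        // (Counter overflow within one second is not handled in this sketch.)
        out = _highest_second * kPerSecond + _counter++;
        return Status::Ok;
    }

private:
    std::uint64_t _highest_second = 0;
    std::uint64_t _counter = 0;
};
```

A small backwards step (for example 10 s) keeps producing unique timestamps within the highest observed second. A step of more than 120 s is reported as `ShouldAbort`.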