| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
| |
Uses the same "out of sync ratio" value as is currently exposed
as a metric and through the State V2 REST API, but rendered as
a percentage to be more human-friendly.
|
|
|
|
|
|
| |
The cluster controller has no notion of the nature of these events
(they could just be part of a benign upgrade cycle), so don't paint them with a
scary red color that implies something is wrong in the cluster.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With these changes the cluster controller continuously maintains
a global aggregate across all content nodes that represents the
number of pending and total buckets per bucket space. This aggregate
can be sampled in O(1) time.
An explicit metric `cluster-buckets-out-of-sync-ratio` has been
added, and the value is also emitted as part of the cluster state
REST API. Note: only emitted when statistics have been received
from _all_ distributors for a particular cluster state, as it would
otherwise potentially represent an arbitrary state somewhere between
two or more distinct states.
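As a rough sketch of the bookkeeping described above (the class and field names
here are hypothetical, not the actual cluster controller code), the idea is to
keep per-bucket-space running totals that are adjusted by deltas as each
distributor's statistics arrive, so the ratio can be read without scanning anything:
```
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an incrementally maintained pending/total bucket
// aggregate per bucket space. Only illustrates the O(1) sampling idea; the
// real cluster controller bookkeeping differs.
class BucketSyncAggregate {
    private static final class Totals { long pending; long total; }
    private final Map<String, Totals> perBucketSpace = new HashMap<>();

    // Apply the delta between a distributor's previously reported and
    // newly reported statistics for one bucket space.
    void applyDelta(String bucketSpace, long pendingDelta, long totalDelta) {
        Totals t = perBucketSpace.computeIfAbsent(bucketSpace, k -> new Totals());
        t.pending += pendingDelta;
        t.total += totalDelta;
    }

    // O(1): no iteration over nodes or buckets when the metric is sampled.
    double outOfSyncRatio(String bucketSpace) {
        Totals t = perBucketSpace.get(bucketSpace);
        return (t == null || t.total == 0) ? 0.0 : (double) t.pending / t.total;
    }
}
```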
|
| |
|
| |
|
| |
|
|
|
|
| |
spotbugs SuppressWarning annotation.
|
| |
|
|
|
|
| |
Bucket count should have been pre-verified as present by the caller.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
node down edge
The stored entry count encompasses both visible documents and
tombstones. Using this count rather than bucket count avoids any
issues where a node only containing empty buckets (i.e. no actual
data) is prohibited from being marked as permanently down.
Entry count is cross-checked with the visible document count;
if the former is zero, the latter should always be zero as well.
Since entry/doc counts were only recently introduced as part of
the HostInfo payload, we have to handle the case where these do
not exist. If entry count is not present, the decision to allow
or disallow the transition falls back to the bucket count check.
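A hypothetical helper capturing the fallback logic described above (the real
checker and its host info types differ; this only shows the shape of the decision):
```
// Hypothetical sketch of the decision described above; counts are null
// when an older node does not report them in its host info.
final class NodeDownEdgeCheck {
    static boolean mayMarkNodePermanentlyDown(Long entryCount, Long visibleDocCount,
                                              long bucketCount) {
        if (entryCount == null) {
            // Entry/doc counts missing from host info; fall back to the bucket count check.
            return bucketCount == 0;
        }
        if (entryCount == 0 && visibleDocCount != null && visibleDocCount != 0) {
            // Cross-check: zero entries must imply zero visible documents.
            // Treat inconsistent host info conservatively and disallow the transition.
            return false;
        }
        return entryCount == 0;
    }
}
```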
|
|
|
|
|
|
|
| |
vespa-engine/revert-29678-jonmv/reapply-zk-3.9.1"
This reverts commit c8ece8b229362c7bf725e4433ef4fec86024cd29, reversing
changes made to d42b67f0fe821d122548a345f27fda7f9c9c9d10.
|
| |
|
|
|
|
|
|
|
| |
vespa-engine/revert-29671-jonmv/reapply-zk-3.9.1"
This reverts commit 28f8cf3e298d51ca703ceee36a992297d38637cc, reversing
changes made to 3a9f89fe60e3420eed435daee435a4f8534c9512.
|
| |
|
|
|
|
|
|
|
| |
vespa-engine/revert-29662-revert-29661-revert-29658-jonmv/zk-3.9.1-clients-2"
This reverts commit 9c8ba2608384ee79e143babd1e5a18a62166541f, reversing
changes made to 954785e4eb91286bd166c304e98042ec63b7eb84.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
| |
The fake impl acts "as if" a single-node ZK quorum is present, so it
cannot be directly used with most multi-node tests that require
multiple nodes to actually participate in leader elections.
|
|
|
|
|
|
|
|
|
|
| |
ZK will by default preallocate 65536 * 1024 bytes for its
write-ahead log file. This will happen for every test instantiation
of the ZooKeeper database. Now, I like wearing out SSD flash cells
as much as the next guy, but this just feels silly.
The input number is always multiplied by 1024, so reduce it to
64 to get a 64 KiB preallocation instead.
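ZooKeeper reads this preallocation size from the `zookeeper.preAllocSize` system
property and interprets it in kilobytes, so a test setup could shrink it roughly
as below (whether Vespa's test harness sets it exactly this way is an assumption):
```
// Illustrative only: "64" is interpreted as 64 * 1024 bytes, i.e. a 64 KiB
// transaction log preallocation instead of the 64 MiB default.
final class ZkTestPrealloc {
    static void shrinkTxnLogPrealloc() {
        System.setProperty("zookeeper.preAllocSize", "64");
    }
}
```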
|
| |
|
|
|
|
|
|
| |
of the ObjectMapper.
Unless special options are needed, use a common instance, or create one via a factory method.
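Jackson's ObjectMapper is thread-safe once configured and comparatively expensive
to construct, so the usual pattern looks roughly like the sketch below (the class
name and factory method are illustrative, not the project's actual helper):
```
import com.fasterxml.jackson.databind.ObjectMapper;

// Illustrative pattern only: share a single default-configured ObjectMapper,
// and go through a factory method when special options are required.
final class Jackson {
    private static final ObjectMapper DEFAULT = new ObjectMapper();

    static ObjectMapper mapper() { return DEFAULT; }

    static ObjectMapper createCustomMapper() {
        return new ObjectMapper(); // apply special configuration here
    }
}
```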
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Messages now prefixed with content cluster name to help disambiguate
which cluster is exceeding its limits in multi-cluster deployments.
Example message:
```
in content cluster 'my-cool-cluster': disk on node 1 [my-node-1.example.com] is 81.0% full
(the configured limit is 80.0%). See https://docs.vespa.ai/en/operations/feed-block.html
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Messages are generated centrally by the cluster controller and pushed
to content nodes as part of a cluster state bundle; the distributor
nodes merely repeat back what they have been told. This changes the
cluster controller feed block error message code to be less ambiguous
and to include a URL to our public documentation about feed blocks.
Example of _old_ message:
```
disk on node 1 [storage.1.local] (0.510 > 0.500)
```
Same feed block with _new_ message:
```
disk on node 1 [storage.1.local] is 51.0% full (the configured limit is 50.0%).
See https://docs.vespa.ai/en/operations/feed-block.html
```
|
| |
|
| |
|
|\
| |
| | |
New parent pom
|
| | |
|
|/ |
|
| |
|
|
|
|
|
| |
This means we will also check distributors that are on the same node as
a retired storage node.
|
| |
|
| |
|
|
|
|
|
|
|
| |
When we allow several groups to go down for maintenance, we should check
whether the nodes in the groups that are still up have the required redundancy.
They might be up but might not yet have synced all buckets after coming back up.
We want to wait before allowing more nodes to be taken down until that is done.
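A hypothetical sketch of the kind of gate this implies (the data shape, and using
"no buckets pending merge" as a proxy for the required redundancy being restored,
are assumptions, not the actual checker):
```
import java.util.Map;

// Hypothetical sketch: only allow taking another group down for maintenance
// once every group that is currently up reports zero buckets pending merge,
// i.e. it has finished syncing after coming back up.
final class GroupMaintenanceGate {
    static boolean mayTakeAnotherGroupDown(Map<String, Long> pendingBucketsPerUpGroup) {
        return pendingBucketsPerUpGroup.values().stream()
                .allMatch(pending -> pending == 0);
    }
}
```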
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|