| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| | |
vespa-engine/revert-16934-revert-16932-balder/move-metrics-from-partition-to-node-level
Revert "Revert "GC unused DiskState and add the partition metrics to node level.""
|
| | |
|
| |
| |
| |
| | |
level.""
|
|/ |
|
| |
|
| |
|
| |
|
|\
| |
| |
| |
| | |
vespa-engine/vekterli/dont-store-full-bundle-objects-in-state-history
Don't store full bundle objects in state history
|
| | |
|
| |
| |
| |
| |
| | |
Always print regardless of leader eligibility state; config is not
predicated on this.
|
| |
| |
| |
| |
| | |
Much less immediately interesting than the actual cluster node information.
Move it just above the general event log instead.
|
| | |
Bundles have a lot of sub-objects per state, so in systems with a
large number of node entries, storing them adds unnecessary pressure on the
heap. Instead, store the string representations of the bundle and
the string representation of the diff to the previous state version
(if any). This is also inherently faster than computing the diffs
on-demand on every status page render.
Also remove mutable `official` field from `ClusterState`. Not worth
violating immutability of an object just to get some prettier (but
with high likelihood actually more confusing) status page rendering.
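A minimal sketch of the idea described above, assuming a history entry that holds only pre-rendered strings. All class, field, and method names here are hypothetical, and `describeDiff` is a placeholder rather than the controller's real diff logic:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the actual cluster controller classes.
final class StateHistorySketch {

    // A history entry holds only strings, never the bundle object graph.
    static final class Entry {
        final long timeMillis;
        final String stateString;
        final String diffFromPrevious; // empty for the first entry

        Entry(long timeMillis, String stateString, String diffFromPrevious) {
            this.timeMillis = timeMillis;
            this.stateString = stateString;
            this.diffFromPrevious = diffFromPrevious;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private String previousState = null;

    // The diff is computed once, at insertion time, instead of on every render.
    void add(long timeMillis, String renderedBundle) {
        String diff = (previousState == null) ? "" : describeDiff(previousState, renderedBundle);
        entries.add(new Entry(timeMillis, renderedBundle, diff));
        previousState = renderedBundle;
    }

    // Placeholder diff (assumption): just shows "old => new" when the strings differ.
    static String describeDiff(String from, String to) {
        return from.equals(to) ? "" : from + " => " + to;
    }

    // Returns a copy so callers cannot modify the history.
    List<Entry> entries() {
        return new ArrayList<>(entries);
    }
}
```

A status page renderer would then only iterate over `entries()` and print the stored strings.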
|
|/ |
|
|\
| |
| | |
GC unused DiskState
|
| | |
|
| | |
|
|\ \
| |/
|/| |
Add shared ZK client config generator for zkfacade and vespa-zkcli [run-systemtest]
|
| |
| |
| |
| | |
Use ZKClientConfig builder from Curator and ZooKeeperDatabase
|
| | |
|
| | |
|
| | |
|
|/
|
|
| |
initProgress and capacity. Also gc unused 'reliability' member.
|
|
|
|
| |
These types are often accidentally imported, and the JDK8 replacement is typically a one-liner.
|
|\
| |
| |
| |
| | |
vespa-engine/vekterli/immediately-exit-cc-if-node-index-changed-live
Immediately exit cluster controller if node index config is changed live
|
| |
| |
| |
| |
| | |
We do not support live reconfigs of CC index, so swiftly exit if we
detect this, allowing the config sentinel to restart the service.
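The check could look roughly like the sketch below. The class name and the injected exit action are assumptions made for illustration; the real controller wiring is not shown in this log:

```java
// Hypothetical sketch; names are illustrative only.
final class NodeIndexGuardSketch {
    private final int initialIndex;
    private final Runnable exitAction;

    NodeIndexGuardSketch(int initialIndex, Runnable exitAction) {
        this.initialIndex = initialIndex;
        this.exitAction = exitAction;
    }

    // Called for every new config; returns true if the exit action was invoked.
    boolean onNewConfig(int configuredIndex) {
        if (configuredIndex != initialIndex) {
            // Live index changes are unsupported, so bail out and rely on a restart.
            exitAction.run();
            return true;
        }
        return false;
    }
}
```

In a real process the exit action would presumably be something like `() -> System.exit(1)`, leaving the restart to the config sentinel. Injecting it as a `Runnable` keeps the check testable.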
|
|\ \
| |/
|/| |
Bjorncs/upgrade zk client [run-systemtest]
|
| |
| |
| |
| | |
Ensure extra required ZK dependencies are present on test classpath
|
| | |
|
|/
| |
Adds the following safeguards/improvements:
- Do not clear pending (non-persisted) writes over a `connect()` edge.
Avoids having the controller eternally wait for a doomed pending
write to be completed when it has no other events that can trigger
a new write.
- Trigger `lostDatabaseConnection()` whenever ZK is reconfigured to
ensure we reload the newest state before trying to compute/publish
any new states.
- Explicitly drop leadership in `lostDatabaseConnection()` to immediately
prevent controller from trying any funny leader-related business
since it no longer can depend on ZK watches triggering.
- When falling back to default state/cluster bundle, ensure that any
subsequent dependent znode write is predicated on the pre-existing
znode version being 0, i.e. did not previously exist.
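The last safeguard is a version-predicated (compare-and-set) write. The sketch below illustrates it against an in-memory stand-in for the database; it does not use the ZooKeeper client API. Treating a missing entry as version 0 follows the wording of the commit message and is an assumption of this sketch:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a versioned znode store.
final class VersionedStoreSketch {
    private final Map<String, Integer> versions = new HashMap<>();
    private final Map<String, String> data = new HashMap<>();

    // Sketch convention: an entry that was never written reports version 0.
    int versionOf(String path) {
        return versions.getOrDefault(path, 0);
    }

    // Writes only if the stored version matches what the caller expects.
    boolean writeIfVersion(String path, String value, int expectedVersion) {
        if (versionOf(path) != expectedVersion) {
            return false; // someone else wrote first; caller must reload state
        }
        data.put(path, value);
        versions.put(path, expectedVersion + 1);
        return true;
    }

    String read(String path) {
        return data.get(path);
    }
}
```

A controller that fell back to a default state would call `writeIfVersion(path, value, 0)`, so the write fails instead of overwriting a state that another controller has persisted in the meantime.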
|
| |
|
|\
| |
| |
| |
| | |
vespa-engine/geirst/cluster-controller-resouce-usage-limits-metrics
Expose resource usage metrics for disk and memory limits for feed blo…
|
| | |
|
|/ |
|
| |
|
| |
|
|
|
|
|
| |
Available nodes here mean nodes that are reported as Up/Initializing
and where the wanted state is Up/Retired.
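Expressed as a predicate, this definition might look as follows. The enum and method names are hypothetical and only mirror the state names used in the message:

```java
// Hypothetical sketch of the "available node" definition.
final class AvailabilitySketch {
    enum State { UP, INITIALIZING, RETIRED, MAINTENANCE, DOWN }

    // Available: reported Up/Initializing AND wanted Up/Retired.
    static boolean isAvailable(State reported, State wanted) {
        boolean reportedOk = reported == State.UP || reported == State.INITIALIZING;
        boolean wantedOk = wanted == State.UP || wanted == State.RETIRED;
        return reportedOk && wantedOk;
    }
}
```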
|
|
|
|
| |
indirectly
|
| |
ZooKeeper
This avoids a subtle edge case where the underlying ZK integration code
may silently fail a write, leaving the core controller logic to think that
it had actually durably persisted a particular state version.
In case of reelections racing with broadcasts, it would be possible for
leader-edge readbacks from ZK to retrieve a _lower_ version than one
that had already been published. This would cause the cluster controller
to get very confused about which cluster states nodes had already observed.
If a newly produced state version overlapped with a previously broadcast
state, the controller would not push the updated state to the nodes, as it
would (with good reason) assume the node had already observed it, seeing
that it had already ACKed the particular version number.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|\
| |
| |
| |
| | |
vespa-engine/hakonhall/also-deny-maintenance-when-another-node-is-in-maintenance
Also deny maintenance when another node is in maintenance
|
| | |
|
| |
| |
| |
| |
| |
| |
| | |
The cluster controller today already denies setting a node X safely to
maintenance M if there is another node Y in another group that has wanted
state M. This means that if Y is in M but its wanted state is not M, X is
allowed to be set to M. This is a rare edge case.
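A rough sketch of the extended check, assuming it considers both the wanted state and the actual state of nodes in other groups. The types and names are hypothetical, and restricting the new check to other groups is an assumption carried over from the existing rule:

```java
import java.util.List;

// Hypothetical sketch; not the actual cluster controller code.
final class MaintenanceCheckSketch {

    static final class Node {
        final int group;
        final boolean wantedMaintenance; // wanted state is maintenance
        final boolean inMaintenance;     // actual state is maintenance

        Node(int group, boolean wantedMaintenance, boolean inMaintenance) {
            this.group = group;
            this.wantedMaintenance = wantedMaintenance;
            this.inMaintenance = inMaintenance;
        }
    }

    // Denies the request if any node in another group wants or is in maintenance.
    static boolean maySetMaintenance(int targetGroup, List<Node> otherNodes) {
        for (Node other : otherNodes) {
            if (other.group == targetGroup) {
                continue; // nodes in the same group do not block the request
            }
            if (other.wantedMaintenance || other.inMaintenance) {
                return false;
            }
        }
        return true;
    }
}
```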
|
| | |
|
| | |
|
|/
|
| |
Adds an absolute number delta that is subtracted from the feed block limit
when a node has a resource already in feed blocked state. This means that
there's a lower watermark threshold that must be crossed before feeding
can be unblocked. Avoids flip-flopping between block states.
Default is currently 0.0, i.e. effectively disabled. To be modified
later for system tests and trial roll-outs.
A couple of caveats with the current implementation:
* The cluster state is not recomputed automatically when just the hysteresis
threshold is crossed, so the description will be out of date on the
content nodes. However, if any other feed block event happens (or the
hysteresis threshold is crossed), the state will be recomputed as expected.
This does not affect correctness, since the feed is still to be blocked.
* A node event remove/add pair is emitted for feed block status when the
hysteresis threshold is crossed and there's a cluster state recomputation.
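The hysteresis rule itself can be sketched as below. The method names, the strict `>` comparison, and the clamping at 0.0 are assumptions of this sketch and not taken from the implementation:

```java
// Hypothetical sketch of feed block hysteresis.
final class FeedBlockHysteresisSketch {

    // When a resource is already blocked, the limit is lowered by the delta,
    // so usage must drop below this lower watermark before feed is unblocked.
    static double effectiveLimit(double limit, double hysteresisDelta, boolean alreadyBlocked) {
        return alreadyBlocked ? Math.max(0.0, limit - hysteresisDelta) : limit;
    }

    static boolean shouldBlock(double usage, double limit, double hysteresisDelta, boolean alreadyBlocked) {
        return usage > effectiveLimit(limit, hysteresisDelta, alreadyBlocked);
    }
}
```

With a limit of 0.8 and a delta of 0.05, a node at 0.78 usage stays blocked if it was already blocked, but would not become blocked if it was not. With the default delta of 0.0 both cases behave identically, which is why the default effectively disables the feature.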
|