Commit messages

If there is more than one group, disallow suspending a node if any node in
another group has a user-wanted state other than UP. If there is only one
group, disallow suspending more than one node.
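
A minimal Java sketch of that suspension rule; the types and accessors
(Node, Group, userWantedState, suspended) are hypothetical stand-ins, not
Vespa's actual orchestrator API:

    import java.util.List;

    class SuspensionPolicy {
        enum State { UP, DOWN, MAINTENANCE }

        record Node(String id, State userWantedState, boolean suspended) {}
        record Group(String name, List<Node> nodes) {}

        static boolean mayBeSuspended(Group candidateGroup, List<Group> allGroups) {
            if (allGroups.size() > 1) {
                // More than one group: disallow if any node in ANOTHER group
                // has a user-wanted state that is not UP.
                return allGroups.stream()
                        .filter(group -> group != candidateGroup)
                        .flatMap(group -> group.nodes().stream())
                        .allMatch(node -> node.userWantedState() == State.UP);
            }
            // Exactly one group: allow at most one suspended node at a time.
            return candidateGroup.nodes().stream().noneMatch(Node::suspended);
        }
    }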

If a storage node falls out of Slobrok, it changes from UP to Maintenance
after 60 seconds, and then to Down after a further 30 seconds. Avoid allowing
suspension during that 30-second grace period merely because the node is in
Maintenance mode.
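
A sketch of that grace-period check, assuming the 60 s/30 s transition
timings above; the class and method names are hypothetical:

    import java.time.Duration;
    import java.time.Instant;

    class SlobrokDropGrace {
        static final Duration UP_TO_MAINTENANCE = Duration.ofSeconds(60);
        static final Duration MAINTENANCE_GRACE = Duration.ofSeconds(30);

        // True while a node that dropped out of Slobrok is reported as
        // Maintenance only because it is inside the grace window; such a
        // node should not be treated as deliberately taken down.
        static boolean inGracePeriod(Instant droppedFromSlobrokAt, Instant now) {
            Duration sinceDrop = Duration.between(droppedFromSlobrokAt, now);
            return sinceDrop.compareTo(UP_TO_MAINTENANCE) >= 0
                    && sinceDrop.compareTo(UP_TO_MAINTENANCE.plus(MAINTENANCE_GRACE)) < 0;
        }
    }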

Revert "Revert "Avoid safe mutations in master moratorium and increase first
cluster state broadcast deadline [run-systemtest]""

Revert "Avoid safe mutations in master moratorium and increase first cluster
state broadcast deadline [run-systemtest]"

vespa-engine/hakonhall/increase-the-minimum-time-before-first-cluster-state-broadcast-run-systemtest
Avoid safe mutations in master moratorium and increase first cluster state broadcast deadline [run-systemtest]

Instead, we'll want to create a more generalized solution that considers
all sources of node information (Slobrok _and_ explicit health check RPCs)
before potentially publishing a state or processing tasks.
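
A rough sketch of what such a gate could look like; every name here is
hypothetical, not an existing Vespa interface:

    import java.util.List;

    // Only publish states or process tasks once every source of node
    // information reports a complete view.
    interface NodeInfoSource {
        boolean hasCompleteView(); // e.g. Slobrok mirror synced, health-check RPCs answered
    }

    class PublishGate {
        private final List<NodeInfoSource> sources;

        PublishGate(List<NodeInfoSource> sources) { this.sources = sources; }

        boolean mayPublishOrProcessTasks() {
            return sources.stream().allMatch(NodeInfoSource::hasCompleteView);
        }
    }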

Avoids potentially publishing cluster states _before_ we have triggered
our own leadership election edge handling code. This could happen if code
running prior to the election-edge logic checked the election handler's
state and erroneously concluded that we had already performed the
prerequisite actions we're supposed to do when assuming leadership
(such as reading back the current state from ZK).
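
A minimal sketch of that ordering guard, with illustrative names only:

    // Report leadership as usable only after the election-edge work
    // (e.g. reading back the current state from ZK) has completed.
    class LeadershipTracker {
        private volatile boolean wonElection = false;
        private volatile boolean edgeHandled = false;

        void onElectionResult(boolean isLeader) {
            wonElection = isLeader;
            if (!isLeader) edgeHandled = false;
        }

        void onLeadershipEdgeHandled() { edgeHandled = true; }

        // Code deciding whether to publish a cluster state must check this,
        // not just the raw election result.
        boolean mayActAsLeader() { return wonElection && edgeHandled; }
    }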

using ZK

Otherwise, if there are transient Slobrok issues during CC startup and
we end up winning the election, we risk publishing a cluster state in which
the entire cluster appears down (since we do not yet have any knowledge of
Slobrok's node mappings). This would adversely affect availability for
all the obvious reasons.
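
A sketch of the startup gate this implies: hold off ZooKeeper connectivity
(and therefore election participation) until the local Slobrok mirror is
ready. The mirror interface and connect hook are hypothetical stand-ins:

    class StartupGate {
        interface SlobrokMirror { boolean isReady(); }

        private final SlobrokMirror mirror;
        private final Runnable connectToZooKeeper;
        private boolean connected = false;

        StartupGate(SlobrokMirror mirror, Runnable connectToZooKeeper) {
            this.mirror = mirror;
            this.connectToZooKeeper = connectToZooKeeper;
        }

        // Called periodically; only join the election (via ZK) once we can
        // actually map Slobrok entries to node states.
        void tick() {
            if (!connected && mirror.isReady()) {
                connectToZooKeeper.run();
                connected = true;
            }
        }
    }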

Since version 0 states were ambiguous with the sentinel values for
"not written to ZK/not tagged as official", this could be misinterpreted.
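
To illustrate the ambiguity (constant and method names are made up for the
example):

    final class StateVersions {
        // If 0 doubles as "nothing persisted yet", a legitimately persisted
        // version-0 state is indistinguishable from the sentinel.
        static final int NOT_PERSISTED_SENTINEL = 0;

        static boolean looksUnpersisted(int version) {
            return version == NOT_PERSISTED_SENTINEL; // also true for a real v0 state
        }
    }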

vespa-engine/vekterli/inhibit-db-connectivity-until-slobrok-is-ready
Inhibit ZooKeeper connections until our local Slobrok mirror is ready.

Otherwise, if there are transient Slobrok issues during CC startup and
we end up winning the election, we risk publishing a cluster state in which
the entire cluster appears down (since we do not yet have any knowledge of
Slobrok's node mappings). This would adversely affect availability for
all the obvious reasons.

level.""

vespa-engine/vekterli/dont-store-full-bundle-objects-in-state-history
Don't store full bundle objects in state history

Always print regardless of leader-eligibility state; the config is not
predicated on it.

Much less immediately interesting than the actual cluster node information.
Move it just above the general event log instead.

Bundles have many sub-objects per state, so on systems with a large number
of node entries this adds unnecessary pressure on the heap. Instead, store
the string representation of the bundle and the string representation of
the diff against the previous state version (if any). This is also
inherently faster than computing the diffs on demand for every status page
render.
Also remove the mutable `official` field from `ClusterState`. It is not
worth violating an object's immutability just to get somewhat prettier
(but quite possibly more confusing) status page rendering.
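
A compact sketch of what a history entry might hold once only strings are
retained; the field names are illustrative:

    // Pre-rendered strings instead of full bundle objects: cheap to keep
    // on the heap and free to render on the status page.
    record StateHistoryEntry(
            long version,
            long timestampMillis,
            String bundleString,      // string form of the state bundle
            String diffFromPrevious   // pre-computed diff vs. the previous version, if any
    ) {}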

GC unused DiskState

Add shared ZK client config generator for zkfacade and vespa-zkcli [run-systemtest]

Use ZKClientConfig builder from Curator and ZooKeeperDatabase

We do not support live reconfiguration of the CC index, so exit promptly if
we detect such a change, allowing the config sentinel to restart the service.
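
A sketch of that fail-fast reaction; the guard class and config callback
are hypothetical:

    import java.util.logging.Logger;

    class ReconfigGuard {
        private static final Logger log = Logger.getLogger(ReconfigGuard.class.getName());
        private final int runningIndex;

        ReconfigGuard(int runningIndex) { this.runningIndex = runningIndex; }

        // If the configured cluster controller index changes at runtime,
        // exit so the config sentinel restarts us with the new config.
        void onNewConfig(int configuredIndex) {
            if (configuredIndex != runningIndex) {
                log.severe("CC index changed from " + runningIndex + " to "
                        + configuredIndex + "; exiting to be restarted");
                System.exit(1);
            }
        }
    }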

Adds the following safeguards/improvements (see the sketch after this list):
- Do not clear pending (non-persisted) writes over a `connect()` edge.
  This avoids having the controller wait forever for a doomed pending
  write to complete when no other event can trigger a new write.
- Trigger `lostDatabaseConnection()` whenever ZK is reconfigured, to
  ensure we reload the newest state before trying to compute/publish
  any new states.
- Explicitly drop leadership in `lostDatabaseConnection()` to immediately
  prevent the controller from attempting any funny leader-related
  business, since it can no longer depend on ZK watches triggering.
- When falling back to the default state/cluster bundle, ensure that any
  subsequent dependent znode write is predicated on the pre-existing
  znode version being 0, i.e. that the znode did not previously exist.
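
A sketch of the last safeguard using the plain Apache ZooKeeper client
(the path and payload are illustrative): setData with an expected version
of 0 succeeds only if the znode has never been updated since creation.

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;

    class PredicatedWrite {
        // Write data only if the pre-existing znode still has version 0,
        // i.e. it was freshly created and never written to afterwards.
        static void writeIfUntouched(ZooKeeper zk, String path, byte[] data)
                throws KeeperException, InterruptedException {
            try {
                zk.setData(path, data, 0); // expected version = 0
            } catch (KeeperException.BadVersionException e) {
                // A newer write exists; do not clobber it.
            }
        }
    }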

vespa-engine/geirst/cluster-controller-resouce-usage-limits-metrics
Expose resource usage metrics for disk and memory limits for feed blo…