Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Merge pull request #16494 from vespa-engine/hakonhall/also-deny-maintenance-when-another-node-is-in-maintenance | Håkon Hallingstad | 2021-02-12 | 3 | -14/+55 |
|\ | Also deny maintenance when another node is in maintenance | ||||
| * | Add test | Håkon Hallingstad | 2021-02-12 | 2 | -11/+49 |
| | | |||||
| * | Also deny maintenance when another node is in maintenance | Håkon Hallingstad | 2021-02-12 | 2 | -4/+7 |
| | | The cluster controller today already denies setting a node X safely to maintenance M if there is another node Y in another group that has wanted state M. This means that if Y is in M but its wanted state is not M, X is still allowed to be set to M. This is a rare edge case. | ||||
* | | enableSmallBuffers -> useSmallBuffers | Henning Baldersheim | 2021-02-12 | 2 | -2/+2 |
| | | |||||
* | | Use small buffers where size matters more than speed. | Henning Baldersheim | 2021-02-12 | 2 | -2/+3 |
| | | |||||
* | | Support configurable feed block hysteresis on the cluster controller | Tor Brede Vekterli | 2021-02-10 | 8 | -11/+171 |
|/ | Adds an absolute number delta that is subtracted from the feed block limit when a node has a resource already in feed blocked state. This means that there's a lower watermark threshold that must be crossed before feeding can be unblocked. Avoids flip-flopping between block states. Default is currently 0.0, i.e. effectively disabled. To be modified later for system tests and trial roll-outs. A couple of caveats with the current implementation: * The cluster state is not recomputed automatically when just the hysteresis threshold is crossed, so the description will be out of date on the content nodes. However, if any other feed block event happens (or the hysteresis threshold is crossed), the state will be recomputed as expected. This does not affect correctness, since the feed is still to be blocked. * A node event remove/add pair is emitted for feed block status when the hysteresis threshold is crossed and there's a cluster state recomputation. | ||||
* | Cleanup: Remove unnecessary and unused methods, simplify | Harald Musum | 2021-02-08 | 8 | -68/+45 |
| | |||||
* | Minor cleanup, no functional changes | Harald Musum | 2021-02-08 | 9 | -62/+44 |
| | |||||
* | Add extra row under content node if it's blocking feed | Tor Brede Vekterli | 2021-02-08 | 2 | -0/+37 |
| | List all resource exhaustions for node, including enum store etc. | ||||
* | Merge pull request #16424 from vespa-engine/geirst/resource-usage-metrics-in-cluster-controller | Geir Storli | 2021-02-08 | 7 | -30/+226 |
|\ | Add and expose resource usage metrics from the cluster controller. | ||||
| * | Nodes above limit should only count each node once. | Geir Storli | 2021-02-08 | 2 | -2/+16 |
| | | |||||
| * | Add and expose resource usage metrics from the cluster controller. | Geir Storli | 2021-02-05 | 7 | -30/+212 |
| | | |||||
* | | Show feed block options on cluster controller status page | Tor Brede Vekterli | 2021-02-05 | 1 | -0/+7 |
|/ | |||||
* | Add resource usage per node to cluster controller status page | Tor Brede Vekterli | 2021-02-04 | 3 | -1/+56 |
| | Also adds top-level cluster feed block status. Does not yet make enum store/multivalue limit feed blocks visible per node. | ||||
* | Emit node-level events when resource exhaustion set changes | Tor Brede Vekterli | 2021-02-03 | 9 | -44/+138 |
| | |||||
* | Recompute cluster state if set of resource exhaustions changes | Tor Brede Vekterli | 2021-02-02 | 6 | -18/+127 |
| | Ensures that feed block description pushed to nodes is updated as further resource exhaustions are recorded (or disappear). | ||||
* | Add hostname to resource exhaustion description | Tor Brede Vekterli | 2021-01-29 | 5 | -19/+58 |
| | Hostname is inferred from the node's RPC address | ||||
* | Support optional resource usage name field | Tor Brede Vekterli | 2021-01-29 | 6 | -24/+84 |
| | If present, will be reported alongside the resource type in the feed block description string. | ||||
* | add maybePublishOldMetrics hook | Arne Juul | 2021-01-28 | 1 | -0/+16 |
| | |||||
* | Add cluster feed block support to cluster controller | Tor Brede Vekterli | 2021-01-27 | 15 | -4/+583 |
| | Will push out a new cluster state bundle indicating cluster feed blocked if one or more nodes in the cluster have one or more resources exhausted. Similarly, a new state will be pushed out once no nodes have resources exhausted any more. The feed block description currently contains up to 3 separate exhausted resources, possibly across multiple nodes. A cluster-level event is emitted for both the block and unblock edges. No hysteresis is present yet, so if a node is oscillating around a block limit, so will the cluster state. | ||||
* | Dummy change to trigger systemtest | Håkon Hallingstad | 2021-01-21 | 1 | -1/+0 |
| | |||||
* | Allows setting a node safely to maintenance in these two new circumstances: | Håkon Hallingstad | 2021-01-21 | 8 | -56/+302 |
| | 1. The node has state MAINTENANCE with (user) wanted state UP. 2. There are other nodes in the same hierarchical group that are set in MAINTENANCE with the same description. Also made the following change: 3. Deny a request for safe MAINTENANCE or DOWN, if the wanted state is already set but with a different description. If the descriptions are the same, it is assumed to be the same operator (e.g. Orchestrator) having changed its mind. | ||||
* | Support group maintenance [run-systemtest] | Håkon Hallingstad | 2021-01-19 | 6 | -25/+216 |
| | |||||
* | Add feed block propagation to ClusterStateBundle in Java | Tor Brede Vekterli | 2021-01-15 | 4 | -19/+173 |
| | Fully forwards and backwards compatible. Currently only supports indicating feed blocked status for the entire cluster, with one associated descriptive message intended to be used by distributors. | ||||
* | Change isMaster to updateMasterState | Ola Aunrønning | 2021-01-04 | 2 | -2/+2 |
| | |||||
* | Always set 'is-master' metric. Remove 'master-change' | Ola Aunrønning | 2021-01-04 | 2 | -10/+3 |
| | |||||
* | Track explicitly when we are initializing config | Jon Bratseth | 2020-12-16 | 1 | -1/+3 |
| | |||||
* | Remove code in StatusPageServer, keep some inner classes temporarily | Harald Musum | 2020-11-19 | 4 | -655/+14 |
| | |||||
* | Simplify | Harald Musum | 2020-11-19 | 3 | -13/+16 |
| | |||||
* | Rename method | Harald Musum | 2020-11-18 | 1 | -18/+9 |
| | |||||
* | Remove unused method | Harald Musum | 2020-11-18 | 1 | -11/+3 |
| | |||||
* | Add back @Ignore and add timeout for tests | Harald Musum | 2020-10-02 | 1 | -0/+5 |
| | |||||
* | Cleanup | Harald Musum | 2020-10-02 | 3 | -44/+33 |
| | |||||
* | Name the transport threads to understand how things are interconnected. | Henning Baldersheim | 2020-08-04 | 3 | -3/+3 |
| | |||||
* | use fixed port for rebinding test | Arne Juul | 2020-07-06 | 1 | -1/+1 |
| | |||||
* | Downgrade log level when node watch races with internal state resets | Tor Brede Vekterli | 2020-05-05 | 1 | -1/+4 |
| | This appears to happen if pending data node watches are triggered with failures after the cluster controller has already reset the internal voting state due to connectivity issues. Internal state is rebuilt automatically, so no need to scream errors into the logs for this. | ||||
* | Add dependency to vespalog (directly used by these modules) | gjoranv | 2020-04-26 | 1 | -1/+7 |
| | |||||
* | Use correct log Level class where search & replace has failed. | gjoranv | 2020-04-25 | 1 | -1/+1 |
| | |||||
* | Replace remaining LogLevel.<level> with corresponding Level | gjoranv | 2020-04-25 | 5 | -16/+16 |
| | |||||
* | Map remaining DEBUG/SPAM/ERROR/FATAL -> Level.FINE/FINEST/SEVERE | gjoranv | 2020-04-25 | 5 | -9/+9 |
| | |||||
* | LogLevel -> Level for isLoggable() | gjoranv | 2020-04-25 | 3 | -9/+9 |
| | |||||
* | LogLevel.ERROR -> Level.SEVERE | gjoranv | 2020-04-25 | 3 | -8/+8 |
| | |||||
* | LogLevel.WARNING -> Level.WARNING | gjoranv | 2020-04-25 | 10 | -32/+32 |
| | |||||
* | LogLevel.INFO -> Level.INFO | gjoranv | 2020-04-25 | 13 | -80/+80 |
| | |||||
* | LogLevel.SPAM -> Level.FINEST | gjoranv | 2020-04-25 | 9 | -14/+14 |
| | |||||
* | LogLevel.DEBUG -> Level.FINE | gjoranv | 2020-04-25 | 17 | -146/+146 |
| | |||||
* | Import java.util.logging.Level instead of com.yahoo.log.LogLevel | gjoranv | 2020-04-25 | 24 | -24/+24 |
| | |||||
* | Merge branch 'master' into hakonhall/remove-use-bucket-space-metric-feature-flag | Håkon Hallingstad | 2020-04-20 | 1 | -2/+3 |
|\ | |||||
| * | Increase ZooKeeper test server tick time | Tor Brede Vekterli | 2020-03-27 | 1 | -2/+3 |
| | | Tests running under heavy concurrent CI build load appear to struggle maintaining ZK sessions, causing flakiness. This is an attempt to mitigate that by bumping the ZK server's tick interval (and therefore session timeout) by 20x. According to local testing, this does not increase testing time in the common case. | ||||
* | | Remove stray parameter doc | Håkon Hallingstad | 2020-01-27 | 1 | -4/+2 |
| | |