Commit message (Collapse) | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Remove usage of the @Component annotation for jersey resources. | gjoranv | 2021-07-08 | 1 | -11/+4 |
| | |||||
* | Revert "Revert "Send setNodeState to CC even if storage node is down"" | Håkon Hallingstad | 2021-05-14 | 6 | -35/+32 |
| | |||||
* | Revert "Send setNodeState to CC even if storage node is down" | Håkon Hallingstad | 2021-05-14 | 6 | -32/+35 |
| | |||||
* | Send setNodeState to CC even if storage node is down | Håkon Hallingstad | 2021-05-14 | 6 | -35/+32 |
| | | | | | | Note that this will be wrong if there is a single CC colocated with a content node, i.e. there's only one content node in the cluster, and the CC is down. In this case an operator must intervene to bring up the CC. | ||||
* | Allow fifty percent of host-admin cluster nodes down in cd-like systems | Harald Musum | 2021-05-03 | 9 | -21/+60 |
| | |||||
* | More lazy debug log message generation | Jon Marius Venstad | 2021-04-28 | 2 | -15/+15 |
| | |||||
* | Use restapi test driver in existing unit tests | Bjørn Christian Seime | 2021-04-23 | 4 | -127/+87 |
| | |||||
* | Rename class to follow convention of similar exception types | Bjørn Christian Seime | 2021-04-13 | 3 | -9/+9 |
| | |||||
* | Use RestApiException instead of old JAX-RS equivalent | Bjørn Christian Seime | 2021-04-13 | 1 | -3/+1 |
| | |||||
* | Rename class to match naming convention of other handlers | Bjørn Christian Seime | 2021-04-12 | 2 | -14/+14 |
| | |||||
* | Convert remaining JAX-RS resources to request handlers | Bjørn Christian Seime | 2021-04-12 | 10 | -599/+693 |
| | |||||
* | Rewrite HealthResource as request handler | Bjørn Christian Seime | 2021-04-09 | 1 | -30/+27 |
| | |||||
* | Convert HostSuspensionResource to request handler | Bjørn Christian Seime | 2021-03-26 | 4 | -120/+197 |
| | |||||
* | Decouple orchestrator resources into separate rest-api definitions | Bjørn Christian Seime | 2021-03-26 | 8 | -18/+26 |
| | |||||
* | Use Object::equals on non-enums | Håkon Hallingstad | 2021-03-23 | 1 | -1/+1 |
| | |||||
* | Require 3 config server (and controller) hosts | Håkon Hallingstad | 2021-03-23 | 3 | -10/+68 |
| | | | | | | | | | We already require 3 config server (and controller) nodes, but that is not sufficient to protect the hosts from being left with only 1 healthy host: Say the config server host application contains 2 nodes. An upgrade of host-admin on one of those nodes is allowed, since only the host is suspended and none of the 2 nodes are down. This is fixed by handling config server hosts similarly to config servers: assume 3 nodes. | ||||
* | Remove duplicate headers | Jon Bratseth | 2021-03-18 | 1 | -1/+1 |
| | |||||
* | Add copyright headers | Jon Bratseth | 2021-03-18 | 1 | -1/+2 |
| | |||||
* | Merge pull request #16928 from vespa-engine/hakonhall/test-orchestration-of-config-server-reprovisioning | Martin Polden | 2021-03-15 | 4 | -1/+329 |
|\ | | | | | | | | | Test orchestration of config server reprovisioning | ||||
| * | Test orchestration of config server reprovisioning | Håkon Hallingstad | 2021-03-12 | 4 | -1/+329 |
| | | |||||
* | | Remove redundant logging (and log text generation) | Jon Marius Venstad | 2021-03-12 | 1 | -6/+1 |
|/ | |||||
* | Revert "Revert "Enable group suspension by default [run-systemtest]"" | Håkon Hallingstad | 2021-02-26 | 1 | -2/+2 |
| | |||||
* | Avoid sleeping for 10s during unit test | Jon Marius Venstad | 2021-02-22 | 2 | -17/+20 |
| | |||||
* | Use special orchestrator context for mass probe | Jon Marius Venstad | 2021-02-20 | 3 | -2/+9 |
| | |||||
* | Implement isQuiescent by probing for M for all content services | Jon Marius Venstad | 2021-02-19 | 7 | -46/+146 |
| | |||||
* | Remove unused code, and fix doc | Jon Marius Venstad | 2021-02-19 | 5 | -25/+2 |
| | |||||
* | Obtain quiescence status from Orchestrator | Jon Marius Venstad | 2021-02-19 | 1 | -0/+2 |
| | |||||
* | Revert "Enable group suspension by default [run-systemtest]" | Arnstein Ressem | 2021-02-16 | 1 | -2/+2 |
| | |||||
* | Enable group suspension by default [run-systemtest] | Håkon Hallingstad | 2021-02-16 | 1 | -2/+2 |
| | |||||
* | Allow one node for both branches | Harald Musum | 2021-02-08 | 1 | -5/+2 |
| | |||||
* | Allow only one node down at a time for cluster controller clusters | Harald Musum | 2021-02-08 | 1 | -1/+1 |
| | | | | | | | | | If we are replacing nodes and have 3 old and 3 new nodes in a cluster, allowing 50 percent down would lead to 3 being allowed to go down. When deploying to remove the 3 old nodes there might be (for a short time) config that says that there should be 6 nodes in the zookeeper cluster. This means 4 of them need to be up for the cluster to have a quorum, so allowing 3 down will not work in this case. | ||||
* | Actually test new codepath | Håkon Hallingstad | 2021-01-25 | 1 | -2/+3 |
| | |||||
* | Support delegating content node suspension to cluster controller | Håkon Hallingstad | 2021-01-22 | 9 | -77/+198 |
| | | | | | | | | | | | | | | | | | | | | | | | This PR introduces a new flag group-suspension, which, if true, enables: - Instead of allowing at most one storagenode to suspend at any given time, it will now ignore storagenode, searchnode, and distributor service clusters, and rely on the cluster controller to allow or deny the request to suspend. This will increase the load on the cluster controllers. Combined with earlier changes to the cluster controller, this new flag effectively guards the feature of allowing all nodes within a hierarchical group to suspend concurrently. I also took the opportunity to tune related policies: - Allow at most one config server and controller to be down at any given time. This is actually a no-op, since it was effectively equal to the older policy of 10% down. - Allow 20% of all host-admins to be down, not just tenant host-admins. This is effectively equal to the old policy of 10% except that it may allow 2 proxy host-admins to go down at the same time. Should be fine. | ||||
* | Always use permanently down status | Håkon Hallingstad | 2021-01-18 | 4 | -39/+14 |
| | |||||
* | Remove debug code | Jon Marius Venstad | 2021-01-11 | 1 | -2/+0 |
| | |||||
* | Add suspension mojo | Jon Marius Venstad | 2021-01-11 | 2 | -0/+3 |
| | |||||
* | Log host status removals | Håkon Hallingstad | 2021-01-06 | 2 | -1/+4 |
| | |||||
* | Revert "Revert "Bjorncs/config convergence checker preps"" | Bjørn Christian Seime | 2020-11-25 | 2 | -0/+2 |
| | |||||
* | Revert "Bjorncs/config convergence checker preps" | Arnstein Ressem | 2020-11-25 | 2 | -2/+0 |
| | |||||
* | Deprecate VespaClientBuilderFactory + VespaJerseyJaxRsClientFactory | Bjørn Christian Seime | 2020-11-24 | 2 | -0/+2 |
| | |||||
* | Revert "Bjorncs/rewrite config convergence checker client" | Jon Marius Venstad | 2020-11-10 | 2 | -2/+0 |
| | |||||
* | Deprecate VespaClientBuilderFactory + VespaJerseyJaxRsClientFactory | Bjørn Christian Seime | 2020-11-09 | 2 | -0/+2 |
| | |||||
* | Revert "Bjorncs/rewrite config convergence checker client" | Harald Musum | 2020-11-09 | 2 | -2/+0 |
| | |||||
* | Deprecate VespaClientBuilderFactory + VespaJerseyJaxRsClientFactory | Bjørn Christian Seime | 2020-11-09 | 2 | -0/+2 |
| | |||||
* | Revert "Bjorncs/rewrite config convergence checker client" | Jon Marius Venstad | 2020-11-07 | 2 | -2/+0 |
| | |||||
* | Deprecate VespaClientBuilderFactory + VespaJerseyJaxRsClientFactory | Bjørn Christian Seime | 2020-11-06 | 2 | -0/+2 |
| | |||||
* | Remove locating code | Håkon Hallingstad | 2020-10-20 | 1 | -0/+3 |
| | |||||
* | Close orchestrator locks | Håkon Hallingstad | 2020-10-20 | 3 | -60/+99 |
| | |||||
* | Remove hack to listen to ephemeral port | Bjørn Christian Seime | 2020-10-16 | 1 | -15/+1 |
| | |||||
* | Move lock metrics to MetricsReporter | Håkon Hallingstad | 2020-10-03 | 1 | -1/+1 |
| | | | | | | | | | | | | | | | Adds two new metrics: - The load of acquiring each lock path: The average number of threads waiting to acquire the lock within the last minute (or unit of time). Aka the lock queue (depth). - The load of the lock for each lock path: The average number of threads holding the lock within the last minute (or unit of time). This is always <= 1. Aka the lock utilization. Changes the LockCounters to LockMetrics, and exports those once every minute through MetricReporter, which is designed for this. |
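
The two lock metrics described in the last commit message are both time-weighted averages of a thread count over a reporting interval. The following is a minimal Python sketch of that idea, not the Vespa `LockMetrics` implementation: the class `LoadMeter`, its methods, and the simulated timeline are hypothetical names and values chosen only for illustration.

```python
class LoadMeter:
    """Time-weighted average of an integer level (hypothetical helper).

    The level is, for example, the number of threads currently waiting
    for a lock, or the number of threads currently holding it.
    """

    def __init__(self, now: float) -> None:
        self._level = 0            # current number of threads
        self._last_change = now    # time of the most recent update
        self._window_start = now   # start of the current reporting window
        self._area = 0.0           # integral of level over time in the window

    def _advance(self, now: float) -> None:
        # Accumulate level * elapsed time since the last update.
        self._area += self._level * (now - self._last_change)
        self._last_change = now

    def add(self, delta: int, now: float) -> None:
        """Change the level by delta (+1 on enter, -1 on leave) at time now."""
        self._advance(now)
        self._level += delta

    def report_and_reset(self, now: float) -> float:
        """Return the average level over the window, then start a new window."""
        self._advance(now)
        duration = now - self._window_start
        load = self._area / duration if duration > 0 else 0.0
        self._area = 0.0
        self._window_start = now
        return load


# Simulated 60-second window for one lock path (times in seconds, assumed):
# thread A holds the lock from t=0 to t=30; thread B waits from t=10,
# acquires at t=30, and releases at t=45.
acquire_meter = LoadMeter(now=0.0)   # threads waiting: "lock queue depth"
lock_meter = LoadMeter(now=0.0)      # threads holding: "lock utilization"

lock_meter.add(+1, now=0.0)      # A acquires immediately
acquire_meter.add(+1, now=10.0)  # B starts waiting
lock_meter.add(-1, now=30.0)     # A releases
acquire_meter.add(-1, now=30.0)  # B stops waiting
lock_meter.add(+1, now=30.0)     # B acquires
lock_meter.add(-1, now=45.0)     # B releases

acquire_load = acquire_meter.report_and_reset(now=60.0)
lock_load = lock_meter.report_and_reset(now=60.0)
print(f"acquire load: {acquire_load:.3f}, lock load: {lock_load:.3f}")
```

In this simulated window one thread waits for 20 of 60 seconds and the lock is held for 45 of 60 seconds. Because at most one thread holds an exclusive lock at a time, the second average cannot exceed 1, which matches the "always <= 1" remark in the commit message.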