path: root/clustercontroller-core/src/test/java/com/yahoo/vespa/clustercontroller/core/StateChangeTest.java
Commit message (author, date, files changed, lines -removed/+added)
* Use fake ZooKeeper database implementation for subset of CC tests (Tor Brede Vekterli, 2023-12-04, 1 file, -3/+5)
  The fake impl acts "as if" a single-node ZK quorum is present, so it cannot be directly used with most multi-node tests that require multiple nodes to actually participate in leader elections.
* Update copyright (Jon Bratseth, 2023-10-09, 1 file, -1/+1)
* Simplify and remove some test methods (Harald Musum, 2023-06-01, 1 file, -2/+2)
* Require non-null zooKeeperServerAddress in FleetControllerOptions (Harald Musum, 2023-06-01, 1 file, -113/+114)
* Simplify and minor cleanup (Harald Musum, 2023-05-15, 1 file, -125/+133)
* Create slobrok in constructor and simplify setup (Harald Musum, 2023-05-12, 1 file, -10/+2)
* Inject timer from test classes instead of inheriting (Harald Musum, 2023-05-12, 1 file, -13/+10)
* Remove testname and logging related to starting and stopping (Harald Musum, 2023-05-12, 1 file, -8/+0)
  Not used; reintroduce using the JUnit TestInfo class if needed.
* StatusPageServerInterface has just one implementation, simplify (Harald Musum, 2022-12-28, 1 file, -3/+1)
* Cleanup by using supervisor in superclass (Harald Musum, 2022-09-15, 1 file, -19/+3)
* Simplify (Harald Musum, 2022-09-15, 1 file, -23/+23)
* Use a list of fleet controllers in test superclass (Harald Musum, 2022-09-08, 1 file, -10/+10)
* Remove support for 'setsystemstate2' RPC method in cluster controller (Harald Musum, 2022-09-05, 1 file, -2/+2)
* Extract CleanupZookeeperLogsOnSuccess into its own class (Harald Musum, 2022-09-01, 1 file, -0/+2)
* Use node type instead of boolean, simplify (Harald Musum, 2022-09-01, 1 file, -2/+2)
* Make FleetControllerOptions immutable and support builder pattern (Harald Musum, 2022-08-31, 1 file, -155/+156)
* Require non-null status page server (Harald Musum, 2022-08-29, 1 file, -1/+4)
* Remove unused method and class (Harald Musum, 2022-08-15, 1 file, -1/+1)
* Make sure to get timeout in seconds as a double (Harald Musum, 2022-08-12, 1 file, -9/+9)
* Use one timeout and cleanup timeout usage a bit (Harald Musum, 2022-08-11, 1 file, -9/+9)
* Remove the need for storing an instance variable in FleetControllerTest (Harald Musum, 2022-08-11, 1 file, -6/+8)
* Simplify and cleanup (Harald Musum, 2022-08-10, 1 file, -1/+1)
* Convert clustercontroller-core to junit5 (Bjørn Christian Seime, 2022-07-29, 1 file, -153/+158)
* Trigger saveWantedState when nodes are removed or orphaned wanted states are loaded (Håkon Hallingstad, 2022-04-20, 1 file, -2/+2)
* Use FleetControllerContext in ZooKeeperDatabase (Håkon Hallingstad, 2021-12-13, 1 file, -3/+3)
* Fixes after review round (Håkon Hallingstad, 2021-10-19, 1 file, -1/+1)
* Improve logging of FleetController and DatabaseHandler (Håkon Hallingstad, 2021-10-15, 1 file, -14/+15)
* Update 2017 copyright notices (gjoranv, 2021-10-07, 1 file, -1/+1)
* Add remote task queue size metric in cluster controller (Håkon Hallingstad, 2021-04-01, 1 file, -1/+1)
* Shrink the size of the NodeState object by using float over double for initProgress and capacity (Henning Baldersheim, 2021-03-11, 1 file, -19/+19)
  Also remove (gc) the unused 'reliability' member.
* Enforce that no cluster state can be published unless confirmed written to ZooKeeper (Tor Brede Vekterli, 2021-02-26, 1 file, -2/+2)
  This avoids a subtle edge case where the underlying ZK integration code may silently fail a write, leaving the core controller logic to think that it had actually durably persisted a particular state version. In case of reelections racing with broadcasts, it would be possible for leader-edge readbacks from ZK to retrieve a _lower_ version than one that had already been published.

  This would cause the cluster controller to get very confused about which cluster states nodes had already observed. If a newly produced state version overlapped with a previously broadcast state, the controller would not push the updated state to the nodes, as it would (with good reason) assume the nodes had already observed it, seeing that they had already ACKed that particular version number.
* Remove unused arguments and methods (Harald Musum, 2021-02-21, 1 file, -3/+2)
* Remove use-bucket-space-metric feature flag (Håkon Hallingstad, 2020-01-26, 1 file, -1/+1)
  The flag controlled config read by the Cluster Controller. Therefore, I have left the ModelContextImpl.Properties method and implementation (now always returning true), but the model has stopped using that method internally, and the config is no longer used in the CC. The field in the fleetcontroller.def is left unchanged and documented as deprecated.
* Use bucket_space metric in retirement (Håkon Hallingstad, 2020-01-17, 1 file, -1/+1)
  This makes the Cluster Controller use the vds.datastored.bucket_space.buckets_total metric, dimension bucketSpace=default, to determine whether a content node manages zero buckets, and if so, allow the node to go permanently down. This is used when a node is retiring and is to be removed from the application. The change is guarded by the use-bucket-space-metric flag, default true. If the new metric doesn't work as expected, we can revert to using the current/old metric by flipping the flag. The flag can be controlled per application.
* Add non-converged nodes to task deadline exceeded messages (Tor Brede Vekterli, 2019-11-04, 1 file, -10/+48)
  Makes it easier for an external observer to understand what set of nodes is causing the cluster state to not converge.
* Move grace period event edge from timer to event diff calculator (Tor Brede Vekterli, 2019-10-30, 1 file, -6/+9)
  Ensures that the event is only emitted when we're actually publishing a state containing the state transition. Emitting events in the timer code is fragile when it does not modify any state, risking emitting the same event an unbounded number of times if the condition keeps holding for each timer cycle.
* Cleanup tests, no functional changes (Harald Musum, 2019-09-03, 1 file, -23/+30)
* Activation reply processing must inspect actual version returned (Tor Brede Vekterli, 2019-03-21, 1 file, -1/+0)
  Version mismatches in the backend do not return explicit RPC errors, so actual vs. desired versions must be checked in order to avoid potentially spurious activation of other versions. Also do some minor code cleanup.
* Explicitly enable two-phase transitions in tests, disable in default options (Tor Brede Vekterli, 2019-03-20, 1 file, -24/+24)
  Mirrors the default values in the actual underlying config definitions.
* Support configurable two-phase state transitions in cluster controller (Tor Brede Vekterli, 2019-03-14, 1 file, -3/+16)
* ZooKeeper-persist and load published cluster state bundles (Tor Brede Vekterli, 2018-04-24, 1 file, -3/+5)
  Store synchronously upon each new versioned state, load whenever the controller is elected master. Effectively carries over visible node states from one controller's lifetime to the next. This removes the edge case where default bucket space content nodes are marked as in Maintenance until their global merge status is known.

  To avoid the controller tripping over its own feet, state bundles are now _not_ versioned at all until the initial send time period has passed. This prevents overwriting the state persisted by a previous controller with a transient state where all nodes are down due to not having Slobrok contact yet. A new cluster state recompute+send edge has been added for when the master passes its initial state send time period.
* Add configurable deadline for cluster controller tasks (Tor Brede Vekterli, 2017-09-25, 1 file, -1/+36)
  Prevents an unstable cluster from potentially holding up all container request processing threads indefinitely. Deadline errors are translated into HTTP 504 errors to REST API clients.
* Immediately complete failed remote tasks (Tor Brede Vekterli, 2017-09-25, 1 file, -0/+18)
  We check both for master status and task failure, as we otherwise place a potentially dangerous silent dependency on the task always failing itself if the controller is not a master.
* Immediately complete remote tasks when not leader (Tor Brede Vekterli, 2017-09-19, 1 file, -0/+15)
  Avoids an edge case where set-node-state requests sent to followers would have their response delayed indefinitely, because the controller does not publish any version that could release the task's ACK barrier.
* Change wording for operations without observable side-effects (Tor Brede Vekterli, 2017-09-12, 1 file, -17/+17)
* Break node version ACK check out into separately called logic (Tor Brede Vekterli, 2017-09-12, 1 file, -0/+19)
  Removes the dependency on having to invoke broadcastNewState before being able to observe that all distributors are in sync. Invocations of broadcastNewState are gated by a grace period between each invocation, so unless this is done we get artificial delays before a synchronous task can be considered complete.
* Test multiple scheduled synchronous tasks (Tor Brede Vekterli, 2017-09-11, 1 file, -0/+18)
* Move leadership test code into fixture (Tor Brede Vekterli, 2017-09-11, 1 file, -11/+22)
* Test automatic task failing on controller leadership loss (Tor Brede Vekterli, 2017-09-11, 1 file, -8/+52)
* Add support for version ACK-dependent tasks in cluster controller (Tor Brede Vekterli, 2017-09-11, 1 file, -1/+178)
  Used to enable synchronous operation for set-node-state calls, which ensures that side-effects of the call are visible when the response returns. If controller leadership is lost before the state is published, tasks will be failed back to the client.
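The 2021-02-26 commit "Enforce that no cluster state can be published unless confirmed written to ZooKeeper" describes a publish gate. The following is a minimal Java sketch of that idea. It assumes a hypothetical `GatedStatePublisher` whose `IntPredicate` stands in for a ZooKeeper write that reports whether the write was confirmed. It is not the cluster controller's actual code or API.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch: refuse to broadcast a cluster state version unless the
// durable store (standing in for ZooKeeper) confirms the write.
final class GatedStatePublisher {
    private final IntPredicate confirmedWrite; // true only if the version was durably stored
    private int lastPublishedVersion = 0;

    GatedStatePublisher(IntPredicate confirmedWrite) {
        this.confirmedWrite = confirmedWrite;
    }

    // Returns true if the version was persisted and may be broadcast to nodes.
    boolean tryPublish(int version) {
        if (!confirmedWrite.test(version)) {
            // A silently failed write must not count as persisted: a new leader could
            // read back an older version and later reuse a number nodes already ACKed.
            return false;
        }
        lastPublishedVersion = version;
        return true;
    }

    int lastPublishedVersion() {
        return lastPublishedVersion;
    }
}
```

The gate makes the durable version the upper bound for anything nodes can observe. A leader that reads it back therefore cannot end up behind what was broadcast.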
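The 2020-01-17 commit "Use bucket_space metric in retirement" states a concrete rule. A retiring content node may go permanently down once `vds.datastored.bucket_space.buckets_total` for `bucketSpace=default` reports zero buckets. The sketch below encodes only that rule. The `RetirementCheck` class, the map-of-dimension-values input and the treatment of a missing metric are all assumptions for illustration.

```java
import java.util.Map;

// Hypothetical sketch: may a retiring content node go permanently down?
final class RetirementCheck {
    // Metric named in the commit message; kept here for reference only.
    static final String METRIC = "vds.datastored.bucket_space.buckets_total";

    // bucketsTotalByBucketSpace maps the bucketSpace dimension value to the metric value.
    static boolean mayGoPermanentlyDown(boolean retiring, Map<String, Long> bucketsTotalByBucketSpace) {
        Long buckets = bucketsTotalByBucketSpace.get("default");
        // Assumption: a missing metric is treated conservatively as "node still has buckets".
        return retiring && buckets != null && buckets == 0L;
    }
}
```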
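The 2019-10-30 commit "Move grace period event edge from timer to event diff calculator" moves event emission from a timer to the publish path. The underlying pattern is edge-triggered emission: derive events from the difference between the last published state and the state about to be published. A timer that re-evaluates an unchanged condition then produces nothing new. The sketch below shows that generic diff with node states as plain strings. It does not model grace periods, and `StateEventDiff` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: events come from diffing published vs. to-be-published state.
final class StateEventDiff {
    static List<String> eventsFor(Map<String, String> published, Map<String, String> toPublish) {
        List<String> events = new ArrayList<>();
        // TreeMap gives a deterministic, node-name-sorted event order.
        for (Map.Entry<String, String> e : new TreeMap<>(toPublish).entrySet()) {
            String before = published.getOrDefault(e.getKey(), "absent");
            if (!before.equals(e.getValue())) {
                events.add(e.getKey() + ": " + before + " -> " + e.getValue());
            }
        }
        return events;
    }
}
```

Publishing the same state twice yields an empty event list. This is the property the commit relies on to avoid emitting one event an unbounded number of times.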
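The 2019-03-21 commit "Activation reply processing must inspect actual version returned" notes that backend version mismatches do not surface as RPC errors. An error-free reply is therefore not proof that the desired version was activated. The sketch below tracks per-node replies and only counts a node as activated when the returned version equals the desired one. `ActivationTracker` and its reply shape are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: track activation replies for one desired cluster state version.
final class ActivationTracker {
    private final int desiredVersion;
    private final Map<String, Boolean> activated = new LinkedHashMap<>();

    ActivationTracker(int desiredVersion, String... nodes) {
        this.desiredVersion = desiredVersion;
        for (String node : nodes) {
            activated.put(node, false);
        }
    }

    // A version mismatch does not show up as an RPC error, so the returned
    // version must be compared against the desired one.
    void onReply(String node, boolean rpcError, int actualVersion) {
        activated.put(node, !rpcError && actualVersion == desiredVersion);
    }

    boolean allActivated() {
        return activated.values().stream().allMatch(Boolean::booleanValue);
    }
}
```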
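The 2017-09-25 commit "Add configurable deadline for cluster controller tasks" and the 2019-11-04 commit "Add non-converged nodes to task deadline exceeded messages" combine naturally. A task that outlives its deadline is failed with a message naming the nodes that have not yet ACKed the required version, and the error maps to HTTP 504. The sketch assumes per-node ACKed versions are available as a map. `TaskDeadline` and its message format are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of a task deadline check; names are illustrative only.
final class TaskDeadline {
    // Deadline errors are reported to REST API clients as HTTP 504 (Gateway Timeout).
    static final int DEADLINE_EXCEEDED_HTTP_STATUS = 504;

    // Returns null while the task may keep waiting; otherwise an error message that
    // names the nodes that have not yet ACKed the required version.
    static String checkDeadline(Instant scheduledAt, Duration deadline, Instant now,
                                int requiredVersion, Map<String, Integer> ackedVersionByNode) {
        if (now.isBefore(scheduledAt.plus(deadline))) {
            return null;
        }
        List<String> nonConverged = ackedVersionByNode.entrySet().stream()
                .filter(e -> e.getValue() < requiredVersion)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
        return "Task exceeded its deadline of " + deadline.toMillis()
                + " ms; non-converged nodes: " + nonConverged;
    }
}
```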
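The 2017 commits on version ACK-dependent tasks describe a barrier with three behaviours. A set-node-state task completes only once nodes have ACKed a cluster state version that contains its change. The ACK check runs separately from broadcastNewState, so completion is not delayed by the broadcast grace period. Pending tasks are failed back if leadership is lost. The sketch below models that flow under stated assumptions. All class and method names are hypothetical, and completing a task on a non-leader as "failed" is one reading of "immediately complete remote tasks when not leader".

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a version-ACK-dependent task; not the Vespa implementation.
final class AckDependentTask {
    final int requiredVersion; // first cluster state version containing the task's change
    String outcome;            // null while pending, then "OK" or "FAILED: <reason>"

    AckDependentTask(int requiredVersion) {
        this.requiredVersion = requiredVersion;
    }
}

final class AckBarrier {
    private final List<AckDependentTask> pending = new ArrayList<>();

    // Assumption: a non-leader completes the task at once (as failed), since it never
    // publishes a version that could release the barrier.
    void schedule(AckDependentTask task, boolean isLeader) {
        if (!isLeader) {
            task.outcome = "FAILED: not leader";
            return;
        }
        pending.add(task);
    }

    // Run whenever ACK bookkeeping changes, independently of state broadcasts.
    void onAcksUpdated(Map<String, Integer> ackedVersionByNode) {
        // With no ACK data, MIN_VALUE keeps every task waiting.
        int lowestAcked = ackedVersionByNode.values().stream()
                .mapToInt(Integer::intValue).min().orElse(Integer.MIN_VALUE);
        for (Iterator<AckDependentTask> it = pending.iterator(); it.hasNext(); ) {
            AckDependentTask task = it.next();
            if (lowestAcked >= task.requiredVersion) {
                task.outcome = "OK";
                it.remove();
            }
        }
    }

    // Leadership lost before the version was published: fail tasks back to clients.
    void onLeadershipLost() {
        for (AckDependentTask task : pending) {
            task.outcome = "FAILED: leadership lost";
        }
        pending.clear();
    }

    int pendingCount() {
        return pending.size();
    }
}
```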