path: root/clustercontroller-core
Commit message (Author, Date, Files changed, Lines removed/added)
* Fix comparison between manually deployed package and not, and remove outdated safeguard (Jon Marius Venstad, 2022-01-13, 1 file, -1/+1)
|
* GC use of deprecated junit assertThat and unify (Henning Baldersheim, 2021-12-21, 15 files, -104/+84)
|
* Failing to find zk system state aborts tick [run-systemtest] (Håkon Hallingstad, 2021-12-14, 2 files, -14/+15)
|
* Use FleetControllerContext in ZooKeeperDatabase (Håkon Hallingstad, 2021-12-13, 16 files, -110/+195)
|
* Update 2019 Oath copyrights. (gjoranv, 2021-10-27, 4 files, -4/+4)
|
* Log version to complete remote task (Håkon Hallingstad, 2021-10-25, 3 files, -2/+12)
|   Normally, if a SetNodeStateRequest changes the state of a node, scheduleVersionDependentTasksForFutureCompletion (FleetController.java:1003) will ensure that the request waits for the successful publication of the next cluster state version before returning 200. There are reasons to believe there is an edge case, likely triggered by losing the ZooKeeper connection just prior to trying to set the new wanted state in ZK, that makes scheduleVersionDependentTasksForFutureCompletion() complete the request at the current version. This PR will make it possible to prove or disprove the theory.
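A minimal sketch of the kind of logging that commit is after, with hypothetical class and method names (not the actual Vespa code): record the cluster state version a remote task was scheduled against and the version at which it is actually completed, so the suspected "completed at the current version" edge case can be confirmed or ruled out from the logs.

```java
// Hypothetical sketch, not the actual FleetController implementation: log both the
// version a version-dependent task was scheduled against and the version at which it
// is finally completed, flagging the suspected edge case where they are the same.
import java.util.logging.Logger;

class VersionDependentTaskCompletion {
    private static final Logger log = Logger.getLogger(VersionDependentTaskCompletion.class.getName());

    private final long scheduledAtVersion; // cluster state version current when the task was queued
    private final Runnable completion;     // e.g. code that returns 200 to the SetNodeStateRequest

    VersionDependentTaskCompletion(long scheduledAtVersion, Runnable completion) {
        this.scheduledAtVersion = scheduledAtVersion;
        this.completion = completion;
    }

    /** Complete the task once a cluster state version has been successfully published. */
    void completeAt(long publishedVersion) {
        log.info(() -> "Completing task scheduled at version " + scheduledAtVersion
                + " at published version " + publishedVersion
                + (publishedVersion <= scheduledAtVersion ? " (completed at current version, suspected edge case)" : ""));
        completion.run();
    }
}
```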
* Remove config generation -1/0 from CC at :19050/status/<clustername>/config (Håkon Hallingstad, 2021-10-20, 9 files, -43/+22)
|
* Revert changes to config generation (Håkon Hallingstad, 2021-10-20, 9 files, -29/+45)
|
* Fixes after review round (Håkon Hallingstad, 2021-10-19, 12 files, -95/+90)
|
* Improve logging of FleetController and DatabaseHandler (Håkon Hallingstad, 2021-10-15, 20 files, -255/+370)
|
* Some optimizations of RpcServerTest (Håkon Hallingstad, 2021-10-14, 4 files, -22/+31)
|
* Reduce running time of MasterElectionTest from 28 to 12s (Håkon Hallingstad, 2021-10-14, 2 files, -1/+11)
|
* Update Verizon Media copyright notices. (gjoranv, 2021-10-07, 68 files, -68/+68)
|
* Update 2018 copyright notices. (gjoranv, 2021-10-07, 32 files, -32/+32)
|
* Update 2017 copyright notices. (gjoranv, 2021-10-07, 99 files, -99/+99)
|
* Revert "Revert "Avoid copying data just to compress them when it is not ↵Henning Baldersheim2021-08-301-3/+6
| | | | necessary.""
* Revert "Avoid copying data just to compress them when it is not necessary."Henning Baldersheim2021-08-301-6/+3
|
* Use explicit import.Henning Baldersheim2021-08-301-1/+5
|
* Update ↵Henning Baldersheim2021-08-301-1/+0
| | | | | | | | clustercontroller-core/src/main/java/com/yahoo/vespa/clustercontroller/core/rpc/SlimeClusterStateBundleCodec.java Remove unused Co-authored-by: Harald Musum <musum@verizonmedia.com>
* Avoid copying data just to compress them when it is not necessary.Henning Baldersheim2021-08-301-1/+1
|
* Add metric for didWork in FleetController tick (Håkon Hallingstad, 2021-06-25, 5 files, -64/+83)
|
* drop empty buffers instead of using small buffers (Håvard Pettersen, 2021-06-15, 3 files, -3/+3)
|
* No functional changes (Jon Bratseth, 2021-06-01, 1 file, -1/+1)
|
* GC some unused methods and simplify (Henning Baldersheim, 2021-05-23, 1 file, -2/+5)
|
* Set forkCount parameter for maven-surefire-plugin to speed up tests (gjoranv, 2021-05-14, 1 file, -0/+7)
|
* Let the supervisor owner set the small buffer option (Jon Marius Venstad, 2021-05-03, 1 file, -0/+1)
|
* One more lazy (Jon Marius Venstad, 2021-04-28, 1 file, -1/+2)
|
* More lazy debug log message generation (Jon Marius Venstad, 2021-04-28, 14 files, -133/+123)
|
* Reapply "add more logging" (new and updated slobrok logging)Arne Juul2021-04-212-0/+2
| | | | This reverts commit 9aa3d6fe6567e3eee9108d6fffbc50d5874e72e3.
* Revert "add more logging"Harald Musum2021-04-202-2/+0
|
* track API change in mockArne Juul2021-04-192-0/+2
|
* Improve test namesHåkon Hallingstad2021-04-161-4/+4
|
* Disallow >1 group to suspend (Håkon Hallingstad, 2021-04-16, 5 files, -27/+277)
|   If there is more than one group, disallow suspending a node if there is a node in another group that has a user wanted state != UP. If there is 1 group, disallow suspending more than 1 node.
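A minimal sketch of that rule, assuming it can be expressed as a pure check over each node's group and user wanted state; the names below (GroupSuspensionPolicy, Node) are illustrative, not the actual cluster controller classes.

```java
// Illustrative sketch of the suspension rule described in the commit message above;
// hypothetical types, not the actual Vespa implementation.
import java.util.List;

class GroupSuspensionPolicy {
    record Node(int index, String group, boolean userWantedStateIsUp) {}

    /** Returns whether 'candidate' may be suspended, given the wanted states of all nodes. */
    static boolean maySuspend(Node candidate, List<Node> allNodes) {
        long groupCount = allNodes.stream().map(Node::group).distinct().count();
        if (groupCount > 1) {
            // More than one group: disallow if any node in another group already has
            // a user wanted state other than UP.
            return allNodes.stream()
                           .filter(node -> !node.group().equals(candidate.group()))
                           .allMatch(Node::userWantedStateIsUp);
        }
        // A single group: disallow suspending more than one node.
        return allNodes.stream()
                       .filter(node -> node.index() != candidate.index())
                       .allMatch(Node::userWantedStateIsUp);
    }
}
```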
* No longer allow suspension if in maintenance (Håkon Hallingstad, 2021-04-15, 3 files, -17/+14)
|   If a storage node falls out of Slobrok, it will change from UP to Maintenance after 60s, then after a further 30s go to Down. Avoid allowing suspension in the 30s grace period just because the node is in Maintenance mode.
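A sketch of the timing described above, using the stated 60s and 30s thresholds; the names are hypothetical, and exactly how the real suspension check consumes this state is an assumption here. The point is only that Maintenance entered via the Slobrok grace period should not, by itself, be treated as a state that permits suspension.

```java
// Hypothetical illustration of the Slobrok grace-period timeline described above:
// a node missing from Slobrok is reported UP for 60s, then Maintenance for a further
// 30s, then Down. Not the actual Vespa implementation.
import java.time.Duration;
import java.time.Instant;

class SlobrokGracePeriodExample {
    static final Duration UP_TO_MAINTENANCE = Duration.ofSeconds(60);
    static final Duration MAINTENANCE_TO_DOWN = Duration.ofSeconds(30);

    enum ReportedState { UP, MAINTENANCE, DOWN }

    /** Reported state for a node last seen in Slobrok at 'lastSeen'. */
    static ReportedState reportedState(Instant lastSeen, Instant now) {
        Duration missing = Duration.between(lastSeen, now);
        if (missing.compareTo(UP_TO_MAINTENANCE) < 0) return ReportedState.UP;
        if (missing.compareTo(UP_TO_MAINTENANCE.plus(MAINTENANCE_TO_DOWN)) < 0) return ReportedState.MAINTENANCE;
        return ReportedState.DOWN;
    }

    /** Assumption: the transient Maintenance window does not by itself permit suspension. */
    static boolean permitsSuspension(ReportedState state) {
        return state == ReportedState.UP;
    }
}
```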
* Merge branch 'master' into hmusum/cleanup-7 (Harald Musum, 2021-04-08, 6 files, -22/+32)
|\
| * Add remote task queue size metric in cluster controller (Håkon Hallingstad, 2021-04-01, 6 files, -22/+32)
| |
* | Cleanup tests a bit (Harald Musum, 2021-04-08, 3 files, -43/+49)
| |
* | Fix typo in class name (Harald Musum, 2021-04-08, 1 file, -1/+1)
|/
* Log when transitioning out of CC moratorium (Håkon Hallingstad, 2021-03-26, 1 file, -6/+3)
|
* Make default deadline to first broadcast 30s (Håkon Hallingstad, 2021-03-24, 3 files, -3/+5)
|
* Revert "Revert "Avoid safe mutations in master moratorium and increase first cluster state broadcast deadline [run-systemtest]"" (Håkon Hallingstad, 2021-03-24, 13 files, -17/+71)
|
* Revert "Avoid safe mutations in master moratorium and increase first cluster state broadcast deadline [run-systemtest]" (Håkon Hallingstad, 2021-03-24, 13 files, -71/+17)
|
* Merge pull request #17085 from vespa-engine/hakonhall/increase-the-minimum-time-before-first-cluster-state-broadcast-run-systemtest (Håkon Hallingstad, 2021-03-24, 13 files, -17/+71)
|\    Avoid safe mutations in master moratorium and increase first cluster state broadcast deadline [run-systemtest]
| * Avoid safe-set-node-state in master moratorium (Håkon Hallingstad, 2021-03-24, 12 files, -16/+68)
| |
| * Increase the minimum time before first cluster state broadcast [run-systemtest] (Håkon Hallingstad, 2021-03-19, 1 file, -1/+3)
| |
* | Revert deferred ZK connectivity for now (Tor Brede Vekterli, 2021-03-22, 3 files, -21/+2)
| |   Instead, we'll want to create a more generalized solution that considers all sources of node information (Slobrok _and_ explicit health check RPCs) before potentially publishing a state or processing tasks.
* | Make sure to reset any election shortcuts if we go from !ZK -> ZK (Tor Brede Vekterli, 2021-03-19, 1 file, -5/+13)
| |
* | Use local leader state for decisions rather than election handler (Tor Brede Vekterli, 2021-03-19, 1 file, -5/+7)
| |   Avoids potentially publishing cluster states _before_ we have triggered our own leadership election edge handling code. Could happen if code called prior to the election edge logic checked the election handler state and erroneously thought we had performed the prerequisite actions we're supposed to do when assuming leadership (such as reading back current state from ZK).
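A hedged sketch of the pattern that commit message describes, with illustrative names: decision logic consults a local leadership flag that is only flipped after the election-edge handling (such as reading current state back from ZooKeeper) has run, rather than asking the election handler directly, since the handler may report leadership before those prerequisite actions have happened.

```java
// Illustrative sketch, not the actual FleetController code: expose leadership to
// decision logic only via a local flag that is updated on the election edge, after
// the prerequisite actions for assuming leadership have completed.
class LocalLeaderStateExample {
    interface ElectionHandler { boolean isCurrentlyElectedMaster(); }

    private final ElectionHandler electionHandler;
    private boolean isMaster = false; // local state, updated only in tick()

    LocalLeaderStateExample(ElectionHandler electionHandler) {
        this.electionHandler = electionHandler;
    }

    /** Called from the main tick loop; runs edge handling before exposing leadership. */
    void tick() {
        boolean elected = electionHandler.isCurrentlyElectedMaster();
        if (elected && !isMaster) {
            onBecomingMaster(); // e.g. read back current cluster state from ZooKeeper
            isMaster = true;    // only now do decisions see this node as leader
        } else if (!elected && isMaster) {
            isMaster = false;
        }
    }

    /** Decision code checks the local flag, not the election handler directly. */
    boolean mayPublishClusterState() {
        return isMaster;
    }

    private void onBecomingMaster() { /* prerequisite actions on gaining leadership */ }
}
```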
* | Don't allow short-circuiting election phase if only one node configured if using ZK (Tor Brede Vekterli, 2021-03-19, 2 files, -2/+10)
| |
* | Inhibit ZooKeeper connections until our local Slobrok mirror is ready. (Tor Brede Vekterli, 2021-03-19, 6 files, -2/+41)
|/    Otherwise, if there are transient Slobrok issues during CC startup and we end up winning the election, we risk publishing a cluster state where the entire cluster appears down (since we do not have any knowledge of Slobrok node mapping state). This will adversely affect availability for all the obvious reasons.
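A minimal sketch of the gating that commit describes, under the assumption that it can be modelled as a check at the top of the controller's tick loop; the interface and method names are illustrative, not the actual Vespa APIs.

```java
// Illustrative sketch, not the actual implementation: defer the ZooKeeper connection
// (and hence participation in master election) until the local Slobrok mirror is ready,
// so a freshly started controller cannot win the election and broadcast a cluster state
// in which every node appears down simply because it has no Slobrok data yet.
class StartupGatingExample {
    interface SlobrokMirror { boolean ready(); }
    interface ZooKeeperSession { void connectIfNeeded(); }

    private final SlobrokMirror slobrokMirror;
    private final ZooKeeperSession zkSession;

    StartupGatingExample(SlobrokMirror slobrokMirror, ZooKeeperSession zkSession) {
        this.slobrokMirror = slobrokMirror;
        this.zkSession = zkSession;
    }

    /** One iteration of the controller's main loop. */
    void tick() {
        if (!slobrokMirror.ready()) {
            return; // no node knowledge yet: stay out of the election entirely
        }
        zkSession.connectIfNeeded(); // safe to join the election now
        // ... normal cluster state generation and broadcasting would follow here ...
    }
}
```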