path: root/clustercontroller-core
Commit message (author, date, files, lines: -removed/+added)
...
* Split parent + container-dependency-versions from root pom. (gjoranv, 2017-11-29, files: 1, lines: -0/+1)
|   - Add missing dependencies so that all provided non-yahoo jars are listed in container-dependency-versions.
|   - Add relativePath for all child poms of parent.
* Log when a cluster state version is published (Tor Brede Vekterli, 2017-10-30, files: 1, lines: -0/+1)
|   Makes it much easier to reason about which state transitions have been made visible in the cluster, and which ones have just been internal state transitions in the controller.
* Update wanted state on description changes, and fix method names (Håkon Hallingstad, 2017-10-24, files: 1, lines: -13/+17)
|
* Also set the distributor wanted state when safe-setting the storage node state (Håkon Hallingstad, 2017-10-21, files: 4, lines: -12/+247)
|   This is done as part of the SAFE REST API call to set the node state of a storage node to ensure atomicity of the state change, reduce the number of state changes, and minimize the time to complete the state changes.
|   The right way to think about the safe-set is then: In order to safely set a storage node to (e.g.) maintenance, the distributor will also have to be set to down. And so on for the various permutations of state transitions.
* Ignore current wanted state when safely setting state to up (Håkon Hallingstad, 2017-10-20, files: 2, lines: -10/+5)
|
* Merge pull request #3525 from vespa-engine/vekterli/re-enable-synchronous-set-node-state (Tor Brede Vekterli, 2017-10-12, files: 8, lines: -21/+165)
|\
| |   Re-enable synchronous set node state with additional safeguards
| * Add configurable deadline for cluster controller tasks (Tor Brede Vekterli, 2017-09-25, files: 7, lines: -14/+113)
| |   Prevents an unstable cluster from potentially holding up all container request processing threads indefinitely. Deadline errors are translated into HTTP 504 errors to REST API clients.
| * Immediately complete failed remote tasks (Tor Brede Vekterli, 2017-09-25, files: 6, lines: -8/+53)
| |   We check both for master status and task failure, as we otherwise place a potentially dangerous silent dependency on the task always failing itself if the controller is not a master.
* | Config-retired should not override explicit Down or Maintenance states (Tor Brede Vekterli, 2017-10-12, files: 4, lines: -6/+62)
| |   Previously, a config-retired node marked as Down by the Orchestrator would remain as Retired in the cluster state until the node was actually taken down entirely.
* | Avoid busy-looping when distributors fail to ACK state version (Tor Brede Vekterli, 2017-10-11, files: 2, lines: -7/+3)
| |
* | Revert "Revert "Aressem/remove post install script"" (Arnstein Ressem, 2017-09-27, files: 1, lines: -0/+2)
| |
* | Revert "Aressem/remove post install script" (Arnstein Ressem, 2017-09-27, files: 1, lines: -2/+0)
| |
* | Remove global install of files and put this in the modules that owns them. (Arnstein Ressem, 2017-09-25, files: 1, lines: -0/+2)
|/
* Merge branch 'master' into bratseth/nonfunctional-changes-4 (Arne Juul, 2017-09-22, files: 4, lines: -4/+23)
|\
| |   Conflicts: vespajlib/src/main/java/com/yahoo/concurrent/lock/Locks.java
| * Temporarily disable set-node-state version ACK dependency (Tor Brede Vekterli, 2017-09-20, files: 2, lines: -3/+6)
| |   Effectively reverts to legacy behavior while some more thinking is done on how to deal with blocking requests during leader elections and non-converging clusters.
| * Immediately complete remote tasks when not leader (Tor Brede Vekterli, 2017-09-19, files: 2, lines: -1/+17)
| |   Avoids edge case where set-node-state requests sent to followers would have their response delayed indefinitely due to controller not publishing any versions that the task's ACK barrier could be released by.
* | Merge with master (Jon Bratseth, 2017-09-15, files: 50, lines: -57/+117)
|/
* Refactor deferred version task completion to take in version explicitly (Tor Brede Vekterli, 2017-09-12, files: 1, lines: -13/+14)
|
* Change wording for operations without observable side-effects (Tor Brede Vekterli, 2017-09-12, files: 3, lines: -22/+23)
|
* Break node version ACK check out into separately called logic (Tor Brede Vekterli, 2017-09-12, files: 3, lines: -14/+49)
|   Removes dependency on having to invoke broadcastNewState before being able to observe that all distributors are in sync. Invocations of broadcastNewState are gated by a grace period between each time, so unless this is done we get artificial delays before a synchronous task can be considered complete.
* Test multiple scheduled synchronous tasks (Tor Brede Vekterli, 2017-09-11, files: 2, lines: -1/+19)
|
* Move leadership test code into fixture (Tor Brede Vekterli, 2017-09-11, files: 1, lines: -11/+22)
|
* Test automatic task failing on controller leadership loss (Tor Brede Vekterli, 2017-09-11, files: 3, lines: -11/+54)
|
* Add support for version ACK-dependent tasks in cluster controller (Tor Brede Vekterli, 2017-09-11, files: 9, lines: -18/+426)
|   Used to enable synchronous operation for set-node-state calls, which ensure that side-effects of the call are visible when the response returns. If controller leadership is lost before state is published, tasks will be failed back to the client.
* Do not use import x.y.*; (Henning Baldersheim, 2017-09-04, files: 1, lines: -1/+10)
|
* Try to shut down fleetcontroller in a controlled manner without relying on the infamous thread.interrupt. (Henning Baldersheim, 2017-09-04, files: 1, lines: -21/+29)
|
* Update copyright headers (Jon Bratseth, 2017-06-14, files: 158, lines: -156/+158)
|
* Revert "Update copyright headers" (Jon Bratseth, 2017-06-14, files: 158, lines: -158/+156)
|
* Update copyright headers (Jon Bratseth, 2017-06-14, files: 158, lines: -156/+158)
|
* Remove carriage return (Jon Bratseth, 2017-06-14, files: 4, lines: -4/+4)
|
* Revert "Copyright header" (Jon Bratseth, 2017-06-13, files: 158, lines: -163/+160)
|
* Copyright header (Jon Bratseth, 2017-06-13, files: 158, lines: -160/+163)
|
* Ignore test that hangs on mac. (gjoranv, 2017-06-12, files: 1, lines: -0/+1)
|
* Merge pull request #2494 from yahoo/hakon/adds-safe-setting-of-wanted-state-down (hakonhall, 2017-05-23, files: 5, lines: -46/+256)
|\
| |   Safely set storage node to DOWN
| * Extract common check (Håkon Hallingstad, 2017-05-21, files: 2, lines: -23/+31)
| |
| * Dedup test code (Håkon Hallingstad, 2017-05-21, files: 1, lines: -64/+46)
| |
| * Verify version and reported state (Håkon Hallingstad, 2017-05-21, files: 2, lines: -11/+76)
| |
| * Safely set storage node to DOWN (Håkon Hallingstad, 2017-05-18, files: 5, lines: -33/+188)
| |   Setting a storage node to DOWN is considered safe if it can be permanently set down (e.g. removed from the application):
| |   - The node is RETIRED
| |   - There are no managed buckets
* | Don't reset interrupt flag (Tor Brede Vekterli, 2017-05-22, files: 1, lines: -1/+0)
| |
* | Write to ZooKeeper must be timing invariant (Tor Brede Vekterli, 2017-05-22, files: 2, lines: -4/+17)
| |   Previously could risk that state transition grace period would elide write to ZooKeeper if state changes happened within previous grace period.
* | Merge pull request #2506 from yahoo/arnej/remove-extra-gitignore (Jon Bratseth, 2017-05-19, files: 1, lines: -0/+0)
|\ \
| | |   Arnej/remove extra gitignore
| * | remove old unused ignores (Arne Juul, 2017-05-19, files: 1, lines: -0/+0)
| |/
* / Always write new cluster state versions to ZooKeeper (Tor Brede Vekterli, 2017-05-12, files: 6, lines: -52/+95)
|/
|   Previously, the controller would not write the version to ZK unless the version was published to at least one node. This could lead to problems due to un-written version numbers being visible via the controller's REST APIs. External observers could see versions that were not present in ZK and that would not be stable across reelections. As a consequence, invariants for strictly increasing version numbers would be violated from the perspective of these external observers (in particular, our system test framework).
* Log ZooKeeper cluster state version reads and writes with INFO level (Tor Brede Vekterli, 2017-05-09, files: 1, lines: -2/+4)
|
* Improve Spec API (Håkon Hallingstad, 2017-02-22, files: 6, lines: -20/+26)
|   - Removes Spec.getLocalHostName
|   - Removes distinction between listening- and connect- address for Spec
|   - Makes all usage of connect w/Spec specify hostname
* Makes clustercontroller-core work on WiFi (Håkon Hallingstad, 2017-02-20, files: 5, lines: -45/+79)
|
* Use relative URLs in Cluster Controller status page (Håkon Hallingstad, 2017-02-17, files: 4, lines: -19/+17)
|
* Add/improve README's (Jon Bratseth, 2017-01-19, files: 1, lines: -0/+5)
|
* Merge pull request #1301 from yahoo/bratseth/indexed-tensor (Jon Bratseth, 2016-12-13, files: 1, lines: -0/+1)
|\
| |   Bratseth/indexed tensor
| * MapTensor -> MappedTensor (Jon Bratseth, 2016-12-12, files: 1, lines: -0/+1)
| |