path: root/clustercontroller-core
Commit message  [Author, Age, files, lines]
* Revert "Revert "Aressem/remove post install script""  [Arnstein Ressem, 2017-09-27, files: 1, lines: -0/+2]
|
* Revert "Aressem/remove post install script"  [Arnstein Ressem, 2017-09-27, files: 1, lines: -2/+0]
|
* Remove global install of files and put this in the modules that own them.  [Arnstein Ressem, 2017-09-25, files: 1, lines: -0/+2]
|
* Merge branch 'master' into bratseth/nonfunctional-changes-4  [Arne Juul, 2017-09-22, files: 4, lines: -4/+23]
|\  Conflicts: vespajlib/src/main/java/com/yahoo/concurrent/lock/Locks.java
| * Temporarily disable set-node-state version ACK dependency  [Tor Brede Vekterli, 2017-09-20, files: 2, lines: -3/+6]
| |  Effectively reverts to legacy behavior while some more thinking is done on how to deal with blocking requests during leader elections and non-converging clusters.
| * Immediately complete remote tasks when not leader  [Tor Brede Vekterli, 2017-09-19, files: 2, lines: -1/+17]
| |  Avoids edge case where set-node-state requests sent to followers would have their response delayed indefinitely due to controller not publishing any versions that the task's ACK barrier could be released by.
* | Merge with master  [Jon Bratseth, 2017-09-15, files: 50, lines: -57/+117]
|/
* Refactor deferred version task completion to take in version explicitly  [Tor Brede Vekterli, 2017-09-12, files: 1, lines: -13/+14]
|
* Change wording for operations without observable side-effects  [Tor Brede Vekterli, 2017-09-12, files: 3, lines: -22/+23]
|
* Break node version ACK check out into separately called logic  [Tor Brede Vekterli, 2017-09-12, files: 3, lines: -14/+49]
|  Removes dependency on having to invoke broadcastNewState before being able to observe that all distributors are in sync. Invocations of broadcastNewState are gated by a grace period between each time, so unless this is done we get artificial delays before a synchronous task can be considered complete.
* Test multiple scheduled synchronous tasks  [Tor Brede Vekterli, 2017-09-11, files: 2, lines: -1/+19]
|
* Move leadership test code into fixture  [Tor Brede Vekterli, 2017-09-11, files: 1, lines: -11/+22]
|
* Test automatic task failing on controller leadership loss  [Tor Brede Vekterli, 2017-09-11, files: 3, lines: -11/+54]
|
* Add support for version ACK-dependent tasks in cluster controller  [Tor Brede Vekterli, 2017-09-11, files: 9, lines: -18/+426]
|  Used to enable synchronous operation for set-node-state calls, which ensure that side-effects of the call are visible when the response returns. If controller leadership is lost before state is published, tasks will be failed back to the client.
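
The behavior described in the commit above can be illustrated with a minimal Java sketch. It assumes a task records the cluster state version it depends on, completes once that version has been ACKed by all distributors, and is failed if leadership is lost first. Every name here (VersionAckTaskQueue, Task, Outcome, onVersionAckedByAll, onLeadershipLost) is hypothetical and is not taken from the cluster controller code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; none of these names come from the actual code base.
final class VersionAckTaskQueue {
    enum Outcome { PENDING, COMPLETED, FAILED }

    static final class Task {
        // Cluster state version that must be ACKed before the task may complete.
        final int requiredVersion;
        Outcome outcome = Outcome.PENDING;

        Task(int requiredVersion) {
            this.requiredVersion = requiredVersion;
        }
    }

    private final List<Task> pending = new ArrayList<>();

    Task schedule(int requiredVersion) {
        Task task = new Task(requiredVersion);
        pending.add(task);
        return task;
    }

    // Assumed to be called once all distributors have ACKed `ackedVersion`.
    void onVersionAckedByAll(int ackedVersion) {
        for (Iterator<Task> it = pending.iterator(); it.hasNext();) {
            Task task = it.next();
            if (task.requiredVersion <= ackedVersion) {
                task.outcome = Outcome.COMPLETED;
                it.remove();
            }
        }
    }

    // Assumed to be called when this controller stops being leader.
    void onLeadershipLost() {
        for (Task task : pending) {
            task.outcome = Outcome.FAILED;
        }
        pending.clear();
    }
}
```

In this sketch a task scheduled against version 5 completes when version 5 or later is ACKed, while a task waiting for version 7 stays pending and is failed back if leadership is lost in the meantime.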
* Do not use import x.y.*;  [Henning Baldersheim, 2017-09-04, files: 1, lines: -1/+10]
|
* Try to shut down fleetcontroller in a controlled manner without relying on the infamous thread.interrupt.  [Henning Baldersheim, 2017-09-04, files: 1, lines: -21/+29]
|
* Update copyright headers  [Jon Bratseth, 2017-06-14, files: 158, lines: -156/+158]
|
* Revert "Update copyright headers"  [Jon Bratseth, 2017-06-14, files: 158, lines: -158/+156]
|
* Update copyright headers  [Jon Bratseth, 2017-06-14, files: 158, lines: -156/+158]
|
* Remove carriage return  [Jon Bratseth, 2017-06-14, files: 4, lines: -4/+4]
|
* Revert "Copyright header"  [Jon Bratseth, 2017-06-13, files: 158, lines: -163/+160]
|
* Copyright header  [Jon Bratseth, 2017-06-13, files: 158, lines: -160/+163]
|
* Ignore test that hangs on mac.  [gjoranv, 2017-06-12, files: 1, lines: -0/+1]
|
* Merge pull request #2494 from yahoo/hakon/adds-safe-setting-of-wanted-state-down  [hakonhall, 2017-05-23, files: 5, lines: -46/+256]
|\  Safely set storage node to DOWN
| * Extract common check  [Håkon Hallingstad, 2017-05-21, files: 2, lines: -23/+31]
| |
| * Dedup test code  [Håkon Hallingstad, 2017-05-21, files: 1, lines: -64/+46]
| |
| * Verify version and reported state  [Håkon Hallingstad, 2017-05-21, files: 2, lines: -11/+76]
| |
| * Safely set storage node to DOWN  [Håkon Hallingstad, 2017-05-18, files: 5, lines: -33/+188]
| |  Setting a storage node to DOWN is considered safe if it can be permanently set down (e.g. removed from the application):
| |  - The node is RETIRED
| |  - There are no managed buckets
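
The two conditions listed in the commit above amount to a simple predicate. The following Java sketch is illustrative only: the class, enum, and parameter names are assumptions, and the actual check in the repository may consider more inputs (a later commit in the same branch mentions verifying version and reported state).

```java
// Hypothetical sketch of the safety predicate; names and types are assumed.
final class SafeDownCheck {
    enum NodeState { UP, RETIRED, DOWN }

    // Safe only if the node is RETIRED and manages no buckets,
    // as listed in the commit message.
    static boolean canSafelySetDown(NodeState currentState, int managedBucketCount) {
        return currentState == NodeState.RETIRED && managedBucketCount == 0;
    }
}
```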
* | Don't reset interrupt flag  [Tor Brede Vekterli, 2017-05-22, files: 1, lines: -1/+0]
| |
* | Write to ZooKeeper must be timing invariant  [Tor Brede Vekterli, 2017-05-22, files: 2, lines: -4/+17]
| |  Previously could risk that state transition grace period would elide write to ZooKeeper if state changes happened within previous grace period.
* | Merge pull request #2506 from yahoo/arnej/remove-extra-gitignore  [Jon Bratseth, 2017-05-19, files: 1, lines: -0/+0]
|\ \  Arnej/remove extra gitignore
| * | remove old unused ignores  [Arne Juul, 2017-05-19, files: 1, lines: -0/+0]
| |/
* / Always write new cluster state versions to ZooKeeper  [Tor Brede Vekterli, 2017-05-12, files: 6, lines: -52/+95]
|/  Previously, the controller would not write the version to ZK unless the version was published to at least one node. This could lead to problems due to un-written version numbers being visible via the controller's REST APIs. External observers could see versions that were not present in ZK and that would not be stable across reelections. As a consequence, invariants for strictly increasing version numbers would be violated from the perspective of these external observers (in particular, our system test framework).
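
The invariant described in the commit above (a version must be persisted before any observer can see it) can be sketched roughly as follows. This is a simplified illustration under stated assumptions: VersionStore is a hypothetical stand-in for the ZooKeeper write, and VersionPublisher is not a class from the repository.

```java
// Hypothetical sketch; VersionStore stands in for the ZooKeeper write path.
final class VersionPublisher {
    interface VersionStore {
        void write(int version);
    }

    private final VersionStore store;
    private int visibleVersion;

    VersionPublisher(VersionStore store, int initialVersion) {
        this.store = store;
        this.visibleVersion = initialVersion;
    }

    // Persist first, regardless of whether any node will receive the new state,
    // and only then make the version observable to external readers.
    int bumpVersion() {
        int next = visibleVersion + 1;
        store.write(next);
        visibleVersion = next;
        return next;
    }

    // What an external observer (for example a REST client) would see.
    int visibleVersion() {
        return visibleVersion;
    }
}
```

With this ordering, every version an observer can read has already been written, so a newly elected controller that resumes from the stored value cannot hand out a lower version than one already seen.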
* Log ZooKeeper cluster state version reads and writes with INFO level  [Tor Brede Vekterli, 2017-05-09, files: 1, lines: -2/+4]
|
* Improve Spec API  [Håkon Hallingstad, 2017-02-22, files: 6, lines: -20/+26]
|  - Removes Spec.getLocalHostName
|  - Removes distinction between listening- and connect- address for Spec
|  - Makes all usage of connect w/Spec specify hostname
* Makes clustercontroller-core work on WiFi  [Håkon Hallingstad, 2017-02-20, files: 5, lines: -45/+79]
|
* Use relative URLs in Cluster Controller status page  [Håkon Hallingstad, 2017-02-17, files: 4, lines: -19/+17]
|
* Add/improve README's  [Jon Bratseth, 2017-01-19, files: 1, lines: -0/+5]
|
* Merge pull request #1301 from yahoo/bratseth/indexed-tensor  [Jon Bratseth, 2016-12-13, files: 1, lines: -0/+1]
|\  Bratseth/indexed tensor
| * MapTensor -> MappedTensor  [Jon Bratseth, 2016-12-12, files: 1, lines: -0/+1]
| |
* | Use latest candidate cluster state when comparing against reported node states  [Tor Brede Vekterli, 2016-12-09, files: 3, lines: -1/+52]
|/  Using just the versioned cluster state instead can cause the code to erroneously believe that it is seeing repeated reported state changes for the first time. This happens when the diffs in the reported node states are not in and by themselves enough to trigger a new cluster state version containing the changes. This can in turn spam the logs and event buffers until a new cluster state has been versioned.
* Reduce disconnect errors to warning as they are likely during shutdown.  [Henning Baldersheim, 2016-10-12, files: 1, lines: -3/+2]
|
* Rewrite and refactor core cluster controller state generation logic  [Tor Brede Vekterli, 2016-10-05, files: 38, lines: -1378/+3626]
|  Cluster controller will now generate the new cluster state on-demand in a "pure functional" way instead of conditionally patching a working state over time. This makes understanding (and changing) the state generation logic vastly easier than it previously was.
* Yahoo sets up mac wireless networks such that the local hostname points to an  [Jon Bratseth, 2016-09-29, files: 1, lines: -2/+1]
|  ip which does not resolve. This works around that problem by finding a resolvable address (while still falling back to localhost if we only get ipv6 addresses, as that causes other problems in docker containers).
* Need to figure out what to do with the tests using DockerOperations  [Håkon Hallingstad, 2016-09-01, files: 1, lines: -0/+2]
|
* Less verbose/duplicate logging per fetched child node from ZooKeeper  [Tor Brede Vekterli, 2016-07-04, files: 1, lines: -2/+2]
|
* Always request data for all znodes on master election dir watch callback  [Tor Brede Vekterli, 2016-07-01, files: 1, lines: -24/+9]
|  The previous version of the code attempted to optimize by only requesting node data for nodes that had changed, but there existed an edge case where it would mistakenly fail to request new data for nodes that _had_ changed. This could happen if the callback was invoked when nextMasterData already contained entries for the same set of node indices returned as part of the directory callback. Always clearing our internal state and requesting all znodes is a more robust option. The number of cluster controllers should always be so low that the expected added overhead is negligible.
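
The "clear everything and re-request" approach from the commit above can be sketched as below. This is a simplified, synchronous illustration: only the field name nextMasterData comes from the commit message, while the class name, the value type, and the fetchNodeData callback (a stand-in for reading a znode from ZooKeeper) are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical sketch; not the actual cluster controller classes.
final class MasterElectionWatcher {
    // Field name taken from the commit message; the value type is an assumption.
    private final Map<Integer, String> nextMasterData = new HashMap<>();
    // Stand-in for fetching a znode's data from ZooKeeper.
    private final IntFunction<String> fetchNodeData;

    MasterElectionWatcher(IntFunction<String> fetchNodeData) {
        this.fetchNodeData = fetchNodeData;
    }

    // Directory watch callback: drop all cached entries and re-request every child,
    // so a changed znode is never skipped because its index was already present.
    void onDirectoryChanged(List<Integer> nodeIndices) {
        nextMasterData.clear();
        for (int index : nodeIndices) {
            nextMasterData.put(index, fetchNodeData.apply(index));
        }
    }

    String dataFor(int index) {
        return nextMasterData.get(index);
    }
}
```

The cost is one fetch per child on every callback, which the commit message argues is negligible because the number of cluster controllers is expected to stay small.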
* Merge pull request #56 from yahoo/vekterli/configurable-group-auto-takedown  [Tor Brede Vekterli, 2016-06-27, files: 16, lines: -176/+1410]
|\  Add configurable automatic group up/down feature based on node availability
| * Clarify predicate on isRpcAddressOutdated() for clearing node state  [Tor Brede Vekterli, 2016-06-22, files: 1, lines: -4/+14]
| |  Logic is unchanged, but added comment with rationale and cross-reference to other method that we're trying to be symmetrical with in terms of state transition behavior.
| * Don't reintroduce already observed timestamps in cluster state  [Tor Brede Vekterli, 2016-06-17, files: 2, lines: -9/+60]
| |  Also address code review comments.