| Commit message | Author | Age | Files | Lines |
Makes it much easier to reason about which state transitions have been
made visible in the cluster, and which ones have just been internal
state transitions in the controller.
This is done as part of the SAFE REST API call to set the node state of a
storage node to ensure atomicity of the state change, reduce the number of
state changes, and minimize the time to complete the state changes.
The right way to think about the safe-set is then: In order to safely set a
storage node to (e.g.) maintenance, the distributor will also have to be set to
down. And so on for the various permutations of state transitions.
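The pairing rule described above can be sketched as a tiny pure helper. All names here are hypothetical illustrations, not Vespa's actual API; the real logic lives in the cluster controller's set-node-state handling.

```java
// Sketch of the safe-set pairing rule: to safely put a storage node into
// MAINTENANCE (or DOWN), the co-located distributor must be taken DOWN in
// the same atomic state change. Hypothetical names, not Vespa's actual API.
public class SafeSetSketch {
    enum State { UP, DOWN, MAINTENANCE }

    // Returns the distributor state implied by the wanted storage node state.
    static State impliedDistributorState(State wantedStorageState) {
        switch (wantedStorageState) {
            case MAINTENANCE:
            case DOWN:
                return State.DOWN; // distributor must go down alongside storage
            default:
                return State.UP;
        }
    }

    public static void main(String[] args) {
        System.out.println(impliedDistributorState(State.MAINTENANCE)); // DOWN
    }
}
```

Computing both wanted states up front is what lets the controller apply them as a single state change rather than two observable transitions.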
vespa-engine/vekterli/re-enable-synchronous-set-node-state
Re-enable synchronous set node state with additional safeguards
Prevents an unstable cluster from potentially holding up all
container request processing threads indefinitely.
Deadline errors are translated into HTTP 504 errors to REST API clients.
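A minimal sketch of the deadline translation described above, using hypothetical names rather than the actual container/REST API types:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: bound the wait for a state change to become visible and translate
// a deadline overrun into HTTP 504 (Gateway Timeout), so an unstable cluster
// cannot pin a request thread forever. Hypothetical names.
public class DeadlineSketch {
    static int statusFor(CompletableFuture<Void> stateChangeVisible,
                         long timeoutMillis) {
        try {
            stateChangeVisible.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return 200; // change became visible within the deadline
        } catch (TimeoutException e) {
            return 504; // deadline exceeded: report Gateway Timeout
        } catch (Exception e) {
            return 500; // interrupted or failed for another reason
        }
    }

    public static void main(String[] args) {
        // A future that never completes stands in for a non-converging cluster.
        System.out.println(statusFor(new CompletableFuture<>(), 10)); // 504
    }
}
```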
We check both for master status and task failure, as we otherwise place
a potentially dangerous silent dependency on the task always failing itself
if the controller is not a master.
Previously, a config-retired node marked as Down by the Orchestrator
would remain as Retired in the cluster state until the node was
actually taken down entirely.
Conflicts:
vespajlib/src/main/java/com/yahoo/concurrent/lock/Locks.java
Effectively reverts to legacy behavior while some more thinking is
done on how to deal with blocking requests during leader elections
and non-converging clusters.
Avoids an edge case where set-node-state requests sent to followers would
have their responses delayed indefinitely because the controller did not
publish any version that could release the task's ACK barrier.
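The ACK barrier mentioned above amounts to a simple version comparison; this sketch uses hypothetical names:

```java
// Sketch of a version "ACK barrier": a deferred task is releasable once the
// cluster state version it was enqueued against has been published. If the
// controller never publishes a new version, the barrier never releases --
// the edge case the fix addresses. Hypothetical names.
public class VersionBarrierSketch {
    static boolean barrierReleased(int lastPublishedVersion, int taskBarrierVersion) {
        return lastPublishedVersion >= taskBarrierVersion;
    }

    public static void main(String[] args) {
        System.out.println(barrierReleased(41, 42)); // false: response stays pending
        System.out.println(barrierReleased(42, 42)); // true: response can be sent
    }
}
```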
Removes the dependency on invoking broadcastNewState before being
able to observe that all distributors are in sync. Invocations of
broadcastNewState are gated by a grace period between calls, so
without this change there are artificial delays before a synchronous
task can be considered complete.
Used to enable synchronous operation for set-node-state calls, which
ensures that side effects of the call are visible when the response
returns.
If controller leadership is lost before the state is published, tasks
will be failed back to the client.
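The fail-on-leadership-loss behavior can be sketched roughly as follows; the class and method names are hypothetical, not the actual task queue:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Sketch: pending synchronous set-node-state tasks are completed normally
// once the state change is published, or completed exceptionally if the
// controller loses leadership first. Hypothetical names.
public class PendingTaskSketch {
    private final List<CompletableFuture<Void>> pending = new ArrayList<>();

    CompletableFuture<Void> enqueue() {
        CompletableFuture<Void> f = new CompletableFuture<>();
        pending.add(f);
        return f;
    }

    void onStatePublished() {
        pending.forEach(f -> f.complete(null)); // side effects now visible
        pending.clear();
    }

    void onLeadershipLost() {
        pending.forEach(f -> f.completeExceptionally(
                new IllegalStateException("leadership lost before state was published")));
        pending.clear();
    }

    public static void main(String[] args) {
        PendingTaskSketch q = new PendingTaskSketch();
        CompletableFuture<Void> f = q.enqueue();
        q.onLeadershipLost();
        System.out.println(f.isCompletedExceptionally()); // true
    }
}
```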
the infamous thread.interrupt.
Safely set storage node to DOWN
Setting a storage node to DOWN is considered safe if it can be permanently
taken down (e.g. removed from the application), i.e. if:
- The node is RETIRED
- There are no managed buckets
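The two conditions above combine into a single predicate; this sketch uses hypothetical names:

```java
// Sketch of the "safe to set DOWN" check: a storage node may be permanently
// taken down only if it is retired AND no longer manages any buckets.
// Hypothetical names, not the actual cluster controller code.
public class SafeDownSketch {
    static boolean safeToSetDown(boolean retired, long managedBuckets) {
        return retired && managedBuckets == 0;
    }

    public static void main(String[] args) {
        System.out.println(safeToSetDown(true, 0));  // true: nothing left to drain
        System.out.println(safeToSetDown(true, 17)); // false: still owns buckets
    }
}
```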
Previously, the state transition grace period could cause the write to
ZooKeeper to be elided if state changes happened within the previous
grace period.
Arnej/remove extra gitignore
Previously, the controller would not write the version to ZooKeeper unless
the version had been published to at least one node. This could lead to
problems, since unwritten version numbers were visible via the controller's
REST APIs. External observers could see versions that were not present in
ZooKeeper and that would not be stable across re-elections. As a consequence,
the invariant of strictly increasing version numbers would be violated from
the perspective of these external observers (in particular, our system test
framework).
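The invariant being protected can be expressed as a check over the version sequence an external observer sees; the names below are hypothetical:

```java
// Sketch of the external-observer invariant: cluster state versions seen
// through the controller's REST APIs must be strictly increasing, which is
// only guaranteed if each version is durably recorded before it becomes
// observable. Hypothetical names.
public class VersionInvariantSketch {
    static boolean strictlyIncreasing(int[] observedVersions) {
        for (int i = 1; i < observedVersions.length; i++) {
            if (observedVersions[i] <= observedVersions[i - 1]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(strictlyIncreasing(new int[] {4, 5, 6})); // true
        // A version observed before a re-election but never written to
        // ZooKeeper may reappear lower afterwards, violating the invariant:
        System.out.println(strictlyIncreasing(new int[] {4, 6, 5})); // false
    }
}
```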
- Removes Spec.getLocalHostName
- Removes the distinction between listening and connect addresses for Spec
- Makes all usage of connect with Spec specify a hostname
Bratseth/indexed tensor
Using just the versioned cluster state instead can cause the code to
erroneously believe that it is seeing repeated reported state changes
for the first time. This happens when the diffs in the reported node
states are not in and of themselves enough to trigger a new cluster state
version containing the changes.
This can in turn spam the logs and event buffers until a new cluster state
has been versioned.
Cluster controller will now generate the new cluster state on-demand in a "pure functional" way instead of conditionally patching a working state over time. This makes understanding (and changing) the state generation logic vastly easier than it previously was.
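A toy illustration of the "pure functional" approach, with hypothetical names and a state string loosely modeled on Vespa's cluster state format:

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of "pure functional" state generation: the next cluster state is
// computed solely from the current inputs, never by patching a long-lived
// mutable working state. Hypothetical names; the state string only loosely
// resembles Vespa's cluster state format.
public class PureGenerationSketch {
    // Derive the cluster state string from reported node states only.
    static String deriveClusterState(SortedMap<Integer, String> reportedStorageStates) {
        StringBuilder sb = new StringBuilder("storage:" + reportedStorageStates.size());
        reportedStorageStates.forEach((index, state) -> {
            if (!state.equals("u")) // only non-UP states are listed explicitly
                sb.append(" .").append(index).append(".s:").append(state);
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        SortedMap<Integer, String> reported = new TreeMap<>();
        reported.put(0, "u");
        reported.put(1, "d");
        System.out.println(deriveClusterState(reported)); // storage:2 .1.s:d
    }
}
```

Because the function has no hidden state, the same inputs always yield the same cluster state, which is what makes the generation logic easy to reason about and test.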
ip which does not resolve. This works around that problem by finding a
resolvable address (while still falling back to localhost if we only get
IPv6 addresses, as that causes other problems in Docker containers).