path: root/orchestrator
Commit message (Author, Date; Files changed, Lines -removed/+added)
* Allow one node for both branches (Harald Musum, 2021-02-08; 1 file, -5/+2)
* Allow only one node down at a time for cluster controller clusters (Harald Musum, 2021-02-08; 1 file, -1/+1)
  If we are replacing nodes and have 3 old and 3 new nodes in a cluster, allowing 50 per cent down would allow 3 nodes to go down. When deploying to remove the 3 old nodes there might be (for a short time) config that says there should be 6 nodes in the ZooKeeper cluster, which means 4 of them need to be up for the cluster to have a quorum. That will not work in this case.
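  A minimal sketch of the quorum arithmetic behind this change (illustrative only, not code from this commit):

      // Quorum arithmetic during a 3-old/3-new node replacement (illustrative sketch only).
      public class QuorumMath {
          // ZooKeeper needs a strict majority of the configured ensemble to be up.
          static int quorumSize(int ensembleSize) { return ensembleSize / 2 + 1; }

          // Maximum number of nodes that may be down while a quorum remains.
          static int maxDownForQuorum(int ensembleSize) { return ensembleSize - quorumSize(ensembleSize); }

          public static void main(String[] args) {
              int ensemble = 6;                          // 3 old + 3 new nodes, briefly configured together
              int allowedDownAt50Percent = ensemble / 2; // a 50% policy would permit 3 down
              System.out.println("Quorum needs " + quorumSize(ensemble) + " of " + ensemble + " nodes up");
              System.out.println("A 50% policy allows " + allowedDownAt50Percent
                      + " down, but at most " + maxDownForQuorum(ensemble) + " may be down");
          }
      }

  With 6 configured nodes the quorum is 4, so at most 2 may be down, which is why the policy is tightened to one node at a time.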
* Actually test new codepath (Håkon Hallingstad, 2021-01-25; 1 file, -2/+3)
* Support delegating content node suspension to cluster controller (Håkon Hallingstad, 2021-01-22; 9 files, -77/+198)
  This PR introduces a new flag, group-suspension, which, if true, enables the following:
  - Instead of allowing at most one storagenode to suspend at any given time, the orchestrator now ignores the storagenode, searchnode, and distributor service clusters and relies on the cluster controller to allow or deny the request to suspend. This will increase the load on the cluster controllers. Combined with earlier changes to the cluster controller, this new flag effectively guards the feature of allowing all nodes within a hierarchical group to suspend concurrently.
  I also took the opportunity to tune related policies (see the sketch below):
  - Allow at most one config server and controller to be down at any given time. This is actually a no-op, since it was effectively equal to the older policy of 10% down.
  - Allow 20% of all host-admins to be down, not just tenant host-admins. This is effectively equal to the old policy of 10%, except that it may allow 2 proxy host-admins to go down at the same time. Should be fine.
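  A hypothetical sketch of how a percentage-based concurrent-suspension limit like the ones above might be evaluated; the class and method names are assumptions, not the orchestrator's actual policy API:

      // Hypothetical percentage-based suspension limit; class and method names are illustrative only.
      import java.util.Set;

      class ConcurrentSuspensionLimit {
          private final double maxFractionDown;  // e.g. 0.2 for the host-admin clusters

          ConcurrentSuspensionLimit(double maxFractionDown) { this.maxFractionDown = maxFractionDown; }

          /** Returns true if one more service instance may suspend without exceeding the limit. */
          boolean mayAllowOneMoreDown(Set<String> clusterMembers, Set<String> alreadyDownOrSuspending) {
              // Assumed behavior: always allow at least one down, so small clusters are not blocked entirely.
              int limit = Math.max(1, (int) Math.floor(clusterMembers.size() * maxFractionDown));
              return alreadyDownOrSuspending.size() + 1 <= limit;
          }
      }

  For example, with 10 nodes in a cluster a 20% limit allows 2 down at once, while a 10% limit allows only 1.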
* Always use permanently down status (Håkon Hallingstad, 2021-01-18; 4 files, -39/+14)
* Remove debug code (Jon Marius Venstad, 2021-01-11; 1 file, -2/+0)
* Add suspension mojo (Jon Marius Venstad, 2021-01-11; 2 files, -0/+3)
* Log host status removals (Håkon Hallingstad, 2021-01-06; 2 files, -1/+4)
* Revert "Revert "Bjorncs/config convergence checker preps"" (Bjørn Christian Seime, 2020-11-25; 2 files, -0/+2)
* Revert "Bjorncs/config convergence checker preps" (Arnstein Ressem, 2020-11-25; 2 files, -2/+0)
* Deprecate VespaClientBuilderFactory + VespaJerseyJaxRsClientFactory (Bjørn Christian Seime, 2020-11-24; 2 files, -0/+2)
* Revert "Bjorncs/rewrite config convergence checker client" (Jon Marius Venstad, 2020-11-10; 2 files, -2/+0)
* Deprecate VespaClientBuilderFactory + VespaJerseyJaxRsClientFactory (Bjørn Christian Seime, 2020-11-09; 2 files, -0/+2)
* Revert "Bjorncs/rewrite config convergence checker client" (Harald Musum, 2020-11-09; 2 files, -2/+0)
* Deprecate VespaClientBuilderFactory + VespaJerseyJaxRsClientFactory (Bjørn Christian Seime, 2020-11-09; 2 files, -0/+2)
* Revert "Bjorncs/rewrite config convergence checker client" (Jon Marius Venstad, 2020-11-07; 2 files, -2/+0)
* Deprecate VespaClientBuilderFactory + VespaJerseyJaxRsClientFactory (Bjørn Christian Seime, 2020-11-06; 2 files, -0/+2)
* Remove locating code (Håkon Hallingstad, 2020-10-20; 1 file, -0/+3)
* Close orchestrator locks (Håkon Hallingstad, 2020-10-20; 3 files, -60/+99)
* Remove hack to listen to ephemeral port (Bjørn Christian Seime, 2020-10-16; 1 file, -15/+1)
* Move lock metrics to MetricsReporter (Håkon Hallingstad, 2020-10-03; 1 file, -1/+1)
  Adds two new metrics (sketched below):
  - The load of acquiring each lock path: the average number of threads waiting to acquire the lock within the last minute (or unit of time), a.k.a. the lock queue depth.
  - The load of the lock for each lock path: the average number of threads holding the lock within the last minute (or unit of time), a.k.a. the lock utilization. This is always <= 1.
  Changes LockCounters to LockMetrics and exports them once every minute through MetricsReporter, which is designed for this.
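  A sketch, under assumed names, of how such a time-averaged "load" metric can be computed: the integral of the current count over the reporting interval, divided by the interval length. This is not the actual LockMetrics implementation.

      // Time-averaged count of threads in a state (waiting for, or holding, a lock). Illustrative only.
      class LoadMetric {
          private int current = 0;                 // threads currently in the state
          private long intervalStartNanos = System.nanoTime();
          private long lastChangeNanos = intervalStartNanos;
          private double weightedSum = 0;          // integral of 'current' over time (count * nanos)

          synchronized void enter() { accumulate(); current++; }  // started waiting, or acquired the lock
          synchronized void leave() { accumulate(); current--; }  // stopped waiting, or released the lock

          /** Average count since the previous report; invoked e.g. once a minute by a metrics reporter. */
          synchronized double reportAndReset() {
              accumulate();
              double average = weightedSum / Math.max(1L, lastChangeNanos - intervalStartNanos);
              intervalStartNanos = lastChangeNanos;
              weightedSum = 0;
              return average;
          }

          private void accumulate() {
              long now = System.nanoTime();
              weightedSum += current * (double) (now - lastChangeNanos);
              lastChangeNanos = now;
          }
      }

  For the holding-the-lock variant the count is 0 or 1, so the reported average is always <= 1, as noted above.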
* Add metrics to lock attempts (Håkon Hallingstad, 2020-10-01; 2 files, -21/+5)
* 30s down-moratorium before allowing suspension (Håkon Hallingstad, 2020-09-18; 17 files, -75/+310)
* Orchestrator should assume 3 controllers (Håkon Hallingstad, 2020-06-22; 1 file, -1/+1)
* Ignore missing children from optimistic read of suspended hosts (Håkon Hallingstad, 2020-05-15; 1 file, -13/+26)
  Also: Remove the test for the existence of the path, which would normally come back negative, and instead catch NoNodeException on the subsequent read if the path does not exist, at the expense of an exception being thrown and caught. This should be cheaper than actually hitting ZK.
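  The pattern reads roughly as below when written against the Apache Curator API directly; the orchestrator goes through its own ZooKeeper/Curator wrappers, so this is an illustration under assumed names rather than the commit's actual code.

      // Read-and-catch instead of an exists() pre-check round trip. Illustrative sketch.
      import java.util.List;
      import org.apache.curator.framework.CuratorFramework;
      import org.apache.zookeeper.KeeperException;

      class SuspendedHostsReader {
          private final CuratorFramework curator;

          SuspendedHostsReader(CuratorFramework curator) { this.curator = curator; }

          /** Reads the children of the given path, treating a missing path as "no suspended hosts". */
          List<String> readSuspendedHosts(String path) throws Exception {
              try {
                  return curator.getChildren().forPath(path);
              } catch (KeeperException.NoNodeException e) {
                  return List.of();
              }
          }
      }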
* Replace remaining LogLevel.<level> with corresponding Level (gjoranv, 2020-04-25; 1 file, -1/+1)
* LogLevel.ERROR -> Level.SEVERE (gjoranv, 2020-04-25; 2 files, -3/+3)
* LogLevel.WARNING -> Level.WARNING (gjoranv, 2020-04-25; 2 files, -2/+2)
* LogLevel.INFO -> Level.INFO (gjoranv, 2020-04-25; 1 file, -6/+6)
* LogLevel.DEBUG -> Level.FINE (gjoranv, 2020-04-25; 5 files, -19/+19)
* Import java.util.logging.Level instead of com.yahoo.log.LogLevel (gjoranv, 2020-04-25; 9 files, -9/+9)
* Revert "Reduce host admin suspension concurrency from 20% to 10%" (Håkon Hallingstad, 2020-04-08; 2 files, -2/+2)
* Reduce logging in service-monitor and orchestrator (Håkon Hallingstad, 2020-03-23; 4 files, -17/+0)
* Merge pull request #12520 from vespa-engine/hakonhall/always-do-status-service-cleanup (Valerij Fredriksen, 2020-03-21; 6 files, -51/+26)
  Always do status service cleanup
  * Always do status service cleanup (Håkon Hallingstad, 2020-03-09; 6 files, -51/+26) [merged branch]
* Silence orchestrator (Håkon Hallingstad, 2020-03-16; 1 file, -11/+11)
* Reduce host admin suspension concurrency from 20% to 10% (Håkon Hallingstad, 2020-03-13; 2 files, -2/+2)
* Avoid building lots of ApplicationInstances (Håkon Hallingstad, 2020-03-08; 3 files, -18/+23)
  Avoid building a full ApplicationInstance for each node...
  - for all nodes in the node repo when reporting metrics every minute, and
  - for all nodes in any /nodes/v1/node response
* Throw if accessing duper model while holding status service application lock (Håkon Hallingstad, 2020-03-07; 8 files, -21/+113)
  When the duper model is updated, ZooKeeper is atomically updated to e.g. remove extraneous hosts. This is done by acquiring the duper model lock first, then the relevant application lock in the status service. Acquiring these two locks in the reverse order may lead to a deadlock. This PR therefore throws an IllegalStateException when it detects that the current thread is about to acquire the duper model lock while already holding the application lock.
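  A hypothetical sketch of such a lock-ordering guard: a thread-local flag records that the application lock is held, and any attempt to take the duper model lock while the flag is set fails fast. Class and method names are illustrative, not the orchestrator's actual classes.

      // Enforces the legal order: duper model lock first, then application lock. Illustrative only.
      import java.util.concurrent.locks.ReentrantLock;

      class LockOrderGuard {
          private static final ThreadLocal<Boolean> holdsApplicationLock =
                  ThreadLocal.withInitial(() -> false);

          private final ReentrantLock duperModelLock = new ReentrantLock();
          private final ReentrantLock applicationLock = new ReentrantLock();

          AutoCloseable lockApplication() {
              applicationLock.lock();
              holdsApplicationLock.set(true);
              return () -> { holdsApplicationLock.set(false); applicationLock.unlock(); };
          }

          AutoCloseable lockDuperModel() {
              if (holdsApplicationLock.get())
                  throw new IllegalStateException(
                          "Cannot acquire the duper model lock while holding the application lock");
              duperModelLock.lock();
              return duperModelLock::unlock;
          }
      }

  Failing fast with IllegalStateException turns a potential deadlock into an immediately visible bug.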
* Remove service model cache (Håkon Hallingstad, 2020-03-06; 2 files, -6/+2)
* Support cleanup of status service (Håkon Hallingstad, 2020-03-05; 14 files, -60/+313)
* Remove InstanceLookupService (Håkon Hallingstad, 2020-03-03; 14 files, -191/+122)
  The lower-level methods on ServiceMonitor have removed the need for InstanceLookupService.
* Rename MutableStatusService to ApplicationLock (Håkon Hallingstad, 2020-03-02; 15 files, -158/+136)
  The result of acquiring the application lock in the status service is now named ApplicationLock instead of MutableStatusService.
* Align names with status service (Håkon Hallingstad, 2020-03-02; 17 files, -208/+221)
* Merge pull request #12390 from vespa-engine/hakonhall/extract-HostInfo-ops (Håkon Hallingstad, 2020-03-02; 2 files, -94/+129)
  Extract low-level ZK HostInfo ops to HostInfosServiceImpl
  * Update orchestrator/src/main/java/com/yahoo/vespa/orchestrator/status/HostInfosServiceImpl.java (Håkon Hallingstad, 2020-03-02; 1 file, -2/+2) [merged branch]
    Co-Authored-By: Valerij Fredriksen <freva@users.noreply.github.com>
  * Extract low-level ZK HostInfo ops to HostInfosServiceImpl (Håkon Hallingstad, 2020-02-27; 2 files, -94/+129) [merged branch]
* Only build part of application instance for host resource (Håkon Hallingstad, 2020-02-28; 5 files, -20/+16)
* Moved to more specific methods on ServiceMonitor (Håkon Hallingstad, 2020-02-28; 8 files, -13/+39)
* Improve suspension denied reason (Håkon Hallingstad, 2020-02-24; 6 files, -59/+64)