path: root/orchestrator
Commit message (Author, Age, Files, Lines)
* Orchestrator should assume 3 controllers (Håkon Hallingstad, 2020-06-22, 1, -1/+1)
|
* Ignore missing children from optimistic read of suspended hosts (Håkon Hallingstad, 2020-05-15, 1, -13/+26)
|
|     Also: Remove test for existence of path, which would normally turn up in the negative, and instead catch NoNodeException on the next if path does not exist, at the expense of exception thrown/caught. This should be cheaper than actually hitting ZK.
|
* Replace remaining LogLevel.<level> with corresponding Level (gjoranv, 2020-04-25, 1, -1/+1)
|
* LogLevel.ERROR -> Level.SEVERE (gjoranv, 2020-04-25, 2, -3/+3)
|
* LogLevel.WARNING -> Level.WARNING (gjoranv, 2020-04-25, 2, -2/+2)
|
* LogLevel.INFO -> Level.INFO (gjoranv, 2020-04-25, 1, -6/+6)
|
* LogLevel.DEBUG -> Level.FINE (gjoranv, 2020-04-25, 5, -19/+19)
|
* Import java.util.logging.Level instead of com.yahoo.log.LogLevel (gjoranv, 2020-04-25, 9, -9/+9)
|
* Revert "Reduce host admin suspension concurrency from 20% to 10%" (Håkon Hallingstad, 2020-04-08, 2, -2/+2)
|
* Reduce logging in service-monitor and orchestrator (Håkon Hallingstad, 2020-03-23, 4, -17/+0)
|
* Merge pull request #12520 from vespa-engine/hakonhall/always-do-status-service-cleanup (Valerij Fredriksen, 2020-03-21, 6, -51/+26)
|\
| |     Always do status service cleanup
| |
| * Always do status service cleanup (Håkon Hallingstad, 2020-03-09, 6, -51/+26)
| |
* | Silence orchestrator (Håkon Hallingstad, 2020-03-16, 1, -11/+11)
| |
* | Reduce host admin suspension concurrency from 20% to 10% (Håkon Hallingstad, 2020-03-13, 2, -2/+2)
|/
* Avoid building lots of ApplicationInstances (Håkon Hallingstad, 2020-03-08, 3, -18/+23)
|
|     Avoid building a full ApplicationInstance for each node...
|     - for all nodes in the node repo when reporting metrics repo every minute, and
|     - for all nodes in any /nodes/v1/node response
|
* Throw if accessing duper model while holding status service application lock (Håkon Hallingstad, 2020-03-07, 8, -21/+113)
|
|     When the duper model is updated, ZooKeeper is atomically updated to e.g. remove extraneous hosts. This is done by acquiring the duper model lock first, then the relevant application lock in the status service. Acquiring these two locks in the reverse order may lead to a deadlock.
|
|     This PR throws an IllegalStateException when it detects that the current thread is about to acquire the duper model lock while already holding the application lock.
|
* Remove service model cache (Håkon Hallingstad, 2020-03-06, 2, -6/+2)
|
* Support cleanup of status service (Håkon Hallingstad, 2020-03-05, 14, -60/+313)
|
* Remove InstanceLookupService (Håkon Hallingstad, 2020-03-03, 14, -191/+122)
|
|     The lower-level methods on ServiceMonitor have removed the need for InstanceLookupService.
|
* Rename MutableStatusService to ApplicationLock (Håkon Hallingstad, 2020-03-02, 15, -158/+136)
|
|     The result of acquiring the application lock in the status service is now named ApplicationLock instead of MutableStatusService.
|
* Align names with status service (Håkon Hallingstad, 2020-03-02, 17, -208/+221)
|
* Merge pull request #12390 from vespa-engine/hakonhall/extract-HostInfo-ops (Håkon Hallingstad, 2020-03-02, 2, -94/+129)
|\
| |     Extract low-level ZK HostInfo ops to HostInfosServiceImpl
| |
| * Update orchestrator/src/main/java/com/yahoo/vespa/orchestrator/status/HostInfosServiceImpl.java (Håkon Hallingstad, 2020-03-02, 1, -2/+2)
| |
| |     Co-Authored-By: Valerij Fredriksen <freva@users.noreply.github.com>
| |
| * Extract low-level ZK HostInfo ops to HostInfosServiceImpl (Håkon Hallingstad, 2020-02-27, 2, -94/+129)
| |
* | Only build part of application instance for host resource (Håkon Hallingstad, 2020-02-28, 5, -20/+16)
| |
* | Moved to more specific methods on ServiceMonitor (Håkon Hallingstad, 2020-02-28, 8, -13/+39)
|/
* Improve suspension denied reason (Håkon Hallingstad, 2020-02-24, 6, -59/+64)
|
* Remove use of fest-assert (Bjørn Christian Seime, 2020-02-24, 2, -17/+11)
|
|     Motivation: reduce the number of 'fluent test assertion' libraries in use.
|
* Merge pull request #12275 from vespa-engine/hakonhall/remove-unnecessary-gethostinfo (Valerij Fredriksen, 2020-02-24, 4, -11/+3)
|\
| |     Remove unnecessary getHostInfo
| |
| * Remove unnecessary getHostInfo (Håkon Hallingstad, 2020-02-20, 4, -11/+3)
| |
* | Return 409 on Orchestrator timeout instead of 504 (Håkon Hallingstad, 2020-02-23, 3, -8/+8)
| |
* | Remove large-orchestrator-locks flag (Håkon Hallingstad, 2020-02-23, 4, -27/+6)
|/
* Add host info to orchestrator REST API (Håkon Hallingstad, 2020-02-17, 7, -24/+55)
|
* Support setting PERMANENTLY_DOWN at end of retirement (Håkon Hallingstad, 2020-02-07, 7, -27/+68)
|
* Skip removing status nodes at old zk paths (Håkon Hallingstad, 2020-02-06, 1, -46/+20)
|
* Reduce access logging (Håkon Hallingstad, 2020-02-05, 1, -0/+1)
|
|     Avoids writing access logs in various tests.
|
|     1. Disables by-default access logging with Application, since it is used in unit tests.
|     2. However many tests create additional DeployState which renders this ineffective, and so this PR also explicitly disables access logging in services.xml of some tests.
|
|     (1) might be unnecessary if we anyway have to do (2) everywhere, but this is not clear to me.
|
* Remove stray test (Håkon Hallingstad, 2020-02-05, 1, -10/+0)
|
* Keep track of locks in OrchestratorContext (Håkon Hallingstad, 2020-02-04, 5, -47/+138)
|
* Support large orchestrator lock (Håkon Hallingstad, 2020-02-03, 10, -50/+363)
|
* Avoid wrapping timeout exception (causing 504) with batch internal error (causing 500) (Håkon Hallingstad, 2020-02-03, 1, -0/+2)
|
* Remove unused suspendedHostnames (Håkon Hallingstad, 2020-01-30, 5, -33/+7)
|
* Prepare for setting PERMANENTLY_DOWN (Håkon Hallingstad, 2020-01-30, 14, -43/+95)
|
* Remove unnecessary maps (Håkon Hallingstad, 2020-01-29, 2, -22/+8)
|
* Make new host-status canonical (Håkon Hallingstad, 2020-01-29, 1, -32/+3)
|
|     In moving from old host-status-service/APP/hosts to new host-status/APP/hosts:
|
|     1. Currently and before this PR, the ALLOWED_TO_BE_DOWN hosts are the union of the hosts in new and old. Other hosts are NO_REMARKS.
|     2. With this PR, we'll stop writing ALLOWED_TO_BE_DOWN in old, but still remove them.
|
|     This PR is not supposed to have any user-visible effect for clients of the Orchestrator.
|
|     Once this PR has rolled out,
|
|     A. The remaining code accessing old can be removed
|     B. All data in old can be removed.
|
* Pull host info through out of orchestrator and through nodes response (Jon Marius Venstad, 2020-01-27, 2, -5/+5)
|
* Define suspended attribute on HostStatus, plus minor review fixes (Håkon Hallingstad, 2020-01-24, 5, -12/+18)
|
* Since-timestamp on Orc suspended status (Håkon Hallingstad, 2020-01-23, 13, -65/+474)
|
|     - Introduce a new type of suspension status: PERMANENTLY_DOWN, which is set when a node is scheduled to be removed from the application. A normal resume call will NOT clear PERMANENTLY_DOWN.
|     - Store a JSON for each suspended host: Contains status, and a since timestamp for the time when the node was suspended. The suspension timestamp is preserved when switching between suspension statuses.
|     - The JSON is stored in a new path, void of any zone. This means that we eventually can get rid of the zone-part of the application instance reference.
|
* Create zookeeper client config file only when necessary (Harald Musum, 2020-01-09, 1, -1/+1)
|
* Revert "Revert "Bjorncs/apache commons libraries cleanup"" (Bjørn Christian Seime, 2020-01-08, 2, -7/+4)
|
* Revert "Bjorncs/apache commons libraries cleanup" (Harald Musum, 2020-01-08, 2, -4/+7)
|