Commit message | Author | Age | Files | Lines
---|---|---|---|---
* | Add count metrics of orchestrator application lock to get rates | Håkon Hallingstad | 2019-10-24 | 1 | -0/+4 |
| | |||||
* | Revert "Revert "Add Orchestrator application lock metrics"" | Håkon Hallingstad | 2019-10-22 | 5 | -13/+111 |
| | |||||
* | Revert "Add Orchestrator application lock metrics" | Harald Musum | 2019-10-22 | 5 | -111/+13 |
| | |||||
* | Extract duration calculation into separate method | Håkon Hallingstad | 2019-10-21 | 1 | -2/+6 |
| | |||||
* | Add Orchestrator application lock metrics | Håkon Hallingstad | 2019-10-21 | 5 | -13/+107 |
| | |||||
* | Merge pull request #11031 from vespa-engine/hakonhall/return-504-gateway-timeout-on-lock-timeout-from-orchestrator | Harald Musum | 2019-10-21 | 5 | -12/+23 |
|\ | Return 504 Gateway Timeout on lock timeout from Orchestrator | | | | |
| * | Return 504 Gateway Timeout on lock timeout from Orchestrator | Håkon Hallingstad | 2019-10-21 | 5 | -12/+23 |
| | | |||||
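The merged PR above maps a lock timeout to 504 Gateway Timeout instead of a generic server error. A minimal sketch of that mapping, with hypothetical names (the real Orchestrator exception and handler types differ): a timeout while acquiring the application lock means the Orchestrator timed out waiting on an upstream resource, which is what 504 signals.

```java
// Hypothetical sketch: map a lock-acquisition timeout to HTTP 504.
// The class and method names are illustrative, not the actual Orchestrator API.
public class LockTimeoutMapping {
    static class ApplicationLockTimedOutException extends RuntimeException {}

    // Choose a status code for an exception thrown while handling a request.
    public static int statusFor(RuntimeException e) {
        if (e instanceof ApplicationLockTimedOutException) return 504; // Gateway Timeout
        return 500;                                                    // Internal Server Error
    }

    public static void main(String[] args) {
        System.out.println(statusFor(new ApplicationLockTimedOutException())); // prints 504
        System.out.println(statusFor(new RuntimeException()));                 // prints 500
    }
}
```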
* | | Use mockito-core 3.1.0 | Håkon Hallingstad | 2019-10-18 | 5 | -8/+8 |
|/ | |||||
* | Assume at least 3 config servers in Orchestrator | Håkon Hallingstad | 2019-08-13 | 3 | -9/+115 |
| | |||||
* | Use VespaJerseyJaxRsClient in cluster controller client factory | Bjørn Christian Seime | 2019-07-25 | 2 | -8/+13 |
| | |||||
* | Replace 'jdisc' with 'container' in orchestrator | gjoranv | 2019-07-11 | 1 | -2/+2 |
| | |||||
* | Clean up in orchestrator | Valerij Fredriksen | 2019-06-07 | 3 | -19/+0 |
| | |||||
* | Comment zone-app specific code that should be removed after the migration | Valerij Fredriksen | 2019-06-04 | 3 | -1/+3 |
| | |||||
* | Special handle the tenant-host application where needed | Valerij Fredriksen | 2019-06-04 | 3 | -1/+15 |
| | |||||
* | Keep the spec final. | Henning Baldersheim | 2019-05-28 | 2 | -4/+4 |
| | Create the address when needed in the async connect thread. Implement hash/equals/compareTo for Spec to avoid toString. Use Spec as key and avoid creating it every time. | |
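The commit body above describes giving Spec proper equals/hashCode/compareTo so it can serve as a map key without falling back to toString. A minimal sketch of that pattern; the fields and their types here are assumptions, not the actual Vespa Spec class:

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Illustrative immutable spec (host, port) usable directly as a map key:
// identity and ordering come from equals/hashCode/compareTo, not toString().
public final class Spec implements Comparable<Spec> {
    private final String host;
    private final int port;

    public Spec(String host, int port) {
        this.host = Objects.requireNonNull(host);
        this.port = port;
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Spec)) return false;
        Spec other = (Spec) o;
        return port == other.port && host.equals(other.host);
    }

    @Override public int hashCode() { return Objects.hash(host, port); }

    @Override public int compareTo(Spec other) {
        return Comparator.comparing((Spec s) -> s.host)
                         .thenComparingInt(s -> s.port)
                         .compare(this, other);
    }

    public static void main(String[] args) {
        Map<Spec, String> byKey = new HashMap<>();
        byKey.put(new Spec("cfg1", 19071), "config server");
        // A newly created but equal Spec finds the same entry, so callers can
        // build the key on demand instead of caching a toString() form.
        System.out.println(byKey.get(new Spec("cfg1", 19071))); // prints "config server"
    }
}
```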
* | Orchestrator Support for metrics proxy | Håkon Hallingstad | 2019-03-22 | 3 | -36/+5 |
| | |||||
* | Put accessors back inside MutableStatusRegistry | Jon Marius Venstad | 2019-02-11 | 10 | -29/+48 |
| | |||||
* | Use a more specific ZK path for cache counter | Jon Marius Venstad | 2019-02-11 | 1 | -2/+2 |
| | |||||
* | Add note | Jon Marius Venstad | 2019-02-08 | 1 | -0/+5 |
| | |||||
* | Remove ReadOnlyStatusService | Jon Marius Venstad | 2019-02-08 | 8 | -92/+32 |
| | |||||
* | Expose host status cache, and use it for all bulk operations | Jon Marius Venstad | 2019-02-08 | 13 | -72/+153 |
| | |||||
* | Replace test usage of InMemoryStatusService | Jon Marius Venstad | 2019-02-08 | 3 | -21/+27 |
| | |||||
* | Remove unused annotation | Jon Marius Venstad | 2019-02-08 | 3 | -21/+0 |
| | |||||
* | Make all StatusService-es ZookeeperStatusService-es | Jon Marius Venstad | 2019-02-08 | 2 | -41/+16 |
| | |||||
* | Remove InMemoryStatusService, and clean up curator usage in ZookeeperStatusServiceTest | Jon Marius Venstad | 2019-02-08 | 3 | -165/+35 |
| |
* | Fix locking in InMemoryStatusService | Jon Marius Venstad | 2019-02-08 | 1 | -5/+6 |
| | |||||
* | Invalidate cache only on changes or exceptions | Jon Marius Venstad | 2019-02-08 | 1 | -8/+17 |
| | |||||
* | Actually make sure cache is up-to-date while lock is held | Jon Marius Venstad | 2019-02-08 | 1 | -0/+1 |
| | |||||
* | Simplify, using pre-computed host-to-application map | Jon Marius Venstad | 2019-02-08 | 1 | -28/+3 |
| | |||||
* | Read host statuses in bulk per app, and cache until any changes | Jon Marius Venstad | 2019-02-07 | 1 | -6/+40 |
| | |||||
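The series of commits above (bulk reads per application, cache until changes, invalidate on changes or exceptions) follows a common pattern. A minimal sketch under assumed names; the real code caches ZooKeeper-backed host statuses and is not this simple:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative snapshot cache: bulk reads are served from one snapshot,
// and the snapshot is dropped whenever a write happens or fails, so stale
// data is never trusted after a change.
public class HostStatusCache {
    private final Map<String, String> backing = new HashMap<>(); // stands in for ZooKeeper
    private Map<String, String> snapshot;                        // null means "invalidated"

    public Map<String, String> readAll() {
        if (snapshot == null) snapshot = new HashMap<>(backing); // one bulk read, then reuse
        return snapshot;
    }

    public void setStatus(String host, String status) {
        try {
            backing.put(host, status);
        } finally {
            snapshot = null; // invalidate on changes, and also if the write threw
        }
    }

    public static void main(String[] args) {
        HostStatusCache cache = new HostStatusCache();
        cache.setStatus("node1", "ALLOWED_TO_BE_DOWN");
        System.out.println(cache.readAll());                    // fresh snapshot after the change
        System.out.println(cache.readAll() == cache.readAll()); // prints true: cached until next change
    }
}
```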
* | Remove unused or pointless code | Jon Marius Venstad | 2019-02-07 | 2 | -41/+4 |
| | |||||
* | Health rest API | Håkon Hallingstad | 2019-01-31 | 3 | -0/+125 |
| | Makes a new REST API /orchestrator/v1/health/<ApplicationId> that shows the list of services that are monitored for health. This information is currently a bit difficult to infer from /orchestrator/v1/instances/<ApplicationInstanceReference> since it is the combined view of health and Slobrok. There are already APIs for Slobrok. Example content: $ curl -s localhost:19071/orchestrator/v1/health/hosted-vespa:zone-config-servers:default | jq . { "services": [ { "clusterId": "zone-config-servers", "serviceType": "configserver", "configId": "zone-config-servers/cfg6", "status": { "serviceStatus": "UP", "lastChecked": 1548939111.708718, "since": 1548939051.686223, "endpoint": "http://cfg4.prod.cd-us-central-1.vespahosted.ne1.yahoo.com:19071/state/v1/health" } }, ... ] } This view is slightly different from the application model view, just because that's exactly how the health monitoring is structured (individual monitors against endpoints). The "endpoint" information will also be added to /instances if the status comes from health and not Slobrok. | |
* | Metadata about /state/v1/health status | Håkon Hallingstad | 2019-01-25 | 3 | -14/+26 |
| | The service monitor uses /state/v1/health to monitor config servers and the host admins (but not yet tenant host admins). This commit adds some metadata about the status of a service: the time the status was last checked, and the time the status changed to the current one. This can be used to make more intelligent decisions in the Orchestrator, e.g. only allowing a service to suspend if it has been DOWN longer than X seconds (to avoid a spurious DOWN breaking redundancy and uptime guarantees). | |
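The decision the commit body above hints at (only allow suspension once a service has been DOWN longer than some grace period) can be sketched from the new `since` metadata. Names and the rule itself are illustrative assumptions, not the Orchestrator's actual policy code:

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical policy sketch: a spurious single DOWN sample should not be
// enough to let a service suspend; it must have been DOWN for longer than
// a grace period, measured from the "since" timestamp the commit adds.
public class SuspendPolicy {
    public enum ServiceStatus { UP, DOWN }

    public static boolean downLongEnough(ServiceStatus status, Instant since, Instant now, Duration grace) {
        return status == ServiceStatus.DOWN
                && Duration.between(since, now).compareTo(grace) > 0;
    }

    public static void main(String[] args) {
        Instant now   = Instant.parse("2019-01-25T12:02:00Z");
        Instant since = Instant.parse("2019-01-25T12:00:00Z"); // DOWN for two minutes
        // With a 60s grace period, two minutes of DOWN qualifies.
        System.out.println(downLongEnough(ServiceStatus.DOWN, since, now, Duration.ofSeconds(60))); // prints true
    }
}
```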
* | Nonfunctional changes only | Jon Bratseth | 2019-01-21 | 9 | -19/+24 |
| | |||||
* | 6-SNAPSHOT -> 7-SNAPSHOT | Arnstein Ressem | 2019-01-21 | 1 | -2/+2 |
| | |||||
* | Avoid stacktrace in log on timeouts | Håkon Hallingstad | 2018-11-09 | 3 | -8/+66 |
| | |||||
* | Fix probe message | Håkon Hallingstad | 2018-11-05 | 1 | -2/+2 |
| | |||||
* | Send probe when suspending many nodes | Håkon Hallingstad | 2018-11-05 | 16 | -121/+157 |
| | When suspending all nodes on a host, first do a suspend-all probe that tries to suspend the nodes as normal in the Orchestrator and cluster controller, but without committing anything. A probe failure results in the same failure as a non-probe failure: a 409 response with a description is sent back to the client. | |
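The probe-then-commit flow described above is a dry-run pattern: check every node with no side effects, and only apply the suspension if the whole batch would succeed. A minimal sketch with hypothetical types (the real flow goes through the Orchestrator and cluster controller, not an in-memory list):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Illustrative two-phase suspend: a failed probe aborts before any state
// changes, which is what lets the client get the same 409 either way.
public class BatchSuspender {
    public static boolean suspendAll(List<String> hostnames,
                                     Predicate<String> canSuspend,
                                     List<String> suspended) {
        for (String host : hostnames)                 // probe phase: no side effects
            if (!canSuspend.test(host)) return false; // probe failed, nothing was changed
        suspended.addAll(hostnames);                  // commit phase: apply for real
        return true;
    }

    public static void main(String[] args) {
        List<String> suspended = new ArrayList<>();
        boolean ok = suspendAll(Arrays.asList("node1", "node2"),
                                host -> !host.equals("node2"), // node2 would break availability
                                suspended);
        // The probe on node2 fails, so node1 is never suspended either.
        System.out.println(ok + " " + suspended); // prints "false []"
    }
}
```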
* | Wrap CC HTTP failures in 409 | Håkon Hallingstad | 2018-11-01 | 9 | -159/+91 |
| | |||||
* | Revert "Revert "Revert "Revert "Enforce CC timeouts in Orchestrator 4"""" | Håkon Hallingstad | 2018-11-01 | 19 | -308/+496 |
| | |||||
* | Revert "Revert "Revert "Enforce CC timeouts in Orchestrator 4""" | Håkon Hallingstad | 2018-11-01 | 19 | -496/+308 |
| | |||||
* | Revert "Revert "Enforce CC timeouts in Orchestrator [4]"" | Håkon Hallingstad | 2018-11-01 | 19 | -308/+496 |
| | |||||
* | Revert "Enforce CC timeouts in Orchestrator [4]" | Harald Musum | 2018-10-31 | 19 | -496/+308 |
| | |||||
* | Retry twice if only 1 CC | Håkon Hallingstad | 2018-10-30 | 5 | -20/+22 |
| | Caught in the systemtests: if there is only one CC, there will be only 1 request with a ~5s timeout, whereas today the real timeout is 10s. This appears to make a difference in the systemtests, as converging to a cluster state may take several seconds. There are 2 solutions: 1. allocate ~10s to the CC call, or 2. make another ~5s call to the CC if the first one fails. (2) is simpler to implement for now. To implement (1), the timeout calculation could receive the number of backends as a parameter, but that would make the already complex logic here even worse. Or, we could reserve only enough time for 1 call (abandoning the 2-call logic). TBD later. | |
* | Revert "Revert "Revert "Revert "Enforce CC timeouts in Orchestrator 2"""" | Håkon Hallingstad | 2018-10-30 | 19 | -306/+492 |
| | |||||
* | Revert "Revert "Revert "Enforce CC timeouts in Orchestrator 2""" | Håkon Hallingstad | 2018-10-30 | 19 | -492/+306 |
| | |||||
* | Revert "Revert "Enforce CC timeouts in Orchestrator 2"" | Håkon Hallingstad | 2018-10-29 | 19 | -306/+492 |
| | |||||
* | Revert "Enforce CC timeouts in Orchestrator 2" | Håkon Hallingstad | 2018-10-29 | 19 | -492/+306 |
| | |||||
* | Merge branch 'master' into hakonhall/enforce-cc-timeouts-in-orchestrator-2 | Håkon Hallingstad | 2018-10-26 | 3 | -38/+58 |
|\ | |||||
| * | Use dynamic port in tests | Harald Musum | 2018-10-24 | 2 | -38/+56 |
| | |