Commit message | Author | Age | Files | Lines | |
---|---|---|---|---|---|
* | Find current load more reliably | Jon Bratseth | 2022-07-14 | 2 | -4/+17 |
| | |||||
* | Make smaller resource changes | Jon Bratseth | 2022-07-14 | 1 | -1/+1 |
| | |||||
* | No functional changes | Jon Bratseth | 2022-07-14 | 2 | -2/+3 |
| | |||||
* | Always include deferOsUpgrade field for hosts | Martin Polden | 2022-07-13 | 1 | -3/+3 |
| | |||||
* | Merge pull request #23474 from vespa-engine/hakonhall/run-nodefailer-every-3m-instead-of-5m | Håkon Hallingstad | 2022-07-12 | 1 | -1/+1 |
|\ | | | | | | | | | Run NodeFailer every 3m instead of 5m | ||||
| * | Run NodeFailer every 3m instead of 5m | Håkon Hallingstad | 2022-07-12 | 1 | -1/+1 |
| | | |||||
* | | Add total cost to stats | Jon Bratseth | 2022-07-12 | 2 | -3/+16 |
| | | |||||
* | | Allow failed node to be deprovisioned | Martin Polden | 2022-07-12 | 1 | -1/+2 |
| | | |||||
* | | Node with allocation must be parked to allow deprovisioning | Martin Polden | 2022-07-12 | 1 | -16/+29 |
| | | |||||
* | | Reapply "Allow deprovision of parked host & node w/alloc when node has wantToDeprovision" | Martin Polden | 2022-07-12 | 1 | -1/+1 |
|/ | | | | | | This reverts commit d097cb3bf2808bb05f2dc4fc2e7cf771246ba1a9. | ||||
* | Only use wantToFail just before activate | Håkon Hallingstad | 2022-07-11 | 1 | -5/+10 |
| | |||||
* | Revert "Revert "Avoid the host lock while failing the children"" | Håkon Hallingstad | 2022-07-11 | 1 | -46/+54 |
| | |||||
* | Revert "Avoid the host lock while failing the children" | Håkon Hallingstad | 2022-07-11 | 1 | -54/+46 |
| | | | | This reverts commit 2cdaef56e18ace2ee2269d28f959f5a534bd68ee. | ||||
* | Revert update of comment | Håkon Hallingstad | 2022-07-11 | 1 | -2/+2 |
| | |||||
* | Merge pull request #23440 from vespa-engine/hakonhall/define-main-chain-graph-flag (tag: v8.15.63) | Harald Musum | 2022-07-08 | 1 | -2/+2 |
|\ | | | | | | | | | Define main-chain-graph flag | ||||
| * | Define main-chain-graph flag | Håkon Hallingstad | 2022-07-08 | 1 | -2/+2 |
| | | |||||
* | | Reduce scope of unallocated lock and avoid deadlock | Martin Polden | 2022-07-08 | 1 | -21/+15 |
| | | | | | | | | | | | | Before this change a call to `failOrMarkRecursively` could cause a deadlock because we would then take the application lock while holding unallocatedLock, but a deployment (e.g. by `InfrastructureProvisioner`) does the opposite. | ||||
* | | Add deferOsUpgrade field to node response | Martin Polden | 2022-07-08 | 2 | -6/+15 |
| | | |||||
* | | Limit grace period to RetiringOsUpgrader | Martin Polden | 2022-07-08 | 4 | -14/+20 |
| | | |||||
* | | Add a grace period before upgrading new nodes | Jon Bratseth | 2022-07-07 | 6 | -0/+23 |
|/ | |||||
* | Avoid the host lock while failing the children | Håkon Hallingstad | 2022-07-05 | 1 | -46/+54 |
| | |||||
* | Autoscaling should happen within 5 minutes | Jon Bratseth | 2022-07-05 | 1 | -1/+1 |
| | |||||
* | Merge pull request #23345 from vespa-engine/hmusum/use-correct-agent-when-failing-nodes | Harald Musum | 2022-07-04 | 1 | -1/+1 |
|\ | | | | | | | | | Use correct agent when failing nodes | ||||
| * | Use correct agent when failing nodes | Harald Musum | 2022-07-04 | 1 | -1/+1 |
| | | |||||
* | | Update javadoc and reduce log level | Harald Musum | 2022-07-04 | 1 | -3/+4 |
|/ | |||||
* | Reuse fully retired nodes faster | Martin Polden | 2022-06-28 | 9 | -60/+98 |
| | |||||
* | Merge pull request #23164 from vespa-engine/hmusum/add-getActivatedTime | Jon Bratseth | 2022-06-20 | 1 | -1/+1 |
|\ | | | | | Use getActivatedTime() for last deployed time for an app [run-systemtest] | ||||
| * | Add getActivatedTime() for a session | Harald Musum | 2022-06-20 | 1 | -1/+1 |
| | | | | | | | | | | | Use getActivatedTime() instead of getCreatedTime() in lastDeployTime(). getCreatedTime() gives the time a new session was created, not when it was activated, which is what we usually want. | ||||
* | | Revert "Allow deprovision of parked host & node w/alloc when node has wantToDeprovision" | Martin Polden | 2022-06-20 | 1 | -1/+1 |
|/ | |||||
* | Reduce interval for DynamicProvisioningMaintainer to 3 minutes | Harald Musum | 2022-06-18 | 1 | -1/+1 |
| | | | | | Waiting for provisioning hosts takes a really long time; resuming provisioning more often might help a little. | ||||
* | Merge pull request #23107 from vespa-engine/freva/do-not-clear-wtd | Håkon Hallingstad | 2022-06-15 | 1 | -0/+1 |
|\ | | | | | Do not reset node status if wantToDeprovision | ||||
| * | Do not reset node status if wantToDeprovision | Valerij Fredriksen | 2022-06-15 | 1 | -0/+1 |
| | | |||||
* | | Remove cluster from autoscaling advice messages | Harald Musum | 2022-06-15 | 1 | -4/+4 |
|/ | |||||
* | Cosmetic fix, id.toString() already contains "cluster " | Harald Musum | 2022-06-14 | 2 | -5/+3 |
| | |||||
* | Merge pull request #23061 from vespa-engine/hmusum/add-application-id-to-illegal-argument-exception | Henning Baldersheim | 2022-06-13 | 1 | -2/+6 |
|\ | | | | | | | | | Add application id to IllegalArgumentException in AutoscalingMaintainer | ||||
| * | Chain exceptions | Harald Musum | 2022-06-13 | 1 | -3/+1 |
| | | |||||
| * | Add application id to IllegalArgumentException in AutoscalingMaintainer | Harald Musum | 2022-06-13 | 1 | -1/+7 |
| | | |||||
* | | More info in autoscaling advice | Harald Musum | 2022-06-13 | 1 | -5/+7 |
|/ | | | | | It is hard to debug why autoscaling does or does not happen, so add some more info | ||||
* | Deprovision host with host lock and sanity-check | Håkon Hallingstad | 2022-06-09 | 1 | -6/+12 |
| | |||||
* | Remove '.sum' from vds sum metrics. | Henning Baldersheim | 2022-06-08 | 1 | -3/+3 |
| | | | | | | | | Remove '.sum' from metric names for storage node and also remove the average metrics for the same. Remove '.sum' from distributor metrics set and remove distributor average metrics. GC '.sum' from distributor metric names. Remove '.alldisks' from metric names and update tests. GC '.alldisks' from filestor metrics. | ||||
* | Mark host as wantToDeprovision before deprovisioning | Valerij Fredriksen | 2022-06-07 | 1 | -1/+5 |
| | |||||
* | Remove cloud account restriction | Martin Polden | 2022-06-02 | 1 | -4/+0 |
| | |||||
* | Implement HostRetirer | Martin Polden | 2022-06-01 | 4 | -1/+86 |
| | |||||
* | Allow deprovision of parked host & node w/alloc when node has wantToDeprovision | Håkon Hallingstad | 2022-06-01 | 1 | -1/+1 |
| | |||||
* | Define smallest node resources that will work in GCP | Harald Musum | 2022-05-28 | 1 | -14/+29 |
| | |||||
* | Add some debug logging when provisioning nodes for a cluster | Harald Musum | 2022-05-28 | 1 | -0/+5 |
| | | | | Helps find the cause of provisioning failures | ||||
* | Allow patching wantedOsVersion | Martin Polden | 2022-05-20 | 3 | -4/+9 |
| | |||||
* | Choose node resources with a matching host flavor when exclusive | Martin Polden | 2022-05-19 | 3 | -20/+33 |
| | | | | | | | | When using a custom cloud account (always exclusive) we cannot choose a flavor that is too small, because there may not be any matching host flavor. This currently works in our own zones because there is always a shared host that can be used for admin nodes (feature flag is set in all zones) and there is no way to set an exclusivity requirement for those clusters. | ||||
* | Never downsize if allocating exclusively | Martin Polden | 2022-05-18 | 2 | -9/+9 |
| | |||||
* | Let CapacityPolicies decide exclusivity based on cloud account | Martin Polden | 2022-05-18 | 3 | -2/+6 |
| |