path: root/service-monitor
Commit message (Author, Age, Files, Lines)
* Remove unused rotations parameter (Martin Polden, 2019-06-26, 1 file, -1/+1)
|
* Remove ZoneApplication (Valerij Fredriksen, 2019-06-07, 3 files, -202/+5)
|
* Simplify ApplicationInstanceGenerator (Valerij Fredriksen, 2019-06-07, 2 files, -92/+2)
|
* Simplify health monitoring (Valerij Fredriksen, 2019-06-07, 4 files, -118/+3)
|
* Remove enable-tenant-host-app flag (Valerij Fredriksen, 2019-06-07, 1 file, -18/+2)
|
* Make tenant host application supported in DuperModel with feature flag (Valerij Fredriksen, 2019-06-03, 2 files, -5/+30)
|
* Only deploy supported infrastructure applications (Valerij Fredriksen, 2019-06-01, 2 files, -0/+10)
|
* Only allow activate/remove legal infra applications in duper model (Valerij Fredriksen, 2019-06-01, 2 files, -102/+54)
|
* Remove MONITOR_TENANT_HOST_HEALTH flag (Valerij Fredriksen, 2019-06-01, 5 files, -90/+24)
|
* Remove nodeAdminInContainer from configserver.def (Valerij Fredriksen, 2019-06-01, 1 file, -3/+1)
|
* Add logserver container to slobrok monitor manager (Harald Musum, 2019-05-29, 1 file, -9/+6)
|
* Tenant hosts only have node-admin service (Håkon Hallingstad, 2019-05-23, 2 files, -0/+85)
|     Tenant hosts are allocated to the zone (routing) app in the node-admin
|     cluster. These hosts also have various other services like logd,
|     config-sentinel, etc. that do not run.
|
|     The metricsproxy container is a new service that will be added to all
|     hosts and will be Slobrok monitored, except that it will not run on
|     tenant hosts, as above.
|
|     To avoid having a metrics cluster that spans the whole of the zone app
|     and is DOWN on all tenant hosts, we filter those services out of the
|     model when generating the ApplicationInstance of the service model.
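The filtering described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `ServiceEntry`, `TenantHostServiceFilter`, and the `"node-admin"` service type string are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified stand-in for a service entry in the model. */
final class ServiceEntry {
    final String hostname;
    final String serviceType;

    ServiceEntry(String hostname, String serviceType) {
        this.hostname = hostname;
        this.serviceType = serviceType;
    }
}

final class TenantHostServiceFilter {
    // Assumed name of the only service type that actually runs on tenant hosts.
    static final String NODE_ADMIN = "node-admin";

    /** Keeps every service on non-tenant hosts, but only node-admin on tenant hosts. */
    static List<ServiceEntry> filter(List<ServiceEntry> services, Set<String> tenantHosts) {
        List<ServiceEntry> kept = new ArrayList<>();
        for (ServiceEntry service : services) {
            boolean onTenantHost = tenantHosts.contains(service.hostname);
            if (!onTenantHost || NODE_ADMIN.equals(service.serviceType)) {
                kept.add(service);
            }
        }
        return kept;
    }
}
```

With this shape, services that never run on tenant hosts do not appear in the generated model at all, so they cannot show up as DOWN.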
* Change interface from Mirror.Entry[] to List<Mirror.Entry> as you already have a list. (Henning Baldersheim, 2019-04-22, 1 file, -2/+2)
|     Avoid having to do an array copy that is not necessary.
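A small sketch of the idea, under the assumption that the caller already holds a `List`: accepting the list directly lets it be handed over without a `toArray` copy. `MirrorEntry`, `MirrorListener`, and `RecordingListener` are hypothetical stand-ins, not the real Slobrok classes.

```java
import java.util.List;

/** Hypothetical stand-in for a Slobrok mirror entry (name plus connection spec). */
final class MirrorEntry {
    final String name;
    final String spec;

    MirrorEntry(String name, String spec) {
        this.name = name;
        this.spec = spec;
    }
}

/** Hypothetical listener interface that takes the list directly instead of an array. */
interface MirrorListener {
    void updated(List<MirrorEntry> entries);
}

/** Remembers the most recent list it was given, without copying it. */
final class RecordingListener implements MirrorListener {
    List<MirrorEntry> lastEntries;

    @Override
    public void updated(List<MirrorEntry> entries) {
        lastEntries = entries;
    }
}
```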
* Remove unused parameter (Bjørn Christian Seime, 2019-04-08, 1 file, -2/+2)
|
* Don't override connection manager (Bjørn Christian Seime, 2019-04-08, 2 files, -17/+4)
|
* Use VespaHttpClientBuilder in service-monitor (Bjørn Christian Seime, 2019-04-08, 2 files, -2/+8)
|
* Order dependencies on scope (Bjørn Christian Seime, 2019-04-08, 1 file, -7/+13)
|     Also change scope of 'annotations' to provided.
* Orchestrator Support for metrics proxy (Håkon Hallingstad, 2019-03-22, 1 file, -0/+1)
|
* Merge pull request #8609 from vespa-engine/jvenstad/fix-config-model-inconsitency (Jon Marius Venstad, 2019-02-26, 2 files, -6/+1)
|\
| |     Jvenstad/fix config model inconsitency
| * Fix imports and linting (Jon Marius Venstad, 2019-02-26, 2 files, -6/+1)
| |
* | Merge pull request #8585 from vespa-engine/bratseth/nonfunctional-changes (Jon Bratseth, 2019-02-22, 3 files, -0/+6)
|\ \
| |/
|/|     Nonfunctional changes only
| * Nonfunctional changes only (Jon Bratseth, 2019-02-22, 3 files, -0/+6)
| |
* | Always reset updatePossiblyInProgress on leaving scope (Håkon Hallingstad, 2019-02-20, 1 file, -4/+6)
|/
* Simplify, using pre-computed host-to-application map (Jon Marius Venstad, 2019-02-08, 2 files, -13/+3)
|
* Invert nested map once, and store it (Jon Marius Venstad, 2019-02-08, 1 file, -0/+23)
|
* Merge pull request #8364 from vespa-engine/mpolden/rotations-element (Morten Tokle, 2019-02-04, 1 file, -1/+1)
|\
| |     Add support for rotations element
| * Add rotations to cluster spec (Martin Polden, 2019-02-04, 1 file, -1/+1)
| |
* | Make port tag comparison case insensitive (Håkon Hallingstad, 2019-02-03, 3 files, -3/+34)
|/
* Health rest API (Håkon Hallingstad, 2019-01-31, 13 files, -29/+99)
|     Makes a new REST API /orchestrator/v1/health/<ApplicationId> that shows the
|     list of services that are monitored for health. This information is
|     currently a bit difficult to infer from
|     /orchestrator/v1/instances/<ApplicationInstanceReference> since it is the
|     combined view of health and Slobrok. There are already APIs for Slobrok.
|
|     Example content:
|
|       $ curl -s localhost:19071/orchestrator/v1/health/hosted-vespa:zone-config-servers:default | jq .
|       {
|         "services": [
|           {
|             "clusterId": "zone-config-servers",
|             "serviceType": "configserver",
|             "configId": "zone-config-servers/cfg6",
|             "status": {
|               "serviceStatus": "UP",
|               "lastChecked": 1548939111.708718,
|               "since": 1548939051.686223,
|               "endpoint": "http://cfg4.prod.cd-us-central-1.vespahosted.ne1.yahoo.com:19071/state/v1/health"
|             }
|           },
|           ...
|         ]
|       }
|
|     This view is slightly different from the application model view, just
|     because that's exactly how the health monitoring is structured (individual
|     monitors against endpoints). The "endpoint" information will also be added
|     to /instances if the status comes from health and not Slobrok.
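As a rough illustration of how a client might address this API, the sketch below builds the request URI from the three parts of an application id, matching the `tenant:application:instance` form seen in the curl example. `HealthApi` and its method names are hypothetical helpers, not part of the Orchestrator.

```java
import java.net.URI;

/** Hypothetical helper that builds URIs for the health API described above. */
final class HealthApi {
    /** Path of the health resource for one application, e.g. hosted-vespa:zone-config-servers:default. */
    static String path(String tenant, String application, String instance) {
        return "/orchestrator/v1/health/" + tenant + ":" + application + ":" + instance;
    }

    /** Full URI against a given config server host and port. */
    static URI uri(String host, int port, String tenant, String application, String instance) {
        return URI.create("http://" + host + ":" + port + path(tenant, application, instance));
    }
}
```

For the example in the commit message, `HealthApi.uri("localhost", 19071, "hosted-vespa", "zone-config-servers", "default")` yields the same address that the curl command queries.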
* Export UnionMonitorManager (Håkon Hallingstad, 2019-01-25, 1 file, -0/+8)
|
* Metadata about /state/v1/health status (Håkon Hallingstad, 2019-01-25, 23 files, -135/+130)
|     The service monitor uses /state/v1/health to monitor config servers and the
|     host admins (but not yet tenant host admins). This commit adds some
|     metadata about the status of a service:
|
|      - The time the status was last checked
|      - The time the status changed to the current
|
|     This can be used to e.g. make more intelligent decisions in the
|     Orchestrator, e.g. only allowing a service to suspend if it has been DOWN
|     longer than X seconds (to avoid spurious DOWN to break redundancy and
|     uptime guarantees).
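The "DOWN longer than X seconds" decision mentioned above could look roughly like this. It is a sketch under assumptions: `HealthState`, `HealthInfo`, and `SuspendPolicy` are hypothetical names, and the real Orchestrator policy may weigh other factors.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical service states for this sketch. */
enum HealthState { UP, DOWN, NOT_CHECKED }

/** Hypothetical holder for a status plus the two timestamps the commit adds. */
final class HealthInfo {
    final HealthState state;
    final Instant lastChecked; // when the status was last checked
    final Instant since;       // when the status changed to the current one

    HealthInfo(HealthState state, Instant lastChecked, Instant since) {
        this.state = state;
        this.lastChecked = lastChecked;
        this.since = since;
    }
}

final class SuspendPolicy {
    /** True only if the service is DOWN and has been so for strictly longer than the threshold. */
    static boolean downLongerThan(HealthInfo info, Duration threshold, Instant now) {
        if (info.state != HealthState.DOWN) {
            return false;
        }
        return Duration.between(info.since, now).compareTo(threshold) > 0;
    }
}
```

Passing `now` explicitly keeps the check deterministic and easy to test; a caller would normally supply the current time from a clock.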
* Don't expect docprocservice as service type (Jon Bratseth, 2019-01-22, 1 file, -1/+0)
|
* Nonfunctional changes only (Jon Bratseth, 2019-01-21, 6 files, -2/+11)
|
* 6-SNAPSHOT -> 7-SNAPSHOT (Arnstein Ressem, 2019-01-21, 1 file, -2/+2)
|
* Support monitoring health of tenant hosts (Håkon Hallingstad, 2019-01-16, 14 files, -53/+353)
|
* Remove healthmonitor-monitorinfra, dupermodel-contains-infra, dupermodel-use-configserverconfig, proxyhost-uses-real-orchestrator, and confighost-uses-real-orchestrator flags (Håkon Hallingstad, 2019-01-14, 8 files, -180/+37)
|
* Typed flag classes (Håkon Hallingstad, 2019-01-03, 2 files, -7/+6)
|     This reintroduces the non-generic flag classes:
|
|      - a value() returns the primitive type for flags wrapping a primitive type
|      - easier to use in testing
|      - Serializer is moved to internals of typed class
|
|     Defines the flag backed by boolean BooleanFlag instead of FeatureFlag since
|     not all boolean flags are necessarily guarding a feature.
* Configserver flags REST API (Håkon Hallingstad, 2018-12-30, 3 files, -16/+14)
|     Adds a new ZooKeeper backed flag source. It is defined in a new module
|     configserver-flags to allow as many as possible config server modules to
|     depend on it by minimizing dependencies.
|
|     The content of the ZK backed flag source can be viewed and modified through
|     REST API on the config server/controller. The data stored per flag looks
|     like
|
|       {
|         "rules": [
|           {
|             "conditions": [
|               {
|                 "type": "whitelist",
|                 "dimension": "hostname",
|                 "values": ["host1"]
|               }
|             ],
|             "value": true
|           }
|         ]
|       }
|
|     typical for enabling a feature flag on host1.
|
|     2 types of conditions are so far supported: whitelist and blacklist. All
|     the conditions must match in order for the value to apply. If the value is
|     null (or absent), the default value will be used. At the time the flag's
|     value is retrieved, it is resolved against the conditions with the current
|     zone, hostname, and/or application.
|
|     The same data structure is used for FileFlagSource for files in
|     /etc/vespa/flags with the ".2" extension.
|
|     The FlagSource component injected in the config server is changed to:
|
|      1. Return the flag value if specified in /etc/vespa/flags, or otherwise
|      2. return flag value from ZooKeeper (same as REST API)
|
|     The current flags (module) is also changed:
|
|      - All flags must be defined in com.yahoo.vespa.flags.Flags. This allows
|        the ZK backed flag source additional sanity checking when modifying
|        flags.
|      - If it makes sense to have different flag value depending on e.g. the
|        application, then at some point before the value is retrieved, one has
|        to bind the flag to that application (using with() to set up the fetch
|        vector).
|
|     Future changes would be to 0. make a merged FlagSource in host admin,
|     1. add support for viewing and modifying feature flags in dashboard,
|     2. in hv tool.
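The rule resolution described in this commit message can be sketched as below. The class names (`Condition`, `Rule`, `FlagResolver`) are hypothetical and not the real flags module API. Two details are assumptions the message does not state: the first matching rule wins, and a dimension missing from the fetch vector counts as "not listed" (so it fails a whitelist and passes a blacklist).

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of one condition: a whitelist or blacklist on a dimension. */
final class Condition {
    enum Type { WHITELIST, BLACKLIST }

    final Type type;
    final String dimension;
    final Set<String> values;

    Condition(Type type, String dimension, Set<String> values) {
        this.type = type;
        this.dimension = dimension;
        this.values = values;
    }

    boolean matches(Map<String, String> fetchVector) {
        String actual = fetchVector.get(dimension);
        // Assumption: an unset dimension is treated as not being in the list.
        boolean listed = actual != null && values.contains(actual);
        return type == Type.WHITELIST ? listed : !listed;
    }
}

/** Hypothetical rule: applies its value only if all conditions match. */
final class Rule {
    final List<Condition> conditions;
    final Boolean value; // null means "use the default"

    Rule(List<Condition> conditions, Boolean value) {
        this.conditions = conditions;
        this.value = value;
    }

    boolean matches(Map<String, String> fetchVector) {
        for (Condition condition : conditions) {
            if (!condition.matches(fetchVector)) {
                return false;
            }
        }
        return true;
    }
}

final class FlagResolver {
    /** Assumption: the first matching rule decides; otherwise the default applies. */
    static boolean resolve(List<Rule> rules, Map<String, String> fetchVector, boolean defaultValue) {
        for (Rule rule : rules) {
            if (rule.matches(fetchVector)) {
                return rule.value != null ? rule.value : defaultValue;
            }
        }
        return defaultValue;
    }
}
```

With the JSON example above modelled as a single whitelist rule on `hostname` with value `true`, resolving with a fetch vector of `hostname=host1` gives `true`, while any other hostname falls back to the default.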
* ThreadLocalRandom is recommended over Random in multithreaded environments, try 2 (Håkon Hallingstad, 2018-12-20, 1 file, -5/+2)
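For context, a minimal sketch of the `ThreadLocalRandom` usage pattern. The commit does not say what the random value is used for; picking a random delay is purely an assumption for illustration, and `RandomDelay` is a hypothetical class.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper that picks a random delay without sharing a Random instance between threads. */
final class RandomDelay {
    /** Returns a value in [0, maxMillis); maxMillis must be positive. */
    static long millisBelow(long maxMillis) {
        // ThreadLocalRandom.current() returns the generator of the calling thread,
        // so concurrent callers do not contend on one shared seed.
        return ThreadLocalRandom.current().nextLong(maxMillis);
    }
}
```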
* Use thread pool for health monitoring in service-monitor (Håkon Hallingstad, 2018-12-17, 32 files, -539/+1231)
|     This is necessary to avoid using too many threads when monitoring the
|     host-admin on the tenant Docker hosts.
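The general idea, checking many endpoints with a bounded number of threads rather than one thread per endpoint, can be sketched as follows. This is not the implementation from the commit: `PooledHealthChecker` is a hypothetical class, and the actual health request is abstracted as a `Predicate` supplied by the caller.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

/** Hypothetical checker that runs one health check per endpoint on a fixed-size pool. */
final class PooledHealthChecker {
    /** Checks every endpoint once, using at most the given number of threads. */
    static Map<String, Boolean> checkAll(List<String> endpoints, Predicate<String> isHealthy, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Boolean>> tasks = new ArrayList<>();
            for (String endpoint : endpoints) {
                tasks.add(() -> isHealthy.test(endpoint));
            }
            // invokeAll blocks until all tasks are done and returns futures in task order.
            List<Future<Boolean>> futures = pool.invokeAll(tasks);
            Map<String, Boolean> results = new LinkedHashMap<>();
            for (int i = 0; i < endpoints.size(); i++) {
                results.put(endpoints.get(i), futures.get(i).get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while checking health", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Health check failed unexpectedly", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A long-running monitor would more likely keep one shared pool alive and schedule checks periodically; creating the pool per call here just keeps the sketch self-contained.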
* Monitor health of host infra applications (Håkon Hallingstad, 2018-12-08, 7 files, -26/+135)
|
* Simplify infrastructure applications (Håkon Hallingstad, 2018-12-07, 16 files, -156/+168)
|
* Remove infra app from duper model only if it is supposed to be in duper model (Håkon Hallingstad, 2018-12-07, 3 files, -8/+15)
|
* Make service monitors aware of infra applications in duper model. (Håkon Hallingstad, 2018-12-06, 59 files, -574/+772)
|      - Notify monitors of infrastructure application activation. Live-flipping
|        the content of the duper model is non-trivial and has been removed.
|      - Split out DuperModel as a simple mutable and thread-unsafe container of
|        the applications in the duper model, that also calls listeners on
|        changes. The previous DuperModel has been renamed to DuperModelManager.
|      - Replace SuperModelProvider::snapshot method (fast but difficult to use
|        right) with registerListener.
|      - Shorten the fully qualified package names by 1-2 levels for most
|        classes.
|
|     Next steps:
|
|      - Make HA query the real orchestrator
|      - Start experimenting with health monitoring of infra apps
* Use config server from ConfigserverConfig in DuperModel for controller (Håkon Hallingstad, 2018-12-03, 3 files, -35/+62)
|
* Revert "Revert "Add infrastructure applications to DuperModel"" (Håkon Hallingstad, 2018-12-03, 25 files, -217/+474)
|
* Revert "Add infrastructure applications to DuperModel" (Harald Musum, 2018-12-03, 25 files, -474/+217)
|
* Fixes after review round (Håkon Hallingstad, 2018-12-03, 1 file, -1/+1)
|
* Add infrastructure applications to DuperModel (Håkon Hallingstad, 2018-11-30, 25 files, -217/+474)
|     DuperModel is (will be) responsible for both active tenant applications
|     (through SuperModel) and infrastructure applications. This PR is one step
|     in that direction:
|
|      - All infrastructure applications (config, confighost, controller,
|        controllerhost, and proxyhost) are owned and managed by DuperModel.
|      - The InfrastructureProvisioner retrieves all possible infra apps from the
|        DuperModel (through a reduced API), and "activates" each of them if
|        target is set and there are any nodes etc.
|      - The InfrastructureProvisioner then notifies the DuperModel which apps
|        have been activated, and with which hosts.
|      - The DuperModel can then artificially create an ApplicationInfo, which
|        gets translated into the application model, and finally the service
|        model.
|      - The resulting service model has NOT_CHECKED for each hostadmin service
|        instance. This is sufficient for goal 1 of this sprint.
|      - The config server application currently has health, so that's kept
|        as-is for now.
|      - Feature flags have been tried and work, and allow 1. disabling the
|        addition of the infra apps to the DuperModel, and 2. enabling the infra
|        configserver instead of the currently created configserver w/health.
* Cleanup (Harald Musum, 2018-11-20, 1 file, -1/+1)
|