path: root/metrics
Commit message (Author, Age, Files, Lines)
* Remove rather pointless metrics stress test (Tor Brede Vekterli, 2024-05-13, 2 files, -138/+0)
|   Test has not served much of a purpose other than burning CPU cycles and seemingly greatly confusing Valgrind's thread scheduler.
* Add embedder metrics to vespa9 metricset (Yngve Aasheim, 2024-05-10, 2 files, -4/+8)
|
* Include BILLING_WEBHOOK_FAILURES in infrastructure metric set (Ola Aunronning, 2024-04-17, 1 file, -0/+1)
|
* Make metrics generic to webhook, not filter (Bjørn Christian Seime, 2024-04-15, 1 file, -2/+2)
|
* Unify on List.of (Henning Baldersheim, 2024-04-11, 1 file, -3/+2)
|
* Support pipelining (batching) of mutating ops to same bucket (Tor Brede Vekterli, 2024-04-09, 2 files, -8/+25)
|   Bucket operations require either exclusive (single writer) or shared (multiple readers) access. Prior to this commit, this meant that many enqueued feed operations to the same bucket introduced pipeline stalls, since each operation had to wait for all prior operations to the bucket to complete entirely (including fsync of the WAL append). This is a likely scenario when feeding a document set that was previously acquired through visiting, as such documents are inherently output in bucket order.
|
|   With this commit, a configurable number of feed operations (put, remove and update) bound for the exact same bucket may be sent asynchronously to the persistence provider in the context of the _same_ write lock. This mirrors how merge operations work for puts and removes.
|
|   Batching is fairly conservative, and will _not_ batch across further messages when any of the following holds:
|
|     * A non-feed operation is encountered
|     * More than one mutating operation is encountered for the same document ID
|     * No more persistence throttler tokens can be acquired
|     * Max batch size has been reached
|
|   Updating the bucket DB, assigning bucket info and sending replies are deferred until _all_ batched operations complete.
|
|   Max batch size is (re-)configurable live and defaults to a batch size of 1, which shall have the exact same semantics as the legacy behavior.
|
|   Additionally, clock sampling for persistence threads has been abstracted away to allow for mocking in tests (no need for sleep!).
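The batching rules above can be sketched as a small function. This is a minimal model under stated assumptions, not the actual storage code: `OpType`, `QueuedOp` and `batch_size_for` are hypothetical names, the input is assumed to be the ordered list of operations already pending for one bucket, and the persistence throttler is reduced to a plain token count. A result of 0 means the head operation is not batchable and would be processed on its own.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical operation kinds; only Put/Remove/Update count as feed ops.
enum class OpType { Put, Remove, Update, Other };

// Hypothetical queued operation, already known to target the same bucket.
struct QueuedOp {
    OpType type;
    std::string doc_id;
};

bool is_feed_op(OpType t) {
    return t == OpType::Put || t == OpType::Remove || t == OpType::Update;
}

// Returns how many operations from the front of `queue` may be dispatched
// under a single bucket write lock. `available_tokens` stands in for the
// persistence throttler (an assumption of this sketch).
std::size_t batch_size_for(const std::vector<QueuedOp>& queue,
                           std::size_t max_batch_size,
                           std::size_t available_tokens) {
    std::unordered_set<std::string> seen_doc_ids;
    std::size_t batched = 0;
    for (const QueuedOp& op : queue) {
        if (batched >= max_batch_size) break;               // max batch size reached
        if (!is_feed_op(op.type)) break;                    // non-feed operation
        if (!seen_doc_ids.insert(op.doc_id).second) break;  // repeated doc ID
        if (available_tokens == 0) break;                   // throttler exhausted
        --available_tokens;
        ++batched;
    }
    return batched;
}
```

With `max_batch_size = 1` the function never returns more than 1, which corresponds to the legacy one-operation-per-lock behavior that the default batch size preserves.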
* Emit suspended seconds. Update metrics for non-active nodes (Ola Aunronning, 2024-03-27, 1 file, -0/+1)
|
* Wire Prometheus metric export to state V1 APIs (Tor Brede Vekterli, 2024-03-21, 3 files, -24/+64)
|   Extends metric producer classes to produce output in the requested exposition format. As a consequence, the State API server has been changed to allow emitting content types other than just `application/json`.
|
|   Add custom Prometheus rendering for Slobrok, as it does its own domain-specific metric tracking. However, since it has non-destructive sampling properties, we can actually use proper `counter` types.
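The remark about Slobrok's counters rests on a property of Prometheus `counter` metrics: the exposed value must be cumulative and never decrease (other than on a process restart). The following minimal sketch, using hypothetical names (`ExpositionFormat`, `content_type_for`, `Counter`), contrasts non-destructive sampling with destructive, reset-on-read sampling; only the former yields values that are valid for a `counter`. The Prometheus content type string is the one defined for the text exposition format, version 0.0.4; how the State API actually selects a format is not described in the commit and is not modelled here.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical exposition formats a metrics endpoint could be asked for.
enum class ExpositionFormat { Json, Prometheus };

// Content type to send with each format.
std::string_view content_type_for(ExpositionFormat format) {
    switch (format) {
    case ExpositionFormat::Prometheus:
        return "text/plain; version=0.0.4";  // Prometheus text format 0.0.4
    case ExpositionFormat::Json:
        break;
    }
    return "application/json";
}

// Hypothetical counter supporting both sampling styles.
class Counter {
public:
    void increment(std::uint64_t n = 1) {
        total_ += n;
        since_last_reset_ += n;
    }
    // Non-destructive: reading leaves state untouched, so successive samples
    // never decrease. This is what a Prometheus `counter` requires.
    std::uint64_t sample_cumulative() const { return total_; }
    // Destructive: reading resets the delta, so successive samples can go
    // down. Such values cannot be exposed as a Prometheus `counter`.
    std::uint64_t sample_and_reset() {
        std::uint64_t delta = since_last_reset_;
        since_last_reset_ = 0;
        return delta;
    }

private:
    std::uint64_t total_ = 0;
    std::uint64_t since_last_reset_ = 0;
};
```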
* Simplify sample emplacement (Tor Brede Vekterli, 2024-03-19, 1 file, -5/+5)
|
* Support internal metric rendering in Prometheus text format in C++ (Tor Brede Vekterli, 2024-03-19, 6 files, -31/+531)
|   Maps all internal metrics to one or more labelled time series. Due to poor compatibility between the data model (and sampling strategy) of the legacy metrics framework and that of Prometheus, all time series are emitted as `untyped` metrics.
|
|   This is a stop-gap solution on the way to "properly" supporting Prometheus exposition, and the output of this renderer should therefore only be used for internal purposes.
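A minimal sketch of what rendering one internal metric as several labelled `untyped` series in the Prometheus text format could look like. The `ValueMetricSnapshot` struct and the `_count`/`_sum`/`_min`/`_max` suffix scheme are assumptions made for illustration, not the renderer's actual mapping; the name sanitizing and label value escaping follow the Prometheus text format rules.

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Hypothetical snapshot of an internal "value" metric with its dimensions.
struct ValueMetricSnapshot {
    std::string name;                           // e.g. "content.proton.docs"
    std::map<std::string, std::string> labels;  // label names assumed valid
    double sum;
    std::uint64_t count;
    double min;
    double max;
};

// Prometheus metric names may only contain [a-zA-Z0-9_:]; map anything else
// (such as the dots in internal names) to '_'. A leading digit is not handled.
std::string sanitize_name(const std::string& name) {
    std::string out = name;
    for (char& c : out) {
        bool valid = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                     (c >= '0' && c <= '9') || c == '_' || c == ':';
        if (!valid) c = '_';
    }
    return out;
}

// Label values must escape backslash, double quote and newline.
std::string escape_label_value(const std::string& value) {
    std::string out;
    for (char c : value) {
        if (c == '\\') out += "\\\\";
        else if (c == '"') out += "\\\"";
        else if (c == '\n') out += "\\n";
        else out += c;
    }
    return out;
}

// Emits one `untyped` series per aggregate, all sharing the same labels.
std::string render_untyped(const ValueMetricSnapshot& m) {
    std::string label_block;
    if (!m.labels.empty()) {
        label_block = "{";
        bool first = true;
        for (const auto& [key, value] : m.labels) {
            if (!first) label_block += ',';
            first = false;
            label_block += key + "=\"" + escape_label_value(value) + "\"";
        }
        label_block += '}';
    }
    const std::string base = sanitize_name(m.name);
    std::ostringstream os;
    auto emit = [&](const char* suffix, double value) {
        os << "# TYPE " << base << suffix << " untyped\n"
           << base << suffix << label_block << ' ' << value << '\n';
    };
    emit("_count", static_cast<double>(m.count));
    emit("_sum", m.sum);
    emit("_min", m.min);
    emit("_max", m.max);
    return os.str();
}
```

Emitting everything as `untyped` sidesteps the question of whether a given aggregate satisfies the semantics of a Prometheus `counter` or `gauge`, which is the compatibility gap the commit message describes.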
* Expiry metrics are counters, not gauges (Ola Aunronning, 2024-03-12, 1 file, -5/+5)
|
* Update metrics/src/main/java/ai/vespa/metrics/Labels.java (Yngve Aasheim, 2024-02-13, 1 file, -1/+1)
|   Co-authored-by: Ola Aunrønning <olaa@yahooinc.com>
* Add legacy names to label enum (Yngve Aasheim, 2024-02-12, 1 file, -18/+29)
|
* Add enum for coredumps.processed (Yngve Aasheim, 2024-02-12, 1 file, -0/+1)
|
* Add missing dependency. (Yngve Aasheim, 2024-02-12, 1 file, -0/+1)
|
* Add metric needed for the stand-up dashboard (Yngve Aasheim, 2024-02-12, 1 file, -0/+2)
|
* Merge pull request #30178 from vespa-engine/bjormel/expose_expire_metrics (Bjørn Meland, 2024-02-05, 2 files, -0/+10)
|\
| |   Expose expire metrics
| * Now, really expose the metrics (bjormel, 2024-02-05, 1 file, -0/+5)
| |
| * Expose expire metrics (bjormel, 2024-02-05, 1 file, -0/+5)
| |
* | Change ai.vespa.instance_id to ai.vespa.instance also (Yngve Aasheim, 2024-02-05, 1 file, -1/+1)
| |
* | Add ai.vespa.node (Ola Aunronning, 2024-02-05, 1 file, -0/+1)
|/
* Use stored entry count rather than bucket count for (dis-)allowing permanent node down edge (Tor Brede Vekterli, 2024-01-26, 1 file, -0/+1)
|   The stored entry count encompasses both visible documents and tombstones. Using this count rather than the bucket count avoids issues where a node containing only empty buckets (i.e. no actual data) is prohibited from being marked as permanently down.
|
|   Entry count is cross-checked with the visible document count; if the former is zero, the latter should always be zero as well.
|
|   Since entry/doc counts were only recently introduced as part of the HostInfo payload, we have to handle the case where these do not exist. If entry count is not present, the decision to allow or disallow the transition falls back to the bucket count check.
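The decision described above can be sketched as a small predicate. `NodeStats` and `may_mark_permanently_down` are hypothetical names standing in for the relevant part of the HostInfo payload. Treating an inconsistent report (zero entries but a non-zero visible document count) as disallowing the transition is an assumption of this sketch; the commit only says the two counts are cross-checked.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical subset of the per-node HostInfo statistics. The optional
// fields are absent when reported by nodes predating entry/doc counts.
struct NodeStats {
    std::uint64_t bucket_count = 0;
    std::optional<std::uint64_t> entry_count;  // visible documents + tombstones
    std::optional<std::uint64_t> doc_count;    // visible documents only
};

// Returns whether the node may take the permanent-down edge, i.e. whether
// it holds no data at all.
bool may_mark_permanently_down(const NodeStats& stats) {
    if (stats.entry_count.has_value()) {
        if (*stats.entry_count != 0) {
            return false;  // node still stores documents and/or tombstones
        }
        // Cross-check: zero entries must imply zero visible documents.
        // Treating a mismatch as "disallow" is an assumption of this sketch.
        return !stats.doc_count.has_value() || *stats.doc_count == 0;
    }
    // Entry count not reported: fall back to the legacy bucket count check.
    return stats.bucket_count == 0;
}
```

For example, a node reporting 5 buckets but 0 entries may be marked permanently down, which the bucket-count-only check would have prohibited.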
* Add enum skeleton for labels (Yngve Aasheim, 2024-01-23, 1 file, -0/+52)
|
* Track correct metric (Martin Polden, 2024-01-18, 1 file, -1/+1)
|
* Add temporary metric for tracking grid usage (Bjørn Christian Seime, 2024-01-16, 1 file, -0/+2)
|
* Expose clusterAutoscaled metric (Yngve Aasheim, 2024-01-03, 1 file, -0/+1)
|
* Emit metric counting autoscale events (Martin Polden, 2023-12-11, 1 file, -0/+1)
|
* Merge pull request #29571 from vespa-engine/mpolden/detect-redist (Jon Bratseth, 2023-12-06, 1 file, -1/+5)
|\
| |   Let distributor metric decide cluster stability
| * Add merge pending metric (Martin Polden, 2023-12-06, 1 file, -1/+5)
| |
* | Update ClusterControllerMetrics.java (Yngve Aasheim, 2023-12-05, 1 file, -2/+2)
| |
* | Add enums for kinesislogger metrics (Yngve Aasheim, 2023-12-05, 1 file, -1/+7)
|/
* Add metric for job runner executor size, to compute util (jonmv, 2023-11-30, 2 files, -0/+2)
|
* Add deployment job duration metric (jonmv, 2023-11-27, 2 files, -1/+3)
|
* Merge pull request #29447 from vespa-engine/vekterli/expose-remove-by-gid-metrics (Henning Baldersheim, 2023-11-23, 3 files, -0/+9)
|\
| |   Expose `remove_by_gid` persistence-level metrics
| * Expose `remove_by_gid` persistence-level metrics (Tor Brede Vekterli, 2023-11-23, 3 files, -0/+9)
| |
* | Add metrics for billing webhook filter (Bjørn Christian Seime, 2023-11-22, 1 file, -0/+2)
| |
* | Add new metric (Øyvind Grønnesby, 2023-11-21, 1 file, -0/+1)
|/
* Include empty exclusive hosts metric (Ola Aunronning, 2023-11-07, 1 file, -0/+1)
|
* Export estimated merge memory usage metric (Tor Brede Vekterli, 2023-11-03, 3 files, -0/+3)
|   Having visibility of this number will make it easier to choose sensible defaults based on observations of existing systems.
* Merge pull request #28972 from vespa-engine/yngveaasheim/add-description-to-metric-set-reference-doc (Ola Aunrønning, 2023-10-17, 1 file, -1/+7)
|\
| |   Add a short description to metric set reference documentation
| * Add a short description to metric set reference documentation (Yngve Aasheim, 2023-10-17, 1 file, -1/+7)
| |
* | Introduce metrics for mail sending (Bjørn Christian Seime, 2023-10-16, 2 files, -1/+10)
|/
* Add .min suffix for singleton.is_active (Yngve Aasheim, 2023-10-12, 1 file, -1/+1)
|
* Add .min suffix for singleton.is_active (Yngve Aasheim, 2023-10-12, 1 file, -1/+1)
|
* Revert "Merge pull request #28879 from vespa-engine/revert-28869-jonmv/job-runner-thread-metrics" (jonmv, 2023-10-11, 2 files, -0/+4)
|   This reverts commit 67351aa3e2adbbb4872097ed799f1ca837f35e6d, reversing changes made to aed7902ee0371efb89747d467c4a2f8124ddc08d.
* Revert "Jonmv/job runner thread metrics" (Harald Musum, 2023-10-11, 2 files, -4/+0)
|
* Add metrics for job-runner threads (jonmv, 2023-10-11, 2 files, -0/+4)
|
* Correct copyright headers (Jon Bratseth, 2023-10-09, 4 files, -5/+4)
|
* Update copyright (Jon Bratseth, 2023-10-09, 95 files, -73/+95)
|
* Remove some metrics from Vespa9Vespa metricset. (yngveaasheim, 2023-10-06, 1 file, -5/+1)
|