path: root/storage
Commit message (Author, Age, Files, Lines)
* Remove redundant bucket DB lookup in persistence reply handling (Tor Brede Vekterli, 2020-04-16, 3 files, -54/+10)
|
|     Bucket DB updating happened unconditionally anyway; this was only used
|     for failing operations in an overly pessimistic way.
|
|     Removing this lookup has two benefits:
|     - Less CPU spent in DB
|     - Less impact expected during feeding during node state transitions
|       since fewer operations will have to be needlessly retried by the
|       client.
|
|     Rationale: an operation towards a given bucket completes (i.e. is ACKed
|     by all its replica nodes) at time t and the bucket is removed from the
|     DB at time T. There is no fundamental change in correctness or behavior
|     from the client's perspective if the order of events is tT or Tt. Both
|     are equally valid, as the state transition edge happens independently
|     of any reply processing.
|
* Only update bucket DB memory statistics at certain intervals (Tor Brede Vekterli, 2020-04-07, 3 files, -11/+73)
|
|     B-tree/datastore stats can be expensive to sample, so don't do this
|     after every full DB iteration. For now, wait at least 30s.
|
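The interval-based sampling described in the commit above could be sketched roughly as follows. This is only an illustration under stated assumptions: `IntervalSampler` and `should_sample` are hypothetical names and are not taken from the Vespa source; only the 30s minimum interval comes from the commit message.

```cpp
#include <chrono>
#include <optional>

// Hypothetical helper (not from the Vespa source): decides whether an
// expensive statistics sample is due, given a minimum interval.
class IntervalSampler {
public:
    using Clock = std::chrono::steady_clock;

    explicit IntervalSampler(std::chrono::seconds min_interval)
        : _min_interval(min_interval) {}

    // Returns true (and records `now`) if no sample has been taken yet, or
    // if at least the minimum interval has passed since the last one.
    bool should_sample(Clock::time_point now) {
        if (_last_sample && (now - *_last_sample) < _min_interval) {
            return false;  // too soon; skip the expensive sampling
        }
        _last_sample = now;
        return true;
    }

private:
    std::chrono::seconds             _min_interval;
    std::optional<Clock::time_point> _last_sample;
};
```

A caller would construct `IntervalSampler sampler(std::chrono::seconds(30));` once and check `sampler.should_sample(IntervalSampler::Clock::now())` at the end of each full DB iteration, sampling only when it returns true.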
* Merge branch 'master' into balder/move-sequenced-task-executors-to-staging_vespalib (Henning Baldersheim, 2020-04-05, 5 files, -32/+19)
|\
| * Revert "Revert "Revert "Balder/rearrange threads""" (Henning Baldersheim, 2020-04-05, 3 files, -18/+6)
| |
| * Revert "Bypass communicationmanager Q" (Henning Baldersheim, 2020-04-05, 4 files, -10/+14)
| |
| * Revert "Prefer latency" (Henning Baldersheim, 2020-04-05, 1 file, -1/+1)
| |
| * Revert "Control mbus worker threads and network threads separately." (Henning Baldersheim, 2020-04-05, 2 files, -6/+1)
| |
| * Revert "Avoid task switch on decode." (Henning Baldersheim, 2020-04-05, 1 file, -1/+1)
| |
| * Revert "Use 2 network threads" (Henning Baldersheim, 2020-04-05, 1 file, -1/+1)
| |
| * Revert "Balder/control naptime" (Henning Baldersheim, 2020-04-05, 1 file, -2/+2)
| |
| * Revert "Stick to 1 network thread." (Henning Baldersheim, 2020-04-05, 1 file, -1/+1)
| |
| * Revert "Restore default" (Henning Baldersheim, 2020-04-05, 1 file, -1/+1)
| |
| * Restore default (Henning Baldersheim, 2020-04-05, 1 file, -1/+1)
| |
| * Stick to 1 network thread. (Henning Baldersheim, 2020-04-05, 1 file, -1/+1)
| |
* | Also allow for testing of the adaptive task executor. (Henning Baldersheim, 2020-04-04, 1 file, -1/+1)
|/
* Go back to initial and rerun tests. (Henning Baldersheim, 2020-04-04, 1 file, -2/+2)
|
* Use 2 network threads (Henning Baldersheim, 2020-04-04, 1 file, -1/+1)
|
* Avoid task switch on decode. (Henning Baldersheim, 2020-04-04, 1 file, -1/+1)
|
* Merge pull request #12821 from vespa-engine/balder/control-net-and-worker-threads-independent (Henning Baldersheim, 2020-04-03, 2 files, -1/+6)
|\
| |     Control mbus worker threads and network threads separately.
| |
| * ofworkers => of workers (Henning Baldersheim, 2020-04-03, 1 file, -1/+1)
| |
| * Control mbus worker threads and network threads separately. (Henning Baldersheim, 2020-04-03, 2 files, -1/+6)
| |
* | Merge pull request #12810 from vespa-engine/vekterli/add-distributor-bucket-db-memory-usage-metrics (Tor Brede Vekterli, 2020-04-03, 12 files, -6/+123)
|\ \
| |/
|/|
| |     Add distributor bucket db memory usage metrics
| |
| * Improve dimension naming and metric descriptions (Tor Brede Vekterli, 2020-04-03, 1 file, -1/+1)
| |
| * Add memory usage metrics for distributor bucket databases (Tor Brede Vekterli, 2020-04-02, 12 files, -6/+123)
| |
* | Prefer latency (Henning Baldersheim, 2020-04-03, 1 file, -1/+1)
| |
* | Bypass communicationmanager Q (Henning Baldersheim, 2020-04-02, 4 files, -14/+10)
| |
* | Revert "Revert "Balder/rearrange threads"" (Henning Baldersheim, 2020-04-02, 3 files, -6/+18)
|/
* Reduce code duplication in test code. (Tor Egge, 2020-03-30, 2 files, -14/+4)
|
* Handle newer gtest versions where the legacy API is deprecated. (Tor Egge, 2020-03-29, 2 files, -0/+10)
|
* Revert "Balder/rearrange threads" (Harald Musum, 2020-03-27, 3 files, -18/+6)
|
* Update best defaults. (Henning Baldersheim, 2020-03-27, 1 file, -4/+4)
|
* tcpnodelay is dead. Both worse latency and throughput. (Henning Baldersheim, 2020-03-27, 2 files, -2/+14)
|
|     Use optimize_for to select executor type instead.
|
* optimization => optimize_for (Henning Baldersheim, 2020-03-25, 2 files, -2/+2)
|
* Add config control over tcpnodelay for c++ too. (Henning Baldersheim, 2020-03-25, 2 files, -0/+3)
|
* Do not copy empty trace. (Henning Baldersheim, 2020-03-25, 1 file, -8/+7)
|
* Merge pull request #12667 from vespa-engine/vekterli/add-metric-coverage-of-new-update-phases (Tor Brede Vekterli, 2020-03-24, 5 files, -4/+54)
|\
| |     Track metrics for new inconsistent update phases
| |
| * Track metrics for new inconsistent update phases (Tor Brede Vekterli, 2020-03-24, 5 files, -4/+54)
| |
| |     Reuses the old update-get metric for the single full Get sent after
| |     the initial metadata-only phase. Adds new metric set for the initial
| |     metadata Gets.
| |
* | Do not bring along empty trace. (Henning Baldersheim, 2020-03-24, 1 file, -2/+6)
|/
* Use c++11 for loops and use std::move. (Henning Baldersheim, 2020-03-23, 9 files, -80/+70)
|
* Inhibit all merge-related commands and replies (Tor Brede Vekterli, 2020-03-19, 1 file, -3/+20)
|
|     Just inhibiting MergeCommand itself does not inhibit all message
|     variants that may cause significant read or write load for any given
|     bucket. This should lessen the competition between merge backend
|     activity and client operations.
|
* Merge pull request #12586 from vespa-engine/vekterli/add-cheap-metadata-fetch-phase-for-updates (Tor Brede Vekterli, 2020-03-19, 14 files, -423/+894)
|\
| |     Add initial metadata-only phase to inconsistent update handling
| |
| * Add comments and some extra safety handling of single Get command (Tor Brede Vekterli, 2020-03-17, 4 files, -2/+26)
| |
| * Add initial metadata-only phase to inconsistent update handling (Tor Brede Vekterli, 2020-03-16, 14 files, -422/+869)
| |
| |     If bucket replicas are inconsistent, the common case is that only a
| |     small subset of documents contained in the buckets are actually
| |     mutually out of sync. The added metadata phase optimizes for such a
| |     case by initially sending Get requests to all divergent replicas that
| |     only ask for the timestamps (and no fields). This is a very cheap and
| |     fast operation.
| |
| |     If all returned timestamps are in sync, the update can be restarted in
| |     the fast path. Otherwise, a full Get will _only_ be sent to the newest
| |     replica, and its result will be used for performing the update on the
| |     distributor itself, before pushing out the result as Puts.
| |
| |     This is in contrast to today's behavior where full Gets are sent to
| |     all replicas. For users with large documents this can be very
| |     expensive.
| |
| |     In addition, the metadata Get operations are sent with weak internal
| |     read consistency (as they do not need to read any previously written,
| |     possibly in-flight fields). This lets them bypass the main commit
| |     queue entirely.
| |
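The decision made after the metadata-only phase described above might look roughly like the sketch below. It is an illustration, not the distributor's actual code: `ReplicaTimestamp`, `MetadataPhaseResult` and `evaluate_metadata_phase` are hypothetical names, and treating an empty reply set as trivially in sync is an assumption of this sketch.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical types for illustration only (not from the Vespa source).
struct ReplicaTimestamp {
    uint16_t node;       // index of the replica node that replied
    uint64_t timestamp;  // document timestamp reported by the metadata Get
};

struct MetadataPhaseResult {
    bool replicas_in_sync;                   // true => restart update in the fast path
    std::optional<uint16_t> full_get_node;   // set only when replicas diverge
};

// Compares the timestamps returned by the cheap metadata-only Gets. If they
// all agree, no full Get is needed; otherwise a full Get should go only to
// the replica holding the newest timestamp.
MetadataPhaseResult evaluate_metadata_phase(const std::vector<ReplicaTimestamp>& replies) {
    if (replies.empty()) {
        return {true, std::nullopt};  // assumption: nothing to compare
    }
    const ReplicaTimestamp* newest = &replies.front();
    bool in_sync = true;
    for (const auto& reply : replies) {
        if (reply.timestamp != replies.front().timestamp) {
            in_sync = false;
        }
        if (reply.timestamp > newest->timestamp) {
            newest = &reply;
        }
    }
    if (in_sync) {
        return {true, std::nullopt};
    }
    return {false, newest->node};
}
```

The point of the two-phase shape is that the expensive full-document read happens at most once, and only when the cheap timestamp comparison shows it is needed.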
* | Account for stripe index when performing a multi-lock (Tor Brede Vekterli, 2020-03-18, 2 files, -9/+33)
|/
|     Legacy locking code had a hidden assumption that each disk had only a
|     single queue associated with it and locking requests could therefore
|     be deduped at the disk level. Since we now only have a single logical
|     disk with a number of queues striped over it, this would introduce
|     race conditions when splits/joins would cross stripe boundaries for
|     their target/source buckets.
|
|     Use a composite disk/stripe multi lock key to ensure we only dedupe
|     locking requests at the stripe-level.
|
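To illustrate why the dedupe key matters, here is a minimal sketch of deduplicating lock requests on a composite (disk, stripe) key instead of on the disk alone. All names (`DiskStripeKey`, `LockRequest`, `stripe_of`, `stripes_to_lock`) are hypothetical, and the modulo mapping from bucket to stripe is an assumption made purely for this example.

```cpp
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical composite key: (disk index, stripe index).
using DiskStripeKey = std::pair<uint16_t, uint32_t>;

// Hypothetical lock request for one bucket on one disk.
struct LockRequest {
    uint16_t disk;
    uint64_t bucket_id;
};

// Assumption for this sketch only: a bucket maps to a stripe by modulo.
uint32_t stripe_of(uint64_t bucket_id, uint32_t stripe_count) {
    return static_cast<uint32_t>(bucket_id % stripe_count);
}

// Returns the distinct (disk, stripe) pairs that must be locked. Deduping on
// the disk alone would collapse requests that hit different stripes of the
// same disk into one lock, leaving the other stripes unprotected.
std::set<DiskStripeKey> stripes_to_lock(const std::vector<LockRequest>& requests,
                                        uint32_t stripe_count) {
    std::set<DiskStripeKey> keys;
    for (const auto& req : requests) {
        keys.emplace(req.disk, stripe_of(req.bucket_id, stripe_count));
    }
    return keys;
}
```

With four stripes, requests for buckets 4, 5 and 8 on disk 0 yield two keys, (0, 0) and (0, 1), where a disk-level dedupe would have produced a single lock. Using an ordered set also gives a stable acquisition order, which is a common way to avoid deadlocks when taking several locks.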
* Limit merges per stripe, not globally (Tor Brede Vekterli, 2020-03-04, 2 files, -36/+32)
|
|     With a sufficient, even thread count this will ensure that no stripes
|     end up completely blocked on processing merges, which can starve
|     client operations. Having a global limit means that it was possible
|     for stripes to completely fill up with merges.
|
|     As an added bonus, moving the limit tracking to individual stripes
|     means that we no longer have to track this as an atomic, since all
|     access already happens under the Stripe lock.
|
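A per-stripe merge limit of the kind described above can be sketched as follows. This is a simplified, hypothetical `Stripe` class, not the one in the Vespa source; it only shows why a plain counter suffices once every access already happens under the stripe's own lock.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical, simplified stripe (not the Vespa class of the same name).
class Stripe {
public:
    explicit Stripe(uint32_t max_active_merges)
        : _max_active_merges(max_active_merges) {}

    // Tries to admit one more merge on this stripe. Other stripes keep their
    // own counters, so one stripe reaching its limit does not block them.
    bool try_start_merge() {
        std::lock_guard<std::mutex> guard(_lock);
        if (_active_merges >= _max_active_merges) {
            return false;
        }
        ++_active_merges;
        return true;
    }

    // Marks one merge as finished, freeing a slot on this stripe.
    void merge_done() {
        std::lock_guard<std::mutex> guard(_lock);
        if (_active_merges > 0) {
            --_active_merges;
        }
    }

private:
    std::mutex     _lock;
    const uint32_t _max_active_merges;
    uint32_t       _active_merges = 0;  // plain integer; guarded by _lock
};
```

Because the counter is only read and written while `_lock` is held, it does not need to be a `std::atomic`, which matches the "added bonus" noted in the commit message.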
* Use max on stripes, instead of threads (Henning Baldersheim, 2020-03-04, 1 file, -2/+2)
|
* Use 2 threads per stripe. (Henning Baldersheim, 2020-03-03, 1 file, -2/+2)
|
* Add count metric for number of documents garbage collected (Tor Brede Vekterli, 2020-02-24, 14 files, -48/+100)
|
|     New distributor metric available as:
|     ```
|     vds.idealstate.garbage_collection.documents_removed
|     ```
|
|     Add documents removed statistics to `RemoveLocation` responses, which
|     is what GC is currently built around. Could technically have been
|     implemented as a diff of before/after BucketInfo, but GC is very low
|     priority so many other mutating ops may have changed the bucket
|     document set in the time span between sending the GC ops and receiving
|     the replies.
|
|     This relates to issue #12139
|
* Avoid copying BucketState, when you only need BucketInfo. (Henning Baldersheim, 2020-02-15, 2 files, -3/+3)
|
* Twice as many stripes. (Henning Baldersheim, 2020-02-14, 6 files, -64/+31)
|