author | Tor Brede Vekterli <vekterli@vespa.ai> | 2024-04-05 14:59:51 +0000 |
---|---|---|
committer | Tor Brede Vekterli <vekterli@vespa.ai> | 2024-04-09 12:31:46 +0000 |
commit | a8dd709dceca4c53096be285f35686439a7902eb (patch) | |
tree | 5754badd60b8654426694ec83d9d6dac110eb364 /configdefinitions | |
parent | 7a5047b9cb7c1ad40bc69dbacfbbbeafbe15b83a (diff) |
Support pipelining (batching) of mutating ops to same bucket
Bucket operations require either exclusive (single writer) or
shared (multiple readers) access. Prior to this commit, this
meant that many enqueued feed operations to the same bucket
introduced pipeline stalls, since each operation had to wait
for all prior operations to the bucket to complete entirely
(including fsync of WAL append). This is a likely scenario when
feeding a document set that was previously acquired through
visiting, as such documents will inherently be output in
bucket-order.
With this commit, a configurable number of feed operations
(put, remove and update) bound for the exact same bucket may
be sent asynchronously to the persistence provider in the
context of the _same_ write lock. This mirrors how merge
operations work for puts and removes.
Batching is fairly conservative, and will _not_ batch across
further messages when any of the following holds:
* A non-feed operation is encountered
* More than one mutating operation is encountered for the
same document ID
* No more persistence throttler tokens can be acquired
* Max batch size has been reached
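The stop conditions above can be sketched as a small queue-draining loop. This is an illustrative model only, not the actual persistence-thread code: `Op`, `take_batch` and `try_acquire_token` are hypothetical names, and the queue is assumed to hold operations for a single bucket.

```python
from collections import deque
from dataclasses import dataclass

# Operation kinds treated as feed (mutating) operations in this sketch.
FEED_OPS = {"put", "remove", "update"}


@dataclass
class Op:
    kind: str    # e.g. "put", "remove", "update", "get"
    doc_id: str  # document ID the operation targets


def take_batch(queue, max_batch_size, try_acquire_token):
    """Pop a prefix of `queue` that may run under one write lock.

    Hypothetical helper: stops at the first operation that violates
    any of the batching rules and leaves it (and the rest) enqueued.
    """
    batch = []
    seen_ids = set()
    while queue and len(batch) < max_batch_size:  # max batch size reached
        op = queue[0]
        if op.kind not in FEED_OPS:               # non-feed operation
            break
        if op.doc_id in seen_ids:                 # second mutation of same doc ID
            break
        if not try_acquire_token():               # throttler has no more tokens
            break
        seen_ids.add(op.doc_id)
        batch.append(queue.popleft())
    return batch
```

With `max_batch_size=1` the loop never returns more than one operation, which matches the stated default. An empty result (for example when the head of the queue is a non-feed operation) would be handled by the regular single-operation path in this model.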
Updating the bucket DB, assigning bucket info and sending
replies is deferred until _all_ batched operations complete.
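One way to picture the deferral is a completion counter that fires a single callback once the last batched operation has finished. The sketch below is a single-threaded model with hypothetical names (`BatchCompletion`, `op_done`); a real implementation would need an atomic counter or a lock, and the callback stands in for the bucket DB update, bucket info assignment and reply sending.

```python
class BatchCompletion:
    """Runs `on_all_done` once every operation in a batch has completed."""

    def __init__(self, op_count, on_all_done):
        self._pending = op_count         # operations still in flight
        self._results = []               # per-operation results, in completion order
        self._on_all_done = on_all_done  # deferred work for the whole batch

    def op_done(self, result):
        # Record the result; nothing is published until the batch is complete.
        self._results.append(result)
        self._pending -= 1
        if self._pending == 0:
            self._on_all_done(self._results)
```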
Max batch size is (re-)configurable live and defaults to a
batch size of 1, which shall have the exact same semantics as
the legacy behavior.
Additionally, clock sampling for persistence threads has been
abstracted away to allow for mocking in tests (no need for sleep!).
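The general idea of an injectable clock can be sketched as follows. The class names are hypothetical and do not mirror the actual C++ interfaces in this commit; the point is only that code which samples time through an interface can be tested by advancing a fake clock instead of sleeping.

```python
import time


class SteadyClock:
    """Production clock: samples the monotonic system clock."""

    def now(self):
        return time.monotonic()


class FakeClock:
    """Test clock: time only moves when the test says so."""

    def __init__(self, start=0.0):
        self._now = start

    def now(self):
        return self._now

    def advance(self, seconds):
        self._now += seconds


class OpTimer:
    """Measures elapsed time via whichever clock it was given."""

    def __init__(self, clock):
        self._clock = clock
        self._start = clock.now()

    def elapsed(self):
        return self._clock.now() - self._start
```

A test can then construct `OpTimer(FakeClock())`, call `advance(...)` on the fake, and check the elapsed time deterministically.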
Diffstat (limited to 'configdefinitions')
-rw-r--r-- | configdefinitions/src/vespa/stor-filestor.def | 9 |
1 files changed, 8 insertions, 1 deletions
diff --git a/configdefinitions/src/vespa/stor-filestor.def b/configdefinitions/src/vespa/stor-filestor.def
index de67d4336e9..a5d86cc91ba 100644
--- a/configdefinitions/src/vespa/stor-filestor.def
+++ b/configdefinitions/src/vespa/stor-filestor.def
@@ -29,7 +29,7 @@ response_sequencer_type enum {LATENCY, THROUGHPUT, ADAPTIVE} default=ADAPTIVE re
 ## Should follow stor-distributormanager:splitsize (16MB).
 bucket_merge_chunk_size int default=16772216 restart
 
-## Whether or not to use async message handling when scheduling storage messages from FileStorManager.
+## Whether to use async message handling when scheduling storage messages from FileStorManager.
 ##
 ## When turned on, the calling thread (e.g. FNET network thread when using Storage API RPC)
 ## gets the next async message to handle (if any) as part of scheduling a storage message.
@@ -61,3 +61,10 @@ async_operation_throttler.window_size_backoff double default=0.95
 async_operation_throttler.min_window_size int default=20
 async_operation_throttler.max_window_size int default=-1 # < 0 implies INT_MAX
 async_operation_throttler.resize_rate double default=3.0
+
+## Maximum number of enqueued put/remove/update operations towards a given bucket
+## that can be dispatched asynchronously as a batch under the same write lock.
+## This prevents pipeline stalls when many write operations are in-flight to the
+## same bucket, as each operation would otherwise have to wait for the completion
+## of all prior writes to the bucket.
+max_feed_op_batch_size int default=1