author | Tor Brede Vekterli <vekterli@vespa.ai> | 2023-11-09 15:56:30 +0000 |
---|---|---|
committer | Tor Brede Vekterli <vekterli@vespa.ai> | 2023-11-10 13:10:59 +0000 |
commit | b4ca69ae534534f4f3c36b96aa2423f93001b05f (patch) | |
tree | 5d636274dcfaf5b27e10baa52ba1661637a21ac3 /application | |
parent | d1a69ad4cf19eae5efb7ff5ba3854d33551221bc (diff) |
Implement DeleteBucket with throttled per-document async removal
The previous (legacy) behavior was to immediately schedule an async
full bucket deletion in the persistence backend, which incurs a
disproportionate cost when documents are backed by many and/or
heavy indexes (such as HNSW). This risked swamping the backend with
tens to hundreds of thousands of concurrent document deletes.
New behavior splits deletion into three phases:
1. Metadata enumeration for all documents present in the bucket.
2. Persistence-throttled async remove _per individual document_
   that was returned in the iteration result. This blocks the
   persistence thread (by design) if the throttling window is
   not sufficiently large to accommodate all pending deletes.
3. Once all async removes have been ACKed, schedule the actual
   `DeleteBucket` operation towards the backend. This will clean
   up any remaining (cheap) tombstone entries as well as the
   metadata store. The operation reply is sent as before once the
   delete has completed.
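
The three phases above can be modelled roughly as follows. This is only an illustrative sketch in Python, not the code from this commit: `FakeBackend`, `iterate_metadata`, `remove_async`, `delete_bucket_throttled` and `window_size` are all hypothetical names, and the throttling window is approximated with a plain bounded semaphore rather than the real persistence throttler.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


class FakeBackend:
    """Hypothetical in-memory stand-in for the persistence backend."""

    def __init__(self, buckets):
        self.buckets = {b: set(docs) for b, docs in buckets.items()}
        self.pool = ThreadPoolExecutor(max_workers=8)
        self.lock = threading.Lock()
        self.in_flight = 0
        self.max_in_flight = 0  # high-water mark, for inspection only

    def iterate_metadata(self, bucket):
        # Phase 1: enumerate document IDs only, never document contents.
        with self.lock:
            return sorted(self.buckets.get(bucket, ()))

    def remove_async(self, bucket, doc_id, on_done):
        with self.lock:
            self.in_flight += 1
            self.max_in_flight = max(self.max_in_flight, self.in_flight)

        def work():
            with self.lock:
                self.buckets[bucket].discard(doc_id)
                self.in_flight -= 1
            on_done()  # the ACK frees one slot in the throttling window

        return self.pool.submit(work)

    def delete_bucket(self, bucket):
        with self.lock:
            self.buckets.pop(bucket, None)


def delete_bucket_throttled(backend, bucket, window_size):
    """Remove every document individually, then delete the bucket itself."""
    doc_ids = backend.iterate_metadata(bucket)        # phase 1
    throttle = threading.BoundedSemaphore(window_size)
    pending = []
    for doc_id in doc_ids:                            # phase 2
        throttle.acquire()  # blocks the caller while the window is full
        pending.append(backend.remove_async(bucket, doc_id, throttle.release))
    wait(pending)                                     # all removes ACKed
    backend.delete_bucket(bucket)                     # phase 3
    return len(doc_ids)
```

In this model the caller of `delete_bucket_throttled` plays the role of the persistence thread: it stalls in `throttle.acquire()` whenever `window_size` removes are already outstanding, so the backend never sees more than that many concurrent deletes for the bucket.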
Diffstat (limited to 'application')
0 files changed, 0 insertions, 0 deletions