path: root/storage
Commit message (Author, Age, Files, Lines)
* Merge pull request #11692 from vespa-engine/toregge/system-time-and-steady-time-might-have-different-duration-types (Henning Baldersheim, 2020-01-08, files: 1, lines: -1/+1)
|\
| |   std::chrono::system_clock and std::chrono::steady_clock might have different duration types.
| * Use default constructor for time point when duration since epoch is zero. (Tor Egge, 2020-01-08, files: 1, lines: -1/+1)
| |
| * system_time and steady_time might have different duration types. (Tor Egge, 2020-01-08, files: 1, lines: -1/+1)
| |
* | Fix format strings. (Tor Egge, 2020-01-07, files: 2, lines: -4/+4)
|/
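A minimal C++ sketch of the kind of issue the two commits in #11692 describe, assuming a conversion from a steady_clock duration to a system_clock time point. The alias `system_time` and the helper `to_system_time` are hypothetical names used for illustration only; they are not the code changed in the pull request.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical alias for illustration.
using system_time = std::chrono::system_clock::time_point;

// Builds a system_clock time point from a duration measured in
// steady_clock units. The two clocks are allowed to use different
// duration types, so the conversion goes through duration_cast
// instead of assuming the types are interchangeable.
inline system_time to_system_time(std::chrono::steady_clock::duration since_epoch) {
    if (since_epoch == std::chrono::steady_clock::duration::zero()) {
        // A default-constructed time point represents the clock's epoch.
        return system_time();
    }
    return system_time(
        std::chrono::duration_cast<system_time::duration>(since_epoch));
}
```

The explicit cast keeps the code compiling on standard libraries where the two clocks tick at different resolutions.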
* Ensure missing documents on replicas are not erroneously considered consistent (Tor Brede Vekterli, 2019-12-20, files: 4, lines: -6/+38)
|   Introducing a new member in the category "stupid bugs that I should have added explicit tests for when adding the feature initially".
|
|   There was ambiguity in the GetOperation code where a timestamp sentinel value of zero was used to denote not having received any replies yet, but where a timestamp of zero also means "document not found on replica". This means that if the first reply was from a replica _without_ a document and the second reply was from a replica _with_ a document, the code would act as if the first reply effectively did not exist. Consequently the Get operation would be tagged as consistent. This had very bad consequences for the two-phase update operation logic that relied on this information to be correct.
|
|   This change ensures there is no ambiguity between not having received a reply and having received a reply with a missing document.
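One way to remove this kind of sentinel ambiguity is to keep "no reply yet" out of the timestamp value space entirely. The sketch below uses `std::optional` for that; the class `ReplyConsistencyTracker` and its members are hypothetical and are not taken from the actual GetOperation code, which may solve the problem differently.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical tracker, for illustration only.
class ReplyConsistencyTracker {
public:
    // A timestamp of zero means "document not found on replica".
    void on_reply(uint64_t timestamp) {
        if (!_first_timestamp.has_value()) {
            _first_timestamp = timestamp;   // first reply, even if zero
        } else if (*_first_timestamp != timestamp) {
            _consistent = false;            // replicas disagree
        }
    }
    bool has_replies() const { return _first_timestamp.has_value(); }
    bool consistent() const { return _consistent; }
private:
    // Empty means no reply has been received yet; this state can no
    // longer be confused with a reply carrying timestamp zero.
    std::optional<uint64_t> _first_timestamp;
    bool _consistent = true;
};
```

With this shape, a first reply of zero followed by a non-zero reply is flagged as inconsistent, which is the case the commit message describes.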
* Disable fast update path restarts by default (Tor Brede Vekterli, 2019-12-20, files: 4, lines: -25/+30)
|   Even with the fix in #11561 we are still observing replica divergence warnings in the logs. Disabling this feature entirely until the issue has been fully investigated and a complete fix has been implemented.
|
|   Also emit a log message when the distributor has forced convergence of a detected inconsistent update.
* Merge branch 'master' into balder/reduce-timestamp-usage (Henning Baldersheim, 2019-12-20, files: 1, lines: -1/+1)
|\
| * Multiple slashes in include paths mess up the mechanism in rpmbuild when extracting debuginfo. (Arnstein Ressem, 2019-12-17, files: 1, lines: -1/+1)
| |
* | Drop timestamp.h (Henning Baldersheim, 2019-12-16, files: 8, lines: -75/+27)
|/
* Avoid fast path update restart race with concurrently created replica (Tor Brede Vekterli, 2019-12-13, files: 5, lines: -3/+58)
|   After the recent change to allow safe path updates to be restarted as fast path updates iff all observed document timestamps are equal, a race condition regression was introduced. If the bucket that the update operation was scheduled towards got a new replica concurrently created _between_ the time that safe path Gets were sent and received, it was possible for updates to be sent to inconsistent replicas. This is because the Get and Update operations use the current database state at _their_ start time, not a stable snapshot state from the start time of the two phase update operation itself.
|
|   Add an explicit check that the replica state between sending Gets and Updates is unchanged. If it has changed, a fast path restart is _not_ permitted.
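The check described above can be sketched as comparing the replica set seen when the Gets were sent against the replica set seen just before the Updates would be sent. Everything in this sketch is an assumption made for illustration: the `ReplicaState` struct, its fields, and the function `may_restart_as_fast_path` are hypothetical and do not mirror the real `TwoPhaseUpdateOperation` code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical view of one replica: owning node and a bucket checksum.
struct ReplicaState {
    uint16_t node;
    uint32_t checksum;
    bool operator==(const ReplicaState& rhs) const {
        return node == rhs.node && checksum == rhs.checksum;
    }
};

// A fast path restart is allowed only if (1) the replica state is the
// same as when the Gets were sent, and (2) every Get reply carried the
// same document timestamp.
inline bool may_restart_as_fast_path(
        const std::vector<ReplicaState>& at_get_send,
        const std::vector<ReplicaState>& before_update_send,
        const std::vector<uint64_t>& observed_timestamps)
{
    if (observed_timestamps.empty()) {
        return false; // nothing observed, so nothing to base the decision on
    }
    if (!(at_get_send == before_update_send)) {
        return false; // a replica was created, removed or changed in between
    }
    for (uint64_t ts : observed_timestamps) {
        if (ts != observed_timestamps.front()) {
            return false; // replicas disagree on the document timestamp
        }
    }
    return true;
}
```

Comparing the two replica lists is what catches a replica created while the Gets were in flight, since that replica never took part in the timestamp comparison.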
* Merge pull request #11507 from vespa-engine/balder/use-duration-in-messagebus-and-storageapi-rebased-1 (Henning Baldersheim, 2019-12-05, files: 27, lines: -194/+170)
|\
| |   timeout as duration
| * Merge branch 'master' into balder/use-duration-in-messagebus-and-storageapi-rebased-1 (Henning Baldersheim, 2019-12-05, files: 12, lines: -29/+46)
| |\
| * | Use getMessageNow (Henning Baldersheim, 2019-12-04, files: 1, lines: -2/+0)
| | |
| * | Use larger than for time compare. Not equality with zero. (Henning Baldersheim, 2019-12-04, files: 1, lines: -28/+15)
| | |
| * | timeout as duration (Henning Baldersheim, 2019-12-04, files: 27, lines: -166/+157)
| | |   Conflicts: messagebus/src/vespa/messagebus/testlib/testserver.cpp
* | | Merge pull request #11509 from vespa-engine/balder/use-system-time-in-trace (Henning Baldersheim, 2019-12-05, files: 1, lines: -7/+9)
|\ \ \
| |_|/
|/| |   Balder/use system time in trace
| * | Use system_time in trace instead of int64_t count of milliseconds. (Henning Baldersheim, 2019-12-05, files: 1, lines: -7/+9)
| |/
* / FastOS_THread::Sleep -> std::chrono::sleep_for (Henning Baldersheim, 2019-12-04, files: 12, lines: -29/+46)
|/
|   Renamed Timer -> ScheduledExecutor. Do not include thread.h when not needed in header files.
* Fix ever-growing message tracker for concurrent Get operations (Tor Brede Vekterli, 2019-11-28, files: 8, lines: -21/+80)
|   When Get requests initiated outside the main distributor thread are sent via the MessageSender that is implemented by the main Distributor instance, they would be implicitly registered with the pending message tracker. Not only was this thread unsafe, the registrations would never be cleared away since the reply pipeline bypassed it entirely. This would cause a silent memory leak that would build up many small allocations over time.
|
|   We now dispatch Get requests directly through the storage link chain, bypassing the message tracking component. This both fixes the leak and avoids extra overhead for the Get requests.
|
|   Note: the concurrent Get feature is _not_ enabled by default.
|
|   Also fixes an issue where concurrent Get operations weren't correctly gracefully aborted when the node shuts down.
* Merge pull request #11381 from vespa-engine/vekterli/defer-gc-bucket-info-merge-until-all-responses-received (Tor Brede Vekterli, 2019-11-25, files: 3, lines: -47/+94)
|\
| |   Defer GC bucket info merge until all responses have been received
| * Initialize all fields in constructor (Tor Brede Vekterli, 2019-11-25, files: 1, lines: -1/+2)
| |
| * Fix node index type (Tor Brede Vekterli, 2019-11-21, files: 1, lines: -1/+1)
| |
| * Defer GC bucket info merge until all responses have been received (Tor Brede Vekterli, 2019-11-21, files: 3, lines: -46/+92)
| |   Avoids causing false positives in the merge pending metrics due to partial (and mismatching) bucket info being merged into the DB one by one as responses are received. Instead, wait until all responses have been received and merge as one atomic operation.
| |
| |   This fixes #11373
* | Merge branch 'master' into balder/milliseconds-in-config-rebased-1 (Henning Baldersheim, 2019-11-22, files: 2, lines: -1/+1)
|\ \
| * | Reduce the number of different ways to get the time. (Henning Baldersheim, 2019-11-21, files: 2, lines: -1/+1)
| | |
* | | And that ends the life of FastOS_Time. (Henning Baldersheim, 2019-11-20, files: 3, lines: -636/+0)
|/ /
* | Address comments from code review. (Henning Baldersheim, 2019-11-20, files: 1, lines: -1/+1)
| |
* | Address comment by specifying timeunit in the type. (Henning Baldersheim, 2019-11-20, files: 2, lines: -2/+2)
| |
* | Use timeouts typed with unit. (Henning Baldersheim, 2019-11-20, files: 4, lines: -16/+16)
| |
* | Use C++11 chrono instead of prehistoric homegrown stuff. (Henning Baldersheim, 2019-11-20, files: 1, lines: -4/+1)
|/
* Use fast updates when replica metadata is out of sync but document itself is in sync (Tor Brede Vekterli, 2019-11-15, files: 12, lines: -37/+202)
|   When a bucket has replicas with mismatching metadata (i.e. they are out of sync), the distributor will initiate a write-repair for updates to avoid divergence of replica content. This is done by first sending a Get to all diverging replica sets, picking the highest timestamp and applying the update locally. The updated document is then sent out as a Put.
|
|   This can be very expensive if document Put operations are disproportionally more expensive than partial updates, and also makes the distributor thread part of a contended critical path.
|
|   This commit lets `TwoPhaseUpdateOperation` restart an update as a "fast path" update (partial updates sent directly to the nodes) if the initial read phase returns the same timestamp for the document across all replicas.
|
|   It also removes an old (but now presumed unsafe) optimization where Get operations are only sent to replicas marked "trusted" even if others are out of sync with it. Since trustedness is a transient state that does not persist across restarts or bucket handoffs, it's not robust enough to be used for such purposes. Gets will now be sent to all out of sync replica groups regardless of trusted status.
* Remove unused code (Henning Baldersheim, 2019-11-01, files: 2, lines: -58/+3)
|
* Reduce amount of inlining for large methods (Henning Baldersheim, 2019-10-14, files: 1, lines: -0/+3)
|
* Don't recompute bucket key inside merge function (Tor Brede Vekterli, 2019-10-10, files: 1, lines: -3/+8)
|
* Always process Get replies to avoid racing with reconfigs (Tor Brede Vekterli, 2019-10-10, files: 2, lines: -4/+11)
|
* Add unit tests for starting Gets outside distributor core (Tor Brede Vekterli, 2019-10-09, files: 2, lines: -9/+55)
|
* Rewrite read-only DB updating to use the linear merge-based API (Tor Brede Vekterli, 2019-10-09, files: 2, lines: -5/+42)
|   Avoids O(n) explicit inserts in favor of a bulk load.
* Support thread-safe metric updates (Tor Brede Vekterli, 2019-10-09, files: 4, lines: -20/+45)
|   Currently only used for code paths touched by Get operations.
* Allow executing Get operations outside the main distributor thread (Tor Brede Vekterli, 2019-10-08, files: 8, lines: -11/+83)
|   Requires _both_ B-tree DB to be used _and_ stale reads to be enabled.
* Add test-and-set failures as own distributor metric (Tor Brede Vekterli, 2019-10-07, files: 3, lines: -4/+31)
|   Would otherwise be counted under mysterious "storagefailure" catch-all category.
|
|   Currently not tracked under aggregate failure sum metric, as these are not really "failures" since TaS-failures are expected to happen and do not indicate problems in the backend.
* Remove currently unused member variable (Tor Brede Vekterli, 2019-10-07, files: 3, lines: -9/+4)
|
* Rewrite Get operation starting to use explicit snapshotting (Tor Brede Vekterli, 2019-10-03, files: 17, lines: -114/+185)
|
* Add support for snapshotting all state required for routing a bucket operation (Tor Brede Vekterli, 2019-10-01, files: 10, lines: -11/+490)
|   Let BucketDBUpdater expose a snapshotting function which will handle database routing based on the requested bucket and any pending cluster state transition.
* Add memory load-fences that match existing corresponding store-fences (Tor Brede Vekterli, 2019-09-27, files: 1, lines: -1/+4)
|
* Let GetOperation take in explicit database read guard (Tor Brede Vekterli, 2019-09-27, files: 5, lines: -11/+19)
|   Use a `shared_ptr` to enable multiple operations to share the same logical snapshot.
* Disable old, non-deterministic test (Tor Brede Vekterli, 2019-09-27, files: 1, lines: -1/+1)
|   Needs to be rewritten or discarded.
* Add config option for using B-tree bucket DB in distributor (Tor Brede Vekterli, 2019-09-24, files: 16, lines: -18/+89)
|   Still disabled by default; this will be swapped later.
|
|   Expose read guard generation for easier debugging. Add some explicit tests for read guard snapshot semantics.
* Add config override for simulating bucket info request processing latency (Tor Brede Vekterli, 2019-09-20, files: 4, lines: -10/+22)
|   Simulates added request latency caused by the BucketManager computing bucket ownership for a very large number of buckets.
|
|   Fetched at BucketManager init only, so not a dynamic config. This is only meant for internal testing so should not have any practical consequences.
* Inhibit merges when ideal node is unavailable in pending state (Tor Brede Vekterli, 2019-09-19, files: 4, lines: -34/+64)
|   Upon entering a cluster state transition edge the distributor will prune all replicas from its DB that are on nodes that are unavailable in the _pending_ state. As long as this state is pending, the _current_ state will include these nodes as available. But since replicas for the unavailable node(s) have been pruned away, started merges that involve these nodes as part of their chain are doomed to fail.
|
|   We therefore inhibit such merges from being started in the first place.
* Allow Get operations through when content node is in Maintenance mode (Tor Brede Vekterli, 2019-09-18, files: 2, lines: -4/+26)
|   If Gets are bounced by Maintenance nodes, operations that take place in a two-phase state transition window Up->Maintenance will be aborted.