| Commit message | Author | Age | Files | Lines |
|
The cluster state should specify N nodes and the distribution config
should specify <N nodes, in order to properly check that attempting
to use node N fails due to the _config_ and not the _state_.
|
If a cluster state bundle contains distribution config, this is
internally propagated via the `StateManager` component to all registered
state listeners. One such state listener is `FileStorManager`, which
updates the content node-internal bucket space repository.
All `SetSystemStateCommand`-processing and config-aware components
(`StateManager` and `ChangedBucketOwnershipHandler`) now explicitly
track whether the cluster controller provides distribution config
or whether the internally provided config should be used (falling
back to internal config if necessary).
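
A rough sketch of the fallback behavior described above (the types and names below are simplified stand-ins, not the actual Vespa classes):

```cpp
#include <optional>
#include <utility>

// Simplified stand-ins for the real Vespa types.
struct DistributionConfig {};
struct ClusterStateBundle {
    // Present only if the cluster controller included distribution config.
    std::optional<DistributionConfig> distribution_config;
};

class StateManager {
public:
    // Called when internal config arrives via the regular config system.
    void on_internal_config(DistributionConfig cfg) {
        internal_config_ = std::move(cfg);
        if (!controller_provides_config_) {
            apply(internal_config_); // fall back to internally provided config
        }
    }

    // Called for each received SetSystemStateCommand bundle.
    void on_cluster_state_bundle(const ClusterStateBundle& bundle) {
        if (bundle.distribution_config) {
            controller_provides_config_ = true;
            apply(*bundle.distribution_config);
        } else {
            // Controller stopped (or never started) sending config;
            // explicitly revert to the internally provided config.
            controller_provides_config_ = false;
            apply(internal_config_);
        }
    }
private:
    void apply(const DistributionConfig&) { /* propagate to state listeners */ }

    DistributionConfig internal_config_{};
    bool controller_provides_config_ = false;
};
```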
|\
vespa-engine/vekterli/decode-cluster-state-bundle-distribution-config-cpp
Support decoding distribution config as part of cluster state bundles in C++
| |
Actually _encoding_ config in the same format as that used for
decoding config payloads is not directly supported, so we do our
own roundabout conversion as part of testing.
|/
out of source builds.
|
Adds a (live) config that specifies whether content nodes and distributors
shall reject cluster state versions that are lower than the one
currently active on the node. This prevents "last write wins"
ordering problems when multiple cluster controllers have partially
overlapping leadership periods.
In the name of pragmatism, we try to auto-detect the case where
ZooKeeper state must have been lost on the cluster controller cluster,
and accept the state even with lower version number. Otherwise,
the content cluster would be effectively stalled until all its
processes had been manually restarted.
Adds wiring of live config to the `StateManager` component on both
content and distributor nodes.
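
The version check might look roughly like this. This is a sketch under assumptions: the commit message does not spell out the actual detection heuristic for ZooKeeper state loss, so the one below is invented for illustration.

```cpp
#include <cstdint>

struct StateVersionPolicy {
    bool reject_lower_versions = true; // the new (live) config flag

    // Hypothetical heuristic: a controller cluster that lost its ZooKeeper
    // state typically restarts version numbering from scratch.
    static bool looks_like_zk_state_loss(uint32_t received, uint32_t active) {
        return received < active && received <= 1;
    }

    bool should_accept(uint32_t received, uint32_t active) const {
        if (!reject_lower_versions || received >= active) {
            return true;
        }
        // Accept anyway if ZooKeeper state was apparently lost, to avoid
        // stalling the content cluster until every process is restarted.
        return looks_like_zk_state_loss(received, active);
    }
};
```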
|
Introduce a distinct `StorageConfigSet` which wraps the actual
underlying config objects and exposes them through a unified
`ConfigUri`.
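
Conceptually (simplified; the real `StorageConfigSet` and `ConfigUri` types have richer interfaces):

```cpp
#include <string>
#include <utility>

// Simplified stand-ins for the actual config types.
struct StorServerConfig {};
struct StorDistributionConfig {};
struct ConfigUri { std::string config_id; };

// Wraps the individual config objects and hands them out through a
// single ConfigUri, so tests and bootstrap code deal with one handle.
class StorageConfigSet {
public:
    explicit StorageConfigSet(std::string config_id)
        : uri_{std::move(config_id)} {}

    const ConfigUri& config_uri() const { return uri_; }
    StorServerConfig& server_config() { return server_; }
    StorDistributionConfig& distribution_config() { return distribution_; }
private:
    ConfigUri uri_;
    StorServerConfig server_;
    StorDistributionConfig distribution_;
};
```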
|
Once upon a time, VDS roamed the lands. It used real disk IO as
part of tests. Then came the meteor and in-memory dummy persistence
took over. Now it is time for the fossils to be moved into a museum
where they belong.
Also make PID file writing conditional on a config flag that is set
to `false` during unit testing (and defaults to `true`).
|
Avoids potentially having to deserialize the entire update just
to get to a single bit of information that is technically
metadata existing orthogonally to the document update itself.
To ensure backwards/forwards compatibility, the flag is
propagated as a Protobuf `enum` where the default value is
a special "unspecified" sentinel, implying an old sender.
Since the Java protocol implementation always eagerly
deserializes messages, it unconditionally assigns the
`create_if_missing` field when sending and completely ignores
it when receiving.
The C++ protocol implementation observes and propagates the
field iff set. Otherwise the flag is deferred to the update
object as before. This applies to both the DocumentAPI and
StorageAPI protocols.
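
The tri-state semantics could be modeled as below. This is a sketch: the field and enum names follow the commit message, not necessarily the actual .proto definitions.

```cpp
// Mirrors a Protobuf enum whose zero value is a sentinel meaning
// "field not set by sender", i.e. the sender predates this change.
enum class CreateIfMissing {
    UNSPECIFIED = 0, // default => old sender; fall back to the update object
    TRUE_       = 1,
    FALSE_      = 2,
};

struct DocumentUpdate {
    bool create_if_missing = false; // requires full deserialization to read
};

// Receiver-side logic: prefer the protocol-level flag if the sender set it,
// otherwise defer to the (lazily deserialized) update object as before.
bool effective_create_if_missing(CreateIfMissing wire_flag,
                                 const DocumentUpdate& update) {
    switch (wire_flag) {
        case CreateIfMissing::TRUE_:  return true;
        case CreateIfMissing::FALSE_: return false;
        case CreateIfMissing::UNSPECIFIED: break;
    }
    return update.create_if_missing; // old sender; must inspect the update
}
```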
|
This plugs the hole where merges could enter the active window even
if doing so would exceed the total memory limit, as dequeueing is
a separate code path from the one where a merge is initially evaluated
for inclusion in the active window.
There is a theoretical head-of-line blocking/queue starvation issue
if the merge at the front of the queue has an unrealistically large
footprint and the memory limit is unrealistically low. In practice
this is not expected to be a problem, and it should never cause merging
to stop (at least one merge is always guaranteed to be able to execute).
As such, not adding any kind of heuristics to deal with this for now.
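
In pseudo-C++, the fix amounts to having the dequeue path run the same admission check as initial scheduling (names are illustrative, not the actual throttler API):

```cpp
#include <cstddef>
#include <deque>

struct Merge { std::size_t estimated_memory; };

struct ActiveWindow {
    std::size_t merges = 0;
    std::size_t memory_used = 0;
    std::size_t soft_limit = 0;

    // Same admission predicate used when a merge is first considered.
    // An empty window always admits, guaranteeing progress.
    bool admits(const Merge& m) const {
        return merges == 0 || memory_used + m.estimated_memory <= soft_limit;
    }
};

// The fix: dequeueing now consults the same predicate instead of
// unconditionally promoting the front of the queue.
void drain_queue(ActiveWindow& window, std::deque<Merge>& queue) {
    while (!queue.empty() && window.admits(queue.front())) {
        window.memory_used += queue.front().estimated_memory;
        ++window.merges;
        queue.pop_front();
    }
}
```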
|
Add config for min/max capping of the deduced limit, as well as a scaling
factor based on the memory available to the process. Defaults
have been chosen based on empirical observations over many years,
but having these as config means we can tune things live if
it should ever be required.
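
The deduced limit then presumably follows the usual clamp pattern (a sketch; the config field names and default values below are made up):

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical config fields for capping the deduced merge memory limit.
struct MergeMemoryConfig {
    double      scale_factor = 0.10;                  // fraction of process memory
    std::size_t min_limit    = 64ull * 1024 * 1024;   // lower cap
    std::size_t max_limit    = 4096ull * 1024 * 1024; // upper cap
};

std::size_t deduce_memory_limit(const MergeMemoryConfig& cfg,
                                std::size_t available_process_memory) {
    const auto scaled =
        static_cast<std::size_t>(cfg.scale_factor * available_process_memory);
    return std::clamp(scaled, cfg.min_limit, cfg.max_limit);
}
```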
|
If configured, the active merge window is limited so that the
sum of estimated memory usage for its merges does not go
beyond the configured soft memory limit. The window can
always fit a minimum of 1 merge regardless of its size to
ensure progress in the cluster (thus this is a soft limit,
not a hard limit).
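
Reduced to a single predicate, the rule is the same invariant as in the dequeue sketch further up (illustrative names):

```cpp
#include <cstddef>

// Soft limit: an empty window always admits one merge regardless of its
// size, so the cluster can always make progress.
bool may_enter_window(std::size_t active_merges,
                      std::size_t window_memory_usage,
                      std::size_t merge_memory_estimate,
                      std::size_t soft_limit) {
    return active_merges == 0
        || window_memory_usage + merge_memory_estimate <= soft_limit;
}
```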
|
Removes own `ConfigFetcher` in favor of pushing reconfiguration
responsibilities onto the components owning the Bouncer instance.
The current "superclass calls into subclass" approach isn't
ideal, but the longer term plan is to hoist all config subscriptions
out of `StorageNode` and into the higher-level `Process` structure.
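
The ownership shift looks roughly like this (simplified sketch; the real signatures and config plumbing differ):

```cpp
#include <memory>

struct BouncerConfig {};

class Bouncer {
public:
    // No internal ConfigFetcher anymore; the owner pushes new config in.
    void on_configure(const BouncerConfig& cfg) { config_ = cfg; }
private:
    BouncerConfig config_{};
};

class StorageNode {
public:
    // The owner subscribes to config and forwards updates to the Bouncer.
    // Longer term, this subscription is meant to move up into Process.
    void config_updated(const BouncerConfig& cfg) {
        bouncer_->on_configure(cfg);
    }
private:
    std::unique_ptr<Bouncer> bouncer_ = std::make_unique<Bouncer>();
};
```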
|
As far as I know, this config has not been used by anyone for
at least a decade (if it ever was used for anything truly useful).
Additionally, operation priorities are a foot-gun at the best of
times. The ability to dynamically change the meaning of priority
enums even more so.
This commit entirely removes configuration of Document API
priority mappings in favor of a fixed mapping that is equal to
the default config, i.e. what everyone's been using anyway.
This removes a thread per distributor/storage node process as
well as 1 mutex and 1 (presumably entirely unneeded `seq_cst`)
atomic load in the message hot path. Also precomputes a LUT for
the reverse priority mapping, avoiding a lower-bound seek through
an explicit map.
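
Precomputing the reverse mapping trades a map lookup for a constant-time array index. An illustrative sketch (the priority classes and values below are invented, not Vespa's actual mapping):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Fixed forward mapping from priority class to wire priority value.
enum class PriorityClass : uint8_t { HIGHEST = 0, NORMAL = 1, LOWEST = 2 };
constexpr std::array<uint8_t, 3> kClassToPri = {50, 120, 200};

// Reverse LUT: for each of the 256 possible wire priorities, precompute
// which class it maps back to, removing the lower-bound seek from the
// hot path.
constexpr std::array<PriorityClass, 256> make_reverse_lut() {
    std::array<PriorityClass, 256> lut{};
    for (int pri = 0; pri < 256; ++pri) {
        PriorityClass cls = PriorityClass::HIGHEST;
        for (std::size_t i = 0; i < kClassToPri.size(); ++i) {
            if (pri >= kClassToPri[i]) {
                cls = static_cast<PriorityClass>(i);
            }
        }
        lut[pri] = cls;
    }
    return lut;
}
constexpr auto kPriToClass = make_reverse_lut();
```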
|
This moves the responsibility for bootstrapping and updating config
for the `CommunicationManager` component to its owner. By doing this,
a dedicated `ConfigFetcher` can be removed. Since this is a
component used by both the distributor and storage nodes, this
reduces total thread count by 2 on a host.
|
This moves RPC shutdown from being the _first_ thing that happens
to being the _last_ thing that happens during storage chain shutdown.
To avoid concurrent client requests from the outside reaching internal
components during the flushing phases, the Bouncer component will now
explicitly and immediately reject incoming RPCs after closing and all
replies will be silently swallowed (no one is listening for them at that
point anyway).
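
A sketch of the rejection behavior (simplified; the real Bouncer operates on storage API messages and has considerably more state):

```cpp
#include <atomic>

struct RpcRequest {};

class Bouncer {
public:
    void close() { closed_.store(true, std::memory_order_release); }

    // Called for each incoming RPC. After close(), requests are rejected
    // immediately so nothing reaches components that are still flushing,
    // and replies to in-flight requests are silently dropped.
    bool accept(const RpcRequest&) const {
        return !closed_.load(std::memory_order_acquire);
    }
private:
    std::atomic<bool> closed_{false};
};
```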
|
subsystem, take 2"
|
This moves RPC shutdown from being the _first_ thing that happens
to being the _last_ thing that happens during storage chain shutdown.
To avoid concurrent client requests from the outside reaching internal
components during the flushing phases, the Bouncer component will now
explicitly and immediately reject incoming RPCs after closing and all
replies will be silently swallowed (no one is listening for them at that
point anyway).
|\
vespa-engine/vekterli/ensure-internal-messages-flushed-prior-to-rpc-shutdown
Ensure internal messages are flushed before shutting down RPC subsystem
| |
This moves RPC shutdown from being the _first_ thing that happens
to being the _last_ thing that happens during storage chain shutdown.
To avoid concurrent client requests from the outside reaching internal
components during the flushing phases, the Bouncer component will now
explicitly and immediately reject incoming RPCs after closing and all
replies will be silently swallowed (no one is listening for them at that
point anyway).
|/
Serialization code can safely be removed, as no revert-related
messages have ever flown across the wire in the new serialization
format.
|
Also clarify/update some comments.
|
Incoming cluster state versions are not applied locally on a content
node until all potentially conflicting operations running in the persistence
threads have completed and all potentially conflicting operations in
the persistence queues have been aborted.
This can take a relatively long time when running LID space compactions
etc via the persistence threads, and we'd risk blocking the main
`CommunicationManager` thread (which handles all cluster controller
communication) for prolonged periods of time.
Move state blocking and internal state propagation to a dedicated
task executor. The executor only has 1 thread, effectively turning
it into an asynchronous FIFO executor.
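
A single worker thread gives FIFO ordering for free. A minimal sketch with std::thread (the actual code uses Vespa's own executor abstractions):

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// One worker thread + a queue: tasks run asynchronously but strictly in
// submission order, which is what state application needs.
class SerialExecutor {
public:
    SerialExecutor() : worker_([this] { run(); }) {}
    ~SerialExecutor() {
        {
            std::lock_guard lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }
    void execute(std::function<void()> task) {
        {
            std::lock_guard lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::unique_lock lock(mutex_);
            cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
            if (tasks_.empty()) return; // done and fully drained
            auto task = std::move(tasks_.front());
            tasks_.pop();
            lock.unlock();
            task(); // may block for a long time without stalling RPC threads
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool done_ = false;
    std::thread worker_; // declared last: starts after the other members
};
```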
|
This covers both the entry points from the `storagenode` and
`searchnode` HTTP servers, though the former is mostly in the
name of legacy support.
Ideally, capability checking would exist as a property of the
HTTP server (Portal) bindings, but the abstractions for the
JSON request handling are sufficiently leaky that it ended up
making more sense to push things further down the hierarchy.
It's always a good thing to move away from using strings with
implicit semantics as return types anyway.
The `searchnode` state API handler mapping supports fine-grained
capabilities. The legacy `storagenode` state API forwarding does
not; it uses a sledgehammer that expects the union of all possible
API capability requirements.
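
Conceptually, the change replaces strings-with-implicit-semantics with an explicit check result. The types and capability names below are hypothetical; the actual Vespa capability API differs:

```cpp
#include <initializer_list>
#include <set>
#include <string>

// Hypothetical capability model, for illustration only.
using Capability = std::string;

struct CapabilitySet {
    std::set<Capability> caps;
    bool contains_all(std::initializer_list<Capability> required) const {
        for (const auto& c : required) {
            if (!caps.count(c)) return false;
        }
        return true;
    }
};

enum class AccessDecision { Allowed, MissingCapability };

// Fine-grained check, as in the searchnode state API handler mapping.
AccessDecision check_state_api_access(const CapabilitySet& peer) {
    return peer.contains_all({"content.state_api"})
         ? AccessDecision::Allowed
         : AccessDecision::MissingCapability;
}

// Legacy storagenode forwarding: the "sledgehammer" that requires the
// union of every capability any state API handler might need.
AccessDecision check_legacy_forwarding_access(const CapabilitySet& peer) {
    return peer.contains_all({"content.state_api", "content.metrics_api",
                              "content.status_pages"})
         ? AccessDecision::Allowed
         : AccessDecision::MissingCapability;
}
```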
|
also get rid of some cleanup functions on reference-counted classes
enable specifying low-level parameters to addref/subref (cnt/reserve)
|
|
safe way.
|
safer code.
|
also remove some left-behind includes
|\
vespa-engine/havardpe/avoid-fastos-thread-in-storage
avoid using fastos thread in storage
|/
use std::thread directly instead
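
In essence, a RAII-wrapped std::thread replaces the FastOS-managed thread; the run loop stays the same, only the ownership and lifetime plumbing changes. A minimal sketch:

```cpp
#include <atomic>
#include <chrono>
#include <thread>

class Worker {
public:
    Worker() : thread_([this] { run(); }) {}
    ~Worker() {
        stop_.store(true);
        thread_.join();
    }
private:
    void run() {
        while (!stop_.load()) {
            // ... do periodic work ...
            std::this_thread::sleep_for(std::chrono::milliseconds(10));
        }
    }
    std::atomic<bool> stop_{false};
    std::thread thread_; // declared last: starts after stop_ is initialized
};
```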
|\
General code health: nodiscard, range loops, etc.
| |