Fix concurrent join requests handling
Join request synchronization had several problems. Raft forbids
proposing a configuration change while another one is still
uncommitted (see [1]). In that case, it replaces the `EntryConfChange`
with an `EntryNormal`. This can happen at any time, even in bug-free
code, because of network partitioning, and it's the responsibility of
picodata to handle it properly.
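
The rule above can be illustrated with a toy model. This is only a sketch of the behavior as described in this message, not raft-rs code: `ToyLeader`, its fields, and `propose` are hypothetical names invented for the illustration.

```rust
/// Toy stand-ins for raft entry types (not the raft-rs definitions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EntryType {
    Normal,
    ConfChange,
}

/// Hypothetical, stripped-down leader state.
struct ToyLeader {
    /// Index of the last entry appended to the log.
    last_index: u64,
    /// Index of the last committed entry.
    committed: u64,
    /// Index of the latest conf change in the log (0 if none).
    pending_conf_index: u64,
}

impl ToyLeader {
    /// Appends a proposal and returns its index together with the
    /// entry type that actually ends up in the log.
    fn propose(&mut self, requested: EntryType) -> (u64, EntryType) {
        let index = self.last_index + 1;
        let stored = match requested {
            // Another conf change is still uncommitted: downgrade.
            EntryType::ConfChange if self.pending_conf_index > self.committed => {
                EntryType::Normal
            }
            EntryType::ConfChange => {
                self.pending_conf_index = index;
                EntryType::ConfChange
            }
            EntryType::Normal => EntryType::Normal,
        };
        self.last_index = index;
        (index, stored)
    }
}
```

In this model, a second `propose(EntryType::ConfChange)` issued before the first one commits still occupies a log index but is stored as `EntryType::Normal`, which is why the proposer cannot rely on the call alone to know whether its conf change took effect.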
Previously, there was no way to wait until raft leaves the joint state.
This slowed down cluster assembly and made it race-prone. Waiting for
cluster readiness is also important in tests. Some operations (most
importantly, leader switching) are impossible until an instance
finishes its promotion to a voter. For instance, raft rejects
`MsgTimeoutNow` unless the node is promotable (see [2]). This makes
some testing scenarios flaky.
This patch introduces a new synchronization primitive, `JointStateLatch`.
The latch is held on the leader and is locked upon
`raw_node.propose_conf_change()`. It's unlocked only when the second
(implicit) conf change, which represents leaving the joint state, is
committed. The latch also tracks the index of the corresponding
`EntryConfChange`. Even if raft ignores the conf change for any reason,
the latch is still unlocked as soon as the committed index exceeds the
latch's index.
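
The latch semantics described above could be sketched roughly as follows. This is a minimal single-threaded illustration, not the code from this patch: the method names (`try_lock`, `on_leave_joint`, `on_commit`, `is_locked`) and the `Option<u64>` representation are assumptions, and waking up waiters is left out entirely.

```rust
/// Hypothetical sketch of the latch (not the picodata implementation).
#[derive(Debug, Default)]
struct JointStateLatch {
    /// Index of the EntryConfChange that locked the latch, if any.
    locked_at: Option<u64>,
}

impl JointStateLatch {
    /// Locks the latch for a conf change proposed at `index`.
    /// Returns the index of the conf change in flight if already locked.
    fn try_lock(&mut self, index: u64) -> Result<(), u64> {
        match self.locked_at {
            Some(held) => Err(held),
            None => {
                self.locked_at = Some(index);
                Ok(())
            }
        }
    }

    /// Regular path: the implicit conf change that leaves the joint
    /// state has been committed.
    fn on_leave_joint(&mut self) {
        self.locked_at = None;
    }

    /// Fallback path: unlock once the committed index exceeds the
    /// index stored in the latch, e.g. because raft ignored the
    /// conf change.
    fn on_commit(&mut self, committed: u64) {
        if matches!(self.locked_at, Some(index) if committed > index) {
            self.locked_at = None;
        }
    }

    fn is_locked(&self) -> bool {
        self.locked_at.is_some()
    }
}
```

Under these assumptions, a second join request arriving while the latch is locked gets `Err` with the index of the conf change in flight and can retry after the latch is released, instead of having its proposal silently downgraded by raft.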
- [1] https://github.com/tikv/raft-rs/blob/v0.6.0/src/raft.rs#L2014-L2026
- [2] https://github.com/tikv/raft-rs/blob/v0.6.0/src/raft.rs#L2314
Close #47, #53