- Jul 08, 2022
-
-
In some states `raft-rs` silently ignores a ReadIndex request. Check for those states beforehand instead of waiting for the timeout. See for details:
- <https://github.com/tikv/raft-rs/blob/v0.6.0/src/raft.rs#L2058>
- <https://github.com/tikv/raft-rs/blob/v0.6.0/src/raft.rs#L2323>
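A minimal sketch of the fail-fast check described here; the state names and the `read_index_is_safe` helper are made up for illustration, and the exact conditions raft-rs checks are at the links above:

```rust
// Illustrative only: decide up front whether a ReadIndex request has a chance
// of being answered, instead of sending it and waiting for a timeout.
#[allow(dead_code)]
enum RaftState {
    Leader,
    Follower { leader_id: Option<u64> },
    Candidate,
    PreCandidate,
}

/// raft-rs drops ReadIndex messages in some states, e.g. when there is no
/// known leader to forward the request to. Checking the cheap preconditions
/// first lets the caller fail fast instead of spamming the leader.
fn read_index_is_safe(state: &RaftState) -> bool {
    match state {
        RaftState::Leader => true,
        RaftState::Follower { leader_id } => leader_id.is_some(),
        // Candidates have no leader to forward the request to.
        RaftState::Candidate | RaftState::PreCandidate => false,
    }
}

fn main() {
    let state = RaftState::Follower { leader_id: None };
    if !read_index_is_safe(&state) {
        eprintln!("ReadIndex would be ignored; retry after a leader is elected");
    }
}
```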
-
No need to DDoS a poor raft leader, especially when it's already under high load.
-
Join requests should be made without timeout restrictions. Otherwise it's impossible to tell a retried request from an instance_id collision: retried requests never succeed and return the "already joined" error.
-
- Jul 07, 2022
-
-
-
Georgy Moshkin authored
-
-
Valentin Syrovatskiy authored
-
- Jul 06, 2022
-
-
Yaroslav Dynnikov authored
It makes no sense to perform the wait, and `docs/clustering.md` already states that it's skipped. Besides, the result is unused.
-
- Jul 05, 2022
-
-
Georgy Moshkin authored
-
Yaroslav Dynnikov authored
Initially, JoinResponse was supplied with every single peer in the cluster. If the cluster is big enough, this results in the "max tuple size limit reached" error. In fact, only voters are needed for raft to operate normally.
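An illustrative sketch of the change, assuming hypothetical `Peer` and `JoinResponse` definitions (not the actual picodata types):

```rust
// Only raft voters go into the join response: that is enough for the new
// instance to work with the raft quorum, and it keeps the response tuple
// well below the size limit even in large clusters.
#[derive(Clone)]
struct Peer {
    raft_id: u64,
    uri: String,
    is_voter: bool,
}

struct JoinResponse {
    voters: Vec<Peer>,
}

fn make_join_response(all_peers: &[Peer]) -> JoinResponse {
    JoinResponse {
        voters: all_peers.iter().filter(|p| p.is_voter).cloned().collect(),
    }
}
```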
-
Georgy Moshkin authored
-
- Jul 01, 2022
-
-
Georgy Moshkin authored
-
Georgy Moshkin authored
-
- Jun 30, 2022
-
-
Georgy Moshkin authored
-
Georgy Moshkin authored
-
Yaroslav Dynnikov authored
-
- Jun 29, 2022
-
-
Yaroslav Dynnikov authored
-
Yaroslav Dynnikov authored
It allows us to use `raft::EntryType` directly, without obscure i32 tricks.
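Roughly, the difference looks like this; the `get_entry_type()` accessor is an assumption here and may differ between raft-rs codecs and versions:

```rust
use raft::eraftpb::{Entry, EntryType};

// Before: the entry type was kept as a raw i32 and compared against magic
// numbers, e.g. `entry.entry_type == 1`.
// After: the enum can be used directly.
fn is_conf_change(entry: &Entry) -> bool {
    entry.get_entry_type() == EntryType::EntryConfChange
}
```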
-
- Jun 27, 2022
-
-
Yaroslav Dynnikov authored
-
Valentin Syrovatskiy authored
-
- Jun 23, 2022
-
-
Deactivation means that the instance:
- is demoted to learner (if it wasn't one already)
- is marked "inactive" so that it's ignored when determining the number of voters required for the cluster
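A rough sketch of the second point; the names and the voter-count rule below are illustrative assumptions, not the exact picodata logic:

```rust
// "Inactive" peers are simply excluded when deciding how many raft voters
// the cluster should have.
struct Peer {
    raft_id: u64,
    is_active: bool,
}

/// Pick the desired number of voters based only on active instances.
fn desired_voter_count(peers: &[Peer]) -> usize {
    let active = peers.iter().filter(|p| p.is_active).count();
    // A common rule of thumb: 1, 3 or 5 voters depending on cluster size.
    match active {
        0..=1 => 1,
        2..=4 => 3,
        _ => 5,
    }
}
```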
-
- Jun 20, 2022
-
-
Georgy Moshkin authored
-
-
-
Valentin Syrovatskiy authored
-
- Jun 17, 2022
-
-
Bootstrapping the replication was implemented in 2dac77c5, but it was configured on the new instance only. The old instance (the one that joined earlier) couldn't update `box.cfg({replication})` until now.

Close https://git.picodata.io/picodata/picodata/picodata/-/issues/52
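A hand-wavy sketch of the fix under assumed names (`Peer` and `box_cfg_set_replication` are placeholders, not the real picodata/tarantool-module API): every instance of the replicaset rebuilds its replication list when the topology changes, not only the newcomer.

```rust
struct Peer {
    replicaset_id: String,
    uri: String,
}

/// Collect the URIs of all peers from the same replicaset.
fn rebuild_replication(my_replicaset_id: &str, peers: &[Peer]) -> Vec<String> {
    peers
        .iter()
        .filter(|p| p.replicaset_id == my_replicaset_id)
        .map(|p| p.uri.clone())
        .collect()
}

fn on_topology_change(my_replicaset_id: &str, peers: &[Peer]) {
    let replication = rebuild_replication(my_replicaset_id, peers);
    // In the real code this goes through Tarantool, roughly:
    // box.cfg { replication = {"host1:3301", "host2:3301", ...} }
    box_cfg_set_replication(&replication);
}

fn box_cfg_set_replication(uris: &[String]) {
    // Placeholder: the actual call into Tarantool is omitted here.
    println!("box.cfg{{ replication = {:?} }}", uris);
}
```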
-
- Jun 01, 2022
-
-
Sergey V authored
* Make the `--cluster-id` CLI option mandatory.
* Handle cluster_id mismatch in raft_join: when an instance attempts to join the cluster and its `--cluster-id` parameter doesn't match the cluster_id of the cluster, an error is raised inside the raft_join handler.
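A minimal sketch of the mismatch check, with made-up types (not the actual picodata definitions):

```rust
struct JoinRequest {
    instance_id: String,
    cluster_id: String,
}

#[derive(Debug)]
enum JoinError {
    ClusterIdMismatch { expected: String, actual: String },
}

/// The cluster_id carried in the join request must match the cluster_id this
/// cluster was bootstrapped with; otherwise the handler refuses the join.
fn raft_join(req: &JoinRequest, our_cluster_id: &str) -> Result<(), JoinError> {
    if req.cluster_id != our_cluster_id {
        return Err(JoinError::ClusterIdMismatch {
            expected: our_cluster_id.to_string(),
            actual: req.cluster_id.clone(),
        });
    }
    // ... proceed with the actual join logic ...
    Ok(())
}
```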
-
- May 31, 2022
-
-
Georgy Moshkin authored
Previously the discovery algorithm would try to reach each known peer sequentially, requiring every request to succeed before the next one could be attempted. This would not work in some cases (see the test in the previous commit). The new algorithm instead makes a single attempt to reach each peer within a round, and peers that failed are retried in the next round of requests. This allows overall discovery to succeed even when some of the initial peers never respond.

Closes #54
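A schematic sketch of the round-based approach; the names, the `max_rounds` bound and the termination condition are simplifications, not picodata's actual code:

```rust
/// One attempt per pending peer per round; peers that fail stay in `pending`
/// and are retried in the next round instead of aborting discovery.
fn discover(
    initial_peers: &[String],
    try_reach: impl Fn(&str) -> bool,
    max_rounds: usize,
) -> Vec<String> {
    let mut pending: Vec<String> = initial_peers.to_vec();
    let mut reached: Vec<String> = Vec::new();

    for _round in 0..max_rounds {
        pending.retain(|peer| {
            if try_reach(peer) {
                reached.push(peer.clone());
                false // drop from pending
            } else {
                true // keep for the next round
            }
        });
        if pending.is_empty() {
            break;
        }
        // (a small delay between rounds is omitted here)
    }
    reached
}
```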
-
- May 30, 2022
-
-
-
Yaroslav Dynnikov authored
Picodata already assigns a `replicaset_id` to an instance when it joins, but it wasn't used in Tarantool `box.cfg` yet. Now it is.

It's also important to set up the listen port in `start_join` immediately. Without it Tarantool would get stuck waiting for a connection to itself.

Part of https://git.picodata.io/picodata/picodata/picodata/-/issues/52
-
- May 23, 2022
-
-
Yaroslav Dynnikov authored
When bootstrapping an instance, there are two possible execution paths - `start_boot` and `start_join`. While `start_join` takes all uuids from JoinResponse, `start_boot` already deals with a bootstrapped `box.cfg` (it's done in `start_discover`, refer to [1]). To make uuids consistent across `box.cfg` and the topology module, the `start_boot` stage is preceded by a rebootstrap. This case is also covered with a pytest.

[1] doc/clustering.md
-
Yaroslav Dynnikov authored
- Add corresponding field to the Peer struct.
- Generate it in the topology module.
- Use it in `box.cfg`.

Close https://git.picodata.io/picodata/picodata/picodata/-/issues/51
-
Yaroslav Dynnikov authored
- Generate it in the topology module.
- Persist it in the `raft_group` space.
- Transfer it in `JoinResponse`.
- Use it in `box.cfg`.

Close https://git.picodata.io/picodata/picodata/picodata/-/issues/50
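A hypothetical shape of the `Peer` struct after these two commits; the field names are assumptions for illustration, not the actual picodata definitions:

```rust
struct Peer {
    raft_id: u64,
    instance_id: String,
    replicaset_id: String,
    // Generated by the topology module, persisted in the `raft_group` space,
    // shipped to the joining instance in JoinResponse and then passed to
    // `box.cfg` so that Tarantool and the topology module agree on identity.
    instance_uuid: String,
    replicaset_uuid: String,
    peer_address: String,
}
```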
-
- May 21, 2022
-
-
Yaroslav Dynnikov authored
It's necessary to encapsulate the topology management logic away from main.
-
- May 20, 2022
-
-
Yaroslav Dynnikov authored
Both JoinRequest and JoinResponse are going to be used in other modules. Move them one level up, from `traft::node::*` to `traft::*`.
-
- May 13, 2022
-
- May 12, 2022
-
-
Yaroslav Dynnikov authored
There were some problems with join request synchronization. Raft forbids proposing a configuration change if there's another one uncommitted (see [1]); in that case it replaces the `EntryConfChange` with an `EntryNormal`. This can happen at any time, even without bugs in the code, due to network partitioning, and it's the responsibility of picodata to handle it properly.

Earlier, there was no way to wait until raft leaves the joint state. That slowed down cluster assembly and made it race-prone.

Waiting for cluster readiness is also important in tests. Some operations (the most important among them is leader switching) are impossible until an instance finishes its promotion to a voter. For instance, raft rejects `MsgTimeoutNow` unless the node is promotable (see [2]). This makes some testing scenarios flaky.

This patch introduces a new synchronization primitive - `JointStateLatch`. The latch is held on the leader and is locked upon `raw_node.propose_conf_change()`. It's unlocked only when the second (implicit) conf change that represents leaving the joint state is committed. The latch also tracks the index of the corresponding `EntryConfChange`, so even if raft ignores it for any reason, the latch is still unlocked as soon as the committed index exceeds the one stored in the latch.

[1] https://github.com/tikv/raft-rs/blob/v0.6.0/src/raft.rs#L2014-L2026
[2] https://github.com/tikv/raft-rs/blob/v0.6.0/src/raft.rs#L2314

Close https://git.picodata.io/picodata/picodata/picodata/-/issues/47
Close https://git.picodata.io/picodata/picodata/picodata/-/issues/53
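A much-simplified sketch of the `JointStateLatch` idea (not the real implementation):

```rust
/// Keeps the raft index of the proposed `EntryConfChange` and stays locked
/// until raft has left the joint state, or until the committed index moves
/// past the latched one (the proposal was ignored or replaced).
struct JointStateLatch {
    index: Option<u64>,
}

impl JointStateLatch {
    fn new() -> Self {
        Self { index: None }
    }

    /// Locked on the leader right after `raw_node.propose_conf_change()`.
    fn lock(&mut self, conf_change_index: u64) {
        assert!(self.index.is_none(), "joint state latch is already locked");
        self.index = Some(conf_change_index);
    }

    fn is_locked(&self) -> bool {
        self.index.is_some()
    }

    /// Called for every committed entry. Unlocks when the implicit conf change
    /// that leaves the joint state is committed, or, as a fallback, when the
    /// committed index goes past the latched one even though the proposal was
    /// dropped (raft may have replaced it with an EntryNormal).
    fn on_committed_entry(&mut self, entry_index: u64, leaves_joint_state: bool) {
        if let Some(idx) = self.index {
            if leaves_joint_state || entry_index > idx {
                self.index = None;
            }
        }
    }
}
```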
-
Yaroslav Dynnikov authored
Waiting for a valid `leader_id` on a node isn't enough: it may already have one but still be a learner. Instead, the fixture should wait until the node is promoted to a voter.
-
- Apr 27, 2022
-
-
Yaroslav Dynnikov authored
-