- Jul 08, 2022
-
-
In some states `raft-rs` silently ignores a ReadIndex request. Check for those states beforehand instead of waiting for the timeout. See for details:
- <https://github.com/tikv/raft-rs/blob/v0.6.0/src/raft.rs#L2058>
- <https://github.com/tikv/raft-rs/blob/v0.6.0/src/raft.rs#L2323>
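A minimal sketch of the fail-fast check described here; the state names and the `read_index_is_safe` helper are made up for illustration, and the exact conditions raft-rs checks are at the links above:

```rust
// Illustrative only: decide up front whether a ReadIndex request has a chance
// of being answered, instead of sending it and waiting for a timeout.
#[allow(dead_code)]
enum RaftState {
    Leader,
    Follower { leader_id: Option<u64> },
    Candidate,
    PreCandidate,
}

/// raft-rs drops ReadIndex messages in some states, e.g. when there is no
/// known leader to forward the request to. Checking the cheap preconditions
/// first lets the caller fail fast instead of spamming the leader.
fn read_index_is_safe(state: &RaftState) -> bool {
    match state {
        RaftState::Leader => true,
        RaftState::Follower { leader_id } => leader_id.is_some(),
        // Candidates have no leader to forward the request to.
        RaftState::Candidate | RaftState::PreCandidate => false,
    }
}

fn main() {
    let state = RaftState::Follower { leader_id: None };
    if !read_index_is_safe(&state) {
        eprintln!("ReadIndex would be ignored; retry after a leader is elected");
    }
}
```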
-
No need to DDoS a poor raft leader, especially when it's already under high load.
-
Join requests should be made without timeout restrictions. Otherwise it's impossible to tell a retried request from an instance_id collision: retried requests never succeed and return the "already joined" error.
-
- Jul 07, 2022
-
-
-
Georgy Moshkin authored
-
-
Valentin Syrovatskiy authored
-
- Jul 06, 2022
-
-
Yaroslav Dynnikov authored
It makes no sense to perform the wait, and `docs/clustering.md` already states that it's skipped. Besides, the result is unused.
-
- Jul 05, 2022
-
-
Georgy Moshkin authored
-
Yaroslav Dynnikov authored
Initially, JoinResponse was supplied with every single peer in the cluster. If the cluster is big enough, this results in the "max tuple size limit reached" error. In fact, only voters are needed for raft to operate normally.
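An illustrative sketch of the change, assuming hypothetical `Peer` and `JoinResponse` definitions (not the actual picodata types):

```rust
// Only raft voters go into the join response: that is enough for the new
// instance to work with the raft quorum, and it keeps the response tuple
// well below the size limit even in large clusters.
#[derive(Clone)]
struct Peer {
    raft_id: u64,
    uri: String,
    is_voter: bool,
}

struct JoinResponse {
    voters: Vec<Peer>,
}

fn make_join_response(all_peers: &[Peer]) -> JoinResponse {
    JoinResponse {
        voters: all_peers.iter().filter(|p| p.is_voter).cloned().collect(),
    }
}
```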
-
Georgy Moshkin authored
-
- Jul 01, 2022
-
-
Georgy Moshkin authored
-
Georgy Moshkin authored
-
- Jun 30, 2022
-
-
Georgy Moshkin authored
-
Georgy Moshkin authored
-
Yaroslav Dynnikov authored
-
- Jun 29, 2022
-
-
Yaroslav Dynnikov authored
-
Yaroslav Dynnikov authored
It allows us to use `raft::EntryType` directly, without obscure i32 tricks.
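Roughly, the difference looks like this; the `get_entry_type()` accessor is an assumption here and may differ between raft-rs codecs and versions:

```rust
use raft::eraftpb::{Entry, EntryType};

// Before: the entry type was kept as a raw i32 and compared against magic
// numbers, e.g. `entry.entry_type == 1`.
// After: the enum can be used directly.
fn is_conf_change(entry: &Entry) -> bool {
    entry.get_entry_type() == EntryType::EntryConfChange
}
```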
-
- Jun 27, 2022
-
-
Yaroslav Dynnikov authored
-
Valentin Syrovatskiy authored
-
- Jun 23, 2022
-
-
Deactivation means that the instance:
- is demoted to learner (if it wasn't one already)
- is marked "inactive" so that it's ignored when determining the number of voters required for the cluster
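A rough sketch of the second point; the names and the voter-count rule below are illustrative assumptions, not the exact picodata logic:

```rust
// "Inactive" peers are simply excluded when deciding how many raft voters
// the cluster should have.
struct Peer {
    raft_id: u64,
    is_active: bool,
}

/// Pick the desired number of voters based only on active instances.
fn desired_voter_count(peers: &[Peer]) -> usize {
    let active = peers.iter().filter(|p| p.is_active).count();
    // A common rule of thumb: 1, 3 or 5 voters depending on cluster size.
    match active {
        0..=1 => 1,
        2..=4 => 3,
        _ => 5,
    }
}
```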
-
- Jun 20, 2022
-
-
Georgy Moshkin authored
-
-
-
Valentin Syrovatskiy authored
-
- Jun 17, 2022
-
-
Bootstrapping the replication was implemented in 2dac77c5, but it was configured on the new instance only. The old instance (the one that joined earlier) couldn't update `box.cfg({replication})` until now.

Close https://git.picodata.io/picodata/picodata/picodata/-/issues/52
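A hand-wavy sketch of the fix under assumed names (`Peer` and `box_cfg_set_replication` are placeholders, not the real picodata/tarantool-module API): every instance of the replicaset rebuilds its replication list when the topology changes, not only the newcomer.

```rust
struct Peer {
    replicaset_id: String,
    uri: String,
}

/// Collect the URIs of all peers from the same replicaset.
fn rebuild_replication(my_replicaset_id: &str, peers: &[Peer]) -> Vec<String> {
    peers
        .iter()
        .filter(|p| p.replicaset_id == my_replicaset_id)
        .map(|p| p.uri.clone())
        .collect()
}

fn on_topology_change(my_replicaset_id: &str, peers: &[Peer]) {
    let replication = rebuild_replication(my_replicaset_id, peers);
    // In the real code this goes through Tarantool, roughly:
    // box.cfg { replication = {"host1:3301", "host2:3301", ...} }
    box_cfg_set_replication(&replication);
}

fn box_cfg_set_replication(uris: &[String]) {
    // Placeholder: the actual call into Tarantool is omitted here.
    println!("box.cfg{{ replication = {:?} }}", uris);
}
```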
-
- Jun 01, 2022
-
-
Sergey V authored
* Make the `--cluster-id` CLI option mandatory.
* Handle cluster_id mismatch in raft_join: when an instance attempts to join the cluster and its `--cluster-id` parameter doesn't match the cluster_id of the cluster, an error is raised inside the raft_join handler.
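A minimal sketch of the mismatch check, with made-up types (not the actual picodata definitions):

```rust
struct JoinRequest {
    instance_id: String,
    cluster_id: String,
}

#[derive(Debug)]
enum JoinError {
    ClusterIdMismatch { expected: String, actual: String },
}

/// The cluster_id carried in the join request must match the cluster_id this
/// cluster was bootstrapped with; otherwise the handler refuses the join.
fn raft_join(req: &JoinRequest, our_cluster_id: &str) -> Result<(), JoinError> {
    if req.cluster_id != our_cluster_id {
        return Err(JoinError::ClusterIdMismatch {
            expected: our_cluster_id.to_string(),
            actual: req.cluster_id.clone(),
        });
    }
    // ... proceed with the actual join logic ...
    Ok(())
}
```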
-
- May 31, 2022
-
-
Georgy Moshkin authored
Previously the discovery algorithm would try to reach each known peer sequentially, requiring every request to succeed before the next one could be attempted. This would not work in some cases (see the test in the previous commit). The new algorithm instead makes a single attempt to reach each peer within a round, and peers that failed are retried in the next round of requests. This allows overall discovery to succeed even when some of the initial peers never respond.

Closes #54
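A schematic sketch of the round-based approach; the names, the `max_rounds` bound and the termination condition are simplifications, not picodata's actual code:

```rust
/// One attempt per pending peer per round; peers that fail stay in `pending`
/// and are retried in the next round instead of aborting discovery.
fn discover(
    initial_peers: &[String],
    try_reach: impl Fn(&str) -> bool,
    max_rounds: usize,
) -> Vec<String> {
    let mut pending: Vec<String> = initial_peers.to_vec();
    let mut reached: Vec<String> = Vec::new();

    for _round in 0..max_rounds {
        pending.retain(|peer| {
            if try_reach(peer) {
                reached.push(peer.clone());
                false // drop from pending
            } else {
                true // keep for the next round
            }
        });
        if pending.is_empty() {
            break;
        }
        // (a small delay between rounds is omitted here)
    }
    reached
}
```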
-
- May 30, 2022
-
-
-
Yaroslav Dynnikov authored
Picodata already assigns a `replicaset_id` to an instance when it joins, but it wasn't used in Tarantool `box.cfg` yet. Now it is.

It's also important to set up the listen port in `start_join` immediately. Without it Tarantool would get stuck waiting for a connection to itself.

Part of https://git.picodata.io/picodata/picodata/picodata/-/issues/52
-
- May 23, 2022
-
-
Yaroslav Dynnikov authored
When bootstrapping an instance, there are two possible execution paths - `start_boot` and `start_join`. While `start_join` takes all uuids from JoinResponse, `start_boot` already deals with a bootstrapped `box.cfg` (it's done in `start_discover`, refer to [1]). To make uuids consistent across `box.cfg` and the topology module, the `start_boot` stage is preceded by a rebootstrap. This case is also covered with a pytest.

[1] doc/clustering.md
-
Yaroslav Dynnikov authored
- Add corresponding field to the Peer struct.
- Generate it in the topology module.
- Use it in `box.cfg`.

Close https://git.picodata.io/picodata/picodata/picodata/-/issues/51
-
Yaroslav Dynnikov authored
- Generate it in the topology module.
- Persist it in the `raft_group` space.
- Transfer it in `JoinResponse`.
- Use it in `box.cfg`.

Close https://git.picodata.io/picodata/picodata/picodata/-/issues/50
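A hypothetical shape of the `Peer` struct after these two commits; the field names are assumptions for illustration, not the actual picodata definitions:

```rust
struct Peer {
    raft_id: u64,
    instance_id: String,
    replicaset_id: String,
    // Generated by the topology module, persisted in the `raft_group` space,
    // shipped to the joining instance in JoinResponse and then passed to
    // `box.cfg` so that Tarantool and the topology module agree on identity.
    instance_uuid: String,
    replicaset_uuid: String,
    peer_address: String,
}
```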
-
- May 21, 2022
-
-
Yaroslav Dynnikov authored
It's necessary to encapsulate the topology management logic away from main.
-
- May 20, 2022
-
-
Yaroslav Dynnikov authored
Both JoinRequest and JoinResponse are going to be used in other modules. Move them one level up, from `traft::node::*` to `traft::*`.
-
- May 13, 2022
-
- May 12, 2022
-
-
Yaroslav Dynnikov authored
There were some problems with join request synchronization. Raft forbids proposing a configuration change if there's another one uncommitted (see [1]); in that case it replaces the `EntryConfChange` with an `EntryNormal`. This can happen at any time, even without bugs in the code, due to network partitioning, and it's the responsibility of picodata to handle it properly.

Earlier, there was no way to wait until raft leaves the joint state. That slowed down cluster assembly and made it race-prone.

Waiting for cluster readiness is also important in tests. Some operations (the most important among them is leader switching) are impossible until an instance finishes its promotion to a voter. For instance, raft rejects `MsgTimeoutNow` unless the node is promotable (see [2]). This makes some testing scenarios flaky.

This patch introduces a new synchronization primitive - `JointStateLatch`. The latch is held on the leader and is locked upon `raw_node.propose_conf_change()`. It's unlocked only when the second (implicit) conf change that represents leaving the joint state is committed. The latch also tracks the index of the corresponding `EntryConfChange`, so even if raft ignores it for any reason, the latch is still unlocked as soon as the committed index exceeds the one stored in the latch.

[1] https://github.com/tikv/raft-rs/blob/v0.6.0/src/raft.rs#L2014-L2026
[2] https://github.com/tikv/raft-rs/blob/v0.6.0/src/raft.rs#L2314

Close https://git.picodata.io/picodata/picodata/picodata/-/issues/47
Close https://git.picodata.io/picodata/picodata/picodata/-/issues/53
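A much-simplified sketch of the `JointStateLatch` idea (not the real implementation):

```rust
/// Keeps the raft index of the proposed `EntryConfChange` and stays locked
/// until raft has left the joint state, or until the committed index moves
/// past the latched one (the proposal was ignored or replaced).
struct JointStateLatch {
    index: Option<u64>,
}

impl JointStateLatch {
    fn new() -> Self {
        Self { index: None }
    }

    /// Locked on the leader right after `raw_node.propose_conf_change()`.
    fn lock(&mut self, conf_change_index: u64) {
        assert!(self.index.is_none(), "joint state latch is already locked");
        self.index = Some(conf_change_index);
    }

    fn is_locked(&self) -> bool {
        self.index.is_some()
    }

    /// Called for every committed entry. Unlocks when the implicit conf change
    /// that leaves the joint state is committed, or, as a fallback, when the
    /// committed index goes past the latched one even though the proposal was
    /// dropped (raft may have replaced it with an EntryNormal).
    fn on_committed_entry(&mut self, entry_index: u64, leaves_joint_state: bool) {
        if let Some(idx) = self.index {
            if leaves_joint_state || entry_index > idx {
                self.index = None;
            }
        }
    }
}
```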
-
Yaroslav Dynnikov authored
Waiting for a valid `leader_id` on a node isn't enough: it may already have one but still be a learner. Instead, the fixture should wait until the node is promoted to a voter.
-
- Apr 27, 2022
-
-
Yaroslav Dynnikov authored
-