- May 17, 2022
-
-
This patch covers one more case: a discovery request handled by an instance whose discovery module is uninitialized.
-
- May 13, 2022
-
-
- May 12, 2022
-
-
Yaroslav Dynnikov authored
-
Yaroslav Dynnikov authored
Commit 1a3b5233 missed a bug. Iteration over instances could be aborted by an exception during teardown. As a result, garbage processes remained alive after pytest terminated.
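The failure mode above can be illustrated with a minimal sketch. It is not the actual fixture code: `kill_all` and the callbacks are hypothetical names, and the only point is that one failing teardown call must not prevent the remaining instances from being killed.

```python
from typing import Callable, Iterable, List


def kill_all(killers: Iterable[Callable[[], None]]) -> List[Exception]:
    """Call every teardown callback, even if some of them raise.

    Hypothetical helper: each callback stands for killing one instance.
    """
    errors: List[Exception] = []
    for kill in killers:
        try:
            kill()
        except Exception as exc:
            # Remember the failure, but keep iterating so that the
            # remaining instances are still terminated.
            errors.append(exc)
    return errors


# Demo: the second callback fails, the third one still runs.
killed: List[str] = []


def broken() -> None:
    raise RuntimeError("process is already dead")


errors = kill_all([lambda: killed.append("i1"), broken, lambda: killed.append("i2")])
```

Collecting the exceptions instead of swallowing them lets the caller report them once every instance has been handled.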
-
Yaroslav Dynnikov authored
There were some problems with synchronizing join requests. Raft forbids proposing a configuration change while another one is uncommitted (see [1]). In that case, it replaces the `EntryConfChange` with an `EntryNormal`. This can happen at any time, even without bugs in the code, due to network partitioning, and it's the responsibility of the picodata product to handle it properly.

Earlier, there was no way to wait until raft leaves the joint state. This slowed down cluster assembly and made it race-prone.

Waiting for cluster readiness is also important in tests. Some operations (most importantly, leader switching) are impossible until an instance finishes promotion to a voter. For instance, raft rejects `MsgTimeoutNow` unless the node is promotable (see [2]). This makes some testing scenarios flaky.

This patch introduces a new synchronization primitive, `JointStateLatch`. The latch is held on the leader and is locked upon `raw_node.propose_conf_change()`. It's unlocked only when the second (implicit) conf change, which represents leaving the joint state, is committed. The latch also tracks the index of the corresponding `EntryConfChange`. Even if raft ignores it for any reason, the latch is still unlocked as soon as the committed index exceeds the index tracked by the latch.

[1] https://github.com/tikv/raft-rs/blob/v0.6.0/src/raft.rs#L2014-L2026
[2] https://github.com/tikv/raft-rs/blob/v0.6.0/src/raft.rs#L2314

Close https://git.picodata.io/picodata/picodata/picodata/-/issues/47
Close https://git.picodata.io/picodata/picodata/picodata/-/issues/53
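The latch behaviour described in this message can be modelled roughly as follows. This is a simplified Python model for illustration only, not the Rust implementation: the method names `lock`, `on_leave_joint` and `on_commit` are assumptions, and only the two unlock conditions come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JointStateLatch:
    """Toy model of the latch; method names are hypothetical."""

    # Index of the proposed EntryConfChange, or None when unlocked.
    index: Optional[int] = None

    def is_locked(self) -> bool:
        return self.index is not None

    def lock(self, index: int) -> None:
        # Called by the leader when it proposes a conf change.
        if self.is_locked():
            raise RuntimeError("another conf change is still in flight")
        self.index = index

    def on_leave_joint(self) -> None:
        # The implicit second conf change has been committed.
        self.index = None

    def on_commit(self, committed_index: int) -> None:
        # Raft may have ignored the entry (e.g. replaced it with an
        # EntryNormal). Unlock anyway once the committed index
        # exceeds the tracked one.
        if self.index is not None and committed_index > self.index:
            self.index = None


# Demo: the entry at index 10 is proposed, then the log moves past it.
latch = JointStateLatch()
latch.lock(10)
latch.on_commit(10)
locked_at_10 = latch.is_locked()
latch.on_commit(11)
locked_at_11 = latch.is_locked()
```

In this model a second `lock` call fails while the latch is held; a real implementation would more likely make the caller wait instead.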
-
Yaroslav Dynnikov authored
Waiting for a valid `leader_id` on a node isn't enough. It may already have one, but still be a Learner. Instead, the fixture should wait until the node is promoted to voter.
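A fixture of this kind typically polls a condition with a deadline. The sketch below shows only the general shape: `wait_until` and `FakeNode.is_voter` are hypothetical names, and the real fixture would query the instance's raft status instead of a counter.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.05,
) -> None:
    """Poll `predicate` until it returns True or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met in time")
        time.sleep(interval)


class FakeNode:
    """Stand-in for an instance that becomes a voter after a few polls."""

    def __init__(self, polls_until_voter: int) -> None:
        self.polls_until_voter = polls_until_voter
        self.polls = 0

    def is_voter(self) -> bool:
        self.polls += 1
        return self.polls > self.polls_until_voter


node = FakeNode(polls_until_voter=2)
wait_until(node.is_voter, timeout=1.0, interval=0.01)
```

The predicate checks the role rather than the presence of a `leader_id`, which is the distinction this commit is about.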
-
Yaroslav Dynnikov authored
The assertion `status == "Leader"` came first, so the `raft_timeout_now` call was unreachable.
-
- May 11, 2022
-
-
Yaroslav Dynnikov authored
1. Print logs to stderr so that they interleave with tarantool logs.
2. Fix `cluster.__repr__()`.
-
Yaroslav Dynnikov authored
1. Review `Pipfile`:
   - Remove unused `filelock`;
   - Install `mypy`, a static type checker for Python.
2. Add new command `pipenv run lint`.
3. Enable `mypy` in CI. Fix reported errors in `test_basics.py`.
4. Renew readme.
-
Yaroslav Dynnikov authored
In `conftest.py`:

- *Add* function `xdist_worker_number`. It converts `str(worker_id)` into `int`. It serves as a substitute for `session_data_mutex` for parallel test runs.
- *Change* `normalize_net_box_result`. Replace the function with a decorator. Also, handle typical kinds of picodata responses. Extensively test it in `test_basics.py::test_eval/call_normalization`.
- *Add* dataclass `RaftStatus`. It shouldn't be used outside `conftest.py`. It only makes assertions briefer in logs and more understandable in code.
- *Add* all raft stuff into the `Instance` class. This implies `raft_propose_eval`, `assert_raft_status` and `promote_or_fail` moved from `util.py`.
- *Change* fixture `compile`. No need for extra logic since commit 59c31cb8.
- *Preserve* fixture `binary_path`.
- *Remove* fixtures `session_data_mutex` and `run_id`. Superseded by `xdist_worker_number`.
- *Remove* fixtures `run_cluster` and `run_instance`. Superseded by `cluster.deploy(...)`.
- *Remove* function `wait_tcp_port`. It's never enough to check a raw socket. Superseded by `instance.wait_ready()`.
- Give the instances clean names `i1, i2, ...`, and simple addresses `127.7.0.1:3301`. For parallel test runs use different IPs `127.7.N.1` etc.

In `test_basics.py`:

- *Add* `test_xdist_worker_number`.
- *Add* `test_call_normalization` and `test_eval_normalization`.
- *Add* `test_process_management`. It's brand new, never implemented in luatest before.
- *Rename* `test_single_instance_raft_eval` to `test_propose_eval` and extend it with an additional assertion from `single_test.lua`.
- *Remove* `test_instance`. A part of its logic is moved to `test_call/eval_normalization`. The other part is rewritten and extended in `test_process_management`.
- *Remove* `test_cluster`. It was completely useless because of inappropriate synchronization and no valuable assertions.
- *Remove* `test_propose_eval`. It wasn't that useful, but failed because of inappropriate synchronization.

In `test_couple.py`:

- *Preserve* `test_follower_proposal` and `test_failover`. Just slightly refactor according to the new `conftest.py` API.

In `util.py` (completely removed):

- *Remove* decorator `retry`. Needless.
- *Remove* decorator `retry_on_network_errors`. An inappropriate predicate didn't catch Lua errors.
- *Remove* everything related to raft. Move it into `conftest.py`.

Part of https://git.picodata.io/picodata/picodata/picodata/-/issues/59
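The worker-number conversion and the per-worker addressing can be sketched as below. This is a guess at the shape, not the code from `conftest.py`: pytest-xdist reports `worker_id` as `"master"` for a non-distributed run and `"gwN"` for worker N, but mapping `"master"` to `0` and the `instance_address` helper (including the octet layout) are assumptions made for the example.

```python
import re


def xdist_worker_number(worker_id: str) -> int:
    """Convert a pytest-xdist worker id ("master", "gw0", "gw1", ...) to int."""
    if worker_id == "master":
        # Assumption: a non-distributed run shares slot 0.
        return 0
    match = re.fullmatch(r"gw(\d+)", worker_id)
    if match is None:
        raise ValueError(f"unexpected worker_id: {worker_id!r}")
    return int(match.group(1))


def instance_address(worker_number: int, instance_number: int, port: int = 3301) -> str:
    """Hypothetical helper: one IP range per worker, e.g. 127.7.N.1."""
    return f"127.7.{worker_number}.{instance_number}:{port}"


address = instance_address(xdist_worker_number("gw2"), 1)
```

Giving every worker its own loopback range removes the need for a shared mutex, since parallel runs never compete for the same address.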
-
- Apr 28, 2022
-
-
Yaroslav Dynnikov authored
-
Yaroslav Dynnikov authored
-
Sergey V authored
-
- Apr 24, 2022
-
-
Yaroslav Dynnikov authored
We don't want a child process to live without the supervisor. Usually, the supervisor waits for the child forever and forwards termination signals to it. But if the parent is killed with a SIGKILL, there's no way to pass anything. This patch supplies the child process with a `supervisor_fuse` fiber. It tries to read from a pipe (that the supervisor never writes to), and if the writing end is closed, it means the supervisor has terminated. In this case, the child process terminates too.

Part of https://git.picodata.io/picodata/picodata/picodata/-/issues/56
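The pipe trick relies on a read returning end-of-file once every writing end is closed. A minimal Python sketch of that mechanism follows. It is not the actual fiber: the function name `supervisor_is_gone` is hypothetical, and closing the write end by hand only simulates the supervisor's death.

```python
import os


def supervisor_is_gone(read_fd: int) -> bool:
    """Block until the pipe reports EOF, i.e. all write ends are closed."""
    while True:
        chunk = os.read(read_fd, 1)
        if chunk == b"":
            # EOF: nobody holds the writing end anymore.
            return True
        # The supervisor never writes; ignore any unexpected bytes.


read_fd, write_fd = os.pipe()
os.close(write_fd)  # simulate the supervisor being killed
gone = supervisor_is_gone(read_fd)
os.close(read_fd)
```

After a real `fork`, the child must close its own copy of the writing end. Otherwise the read would never see EOF, because the child itself would still keep the pipe open.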
- Apr 18, 2022
-
-
Yaroslav Dynnikov authored
Remove the `prctl` dependency, which is not available on macOS.
-
Yaroslav Dynnikov authored
-
Yaroslav Dynnikov authored
This patch establishes a friendship between discovery and raft_join.
-
Yaroslav Dynnikov authored
feat: raft peer discovery PoC
chore: prevent C functions from being optimized out
feat: improve peer discovery
fix: fix tests after making instance_id arg mandatory
Smart supervision with fork
Make it work
Under development
IPC messages to supervisor
One more little step: entrypoint enum
Arrange IPC from child to supervisor
Remove tarantool_main macro
Persist snapshot
Fix some fresh bugs
Implement postjoin
Discovery under refactoring
Enhance discovery
Working on discovery
Fix all discovery bugs known so far
Draft join algorithm
Fail applying snapshot
Joining a learner works
Cleanup snapshot generation
Reorganize traft code and call join automatically
Change peer.commit_index type from option to u64
Implement autopromotion to voter
Implement read_index
Take read_index before self-promotion
Cleanup excess logs
Cleanup logs and code
Deep refactoring in progress
Finish refactoring db schema
Embed entries applying inside traft node

Refactor raft node communication
Replace fiber channel with a mailbox, which is a `Vec<_>` + fiber cond. It allows batching raft commands in a more predictable way and makes the code less error-prone.

Remove commented code

Simplify raft nodes interaction over net_box
Eliminate `traft::Message` struct because its internals aren't used. Instead, serialize `raft::Message` using protobuf.

Batch ConnectionPool requests
1. Send messages in batches.
2. Allow changing connection uri.
3. Close unused connections after `inactivity_timeout`.

Enhance the raft node
1. Collect results from raft node.
2. Fix initial bootstrap which used to fail due to fiber race.
3. Wrap raft storage operations in a transaction.

Bump tarantool module
Add documentation draft
Cleanup warnings
Try fixing tests
Fix test_storage_log
Fix test_traft_pool
Fix luatest single and couple
Start fixing threesome test
Implement concurrent join requests handling
-
- Apr 14, 2022
-
-
Sergey V authored
-
- Mar 10, 2022
-
-
Georgy Moshkin authored
-
- Mar 09, 2022
-
-
Georgy Moshkin authored
-
- Feb 22, 2022
-
-
Yaroslav Dynnikov authored
This patch introduces two tests:

1. `couple.test_failover` reproduces a heartbeat timeout on a follower that leads to a new election.
2. `threesome.test_leader_dispuption` simulates a disconnected follower. It shouldn't disrupt the leader in this case.

Leader death is already tested in `threesome.test_log_rollback`.

Close https://gitlab.com/picodata/picodata/picodata/-/issues/25
-
Yaroslav Dynnikov authored
Also, fix the code to pass the test:

1. Use `replace` instead of `insert` when appending entries. They may be overridden.
2. Increase the pool worker queue size. A raft node may send several messages within a single tick. Without this change the second message is dropped.

The second change is a poorly designed workaround. A constant limit doesn't seem to be both cost-effective and reliable at the same time, but we don't have a better solution at the moment.

Close https://gitlab.com/picodata/picodata/picodata/-/issues/26
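The difference between the two operations matters when a new leader overwrites uncommitted entries at an index the follower already has. The sketch below models the log as a Python dict keyed by index. It is only an illustration of insert-versus-replace semantics, not the actual storage code, and the `Entry` layout is an assumption.

```python
from typing import Dict, Tuple

Entry = Tuple[int, str]  # (term, payload); hypothetical layout


def insert(log: Dict[int, Entry], index: int, entry: Entry) -> None:
    """Fail on a duplicate key, like an insert into a space."""
    if index in log:
        raise KeyError(f"duplicate entry at index {index}")
    log[index] = entry


def replace(log: Dict[int, Entry], index: int, entry: Entry) -> None:
    """Overwrite whatever is stored at `index`."""
    log[index] = entry


# A follower holds an uncommitted entry from term 1 at index 2.
log: Dict[int, Entry] = {1: (1, "a"), 2: (1, "b")}

try:
    insert(log, 2, (2, "c"))
    insert_failed = False
except KeyError:
    insert_failed = True

# The leader from term 2 overrides index 2; replace handles it.
replace(log, 2, (2, "c"))
```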
-
Yaroslav Dynnikov authored
-
- Feb 21, 2022
-
-
Yaroslav Dynnikov authored
Follower makes a proposal. It's applied on both instances. This patch also introduces a new API `picolib.raft_status`. Close https://gitlab.com/picodata/picodata/picodata/-/issues/24
-
- Feb 18, 2022
-
-
Georgy Moshkin authored
+ tarantool-sys submodule
+ tarantool-patches directory
+ build.rs build script to patch and build tarantool
+ static linking with tarantool
+ refactoring for cli arguments
-
- Feb 15, 2022
-