- May 30, 2022
-
-
Yaroslav Dynnikov authored
-
Yaroslav Dynnikov authored
-
Yaroslav Dynnikov authored
Pytest has a feature to segregate setup, test, and teardown logs. The setup phase is considered to be the initialization of the fixtures. In order to split logs properly, `cluster.deploy()` is now called inside a fixture.
-
- May 26, 2022
-
-
Yaroslav Dynnikov authored
It's already formatted in conformity with the usual `cargo test` output. Also, remove unused command-line arguments from the `picodata test` command. Close https://git.picodata.io/picodata/picodata/picodata/-/issues/61
-
- May 25, 2022
-
-
Sergey V authored
-
- May 24, 2022
-
-
Yaroslav Dynnikov authored
-
Yaroslav Dynnikov authored
The intention is to eliminate ambiguities in the `Instance` API. Make it more like the `subprocess` module with regard to the `kill` and `terminate` functions.
-
Yaroslav Dynnikov authored
The behavior of `killpg` differs slightly between Mac and Linux. For some reason, on Mac `killpg` returns the error EPERM when sending a signal to a zombie process. That is the reason for the `test_process_management` failure on Mac: there's a small gap between killing the child and the subreaper calling `waitpid`. Now pytest handles this exception properly.

Close https://git.picodata.io/picodata/picodata/picodata/-/issues/70

See also:

- Stackoverflow: Why would `killpg` return "not permitted" when ownership is correct?
  https://stackoverflow.com/questions/12521705/why-would-killpg-return-not-permitted-when-ownership-is-correct

- Linux `man 2 killpg`: https://linux.die.net/man/2/killpg#Notes

  > Notes
  >
  > There are various differences between the permission checking in
  > BSD-type systems and System V-type systems. See the POSIX rationale
  > for kill(). A difference not mentioned by POSIX concerns the return
  > value EPERM: BSD documents that no signal is sent and EPERM returned
  > when the permission check failed for **at least one** target process,
  > while POSIX documents EPERM only when the permission check failed for
  > **all** target processes.

- MacOS `man 2 killpg`: https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/killpg.2.html

  > [EPERM] The sending process is not the super-user and
  > **one or more** of the target processes has an effective
  > user ID different from that of the sending process.

- Linux `man 2 kill`: https://linux.die.net/man/2/kill

  > EPERM The process does not have permission to send the signal
  > *to any* of the target processes.

- Process states in Linux: https://kerneltalks.com/linux/process-states-in-linux/

- Reproduce killpg returning EPERM on MacOS: https://git.picodata.io/picodata/picodata/picodata/-/snippets/7
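The commit message does not show how the exception is handled. As a minimal sketch only, a test helper could treat EPERM from `killpg` the same way as a missing process group. The helper name `kill_group` and its return convention are assumptions made for illustration, not the actual `conftest.py` code.

```python
import os
import signal


def kill_group(pgid: int, sig: int = signal.SIGKILL) -> bool:
    """Send `sig` to a process group; return True if the call succeeded.

    Hypothetical helper. It tolerates two errors:
    - ESRCH (ProcessLookupError): the group no longer exists.
    - EPERM (PermissionError): seen on macOS when the group contains
      only a zombie that has not been reaped by `waitpid` yet.
    """
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        return False
    except PermissionError:
        return False
    return True
```

Swallowing EPERM unconditionally would also hide genuine permission problems, so a real implementation may want to apply this only to process groups the test suite has spawned itself.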
-
- May 23, 2022
-
-
Yaroslav Dynnikov authored
Pytest supports running tests in parallel using the `xdist` plugin. To support it in Picodata, port collisions must be avoided. So far, each worker has been assigned a dedicated IP address `127.7.n.1`, where `n = xdist_worker_number`. Unfortunately, that doesn't work on MacOS, because Mac doesn't provide any loopback aliases except `127.0.0.1` by default. This patch provides different address generation logic. The `subnet` parameter is superseded by `base_port`, which is `3300 + n * 100`. This way, every pytest (xdist) worker gets a dedicated port range: `[3301, 3399]`, `[3401, 3499]` and so on. Close https://git.picodata.io/picodata/picodata/picodata/-/issues/65
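The port arithmetic above can be sketched as follows. The worker id format (`gw0`, `gw1`, ... or `master` when xdist is not distributing) is what pytest-xdist reports. The function bodies and the `port_range` helper are illustrative assumptions and are not taken from `conftest.py`.

```python
import re


def xdist_worker_number(worker_id: str) -> int:
    """Convert an xdist worker id such as "gw3" into the integer 3.

    "master" is what pytest-xdist reports for a non-distributed run;
    it is mapped to 0 here (an assumption).
    """
    if worker_id == "master":
        return 0
    match = re.fullmatch(r"gw(\d+)", worker_id)
    if match is None:
        raise ValueError(f"unexpected worker id: {worker_id!r}")
    return int(match.group(1))


def base_port(worker_number: int) -> int:
    """Base port as described in the commit: 3300 + n * 100."""
    return 3300 + worker_number * 100


def port_range(worker_number: int) -> range:
    """Hypothetical helper: ports base+1 .. base+99 of one worker."""
    base = base_port(worker_number)
    return range(base + 1, base + 100)
```

With this scheme, worker 0 uses ports 3301 to 3399 and worker 1 uses 3401 to 3499, so the ranges never overlap as long as a test needs fewer than 100 ports.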
-
Yaroslav Dynnikov authored
When bootstrapping an instance, there are two possible execution paths: `start_boot` and `start_join`. While `start_join` takes all uuids from JoinResponse, `start_boot` already deals with a bootstrapped `box.cfg` (it's done in `start_discover`, refer to [1]). In order to make uuids consistent across `box.cfg` and the topology module, the `start_boot` stage is preceded by a rebootstrap. This case is also covered by a pytest test.

[1] doc/clustering.md
-
Yaroslav Dynnikov authored
Follow-up for https://git.picodata.io/picodata/picodata/picodata/-/issues/50
-
- May 20, 2022
-
-
Yaroslav Dynnikov authored
The implementation of `net_box` in `tarantool-module` resolves hostnames with the `to_socket_addrs` function, which is blocking. Pytest uses fake addresses in one test, and sometimes this results in a 5-second block and a consequent test failure. This patch only provides a workaround: it makes the connection fail even before the blocking DNS request. See also: - https://git.picodata.io/picodata/picodata/tarantool-module/-/issues/81
-
- May 17, 2022
-
-
Yaroslav Dynnikov authored
Before this patch, pytest used to launch all instances in a clean environment. It prevented running with `PICODATA_LOG_LEVEL=verbose`.
-
This patch covers one more case: a discovery request is handled by an instance that has the discovery module uninitialized.
-
- May 13, 2022
-
-
- May 12, 2022
-
-
Yaroslav Dynnikov authored
-
Yaroslav Dynnikov authored
Commit 1a3b5233 missed a bug. Iteration over instances could be aborted by an exception during teardown. This left garbage processes alive after pytest termination.
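One common way to keep a teardown loop from being aborted is to catch exceptions per instance and report them afterwards. This is a sketch of that idea only. The function `kill_all` and the `kill()` method it calls are assumed names, and the actual fix in the commit may look different.

```python
from typing import Iterable, List


def kill_all(instances: Iterable) -> List[Exception]:
    """Call `kill()` on every instance, even if some calls raise.

    Hypothetical helper: errors are collected instead of propagated,
    so one failing instance cannot leave the others running.
    """
    errors: List[Exception] = []
    for instance in instances:
        try:
            instance.kill()
        except Exception as exc:
            errors.append(exc)
    return errors
```

The caller can then decide whether to log the collected errors or fail the test session once all processes have been dealt with.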
-
Yaroslav Dynnikov authored
There were some problems with join requests synchronization. Raft forbids proposing a configuration change if there's another one uncommitted (see [1]). In that case, it replaces an `EntryConfChange` with an `EntryNormal`. It could happen at any time, even without bugs in the code, due to network partitioning, and it's the responsibility of the picodata product to handle it properly.

Earlier, there was no way to wait until raft leaves the joint state. It used to slow down cluster assembling and made it race-prone.

Waiting for cluster readiness is also important in tests. Some operations (the most important among them is leader switching) are impossible until an instance finishes promotion to a voter. For instance, raft rejects `MsgTimeoutNow` unless the node is promotable (see [2]). It makes some testing scenarios flaky.

This patch introduces a new synchronization primitive: `JointStateLatch`. The latch is held on the leader and is locked upon `raw_node.propose_conf_change()`. It's unlocked only when the second (implicit) conf change that represents leaving the joint state is committed. The latch also tracks the index of the corresponding `EntryConfChange`. Even if raft ignores it for any reason, the latch is still unlocked as soon as the committed index exceeds the one of the latch.

[1] https://github.com/tikv/raft-rs/blob/v0.6.0/src/raft.rs#L2014-L2026
[2] https://github.com/tikv/raft-rs/blob/v0.6.0/src/raft.rs#L2314

Close https://git.picodata.io/picodata/picodata/picodata/-/issues/47
Close https://git.picodata.io/picodata/picodata/picodata/-/issues/53
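The locking rules described above can be modelled in a few lines. The real `JointStateLatch` lives in the Rust code and is not shown here, so this Python model is only a sketch: all method names are assumptions, and it ignores fibers and waiting entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JointStateLatch:
    """Toy model of the latch; not the actual Rust implementation."""

    # Index of the pending EntryConfChange, or None when unlocked.
    index: Optional[int] = None

    def is_locked(self) -> bool:
        return self.index is not None

    def lock(self, conf_change_index: int) -> None:
        """Lock when a conf change is proposed at the given log index."""
        if self.is_locked():
            raise RuntimeError("another conf change is still in progress")
        self.index = conf_change_index

    def on_leave_joint_committed(self) -> None:
        """Normal path: the implicit 'leave joint state' change is committed."""
        self.index = None

    def on_commit(self, committed_index: int) -> None:
        """Fallback path: unlock once the commit index passes the latch index."""
        if self.index is not None and committed_index > self.index:
            self.index = None
```

The fallback in `on_commit` mirrors the last rule of the commit message: even if raft turned the proposal into an `EntryNormal`, the latch cannot stay locked forever.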
-
Yaroslav Dynnikov authored
Waiting for a valid `leader_id` on a node isn't enough. It may already have one, but still be a Learner. Instead, the fixture should wait until the node is promoted to voter.
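A fixture that waits for promotion typically polls a condition with a deadline. The helper below is a generic sketch of such polling. `wait_until` is a hypothetical name, and the predicate that checks whether a node is a voter is left to the caller because the real status API is not shown in the commit.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.05,
) -> None:
    """Poll `predicate` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met in time")
        time.sleep(interval)
```

A fixture could call it as `wait_until(lambda: is_voter(instance))`, where `is_voter` is a hypothetical check that inspects the raft status of the instance rather than just its `leader_id`.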
-
Yaroslav Dynnikov authored
The assertion `status == "Leader"` came first, so the `raft_timeout_now` call was unreachable.
-
- May 11, 2022
-
-
Yaroslav Dynnikov authored
1. Print logs to the stderr so that they interleave with tarantool logs. 2. Fix `cluster.__repr__()`.
-
Yaroslav Dynnikov authored
1. Review `Pipfile`:
   - Remove unused `filelock`;
   - Install `mypy` - static type checker for Python.
2. Add new command `pipenv run lint`.
3. Enable `mypy` in CI. Fix reported errors in `test_basics.py`.
4. Renew readme.
-
Yaroslav Dynnikov authored
In `conftest.py`:

- *Add* function `xdist_worker_number`. It converts `str(worker_id)` into `int`. It serves as a substitute for `session_data_mutex` for parallel test runs.
- *Change* `normalize_net_box_result`. Replace a function with a decorator. Also, handle typical kinds of picodata responses. Extensively test it in `test_basics.py::test_eval/call_normalization`.
- *Add* dataclass `RaftStatus`. It shouldn't be used outside `conftest.py`. It only makes assertions briefer in logs and more understandable in code.
- *Add* all raft stuff into the `Instance` class. This implies `raft_propose_eval`, `assert_raft_status` and `promote_or_fail`, moved from `util.py`.
- *Change* fixture `compile`. No need for extra logic since commit 59c31cb8.
- *Preserve* fixture `binary_path`.
- *Remove* fixtures `session_data_mutex` and `run_id`. Superseded by `xdist_worker_number`.
- *Remove* fixtures `run_cluster` and `run_instance`. Superseded by `cluster.deploy(...)`.
- *Remove* function `wait_tcp_port`. It's never enough to check a raw socket. Superseded by `instance.wait_ready()`.
- Give the instances clean names `i1, i2, ...`, and simple addresses `127.7.0.1:3301`. For the parallel test run use different IPs `127.7.N.1` etc.

In `test_basics.py`:

- *Add* `test_xdist_worker_number`.
- *Add* `test_call_normalization` and `test_eval_normalization`.
- *Add* `test_process_management`. It's brand new, never implemented in luatest before.
- *Rename* `test_single_instance_raft_eval` to `test_propose_eval` and extend it with an additional assertion from `single_test.lua`.
- *Remove* `test_instance`. A part of its logic is moved to `test_call/eval_normalization`. The other part is rewritten and extended in `test_process_management`.
- *Remove* `test_cluster`. It was completely useless because of inappropriate synchronization and no valuable assertions.
- *Remove* `test_propose_eval`. It wasn't that useful, but failed because of inappropriate synchronization.

In `test_couple.py`:

- *Preserve* `test_follower_proposal` and `test_failover`. Just slightly refactor according to the new `conftest.py` API.

In `util.py` (completely removed):

- *Remove* decorator `retry`. Needless.
- *Remove* decorator `retry_on_network_errors`. Inappropriate predicate didn't catch Lua errors.
- *Remove* everything related to raft. Move it into `conftest.py`.

Part of https://git.picodata.io/picodata/picodata/picodata/-/issues/59
-
- Apr 28, 2022
-
-
Yaroslav Dynnikov authored
-
Yaroslav Dynnikov authored
-
Sergey V authored
-
- Apr 24, 2022
-
-
Yaroslav Dynnikov authored
We don't want a child process to outlive the supervisor. Usually, the supervisor waits for the child forever and retransmits termination signals. But if the parent is killed with SIGKILL, there's no way to pass anything on. This patch supplies the child process with a `supervisor_fuse` fiber. It tries to read from a pipe (that the supervisor never writes to), and if the writing end is closed, it means the supervisor has terminated. In this case, the child process terminates too. Part of https://git.picodata.io/picodata/picodata/picodata/-/issues/56
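The pipe trick relies on a POSIX guarantee: once every writing end of a pipe is closed, a blocked `read` returns end-of-file. The sketch below shows that mechanism with a Python thread standing in for the Tarantool fiber. The function name `start_supervisor_fuse` and the callback parameter are assumptions for illustration.

```python
import os
import threading
from typing import Callable


def start_supervisor_fuse(
    read_fd: int, on_supervisor_exit: Callable[[], None]
) -> threading.Thread:
    """Watch the read end of a pipe whose write end is held by the supervisor.

    The supervisor never writes, so `os.read` blocks until the write end
    is closed (for example because the supervisor died). At that point
    `os.read` returns b"" and the callback is invoked.
    """

    def fuse() -> None:
        while True:
            chunk = os.read(read_fd, 1)
            if chunk == b"":
                on_supervisor_exit()
                return

    thread = threading.Thread(target=fuse, daemon=True)
    thread.start()
    return thread
```

In a real child process the callback would terminate the process. The approach works even when the supervisor is killed with SIGKILL, because the kernel closes its file descriptors regardless of how it exits.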
- Apr 18, 2022
-
-
Yaroslav Dynnikov authored
Remove prctl dependency that is not available on mac.
-
Yaroslav Dynnikov authored
-
Yaroslav Dynnikov authored
This patch establishes a friendship between discovery and raft_join.
-
Yaroslav Dynnikov authored
feat: raft peer discovery PoC
chore: prevent C functions from being optimized out
feat: improve peer discovery
fix: fix tests after making instance_id arg mandatory
Smart supervision with fork
Make it work
Under development
IPC messages to supervisor
One more little step: entrypoint enum
Arrange IPC from child to supervisor
Remove tarantool_main macro
Persist snapshot
Fix some fresh bugs
Implement postjoin
Discovery under refactoring
Enhance discovery
Working on discovery
Fix all discovery bugs known so far
Draft join algorithm
Fail applying snapshot
Joining a learner works
Cleanup snapshot generation
Reorganize traft code and call join automatically
Change peer.commit_index type from option to u64
Implement autopromotion to voter
Implement read_index
Take read_index before self-promotion
Cleanup excess logs
Cleanup logs and code
Deep refactoring in progress
Finish refactoring db schema
Embed entries applying inside traft node

Refactor raft node communication

Replace fiber channel with a mailbox, which is a `Vec<_>` + fiber cond. It allows batching raft commands in a more predictable way and makes the code less error-prone.

Remove commented code

Simplify raft nodes interaction over net_box

Eliminate `traft::Message` struct because its internals aren't used. Instead, serialize `raft::Message` using protobuf.

Batch ConnectionPool requests

1. Send messages in batches.
2. Allow changing connection uri.
3. Close unused connections after `inactivity_timeout`.

Enhance the raft node

1. Collect results from raft node
2. Fix initial bootstrap which used to fail due to fiber race.
3. Wrap raft storage operations in a transaction.

Bump tarantool module
Add documentation draft
Cleanup warnings
Try fixing tests
Fix test_storage_log
Fix test_traft_pool
Fix luatest single and couple
Start fixing threesome test
Implement concurrent join requests handling
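The mailbox mentioned under "Refactor raft node communication" (a `Vec<_>` plus a fiber cond) can be illustrated with a thread-based analogue. The real code is Rust on Tarantool fibers, so the class below is only a sketch, with `threading.Condition` standing in for the fiber cond and all names assumed.

```python
import threading
from typing import Any, List, Optional


class Mailbox:
    """A list guarded by a condition variable; receivers take whole batches."""

    def __init__(self) -> None:
        self._items: List[Any] = []
        self._cond = threading.Condition()

    def send(self, item: Any) -> None:
        """Append a message and wake up a waiting receiver."""
        with self._cond:
            self._items.append(item)
            self._cond.notify()

    def receive_all(self, timeout: Optional[float] = None) -> List[Any]:
        """Return everything queued so far, waiting at most once if empty."""
        with self._cond:
            if not self._items:
                self._cond.wait(timeout)
            batch, self._items = self._items, []
            return batch
```

Unlike a channel that hands over one message per receive, `receive_all` drains the queue in a single step, which is what makes batching of commands predictable.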
-
- Apr 14, 2022
-
-
Sergey V authored
-
- Mar 10, 2022
-
-
Georgy Moshkin authored
-
- Mar 09, 2022
-
-
Georgy Moshkin authored
-
- Feb 22, 2022
-
-
Yaroslav Dynnikov authored
This patch introduces two tests:

1. `couple.test_failover` reproduces a heartbeat timeout on a follower that leads to a new election.
2. `threesome.test_leader_dispuption` simulates a disconnected follower. It shouldn't disrupt the leader in this case.

Leader death is already tested in `threesome.test_log_rollback`.

Close https://gitlab.com/picodata/picodata/picodata/-/issues/25
-
Yaroslav Dynnikov authored
Also, fix the code to pass the test:

1. Use `replace` instead of `insert` when appending entries. They may be overridden.
2. Increase pool worker queue size. Raft node may send several messages within a single tick. Without this change the second message is dropped.

The second change is a poorly designed workaround. A constant limit doesn't seem to be both cost-effective and reliable at the same time, but we don't have a better solution at the moment.

Close https://gitlab.com/picodata/picodata/picodata/-/issues/26
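The difference between the two operations in the first fix can be shown with a toy log store. In Tarantool, `insert` fails on a duplicate primary key while `replace` overwrites the existing tuple. The class below mimics that with a dictionary and is purely illustrative, not the actual storage code.

```python
from typing import Any, Dict


class RaftLogStore:
    """Toy log storage keyed by entry index (hypothetical)."""

    def __init__(self) -> None:
        self._entries: Dict[int, Any] = {}

    def insert(self, index: int, entry: Any) -> None:
        """Fail if an entry with this index already exists."""
        if index in self._entries:
            raise KeyError(f"duplicate entry index {index}")
        self._entries[index] = entry

    def replace(self, index: int, entry: Any) -> None:
        """Store the entry, overwriting any previous one at this index."""
        self._entries[index] = entry

    def get(self, index: int) -> Any:
        return self._entries[index]
```

Since raft may override an entry that was appended earlier, for example after a leader change, appending with `insert` would fail exactly in the case the test exercises.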
-
Yaroslav Dynnikov authored
-
- Feb 21, 2022
-
-
Yaroslav Dynnikov authored
Follower makes a proposal. It's applied on both instances. This patch also introduces a new API `picolib.raft_status`. Close https://gitlab.com/picodata/picodata/picodata/-/issues/24
-
- Feb 18, 2022
-
-
Georgy Moshkin authored
+ tarantool-sys submodule
+ tarantool-patches directory
+ build.rs build script to patch and build tarantool
+ static linking with tarantool
+ refactoring for cli arguments
-