- May 24, 2022
-
-
Yaroslav Dynnikov authored
Behavior of `killpg` slightly differs in Mac and Linux. For some reason, `killpg` returns error EPERM when sending a signal to a zomibie process. And that is the reason of `test_process_management` failure on mac - there's a small gap between killing child and and subreaper calls `waitpid`. Now pytest handles this exception properly. Close https://git.picodata.io/picodata/picodata/picodata/-/issues/70 See also: - Stackoverflow: Why would `killpg` return "not permitted" when ownership is correct? https://stackoverflow.com/questions/12521705/why-would-killpg-return-not-permitted-when-ownership-is-correct - Linux `man 2 killpg`: https://linux.die.net/man/2/killpg#Notes > Notes > > There are various differences between the permission checking in > BSD-type systems and System V-type systems. See the POSIX rationale > for kill(). A difference not mentioned by POSIX concerns the return > value EPERM: BSD documents that no signal is sent and EPERM returned > when the permission check failed for **at least one** target process, > while POSIX documents EPERM only when the permission check failed for > **all** target processes. - MacOS `man 2 killpg`: https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/killpg.2.html > [EPERM] The sending process is not the super-user and > **one or more** of the target processes has an effective > user ID different from that of the sending process. - Linux `man 2 kill`: https://linux.die.net/man/2/kill > EPERM The process does not have permission to send the signal > *to any* of the target processes. > - Process states in Linux: https://kerneltalks.com/linux/process-states-in-linux/ - Reproduce killpg returning EPERM on MacOS: https://git.picodata.io/picodata/picodata/picodata/-/snippets/7
-
- May 23, 2022
-
-
Yaroslav Dynnikov authored
Pytest supports running tests in parallel using the `xdist` plugin. In order to support it in Picodata, one should avoid ports collision. It assigns each worker a dedicated IP address `127.7.n.1`, where `n = xdist_worker_number`. Unfortunately, it doesn't work on MacOS, because Mac doesn't provide any loopback aliases except `127.0.0.1` by default. This patch provides another address generation logics. The `subnet` parameter is superseeded with a `base_port`, that is `3300 + n * 100`. In this way, every pytest (xdist) worker gets a dedicated port range `[3301, 3399]`, `[3401, 3499]` and so on. Close https://git.picodata.io/picodata/picodata/picodata/-/issues/65
-
Yaroslav Dynnikov authored
When bootstrapping an instance, there're two possible execution paths - `start_boot` and `start_join`. While `start_join` takes all uuids from JoinResponse, `start_boot` already deals with a bootstrapped `box.cfg` (it's done in `start_discover`, refer to [1]). In order to make uuids consistent across `box.cfg` and topology module, `start_boot` stage is preceded with rebootstrap. This case is also covered with a pytest. - [1] doc/clustering.md
-
Yaroslav Dynnikov authored
Follow-up for https://git.picodata.io/picodata/picodata/picodata/-/issues/50
-
- May 20, 2022
-
-
Yaroslav Dynnikov authored
Implementation of `net_box` in `tarantool-module` resolves hostnames with a `to_socket_addrs` function that is blocking. Pytest uses fake addresses in one test, and sometimes it results in 5-second blockage and consequent test failure. This patch only provides a workaround. It makes a connection to fail even before the blocking DNS request. See also: - https://git.picodata.io/picodata/picodata/tarantool-module/-/issues/81
-
- May 17, 2022
-
-
Yaroslav Dynnikov authored
Before this patch, pytest used to launch all instances in a clean environment. It prevented running with `PICODATA_LOG_LEVEL=verbose`.
-
This patch covers one more case when discovery request is handled by an instance that has the discovery module unitialized.
-
- May 13, 2022
-
-
- May 12, 2022
-
-
Yaroslav Dynnikov authored
-
Yaroslav Dynnikov authored
Commit 1a3b5233 missed a bug. Iteration over instances could be aborted by an exception during teardown. It resulted in garbage process remaining alive after pytest termination.
-
Yaroslav Dynnikov authored
There were some problems with join requests synchronization. Raft forbids proposing a configuration change if there's another one uncommitted (see [1]). In that case, it replaces an `EntryConfChange` with an `EntryNormal`. It could happen at any time even without bugs in code due to the network partitioning, and its the repsonsibility of the picodata product to handle it properly. Earlier, there was no way to wait when raft leaves the joint state. It used to slow down cluster assembling and made it race-prone. The waiting for the cluster readiness is also important in tests. Some operations (the most important amongst them is leader switching) are impossible until instance finishes promotion to a voter. For instance, raft rejects `MsgTimeoutNow` unless the node is promotable (see [2]). It makes some testing scenarios flaky. This patch introduces new synchronization primitive - `JointStateLatch`. The latch is held on the leader and is locked upon `raw_node.propose_conf_change()`. It's unlocked only when the second (implicit) conf change that represents leaving joint state is committed. The latch also tracks the index of the corresponding `EntryConfChange`. Even if raft ignores it for any reason, the latch is still unlocked as soon as the committed index exceeds the one of the latch. [1] https://github.com/tikv/raft-rs/blob/v0.6.0/src/raft.rs#L2014-L2026 [2] https://github.com/tikv/raft-rs/blob/v0.6.0/src/raft.rs#L2314 Close https://git.picodata.io/picodata/picodata/picodata/-/issues/47 Close https://git.picodata.io/picodata/picodata/picodata/-/issues/53
-
Yaroslav Dynnikov authored
Waiting for a valid `leader_id` on a node isn't enough. It may already have one, but still be a Learner. Instead, the fixture should wait until the node is promoted to voter.
-
Yaroslav Dynnikov authored
The assertion `status == "Leader"` was in the first place, and `raft_timeout_now` call was unreachable.
-
- May 11, 2022
-
-
Yaroslav Dynnikov authored
1. Print logs to the stderr so that they interleave with tarantool logs. 2. Fix `cluster.__repr__()`.
-
Yaroslav Dynnikov authored
1. Review `Pipfile`: - Remove unused `filelock`; - Install `mypy` - static type checker for Python. 2. Add new command `pipenv run lint`. 3. Enable `mypy` in CI. Fix reported errors in `test_basics.py`. 4. Renew readme.
-
Yaroslav Dynnikov authored
In `conftest.py`: - *Add* function `xdist_worker_number`. It converts `str(worker_id)` into `int`. It serves as a substitute for `session_data_mutex` for parallel test runs. - *Change* `normalize_net_box_result`. Replace a function with a decorator. Also, handle typical kinds of picodata responses. Extensively test it in `test_basics.py::test_eval/call_normalization`. - *Add* dataclass `RaftStatus`. It shouldn't be used outside `conftest.py`. It only makes assertions more brief in logs and understndable in code. - *Add* all raft stuff into the `Instance` class. This implies `raft_propose_eval`, `assert_raft_status` all `promote_or_fail` moved from `util.py`. - *Change* fixture `compile`. No need in extra logics since commit 59c31cb8. - *Preserve* fixture `binary_path`. - *Remove* fixtures `session_data_mutex` and `run_id`. Superseded with `xdist_worker_number`. - *Remove* fixtures `run_cluster` and `run_instance`. Superseded with `cluster.deploy(...)`. - *Remove* function `wait_tcp_port`. It's never enough to check raw socket. Superseded with `instance.wait_ready()`. - Give the instances clean names `i1, i2, ...`, and simple addresses `127.7.0.1:3301`. For the parallel test run use different IPs `127.7.N.1` etc. In `test_basics.py`: - *Add* `test_xdist_worker_number`. - *Add* `test_call_normalization` and `test_eval_normalization`. - *Add* `test_process_management`. It's brand new, never implemented in luatest before. - *Rename* `test_single_instance_raft_eval` to `test_propose_eval` and extend it with additional assertion from `single_test.lua`. - *Remove* `test_instance`. A part of its logics is moved to `test_call/eval_normalization`. The other part is rewritten and extended in `test_process_management`. - *Remove* `test_cluster`. It was completely useless because of inappropriate synchronization and no valuable assertions. - *Remove* `test_propose_eval`. It wasn't that useful, but failed because of inappropriate synchronization. In `test_couple.py`: - *Preserve* `test_follower_proposal` and `test_failover`. Just slightly refactor according to the new `conftest.py` API. In `util.py` (completely removed): - *Remove* decorator `retry`. Needless. - *Remove* decorator `retry_on_network_errors`. Inappropriate predicate didn't catch Lua errors. - *Remove* everything related to raft. Move it into `conftest.py`. Part of https://git.picodata.io/picodata/picodata/picodata/-/issues/59
-
- Apr 28, 2022
-
-
Yaroslav Dynnikov authored
-
Yaroslav Dynnikov authored
-
Sergey V authored
-