- May 26, 2022
-
-
Yaroslav Dynnikov authored
-
Yaroslav Dynnikov authored
It's already formatted in conformity with the usual `cargo test`. Also, remove unused command-line arguments from the `picodata test` command.

Close https://git.picodata.io/picodata/picodata/picodata/-/issues/61
-
- May 25, 2022
-
-
Sergey V authored
-
Alexander Tolstoy authored
-
- May 24, 2022
-
-
Yaroslav Dynnikov authored
-
Yaroslav Dynnikov authored
The intention is to eliminate ambiguities in the `Instance` API. Make it more like the `subprocess` module (as regards the `kill` and `terminate` functions).
-
Yaroslav Dynnikov authored
The behavior of `killpg` differs slightly between Mac and Linux. For some reason, on Mac `killpg` returns the error EPERM when sending a signal to a zombie process. That is the reason for the `test_process_management` failure on Mac: there's a small gap between killing the child and the subreaper calling `waitpid`. Now pytest handles this exception properly.

Close https://git.picodata.io/picodata/picodata/picodata/-/issues/70

See also:

- Stackoverflow: Why would `killpg` return "not permitted" when ownership is correct? https://stackoverflow.com/questions/12521705/why-would-killpg-return-not-permitted-when-ownership-is-correct

- Linux `man 2 killpg`: https://linux.die.net/man/2/killpg#Notes

  > Notes
  >
  > There are various differences between the permission checking in
  > BSD-type systems and System V-type systems. See the POSIX rationale
  > for kill(). A difference not mentioned by POSIX concerns the return
  > value EPERM: BSD documents that no signal is sent and EPERM returned
  > when the permission check failed for **at least one** target process,
  > while POSIX documents EPERM only when the permission check failed for
  > **all** target processes.

- MacOS `man 2 killpg`: https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/killpg.2.html

  > [EPERM] The sending process is not the super-user and
  > **one or more** of the target processes has an effective
  > user ID different from that of the sending process.

- Linux `man 2 kill`: https://linux.die.net/man/2/kill

  > EPERM The process does not have permission to send the signal
  > *to any* of the target processes.

- Process states in Linux: https://kerneltalks.com/linux/process-states-in-linux/

- Reproduce killpg returning EPERM on MacOS: https://git.picodata.io/picodata/picodata/picodata/-/snippets/7
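A minimal sketch of how a test harness might tolerate this difference. The helper name `killpg_tolerant` is hypothetical and is not taken from the patch; treating EPERM as "nothing left to signal" is an assumption that is only reasonable when the caller spawned the whole process group itself.

```python
import os
import signal
import subprocess
import sys


def killpg_tolerant(pgid: int, sig: int = signal.SIGKILL) -> bool:
    """Send `sig` to a process group; return False if nothing is left to signal."""
    try:
        os.killpg(pgid, sig)
        return True
    except ProcessLookupError:
        # ESRCH: the group no longer exists (all members were reaped).
        return False
    except PermissionError:
        # EPERM: per the commit message, macOS may report this when the
        # target is a zombie that has not been reaped yet.
        # Assumption: the caller owns the group, so a genuine permission
        # problem is not expected here.
        return False


if __name__ == "__main__":
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        start_new_session=True,  # the child becomes a process group leader
    )
    killpg_tolerant(child.pid)  # kills the child
    child.wait()                # reap it, as the subreaper would
    killpg_tolerant(child.pid)  # no exception even though the group is gone
```

Returning a flag instead of raising keeps teardown code linear: the caller can signal the group repeatedly without caring whether the child has been reaped in between.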
-
- May 23, 2022
-
-
Yaroslav Dynnikov authored
Pytest supports running tests in parallel using the `xdist` plugin. To support it in Picodata, one should avoid port collisions. Before this patch, each worker was assigned a dedicated IP address `127.7.n.1`, where `n = xdist_worker_number`. Unfortunately, this doesn't work on MacOS, because Mac doesn't provide any loopback aliases except `127.0.0.1` by default.

This patch provides different address generation logic. The `subnet` parameter is superseded by a `base_port`, which is `3300 + n * 100`. This way, every pytest (xdist) worker gets a dedicated port range: `[3301, 3399]`, `[3401, 3499]` and so on.

Close https://git.picodata.io/picodata/picodata/picodata/-/issues/65
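The port arithmetic can be sketched as follows. The name `xdist_worker_number` appears elsewhere in this log, but its body here is an assumption, as are the hypothetical helpers `base_port` and `port_range`. The sketch relies on pytest-xdist naming its workers `gw0`, `gw1`, ... and reporting `master` when tests run without workers.

```python
import re


def xdist_worker_number(worker_id: str) -> int:
    """Convert a pytest-xdist worker id ("gw0", "gw1", ...) into an int."""
    if worker_id == "master":
        # pytest-xdist reports "master" when tests run without workers.
        return 0
    match = re.fullmatch(r"gw(\d+)", worker_id)
    if match is None:
        raise ValueError(f"unexpected worker id: {worker_id!r}")
    return int(match.group(1))


def base_port(worker_number: int) -> int:
    """Base port of a worker, as described in the commit: 3300 + n * 100."""
    return 3300 + worker_number * 100


def port_range(worker_number: int) -> range:
    """Ports available to a worker: base + 1 through base + 99 inclusive."""
    base = base_port(worker_number)
    return range(base + 1, base + 100)
```

With this layout, worker `gw0` owns ports 3301 through 3399 and worker `gw1` owns 3401 through 3499, so instances started by different workers never compete for the same port on `127.0.0.1`.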
-
Alexander Tolstoy authored
-
Yaroslav Dynnikov authored
When bootstrapping an instance, there are two possible execution paths: `start_boot` and `start_join`. While `start_join` takes all uuids from `JoinResponse`, `start_boot` already deals with a bootstrapped `box.cfg` (it's done in `start_discover`, refer to [1]). In order to make uuids consistent across `box.cfg` and the topology module, the `start_boot` stage is preceded by a rebootstrap. This case is also covered by a pytest.

[1] doc/clustering.md
-
Yaroslav Dynnikov authored
- Add corresponding field to the Peer struct.
- Generate it in the topology module.
- Use it in `box.cfg`.

Close https://git.picodata.io/picodata/picodata/picodata/-/issues/51
-
Yaroslav Dynnikov authored
Address `replication_factor` when choosing `replicaset_id` for a new instance. It doesn't consider `failure_domain` yet, but takes into account the number of instances.

Close https://git.picodata.io/picodata/picodata/picodata/-/issues/68
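One way such a choice could work is sketched below as a Python model; the real logic lives in the Rust topology module and may differ. The function name `choose_replicaset_id` and the `r1`, `r2`, ... naming scheme are hypothetical.

```python
from collections import Counter
from typing import Iterable


def choose_replicaset_id(existing: Iterable[str], replication_factor: int) -> str:
    """Pick the first replicaset that still has room for one more instance.

    `existing` holds the replicaset ids of the instances that have
    already joined, one entry per instance.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be positive")
    counts = Counter(existing)
    index = 1
    # Skip replicasets that already hold `replication_factor` instances.
    while counts[f"r{index}"] >= replication_factor:
        index += 1
    return f"r{index}"
```

For example, with `replication_factor = 2` and two instances already in `r1`, the next instance would be placed into `r2`.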
-
Yaroslav Dynnikov authored
- Choose it in the topology module if it's not provided in a `JoinRequest`.
- Persist in `raft_group` space.
- Respond with an error if `JoinRequest` contains different `replicaset_id`.
- In `JoinResponse` it's transferred automatically.

Part of https://git.picodata.io/picodata/picodata/picodata/-/issues/51
-
Yaroslav Dynnikov authored
Follow-up for https://git.picodata.io/picodata/picodata/picodata/-/issues/50
-
Yaroslav Dynnikov authored
- Generate it in the topology module
- Persist it in `raft_group` space
- Transfer it in `JoinResponse`
- Use it in `box.cfg`

Close https://git.picodata.io/picodata/picodata/picodata/-/issues/50
-
- May 21, 2022
-
-
Yaroslav Dynnikov authored
It's necessary to encapsulate the topology management logic away from main.
-
Yaroslav Dynnikov authored
It encapsulates the logic of `JoinRequest` batch processing.

The topology module will be quite important in picodata. This first version lacks many features, but a few commits later it's going to implement quite a lot of logic.

When a new instance joins, there's one complex thing: the raft leader has to decide where this new instance is going to be placed, i.e. which replicaset it should join. Many different parameters influence the decision: `replication_factor`, `failure-domain`, and of course the existing topology. So, this new `topology` module must make the decision.

This patch only refactors the current Picodata behavior and doesn't bring new features for its users. Instead, it opens the door to future development. Also, this patch provides a unit-testing basis for future features.
-
- May 20, 2022
-
-
Yaroslav Dynnikov authored
Both JoinRequest and JoinResponse are going to be used in other modules. Move them one level above from `traft::node::*` to `traft::*`.
-
Yaroslav Dynnikov authored
One of the trickiest Raft cases is the so-called ABA problem [1]. In that case it's important to protect a batch of join requests with a term number. Since the whole batch is supplied with atomicity-sensitive uuids, applying it on a different term may cause an inconsistency, which is very, veeeery bad.

[1] https://en.wikipedia.org/wiki/ABA_problem
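The idea of guarding a batch with a term number can be modeled in a few lines of Python. This is only an illustration of the check; the actual code is in Rust, and the names `JoinBatch`, `StaleTermError` and `propose_batch` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


class StaleTermError(Exception):
    """The batch was prepared on a term that is no longer current."""


@dataclass(frozen=True)
class JoinBatch:
    term: int                  # raft term on which the batch was prepared
    instance_uuids: List[str]  # uuids generated for the joining instances


def propose_batch(current_term: int, batch: JoinBatch) -> List[str]:
    """Accept the batch only if the term has not changed since it was built."""
    if batch.term != current_term:
        # The leader may have changed and changed back in between (ABA),
        # so the uuids in the batch can no longer be trusted.
        raise StaleTermError(
            f"batch term {batch.term} != current term {current_term}"
        )
    return list(batch.instance_uuids)
```

Rejecting the whole batch on a term mismatch keeps the operation atomic: either all uuids in the batch are applied on the term they were generated for, or none are.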
-
Yaroslav Dynnikov authored
The implementation of `net_box` in `tarantool-module` resolves hostnames with the `to_socket_addrs` function, which is blocking. Pytest uses fake addresses in one test, and sometimes this results in a 5-second block and a consequent test failure.

This patch only provides a workaround. It makes the connection fail even before the blocking DNS request.

See also:

- https://git.picodata.io/picodata/picodata/tarantool-module/-/issues/81
-
- May 17, 2022
-
-
Yaroslav Dynnikov authored
Before this patch, pytest used to launch all instances in a clean environment. It prevented running with `PICODATA_LOG_LEVEL=verbose`.
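A minimal sketch of launching a child process with an inherited environment rather than a clean one. The helper name `instance_env` is hypothetical, and the way the real fixture builds the environment is not shown in this log.

```python
import os
import subprocess
import sys
from typing import Dict, Optional


def instance_env(overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build a child environment that inherits the parent's variables."""
    env = dict(os.environ)  # copy, so the parent environment stays intact
    env.update(overrides or {})
    return env


if __name__ == "__main__":
    # Run as `PICODATA_LOG_LEVEL=verbose python this_file.py`:
    # the child sees the variable because the environment is inherited.
    subprocess.run(
        [sys.executable, "-c",
         "import os; print(os.environ.get('PICODATA_LOG_LEVEL'))"],
        env=instance_env(),
        check=True,
    )
```

Passing `env={}` to `subprocess` would instead give the child a clean environment, which is the behavior the commit message describes as the old one.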
-
Yaroslav Dynnikov authored
Specifying the particular directory with tests significantly speeds up their collection and execution.

Before:

```console
$ time pytest -k nothing
12 deselected in 1.36s
```

After:

```
12 deselected in 0.04s
```
-
This patch covers one more case: when a discovery request is handled by an instance that has the discovery module uninitialized.
-
- May 16, 2022
-
-
Yaroslav Dynnikov authored
By default, cargo runs tests in parallel in multiple threads. Both `test_log_level` and `test_parse` access environment variables, which are shared across threads. Consequently, their concurrent modification results in test failures. This patch unites these two tests so that they run sequentially.
-
- May 13, 2022
-
-
- May 12, 2022
-
-
Yaroslav Dynnikov authored
-
Yaroslav Dynnikov authored
Commit 1a3b5233 missed a bug. Iteration over instances could be aborted by an exception during teardown. It resulted in garbage processes remaining alive after pytest termination.
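A teardown loop that cannot be aborted halfway might look like the sketch below. The helper name `kill_all` and the `Killable` protocol are hypothetical; the only assumption about instances is that they expose a `kill()` method.

```python
from typing import Iterable, List, Protocol


class Killable(Protocol):
    def kill(self) -> None: ...


def kill_all(instances: Iterable[Killable]) -> None:
    """Kill every instance, even if killing some of them raises."""
    errors: List[Exception] = []
    for instance in instances:
        try:
            instance.kill()
        except Exception as exc:
            # Remember the failure, but keep going so that no
            # process is left running after the test session.
            errors.append(exc)
    if errors:
        # Report the first failure only after everything was attempted.
        raise errors[0]
```

Collecting the errors and re-raising at the end preserves the failure report while guaranteeing that every instance gets its `kill()` call.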
-
Yaroslav Dynnikov authored
1. Lower log level of connection errors in `network.rs`.
2. Give raft fibers a name.
-
Yaroslav Dynnikov authored
There were some problems with join requests synchronization. Raft forbids proposing a configuration change if there's another one uncommitted (see [1]). In that case, it replaces an `EntryConfChange` with an `EntryNormal`. It could happen at any time, even without bugs in the code, due to network partitioning, and it's the responsibility of the picodata product to handle it properly.

Earlier, there was no way to wait until raft leaves the joint state. It used to slow down cluster assembly and made it race-prone.

Waiting for cluster readiness is also important in tests. Some operations (the most important among them is leader switching) are impossible until an instance finishes promotion to a voter. For instance, raft rejects `MsgTimeoutNow` unless the node is promotable (see [2]). It makes some testing scenarios flaky.

This patch introduces a new synchronization primitive, `JointStateLatch`. The latch is held on the leader and is locked upon `raw_node.propose_conf_change()`. It's unlocked only when the second (implicit) conf change that represents leaving the joint state is committed. The latch also tracks the index of the corresponding `EntryConfChange`. Even if raft ignores it for any reason, the latch is still unlocked as soon as the committed index exceeds the one of the latch.

[1] https://github.com/tikv/raft-rs/blob/v0.6.0/src/raft.rs#L2014-L2026
[2] https://github.com/tikv/raft-rs/blob/v0.6.0/src/raft.rs#L2314

Close https://git.picodata.io/picodata/picodata/picodata/-/issues/47
Close https://git.picodata.io/picodata/picodata/picodata/-/issues/53
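The latch behavior described above can be modeled in Python as follows. This is a simplified model, not the Rust implementation: the method names are hypothetical, and notifying waiters is left out.

```python
from typing import Optional


class JointStateLatch:
    """Model of a latch guarding a single in-flight configuration change."""

    def __init__(self) -> None:
        # Index of the proposed EntryConfChange, or None when unlocked.
        self._index: Optional[int] = None

    def is_locked(self) -> bool:
        return self._index is not None

    def lock(self, conf_change_index: int) -> None:
        """Called when a configuration change is proposed."""
        if self.is_locked():
            raise RuntimeError("another configuration change is in progress")
        self._index = conf_change_index

    def on_leave_joint_committed(self) -> None:
        """The implicit conf change that leaves the joint state is committed."""
        self._index = None

    def on_commit(self, committed_index: int) -> None:
        """Fallback: unlock once the log has moved past the tracked index,
        even if raft ignored the original conf change."""
        if self._index is not None and committed_index > self._index:
            self._index = None
```

Tracking the index gives the latch a second way out: if the proposed entry was replaced with an `EntryNormal`, no "leave joint" entry will ever arrive, but the committed index still advances past the tracked one.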
-
Yaroslav Dynnikov authored
Waiting for a valid `leader_id` on a node isn't enough. It may already have one, but still be a Learner. Instead, the fixture should wait until the node is promoted to voter.
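A fixture can wait for such a condition with a generic polling helper. The sketch below is an assumption about how this might be done; `wait_until` is a hypothetical name, and the predicate that checks for voter status is left to the caller.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 5.0,
    interval: float = 0.1,
) -> None:
    """Poll `predicate` until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)
```

A fixture could then call something like `wait_until(lambda: instance_is_voter(instance))`, where `instance_is_voter` is a hypothetical check that the node has been promoted from learner to voter, rather than only checking for a valid `leader_id`.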
-
Yaroslav Dynnikov authored
The assertion `status == "Leader"` came first, so the `raft_timeout_now` call was unreachable.
-
- May 11, 2022
-
-
Yaroslav Dynnikov authored
1. Print logs to stderr so that they interleave with tarantool logs.
2. Fix `cluster.__repr__()`.
-
Yaroslav Dynnikov authored
1. Review `Pipfile`:
   - Remove unused `filelock`;
   - Install `mypy` - static type checker for Python.
2. Add new command `pipenv run lint`.
3. Enable `mypy` in CI. Fix reported errors in `test_basics.py`.
4. Renew readme.
-
Yaroslav Dynnikov authored
In `conftest.py`:

- *Add* function `xdist_worker_number`. It converts `str(worker_id)` into `int`. It serves as a substitute for `session_data_mutex` for parallel test runs.
- *Change* `normalize_net_box_result`. Replace a function with a decorator. Also, handle typical kinds of picodata responses. Extensively test it in `test_basics.py::test_eval/call_normalization`.
- *Add* dataclass `RaftStatus`. It shouldn't be used outside `conftest.py`. It only makes assertions more brief in logs and understandable in code.
- *Add* all raft stuff into the `Instance` class. This implies that `raft_propose_eval`, `assert_raft_status` and `promote_or_fail` are moved from `util.py`.
- *Change* fixture `compile`. No need for extra logic since commit 59c31cb8.
- *Preserve* fixture `binary_path`.
- *Remove* fixtures `session_data_mutex` and `run_id`. Superseded by `xdist_worker_number`.
- *Remove* fixtures `run_cluster` and `run_instance`. Superseded by `cluster.deploy(...)`.
- *Remove* function `wait_tcp_port`. It's never enough to check a raw socket. Superseded by `instance.wait_ready()`.
- Give the instances clean names `i1, i2, ...`, and simple addresses `127.7.0.1:3301`. For the parallel test run use different IPs `127.7.N.1` etc.

In `test_basics.py`:

- *Add* `test_xdist_worker_number`.
- *Add* `test_call_normalization` and `test_eval_normalization`.
- *Add* `test_process_management`. It's brand new, never implemented in luatest before.
- *Rename* `test_single_instance_raft_eval` to `test_propose_eval` and extend it with an additional assertion from `single_test.lua`.
- *Remove* `test_instance`. A part of its logic is moved to `test_call/eval_normalization`. The other part is rewritten and extended in `test_process_management`.
- *Remove* `test_cluster`. It was completely useless because of inappropriate synchronization and no valuable assertions.
- *Remove* `test_propose_eval`. It wasn't that useful, but failed because of inappropriate synchronization.

In `test_couple.py`:

- *Preserve* `test_follower_proposal` and `test_failover`. Just slightly refactor according to the new `conftest.py` API.

In `util.py` (completely removed):

- *Remove* decorator `retry`. Needless.
- *Remove* decorator `retry_on_network_errors`. Inappropriate predicate didn't catch Lua errors.
- *Remove* everything related to raft. Move it into `conftest.py`.

Part of https://git.picodata.io/picodata/picodata/picodata/-/issues/59
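To illustrate why a dataclass makes assertions briefer, here is a sketch of what a `RaftStatus`-like type could look like. The field names `id`, `raft_state` and `leader_id` are assumptions made for this example and are not taken from the patch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RaftStatus:
    # Hypothetical fields; the real dataclass in conftest.py may differ.
    id: int
    raft_state: str
    leader_id: Optional[int] = None


# A single comparison covers all fields at once, and on failure
# pytest can show both values using the generated __repr__.
status = RaftStatus(id=1, raft_state="Leader", leader_id=1)
assert status == RaftStatus(id=1, raft_state="Leader", leader_id=1)
print(status)  # RaftStatus(id=1, raft_state='Leader', leader_id=1)
```

Compared with asserting on individual dictionary keys, one equality check on a dataclass keeps the test body short and the failure message readable.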
-
- May 08, 2022
-
-
Yaroslav Dynnikov authored
-
- May 06, 2022
-
-