- Jul 01, 2022
Georgy Moshkin authored
- Jun 30, 2022
Yaroslav Dynnikov authored
Yaroslav Dynnikov authored
- Jun 27, 2022
Yaroslav Dynnikov authored
Valentin Syrovatskiy authored
- Jun 23, 2022
Deactivation means that the instance:
- is demoted to learner (if it wasn't one already);
- is marked "inactive", so that it's ignored when determining the number of voters required for the cluster.
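A minimal sketch of the accounting idea, in Python for illustration (the `Peer` record and `desired_voter_count` helper are hypothetical, not picodata's actual API):

```python
from dataclasses import dataclass

@dataclass
class Peer:
    instance_id: str
    is_active: bool  # deactivated instances are excluded from voter math

def desired_voter_count(peers: list[Peer]) -> int:
    """Pick an odd voter count (capped at 5) among active peers only."""
    active = sum(1 for p in peers if p.is_active)
    if active >= 5:
        return 5
    if active >= 3:
        return 3
    return 1

peers = [Peer("i1", True), Peer("i2", True), Peer("i3", False)]
assert desired_voter_count(peers) == 1  # the deactivated peer doesn't count
```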
- Jun 17, 2022
Replication bootstrapping was implemented in 2dac77c5, but it was configured on the new instance only. The old instance (the one that joined earlier) couldn't update `box.cfg({replication})` until now. Close https://git.picodata.io/picodata/picodata/picodata/-/issues/52
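A sketch of the intended behavior, with hypothetical helper names (the real change is inside picodata itself): every member of the replicaset refreshes `box.cfg({replication})` when the member list changes, not just the newcomer.

```python
def on_replicaset_change(instances, member_uris: list[str]) -> None:
    # Reconfigure replication on ALL members, old and new alike.
    for instance in instances:
        # `eval` stands in for "run this Lua snippet on the instance".
        instance.eval("box.cfg({replication = ...})", sorted(member_uris))
```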
- Jun 15, 2022
- Jun 06, 2022
Georgy Moshkin authored
Before this commit, if proc_discover was invoked after the raft node was initialized but before a raft leader was elected, it would return an error. Because of that, it was impossible to restart the whole cluster at once. This commit changes proc_discover so that when the leader_id is not yet known, the normal discovery algorithm takes place. Closes #93
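An illustrative Python rendering of the new control flow (function and field names are assumptions; the actual procedure is implemented in Rust):

```python
def proc_discover(raft_status, discovery):
    # A known leader means discovery is over: report whom to join.
    if raft_status is not None and raft_status.leader_id is not None:
        return {"done": True, "leader_id": raft_status.leader_id}
    # No leader yet (e.g. the whole cluster restarts at once):
    # fall back to the normal discovery algorithm instead of erroring.
    return discovery.handle_request()
```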
- Jun 03, 2022
- Jun 02, 2022
Georgy Moshkin authored
- Jun 01, 2022
Yaroslav Dynnikov authored
Restarting both instances doesn't work yet, to be fixed later. Close https://git.picodata.io/picodata/picodata/picodata/-/issues/90
Yaroslav Dynnikov authored
Since commit d87dd4ca `leader_id` became an `Option`, so the `None` value isn't rendered in the `picolib.raft_status` response:

```python
status={'is_ready': False, 'raft_state': 'Follower', 'id': 1}
```

It makes pytest complain about a missing argument:

```
cluster2 = Cluster("127.0.0.1:3300", n=2)

    def test_restart_leader(cluster2: Cluster):
        i1, i2 = cluster2.instances
        i1.assert_raft_status('Leader')
        i2.assert_raft_status('Follower')
        i1.restart()
>       i1.wait_ready()

test/int/test_joining.py:209:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../.local/share/virtualenvs/picodata-6sv6l6y-/lib/python3.10/site-packages/funcy/decorators.py:45: in wrapper
    return deco(call, *dargs, **dkwargs)
../../.local/share/virtualenvs/picodata-6sv6l6y-/lib/python3.10/site-packages/funcy/flow.py:127: in retry
    return call()
../../.local/share/virtualenvs/picodata-6sv6l6y-/lib/python3.10/site-packages/funcy/decorators.py:66: in __call__
    return self._func(*self._args, **self._kwargs)
test/int/conftest.py:305: in wait_ready
    status = self._raft_status()
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = Instance(i1, listen=127.0.0.1:3301)

    def _raft_status(self) -> RaftStatus:
        status = self.call("picolib.raft_status")
        assert isinstance(status, dict)
        eprint(f"{status=}")
>       return RaftStatus(**status)
E       TypeError: RaftStatus.__init__() missing 1 required positional argument: 'leader_id'

test/int/conftest.py:280: TypeError
```

This patch fixes the failure message:

```
self = Instance(i1, listen=127.0.0.1:3301)

    @funcy.retry(tries=20, timeout=0.1)
    def wait_ready(self):
        status = self._raft_status()
>       assert status.is_ready
E       AssertionError: assert False
E        +  where False = RaftStatus(id=1, raft_state='Follower', is_ready=False, leader_id=None).is_ready

test/int/conftest.py:306: AssertionError
```
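A sketch of the corresponding fix in the test helpers, assuming `RaftStatus` is a dataclass (the field order mirrors the repr in the traceback): giving `leader_id` an `Optional` type with a `None` default lets the constructor accept responses that omit the key.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RaftStatus:
    id: int
    raft_state: str
    is_ready: bool
    leader_id: Optional[int] = None  # absent from the response until a leader is elected
```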
Sergey V authored
* Make the `--cluster-id` CLI option mandatory.
* Handle cluster_id mismatch in raft_join. When an instance attempts to join the cluster and the instance's `--cluster-id` parameter mismatches the cluster_id of the cluster, an error is raised inside the raft_join handler.
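An illustrative sketch of the check (the handler itself is Rust; all names here are hypothetical):

```python
def raft_join(request_cluster_id: str, our_cluster_id: str) -> None:
    if request_cluster_id != our_cluster_id:
        raise ValueError(
            f"cluster_id mismatch: request has {request_cluster_id!r},"
            f" cluster has {our_cluster_id!r}"
        )
    # ... proceed with the join
```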
Sergey V authored
- May 31, 2022
Georgy Moshkin authored
Sergey V authored
- May 30, 2022
Yaroslav Dynnikov authored
Yaroslav Dynnikov authored
Yaroslav Dynnikov authored
Pytest has a feature to segregate setup, test, and teardown logs. The setup phase is considered to be the initialization of the fixtures. In order to split the logs properly, `cluster.deploy()` is now called inside a fixture.
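A minimal sketch of such a fixture, reusing the `Cluster` helper seen elsewhere in this log (the `kill()` call in teardown is an assumption):

```python
import pytest

@pytest.fixture
def cluster2():
    # Cluster comes from the suite's conftest.
    cluster = Cluster("127.0.0.1:3300", n=2)
    cluster.deploy()  # runs during fixture setup, so its logs land in the setup section
    yield cluster     # the test body runs here
    cluster.kill()    # teardown logs are segregated likewise
```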
- May 26, 2022
Yaroslav Dynnikov authored
It's already formatted in conformity with the usual `cargo test` output. Also, remove unused command-line arguments from the `picodata test` command. Close https://git.picodata.io/picodata/picodata/picodata/-/issues/61
- May 25, 2022
Sergey V authored
- May 24, 2022
Yaroslav Dynnikov authored
Yaroslav Dynnikov authored
The intention is to eliminate ambiguities in the `Instance` API and make it more like the `subprocess` module (as regards the `kill` and `terminate` functions).
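A sketch of the resulting naming convention (method bodies are illustrative, not the suite's actual code):

```python
import signal
import subprocess

class Instance:
    process: subprocess.Popen  # the running picodata process

    def terminate(self):
        """Ask the process to exit gracefully (SIGTERM), like Popen.terminate()."""
        self.process.send_signal(signal.SIGTERM)

    def kill(self):
        """Kill the process immediately (SIGKILL), like Popen.kill()."""
        self.process.send_signal(signal.SIGKILL)
```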
Yaroslav Dynnikov authored
Behavior of `killpg` slightly differs between Mac and Linux. For some reason, `killpg` returns the error EPERM when sending a signal to a zombie process. And that is the reason for the `test_process_management` failure on Mac: there's a small gap between killing the child and the subreaper calling `waitpid`. Now pytest handles this exception properly.

Close https://git.picodata.io/picodata/picodata/picodata/-/issues/70

See also:

- Stackoverflow: Why would `killpg` return "not permitted" when ownership is correct? https://stackoverflow.com/questions/12521705/why-would-killpg-return-not-permitted-when-ownership-is-correct
- Linux `man 2 killpg`: https://linux.die.net/man/2/killpg#Notes

  > Notes
  >
  > There are various differences between the permission checking in BSD-type systems and System V-type systems. See the POSIX rationale for kill(). A difference not mentioned by POSIX concerns the return value EPERM: BSD documents that no signal is sent and EPERM returned when the permission check failed for **at least one** target process, while POSIX documents EPERM only when the permission check failed for **all** target processes.

- MacOS `man 2 killpg`: https://developer.apple.com/library/archive/documentation/System/Conceptual/ManPages_iPhoneOS/man2/killpg.2.html

  > [EPERM] The sending process is not the super-user and **one or more** of the target processes has an effective user ID different from that of the sending process.

- Linux `man 2 kill`: https://linux.die.net/man/2/kill

  > EPERM The process does not have permission to send the signal *to any* of the target processes.

- Process states in Linux: https://kerneltalks.com/linux/process-states-in-linux/
- Reproduce killpg returning EPERM on MacOS: https://git.picodata.io/picodata/picodata/picodata/-/snippets/7
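An illustrative way to tolerate the spurious EPERM in the test suite (the helper name is hypothetical):

```python
import errno
import os
import signal

def kill_process_group(pgid: int) -> None:
    try:
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # the group is already gone
    except PermissionError as e:
        # On macOS, killpg reports EPERM when the target is a zombie
        # that hasn't been reaped yet; treat it as "already dead".
        if e.errno != errno.EPERM:
            raise
```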
- May 23, 2022
Yaroslav Dynnikov authored
Pytest supports running tests in parallel using the `xdist` plugin. In order to support it in Picodata, one should avoid port collisions. The old scheme assigned each worker a dedicated IP address `127.7.n.1`, where `n = xdist_worker_number`. Unfortunately, that doesn't work on MacOS, because Mac doesn't provide any loopback aliases except `127.0.0.1` by default.

This patch provides different address generation logic. The `subnet` parameter is superseded by a `base_port`, which is `3300 + n * 100`. In this way, every pytest (xdist) worker gets a dedicated port range: `[3301, 3399]`, `[3401, 3499]`, and so on.

Close https://git.picodata.io/picodata/picodata/picodata/-/issues/65
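A minimal sketch of that arithmetic (the helper name is illustrative):

```python
def worker_ports(xdist_worker_number: int) -> range:
    base_port = 3300 + xdist_worker_number * 100
    return range(base_port + 1, base_port + 100)

assert list(worker_ports(0))[:3] == [3301, 3302, 3303]
assert worker_ports(1)[0] == 3401 and worker_ports(1)[-1] == 3499
```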
Yaroslav Dynnikov authored
When bootstrapping an instance, there are two possible execution paths: `start_boot` and `start_join`. While `start_join` takes all uuids from the JoinResponse, `start_boot` already deals with a bootstrapped `box.cfg` (it's done in `start_discover`, refer to [1]). In order to make the uuids consistent across `box.cfg` and the topology module, the `start_boot` stage is preceded by a rebootstrap. This case is also covered with a pytest.

- [1] doc/clustering.md
Yaroslav Dynnikov authored
Follow-up for https://git.picodata.io/picodata/picodata/picodata/-/issues/50
- May 20, 2022
Yaroslav Dynnikov authored
The implementation of `net_box` in `tarantool-module` resolves hostnames with the `to_socket_addrs` function, which is blocking. Pytest uses fake addresses in one test, and sometimes this results in a 5-second blockage and a consequent test failure. This patch only provides a workaround: it makes the connection fail even before the blocking DNS request.

See also:

- https://git.picodata.io/picodata/picodata/tarantool-module/-/issues/81
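One way such a guard could look, purely as an assumption about the workaround's shape: refuse anything that isn't already a numeric address, so the blocking resolution never starts.

```python
import ipaddress

def ensure_numeric_host(host: str) -> None:
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # Fail fast instead of entering a blocking DNS lookup.
        raise ConnectionError(f"refusing to resolve {host!r}: numeric hosts only")
```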
- May 17, 2022
Yaroslav Dynnikov authored
Before this patch, pytest used to launch all instances in a clean environment. That prevented running with `PICODATA_LOG_LEVEL=verbose`.
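A sketch of the passthrough (the helper is hypothetical; the actual fix is in the suite's process-spawning code):

```python
import os
import subprocess

def spawn_instance(cmd: list[str], env: dict[str, str]) -> subprocess.Popen:
    # Instead of a fully clean environment, let selected variables
    # from the caller's environment reach the child process.
    for var in ("PICODATA_LOG_LEVEL",):
        if var in os.environ:
            env.setdefault(var, os.environ[var])
    return subprocess.Popen(cmd, env=env)
```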
This patch covers one more case: a discovery request handled by an instance that has the discovery module uninitialized.
- May 13, 2022
- May 12, 2022
Yaroslav Dynnikov authored
Yaroslav Dynnikov authored
Commit 1a3b5233 missed a bug: iteration over the instances could be aborted by an exception during teardown. It resulted in garbage processes remaining alive after pytest termination.
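A sketch of exception-safe teardown (helper names are hypothetical): one failing instance must not keep the rest alive.

```python
def teardown(instances) -> None:
    errors = []
    for instance in instances:
        try:
            instance.kill()
        except Exception as e:
            errors.append(e)  # keep iterating, kill the others first
    if errors:
        raise errors[0]
```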
Yaroslav Dynnikov authored
There were some problems with join request synchronization. Raft forbids proposing a configuration change if there's another one uncommitted (see [1]). In that case, it replaces the `EntryConfChange` with an `EntryNormal`. It could happen at any time even without bugs in the code, due to network partitioning, and it's the responsibility of the picodata product to handle it properly.

Earlier, there was no way to wait until raft leaves the joint state. It used to slow down cluster assembling and made it race-prone. Waiting for cluster readiness is also important in tests. Some operations (the most important amongst them is leader switching) are impossible until the instance finishes its promotion to a voter. For instance, raft rejects `MsgTimeoutNow` unless the node is promotable (see [2]). It made some testing scenarios flaky.

This patch introduces a new synchronization primitive: `JointStateLatch`. The latch is held on the leader and is locked upon `raw_node.propose_conf_change()`. It's unlocked only when the second (implicit) conf change that represents leaving the joint state is committed. The latch also tracks the index of the corresponding `EntryConfChange`. Even if raft ignores it for any reason, the latch is still unlocked as soon as the committed index exceeds the one of the latch.

[1] https://github.com/tikv/raft-rs/blob/v0.6.0/src/raft.rs#L2014-L2026
[2] https://github.com/tikv/raft-rs/blob/v0.6.0/src/raft.rs#L2314

Close https://git.picodata.io/picodata/picodata/picodata/-/issues/47
Close https://git.picodata.io/picodata/picodata/picodata/-/issues/53
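A conceptual Python model of the latch's behavior, heavily simplified (the real implementation is Rust inside picodata; all names here are illustrative):

```python
class JointStateLatch:
    """Tracks a pending conf change on the leader."""

    def __init__(self):
        self.index = None  # raft index of the pending EntryConfChange

    def lock(self, conf_change_index: int) -> None:
        # Called right after raw_node.propose_conf_change().
        assert self.index is None, "another conf change is still in flight"
        self.index = conf_change_index

    def unlock_on_leave_joint(self) -> None:
        # The second (implicit) conf change, leaving the joint state,
        # has been committed.
        self.index = None

    def unlock_on_commit(self, committed_index: int) -> None:
        # Fallback: even if raft dropped our entry (e.g. replaced it
        # with an EntryNormal), unlock once the committed index moves
        # past the latched one.
        if self.index is not None and committed_index > self.index:
            self.index = None

    @property
    def locked(self) -> bool:
        return self.index is not None
```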
Yaroslav Dynnikov authored
Waiting for a valid `leader_id` on a node isn't enough: it may already have one, but still be a learner. Instead, the fixture should wait until the node is promoted to a voter.
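A hypothetical fixture helper along those lines (`picolib.raft_voters` and `raft_id` are assumptions made for illustration; `funcy.retry` is already used by the suite):

```python
import funcy

@funcy.retry(tries=20, timeout=0.1)
def wait_promoted_to_voter(instance):
    voters = instance.call("picolib.raft_voters")  # hypothetical endpoint
    assert instance.raft_id in voters, "still a learner"
```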
Yaroslav Dynnikov authored
The assertion `status == "Leader"` came first, so the `raft_timeout_now` call was unreachable.
- May 11, 2022
Yaroslav Dynnikov authored
1. Print logs to stderr so that they interleave with the tarantool logs.
2. Fix `cluster.__repr__()`.
Yaroslav Dynnikov authored
1. Review `Pipfile`:
   - Remove unused `filelock`;
   - Install `mypy`, a static type checker for Python.
2. Add a new command: `pipenv run lint`.
3. Enable `mypy` in CI. Fix the reported errors in `test_basics.py`.
4. Renew the readme.