Commits · 1d329f0bbbf98dff878588b2ad754e2dd18404a7 · core / tarantool

Sep 29, 2020

raft: introduce box.cfg.election_* options · 1d329f0b

Vladislav Shpilevoy authored 4 years ago

The new options are:

- election_is_enabled - enable/disable leader election (via
  Raft). When disabled, the node is supposed to work as if Raft
  did not exist, as it did before;

- election_is_candidate - a flag whether the instance can try to
  become a leader. Note, it can vote for other nodes regardless
  of the value of this option;

- election_timeout - how long to wait until an election ends, in
  seconds.

The options don't do anything yet. They are added separately to
keep such mundane changes out of the main Raft commit and to
simplify its review.

Option names don't mention 'Raft' on purpose, because
- Not all users know what Raft is, so they may not even know it
  is related to leader election;
- In the future the algorithm may change from Raft to something
  else, so it is better not to depend on it too much in the public
  API.

Part of #1146

1d329f0b

raft: introduce persistent raft state · 4f0f7c8f

Vladislav Shpilevoy authored 4 years ago

The patch introduces a skeleton of the Raft module and a method to
persist the Raft state in a snapshot, not bound to any space.

Part of #1146

4f0f7c8f

replication: track registered replica count · 764a548a

Vladislav Shpilevoy authored 4 years ago

Struct replicaset didn't store the number of registered replicas,
only an array, which had to be fully scanned each time the count
was needed.

The count is going to be needed in Raft to calculate the election
quorum. The patch tracks the count so that it can be found in
constant time by simply reading an integer.
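
The idea can be illustrated with a minimal sketch. It is not the code
of the patch: the struct, the function names and the slot limit are
hypothetical, and the quorum is assumed to be a simple majority.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capacity; the real limit is not stated in the commit. */
enum { REPLICA_SLOTS_SKETCH = 32 };

struct replicaset_sketch {
    /* Slot i is true when a replica with ID i is registered. */
    bool is_registered[REPLICA_SLOTS_SKETCH];
    /* Tracked count, kept in sync with the array above. */
    int registered_count;
};

void
replicaset_sketch_register(struct replicaset_sketch *rs, int id)
{
    assert(id >= 0 && id < REPLICA_SLOTS_SKETCH);
    if (rs->is_registered[id])
        return;
    rs->is_registered[id] = true;
    rs->registered_count++;
}

void
replicaset_sketch_unregister(struct replicaset_sketch *rs, int id)
{
    assert(id >= 0 && id < REPLICA_SLOTS_SKETCH);
    if (!rs->is_registered[id])
        return;
    rs->is_registered[id] = false;
    rs->registered_count--;
}

/* Assumed majority quorum, computed from the counter without a scan. */
int
replicaset_sketch_quorum(const struct replicaset_sketch *rs)
{
    return rs->registered_count / 2 + 1;
}
```

Registering the same ID twice leaves the counter unchanged, so it
always matches the number of occupied slots.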

Needed for #1146

764a548a

wal: don't touch box.cfg.wal_dir more than once · 40335790

Vladislav Shpilevoy authored 4 years ago

Relay.cc and box.cc obtained the box.cfg.wal_dir value using a
cfg_gets() call, to initialize WAL and create struct recovery
objects.

That is not only a bit dangerous (cfg_gets() uses Lua API and can
throw a Lua error) and slow, but also not necessary - wal_dir
parameter is constant, it can't be changed after instance start.

It means, the value can be stored somewhere one time and then used
without Lua.

The main motivation is that the WAL directory path will be needed
inside relay threads to restart their recovery iterators in the
Raft patch. They can't use cfg_gets(), because Lua lives in the TX
thread, but they can access a constant global variable introduced
in this patch (it existed before, but now has a method to get it).

Needed for #1146

40335790

box: introduce summary RO flag · 31dc4faf

Vladislav Shpilevoy authored 4 years ago

An instance is writable if box.cfg.read_only is false, and it is
not orphan. Update of the final read-only state of the instance
needs to fire read-only update triggers, and notify the engines.
These 2 flags were easy and cheap to check on each operation, and
the triggers were easy to use since both flags are stored and
updated inside box.cc.

That is going to change when Raft is introduced. Raft will add 2
more checks:

  - A flag if Raft is enabled on the node. If it is not, then Raft
    state won't affect whether the instance is writable;

  - When Raft is enabled, it will allow writes on a leader only.

It means a check for being read-only would look like this:

    is_ro || is_orphan || (raft_is_enabled() && !raft_is_leader())

This is significantly slower. Besides, Raft somehow needs to
access the read-only triggers and engine API - this looks wrong.

The patch introduces a new flag is_ro_summary. The flag
incorporates all the read-only conditions into one flag. When some
subsystem may change read-only state of the instance, it needs to
call box_update_ro_summary(), and the function takes care of
updating the summary flag, running the triggers, and notifying the
engines.

Raft will use this function when its state or config changes.
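
A minimal sketch of the summary flag, under stated assumptions: the
struct and function names are hypothetical, the trigger and engine
notification is replaced with a counter, and it is assumed to happen
only when the summary value actually changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Inputs of the read-only decision; all names are illustrative. */
struct ro_state_sketch {
    bool is_ro;            /* box.cfg.read_only */
    bool is_orphan;
    bool raft_is_enabled;
    bool raft_is_leader;
    bool is_ro_summary;    /* cached result, cheap to check per operation */
    int trigger_runs;      /* stands in for triggers and engine notification */
};

/* Recompute the summary and "notify" only if it changed. */
void
ro_state_sketch_update_summary(struct ro_state_sketch *s)
{
    assert(s != NULL);
    bool old_summary = s->is_ro_summary;
    s->is_ro_summary = s->is_ro || s->is_orphan ||
                       (s->raft_is_enabled && !s->raft_is_leader);
    if (s->is_ro_summary != old_summary)
        s->trigger_runs++;
}
```

Each operation then checks the single is_ro_summary field, and the
compound expression is evaluated only when one of its inputs changes.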

Needed for #1146

31dc4faf

applier: store instance_id in struct applier · 7c14819f

Vladislav Shpilevoy authored 4 years ago

Applier is going to need its numeric ID in order to tell the
future Raft module who the sender of a Raft message is. An
alternative would be to add sender ID to each Raft message, but
this looks like a crutch. Moreover, applier still needs to know
its numeric ID in order to notify Raft about heartbeats from the
peer node.

Needed for #1146

7c14819f

httpc: src/httpc.c missed va_end() macro · 90108875

Sergey Kaplun authored 4 years ago

Found and fixed an unclosed va_list 'ap' with cppcheck:

[src/httpc.c:190]: (error) va_list 'ap' was opened but not closed by va_end().
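
For illustration, a minimal sketch of the pairing cppcheck looks for.
The function below is hypothetical and is not the code from
src/httpc.c; it only shows va_start() matched by va_end() before
return.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Format into buf. The va_list opened by va_start() is closed by
 * va_end() before the function returns.
 */
int
format_sketch(char *buf, size_t size, const char *fmt, ...)
{
    assert(buf != NULL && fmt != NULL);
    va_list ap;
    va_start(ap, fmt);
    int rc = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return rc;
}
```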

90108875

Sep 28, 2020

box: disallow to alter SQL view · c5cb8d31

Roman Khabibov authored 4 years ago

Ban the ability to modify a view at the box level. Since a view is
a named select and not a table, altering a view is not a valid
operation.

c5cb8d31

Add flaky tests checksums to fragile · 75ba744b

Alexander V. Tikhonov authored 4 years ago

Added for tests with issues:
  app/fiber.test.lua				gh-5341
  app-tap/debug.test.lua			gh-5346
  app-tap/http_client.test.lua			gh-5346
  app-tap/inspector.test.lua			gh-5346
  box/gh-2763-session-credentials-update.test.lua gh-5363
  box/hash_collation.test.lua			gh-5247
  box/lua.test.lua				gh-5351
  box/net.box_connect_triggers_gh-2858.test.lua	gh-5247
  box/net.box_incompatible_index-gh-1729.test.lua gh-5360
  box/net.box_on_schema_reload-gh-1904.test.lua gh-5354
  box/protocol.test.lua				gh-5247
  box/update.test.lua				gh-5247
  box-tap/net.box.test.lua			gh-5346
  replication/autobootstrap.test.lua		gh-4533
  replication/autobootstrap_guest.test.lua	gh-4533
  replication/ddl.test.lua			gh-5337
  replication/gh-3160-misc-heartbeats-on-master-changes.test.lua gh-4940
  replication/gh-3247-misc-iproto-sequence-value-not-replicated.test.lua.test.lua gh-5357
  replication/gh-3637-misc-error-on-replica-auth-fail.test.lua gh-5343
  replication/long_row_timeout.test.lua		gh-4351
  replication/on_replace.test.lua		gh-5344, gh-5349
  replication/prune.test.lua			gh-5361
  replication/qsync_advanced.test.lua		gh-5340
  replication/qsync_basic.test.lua		gh-5355
  replication/replicaset_ro_mostly.test.lua	gh-5342
  replication/wal_rw_stress.test.lua		gh-5347
  replication-py/multi.test.py			gh-5362
  sql/prepared.test.lua test			gh-5359
  sql-tap/selectG.test.lua			gh-5350
  vinyl/ddl.test.lua				gh-5338
  vinyl/gh-3395-read-prepared-uncommitted.test.lua gh-5197
  vinyl/iterator.test.lua			gh-5336
  vinyl/write_iterator_rand.test.lua	gh-5356
  xlog/panic_on_wal_error.test.lua		gh-5348

75ba744b

alter: box/alter.cc null pointer dereference · 641c1710

Sergey Kaplun authored 4 years ago

Found and fixed a null pointer dereference with cppcheck:

[src/box/alter.cc:395]: (error) Null pointer dereference

641c1710

fiber: src/lua/fiber.c null pointer dereference · 58f84d1a

Sergey Kaplun authored 4 years ago

[src/lua/fiber.c:245] -> [src/lua/fiber.c:217]: (warning) Either the condition 'if(func)' is redundant or there is possible null pointer dereference: func.

58f84d1a

Sep 26, 2020

test: update test-run · 3cf741b9

Alexander Turenko authored 4 years ago

Updated test_run:wait_upstream() and test_run:wait_downstream() to wait
until box is configured and an instance with the given ID appears
in box.info.replication.

See https://github.com/tarantool/test-run/issues/221

Fixes #5317
Fixes #5329

3cf741b9

Sep 25, 2020

test: update test-run · 59dca189
Alexander Turenko authored 4 years ago
```
Justify columns in the output.

https://github.com/tarantool/test-run/pull/222
```

59dca189
test: fix mistake in replication/suite.ini · 6b98017e
Alexander V. Tikhonov authored 4 years ago
```
Removed a stray line left over from a merge.
```
6b98017e

Enable test reruns on failed fragile tests · 74328386

Alexander V. Tikhonov authored 4 years ago

test-run now implements a new format for fragile lists, based on
JSON and set as the fragile option in the 'suite.ini' file of each suite:

   fragile = {
        "retries": 10,
        "tests": {
            "bitset.test.lua": {
                "issues": [ "gh-4095" ],
                "checksums": [ "050af3a99561a724013995668a4bc71c", "f34be60193cfe9221d3fe50df657e9d3" ]
            }
        }}

Added the ability to check the checksum of the results file when a
test fails and compare it with the checksums of the known issues
listed in the fragile list.

Also added the 'retries' option, which sets the number of allowed
reruns for failed tests from the 'fragile' list whose failure
checksums are listed.
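
The retry decision described above can be sketched as follows. This is
an illustration only: test-run itself is not written in C, the
function name and parameters are hypothetical, and the checksum of the
failed output is assumed to be computed elsewhere and passed in as a
hex string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Decide whether a failed fragile test may be rerun: the retries
 * budget must not be exhausted and the checksum of the failed output
 * must match one of the known ones.
 */
bool
fragile_sketch_should_retry(const char *reject_md5,
                            const char *const *known_md5,
                            size_t known_count,
                            int attempts_done, int retries)
{
    assert(reject_md5 != NULL);
    if (attempts_done >= retries)
        return false;
    for (size_t i = 0; i < known_count; i++) {
        if (strcmp(reject_md5, known_md5[i]) == 0)
            return true;
    }
    return false;
}
```

A failure with an unknown checksum is never retried, so a new kind of
failure in a fragile test is still reported.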

Closes #5050

74328386

test: flaky replication/anon.test.lua test · bb856247

Alexander V. Tikhonov authored 4 years ago

Found flaky failures when running the replication/anon.test.lua test
multiple times on a single worker:

 [007] --- replication/anon.result	Fri Jun  5 09:02:25 2020
 [007] +++ replication/anon.reject	Mon Jun  8 01:19:37 2020
 [007] @@ -55,7 +55,7 @@
 [007]
 [007]  box.info.status
 [007]   | ---
 [007] - | - running
 [007] + | - orphan
 [007]   | ...
 [007]  box.info.id
 [007]   | ---

 [094] --- replication/anon.result       Sat Jun 20 06:02:43 2020
 [094] +++ replication/anon.reject       Tue Jun 23 19:35:28 2020
 [094] @@ -154,7 +154,7 @@
 [094]  -- Test box.info.replication_anon.
 [094]  box.info.replication_anon
 [094]   | ---
 [094] - | - count: 1
 [094] + | - count: 2
 [094]   | ...
 [094]  #box.info.replication_anon()
 [094]   | ---
 [094]

It happened because replications may stay active from previous
runs on the common tarantool instance of the test-run worker. To
avoid this, the tarantool instance is now restarted at the very
start of the test.

Closes #5058

bb856247

gitlab-ci: set opensuse jobs to test group · f56c8c4d

Alexander V. Tikhonov authored 4 years ago

Set opensuse jobs to the test group to be sure that they run with
artifact collection and without extra parallelization of gitlab-ci jobs.

f56c8c4d

gitlab-ci: save failed test results artifacts · 814d3e27

Alexander V. Tikhonov authored 4 years ago

Added an artifacts saver to all gitlab-ci jobs that run tests.

Gitlab-ci jobs save their results files in the following paths:

  1. base jobs for testing different features:
    - test/var/artifacts

  2. OSX jobs:
    - ${OSX_VARDIR}/artifacts

  3. pack/deploy jobs:
    - build/usr/src/*/tarantool-*/test/var/artifacts

  4. VBOX jobs (freebsd_12) on virtual host:
    - ~/tarantool/test/var/artifacts

Added an 'after_script' section to the gitlab-ci configuration with a
script that collects the 'artifacts' directories created by the
test-run tool from the different test locations. It saves the
'artifacts' directories as the root path in artifacts packages. Users
will be able to download these packages using the gitlab-ci GUI or API.

Additionally added the OSX_VARDIR environment variable to be able to
set up a common path for artifacts and OSX shell script options.

  OSX_VARDIR: /tmp/tnt

Part of #5050

814d3e27

gitignore: ignore directories made on running jepsen tests · 25fc1c06

Sergey Bronnikov authored 4 years ago

Running Jepsen tests creates a directory with Terraform state and a directory
with Jepsen tests source code in the build directory. This is fine for an
out-of-source build in a separate directory, but when building in the project
root directory these directories appear in `git status` output. This patch adds
ignore rules for these directories.

25fc1c06

cmake: move jepsen targets under option WITH_JEPSEN · a36749de

Sergey Bronnikov authored 4 years ago

To run Jepsen tests we need to check out an external repository with the tests'
source code at the build stage. This behaviour breaks the Tarantool build under
Gentoo. The WITH_JEPSEN option enables the targets only when they are needed.

Closes #5325

a36749de

Sep 24, 2020

test: update test-run · 43482eed

Alexander Turenko authored 4 years ago

Retry a failed test when it is marked as fragile (and several other
conditions are met, see below).

The test-run already allows setting a list of fragile tests. They are run
one-by-one after all parallel ones in order to eliminate possible
resource starvation and fit timings to ones when the tests pass. See
[1].

In practice this approach does not help much against our problem with
flaky tests. We decided to retry failed tests when they are known to be
fragile. See [2].

The core idea is to split responsibility: known flaky fails will not
distract a developer, but each fragile test will be marked
explicitly, tracked and analyzed by the quality assurance
team.

The default behaviour is not changed: each test from the fragile list
will be run once after all parallel ones. But now it is possible to set
the number of retries.

Beware: the implementation does not allow setting just the retries count;
it also requires an md5sum of the failed test output (the so-called
reject file). The idea here is to ensure that we retry the test only in
case of a known failure, not some other failure within the test.

This approach has a limitation: on failure a test may output
information that varies from run to run or depends on the base directory.
We should always verify the output before putting its checksum into the
configuration file.

Despite doubts regarding this approach, it looks simple and we decided
to try it and revisit it if the need arises.

See configuration example in [3].

[1]: https://github.com/tarantool/test-run/issues/187
[2]: https://github.com/tarantool/test-run/issues/189
[3]: https://github.com/tarantool/test-run/pull/217

Part of #5050

43482eed

Sep 23, 2020

txm: add a test · 0018398d
Aleksandr Lyapunov authored 4 years ago
```
Closes #4897
```
0018398d

test: move txn_proxy.lua to box/lua · 6f9f57fa

Aleksandr Lyapunov authored 4 years ago

txn_proxy is a special utility for transaction tests.
Formerly it was used only for vinyl tests and thus was placed in
vinyl folder.
Now the time has come to test memtx transactions and the utility
must be placed amongst other utils - in box/lua.

Needed for #4897

6f9f57fa

txm: use new tx manager in memtx · 7c2a0c18
Aleksandr Lyapunov authored 4 years ago
```
Use mvcc transaction engine in memtx if the engine is enabled.

Closes #4897
```
7c2a0c18

txm: clarify all fetched tuples · ee8ed065

Aleksandr Lyapunov authored 4 years ago

If a tuple fetched from an index is dirty, it must be clarified.
Let's fix all fetches from indexes in that way.
Also fix the snapshot iterator: it must save a part of the history
along with creating a read view in order to clean tuples during
iteration from another thread.

Part of #4897

ee8ed065

txm: introduce snapshot cleaner · ef47de0f

Aleksandr Lyapunov authored 4 years ago

When a memtx snapshot iterator is created, it may contain some
dirty tuples that should be clarified before being written
to the WAL file.
Implement a special snapshot cleaner for this purpose.

Part of #4897

ef47de0f

txm: introduce memtx_story · c4205758

Aleksandr Lyapunov authored 4 years ago

Memtx story is a part of a history of a value in space.
It's a story about a tuple, from the point it was added to space
to the point when it was deleted from the space.
All stories are linked into a list of stories of the same key of
each index.

Part of #4897

c4205758

txm: introduce conflict tracker · 518fb9d8

Aleksandr Lyapunov authored 4 years ago

There are situations when we have to track that if some TX is
committed then some others must be aborted due to a conflict.
The common case is that one r/w TX has read some value while a
second one is about to overwrite the value; if the second is committed,
the first must be aborted.
Thus we have to store many-to-many TX relations between breaker
TXs and victim TXs.
The patch implements that.
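
A minimal sketch of the breaker/victim relation, not the code of the
patch. All names are hypothetical, and a fixed boolean matrix is used
here only for brevity; the commit does not say which data structure
the real tracker uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limit on the number of tracked transactions. */
enum { TX_MAX_SKETCH = 8 };

struct conflict_tracker_sketch {
    /* conflicts[b][v] is true if committing TX b must abort TX v. */
    bool conflicts[TX_MAX_SKETCH][TX_MAX_SKETCH];
    bool is_aborted[TX_MAX_SKETCH];
};

/* Record that the victim has to be aborted if the breaker commits. */
void
conflict_tracker_sketch_add(struct conflict_tracker_sketch *t,
                            int breaker, int victim)
{
    assert(breaker >= 0 && breaker < TX_MAX_SKETCH);
    assert(victim >= 0 && victim < TX_MAX_SKETCH);
    t->conflicts[breaker][victim] = true;
}

/* Commit the breaker: abort every victim recorded for it. */
void
conflict_tracker_sketch_commit(struct conflict_tracker_sketch *t,
                               int breaker)
{
    assert(breaker >= 0 && breaker < TX_MAX_SKETCH);
    for (int v = 0; v < TX_MAX_SKETCH; v++) {
        if (t->conflicts[breaker][v])
            t->is_aborted[v] = true;
    }
}
```

One breaker can have many victims and one victim can have many
breakers, which is the many-to-many relation from the description.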

Part of #4897

518fb9d8

txm: introduce memtx tx manager · bd1ed6dd

Aleksandr Lyapunov authored 4 years ago

Define the memtx TX manager. It will store data for MVCC and the
conflict manager. Also define 'memtx_use_mvcc_engine' in the config,
which enables that MVCC engine.

Part of #4897

bd1ed6dd

txm: introduce prepare sequence number · ef5c293c

Aleksandr Lyapunov authored 4 years ago

Prepare sequence number (PSN) is a monotonically increasing ID that is
assigned to any prepared transaction. This ID is suitable for
serialization order resolution: the bigger the ID, the later the
transaction is in the serialization order of transactions.

Note that transaction ids have quite a different order when a
transaction can yield: a younger (bigger id) transaction
can prepare/commit first (lower psn) while an older tx sleeps in vain.

It should also be mentioned that LSN has the same order as PSN,
but there are two general differences:
1. The LSN sequence has no holes, i.e. it is a natural number
sequence. This property is useless for the transaction engine.
2. The LSN sequence is provided by the WAL writer and thus LSN is not
available for a TX that was prepared but hasn't been committed yet.
That makes psn a more suitable sequence for transactions, as
it allows ordering prepared but not committed transactions and
allows, for example, creating a read view between prepared
transactions.
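
The difference between the id order and the psn order can be shown
with a minimal sketch. The struct and function names are hypothetical
and are not taken from the patch; a psn of 0 is assumed to mean "not
prepared yet".

```c
#include <assert.h>
#include <stdint.h>

struct txn_manager_sketch {
    int64_t last_id;
    int64_t last_psn;
};

struct txn_sketch {
    int64_t id;   /* assigned when the transaction begins */
    int64_t psn;  /* assigned at prepare, 0 until then */
};

void
txn_sketch_begin(struct txn_manager_sketch *m, struct txn_sketch *txn)
{
    txn->id = ++m->last_id;
    txn->psn = 0;
}

void
txn_sketch_prepare(struct txn_manager_sketch *m, struct txn_sketch *txn)
{
    /* A transaction is prepared at most once. */
    assert(txn->psn == 0);
    txn->psn = ++m->last_psn;
}
```

If an older transaction begins first but a younger one is prepared
first, the younger one has the bigger id and the smaller psn, so only
the psn reflects the serialization order.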

Part of #4897

ef5c293c

txm: save does_require_old_tuple flag in txn_stmt · 61bce613

Aleksandr Lyapunov authored 4 years ago

That flag is needed for the transactional conflict manager: if any
other transaction commits a replacement of old_tuple before the
current one and the flag is set, the current transaction will
be aborted.
For example, REPLACE just replaces a key, no matter what tuple
lies in the index, and thus does_require_old_tuple = false.
In contrast, UPDATE makes a new tuple using old_tuple and thus
the statement requires old_tuple (does_require_old_tuple = true).
INSERT also has does_require_old_tuple = true because it requires
old_tuple to be NULL.
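
The three cases above can be written down as a small sketch. The enum
and the function are hypothetical and cover only the statement types
named in this message; they are not the code of the patch.

```c
#include <assert.h>
#include <stdbool.h>

enum stmt_type_sketch {
    STMT_SKETCH_INSERT,
    STMT_SKETCH_REPLACE,
    STMT_SKETCH_UPDATE,
};

/* Whether a statement depends on the tuple it found in the index. */
bool
stmt_sketch_does_require_old_tuple(enum stmt_type_sketch type)
{
    switch (type) {
    case STMT_SKETCH_REPLACE:
        /* Overwrites the key no matter what was there. */
        return false;
    case STMT_SKETCH_UPDATE:
        /* Builds the new tuple from old_tuple. */
        return true;
    case STMT_SKETCH_INSERT:
        /* Requires old_tuple to be NULL. */
        return true;
    }
    assert(0 && "unknown statement type");
    return true;
}
```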

Part of #4897

61bce613

txm: add TX status · 070a0cd4

Aleksandr Lyapunov authored 4 years ago

Transaction engine (see further commits) needs to distinguish and
manipulate transactions by their status. The status describes the
lifetime point of a transaction (inprogress, prepared, committed)
and its abilities (conflicted, read view).

Part of #4897
Part of #5108

070a0cd4

vinyl: rename tx_manager -> vy_tx_manager · 363169a2

Aleksandr Lyapunov authored 4 years ago

Unlike other vinyl objects, which are named with a "vy_" prefix,
its transaction manager (tx_manager) has no such prefix.
It should have one in order to avoid conflicts with the global tx manager.

Needed for #4897

363169a2

coio: fix cord leak on stop · 8477b6c0

Kirill Yukhin authored 4 years ago

cord_ptr variable is calloc()-ated in coio_on_start()
and is not free()-ed, which triggers ASAN. free() it
in coio_on_stop().
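
The shape of the fix, as a minimal sketch: the context struct and the
function names are hypothetical and only mirror the start/stop pairing
described above, not the real coio code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-worker context holding the allocated pointer. */
struct worker_ctx_sketch {
    void **cord_ptr;
};

/* Start callback: allocate zeroed storage, return -1 on failure. */
int
worker_sketch_on_start(struct worker_ctx_sketch *ctx)
{
    assert(ctx != NULL);
    ctx->cord_ptr = calloc(1, sizeof(*ctx->cord_ptr));
    return ctx->cord_ptr != NULL ? 0 : -1;
}

/* Stop callback: release what the start callback allocated. */
void
worker_sketch_on_stop(struct worker_ctx_sketch *ctx)
{
    assert(ctx != NULL);
    free(ctx->cord_ptr);
    ctx->cord_ptr = NULL;
}
```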

Closes #5308

8477b6c0

Sep 18, 2020

tests: fix replication/prune.test.lua hang · f7bcdf4c

Vladislav Shpilevoy authored 4 years ago

The test tried to start a replica whose box.cfg would hang, with
replication_connect_quorum = 0 to make it return immediately.

But the quorum parameter was added and removed during work on
44421317 ("replication: do not
register outgoing connections"). Instead, to start the replica
without blocking on box.cfg it is necessary to pass 'wait=False'
with the test_run:cmd('start server') command.

Closes #5311

f7bcdf4c

ci: integrate Jepsen tests to GitLab CI · a8e89b77

Sergey Bronnikov authored 4 years ago

Added a new stage with a single job to run Jepsen tests.
The job is not started automatically by default; one needs to
trigger it manually. The directory with test results
(logs, graphs, operations history) is published to artifacts.

Closes #5277

a8e89b77

tools: add script to run Jepsen tests · 49bca315

Sergey Bronnikov authored 4 years ago

Main script that handles the creation of a set of virtual machines
using Terraform, setup for remote connection, running
Jepsen tests, and teardown of the test environment.

Part of #5277

49bca315

cmake: add targets to run Jepsen tests · a42f8993

Sergey Bronnikov authored 4 years ago

Added targets 'make jepsen-single' and 'make jepsen-cluster'
to run Jepsen tests on a single Tarantool instance and
cluster of Tarantool instances.

Part of #5277

a42f8993

extra: add Terraform config files · 0b59bc93

Sergey Bronnikov authored 4 years ago

For testing Tarantool with Jepsen we use virtual machines as they provide
better resource isolation in comparison to containers. Jepsen tests may need a
single instance or a set of instances for testing a cluster. To set up virtual
machines we use Terraform [1]. The patch adds a set of configuration files for
Terraform that can create the required number of virtual machines in MCS and
output their IP addresses to stdout.

Terraform needs some parameters before it runs. They are:

- id, identifier of a test stand that should be specific for this run; the id
is also a part of the virtual machine name
- keypair_name, name of the keypair used in a cloud; the public SSH key of that
key pair will be placed on the virtual machine
- instance_count, number of virtual machines in a test stand
- ssh_key, SSH private key used to access a virtual machine
- user_name
- password
- tenant_id
- user_domain_id

These parameters can be passed via environment variables with the TF_VAR_ prefix
(like TF_VAR_id) or via command-line parameters.

To go through the full lifecycle of a test stand with Terraform, run
these commands:

terraform init extra/tf
terraform apply extra/tf
terraform output instance_names
terraform output instance_ips
terraform destroy extra/tf

1. https://www.terraform.io/

Part of #5277

0b59bc93

lua/pwd: workaround the systemd bug · ab3ff23f

Cyrill Gorcunov authored 4 years ago


There is a bug in systemd-209 source code: it returns
ENOENT when no more entries in a password database are left.

The issue was fixed later, but we still encounter systems
where it occurs. The problem affects only getpwent/getgrent
calls, thus we can expect them to return the buggy error code
and skip it.

Notes:

1) See the systemd commit where the issue was fixed

   | commit 06202b9e659e5cc72aeecc5200155b7c012fccbc
   | Author: Yu Watanabe <watanabe.yu+github@gmail.com>
   | Date:   Sun Jul 15 23:00:00 2018 +0900
   |
   |     nss: do not modify errno when NSS_STATUS_NOTFOUND or NSS_STATUS_SUCCESS

2) Another option is to call getpwall on Tarantool startup
   unconditionally where we could simply ignore any errors. This
   is a very bad choice since traversing a password database might
   introduce significant lags if the backend does some network
   activity or has expired caches. Thus drop the unconditional
   getpwall() call and run it only if a user makes an explicit request.

Fixes #5034

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

ab3ff23f