- Dec 25, 2020
-
-
Serge Petrenko authored
It is possible that a new leader (elected either via Raft, manually, or via some user-written election algorithm) loses data that the old leader has successfully committed and confirmed. Imagine the following situation: there are N nodes in a replicaset, and the old leader, denoted A, tries to apply some synchronous transaction. It is written on the leader itself and on N/2 other nodes, one of which is B. The transaction has thus gathered a quorum of N/2 + 1 acks. Now A writes CONFIRM and commits the transaction, but dies before the confirmation reaches any of its followers. B is elected the new leader and sees that A's last transaction is present on N/2 nodes, so it doesn't have a quorum (A was one of the N/2 + 1). The current `clear_synchro_queue()` implementation makes B roll the transaction back, leading to a rollback after commit, which is unacceptable. To fix the problem, make `clear_synchro_queue()` wait until all the rows from the previous leader gather `replication_synchro_quorum` acks. In case the quorum wasn't achieved during replication_synchro_timeout, roll nothing back and wait for the user's intervention. Closes #5435 Co-developed-by:
Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
-
Serge Petrenko authored
It'll be useful for the box_clear_synchro_queue rework. Prerequisite #5435
-
Vladislav Shpilevoy authored
The trigger is fired every time any of the relays notifies tx of a replica's known vclock change. The trigger will be used to collect the synchronous transaction quorum for the old leader's transactions. Part of #5435
-
Serge Petrenko authored
Clear_synchro_queue isn't meant to be called multiple times on a single instance. Multiple simultaneous invocations of clear_synchro_queue() shouldn't hurt now, since clear_synchro_queue simply exits on an empty limbo, but they may be harmful in the future, when clear_synchro_queue is reworked. Prohibit such misuse by introducing an execution guard and raising an error once a duplicate invocation is detected. Prerequisite #5435
-
Sergey Bronnikov authored
Closes #5538
-
Sergey Bronnikov authored
For Python 3, PEP 3106 changed the design of the dict builtin and the mapping API in general, replacing the separate list-based and iterator-based APIs of Python 2 with a merged, memory-efficient set and multiset view based API. This new style of dict iteration was also added to the Python 2.7 dict type as a new set of iteration methods. PEP 469 [1] recommends replacing d.iteritems() with iter(d.items()) to make code compatible with Python 3. 1. https://www.python.org/dev/peps/pep-0469/ Part of #5538
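To illustrate the described change, here is a minimal sketch of the PEP 469 style conversion; the dict and its contents are made up, not taken from the actual test suite:
```
d = {"a": 1, "b": 2}

# Python 2 only:
#   for key, value in d.iteritems():
#       print key, value

# Portable form recommended by PEP 469 (works on Python 2.7 and 3.x):
for key, value in iter(d.items()):
    print(key, value)
```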
-
Sergey Bronnikov authored
The largest change in Python 3 is the handling of strings. In Python 2, the str type was used for two different kinds of values - text and bytes - whereas in Python 3 these are separate and incompatible types. The patch converts strings to byte strings where required to make the tests compatible with Python 3. Part of #5538
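A minimal made-up illustration of this kind of conversion (the string and its use are hypothetical, not copied from the tests):
```
# Python 2: str held both text and raw bytes, so tests could pass either.
# Python 3: text (str) and binary data (bytes) are separate, incompatible types,
# so data that ends up on the wire has to be encoded explicitly.
text = "ping\n"
payload = text.encode("utf-8")   # or spell it as the bytes literal b"ping\n"
assert isinstance(payload, bytes)
assert payload == b"ping\n"
```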
-
Sergey Bronnikov authored
In Python 2.x, calling items() makes a copy of the keys that you can iterate over while modifying the dict. This doesn't work in Python 3.x because items() returns a view instead of a list, and Python 3 raises the exception "dictionary changed size during iteration". To work around it, one can use list() to force a copy of the keys to be made. Part of #5538
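For illustration, a small sketch of the described workaround with a made-up dict:
```
d = {"a": 1, "b": 2, "c": 3}

# Python 3: deleting keys while iterating d.items() directly raises
# "RuntimeError: dictionary changed size during iteration".
# Wrapping it in list() forces a copy, restoring the Python 2 behaviour.
for key, value in list(d.items()):
    if value % 2 == 0:
        del d[key]

print(d)  # {'a': 1, 'c': 3}
```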
-
Sergey Bronnikov authored
- convert the print statement to a function. In Python 3, 'print' becomes a function, see [1]. The patch makes 'print' in the regression tests compatible with Python 3.
- according to PEP8, mixing double quotes and single quotes in a project looks inconsistent. The patch makes the use of quotes with strings consistent.
- use "format()" instead of "%" everywhere
1. https://docs.python.org/3/whatsnew/3.0.html#print-is-a-function Part of #5538
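A short made-up example combining the three changes listed above; the names and message are illustrative only:
```
name = "tarantool"
port = 3301

# Python 2 style:
#   print 'listening on %s:%d' % (name, port)

# Python 3 compatible style: print() is a function, quoting is consistent,
# and str.format() replaces the "%" operator.
print("listening on {}:{}".format(name, port))
```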
-
Serge Petrenko authored
Report box.stat().*.total, box.stat.net().*.total and box.stat.net().*.current via the feedback daemon report. Accompany this data with the time when the report was generated, so that it's possible to calculate RPS from this data on the feedback server. `box.stat().OP_NAME.total` resides in `feedback.stats.box.OP_NAME.total`, while `box.stat.net().OP_NAME.total` resides in `feedback.stats.net.OP_NAME.total`. The time of report generation is located at `feedback.stats.time`. Closes #5589
-
- Dec 24, 2020
-
-
Cyrill Gorcunov authored
We have a feedback server which gathers information about a running instance. While general info is enough for now, we may lose precious information about crashes (such as the call backtrace which caused the issue, the type of the build, etc). In this commit we add support for sending this kind of information to the feedback server. Internally we gather the reason of the failure, pack it into base64 form and then run another Tarantool instance which sends it out. A typical report might look like

 | {
 |   "crashdump": {
 |     "version": "1",
 |     "data": {
 |       "uname": {
 |         "sysname": "Linux",
 |         "release": "5.9.14-100.fc32.x86_64",
 |         "version": "#1 SMP Fri Dec 11 14:30:38 UTC 2020",
 |         "machine": "x86_64"
 |       },
 |       "build": {
 |         "version": "2.7.0-115-g360565efb",
 |         "cmake_type": "Linux-x86_64-Debug"
 |       },
 |       "signal": {
 |         "signo": 11,
 |         "si_code": 0,
 |         "si_addr": "0x3e800004838",
 |         "backtrace": "#0 0x630724 in crash_collect+bf\n...",
 |         "timestamp": "2020-12-23 14:42:10 MSK"
 |       }
 |     }
 |   }
 | }

There is no simple way to test this, so I did it manually:
1) Run an instance with box.cfg{log_level = 8, feedback_host="127.0.0.1:1500"}
2) Run a listener shell as
   while true ; do nc -l -p 1500 -c 'echo -e "HTTP/1.1 200 OK\n\n $(date)"'; done
3) Send SIGSEGV
   kill -11 `pidof tarantool`
Once SIGSEGV is delivered, the crashinfo data is generated and sent out. For debug purposes this data is also printed to the terminal on debug log level. Closes #5261 Co-developed-by:
Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> @TarantoolBot document Title: Configuration update, allow to disable sending crash information For better analysis of program crashes, the information associated with the crash, such as
- utsname (similar to `uname -a` output except the network name)
- build information
- reason for the crash
- call backtrace
is sent to the feedback server. To disable it, set `feedback_crashinfo` to `false`.
-
Cyrill Gorcunov authored
When SIGSEGV or SIGFPE reaches tarantool, we try to gather all information related to the crash and print it out to the console (well, stderr actually). Still, there is a request to not just show this info locally but to send it out to the feedback server. Thus, to keep gathering of crash-related information in one module, we move fatal signal handling into the separate crash.c file. This allows us to collect the data we need in one place and reuse it when we need to send reports to stderr (and to the feedback server, which will be implemented in the next patch). Part-of #5261 Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
This will allow reusing this routine in crash reports. Part-of #5261 Acked-by:
Serge Petrenko <sergepetrenko@tarantool.org> Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Very convenient to have this string extension. We will use it in crash handling. Acked-by:
Serge Petrenko <sergepetrenko@tarantool.org> Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Sergey Nikiforov authored
Added the corresponding test. Fixes: #5307
-
Alexander V. Tikhonov authored
A Fedora 32 gitlab-ci packaging job was added in commit 507c47f7a829581cc53ba3c4bd6a5191d088cdf ("gitlab-ci: add packaging for Fedora 32"), but it also had to be enabled in the update_repo tool to be able to save packages in S3 buckets. Follows up #4966
-
Cyrill Gorcunov authored
Part-of #5446 Co-developed-by:
Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
When we fetch the replication_synchro_quorum value (either as a plain integer or via formula evaluation), we trim the number down to an integer, which silently hides potential overflow errors. For example

 | box.cfg{replication_synchro_quorum='4294967297'}

which is 1 in terms of machine words. Let's use 8-byte values and trigger an error instead. Part-of #5446 Reported-by:
Vladislav Shpilevoy <v.shpilevoy@tarantool.org> Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
When synchronous replication is used, we prefer a user to specify a quorum number, i.e. the number of replicas where data must be replicated before the master node continues accepting new transactions. This is not very convenient, since a user may not know initially how many replicas will be used. Moreover, the number of replicas may vary dynamically. For this sake we allow specifying the quorum in a symbolic way. For example

 box.cfg {
     replication_synchro_quorum = "N/2+1",
 }

where `N` is the number of registered replicas in a cluster. Once a new replica is attached or an old one is detached, the number is recalculated and propagated. Internally, on each replica_set_id() and replica_clear_id(), i.e. at the moment a replica gets registered or unregistered, we call the box_update_replication_synchro_quorum() helper, which finds out if evaluation of replication_synchro_quorum is needed and, if so, calculates the new replication_synchro_quorum value based on the number of currently registered replicas. Then we notify dependent systems such as qsync and raft to update their guts. Note: we do *not* change the default settings for this option, it remains 1 by default for now. Changing the default should be done in a separate commit once we make sure that everything is fine. Closes #5446 Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> @TarantoolBot document Title: Support dynamic evaluation of synchronous replication quorum Setting the `replication_synchro_quorum` option to an explicit integer value was introduced mostly for simplicity's sake. For example, if the cluster's size is not a constant value and new replicas are connected dynamically, then an administrator might need to increase the option by hand or with some other external tool. Instead, one can use dynamic evaluation of the quorum value via a formal representation, using the symbol `N` as the current number of registered replicas in a cluster. For example, the canonical definition of a quorum (i.e. a majority of members in a set) of `N` replicas is `N/2+1`. For such a configuration define
```
box.cfg {replication_synchro_quorum = "N/2+1"}
```
The formal statement allows providing a flexible configuration, but keep in mind that only the canonical quorum (and bigger values, say `N` for all replicas) guarantees data reliability, and various weird forms such as `N/3+1`, while allowed, may lead to unexpected results.
-
Cyrill Gorcunov authored
Currently the box_check_replication_synchro_quorum helper tests whether the "replication_synchro_quorum" value is valid and returns the value itself to be used later in the code. This is fine for regular numbers, but since we're going to support formula evaluation, the real value to use will be dynamic, and returning a number "to use" won't be convenient. Thus, let's change the contract: make box_check_replication_synchro_quorum() return 0|-1 for success|failure, and when the real value is needed we will fetch it explicitly via a cfg_geti call. To make this more explicit, the real update of the appropriate variable is done via the box_update_replication_synchro_quorum() helper. Part-of #5446 Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
We will need it to figure out if a parameter is a numeric value when doing the configuration check. Part-of #5446 Acked-by:
Serge Petrenko <sergepetrenko@tarantool.org> Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Mergen Imeev authored
Prior to this patch, the region on fiber was reset during select(), get(), count(), max(), or min(). This would result in an error if one of these operations was used in a user-defined function in SQL. After this patch, these functions truncate the region instead of resetting it. Closes #5427
-
- Dec 23, 2020
-
-
Nikita Pettik authored
Accidentally, in the built-in declaration list it was specified that ifnull() can return only integer values, while it should return SCALAR: ifnull() returns the first non-null argument, so the type of the return value depends on the types of the arguments. Let's fix this and set the return type of ifnull() to SCALAR.
-
Mergen Imeev authored
After this patch, the persistent functions "box.schema.user.info" and "LUA" will have the same rights as the user who executed them. The problem was that setuid was unnecessarily set. Because of this, these functions had the same rights as the user who created them. However, they must have the same rights as the user who used them. Fixes tarantool/security#1
-
Sergey Kaplun authored
A platform panic occurs when fiber.yield() is used within any active (i.e. currently executing) hook. It is a regression caused by 96dbc49d ('lua: prohibit fiber yield when GC hook is active'). This patch fixes the false-positive panic in cases when the VM is not running a GC hook. Relates to #4518 Closes #5649 Reported-by:
Michael Filonenko <filonenko.mikhail@gmail.com>
-
Alexander V. Tikhonov authored
Added packaging jobs for Fedora 32. Closes #4966
-
Alexander V. Tikhonov authored
Found that the test replication/skip_conflict_row.test.lua fails with the following output message in the results file:

[035] @@ -139,7 +139,19 @@
[035]  -- applier is not in follow state
[035]  test_run:wait_upstream(1, {status = 'stopped', message_re = "Duplicate key exists in unique index 'primary' in space 'test'"})
[035]  ---
[035] -- true
[035] +- false
[035] +- id: 1
[035] +  uuid: f2084d3c-93f2-4267-925f-015df034d0a5
[035] +  lsn: 553
[035] +  upstream:
[035] +    status: follow
[035] +    idle: 0.0024020448327065
[035] +    peer: unix/:/builds/4BUsapPU/0/tarantool/tarantool/test/var/035_replication/master.socket-iproto
[035] +    lag: 0.0046234130859375
[035] +  downstream:
[035] +    status: follow
[035] +    idle: 0.086121961474419
[035] +    vclock: {2: 3, 1: 553}
[035]  ...
[035]  --
[035]  -- gh-3977: check that NOP is written instead of conflicting row.

The test could not be restarted with a checksum because of values like the UUID changing on each failure. It happened because test-run uses an internal chain of functions, wait_upstream() -> gen_box_info_replication_cond(), which returns the instance information when it fails. To avoid this, the output was redirected to the log file instead of the results file.
-
Alexander V. Tikhonov authored
Since the current testing schema uses separate pipelines for each testing job, the workflow names should be the same as the job names to make them more visible on the github actions results page [1]. [1] - https://github.com/tarantool/tarantool/actions
-
- Dec 22, 2020
-
-
mechanik20051988 authored
There was an option 'force_recovery' that makes tarantool ignore some problems during xlog recovery. This patch changes this option's behavior and makes tarantool ignore some errors during snapshot recovery just like during xlog recovery. Error types which can be ignored:
- the snapshot is somehow truncated, but after the necessary system spaces
- the snapshot has some garbage after its declared length
- a single tuple within the snapshot has a broken checksum and may be skipped without consequences (in this case we ignore the whole row with this tuple)
@TarantoolBot document Title: Change 'force_recovery' option behavior Change the 'force_recovery' option behavior to allow tarantool to load from a broken snapshot Closes #5422
-
Alexander V. Tikhonov authored
Found that jobs triggered on the push and pull_request filters run duplicating each other [1][2]. To avoid this, an additional module was found [3]. Entire jobs are skipped both for duplicated jobs and for previously queued jobs of commits that were already updated [4]. [1] - https://github.community/t/duplicate-checks-on-push-and-pull-request-simultaneous-event/18012 [2] - https://github.community/t/how-to-trigger-an-action-on-push-or-pull-request-but-not-both/16662 [3] - https://github.com/fkirc/skip-duplicate-actions#concurrent_skipping [4] - https://github.com/fkirc/skip-duplicate-actions#option-1-skip-entire-jobs
-
Alexander V. Tikhonov authored
Added a standalone job with a Coverity check as described at [1]. This job uploads results to the coverity.com host for the 'tarantool' project when the COVERITY_TOKEN environment variable is set. The main Coverity functionality is added to the .travis.mk make file as standalone targets: 'test_coverity_debian_no_deps' - used in github-ci actions; 'coverity_debian' - an additional target with a check for the needed tools. This job is configured with a cron scheduler to run each Saturday at 04:00 am. Closes #5600 [1] - https://scan.coverity.com/download?tab=cxx
-
Alexander V. Tikhonov authored
Moved saving coverage to the coveralls.io repository from travis-ci to github-ci. Completely removed travis-ci from the commit criteria. Part of #5294
-
Alexander V. Tikhonov authored
Implemented github-ci action workflow OSX jobs on commits:
- OSX 10.15
- OSX 11.0
Part of #5294
-
Alexander V. Tikhonov authored
Implemented github-ci action workflow on commits. Added a group of CI jobs:
1) on Debian 9 ("Stretch"):
- luacheck
- release
- debug_coverage
- release_clang
- release_lto
2) on Debian 10 ("Buster"):
- release_lto_clang11
- release_asan_clang11
Part of #5294
-
Alexander V. Tikhonov authored
Due to all the activities moving from Gitlab-CI to Github-CI Actions, the docker image creation routine is updated with the new image naming and container registry: GITLAB_REGISTRY?=registry.gitlab.com is changed to DOCKER_REGISTRY?=docker.io Part of #5294
-
Alexander V. Tikhonov authored
Added a test-run filter on the box.snapshot error message: 'Invalid VYLOG file: Slice [0-9]+ deleted but not registered' to avoid printing changing data in the results file, so that its checksums can be used in the test-run fragile list to rerun it as a flaky issue. Found issues:

1) vinyl/deferred_delete.test.lua
https://gitlab.com/tarantool/tarantool/-/jobs/913623306#L4552
[036] 2020-12-15 19:10:01.996 [16602] coio vy_log.c:2202 E> failed to process vylog record: delete_slice{slice_id=744, }
[036] 2020-12-15 19:10:01.996 [16602] main/103/vinyl vy_log.c:2068 E> ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Slice 744 deleted but not registered

2) vinyl/gh-4864-stmt-alloc-fail-compact.test.lua
https://gitlab.com/tarantool/tarantool/-/jobs/913810422#L4835
[052] @@ -56,9 +56,11 @@
[052]  --
[052]  dump(true)
[052]  | ---
[052] - | ...
[052] -dump()
[052] - | ---
[052] + | - error: 'Invalid VYLOG file: Slice 253 deleted but not registered'
[052] + | ...

3) vinyl/misc.test.lua
https://gitlab.com/tarantool/tarantool/-/jobs/913727925#L5284
[014] @@ -62,14 +62,14 @@
[014]  ...
[014]  box.snapshot()
[014]  ---
[014] -- ok
[014] +- error: 'Invalid VYLOG file: Slice 1141 deleted but not registered'
[014]  ...

4) vinyl/quota.test.lua
https://gitlab.com/tarantool/tarantool/-/jobs/914016074#L4595
[025] 2020-12-15 22:56:50.192 [25576] coio vy_log.c:2202 E> failed to process vylog record: delete_slice{slice_id=522, }
[025] 2020-12-15 22:56:50.193 [25576] main/103/vinyl vy_log.c:2068 E> ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Slice 522 deleted but not registered

5) vinyl/update_optimize.test.lua
https://gitlab.com/tarantool/tarantool/-/jobs/913728098#L2512
[051] 2020-12-15 20:18:43.365 [17147] coio vy_log.c:2202 E> failed to process vylog record: delete_slice{slice_id=350, }
[051] 2020-12-15 20:18:43.365 [17147] main/103/vinyl vy_log.c:2068 E> ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Slice 350 deleted but not registered

6) vinyl/upsert.test.lua
https://gitlab.com/tarantool/tarantool/-/jobs/913623510#L6132
[008] @@ -441,7 +441,7 @@
[008]  -- Mem has DELETE
[008]  box.snapshot()
[008]  ---
[008] -- ok
[008] +- error: 'Invalid VYLOG file: Slice 1411 deleted but not registered'
[008]  ...

7) vinyl/replica_quota.test.lua
https://gitlab.com/tarantool/tarantool/-/jobs/914272656#L5739
[023] @@ -41,7 +41,7 @@
[023]  ...
[023]  box.snapshot()
[023]  ---
[023] -- ok
[023] +- error: 'Invalid VYLOG file: Slice 232 deleted but not registered'
[023]  ...

8) vinyl/ddl.test.lua
https://gitlab.com/tarantool/tarantool/-/jobs/914309343#L4538
[039] @@ -81,7 +81,7 @@
[039]  ...
[039]  box.snapshot()
[039]  ---
[039] -- ok
[039] +- error: 'Invalid VYLOG file: Slice 206 deleted but not registered'
[039]  ...

9) vinyl/write_iterator.test.lua
https://gitlab.com/tarantool/tarantool/-/jobs/920646297#L4694
[059] @@ -80,7 +80,7 @@
[059]  ...
[059]  box.snapshot()
[059]  ---
[059] -- ok
[059] +- error: 'Invalid VYLOG file: Slice 351 deleted but not registered'
[059]  ...
[059]  --
[059]  -- Create a couple of tiny runs on disk, to increate the "number of runs"

10) vinyl/gc.test.lua
https://gitlab.com/tarantool/tarantool/-/jobs/920441445#L4691
[050] @@ -59,6 +59,7 @@
[050]  ...
[050]  gc()
[050]  ---
[050] +- error: 'Invalid VYLOG file: Run 1176 deleted but not registered'
[050]  ...
[050]  files = ls_data()
[050]  ---

11) vinyl/gh-3395-read-prepared-uncommitted.test.lua
https://gitlab.com/tarantool/tarantool/-/jobs/921944705#L4258
[019] @@ -38,7 +38,7 @@
[019]  | ...
[019]  box.snapshot()
[019]  | ---
[019] - | - ok
[019] + | - error: 'Invalid VYLOG file: Slice 634 deleted but not registered'
[019]  | ...
[019]
[019]  c = fiber.channel(1)
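For illustration only, a rough sketch in plain Python of what such an output filter could do; the function name and the real test-run filter API here are assumptions, not the actual patch:
```
import re

# Hypothetical normalizer: replace the changing slice/run id in the
# box.snapshot error message with a placeholder so the result output
# stays stable between runs.
VYLOG_RE = re.compile(
    r"Invalid VYLOG file: (Slice|Run) [0-9]+ deleted but not registered")

def normalize_line(line):
    return VYLOG_RE.sub(
        r"Invalid VYLOG file: \1 <NUM> deleted but not registered", line)

print(normalize_line(
    "- error: 'Invalid VYLOG file: Slice 634 deleted but not registered'"))
# -> - error: 'Invalid VYLOG file: Slice <NUM> deleted but not registered'
```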
-
Alexander V. Tikhonov authored
Found that running the vinyl test suite in parallel using the test-run vardir on a real hard drive may cause a lot of tests to fail. It happens because of a bottleneck with hard drive usage up to 100%, which can be seen with any tool like atop during a parallel vinyl test run. To avoid this, all heavily loaded testing processes should use tmpfs for the vardir path. It was found that the out-of-source build had to be updated to use tmpfs for it. This patch mounts an additional tmpfs mount point in the OOS build docker run process for the test-run vardir. This mount point is set using the '--tmpfs' flag because '--mount' does not support the 'exec' option, which is needed to be able to execute commands in it [2][3]. Issues met on OOS before the patch, as described in #5504 and [1]:

Test hung! Result content mismatch:
--- vinyl/write_iterator.result Fri Nov 20 14:48:24 2020
+++ /rw_bins/test/var/081_vinyl/write_iterator.result Fri Nov 20 15:01:54 2020
@@ -200,831 +200,3 @@
 ---
 ...
 for i = 1, 100 do space:insert{i, ''..i} if i % 2 == 0 then box.snapshot() end end
----
-...
-space:delete{1}
----
-...

Closes #5622 Part of #5504
[1] - https://gitlab.com/tarantool/tarantool/-/jobs/863266476#L5009
[2] - https://stackoverflow.com/questions/54729130/how-to-mount-docker-tmpfs-with-exec-rw-flags
[3] - https://github.com/moby/moby/issues/35890
-
Sergey Kaplun authored
Part of #5187
-
- Dec 21, 2020
-
-
Vladislav Shpilevoy authored
If the death timeout was decreased, during waiting for leader death or discovery, to a new value making the current death waiting end immediately, it could crash in libev, because it would mean the remaining time until leader death became negative. The negative timeout was passed to libev without any checks, and there is an assertion that a timeout should always be >= 0. This commit makes the raft code covered almost 100%, not counting one 'unreachable()' place. Closes #5303
-
Vladislav Shpilevoy authored
If the election timeout was decreased during an election to a new value making the current election expire immediately, it could crash in libev, because it would mean the remaining time until the election end became negative. The negative timeout was passed to libev without any checks, and there is an assertion that a timeout should always be >= 0. Part of #5303
-