  1. Dec 30, 2020
    • Sergey Nikiforov's avatar
      base64: Properly ignore invalid characters · 726b96f0
      Sergey Nikiforov authored
      Not all invalid characters were ignored by the base64 decoder,
      causing data corruption and reads beyond the decode table (faults
      under ASAN).
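
      As an illustration, here is a minimal sketch of the intended
      behaviour (not Tarantool's actual decoder): any input byte whose
      decode-table entry is marked invalid is simply skipped instead of
      being consumed as data or used to read past the table.

      /* Toy base64 decoder: invalid characters are ignored. */
      #include <stdint.h>
      #include <stdio.h>
      #include <string.h>

      static uint8_t b64_table[256];

      static void
      b64_table_init(void)
      {
          const char *alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                              "abcdefghijklmnopqrstuvwxyz0123456789+/";
          memset(b64_table, 0xFF, sizeof(b64_table)); /* 0xFF = invalid */
          for (int i = 0; i < 64; i++)
              b64_table[(unsigned char)alpha[i]] = i;
      }

      static size_t
      b64_decode(const char *in, size_t in_len, uint8_t *out)
      {
          size_t out_len = 0;
          unsigned acc = 0;
          int bits = 0;
          for (size_t i = 0; i < in_len; i++) {
              uint8_t v = b64_table[(unsigned char)in[i]];
              if (v == 0xFF)
                  continue; /* skip '=', newlines and any other junk */
              acc = (acc << 6) | v;
              bits += 6;
              if (bits >= 8) {
                  bits -= 8;
                  out[out_len++] = (uint8_t)(acc >> bits);
              }
          }
          return out_len;
      }

      int main(void)
      {
          uint8_t buf[16];
          b64_table_init();
          /* "hello" with a space, a newline and a stray '!' inside. */
          size_t n = b64_decode("aGV s\nbG8=!", 11, buf);
          printf("%.*s\n", (int)n, (const char *)buf);
          return 0;
      }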
      
      Added a corresponding check to the base64 unit test.
      
      Fixes: #5627
      726b96f0
    • Igor Munkin's avatar
      luajit: bump new version · 94203c14
      Igor Munkin authored
      * core: remove excess assertion inside memprof
      * core: fix resources leak in memory profiler
      * misc: fix build with disabled memory profiler
      
      Follows up #5442
      94203c14
    • Alexander Turenko's avatar
      build: don't re-export libcurl.so/dylib symbols · 47c19eeb
      Alexander Turenko authored
      Export libcurl's symbols only when they are provided by tarantool
      itself, that is, when the library is linked statically into
      tarantool's executable. There is not much sense in exporting the
      symbols when we link against the library dynamically.
      
      Regarding the motivation for the change: since 2.6.0-36-g29ec62891
      ('Ensure all curl symbols are exported') the curl_multi_poll()
      function is exported from the tarantool executable. It leads to a
      failure in Homebrew's build, because there we link (dynamically) with
      the system libcurl. On Mac OS 10.15 it is libcurl 7.64.1, while the
      function only appeared in libcurl 7.66.0. So the linker reports an
      undefined symbol: `curl_multi_poll`.
      
      Now the symbols are not exported when linking dynamically with
      libcurl, so the linker is happy.
      
      This commit relaxes the requirements for dynamic linking, but an
      attempt to link statically with a libcurl older than 7.66.0 still
      leads to a linking failure. The box-tap/gh-5223-curl-exports.test.lua
      test still fails when tarantool is linked (dynamically) against an
      old libcurl.
      
      It looks like a good compromise. When libcurl functionality is
      provided by tarantool itself, *all* functions listed in the test are
      present (otherwise the linker will complain). But tarantool does not
      enforce a newer libcurl version when it just *uses* this
      functionality and does not provide it for modules and stored
      procedures. It is not tarantool's responsibility in that case.
      
      We should possibly skip the box-tap/gh-5223-curl-exports.test.lua
      test when tarantool is built against libcurl dynamically, or revisit
      the described approach. I'll leave it as a possible follow-up
      activity.
      
      Fixes #5542
      47c19eeb
    • Aleksandr Lyapunov's avatar
      txm: fix tuple ownership strategy · ecc3f3d2
      Aleksandr Lyapunov authored
      Hotfix of 88b76800
      
      Fix an obvious bug: tuple ref/unref manipulation must be done only
      when we handle the primary index. Even the code comment states that.
      
      Part of #5628
      ecc3f3d2
  2. Dec 29, 2020
    • Alexander Turenko's avatar
      github-ci: add --init option for docker containers · 18e24209
      Alexander Turenko authored
      
      We currently have a PID 1 zombie reaping problem: zombie processes
      launched in Docker aren't collected by init, as they would be if we
      launched them on a real host.

      It is fixed by adding the --init option when a container is created
      or started.
      
      Co-authored-by: Artem Starshov <artemreyt@tarantool.org>
      
      Follows up #4983
      18e24209
    • Cyrill Gorcunov's avatar
      crash: allow to build on non x86-64 machines · eaa61b5b
      Cyrill Gorcunov authored
      
      The general purpose registers were optional earlier; let's make them
      optional again, allowing the code to be compiled on non-x86-64
      machines.
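
      A hedged sketch of the approach (invented names, glibc-specific; not
      the actual crash.c code): the register dump is only filled on x86-64,
      so the file still compiles on other architectures.

      #define _GNU_SOURCE
      #include <stdint.h>
      #include <stdio.h>
      #include <ucontext.h>

      struct crash_greg {
      #ifdef __x86_64__
          uint64_t rip, rsp, rbp;
      #else
          char unused; /* no general purpose registers collected */
      #endif
      };

      static void
      crash_collect_registers(struct crash_greg *greg, const ucontext_t *uc)
      {
      #ifdef __x86_64__
          greg->rip = uc->uc_mcontext.gregs[REG_RIP];
          greg->rsp = uc->uc_mcontext.gregs[REG_RSP];
          greg->rbp = uc->uc_mcontext.gregs[REG_RBP];
      #else
          (void)greg;
          (void)uc; /* nothing to do on non-x86-64 targets */
      #endif
      }

      int main(void)
      {
          ucontext_t uc;
          struct crash_greg greg;
          if (getcontext(&uc) != 0)
              return 1;
          crash_collect_registers(&greg, &uc);
      #ifdef __x86_64__
          printf("rip = 0x%llx\n", (unsigned long long)greg.rip);
      #endif
          return 0;
      }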
      
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org>
      eaa61b5b
    • Cyrill Gorcunov's avatar
      crash: extend report with instance data · 7cf4c487
      Cyrill Gorcunov authored
      
      The product team would prefer to have more data included in a crash
      report.

      So we add an "instance" key with appropriate values (just like a
      regular feedback entry has). For example:
      
      | {
      |   "crashdump": {
      |     "version": "1",
      |     "data": {
      |       "uname": {
      |         "sysname": "Linux",
      |         "release": "5.9.14-100.fc32.x86_64",
      |         "version": "#1 SMP Fri Dec 11 14:30:38 UTC 2020",
      |         "machine": "x86_64"
      |       },
      |       "instance": {
      |         "server_id": "336bfbfd-9e71-4728-91e3-ba84aec4d7ea",
      |         "cluster_id": "176f3669-488f-46a5-a744-1be0b8a31029",
      |         "uptime": "3"
      |       },
      |       "build": {
      |         "version": "2.7.0-183-g02970b402",
      |         "cmake_type": "Linux-x86_64-Debug"
      |       },
      |       "signal": {
      |         "signo": 11,
      |         "si_code": 0,
      |         "si_addr": "0x3e800095fb9",
      |         "backtrace": "#0  0x6317ab in crash_collect+bf...",
      |         "timestamp": "2020-12-28 21:09:29 MSK"
      |       }
      |     }
      |   }
      | }
      
      Closes #5668
      
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org>
      7cf4c487
    • Aleksandr Lyapunov's avatar
      txm: change tuple ownership strategy · 88b76800
      Aleksandr Lyapunov authored
      Since a space holds pointers to tuples it must increment
      reference counters of its tuples and must decrement counters
      of tuples that are deleted from the space.
      
      The memtx TX manager also holds references to the tuples it is
      processing, in the same way.
      
      Before this patch the logic was: while a tuple is dirty it belongs
      to the TX manager and does not belong to the space. Only when a tuple
      is cleared while still in the space does it become referenced by the
      space.
      
      That logic leads to crashes in some DDL requests since they work
      with indexes directly. For example, deleting an index causes
      dereferencing of all its tuples, even dirty ones.
      
      This patch changes the logic. Now all tuples that are physically in
      the primary index of the space are referenced. Once removed from the
      primary index, a tuple is dereferenced. The TX manager references
      tuples as before: every tuple it holds is referenced.
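
      A toy sketch of the new rule (hypothetical types, not the real memtx
      code): only the primary index changes a tuple's reference counter;
      secondary indexes and the TX manager take their own references
      independently.

      #include <assert.h>
      #include <stdbool.h>
      #include <stdlib.h>

      struct tuple { int refs; };

      static void tuple_ref(struct tuple *t)   { t->refs++; }
      static void tuple_unref(struct tuple *t) { if (--t->refs == 0) free(t); }

      /* Called for every index a statement touches; only the primary index
       * owns the tuple. */
      static void
      index_replace(struct tuple *old_tuple, struct tuple *new_tuple,
                    bool is_primary)
      {
          if (!is_primary)
              return;                  /* secondary indexes do not own tuples */
          if (new_tuple != NULL)
              tuple_ref(new_tuple);    /* physically inserted into the space */
          if (old_tuple != NULL)
              tuple_unref(old_tuple);  /* physically removed from the space */
      }

      int main(void)
      {
          struct tuple *t = calloc(1, sizeof(*t));
          index_replace(NULL, t, true);  /* insert into primary: refs == 1 */
          index_replace(NULL, t, false); /* secondary index: refs unchanged */
          assert(t->refs == 1);
          index_replace(t, NULL, true);  /* delete from primary: tuple freed */
          return 0;
      }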
      
      Part of #5628
      88b76800
    • Aleksandr Lyapunov's avatar
      txm: free resource gracefully on server shutdown · a3741f6b
      Aleksandr Lyapunov authored
      Part of #5628
      a3741f6b
    • Aleksandr Lyapunov's avatar
      txm: fix another simple bug in tx manager · de6a4849
      Aleksandr Lyapunov authored
      There was a typo in the collection of the read set: a dirty tuple
      was added instead of a clean one.
      
      Closes #5559
      de6a4849
    • Aleksandr Lyapunov's avatar
      txm: fix a simple bug in tx manager · 122dc47f
      Aleksandr Lyapunov authored
      The problem happened when a tuple story was deleted by two
      statements, one committed and one not committed.
      
      Part of #5628
      122dc47f
    • mechanik20051988's avatar
      memtx: change small allocator behavior · 4d175bff
      mechanik20051988 authored
      Previously, in the small allocator, memory pools were allocated on
      request, which in the case of a small slab_alloc_factor led to using
      pools with incorrect sizes. This patch changes the small allocator
      behavior: now we allocate pools at the stage of allocator creation.
      Also we use a special function to find the appropriate pool, which is
      faster than the previous version with an rbtree. This change fixes
      #5216.
      
      Also moved the check that slab_alloc_factor is in the range
      (1.0, 2.0] from the small allocator to memtx_engine. If the factor is
      not in the range, it is changed to 1.0001 or 2.0 respectively.
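
      A rough sketch of the idea (not the actual small library code): all
      pools are created up front from the (clamped) slab_alloc_factor, and
      the pool for a request is computed directly instead of being looked
      up in an rbtree.

      #include <math.h>
      #include <stddef.h>
      #include <stdio.h>

      enum { MIN_SIZE = 8, POOL_COUNT = 32 };

      static double pool_sizes[POOL_COUNT];

      static double
      pools_create(double factor)
      {
          /* Clamp the factor to (1.0, 2.0], as memtx_engine now does. */
          if (factor <= 1.0)
              factor = 1.0001;
          else if (factor > 2.0)
              factor = 2.0;
          double size = MIN_SIZE;
          for (int i = 0; i < POOL_COUNT; i++) {
              pool_sizes[i] = size;
              size *= factor;
          }
          return factor;
      }

      /* O(1) pool selection: the smallest class whose size fits the request. */
      static int
      pool_index(size_t size, double factor)
      {
          if (size <= MIN_SIZE)
              return 0;
          int idx = (int)ceil(log((double)size / MIN_SIZE) / log(factor));
          return idx < POOL_COUNT ? idx : POOL_COUNT - 1;
      }

      int main(void) /* link with -lm */
      {
          double factor = pools_create(1.5);
          size_t request = 100;
          int idx = pool_index(request, factor);
          printf("request %zu -> pool %d of size %.0f\n",
                 request, idx, pool_sizes[idx]);
          return 0;
      }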
      
      Closes #5216
      4d175bff
    • Kirill Yukhin's avatar
      small: bump new version · 43aceb45
      Kirill Yukhin authored
      * Fix CentOS 6 build failure
      * Fix build error
      * Remove Ubuntu 19.04 Disco
      * small: implement new size class evaluation
      * test: add small allocator performance test
      * small: changed small allocator pool management
      43aceb45
  3. Dec 28, 2020
  4. Dec 27, 2020
    • Artem Starshov's avatar
      lua: fix running init lua script · a8f3a6cb
      Artem Starshov authored
      When tarantool is launched with the -e flag and there is an error in
      the script, the program hangs. This happens because the sched fiber
      launches a separate fiber for the init user script and starts an
      auxiliary event loop. That fiber is supposed to stop the loop, but in
      case of an error in the script the fiber tries to stop the loop
      before it has even started.
      
      Added a flag that tracks whether the loop has started, so when the
      fiber calls `ev_break()` we can be sure that the loop is already
      running.
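
      A hedged illustration of the race (invented names, libev replaced by
      stubs): the script fiber must not break the auxiliary event loop
      before the scheduler has actually started it.

      #include <stdbool.h>
      #include <stdio.h>

      static bool loop_started = false;   /* the flag added by the fix */
      static bool break_requested = false;

      /* Runs at the end of the -e script fiber (also on error). */
      static void
      script_fiber_done(void)
      {
          if (loop_started)
              printf("ev_break(): stop the running loop\n");
          else
              break_requested = true; /* remember the request, don't hang */
      }

      /* The scheduler starts the auxiliary loop after spawning the fiber. */
      static void
      run_auxiliary_loop(void)
      {
          if (break_requested)
              return;                 /* the script already finished/failed */
          loop_started = true;
          printf("ev_run(): waiting for the script fiber\n");
      }

      int main(void)
      {
          /* Error in the -e script: the fiber finishes before the loop starts. */
          script_fiber_done();
          run_auxiliary_loop();       /* returns instead of hanging forever */
          return 0;
      }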
      
      Fixes #4983
      a8f3a6cb
    • Alexander Turenko's avatar
      test: update test-run (pass timeouts via env) · a598f3f5
      Alexander Turenko authored
      The following variables now control timeouts (if corresponding command
      line options are not passed): TEST_TIMEOUT, NO_OUTPUT_TIMEOUT,
      REPLICATION_SYNC_TIMEOUT. See [1] for details.
      
      I set the following values in the GitLab CI web interface:
      
      | Variable                 | Value                                                   |
      | ------------------------ | ------------------------------------------------------- |
      | REPLICATION_SYNC_TIMEOUT | 300                                                     |
      | TEST_TIMEOUT             | 310                                                     |
      | NO_OUTPUT_TIMEOUT        | 320                                                     |
      | PRESERVE_ENVVARS         | REPLICATION_SYNC_TIMEOUT,TEST_TIMEOUT,NO_OUTPUT_TIMEOUT |
      
      See packpack change [2] and the commit 'ci: preserve certain environment
      variables' regarding the PRESERVE_ENVVARS variable.
      
      The reason why we need to increase timeouts comes from the following
      facts:
      
      - We use self-hosted runners to serve GitLab CI jobs. So, the machine
        resources are limited.
      - We run testing with a high level of parallelism to speed it up.
      - We have a bunch of vinyl tests, which intensively use disk.
      
      Disk accesses may be quite long within this infrastructure, and the
      obvious way to work around the problem is to increase timeouts.
      
      In the long term we should scale resources depending on the testing
      needs. We'll try to use GitHub-hosted runners or, if we reach some
      limits, will set up GitHub runners on the Mail.Ru Cloud Solutions
      infrastructure.
      
      [1]: https://github.com/tarantool/test-run/issues/258
      [2]: https://github.com/packpack/packpack/pull/135
      a598f3f5
    • Alexander Turenko's avatar
      ci: preserve certain environment variables · d2f4bd68
      Alexander Turenko authored
      We want to increase testing timeouts for GitLab CI, where we use our own
      runners and observe stalls and high disk pressure when several vinyl
      tests are run in parallel. The idea is to set the variables in the
      GitLab CI web interface and read them from test-run (see [1]).
      
      First, we need to pass the variables into inner environments. GitLab CI
      jobs run the testing using packpack, Docker or VirtualBox.
      
      Packpack already preserves environment variables that are listed in the
      PRESERVE_ENVVARS variable (see [2]).
      
      This commit passes the variables that are listed in the
      PRESERVE_ENVVARS variable into the Docker and VirtualBox
      environments. So, all jobs will have the given variables in the
      environment. (Also dropped the unused EXTRA_ENV variable.)
      
      The next commit will update the test-run submodule with support for
      setting timeouts via environment variables.
      
      [1]: https://github.com/tarantool/test-run/issues/258
      [2]: https://github.com/packpack/packpack/pull/135
      d2f4bd68
  5. Dec 26, 2020
    • Alexander V. Tikhonov's avatar
      test: remove obvious part in rpm spec for Travis · d9c25b7a
      Alexander V. Tikhonov authored
      Removed the obvious part from the RPM spec for Travis CI, since it
      is no longer in use.
      
      ---- Comments from @Totktonada ----
      
      This change is a kind of reversion of the commit
      d48406d5 ('test: add more tests to
      packaging testing'), which closed #4599.

      Here I describe the story: why the change was made and why it is
      reverted now.
      
      We run testing during an RPM package build: it may catch some
      distribution-specific problems. We had reduced the quantity of tests
      and switched to single-thread test execution to keep the testing
      stable and not break package builds and deployment due to known
      fragile tests.
      
      Our CI used to run on Travis CI, but we were in transition to GitLab
      CI to use our own machines and not hit the Travis CI limit of five
      jobs running in parallel.
      
      We moved package builds to GitLab CI, but kept build+deploy jobs on
      Travis CI for a while: GitLab CI was new for us and we wanted to make
      this transition smooth for users of our APT / YUM repositories.
      
      After enabling package builds on GitLab CI, we wanted to enable more
      tests (to catch more problems) and to enable parallel execution of
      tests to speed up testing (and reduce the amount of time a developer
      waits for results).
      
      We observed that if we enabled more tests and parallel execution on
      Travis CI, the testing results would become much less stable, and so
      we would often have holes in deployed packages and a red CI.
      
      So, we decided to keep the old way of testing on Travis CI and make
      all the changes (more tests, more parallelism) only for GitLab CI.
      
      We guessed that we had enough machine resources and would be able to
      do some load balancing to overcome flaky fails on our own machines,
      but in fact we picked another approach later (see below).
      
      That's the whole story behind #4599. What has changed since those
      days?
      
      We moved deployment jobs to GitLab CI[^1] and have now completely
      disabled Travis CI (see #4410 and #4894). All jobs were moved either
      to GitLab CI or straight to GitHub Actions[^2].
      
      We revisited our approach to improving the stability of testing.
      Attempts to do some load balancing together with attempts to keep a
      not-so-large execution time failed. We should increase parallelism
      for speed, but decrease it for stability at the same time. There is
      no optimal balance.
      
      So we decided to track flaky fails in the issue tracker and restart
      a test after a known fail (see details in [1]). This way we don't
      need to exclude tests or disable parallelism in order to get stable
      and fast testing[^3]. At least in theory. We're on the way to
      verifying this guess, but hopefully we'll stick with some adequate
      defaults that will work everywhere[^4].
      
      To sum up, there are several reasons to remove the old workaround, which
      was implemented in the scope of #4599: no Travis CI, no foreseeable
      reasons to exclude tests and reduce parallelism depending on a CI
      provider.
      
      Footnotes:
      
      [^1]: This is a simplification. Travis CI deployment jobs were not
            moved as is. GitLab CI jobs push packages to the new
            repositories backend (#3380). Travis CI jobs were disabled
            later (as part of #4947), after proof that the new
            infrastructure works fine. However, that is another story.
      
      [^2]: Now we're going to use GitHub Actions for all jobs, mainly
            because GitLab CI is poorly integrated with GitHub pull
            requests (when the source branch is in a forked repository).
      
      [^3]: Some work in this direction is still to be done:

            First, the 'replication' test suite is still excluded from
            testing during the RPM package build. It seems we should just
            enable it back; this is tracked by #4798.

            Second, there is issue [2] about getting rid of ancient traces
            of the old attempts to keep the testing stable (on the
            test-run side). It'll give us more parallelism in testing.
      
      [^4]: Of course, we investigate flaky fails and fix the code and
            testing problems this feeds back to us. However, it appears to
            be a long activity.
      
      References:
      
      [1]: https://github.com/tarantool/test-run/pull/217
      [2]: https://github.com/tarantool/test-run/issues/251
      d9c25b7a
  6. Dec 25, 2020
    • Sergey Bronnikov's avatar
      test: integrate with OSS Fuzz · 7680948f
      Sergey Bronnikov authored
      To run the Tarantool fuzzers on the OSS Fuzz infrastructure we need
      to pass the $LIB_FUZZING_ENGINE library to the linker and use
      external CFLAGS and CXXFLAGS. A full description of how to integrate
      with OSS Fuzz is in [1] and [2].
      
      The patch to the OSS Fuzz repository [3] is ready to merge.
      
      We need to pass options with "-fsanitize=fuzzer" two times
      (in cmake/profile.cmake and test/fuzz/CMakeLists.txt) because:
      
      - cmake/profile.cmake is for project source files: the
        -fsanitize=fuzzer-no-link option allows instrumenting project
        source files for fuzzing, but LibFuzzer will not replace main() in
        these files.

      - test/fuzz/CMakeLists.txt uses -fsanitize=fuzzer and not
        -fsanitize=fuzzer-no-link because we want the automatically
        generated main() for each fuzzer.
      
      1. https://google.github.io/oss-fuzz/getting-started/new-project-guide/
      2. https://google.github.io/oss-fuzz/advanced-topics/ideal-integration/
      3. https://github.com/google/oss-fuzz/pull/4723
      
      Closes #1809
      7680948f
    • Sergey Bronnikov's avatar
      travis: build tarantool with ENABLE_FUZZER · af126b90
      Sergey Bronnikov authored
      OSS Fuzz has a limited number of runs per day, and currently it is 4
      runs. The ENABLE_FUZZER option is enabled to make sure that building
      the fuzzers is not broken.
      
      Part of #1809
      af126b90
    • Sergey Bronnikov's avatar
      test: add corpus to be used with fuzzers · 8c1bb620
      Sergey Bronnikov authored
      Fuzzing tools use evolutionary algorithms. Supplying a seed corpus
      consisting of good sample inputs is one of the best ways to improve a
      fuzz target's coverage. The patch adds corpora that can be used with
      the existing fuzzers. The name of each file in a corpus is the SHA-1
      checksum of its contents.
      
      A corpus with HTTP headers was added from [1] and [2].
      
      1. https://google.github.io/oss-fuzz/getting-started/new-project-guide/
      2. https://en.wikipedia.org/wiki/List_of_HTTP_header_fields
      3. https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers
      
      libFuzzer allows minimizing a corpus with the help of the `-merge`
      flag: when 1 is passed, any corpus inputs from the 2nd, 3rd, etc.
      corpus directories that trigger new code coverage will be merged into
      the first corpus directory; when 0 is passed, an existing corpus will
      be minimized.
      
      All corpora provided in the patch were minimized.
      
      Part of #1809
      8c1bb620
    • Sergey Bronnikov's avatar
      test: add fuzzers and support for fuzzing testing · 2ad7caca
      Sergey Bronnikov authored
      There are a number of bugs related to parsing and encoding/decoding
      data. Examples:

      - csv: #2692, #4497
      - uri: #585

      One of the effective methods to find such issues is fuzzing. The
      patch introduces a CMake flag to enable building fuzzers
      (ENABLE_FUZZER) and adds fuzzers based on LibFuzzer [1] to the csv,
      http_parser and uri modules. Note that fuzzers must return exit code
      0 only; other exit codes are not supported [2].
      
      NOTE: LibFuzzer requires Clang compiler.
      
      1. https://llvm.org/docs/LibFuzzer.html
      2. http://llvm.org/docs/LibFuzzer.html#id22
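
      For reference, a minimal LibFuzzer target has the following shape
      (illustrative only; the real fuzzers call into the csv, http_parser
      and uri modules). Clang's -fsanitize=fuzzer supplies main() and calls
      this hook repeatedly with generated inputs.

      #include <stddef.h>
      #include <stdint.h>

      static void
      parse_under_test(const uint8_t *data, size_t size)
      {
          /* placeholder for the parser being fuzzed */
          (void)data;
          (void)size;
      }

      int
      LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
      {
          parse_under_test(data, size);
          return 0; /* any other return code is not supported */
      }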
      
      How-To Use:
      
      $ mkdir build && cd build
      $ cmake -DENABLE_FUZZER=ON \
      	-DENABLE_ASAN=ON \
      	-DCMAKE_BUILD_TYPE=Debug \
      	-DCMAKE_C_COMPILER="/usr/bin/clang" \
      	-DCMAKE_CXX_COMPILER="/usr/bin/clang++" ..
      $ make -j
      $ ./test/fuzz/csv_fuzzer -workers=4 ../test/static/corpus/csv
      
      Part of #1809
      2ad7caca
    • Sergey Bronnikov's avatar
      luacheck: remove unneeded comment · e0417b4d
      Sergey Bronnikov authored
      The serpent module was dropped in commit
      b53cb2ae
      ("console: drop unused serpent module"), but a comment that belongs
      to the module was left in the luacheck config.
      e0417b4d
    • Serge Petrenko's avatar
      test: fix box/error · 038c8abe
      Serge Petrenko authored
      Follow-up #5435
      038c8abe
    • Serge Petrenko's avatar
      txn_limbo: ignore CONFIRM/ROLLBACK for a foreign master · cab99888
      Serge Petrenko authored
      We designed the limbo so that it errors on receiving a CONFIRM or
      ROLLBACK for another instance's data. Actually, this error is
      pointless, and even harmful. Here's why:
      
      Imagine you have 3 instances, 1, 2 and 3.
      First 1 writes some synchronous transactions, but dies before writing CONFIRM.
      
      Now 2 has to write CONFIRM instead of 1 to take limbo ownership.
      From now on 2 is the limbo owner and in case of high enough load it constantly
      has some data in the limbo.
      
      Once 1 restarts, it first recovers its xlogs, and fills its limbo with
      its own unconfirmed transactions from the previous run. Now replication
      between 1, 2 and 3 is started and the first thing 1 sees is that 2 and 3
      ack its old transactions. So 1 writes CONFIRM for its own transactions
      even before the same CONFIRM written by 2 reaches it.
      Once the CONFIRM written by 1 is replicated to 2 and 3, they error
      and stop replication, since their limbo contains entries from 2, not
      from 1. Actually, there's no need to error, since it's just a really
      old CONFIRM which has already been processed by both 2 and 3.
      
      So, ignore CONFIRM/ROLLBACK when it references a wrong limbo owner.
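
      A hedged sketch with invented names (not the real txn_limbo code): a
      CONFIRM whose origin is not the current limbo owner is skipped
      instead of raising an error and breaking replication.

      #include <stdint.h>
      #include <stdio.h>

      struct txn_limbo { uint32_t owner_id; };

      static int
      limbo_process_confirm(struct txn_limbo *limbo, uint32_t origin_id,
                            int64_t lsn)
      {
          if (origin_id != limbo->owner_id) {
              /* Stale CONFIRM from a former leader: already handled, skip. */
              printf("ignore CONFIRM{origin=%u, lsn=%lld}\n",
                     origin_id, (long long)lsn);
              return 0;
          }
          printf("apply CONFIRM{origin=%u, lsn=%lld}\n",
                 origin_id, (long long)lsn);
          return 0;
      }

      int main(void)
      {
          struct txn_limbo limbo = { .owner_id = 2 }; /* 2 owns the limbo now */
          limbo_process_confirm(&limbo, 1, 100); /* old CONFIRM from instance 1 */
          limbo_process_confirm(&limbo, 2, 105);
          return 0;
      }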
      
      The issue was discovered with test replication/election_qsync_stress.
      
      Follow-up #5435
      cab99888
    • Serge Petrenko's avatar
      test: fix replication/election_qsync_stress test · bf0fbf3a
      Serge Petrenko authored
      The test involves writing synchronous transactions on one node and
      making other nodes confirm these transactions after its death.
      In order for the test to work properly we need to make sure the old
      node replicates all its transactions to peers before killing it.
      Otherwise, once the node is resurrected it'll have newer data, not
      present on the other nodes, which leads to their vclocks being
      incompatible, no one becoming the new leader, and the test hanging.
      
      Follow-up #5435
      bf0fbf3a
    • Serge Petrenko's avatar
      box: rework clear_synchro_queue to commit everything · 5c7dae44
      Serge Petrenko authored
      
      It is possible that a new leader (elected either via raft or manually or
      via some user-written election algorithm) loses the data that the old
      leader has successfully committed and confirmed.
      
      Imagine such a situation: there are N nodes in a replicaset, the old
      leader, denoted A, tries to apply some synchronous transaction. It is
      written on the leader itself and N/2 other nodes, one of which is B.
      The transaction has thus gathered quorum, N/2 + 1 acks.
      
      Now A writes CONFIRM and commits the transaction, but dies before the
      confirmation reaches any of its followers. B is elected the new leader and it
      sees that A's last transaction is present on N/2 nodes, so it doesn't have a
      quorum (A was one of the N/2 + 1).
      
      Current `clear_synchro_queue()` implementation makes B roll the transaction
      back, leading to rollback after commit, which is unacceptable.
      
      To fix the problem, make `clear_synchro_queue()` wait until all the rows from
      the previous leader gather `replication_synchro_quorum` acks.
      
      In case the quorum isn't achieved during replication_synchro_timeout,
      roll back nothing and wait for the user's intervention.
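
      An illustrative sketch with invented helpers (not the real box code):
      the new leader waits until every pending row gathers quorum acks; on
      timeout it rolls back nothing and leaves the queue untouched.

      #include <stdbool.h>
      #include <stdio.h>

      struct pending_row { long lsn; int ack_count; };

      /* Stand-in for "did more acks arrive within the timeout?". */
      static bool
      wait_for_acks(struct pending_row *row, double timeout)
      {
          (void)timeout;
          row->ack_count++; /* stub: pretend one more replica acked */
          return true;
      }

      static int
      clear_synchro_queue(struct pending_row *rows, int row_count,
                          int quorum, double timeout)
      {
          for (int i = 0; i < row_count; i++) {
              while (rows[i].ack_count < quorum) {
                  if (!wait_for_acks(&rows[i], timeout)) {
                      /* Quorum not reached in time: roll back nothing and
                       * wait for the user's intervention. */
                      fprintf(stderr, "quorum wait timed out at lsn %ld\n",
                              rows[i].lsn);
                      return -1;
                  }
              }
              /* Quorum collected: the row can be confirmed. */
          }
          return 0;
      }

      int main(void)
      {
          struct pending_row rows[] = { { 101, 1 }, { 102, 0 } };
          int rc = clear_synchro_queue(rows, 2, /*quorum=*/2, /*timeout=*/5.0);
          printf("rc = %d\n", rc);
          return 0;
      }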
      
      Closes #5435
      
      Co-developed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
      5c7dae44
    • Serge Petrenko's avatar
      txn_limbo: introduce txn_limbo_last_synchro_entry method · 618e8269
      Serge Petrenko authored
      It'll be useful for box_clear_synchro_queue rework.
      
      Prerequisite #5435
      618e8269
    • Vladislav Shpilevoy's avatar
      replication: introduce on_ack trigger · 0941aaa1
      Vladislav Shpilevoy authored
      The trigger is fired every time any of the relays notifies tx of a
      change in a replica's known vclock.
      
      The trigger will be used to collect synchronous transactions quorum for
      old leader's transactions.
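
      A generic sketch of the mechanism (plain function pointers instead of
      the actual trigger API): every time a relay reports a replica's newer
      vclock, all registered on_ack callbacks are run, so the limbo can
      count acks for the old leader's rows.

      #include <stdio.h>

      enum { MAX_TRIGGERS = 8 };

      typedef void (*on_ack_f)(unsigned replica_id, long vclock_sum);

      static on_ack_f on_ack_triggers[MAX_TRIGGERS];
      static int on_ack_count;

      static void
      on_ack_register(on_ack_f cb)
      {
          if (on_ack_count < MAX_TRIGGERS)
              on_ack_triggers[on_ack_count++] = cb;
      }

      /* Called from a relay whenever its replica reports a newer vclock. */
      static void
      relay_report_ack(unsigned replica_id, long vclock_sum)
      {
          for (int i = 0; i < on_ack_count; i++)
              on_ack_triggers[i](replica_id, vclock_sum);
      }

      static void
      limbo_on_ack(unsigned replica_id, long vclock_sum)
      {
          printf("replica %u acked up to %ld\n", replica_id, vclock_sum);
      }

      int main(void)
      {
          on_ack_register(limbo_on_ack);
          relay_report_ack(2, 105);
          return 0;
      }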
      
      Part of #5435
      0941aaa1