Commits · 3af79e70b5e1e9b1d69b97f3031a299132a02d2f · core / tarantool

Jul 15, 2020

Fix luacheck warnings in src/lua/ · 3af79e70


Part of #4681

Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Reviewed-by: Igor Munkin <imun@tarantool.org>

Co-authored-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Co-authored-by: Igor Munkin <imun@tarantool.org>

3af79e70

Fix luacheck warnings in extra/dist/ · 0e020dde

Sergey Bronnikov authored 4 years ago


Part of #4681

Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Reviewed-by: Igor Munkin <imun@tarantool.org>

Co-authored-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Co-authored-by: Igor Munkin <imun@tarantool.org>

0e020dde

gitlab-ci: enable static analysis with luacheck · 5defb84f
Sergey Bronnikov authored 4 years ago
```
Part of #4681
```
5defb84f
build: enable 'make luacheck' target · 6b730e4f
Sergey Bronnikov authored 4 years ago
```
Part of #4681

Reviewed-by: Igor Munkin <imun@tarantool.org>
```
6b730e4f

Add initial luacheck config · bb549629

Sergey Bronnikov authored 4 years ago


Directories with Lua source code are now excluded because luacheck found
warnings and errors there. Some of these directories will be included as
these errors are fixed in subsequent commits.

Part of #4681

Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Co-authored-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>

bb549629

Jul 14, 2020

test: fix for openSUSE tests build · f526debc

Alexander V. Tikhonov authored 4 years ago

Found that the openSUSE toolchain adds the '--no-undefined' linker flag, which
leads to failures while building tests. The changes suppress this flag since
dynamic libraries are loaded via the Tarantool executable and use its
symbols, so it is completely OK to have undefined symbols at build time.

Needed for #4562

f526debc

test: replication/wal_rw_stress fix wait_cond · d3e2a2a4

Alexander V. Tikhonov authored 4 years ago

Found that on heavily loaded hosts the test tries to check the replication
downstream status when the downstream structure is not ready yet, and it
fails with the error:

[017] --- replication/wal_rw_stress.result	Thu Jul  9 17:04:16 2020
[017] +++ replication/wal_rw_stress.reject	Fri May  8 08:25:15 2020
[017] @@ -75,7 +75,8 @@
[017]      return box.info.replication[1].downstream.status ~= 'stopped' \
[017]  end) or box.info
[017]  ---
[017] -- true
[017] +- error: '[string "return test_run:wait_cond(function()         ..."]:1: attempt to
[017] +    index field ''downstream'' (a nil value)'
[017]  ...
[017]  test_run:cmd("switch default")
[017]  ---
[017]

So the wait condition should first check that the downstream
structure is available.
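
The idea of the fix can be sketched as follows. This is only an illustration in C++, not the test's Lua code: `Downstream`, `ReplicaInfo`, `downstream_is_running` and `wait_cond` are hypothetical names, and a real helper would sleep or yield between attempts.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of box.info.replication[i]: the downstream part
// is absent (nullptr) until the relay is ready.
struct Downstream { std::string status; };
struct ReplicaInfo { const Downstream *downstream; };

// Check availability first, and only then look at the status.
bool downstream_is_running(const ReplicaInfo &info) {
    return info.downstream != nullptr &&
           info.downstream->status != "stopped";
}

// Poll the predicate up to max_attempts times (no sleeping in this sketch).
template <typename Pred>
bool wait_cond(Pred pred, int max_attempts) {
    assert(max_attempts > 0);
    for (int i = 0; i < max_attempts; ++i) {
        if (pred())
            return true;
    }
    return false;
}
```

With the availability check in front, an unready downstream makes the predicate return false and the wait simply continues instead of raising an error.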

Follows up #4977

d3e2a2a4

engine: validate key in space_before_replace · d82804ae

Ilya Kosarev authored 4 years ago

Since 9fba29ab (memtx: introduce tuple
compare hint) index_get uses key_hint to get a comparison hint for the
key. It requires the key type to be validated. However, in
space_before_replace index_get was used without key validation on some
execution paths. Now it is fixed and validation is performed if
needed. A corresponding test case is introduced. Also,
space_before_replace() is slightly refactored to reduce the number of
conditions on a hot path.

Closes #5093

d82804ae

update_repo: fix unbound variables · fa917f0b
Alexander V. Tikhonov authored 4 years ago
```
Fixed unbound variables in the packaging tool update_repo.sh.

Closes #5114
```
fa917f0b

Jul 13, 2020

tuple: turn invalid bar update into nop · 9bb2d472

Vladislav Shpilevoy authored 4 years ago

There was a bug with invalid bar update operations. A bar update is
a single update operation with a non-empty isolated JSON path.
An isolated JSON path is one that does not interleave with any
other update operation in the same update/upsert() call.

When a bar update failed, there were 2 outcomes:

- it could crash in several places, when the JSON path was invalid, or
  a type assumed in the JSON path didn't match the actual type
  (such as [...] applied to a scalar value);

- it could log the error, but in the end still save the
  operation result as if it were '='. For example, an invalid '+'
  would behave just like '='.

The errors happened for upsert() only, because this call
never treats client errors as errors. Instead, they are just
ignored. Bad JSON - ignore, integer overflow - ignore, and so on.
Only client errors found during operation decoding are not
ignored.

The expected behaviour is that such invalid operations are
skipped. Note that it is possible that some operations are
skipped and some are not.

The patch adds a 'finish' phase to bar field update. This is
based on the fact that bar is *always* created from a nop field
(not updated, part of the old tuple). So it is enough to leave
this field nop, if something goes wrong.

All the other field types are fine. They already do that
when applying an operation.

It is worth mentioning that even if the update tree was changed
during an attempt to apply an invalid operation, this is ok.
Structure changes don't affect the final result. For example, it
is ok if a rope field in an array is split in 2. It is also
fine if a bar field is branched into route -> bar, or if a map
field sequence is split into 2 sequences.
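
The expected upsert() semantics (an invalid operation leaves the field untouched, while valid ones still apply) can be modelled roughly like this. It is a simplified sketch on a single integer field, not the patch's bar update code; `Op`, `apply_op` and `upsert_field` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical operation on one integer field: '+' adds, '=' assigns.
struct Op { char code; int64_t arg; };

// Apply one operation. On failure the field is left as is (a nop)
// and false is returned, instead of degrading into an assignment.
bool apply_op(int64_t &field, const Op &op) {
    const int64_t max = std::numeric_limits<int64_t>::max();
    const int64_t min = std::numeric_limits<int64_t>::min();
    if (op.code == '=') {
        field = op.arg;
        return true;
    }
    if (op.code == '+') {
        bool overflow = (op.arg > 0 && field > max - op.arg) ||
                        (op.arg < 0 && field < min - op.arg);
        if (overflow)
            return false;
        field += op.arg;
        return true;
    }
    return false; // unknown operation code
}

// Upsert-like loop: failed operations are skipped, others are applied.
std::size_t upsert_field(int64_t &field, const std::vector<Op> &ops) {
    std::size_t applied = 0;
    for (const Op &op : ops) {
        if (apply_op(field, op))
            ++applied;
    }
    return applied;
}
```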

Closes #5135

9bb2d472

feedback: collect db engines and index features · 41b07bd4

Ilya Konyukhov authored 4 years ago

This patch adds basic db features to the feedback report.
It collects info about which engines and which types of
indexes are set up by the user.

Here is what a report may look like if all the features are
used:

```json
{
  "arch": "x64",
  "features": {
    "schema": {
      "local_spaces": 0,
      "functional_indices": 0,
      "functional_multikey_indices": 0,
      "hash_indices": 0,
      "rtree_indices": 0,
      "temporary_spaces": 0,
      "tree_indices": 0,
      "jsonpath_indices": 0,
      "jsonpath_multikey_indices": 0,
      "vinyl_spaces": 0,
      "bitset_indices": 0,
      "memtx_spaces": 0
    }
  },
  "server_id": "79047619-2c60-4195-af4f-23dee72285e6",
  "cgroup": "",
  "os": "OSX",
  "cluster_id": "57da2972-c33f-4b68-8e98-12cffb2fe16f",
  "tarantool_version": "2.5.0-147-g4a80bc779",
  "feedback_version": 2
}
```

Part of #4943

41b07bd4

feedback: determine runtime platform info · c9802bd8

Ilya Konyukhov authored 4 years ago

This patch detects which platform the instance is running on.
It uses the LuaJIT `jit` module to get the OS name and architecture.
See more on the [docs page](https://luajit.org/ext_jit.html).

It also tries to figure out whether the instance is running
inside a cgroup environment or not. It's difficult to know this
accurately, but one of the most stable and simple ways
is to look in the
[`/proc/1/cgroup`](https://stackoverflow.com/a/20012536/1881632)
file. It checks for the "lxc" and "docker" environments.
If nothing is found, it reports an empty string.
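
The cgroup check described above could look roughly like this. It is a C++ illustration only (the patch itself works from Lua); the function names and the order in which "docker" and "lxc" are tested are assumptions.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Classify the contents of a cgroup file: "docker", "lxc" or "".
std::string detect_cgroup(const std::string &contents) {
    if (contents.find("docker") != std::string::npos)
        return "docker";
    if (contents.find("lxc") != std::string::npos)
        return "lxc";
    return "";
}

// Read the file (by default /proc/1/cgroup) and classify it.
// A missing or unreadable file is reported as an empty string.
std::string detect_cgroup_from_file(const char *path = "/proc/1/cgroup") {
    std::ifstream in(path);
    if (!in)
        return "";
    std::ostringstream buf;
    buf << in.rdbuf();
    return detect_cgroup(buf.str());
}
```

Keeping the classification separate from the file read makes it possible to check the logic on sample strings without a Linux /proc filesystem.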

Closes #3608
Related to #4943

c9802bd8

Fix wrong make_scoped_guard usage · 5eabbbd2

Aleksandr Lyapunov authored 4 years ago

A common pitfall of using a lambda is the wrong type of capture:
by value instead of by reference. A simple example is:
  struct sequence_def *new_def = NULL;
  auto def_guard = make_scoped_guard([=] { free(new_def); });
  // initialize new_def
The problem is that the lambda captures the pointer by value, so it
is NULL and remains NULL in the lambda even after new_def is
successfully initialized in the function scope.

The patch fixes the problem above and a couple of similar mistakes.
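
The difference between the two capture modes can be reproduced with a minimal stand-in guard. `ScopedGuard` here is a hypothetical simplification, not Tarantool's make_scoped_guard, and the two functions only record which pointer value the guard sees when it runs.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Minimal stand-in scope guard: runs the callback on scope exit.
class ScopedGuard {
public:
    explicit ScopedGuard(std::function<void()> f) : f_(std::move(f)) {}
    ~ScopedGuard() { f_(); }
    ScopedGuard(const ScopedGuard &) = delete;
    ScopedGuard &operator=(const ScopedGuard &) = delete;
private:
    std::function<void()> f_;
};

// Capture by value: the guard keeps the NULL it copied at creation.
bool guard_sees_init_by_value() {
    int storage = 0;
    const int *seen = nullptr;
    {
        int *new_def = nullptr;
        ScopedGuard guard([new_def, &seen] { seen = new_def; });
        new_def = &storage; // initialized after the guard was created
    }
    return seen == &storage;
}

// Capture by reference: the guard sees the initialized pointer.
bool guard_sees_init_by_reference() {
    int storage = 0;
    const int *seen = nullptr;
    {
        int *new_def = nullptr;
        ScopedGuard guard([&new_def, &seen] { seen = new_def; });
        new_def = &storage;
    }
    return seen == &storage;
}
```

In the by-value variant a real guard calling free() would free NULL and leak the object, which is the kind of mistake this commit addresses.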

Fixes #5154

5eabbbd2

tuple: make fields nullable by default except array/map · 4cf94ef8

Ilya Kosarev authored 4 years ago

Since e1d3fe8a (tuple format: don't
allow null where array/map is expected) tuple fields are non-nullable
by default. It seems strange at least when there are implicit fields
in front of an explicit nullable field. It also causes incorrect behaviour
when explicitly nullable array/map fields are used for a multikey
index.
Now fields are nullable by default except arrays and maps, because
their implicit nullability might break field accessors' expectations,
produce confusing error messages and cause incorrect behaviour of
tuple_multikey_count(). If explicitly nullable array/map fields
are used for a multikey index, a clear error message is provided.

Closes #5027

4cf94ef8

engine: fix assert for multikey indexes · 0fff75cb

Ilya Kosarev authored 4 years ago

Since 4273ec52 (box: introduce JSON
Indexes) we can create a multikey index using an array which might be the
first tuple field. It technically breaks the assertion which implies that
the first tuple field can't have an offset in the tuple field map. Now the
assert is updated accordingly. A corresponding test case is added.

Closes #5132

0fff75cb

recovery: handle local sync txns during recovery · a9b99f0e

Vladislav Shpilevoy authored 4 years ago

Recovery uses txn_commit_async() so as not to block the recovery
process when a synchronous transaction is met. Such transactions are
either committed later when CONFIRM is read, or stay in the limbo after
recovery.

However, txn_commit_async() assumed it is used for remote
transactions only, and had some assertions about that. One of them
failed when the master restarted with a synchronous
transaction in the WAL.

The patch makes txn_commit_async() not assume anything about
transaction's origin.

Closes #5163

a9b99f0e

test: fix flaky qsync_basic.test.lua · 604eb737

Vladislav Shpilevoy authored 4 years ago

In one of the test cases 2 fibers were started, each making a
transaction. In the first fiber the transaction was rolled back,
and the second fiber was expected to do the same.

It did roll back too, but not always immediately after the first
one, because the first fiber could not just roll back right
away: it had to write a ROLLBACK entry into the WAL before applying
the rollback to all subsequent transactions. This led to a yield, during
which it was possible to observe the second fiber not dead yet.

The patch makes the test explicitly wait for the fibers' death.

Closes #5162

604eb737

Jul 12, 2020

tx: introduce txn_stmt_destroy · 9438d074

Aleksandr Lyapunov authored 4 years ago

9438d074

Jul 11, 2020

qsync: txn_limbo_wait_complete -- use txn_limbo_abort · 8b85cc0b
Cyrill Gorcunov authored 4 years ago
```
Instead of open coding.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
```
8b85cc0b

qsync: txn_limbo_read_rollback -- use txn_limbo_abort · b58c0a36

Cyrill Gorcunov authored 4 years ago


Basically this is the same as what txn_limbo_abort does.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

b58c0a36

qsync: txn_limbo_pop -- drop fake reference · b35166e6

Cyrill Gorcunov authored 4 years ago


The limbo variable is accessed unconditionally
thus no need for fake reference.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

b35166e6

qsync: txn_limbo_assign_local_lsn -- drop redundant declaration · e3d65d95

Cyrill Gorcunov authored 4 years ago


We use the limbo variable when accounting acks, so there is no
need for a formal read here.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

e3d65d95

qsync: add a comment about sync txn in journal allocation · 390916e3

Cyrill Gorcunov authored 4 years ago


Otherwise it is not clear why we should set a flag here.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

390916e3

Jul 10, 2020

test: update suites 'fragile' lists · daae9586
Alexander V. Tikhonov authored 4 years ago
```
Synchronized the suites' 'fragile' lists with the actual list of flaky tests.
```
daae9586

sql: use mem_mp_type() in sql_value_type() · 52c3af7e

Nikita Pettik authored 4 years ago

sql_value_type() and mem_mp_type() do the same thing: return the MessagePack
type corresponding to the value stored in a memory cell. However,
sql_value_type() operates on the opaque API wrapper sql_value*. To avoid
duplicating code let's invoke mem_mp_type() in sql_value_type().
At the same time, let's take into account that mp_type can now be not
only _BIN, but also _ARRAY and _MAP - this fact will be used when we
introduce arrays and maps in SQL.

52c3af7e

sql: introduce mem_mp_type() function · ac307cfb

Nikita Pettik authored 4 years ago

It takes a memory cell object and returns the MessagePack type
corresponding to its value (i.e. it maps MEM_* types to MP_* types). It's an
internal analogue of sql_value_type(). In other words, it operates
directly on struct Mem *.

ac307cfb

replication: test sync txn local rollback is reversed · ff3123a6

Vladislav Shpilevoy authored 4 years ago

Transactions are always rolled back in reverse order. For some reason
the limbo removed rolled back transactions from the beginning, not
from the end. The test ensures this is no longer the case.
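
The ordering requirement can be illustrated with a toy queue. This is a sketch under the assumption that the limbo is modelled as a deque of LSNs in commit order; `rollback_from` is a hypothetical name, not a limbo function.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Roll back every pending entry with LSN >= from_lsn, newest first.
// Entries are taken from the end of the queue, never from the beginning.
// Returns the LSNs in the order they were rolled back.
std::vector<int64_t> rollback_from(std::deque<int64_t> &limbo,
                                   int64_t from_lsn) {
    std::vector<int64_t> order;
    while (!limbo.empty() && limbo.back() >= from_lsn) {
        order.push_back(limbo.back());
        limbo.pop_back();
    }
    return order;
}
```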

Closes #5147

ff3123a6

replication: add test for encoding CONFIRM/ROLLBACK on txn region · db1742aa

Vladislav Shpilevoy authored 4 years ago

In the original issue there were 2 bugs: one memory leak and one
memory corruption.

The leak was in txn_limbo_write_confirm_rollback(). This function
used fiber->gc region to encode CONFIRM/ROLLBACK, but never freed
it.

The corruption was in applier.cc in process_confirm_rollback().
CONFIRM/ROLLBACK were stored on the applier's ibuf. As a result,
if the applier experienced relatively intensive load, the ibuf would
be quickly recycled, right during a WAL write of the CONFIRM/ROLLBACK
stored on it.

DML requests would have the same problem, but they were copied in
txn_add_redo() inside of xrow_encode_dml() call.

The test checks whether CONFIRM/ROLLBACK are also copied.

Closes #5138

db1742aa

txn_limbo: introduce dynamic synchro config · 29766ce7

Vladislav Shpilevoy authored 4 years ago

Synchronous replication options - replication_synchro_quorum and
replication_synchro_timeout - were not updated for the existing
transactions on change. As a result, there could be weird
inconsistencies, when a new transaction could have required quorum
smaller than a previous transaction's, and could implicitly
confirm it. The same could be said about rollback on timeout - new
transactions could wake up earlier than older transactions.

This patch makes configuration dynamic. So if the mentioned
options are updated, they are applied to the existing transactions
too.

It opens wide administrative capabilities. For example, when
replica count becomes less than the quorum, an administrator can
lower the quorum dynamically, and it will be applied to all the
existing transactions.
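
A rough model of what "dynamic" means here: the quorum is read at the moment of the check instead of being fixed per transaction, so changing it re-evaluates the queue. `PendingTxn`, `Limbo`, `confirm_ready` and `set_quorum` are hypothetical names for this sketch and do not mirror the real txn_limbo API.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical pending synchronous transaction with its ack count.
struct PendingTxn { long lsn; int acks; };

struct Limbo {
    int quorum;                    // models replication_synchro_quorum
    std::deque<PendingTxn> queue;  // oldest transaction first

    // Confirm from the head while the oldest txn meets the current quorum.
    std::size_t confirm_ready() {
        std::size_t confirmed = 0;
        while (!queue.empty() && queue.front().acks >= quorum) {
            queue.pop_front();
            ++confirmed;
        }
        return confirmed;
    }

    // A new quorum is applied to already queued transactions too.
    std::size_t set_quorum(int new_quorum) {
        assert(new_quorum >= 1);
        quorum = new_quorum;
        return confirm_ready();
    }
};
```

In this model, lowering the quorum from 3 to 2 confirms the queued transactions that already have 2 acks, and older transactions are always confirmed before newer ones.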

Closes #5119

29766ce7

box.ctl: introduce clear_synchro_queue function · 9509a036

Serge Petrenko authored 4 years ago

Introduce a new function in the box.ctl API: box.ctl.clear_synchro_queue().
The function performs some actions to make sure that after it's
executed, the txn_limbo is free of any transactions issued on a remote
instance.
In order to achieve this goal, the instance first waits for 2
replication_synchro_timeouts so that confirmations and rollbacks from
the remote instance reach it.

If the limbo remains non-empty, the instance starts figuring out which
transactions should be confirmed and which should be rolled back. In
order to do so the instance scans through the vclocks of all the instances
that replicate from it and determines the last LSN of the old leader
that has been reached by replication_synchro_quorum replicas.

Then the instance writes appropriate CONFIRM and ROLLBACK entries.
After these actions the limbo must be empty.
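
One way to compute "the last LSN reached by a quorum of replicas" is to sort the replicas' LSNs for the old leader in descending order and take the quorum-th one. This is a simplified sketch, not the code of the patch: `quorum_lsn` is a hypothetical name, and the input is assumed to be the old leader's vclock component taken from each replica.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// replica_lsns: the old leader's LSN as seen by each replica.
// Returns the largest LSN reached by at least `quorum` replicas,
// or -1 if there are fewer than `quorum` replicas.
int64_t quorum_lsn(std::vector<int64_t> replica_lsns, std::size_t quorum) {
    assert(quorum >= 1);
    if (replica_lsns.size() < quorum)
        return -1;
    std::sort(replica_lsns.begin(), replica_lsns.end(),
              std::greater<int64_t>());
    return replica_lsns[quorum - 1];
}
```

In this model, transactions with an LSN up to the returned value would get a CONFIRM entry and the later ones a ROLLBACK entry.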

Closes #4849

9509a036

util: move cmp_i64 from xlog.c to util.h · c25d4cb9
Serge Petrenko authored 4 years ago
```
The comparator will be needed in other files too, e.g. box.cc

Prerequisite #4849
```
c25d4cb9

test: add test on local transactions wait synchronous · 4e0552a1

Vladislav Shpilevoy authored 4 years ago

Fully local transactions are expected to be blocked if there is
a synchronous transaction not finished.

Also there is a special case for when a transaction is not local,
but has a local row in the end, related to #4928.

4e0552a1

replication: add tests for sync replication with snapshots · 90e7a270
Sergey Bronnikov authored 4 years ago
```
Part of #5055
```
90e7a270
replication: add tests for sync replication with anon replica · 39cc2935
Sergey Bronnikov authored 4 years ago
```
Part of #5055
```
39cc2935
replication: add advanced tests for sync replication · 0fdd675a
Sergey Bronnikov authored 4 years ago
```
Part of #5055
```
0fdd675a

replication: add test for quorum 1 · 158c7404

Vladislav Shpilevoy authored 4 years ago

When synchro quorum is 1, the final commit and confirmation write
are done by the fiber that created the transaction, right after the WAL
write. This case got special handling in the previous patches,
and this commit adds a test for that.

Closes #5123

158c7404

replication: add test for async transactions block when not empty limbo · 7ea50cd9
Vladislav Shpilevoy authored 4 years ago
```
Follow-up #4845
```
7ea50cd9

replication: only send confirmed data during final join · 920efcb4

Serge Petrenko authored 4 years ago

Final join (or register) stage is needed to deliver the replica its
_cluster registration. Since this stage is followed by a snapshot on
replica, the data received during this stage must be confirmed.

Make master check that there are no rollbacks for the data to be sent
during final join and that all the data is confirmed before final join
starts.

Closes #5097

920efcb4

replication: delay initial join until confirmation · 41e979f0

Serge Petrenko authored 4 years ago

All the data that master sends during the join stage (both initial and
final) is embedded into the first snapshot created on replica, so this
data mustn't contain any unconfirmed or rolled back synchronous
transactions.

Make sure that the master starts sending the initial data, which contains a
snapshot-like dump of all the spaces, only after the latest synchronous
tx it has is confirmed. In case of rollback, the replica may retry
joining.

Part of #5097

41e979f0

txn_limbo: add diag_set in txn_limbo_wait_confirm · 9c88b6cd
Serge Petrenko authored 4 years ago
```
Add failure reason to txn_limbo_wait_confirm.

Prerequisite #5097
```
9c88b6cd