Commits · 4a50e1c498c1e76fbfef4ef81110a6bb4fac9ab0 · core / tarantool

Dec 23, 2020

box: remove unnecessary rights from peristent functions · 4a50e1c4

Mergen Imeev authored 4 years ago

After this patch, the persistent functions "box.schema.user.info" and
"LUA" will have the same rights as the user who executed them.

The problem was that setuid was unnecessarily set. Because of this,
these functions had the same rights as the user who created them.
However, they must have the same rights as the user who used them.

Fixes tarantool/security#1

4a50e1c4

lua: avoid panic if HOOK_GC is not an active hook · 95aa7d20

Sergey Kaplun authored 4 years ago


Platform panic occurs when fiber.yield() is used within any active
(i.e. being executed now) hook.

It is a regression caused by 96dbc49d
('lua: prohibit fiber yield when GC hook is active').

This patch fixes false positive panic in cases when VM is not running
a GC hook.

Relates to #4518
Closes #5649

Reported-by: Michael Filonenko <filonenko.mikhail@gmail.com>

95aa7d20

gitlab-ci: add packaging for Fedora 32 · 1507c47f
Alexander V. Tikhonov authored 4 years ago
```
Added packaging jobs for Fedora 32.

Closes #4966
```
1507c47f

test: filter replication/skip_conflict_row output · 2828f912

Alexander V. Tikhonov authored 4 years ago

Found that test replication/skip_conflict_row.test.lua fails with
output message in results file:

  [035] @@ -139,7 +139,19 @@
  [035]  -- applier is not in follow state
  [035]  test_run:wait_upstream(1, {status = 'stopped', message_re = "Duplicate key exists in unique index 'primary' in space 'test'"})
  [035]  ---
  [035] -- true
  [035] +- false
  [035] +- id: 1
  [035] +  uuid: f2084d3c-93f2-4267-925f-015df034d0a5
  [035] +  lsn: 553
  [035] +  upstream:
  [035] +    status: follow
  [035] +    idle: 0.0024020448327065
  [035] +    peer: unix/:/builds/4BUsapPU/0/tarantool/tarantool/test/var/035_replication/master.socket-iproto
  [035] +    lag: 0.0046234130859375
  [035] +  downstream:
  [035] +    status: follow
  [035] +    idle: 0.086121961474419
  [035] +    vclock: {2: 3, 1: 553}
  [035]  ...
  [035]  --
  [035]  -- gh-3977: check that NOP is written instead of conflicting row.

Test could not be restarted with checksum because of changing values
like UUID on each fail. It happend because test-run uses internal
chain of functions wait_upstream() -> gen_box_info_replication_cond()
which returns instance information on its fails. To avoid of it this
output was redirected to log file instead of results file.

2828f912

github-ci: set same workflow name as job · fa85c848

Alexander V. Tikhonov authored 4 years ago

Due to current testing schema uses separate pipelines per each testing
job then workflow names should be the same as jobs to make it more
visible on github actions results page [1].

[1] - https://github.com/tarantool/tarantool/actions

fa85c848

Dec 22, 2020

Change the behavior of the option 'force_recovery' · 0be1243e

mechanik20051988 authored 4 years ago

There was an option 'force_recovery' that makes tarantool
to ignore some problems during xlog recovery. This patch
change  this option behavior and makes tarantool to ignore
some errors during snapshot recovery just like during
xlog recovery.
Error types which can be ignored:
 - snapshot is someway truncated,
   but after necessary system spaces
 - snapshot has some garbage after it declared length
 - single tuple within snapshot has broken checksum
   and may be skipped without consequences (in this case
   we ignore all row with this tuple)

@TarantoolBot document
Title: Change 'force_recovery' option behavior
Change 'force_recovery' option behavior to allow
tarantool loading from broken snapshot

Closes #5422

0be1243e

github-ci: add skip-duplicate-actions module · 141802b6

Alexander V. Tikhonov authored 4 years ago

Found that jobs on push and pull_request filters run duplicating each
other [1][2]. To avoid of it found additional module [3]. Used entire
jobs skip on duplicated jobs either previously run jobs in queue that
were already updated [4].

[1] - https://github.community/t/duplicate-checks-on-push-and-pull-request-simultaneous-event/18012
[2] - https://github.community/t/how-to-trigger-an-action-on-push-or-pull-request-but-not-both/16662
[3] - https://github.com/fkirc/skip-duplicate-actions#concurrent_skipping
[4] - https://github.com/fkirc/skip-duplicate-actions#option-1-skip-entire-jobs

141802b6

github-ci: implement coverity check · 9209e8b2

Alexander V. Tikhonov authored 4 years ago

Added standalone job with coverity check as described at [1]. This
job uploads results to coverity.com host to 'tarantool' project when
COVERITY_TOKEN environment is enabled. Main coverity functionality
added at .travis.mk make file as standalone targets:

  'test_coverity_debian_no_deps' - used in github-ci actions
  'coverity_debian' - additional target with needed tools check

This job configured by cron scheduler on each Saturday 04:00 am.

Closes #5600

[1] - https://scan.coverity.com/download?tab=cxx

9209e8b2

github-ci: switch coverage saving from travis-ci · 7aa0b018

Alexander V. Tikhonov authored 4 years ago

Moved coverage saving to coveralls.io repository from travis-ci to
github-ci. Completely removed travis-ci from commit criteria.

Part of #5294

7aa0b018

github-ci: implement OSX commit testing · 70f2bd5f

Alexander V. Tikhonov authored 4 years ago

Implemented github-ci action workflow OSX jobs on commits:
 - OSX 10.15
 - OSX 11.0

Part of #5294

70f2bd5f

github-ci: initiate commit testing on github-ci · b5e545d0

Alexander V. Tikhonov authored 4 years ago

Implemented github-ci action workflow on commits.
Added group of CI jobs:

  1) on Debian 9 ("Stretch"):
    - luacheck
    - release
    - debug_coverage
    - release_clang
    - release_lto

  2) on Debian 10 ("Buster")
    - release_lto_clang11
    - release_asan_clang11

Part of #5294

b5e545d0

github-ci: update docker images creation routine · fb224554

Alexander V. Tikhonov authored 4 years ago

Due to all the activities moving from Gitlab-CI to Github-CI Actions,
then docker images creation routine updated with the new images naming
and containers registry:

  GITLAB_REGISTRY?=registry.gitlab.com

changed to

  DOCKER_REGISTRY?=docker.io

Part of #5294

fb224554

test: add test filter for vinyl tests · 566b1af7

Alexander V. Tikhonov authored 4 years ago

Added test-run filter on box.snapshot error message:

  'Invalid VYLOG file: Slice [0-9]+ deleted but not registered'

to avoid of printing changing data in results file to be able to use
its checksums in fragile list of test-run to rerun it as flaky issue.

Found issues:

 1) vinyl/deferred_delete.test.lua
    https://gitlab.com/tarantool/tarantool/-/jobs/913623306#L4552

  [036] 2020-12-15 19:10:01.996 [16602] coio vy_log.c:2202 E> failed to process vylog record: delete_slice{slice_id=744, }
  [036] 2020-12-15 19:10:01.996 [16602] main/103/vinyl vy_log.c:2068 E> ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Slice 744 deleted but not registered

 2) vinyl/gh-4864-stmt-alloc-fail-compact.test.lua
    https://gitlab.com/tarantool/tarantool/-/jobs/913810422#L4835

  [052] @@ -56,9 +56,11 @@
  [052]  --
  [052]  dump(true)
  [052]   | ---
  [052] - | ...
  [052] -dump()
  [052] - | ---
  [052] + | - error: 'Invalid VYLOG file: Slice 253 deleted but not registered'
  [052] + | ...

 3) vinyl/misc.test.lua
    https://gitlab.com/tarantool/tarantool/-/jobs/913727925#L5284

  [014] @@ -62,14 +62,14 @@
  [014]  ...
  [014]  box.snapshot()
  [014]  ---
  [014] -- ok
  [014] +- error: 'Invalid VYLOG file: Slice 1141 deleted but not registered'
  [014]  ...

 4) vinyl/quota.test.lua
    https://gitlab.com/tarantool/tarantool/-/jobs/914016074#L4595

  [025] 2020-12-15 22:56:50.192 [25576] coio vy_log.c:2202 E> failed to process vylog record: delete_slice{slice_id=522, }
  [025] 2020-12-15 22:56:50.193 [25576] main/103/vinyl vy_log.c:2068 E> ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Slice 522 deleted but not registered

 5) vinyl/update_optimize.test.lua
    https://gitlab.com/tarantool/tarantool/-/jobs/913728098#L2512

  [051] 2020-12-15 20:18:43.365 [17147] coio vy_log.c:2202 E> failed to process vylog record: delete_slice{slice_id=350, }
  [051] 2020-12-15 20:18:43.365 [17147] main/103/vinyl vy_log.c:2068 E> ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Slice 350 deleted but not registered

 6) vinyl/upsert.test.lua
    https://gitlab.com/tarantool/tarantool/-/jobs/913623510#L6132

  [008] @@ -441,7 +441,7 @@
  [008]  -- Mem has DELETE
  [008]  box.snapshot()
  [008]  ---
  [008] -- ok
  [008] +- error: 'Invalid VYLOG file: Slice 1411 deleted but not registered'
  [008]  ...

 7) vinyl/replica_quota.test.lua
    https://gitlab.com/tarantool/tarantool/-/jobs/914272656#L5739

  [023] @@ -41,7 +41,7 @@
  [023]  ...
  [023]  box.snapshot()
  [023]  ---
  [023] -- ok
  [023] +- error: 'Invalid VYLOG file: Slice 232 deleted but not registered'
  [023]  ...

 8) vinyl/ddl.test.lua
    https://gitlab.com/tarantool/tarantool/-/jobs/914309343#L4538

  [039] @@ -81,7 +81,7 @@
  [039]  ...
  [039]  box.snapshot()
  [039]  ---
  [039] -- ok
  [039] +- error: 'Invalid VYLOG file: Slice 206 deleted but not registered'
  [039]  ...

 9) vinyl/write_iterator.test.lua
    https://gitlab.com/tarantool/tarantool/-/jobs/920646297#L4694

  [059] @@ -80,7 +80,7 @@
  [059]  ...
  [059]  box.snapshot()
  [059]  ---
  [059] -- ok
  [059] +- error: 'Invalid VYLOG file: Slice 351 deleted but not registered'
  [059]  ...
  [059]  --
  [059]  -- Create a couple of tiny runs on disk, to increate the "number of runs"

 10) vinyl/gc.test.lua
     https://gitlab.com/tarantool/tarantool/-/jobs/920441445#L4691

  [050] @@ -59,6 +59,7 @@
  [050]  ...
  [050]  gc()
  [050]  ---
  [050] +- error: 'Invalid VYLOG file: Run 1176 deleted but not registered'
  [050]  ...
  [050]  files = ls_data()
  [050]  ---

 11) vinyl/gh-3395-read-prepared-uncommitted.test.lua
     https://gitlab.com/tarantool/tarantool/-/jobs/921944705#L4258

  [019] @@ -38,7 +38,7 @@
  [019]   | ...
  [019]  box.snapshot()
  [019]   | ---
  [019] - | - ok
  [019] + | - error: 'Invalid VYLOG file: Slice 634 deleted but not registered'
  [019]   | ...
  [019]
  [019]  c = fiber.channel(1)

566b1af7

test: setup workspace in tmpfs for OOS build · dfcefb63

Alexander V. Tikhonov authored 4 years ago

Found that running vinyl test suite in parallel using test-run vardir
on real hard drive may cause a lot of tests to fail. It happens because
of bottleneck with hard drive usage up to 100% which can be seen by any
of the tools like atop during vinyl tests run in parallel. To avoid of
it all heavy loaded testing processes should use tmpfs for vardir path.
Found that out-of-source build had to be updated to use tmpfs for it.
This patch mounts additional tmpfs mount point in OOS build docker run
process for test-run vardir. This mount point set using '--tmpfs' flag
because '--mount' does not support 'exec' option which is needed to be
able to execute commands in it [2][3].

Issues met on OOS before the patch, like described in #5504 and [1]:

  Test hung! Result content mismatch:
  --- vinyl/write_iterator.result	Fri Nov 20 14:48:24 2020
  +++ /rw_bins/test/var/081_vinyl/write_iterator.result	Fri Nov 20 15:01:54 2020
  @@ -200,831 +200,3 @@
   ---
   ...
   for i = 1, 100 do space:insert{i, ''..i} if i % 2 == 0 then box.snapshot() end end
  ----
  -...
  -space:delete{1}
  ----
  -...

Closes #5622
Part of #5504

[1] - https://gitlab.com/tarantool/tarantool/-/jobs/863266476#L5009
[2] - https://stackoverflow.com/questions/54729130/how-to-mount-docker-tmpfs-with-exec-rw-flags
[3] - https://github.com/moby/moby/issues/35890

dfcefb63

rfc: luajit metrics · 32358f4f
Sergey Kaplun authored 4 years ago
```
Part of #5187
```
32358f4f

Dec 21, 2020

raft: fix crash on death timeout decrease · 4042b5c0

Vladislav Shpilevoy authored 4 years ago

If death timeout was decreased during waiting for leader death or
discovery to a new value making the current death waiting end
immediately, it could crash in libev.

Because it would mean the remaining time until leader death became
negative. The negative timeout was passed to libev without any
checks, and there is an assertion, that a timeout should always
be >= 0.

This commit makes raft code covered almost on 100%, not counting
one 'unreachable()' place.

Closes #5303

4042b5c0

raft: fix crash on election timeout decrease · ad713399

Vladislav Shpilevoy authored 4 years ago

If election timeout was decreased during election to a new value
making the current election expired immediately, it could crash in
libev.

Because it would mean the remaining time until election end became
negative. The negative timeout was passed to libev without any
checks, and there is an assertion, that a timeout should always
be >= 0.

Part of #5303

ad713399

raft: fix ignorance of bad state receipt · 3fe5367c

Vladislav Shpilevoy authored 4 years ago

raft_process_msg() only validated that the state is specified. But
it didn't check if the state is inside of the allowed value range.

Such messages were considered valid, and even their other fields
were accepted. For instance, an invalid message could bump term.

It is safer to reject such messages.

Part of #5303

3fe5367c

raft: fix crash when received 0 term message · 2f5522dd

Vladislav Shpilevoy authored 4 years ago

Term in raft can never be 0. It starts from 1 and can only grow.
It was assumed it can't be received from other nodes because they
do the same. There was an assertion for that.

But in raft_msg, used as a transport unit between raft nodes, it
was still possible to send 0 term. It could happen as a result of
a bug, or if someone would try to mimic the protocol but made a
mistake.

That led to a crash in the assert in debug build.

Part of #5303

2f5522dd

test: introduce raft unit tests · e688280b

Vladislav Shpilevoy authored 4 years ago

Raft algorithm was tested only by functional Lua tests, as a part
of the Tarantool executable.

Functional testing of something like raft algorithm has drawbacks:

- Not possible or too hard to cover some cases without error
  injections and/or good stability. Such as invalid messages, or
  small time durations, or a complex case which needs multiple
  steps to be reproduced. For instance, start WAL write, receive a
  message, finish the WAL write, and see if an expected thing
  happens.

- Too long time to run when need to test not tiny timeouts. On the
  other hand, with tiny timeouts the test would become unstable.

- Poor reproducibility due to random used in raft, due to system
  load, and number of other tests running in parallel.

- Hard to debug, because for raft it is necessary to start one
  Tarantool process per raft instance.

- Involves too much other systems, such as threads, real journal,
  relays, appliers, and so on. They can affect the process on the
  whole and reduce reproducibility and debug simplicity even more.

Exactly the same problem existed for SWIM algorithm implemented as
a module 'swim'. In order to test it, swim was moved to a separate
library, refactored to be able to start many swims in the process
(no global variables), its functions talking to other systems
were virtualized (swim_ev, swim_transport), and substituted in the
unit tests with fake analogue systems.

In the unit tests these virtual functions were implemented
differently, but the core swim algorithm was left intact and
properly tested.

The same is done for raft. This patch implements a set of helper
functions and objects to unit test raft in raft_test_utils.c/.h
files, and uses it to cover almost all the raft algorithm code.

During implementation of the tests some bugs were found, which are
not covered here, but fixed and covered in next commits.

Part of #5303

e688280b

raft: introduce raft_ev · 1d01394e

Vladislav Shpilevoy authored 4 years ago

Raft_ev.h/.c encapsulates usage of libev, to a certain extent. All
libev functions are wrapped into raft_ev_* wrappers. Objects and
types are left intact.

This is done in order to be able to replace raft_ev.c in the
soon coming unit tests with fakeev functions. That will allow to
simulate time and catch all the raft events with 100%
reproducibility, and without actual waiting for the events in
real time.

The similar approach is used for swim unit tests.

Original raft core file is built into a new library 'raft_algo'.
It is done for the sake of code coverage in unit tests. A test
could be built by directly referencing raft.c in
unit/CMakeLists.txt, but it can't apply compilation options to it,
including gcov options.

When raft.c is built into a library right where it is defined, it
gets the gcov options, and the code covered by unit tests can be
properly shown.

Part of #5303

1d01394e

fakesys: introduce fakeev_timer_remaining() · bce18d3c

Vladislav Shpilevoy authored 4 years ago

ev_timer_remaining() in libev returns number of seconds until the
timer will expire. It is used in raft code.

Raft is going to be tested using fakesys, and it means it needs
fakeev analogue of ev_timer_remaining().

Part of #5303

bce18d3c

fakesys: fix ev_is_active not working on fake timers · bacf206c

Vladislav Shpilevoy authored 4 years ago

fakeev timers didn't set 'active' flag for ev_watcher objects.

Because of that if fakeev_timer_start() was called, the timer
wasn't visible as 'active' via libev API.

Fakeev is supposed to be a drop-in simulation of libev, so it
should "work" exactly the same.

bacf206c

memtx: fix a bug with insertion to space during recovery · 3bc4a156

mechanik20051988 authored 4 years ago

There was a problem whith on_schema_init trigger.
This trigger gives a way to create on_replace trigger
that will modify temporary or is_local spaces during recovery
from snapshot, but on that stage of recovery process
all space indexes are in special build mode when no check
for uniqueness are made. I added a new function
'is_recovery_finished' in box.ctl, which gives
user ability to check that we are in snapshot recovery stage
and can't insert/replace/update/upsert something. Also i added a
check for corresponding operations, now they are failed
if user tries to do them during snapshot recovery.

@TarantoolBot document
Title: Add 'is_recovery_finished' function
Add 'is_recovery_finished' function in box.ctl
to add user ability to check that we are
in snapshot recovery stage and can't
insert/replace/update/upsert something

Closes #5304

3bc4a156

Dec 20, 2020

swim: don't call swim_quit via FFI · bf9549e3

Vladislav Shpilevoy authored 4 years ago

swim_quit yields, because it joins the event handler fiber. Hence
it can't be called via FFI, where a yield might lead to platform
panic.

Closes #4570

bf9549e3

Dec 19, 2020

test: update test-run (--test-timeout) · 6e6e7b29

Alexander Turenko authored 4 years ago

Added `--test-timeout <seconds>` argument. 110 seconds by default. The
main idea is to don't reach --no-output-timeout when possible and so be
able to restart a failed test (according to fragile test checksums) and
store artifacts. PR #244.

Fixed various cases when test-run doesn't wait for a stopping instance,
doesn't try to stop it at all or issue just SIGTERM, without SIGKILL
after some delay. PR #257.

Unverified

6e6e7b29

Dec 16, 2020

sql: update temporary file name format · 427fdaa7

Leonid Vasiliev authored 4 years ago


The bug was consisted in fail when working with temporary files
created by VDBE to sort large result of a `SELECT` statement with
`ORDER BY`, `GROUP BY` clauses.

Whats happen (step by step):
- We have two instances on one node (sharded cluster).
- A query is created that executes on both.
- The first instance creates the name of the temporary file and
  checks a file with such name on existence.
- The second instance creates the name of the temporary file
  (the same as in  first instance) and checks a file with such name
  on existence.
- The first instance creates a file with the `SQL_OPEN_DELETEONCLOSE`
  flag.
- The second instance opens(try to open) the same file.
- The first instance closes (and removes) the temporary file.
- The second instance tries to work with the file and fails.

Why did it happen:
The temporary file name format has a random part, but the random
generator uses a fixed seed.

When it was decided to use a fixed seed:
32cb1ad2
("sql: drop useless code from os_unix.c")

How the patch fixes the problem:
The patch injects the PID in the temporary file name format.
The generated name is unique for a single process (due to a random part)
and unique between processes (due to the PID part).

Alternatives:
1) Use `O_TMPFILE` or `tmpfile()` (IMHO the best way to work with
  temporary files). In both cases, we need to update a significant
  part of the code, and some degradation can be added. It's hard to
  review.
2) Return a random seed for the generator. As far as I understand,
  we want to have good reproducible system behavior, in which case
  it's good to use a fixed seed.
3) Add reopening file with the flags `O_CREAT | O_EXCL` until we
  win the fight. Now we set such flags when opening a temporary file,
  but after that we try to open the file in `READONLY` mode and
  if ok - return the descriptor. This is strange logic for me and I
  don't want to add any aditional logic here. Also, such solution will
  add additional attempts to open the file.

So, it look like such minimal changes will work fine and are simple
to review.

Co-authored-by: Mergen <Imeev&lt;imeevma@gmail.com>

Fixes #5537

427fdaa7

luacheck: change global vars to local in sql-tap · b1a4fed6

Artem Starshov authored 4 years ago

Fixed luacheck warning 111 (setting non-standard global variable)
in test/sql-tap directory.
Enabled this directory for checking W111 in
config file(.luacheckrc).

Changed almost all variables in test/sql-tap from globals
to locals. In any cases, where variables need to be global,
added explicit _G. prefix (table of globals).

Fixes #5173
Part-of #5464

b1a4fed6

bitset: replace zero-length array with flexible-array member · 9b0278f4

Artem Starshov authored 4 years ago

Zero-lenght arrays are GNU C extension.
There's ISO C99 flexible array member, which
is preffered mechanism to declare variable-length types.

Flexible array member allows us to avoid applying sizeof
operator cause it's incomplete type, so it will be an error
at compile time. There're any moments else why it's better
way to implement such structures via FAM:
https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html

In this issue it fixed gcc 10 warning:
"warning: writing 1 byte into
a region of size 0 [-Wstringop-overflow=]"

Closes #4966
Closes #5564

9b0278f4

sql: fix build with GCC 10 · a65d9c50

Artem Starshov authored 4 years ago

GCC 10 produces the following error:
cc1: warning: function may return address of local variable [-Wreturn-local-addr]

Fix it.

Part-of #4966

a65d9c50

Dec 11, 2020

box: refactor tuple_field_raw to omit path checks · 28f3b2f1

Serge Petrenko authored 4 years ago

tuple_field_raw is an alias to tuple_field_raw_by_path with zero path.
This involves multiple path != NULL checks which aren't needed for tuple
field access by field number. The checks make this function rather slow
compared to its 1.10 counterpart (see results below).

In order to fix perf problems when JSON path indices aren't involved,
factor out the part of tuple_field_raw_by_path which is responsible for
direct field access by number and place it in tuple_field_raw.

This patch was tested by snapshot recovery part involving secondary
index building for a 1.5G snapshot with
one space and one secondary index over 4 integer and one string field.
Comparison table is below:

    Version    | time(seconds)  | Change relative to 1.10
---------------|----------------|------------------------
1.10           |      2:24      |           -/-
2.x(unpatched) |      3:03      |          + 27%
2.x (patched)  |      2:10      |          - 10%

Numbers below show cumulative time spent in tuple_compare_slowpath,
for 1.10 / 2.x(unpatched) / 2.x(patched) for 15, 19 and 14 second
profiles respectively: 13.9 / 17.8 / 12.5.

tuple_field_raw() isn't measured directly, since it's inlined, but all
its uses come from tuple_compare_slowpath.

As the results show, we manage to be even faster, than 1.10 used to be
in this test. This must be due to tuple comparison hints, which are
present only in 2.x.

Closes #4774

28f3b2f1

box: speed up tuple_field_map_create · 420bacb2

Serge Petrenko authored 4 years ago

Since the introduction of JSON path indices tuple_init_field_map, which
was quite a simple routine traversing a tuple and storing its field data
offsets in the field map, was renamed to tuple_field_map_create and
optimised for working with JSON path indices.

The main difference is that tuple format fields are now organised in a
tree rather than an array, and the tuple itself may have indexed fields,
which are not plain array members, but rather members of some sub-array
or map. This requires more complex iteration over tuple format fields
and additional tuple parsing.

All the changes were, however, unneeded for tuple formats not supporting
fields indexed by JSON paths.

Rework tuple_field_map_create so that it doesn't go through all the
unnecessary JSON path-related checks for simple cases and restore most
of the lost performance.

Below are some benchmark results for the same workload that pointed to
the degradation initially.
Snapshot recovery time on RelWithDebInfo build for a 1.5G snapshot
containing a single memtx space with one secondary index over 4 integer
and 1 string field:

        Version            | Time (s) | Difference relative to 1.10
---------------------------|----------|----------------------------
1.10 (the golden standard) |    28    |             -/-
2.x (degradation)          |    39    |            + 39%
2.x (patched)              |    31    |            + 11%

Profile shows that the main difference is in memtx_tuple_new due to
tuple_init_field_map/tuple_field_map_create performance difference.

Numbers below show cumulative time spent in tuple_init_field_map (1.10) /
tuple_field_map_create (unpatched) / tuple_field_map_create (patched).
2.44 s / 8.61 s / 3.19 s

More benchmark results can be seen at #4774

Part of #4774

420bacb2

sql: remove unecessary execute of space_cache_find() · bc16e5df

Mergen Imeev authored 4 years ago

Due to the fact that space_cache_find () is called unnecessarily, it is
possible to set diag "Space '0' does not exist", although in this case
it is not a wrong situation when the space id is 0.

Part of #5592

bc16e5df

core: introduce evenly distributed int64 random in range · 31bf9ef1

Ilya Kosarev authored 4 years ago

Tarantool codebase had at least two functions to generate random
integer in given range and both of them had problems at least with
integer overflow. This patch brings nice functions to generate random
int64_t in given range without overflow while preserving uniform random
generator distribution using unbiased bitmask with rejection method.
It is now possible to use xoshiro256++ PRNG or random bytes as a basis.
Most relevant replacements have been made. Needed tests are introduced.

Closes #5075

31bf9ef1

test: add filter to box/net.*after_gh-3164 test · e519954a

Alexander V. Tikhonov authored 4 years ago

Found issue:

  [079] @@ -115,5 +115,14 @@
  [079]  -- connection is deleted by 'collect'.
  [079]  weak.c
  [079]  ---
  [079] -- null
  [079] +- peer_uuid: 035d7b36-f205-45f4-9e16-e5b0b99a9b0b
  [079] +  opts:
  [079] +    reconnect_after: 0.1
  [079] +  host: unix/
  [079] +  schema_version: 78
  [079] +  protocol: Binary
  [079] +  state: error_reconnect
  [079] +  error: Connection refused
  [079] +  peer_version_id: 132864
  [079] +  port: /tmp/tnt/079_box/proxy.socket-iproto
  [079]  ...

Which could not be restarted with checksum because of changing UUID
value each run. To avoid of it added filter on 'peer_uuid:' output.

e519954a

test: add test filter for vinyl/errinj.test.lua · 718cba1b

Alexander V. Tikhonov authored 4 years ago

Added test-run filter on box.snapshot error message:

  'Invalid VYLOG file: Slice [0-9]+ deleted but not registered'

to avoid of printing changing data in results file to be able to use
its checksums in fragile list of test-run to rerun it as flaky issue.

Needed for #4346

718cba1b

Dec 10, 2020

test: fix replication/skip_conflict_row output · 7398d495

Alexander V. Tikhonov authored 4 years ago

Found that test replication/skip_conflict_row.test.lua fails with
output message in results file:

  [260] @@ -117,11 +117,23 @@
  [260]  -- lsn is not promoted
  [260]  lsn1 == box.info.vclock[1]
  [260]  ---
  [260] -- true
  [260] +- false
  [260]  ...
  [260]  test_run:wait_upstream(1, {status = 'stopped', message_re = "Duplicate key exists in unique index 'primary' in space 'test'"})
  [260]  ---
  [260] -- true
  [260] +- false
  [260] +- id: 1
  [260] +  uuid: bdbf6673-6ee4-47eb-a88d-81164f4e61c9

Test could not be restarted with checksum because of changing values
like UUID on each fail. It happend because test-run uses internal
chain of functions wait_upstream() -> gen_box_info_replication_cond()
which returns instance information on its fails. To avoid of it this
output was redirected to log file instead of results file.

7398d495

test: move error messages into logs · f79fbdd8

Alexander V. Tikhonov authored 4 years ago

Found that some tests on fail use box.info* calls to print information,
like:

  [024] --- replication/wal_rw_stress.result	Mon Nov 30 10:02:43 2020
  [024] +++ var/rejects/replication/wal_rw_stress.reject	Sun Dec  6 16:06:46 2020
  [024] @@ -77,7 +77,45 @@
  [024]              r.downstream.status ~= 'stopped')    \
  [024]  end) or box.info
  [024]  ---
  [024] -- true
  [024] +- version: 2.7.0-109-g0b3ad5d8a0
  [024] +  id: 2
  [024] +  ro: false
  [024] +  uuid: e0b8863f-7b50-4eb5-947f-77f92c491827

It denies test-run from rerunng these tests using checksums, because
of changing output on each fail, like 'version:' either 'uuid:' fields
values above. To avoid of it, these calls outputs should be redirected
to log files using log.error(). Also the same fix made for tests with
fio.listdir() and fio.stat() on errors.

f79fbdd8

gitlab-ci: remove Ubuntu 19.10 from packing · 1bf44ccb

Alexander V. Tikhonov authored 4 years ago

Found issue on Tarantool package build for Ubuntu 19.10 [1]:

E: The repository 'http://archive.ubuntu.com/ubuntu eoan Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu eoan-updates Release' does not have a Release file.
E: The repository 'http://archive.ubuntu.com/ubuntu eoan-backports Release' does not have a Release file.
E: The repository 'http://security.ubuntu.com/ubuntu eoan-security Release' does not have a Release file.

Also found that time life of Ubuntu 19.04 ended with support [2]
on 17 of July 2020.

So packaging jobs for this OS removed from Gitlab-CI.

[1] - https://gitlab.com/tarantool/tarantool/-/jobs/902339975#L172
[2] - https://fridge.ubuntu.com/2020/07/17/ubuntu-19-10-eoan-ermine-end-of-life-reached-on-july-17-2020/#:~:text=Ubuntu%20announced%20its%2019.10%20(Eoan,updated%20packages%20for%20Ubuntu%2019.10.

1bf44ccb

Dec 08, 2020

test: fix to resolve box/net_msg_max flaky · 9f3e1d0a

Sergey Ostanevich authored 4 years ago

A problem was gh-4834-netbox-fiber-cancel left a request hanging
so the net_msg_max fails in case it follows on the same runner.

9f3e1d0a