Commits · 35473d5d8dbf4e1337c7e6af26a3e05992b98247 · core / tarantool

Jun 16, 2020

sql: don't build sql as a separate library · 35473d5d

Vladislav Shpilevoy authored 4 years ago

SQL heavily depends on box, and box on SQL. So they can't be
separate libraries. The build started failing with undefined box
symbols in SQL, when code of the latter has slightly changed in
one of the recent commits.

The build failed only with UB sanitizer enabled, but
'VERBOSE=1 make' showed that both with UB and without UB the build
command was the same (not counting -fsanitize flags). So the
sanitizer has nothing to do with it.

The patch makes SQL sources being built as a part of box library.

Closes #5067

35473d5d

Jun 15, 2020

sql: display collation in metadata for scalar · ed935572

Roman Khabibov authored 4 years ago

Fix bug with the display of collation for scalar fields in
<SELECT> result, when sql_full_metadata is enabled.

Closes #4755

ed935572

test: fix flaky replication/status.test.lua · 8b72c893

Alexander V. Tikhonov authored 4 years ago

Found issue:

 [009] --- replication/status.result	Wed May  6 09:03:18 2020
 [009] +++ replication/status.reject	Tue May 12 15:55:09 2020
 [009] @@ -307,11 +307,12 @@
 [009]  ...
 [009]  r.upstream.status == "stopped"
 [009]  ---
 [009] -- true
 [009] +- false
 [009]  ...
 [009]  r.upstream.message:match('Duplicate') ~= nil
 [009]  ---
 [009] -- true
 [009] +- error: '[string "return r.upstream.message:match(''Duplicate'') ..."]:1: attempt
 [009] +    to index field ''message'' (a nil value)'
 [009]  ...
 [009]  test_run:cmd('switch default')
 [009]  ---

To check the upstream status and it's message need to wait until an
upstream appears. This prevents an attempt to index a nil value when
one of those functions are called before a record about a peer appears
in box.info.replication. It was observed on test:
  replication/show_error_on_disconnect
after commit
  c6bea65f ('replication: recfg with 0
quorum returns immediately').

Closes #4969

8b72c893

Correct cleanup gitlab-ci · 892a188b

Alexander V. Tikhonov authored 4 years ago

Found the issue on regular testing hosts: [1]

  Fetching changes...
  00:04
   Reinitialized existing Git repository in
     /home/gitlab-runner/builds/zzyC6hh5/0/tarantool/tarantool/.git/
   Checking out 8ff7f32c as ...
   warning: failed to remove CMakeFiles/Makefile.cmake

Found the job that saved the directories with root permissions: [2]

The issue appeared because the job that saved directories with root
permissions used the 'shell' runner to run docker container inside.
It caused the gitlab-runner to run the default workspace cleanup
outside the docker container. In opposite to it, when 'docker' runner
is used, the cleanup routine runs inside the docker container and no
fails ever exist, because the root permissions are used in the docker
container and this is the same root permissions for the host. As the
result using 'shell' runner, cleanup routine failed to remove files
created by root inside the docker container and which were shared to
global host with the same permissions, because gitlab-runner runs the
'shell' runner by regular 'gitlab-runner' user, but not by root.

To fix the issue need to run docker containers using the gitlab-runner
only in RO mode with Out-Of-Source builds in it. Either use the
'docker' runners when docker containers are needed. Anyway 'shell'
runner jobs with additional calls to docker containers can't be
control for the branches of developers, to avoid of it need to make
local cleanup routine instead of default to the working paths for each
job with 'shell' runners use.

Decided to setup gitlab-runner configuration as described in: [3]

Also got the issue with left data from previous builds at the
submodule pathes, like here: [4]

   Undefined symbols for architecture x86_64:
     "_u_isprint_66", referenced from:
         _yaml_emitter_is_printable in libyaml_static.a(emitter.c.o)
   ld: symbol(s) not found for architecture x86_64
   clang: error: linker command failed with exit code 1 (...)

Also got issues with left files from previously tested branches,
like here: [5]

 [087] small/rlist.test
 [087] TAP13 parse failed (Missing plan in the TAP source).
 [087]
 [087] No result file (small/rlist.result) found.
 [087] Run the test with --update-result option to write the new result file.
 [087] [ fail ]

 [087] small/static.test
 [087] TAP13 parse failed (Missing plan in the TAP source).
 [087]
 [087] No result file (small/static.result) found.
 [087] Run the test with --update-result option to write the new result file.
 [087] [ fail ]

To fix it was added the command to clean all available git submodules:
  git submodule foreach git clean -ffdx

Closes #5036

1. https://gitlab.com/tarantool/tarantool/-/jobs/577884238#L7
2. https://gitlab.com/tarantool/tarantool/-/jobs/577768553
3. https://docs.gitlab.com/ce/ci/yaml/README.html#git-clean-flags
4. https://gitlab.com/tarantool/tarantool/-/jobs/574199256#L3141
5. https://gitlab.com/tarantool/tarantool/-/jobs/590573606#L3718

892a188b

Jun 14, 2020

cmake: set CMP0037 policy to NEW · a753f258

Olga Arkhangelskaia authored 4 years ago

To fix deprecation warning, CMP0037 policy was changed to NEW for cmake
3.11 and above.

CMP0037 old behavior (cmake 2.8.12) allowed target names such as test.
In cmake 3.10 and below names test, help and etc. were reserved.

Starting from cmake 3.11 these names are only reserved when the
corresponding feature is enabled (e.g. by including the CTest or CPack
modules). Tarantool does not use CTest so the name test can be used.

Closes #3587

Unverified

a753f258

Jun 11, 2020

Set full testing for all branches · b59a089c
Alexander V. Tikhonov authored 4 years ago
```
Set full testing with deploy builds and tests for all branches.
```
b59a089c

Divide test box/net.box · d382184a

Alexander V. Tikhonov authored 5 years ago

box/net.box_bad_argument_gh-594.test.lua
box/net.box_call_blocks_gh-946.test.lua
box/net.box_collectgarbage_gh-3107.test.lua
box/net.box_connect_timeout_gh-2054.test.lua
box/net.box_connect_triggers.test.lua
box/net.box_console_connections_gh-2677.test.lua
box/net.box_count_inconsistent_gh-3262.test.lua
box/net.box_discard_gh-3107.test.lua
box/net.box_disconnect_gh-3859.test.lua
box/net.box_fiber-async_gh-3107.test.lua
box/net.box_field_names_gh-2978.test.lua
box/net.box_get_connection_object.test.lua
box/net.box_gibberish_gh-3900.test.lua
box/net.box_huge_data_gh-983.test.lua
box/net.box_incompatible_index-gh-1729.test.lua
box/net.box_incorrect_iterator_gh-841.test.lua
box/net.box_index_unique_flag_gh-4091.test.lua
box/net.box_iproto_hangs_gh-3464.test.lua
box/net.box_is_nullable_gh-3256.test.lua
box/net.box_leaks_gh-3629.test.lua
box/net.box_log_corrupted_rows_gh-4040.test.lua
box/net.box_long-poll_input_gh-3400.test.lua
box/net.box_methods_gh-3107.test.lua
box/net.box_msgpack_gh-2195.test.lua
box/net.box_on_schema_reload-gh-1904.test.lua
box/net.box_password_gh-1545.test.lua
box/net.box_permissions.test.lua
box/net.box_pseudo_objects_gh-2401.test.lua
box/net.box_raw_response_gh-3107.test.lua
box/net.box_readahead_gh-3958.test.lua
box/net.box_reconnect_after_gh-3164.test.lua
box/net.box_reconnect_after.test.lua
box/net.box_reload_schema_gh-636.test.lua
box/net.box_remote_method_gh-544.test.lua
box/net.box_roll_back_gh-822.test.lua
box/net.box_schema_change_gh-2666.test.lua
box/net.box_schema_change_gh-3107.test.lua
box/net.box_session_type_gh-2642.test.lua
box/net.box_space_format_gh-2402.test.lua
box/net.box_stack_diag_gh-1148.test.lua
box/net.box_timeout_gh-1533.test.lua
box/net.box_timeout-gh-3107.test.lua
box/net.box_upsert_gh-970.test.lua
box/net.box_uri_first_arg_gh-398.test.lua
box/net.box_wait_connected_gh-3856.test.lua

Also removed the skip condition file:
  box/net.box.skipcond
mentioned as temporary workaround for enabling testing
on FreeBSD within issue #4271, because no really issues
with the tests on it found.

Closes #4880

d382184a

Jun 10, 2020

test: update test-run · ef86e3c9

Alexander Turenko authored 4 years ago

Explicitly notify about a missing newline at end of a .result file in
GNU Diff like style:

 |  ---
 | -...
 | \ No newline
 | +...

Closes https://github.com/tarantool/test-run/issues/212

Unverified

ef86e3c9

vinyl: bump dump_generation in case scheduler doesn't catch up with DDL · 38ac17f3

Nikita Pettik authored 5 years ago

It may turn out that dump_generation does not catch up with current
generation and no other dump tasks are in progress. This may happen
dump process is throttled due to errors. In this case generation is
bumped but dump_generation is not (since dump is not completed). In
turn, throttling opens a window for DDL operations. For instance, index
dropping and creation of new one results in mentioned situation:

box.snapshot() -- fails for some reason; next attempt at dumping will be
               -- taken in one second.
s:drop() -- drop index to be dumped
s = box.schema.space.create('test', {engine = 'vinyl'})
-- create new one (its mem generation is greater than scheduler's one)
i = s:create_index('pk')

Closes #4821

38ac17f3

vinyl: rotate mem during index build on demand · c3561b13

Nikita Pettik authored 4 years ago

Meanwhile in 17f6af7d the similar problem has been fixed, still it may
appear that in-memory level of secondary index being constructed has
lagging memory generation (in other words, it's values is less than
the value of current global generation). Such situation can be achieved
if yield which is required to check uniqueness of value being inserted
is too long. In this time gap other space may trigger dump process
bumping global memory generation counter, but dump itself is still not
yet scheduled.

So to get rid of generations mismatch, let's rotate in-memory level
after yield on demand.

Note that test is not included into the patch due to its complexity
and will be added as a follow-up.

Closes #5042

c3561b13

build: don't start example instance in postinstall · 080beba0

Olga Arkhangelskaia authored 4 years ago


After tarantool installation on Debian/Ubuntu from repo, example
instance was automatically started on 3301 port. At the same time
example instance on RHEL/CentOS is started manually. Patch does the same
for Debian/Ubuntu.

Closes #4507

Reviewed-by: Igor Munkin <imun@tarantool.org>
Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org>

Unverified

080beba0

lua: remove excess Lua call from table encoding · 67ac8df8

Sergey Kaplun authored 4 years ago

For safe table encoding <lua_field_try_serialize> function is pushed
to Lua stack along with auxiliary lightuserdata and table object to be
encoded. Its further protected call catches Lua error if one is raised
while encoding. It is only necessary when the object to be serialized
has __serialize field in metatable and this field is a Lua function.

This change reduces GC usage since a Lua function object is not
created. Moreover the function serializing the given object is called
without excess protected frame and auxiliary status struct.

67ac8df8

static build: fix build on ubuntu · 2e8fa27a

HustonMmmavr authored 4 years ago

Fixed static build with '-DBUILD_STATIC=ON' option:

* Added cmake option CMAKE_DL_LIBS to icu library for
  test/unit tests binaries builds at file:
    cmake/FindICU.cmake
  due to fail:
    [ 84%] Linking CXX executable vy_point_lookup.test
    /usr/local/lib/libicuuc.a(putil.ao): In function `uprv_dl_open_62':
    putil.cpp:(.text+0x1a22): undefined reference to `dlopen'
    /usr/local/lib/libicuuc.a(putil.ao): In function `uprv_dlsym_func_62':
    putil.cpp:(.text+0x1a7d): undefined reference to `dlsym'
    /usr/local/lib/libicuuc.a(putil.ao): In function `uprv_dl_close_62':
    putil.cpp:(.text+0x1a61): undefined reference to `dlclose'
    collect2: error: ld returned 1 exit status

* Added cmake option CMAKE_DL_LIBS to gomp library for
  test/unit tests binaries builds at file:
    cmake/BuildMisc.cmake
  due to fail:
    [ 91%] Linking CXX executable bps_tree.test
    /usr/lib/gcc/x86_64-linux-gnu/7/libgomp.a(target.o): In function `gomp_target_init':
    (.text+0x8b): undefined reference to `dlopen'
    (.text+0xa2): undefined reference to `dlsym'
    (.text+0xd9): undefined reference to `dlclose'
    (.text+0x29b): undefined reference to `dlsym'
    (.text+0x2a8): undefined reference to `dlerror'
    (.text+0x2d0): undefined reference to `dlsym'
    (.text+0x2e9): undefined reference to `dlsym'
    (.text+0x300): undefined reference to `dlsym'
    (.text+0x317): undefined reference to `dlsym'
    (.text+0x330): undefined reference to `dlsym'
    /usr/lib/gcc/x86_64-linux-gnu/7/libgomp.a(target.o):(.text+0x34d): more undefined references to `dlsym' follow
    /usr/lib/gcc/x86_64-linux-gnu/7/libgomp.a(target.o): In function `gomp_target_init':
    (.text+0x9cc): undefined reference to `dlerror'
    collect2: error: ld returned 1 exit status

Close #5024

2e8fa27a

Jun 09, 2020

test: update test-run · e3cf64a6

Alexander Turenko authored 4 years ago

Introduce the new test-run.py --update-result option, which allows to
write a new result file (a test reference output) or re-write an
existing one.

Before this patch, test-run did not require the option to write a new
result file (when there is no exising one). Now 'fail' status will be
reported for such tests when the option is not passed.

Closes https://github.com/tarantool/test-run/issues/194

Unverified

e3cf64a6

error: use int64_t as reference counter · 2e3c81de

Dmitry Khominich authored 4 years ago

Before it was unsafe to use error.prev, box.error.last(), or any error
method that increments error reference counter. Any of them could throw a
Lua error in case of INT32_MAX overflow.

This patch uses int64_t as error reference counter making it impossible
to get counter overflow.

Closes #4902

2e3c81de

box/constraint: make files non executable · 4450419c

Cyrill Gorcunov authored 4 years ago


For some reason these files are sitting with
mask 0755. Set proper mode.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

4450419c

sql: fix mem_apply_type double type truncation · a33108ad

Vladislav Shpilevoy authored 4 years ago

mem_apply_type(), when tried to cast a double value to an integer,
used the expressions:

    int64_t i = (int64_t) d;
    uint64_t u = (uint64_t) d;

To obtain integer versions of the double value, cast them back to
double, and see if they are equal. Assuming that if they are, the
double can be safely cast to one of them.

But this is undefined behaviour. Double can't be cast to int64_t,
if it is > INT64_MAX or < INT64_MIN. And can't be cast to
uint64_t, if it is < 0 or > UINT64_MAX.

The patch adds explicit checks for these borders before doing the
cast.

Part of #4609

a33108ad

sql: fix usage of not initialized index_stat · 3328b841

Vladislav Shpilevoy authored 4 years ago

Query planner uses a temporary index definition object 'to
represent the primary key index'. Whatever real purpose of this
index_def is (query planner wasn't changed since SQLite merge,
and may be broken), its opts.stat field pointed at a partially
initialized index_stat structure. Which is supposed to be used by
the planner to make decisions such as search by which index would
be the optimal.

The patch initializes the statistics with 0.

Part of #4609

3328b841

salad: fix UB pointer arithmetics in bps_tree · e73f7da6

Aleksandr Lyapunov authored 4 years ago

There is some pointer arithmetics in bps_tree that calculates
intermediate pointers that points out of array bounds. Though they
are never dereferenced and only used for further caclulation of
correct pointers, it is still UB and must be fixed.

Part of #4609

e73f7da6

digest: eliminate UBs from guava() · 620de6d0

Vladislav Shpilevoy authored 4 years ago

Guava hash function follows the algorithm described in the paper
"A Fast, Minimal Memory, Consistent Hash Algorithm", John Lamping,
Eric Veach. But the implementation somewhy is ported from Java
instead of taking a ready to use C function right from the paper.

Java version, ported to C as is, leads to undefined behaviour:
- signed integer overflow;
- double value outside of integer range assigned to the integer.

Here is the full old function:

    static const int64_t K = 2862933555777941757;
    static const double  D = 0x1.0p31;

    static inline double lcg(int64_t *state)
    {
        return (double)((int32_t)(((uint64_t)*state >> 33) + 1)) / D;
    }

    int32_t
    guava(int64_t state, int32_t buckets)
    {
        int32_t candidate = 0;
        int32_t next;
        while (1) {
            state = K * state + 1;
            next = (int32_t)((candidate + 1) / lcg(&state));
            if (next >= 0 && next < buckets)
                candidate = next;
            else
                return candidate;
        }
    }

Signed integer overflow happened in this line:

    state = K * state + 1;

This UB is fixed by changing state type to uint64_t. This is ok
and does not change behaviour, because overflow for signed
integers in reality behaves the same as for unsigned integers,
in terms of binary representation. Change to uint64_t didn't lead
to change of any single line of the result assembly code.

Double -> int32_t truncation overflow happened in the next line:

    next = (int32_t)((candidate + 1) / lcg(&state));

Right expression can become something out of int32_t range. This
is UB, but in reality the double value on the right is truncated
to either INT32_MIN, if it is < INT32_MIN, or INT32_MAX, if it is
> INT32_MAX.

This is fixed by making 'next' and 'candidate' variables int64_t.

This fixed the UB, because the right expression can't become
< INT64_MIN and > INT64_MAX. Indeed, max possible value of 'next'
from the expression above is:

    (candidate + 1) / lcg(&state) < ((2^31 - 1) + 1) / 2^-31
        = 2^62 < INT64_MAX

Min possible value of 'next' is:

    (-2^31 + 1) / 2^-31 > -2^62 > INT64_MAX

'Candidate' is assumed to be inside int32_t range, because it is
initialized from the previous 'next' value. If it would be out of
int32_t, it would become either bigger than 'buckets', which is
int32_t, or < 0, and the cycle would stop on the previous
iteration.

The patch fixes the UBs, but guava() function is still broken.
Because it returns values different than the function in the
original paper from John Lamping and Eric Veach. Here is the
correct function:

    int32_t
    correct_guava(uint64_t key, int32_t num_buckets)
    {
        int64_t b = -1, j = 0;
        while (j < num_buckets) {
            b=j;
            key = key * 2862933555777941757ULL + 1;
            j = (b + 1) * ((double)(1LL << 31) / (double)((key >> 33) + 1));
        }
        return b;
    }

correct_guava(6356101026326471242, 813154869) = 362866355,
                            guava(<the same>) = 2.

correct_guava(112571758688054605, 1355446793) = 907865430,
                            guava(<the same>) = 907865431.

And there are more examples. It means, guava() returns something
with an unknown distrubution, consistency, and not described in
the paper. That is a subject for a separate patch to deprecate
this function and add a correct one.

Part of #4609

620de6d0

test: fix signed integer overflow in vclock test · 1cb078d1

Vladislav Shpilevoy authored 4 years ago

Vclock unit test had a test where all vclock components were close
to INT64_MAX. The sum was bigger than INT64_MAX, and it caused
overflow in vclock's signature.

The patch reduces order of the LSNs in this test so their sum does
not overflow anymore.

Part of #4609

1cb078d1

swim: fix zero division · 0e02bf5e

Vladislav Shpilevoy authored 4 years ago

If swim_cfg() fails during first configuration after the self
member is created, it is deleted before return. Deletion of the
member causes member update event addition to the dissemination
queue. The event TTL depends on log2(member count), where the
count is zero in the described situation. This leads to zero
division according to clang sanitizer.

The patch makes member deletion from the member table happen a
little later, so the member event registration can't ever see the
table empty.

Part of #4609

0e02bf5e

xrow: don't cast double to float unconditionally · 76787282

Vladislav Shpilevoy authored 4 years ago

Xrow update functions use function xrow_update_arith_make() to
execute arithmetic operation. In case any operand is a floating
point value, both are cast to double. Afterwards the result value
tries to shrink to float if possible, to save space.

The check whether the value fits float was done in a dumb but
simple way:

    (double)(float)value == value

This was done to see if the fractional part is not truncated. The
integral part was checked against FLT_MAX and -FLT_MAX. But the
latter check was done after the first check. That led to undefined
behaviour if the |value| is bigger than |FLT_MAX|. This patch
makes xrow update firstly check the integral bounds and then the
fractional part.

Alongside there is fixed a bug, when FLT_MAX value was considered
double.

Part of #4609

76787282

vinyl: fix 0 division in case of canceled dump · cc7da0e5

Vladislav Shpilevoy authored 4 years ago

It can happen, that vinyl index dump is scheduled, but the index
is dropped before the dump starts. In this case the dump task is
finished with 0 dump time. The time then is used in a log message
to obtain dump rate as byte count / dump time. Since dump time is
0, this is undefined behaviour. Moreover, it was not even a dump,
so nothing to log here.

Also dump time is used to update throttling parameters in
vy_regulator. Here if the dump took 0 time, there was no a dump,
so it is just ignored.

Part of #4609

cc7da0e5

test: avoid usleep() usage for error injections · 0f953ab3

Vladislav Shpilevoy authored 4 years ago

Some error injections use usleep() to put the current thread in
sleep. For example, to simulate, that WAL thread is slow.

A few of them allowed to pass custom number of microseconds to
usleep, in form:

    usleep(injection->dvalue * 1000000);

Assuming, that dvalue is a number of seconds. But usleep argument
is uint32_t, at least on Mac (it is useconds_t, whose size is 4).
It means, that too big dvalue easily overflows it.

The patch makes it use nanosleep(), in a new wrapper:
thread_sleep(). It takes a double value, and does not truncate it
to 4 bytes.

The overflow was the case for ERRINJ_VY_READ_PAGE_TIMEOUT = 9000
in test/vinyl/errinj_vylog.test.lua. And
ERRINJ_VY_RUN_WRITE_STMT_TIMEOUT = 9000 in
test/vinyl/errinj.test.lua.

Part of #4609

0f953ab3

util: introduce double_compare_nint64() · 1a08e786

Vladislav Shpilevoy authored 4 years ago

Utility module (util.h and util.c) offers a function
double_compare_uint64() to compare double and uint64_t in a
reliable way, without undefined behaviour, without losses, even if
values are about 2^53 - 2^64.

There was no a similar function to compare double and int64_t,
which is needed when the right value is negative. To workaround it
the right value (int64_t) was multiplied by -1 when it was
negative to be able to use it in double_compare_uint64(). This led
to undefined behaviour, when right value was INT64_MIN, because
expression (uint64_t)-value did not help. Firstly -value was
calculated, and then it was cast to uint64_t. The first step is
detected by the sanitizer as undefined behaviour.

Not counting, that it was slower than straightforward comparison
of negative int64_t and a double value. Because involved
additional negation.

Besides, there was an expression:

    assert((uint64_t)(double)rhs == rhs || rhs > (uint64_t)EXP2_53);

Rhs is a uint64_t. And there was hidden an undefined behaviour,
when rhs is UINT64_MAX. The problem is that UINT64_MAX is
2^64 - 1. And it can't be stored in a double value without
precision loss. It is rounded up to 2^64 = UINT64_MAX + 1. This
is what happens in "(double)rhs" expression. And when it is
cast back to uint64_t: "(uint64_t)(double)rhs", it explodes.

The patch fixes it by simply changing check order. If rhs is
bigger than 2^53 (below this border all the integers can be
represented), then the cast is not done.

Part of #4609

1a08e786

cmake: enable misc types of UB detection in clang · 325b678f

Vladislav Shpilevoy authored 4 years ago

Option ENABLE_UB_SANITIZER enables clang undefined behaviour
sanitizer. So far the only UB to detect was alignment violation.
This was the biggest problem found by the sanitizer. Now when it
is fixed, most of the other types of UB are also turned on to fix
them as well.

There is a few of exceptions - pointer type overflow, vptr check,
and all types of integer overflow and truncation.

Pointer type overflow detection is disabled because it is abused
in the source code a lot, by stailq data structure.

Vptr sanitation is a runtime check ensuring that a pointer at a
non-POD type really points at an object of this type, using RTTI.
The check false-positively fails in alter.cc when AlterSpaceOp
class objects are stored in an rlist, and the list is iterated
using rlist_foreach_entry(). In the cycle there is a condition:

    &item->member != head

In the end the 'item' points not at an AlterSpaceOp, but at the
rlist head - offsetof(typeof(item), member), at an rlist
structure. Despite 'item' is never dereferenced, clang anyway
generates vptr check here, which of course fails. Note,
'&item->member' does not dereference item. It is
item + offsetof(typeof(item), member). Just another address a few
bytes after item.

Integer overflow and truncation are disabled because SQL uses
int64_t variables as a container of values of range [INT64_MIN,
UINT64_MAX]. This works because there is a flag 'is_neg' near
each such value which tells how to interpret it - as negative
int64_t, or as positive uint64_t. As a result, some operations
lead to a false-positive overflow. For example, consider
expr_code_int() function. It essentially can do this:

    int64_t value;
    ((uint64_t *)&value) = 9223372036854775808;
    value = -value;

9223372036854775808 is -INT64_MIN. It can't be stored in int64_t.
But the thing is that (uint64_t)9223372036854775808 is stored
exactly like (int64_t)INT64_MIN, in binary form. So the expression
"value = -value" looks perfectly valid:
"value = -9223372036854775808", But in fact it is interpreted as
"value = -(-9223372036854775808)".

These integer overflow/truncation problems are going to be fixed
in a separate commit due to big amount of changes needed for that.

Part of #4609

325b678f

test: app-tap/logger -- test symbolic loglevels · d64de911
Cyrill Gorcunov authored 4 years ago
```
Part-of #689

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
```
d64de911

lua/cfg: drop redundant variable · ef8a449b

Cyrill Gorcunov authored 4 years ago


No need for variable v, we use module_cfg itself.

Part-of #689

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

ef8a449b

lua/cfg: drop unused log methods · 1a6ad79e

Cyrill Gorcunov authored 4 years ago


The dynamic configuration of logging level and
format now goes via log module. These functions
are no longed needed.

Part-of #689

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

1a6ad79e

small: bump version after .result file update · 87bb27b0
Vladislav Shpilevoy authored 4 years ago
```
lsregion.result was broken, and now is fixed in small's master.
```
87bb27b0

Jun 08, 2020

xrow: use unaligned store operation in xrow_to_iovec() · fe6b7d3c

Vladislav Shpilevoy authored 4 years ago

xrow_to_iovec() tried to save a uint32_t value by a not aligned
address. The patch makes it use a special operation for that
instead of regular assignment.

Part of #4609

fe6b7d3c

port: make port_c_entry not PACKED · e3a68709

Vladislav Shpilevoy authored 4 years ago

PACKED structures don't have padding between their members and
after the structure (needed to be able to store them in an array).

Port_c_entry was PACKED, since it does not have padding between
its members anyway, and the padding in the end was not needed,
because these objects are never stored in an array. As a result,
sizeof(port_c_entry) was not aligned too.

Appeared, that mempool, used to allocate port_c_entry objects,
can't work correctly, when object size is not aligned at least by
8 bytes. Because mempool does not do any alignment internally, and
uses the free objects as a temporary storage for some metadata,
requiring 8 byte alignment.

The patch removes PACKED attribute from port_c_entry, so now its
size is aligned by 8 bytes, and mempool works fine.

Part of #4609

e3a68709

tuple: use unaligned store-load for field map · 94345e4d

Vladislav Shpilevoy authored 4 years ago

A tuple can have a field map. It is an array of uint32_t values,
stored right after 'struct tuple' object in the same memory block.

'struct tuple' is 10 byte size, and is aligned by 4 bytes (even
though it is 'PACKED', so does not need an alignment). It means,
that field map is stored by an address like this: 4*N + 10.

This address is never aligned by 4, needed for uint32_t field map
array. So the array members should be accessed using operations
aware of unaligned nature of the addresses.

Unfortunately, it is impossible to make the field map aligned,
because that requires + 2 bytes for each tuple. It is unaffordable
luxury, since tuple count can be millions and even billions. Every
new byte may result in gigabytes of memory in a cluster.

The patch makes field map accessed using unaligned store-load
operations.

Part of #4609

94345e4d

vinyl: align statements and bps tree extents · bef66d48

Vladislav Shpilevoy authored 4 years ago

Vinyl tuples (vy_stmt) in 0 level of LSM tree are stored in
lsregion. They were allocated using lsregion_alloc(), which does
not align its results, and is good only for byte arrays.

As a result, vy_stmt object addresses in 0 LSM level were not
aligned. Unaligned memory access is slower, and may even crash on
some platforms.

Besides, even aligned allocations couldn't help upserts in 0 level
of the LSM tree, because upsert vy_stmt objects had 1 byte prefix
to count merged upserts stored in this statement. This 1 byte
prefix ruined all the alignment. Now the upsert counter is also
aligned, the same as vy_stmt. Note, it does not consume
significantly more memory, since is used only for vinyl and only
for upserts, stored in 0 level of the LSM tree.

The same about BPS tree extents. LSM 0 level is a BPS tree, whose
blocks are allocated on lsregion. The extents are used as pointer
arrays inside the tree, so they need alignof(void *) alignment.

The mentioned unaligned accesses were revealed by clang undefined
behaviour sanitizer, and are fixed by this patch.

Part of #4609

bef66d48

region: use aligned allocations where necessary · ea6b814e

Vladislav Shpilevoy authored 4 years ago

Region is used for temporary allocations, usually bound to a
transaction, but not always. Keyword - temporary. It is usually
much faster to allocate something on the region than on the heap,
and much much faster to free the region, since it frees data in
whole slabs. Lots of objects at once.

Region is used both for byte arrays (MessagePack, strings, blobs),
and for objects like standard types, structs. Almost always for
allocations was used region_alloc() function, which returns a not
aligned address. It can be anything, even not multiple of 2.

That led to alignment violation for standard types and structs.
Usually it is harmless in terms of errors, but can be slower than
aligned access, and on some platforms may even crash. Also the
crash can be forced using clang undefined behaviour sanitizer,
which revealed all the not aligned accesses fixed in this patch.

Part of #4609

ea6b814e

sql: make BtCursor's memory aligned · 294dc053

Vladislav Shpilevoy authored 4 years ago

Vdbe at runtime allocates VdbeCursor structure using
allocateCursor() function. Inside there is a pointer at BtCursor
structure. To make the allocation faster and improve cache
locality, both cursors are allocated in one memory block + some
extra memory for uint32_t array, where BtCursor followed
VdbeCursor and the array without any padding:

   VdbeCursor + uint32_t * N + BtCursor

The problem is that BtCursor needs 8 byte alignment. When it
followed VdbeCursor (aligned by 8) + some uint32_t values, its
actual alignment could become 4 bytes. That led to a crash when
alignment sanitizer is enabled in clang.

The patch makes BtCursor offset aligned by 8 bytes.

Part of #4609

294dc053

crc32: align memory access · abef6986

Vladislav Shpilevoy authored 4 years ago

There is some assembly working with a byte array like with an
array of unsigned long values. That is incorrect, because the byte
array may be not aligned by 'unsigned long'.

The patch makes crc32 calculate the hash on the prefix
byte-by-byte, then word-by-word on aligned addresses, and again
byte-by-byte on a tail which is less than word.

Assuming that the word-by-word part is the longest, this should
reduce number of memory/cache loads in ~x2 times. Because in case
of a not aligned word load it was necessary to load 2 words, and
then merge them into one. When addresses are aligned, this is only
1 load.

Part of #4609

abef6986

cmake: add option ENABLE_UB_SANITIZER · 366cb668

Vladislav Shpilevoy authored 4 years ago

Clang has a built-in sanitizer for undefined behaviour. Such as
wrong memory alignment, array boundaries violation, 0 division,
bool values with non standard content, etc.

The sanitizer emits runtime checks which lead to either crash, or
a trap, or a warning print, depending on what is chosen.

The patch makes it possible to turn the sanitizer on and catch
UBs. The only supported UB so far is alignment check. Other types
can be added gradually, along with fixing bugs which they find.

The UB sanitizer is activated for ASAN builds in CI.

Part of #4609

366cb668

cmake: ignore warnings on alignof() and offsetof() · 635b5ad9

Vladislav Shpilevoy authored 4 years ago

Warning about invalid offsetof() (used on non-POD types) was set
for g++, but wasn't for clang++.

Warning about invalid alignof() (when expression is passed to it
instead of a type) wasn't ignored, but is going to be very
useful in upcoming unaligned memory access patches. That allows
to write something like:

    struct some_long_type *object = region_aligned_alloc(
            region, size, alignof(*object));

This will work even if type of 'object' will change in future,
and so it is safer. And shorter.

Part of #4609

635b5ad9