  1. May 27, 2019
    • Disable flaky http_client.test.lua · 1d7285c4
      Konstantin Osipov authored
      Issue pending, gh-4254.
    • vinyl: fix deferred DELETE statement lost on commit · b54433d9
      Vladimir Davydov authored
      Even if a statement isn't marked as VY_STMT_DEFERRED_DELETE, e.g. it's
      a REPLACE produced by an UPDATE request, it may overwrite a statement in
      the transaction write set that is marked so, for instance:
      
        s = box.schema.space.create('test', {engine = 'vinyl'})
        pk = s:create_index('pk')
        sk = s:create_index('sk', {parts = {2, 'unsigned'}})
      
        s:insert{1, 1}
      
        box.begin()
        s:replace{1, 2}
        s:update(1, {{'=', 2, 3}})
        box.commit()
      
      If we don't mark REPLACE{3,1} produced by the update operation with the
      VY_STMT_DEFERRED_DELETE flag, we will never generate a DELETE statement
      for INSERT{1,1}. That is, we must inherit the flag from the overwritten
      statement when we insert a new one into a write set.
      
      Closes #4248
    • vinyl: don't produce deferred DELETE on commit if key isn't updated · e2f5e1bc
      Vladimir Davydov authored
      Consider the following example:
      
        s = box.schema.space.create('test', {engine = 'vinyl'})
        s:create_index('primary')
        s:create_index('secondary', {parts = {2, 'unsigned'}})
      
        s:insert{1, 1, 1}
        s:replace{1, 1, 2}
      
      When REPLACE{1,1} is committed to the secondary index, the overwritten
      tuple, i.e. INSERT{1,1}, is found in the primary index memory, and so
      deferred DELETE{1,1} is generated right away and committed along with
      REPLACE{1,1}. However, there's no need to commit anything to the
      secondary index in this case, because its key isn't updated. Apart from
      eating memory and loading disk, this also breaks index stats, as vy_tx
      implementation doesn't expect two statements committed for the same key
      in a single transaction.
      
      Fix this by checking if there's a statement in the log for the
      deleted key and, if there is, skipping them both, as we do in the
      regular case; see the comment in vy_tx_set.
      
      Closes #3693
    • vinyl: fix secondary index divergence on update · 69aee6fc
      Vladimir Davydov authored
      If an UPDATE request doesn't touch key parts of a secondary index, we
      don't need to re-index it in the in-memory secondary index, as this
      would only increase IO load. Historically, we use the column mask set by the
      UPDATE operation to skip secondary indexes that are not affected by the
      operation on commit. However, there's a problem here: the column mask
      isn't precise - it may have a bit set even if the corresponding column
      value isn't changed by the update operation, e.g. consider {'+', 2, 0}.
      Not taking this into account may result in appearance of phantom tuples
      on disk as the write iterator assumes that statements that have no
      effect aren't written to secondary indexes (this is needed to apply
      INSERT+DELETE "annihilation" optimization). We fixed that by clearing
      column mask bits in vy_tx_set in case we detect that the key isn't
      changed, for more details see #3607 and commit e72867cb ("vinyl: fix
      appearance of phantom tuple in secondary index after update"). It was
      rather an ugly hack, but it worked.
      
      However, it turned out that apart from looking hackish this code has
      a nasty bug that may lead to tuples missing from secondary indexes.
      Consider the following example:
      
        s = box.schema.space.create('test', {engine = 'vinyl'})
        s:create_index('pk')
        s:create_index('sk', {parts = {2, 'unsigned'}})
        s:insert{1, 1, 1}
      
        box.begin()
        s:update(1, {{'=', 2, 2}})
        s:update(1, {{'=', 3, 2}})
        box.commit()
      
      The first update operation writes DELETE{1,1} and REPLACE{2,1} to the
      secondary index write set. The second update replaces REPLACE{2,1} with
      DELETE{2,1} and then with REPLACE{2,1}. When replacing DELETE{2,1} with
      REPLACE{2,1} in the write set, we assume that the update doesn't modify
      secondary index key parts and clear the column mask so as not to commit
      a pointless request, see vy_tx_set. As a result, we skip the first
      update too and get key {2,1} missing from the secondary index.
      
      Actually, it was a dumb idea to use column mask to skip statements in
      the first place, as there's a much easier way to filter out statements
      that have no effect for secondary indexes. The thing is every DELETE
      statement inserted into a secondary index write set acts as a "single
      DELETE", i.e. there's exactly one older statement it is supposed to
      purge. This is because, in contrast to the primary index, we don't
      write DELETE statements blindly - we always look up the tuple
      overwritten in the primary index first. This means that
      REPLACE+DELETE for the same key is basically a no-op and can be
      safely skipped. Moreover, DELETE+REPLACE
      can be treated as no-op, too, because secondary indexes don't store full
      tuples hence all REPLACE statements for the same key are equivalent.
      By marking both statements as no-op in vy_tx_set, we guarantee that
      no-op statements don't make it to secondary index memory or disk levels.
      
      Closes #4242
  2. May 23, 2019
  3. May 22, 2019
    • swim: allow to set codec before cfg · c0a7556c
      Vladislav Shpilevoy authored
      Another problem discovered with the UDP broadcast test is that
      it can affect other tests, even after termination. When doing
      swim:broadcast() in one test, a programmer can't be sure who
      will listen to it, answer, and break the test scenario.

      This commit reduces the probability of such a problem by

          * allowing a codec to be set before swim:cfg(). This protects
            SWIM nodes of different tests from each other -
            they will not understand messages from other tests. By the
            way, the same problem can appear in real applications too;

          * not binding again to a URI that was passed by test-run into
            the test and closed there. If a test closes a URI given to
            it, it can't be sure that the next bind() will be
            successful - test-run could have already reused it.
      
      Follow up #3234
    • swim: be ready to idle round steps when net is slow · 7a5ac3d7
      Vladislav Shpilevoy authored
      First of all, the problem in a nutshell was that an ev_timer with
      a non-zero 'repeat' field is in fact an ev_periodic. It is
      restarted *automatically*, even if a user does not call
      ev_timer_again() or ev_timer_start().

      This led to a situation where a round message send is scheduled,
      and the next round step timer alarm happens before the message is
      actually sent. That, in turn, led to an assertion failure on an
      attempt to schedule a task twice.

      This patch fixes the swim test harness to behave like an ev_timer
      with 'repeat' > 0, and stops the timer on the first idle round
      step - it will be restarted once the currently hanging task is
      finally sent.
      
      Follow up #3234
    • swim: fix flaky parts in swim/swim.test.lua · cda363f3
      Vladislav Shpilevoy authored
      They are caused by
      
          * too slow a network, when SWIM tests are run under high load;

          * late arrival or drop of UDP packets.
      
      Follow up #3234
    • swim: fix an obvious use-after-free · 8a20035f
      Vladislav Shpilevoy authored
      Follow up #3234
    • swim: fix an obvious leak in swim_delete() · bae68901
      Vladislav Shpilevoy authored
      Follow up #3234
    • tuple format: remove invalid assertion from tuple_format_iterator_next · aad779db
      Vladimir Davydov authored
      It's too early to assert msgpack type as an array when a multikey field
      is encountered - we haven't checked the field type yet so the type might
      as well be a map, in which case we will raise an error just a few lines
      below. Remove the assertion and add a test case.
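
      For illustration, a hedged sketch of what such a test case could
      look like (space and index names are made up, not taken from the
      actual test):

        s = box.schema.space.create('test')
        s:create_index('pk')
        s:create_index('sk', {parts = {{field = 2, type = 'unsigned', path = '[*]'}}})
        s:insert{1, {1, 2}}     -- an array for the multikey field: OK
        s:insert{2, {a = 1}}    -- a map instead of an array: must raise an error, not assert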
    • tuple format: don't allow null where array/map is expected · e1d3fe8a
      Vladimir Davydov authored
      If an indexed field expects array/map, it shouldn't be allowed to insert
      null instead, because this might break expectations of field accessors.
      For non-multikey indexes, inserting null instead of an array/map
      works, though it is somewhat confusing: for a non-nullable field you
      get a wrong error message ("field is missing" instead of "array/map
      expected, got nil"); for a nullable field, this silently works, it
      just looks weird as there's a clear type mismatch here. However,
      for a multikey field you get a crash
      as tuple_multikey_count() doesn't expect to see null where an array
      should be according to the format:
      
        tuple_raw_multikey_count: Assertion `mp_typeof(*array_raw) == MP_ARRAY' failed.
      
      This issue exists because, for some reason, we assume all fields
      are nullable by default. Fix that and add some tests.
      
      Note, you can still omit nullable fields, e.g. if field "[2].a[1]" is
      nullable you may insert tuple [1, {a = {}}] or [1, {b = 1}] or even [1],
      you just can't pass box.NULL instead of an array/map.
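
      Roughly, a hedged sketch with a nullable JSON-path part (the
      space, index and path names are illustrative; the exact option
      names are an assumption):

        s = box.schema.space.create('test')
        s:create_index('pk')
        s:create_index('sk', {parts = {{field = 2, type = 'unsigned',
                                        path = 'a[1]', is_nullable = true}}})
        s:insert{1, {a = {}}}    -- OK: the nullable leaf is simply missing
        s:insert{2}              -- OK: the whole field is omitted
        s:insert{3, box.NULL}    -- now an error: null where a map is expected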
    • box: fix assert with multikey hybrid index · b1828dd4
      Kirill Shcherbatov authored
      Tarantool used to assume that offset_slot has an extension
      iff field_map_get_offset is called with multikey_idx >= 0.
      In fact, when some part of the index contains a multikey index
      placeholder, tuple_compare_* routines pass a tuple_hint in the
      meaning of a multikey index to each tuple_field_raw_by_part call,
      even for a regular key_part that doesn't have an array index
      placeholder (and, correspondingly, a field_map extension).
      Thus this assumption is invalid.

      This patch uses the fact that field_map slots that have an
      extension store a negative offset to distinguish between multikey
      and normal usage of the field_map_get_offset routine.
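
      A hedged illustration of such a "hybrid" index - a regular part
      combined with a multikey part - which used to hit the assertion
      when tuples were compared (names and values are made up):

        s = box.schema.space.create('test')
        s:create_index('pk')
        s:create_index('sk', {parts = {{field = 2, type = 'unsigned'},
                                       {field = 3, type = 'unsigned', path = '[*]'}}})
        s:replace{1, 10, {1, 2}}
        s:replace{2, 10, {3}}
        s.index.sk:select{10}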
      
      Closes #4234
  4. May 21, 2019
    • Update README.md · eca5c292
      Konstantin Osipov authored
    • swim: interpret no-payload as nil · a1ec38f3
      Vladislav Shpilevoy authored
      Using an empty string as the no-payload flag was not a good idea,
      because then a user can't write something like:
      
          if not member:payload() then
              ...
      
      Follow up #3234
    • swim: swim:set_codec() Lua API · 2dc1af75
      Vladislav Shpilevoy authored
      Encryption with an arbitrary algorithm and any mode with a
      configurable private key.
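
      A minimal usage sketch; the UUID, URI and codec field names
      (algo/mode/key/key_size) are my assumption of the option shape,
      so treat this as illustrative rather than authoritative:

        swim = require('swim')
        s = swim.new({uuid = '00000000-0000-1000-8000-000000000001',
                      uri = 'localhost:0'})
        s:set_codec({algo = 'aes128', mode = 'cbc',
                     key = '1234567812345678', key_size = 16})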
      
      Closes #3234
    • swim: implement and expose transport-level encryption · 6137c197
      Vladislav Shpilevoy authored
      SWIM is going to be used in and between datacenters, which means
      that its packets will go through public networks. Therefore raw
      SWIM packets are vulnerable to attacks.
      
      An attacker can do any and all of the following things:
      
        1) Extract secret information from member payloads, like
           credentials to Tarantool binary ports;
      
        2) Change UUIDs and addresses in the packets and break a
           topology;
      
        3) Catch the packets and pretend being a Tarantool instance,
           which could lead to undefined behaviour depending on an
           application logic.
      
      SWIM packets need a protection layer. This commit introduces it.
      The SWIM transport level allows choosing an encryption algorithm
      and a private key, and encrypts each packet with that key.
      
      Besides, each packet is encrypted using a random public key
      prepended to the packet.
      
      SWIM now provides a public API to choose an encryption algorithm
      and a private key.
      
      Part of #3234
    • swim: split send/recv into phases · f77f4b9e
      Vladislav Shpilevoy authored
      At this moment swim_scheduler_on_output() is a relatively simple
      function. It takes a task, builds its meta and flushes a result
      into the network. But soon SWIM will be able to encrypt messages.
      
      It means that, in addition to regular preprocessing like building
      meta headers, a new phase will appear - encryption. What is more -
      conditional encryption, because a user may want not to encrypt
      messages.
      
      The same applies to swim_scheduler_on_input() - if a SWIM
      instance uses encryption, it should decrypt incoming messages
      before forwarding them into the SWIM core logic.
      
      The chosen strategy is to reuse on_output/on_input virtuality
      and create two versions of the on_input/on_output functions:
      
          swim_on_plain_input()  | swim_on_encrypted_input()
          swim_on_plain_output() | swim_on_encrypted_output()
      
      One of these pairs is chosen depending on whether the instance uses
      encryption.
      
      To make these 4 functions as simple and short as possible this
      commit creates two sets of functions, doing all the logic except
      encryption:
      
          swim_begin_send()
          swim_do_send()
          swim_complete_send()
      
          swim_begin_recv()
          swim_do_recv()
          swim_complete_recv()
      
      These functions will be used by on_input/on_output functions with
      different arguments.
      
      Part of #3234
    • swim: cache members in Lua member table · 39dd852d
      Vladislav Shpilevoy authored
      Each time a member was returned from a SWIM instance object, it
      was wrapped by a table with a special metatable and a cached
      payload.

      But the next lookup of the same member returned a new table. It

        - created garbage as a new member wrapper;
        - lost the cached decoded payload.

      This commit caches all wrapped members in a private table and
      returns the existing wrapper on a subsequent lookup. A
      microbenchmark showed that cached result retrieval is 10 times
      faster than creating a new table each time.

      The cache table keeps weak references - it means that when a
      member object loses all its references in a user's application,
      it is automatically dropped from the table.
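
      A minimal sketch of the weak-reference cache idea (names are
      illustrative, not the actual module internals):

        member_mt = {__index = {}}                   -- stands in for the real member metatable
        wrappers = setmetatable({}, {__mode = 'v'})  -- weak values: unreferenced wrappers get collected

        function wrap_member(uuid, raw_member)
            local m = wrappers[uuid]
            if m == nil then
                m = setmetatable({raw = raw_member}, member_mt)
                wrappers[uuid] = m
            end
            return m
        end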
      
      Part of #3234
    • swim: cache decoded payload in the Lua module · 8ae88a3f
      Vladislav Shpilevoy authored
      Users of the Lua SWIM module will likely use Lua objects as a
      payload. Lua objects are serialized into MessagePack
      automatically, and deserialized back on other instances. But
      deserialization of a 1.2KB payload on each member:payload()
      invocation is quite a heavy operation. This commit caches decoded
      payloads to return them again until they change.

      A microbenchmark showed that a cached payload is returned ~100
      times faster than when it is decoded each time, even though the
      tested payload was quite small and simple:

          s:set_payload({a = 100, b = 200})

      Even this payload is returned 100 times faster, and does not
      affect GC.
      
      Part of #3234
    • swim: allow to use cdata struct tt_uuid in Lua API · 70e99323
      Vladislav Shpilevoy authored
      Sometimes, especially in tests, it is useful to make something
      like this:
      
          s:add_member({uuid = member:uuid(), uri = member:uri()})
      
      But member:uuid() is a cdata struct tt_uuid. This commit allows
      passing it directly.
      
      Part of #3234
    • swim: pairs() function to iterate over member table · 248f2425
      Vladislav Shpilevoy authored
      Expose an iterator API to be able to iterate over a member table
      in a 'for' loop as if it were just a Lua table.
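
      A hedged usage sketch (assuming `s` is a configured SWIM
      instance):

        for _, member in s:pairs() do
            print(member:uri(), member:status())
        end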
      
      Part of #3234
    • swim: Lua bindings to access individual members · 4b8c8723
      Vladislav Shpilevoy authored
      Expose API to search members by UUID, to read their attributes,
      to set payload.
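
      A hedged sketch of the exposed accessors (the UUID string is made
      up; `s` is assumed to be a configured SWIM instance):

        m = s:member_by_uuid('00000000-0000-1000-8000-000000000001')
        m:status()                -- e.g. 'alive'
        m:uri()
        m:payload()
        s:set_payload({a = 100})  -- sets this instance's own payload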
      
      Part of #3234
    • swim: Lua bindings to manipulate member table · 1203bf90
      Vladislav Shpilevoy authored
      Expose methods to add, remove, probe members by uri, uuid. Expose
      broadcast method to probe multiple members by port.
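
      A hedged usage sketch (addresses and UUID are illustrative; `s`
      is assumed to be a configured SWIM instance):

        s:add_member({uuid = '00000000-0000-1000-8000-000000000002',
                      uri = 'localhost:3333'})
        s:probe_member('localhost:3333')
        s:remove_member('00000000-0000-1000-8000-000000000002')
        s:broadcast(3333)          -- probe everyone listening on port 3333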
      
      Part of #3234
    • swim: introduce Lua interface · 407adf40
      Vladislav Shpilevoy authored
      SWIM as a library can be useful not only for server internals,
      but for users as well. This commit exposes Lua bindings to the
      SWIM C API. Only basic bindings are introduced here: to create,
      delete, quit, and check a SWIM instance. With sanity tests.
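
      A minimal lifecycle sketch (the configuration values are
      illustrative):

        swim = require('swim')
        s = swim.new()
        s:cfg({uuid = '00000000-0000-1000-8000-000000000001', uri = 'localhost:0'})
        s:is_configured()   -- true
        s:quit()            -- gracefully leave the cluster; the instance is deleted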
      
      Part of #3234
    • swim: validate URI in swim_probe_member() · ae651ab4
      Vladislav Shpilevoy authored
      Similar methods validate their arguments: add_member,
      remove_member. Validate here as well for consistency.
      
      Part of #3234
    • swim: make swim_new_round() void · dcf4fbdf
      Vladislav Shpilevoy authored
      At first I thought there was an error - swim_begin_step() does
      not reschedule the round timer when new_round() fails. But it
      turned out that new_round() never fails. This commit makes it
      void to eliminate confusion.

      The return value is probably a legacy of the times when the
      shuffled members array was allocated and freed in new_round().
      
      Part of #3234
    • swim: fix an assertion on attempt to change timeouts · 10ad9706
      Vladislav Shpilevoy authored
      It turned out that libev does not allow changing ev_timer values
      in flight. A timer reset via ev_timer_set() should be restarted,
      because the function changes 'ev_timer.at', which in turn is used
      internally by the timer routines.
      
      Part of #3234
    • buffer: replace all ffi.new(type[1]) with cached union · a2c19aa8
      Vladislav Shpilevoy authored
      Lua suffers from the lack of an ability to pass values by
      pointer into FFI functions, and it has no address operator '&'
      to take the address of an integer, char or anything else.
      Because of that a user needs to either use ffi.new(type[1]) or
      a static buffer, but for such small allocations both are too
      expensive and aggravate the GC problem.

      Now the buffer module provides preallocated basic types to use
      in FFI functions. The commit is motivated by yet another place
      where ffi.new('int[1]') appeared, in the SWIM module, to obtain
      the payload size as an out parameter of a C function.
    • buffer: port static allocator to Lua · 7fd6c809
      Vladislav Shpilevoy authored
      The static allocator hands out memory blocks from a cyclic BSS
      memory block of 3 pages, 4096 bytes each. It is much faster than
      malloc when a temporary buffer is needed. Moreover, it does not
      complicate the GC's job. Despite being faster than malloc, it is
      still slower than ffi.new() of a size <= 128 known in advance
      (according to microbenchmarks).

      ffi.new(size <= 128) works either much faster than or at the same
      speed as static_alloc, because internally the FFI allocator
      library caches small blocks and can return them without malloc().

      A simple microbenchmark showed that ffi.new() is ~100 times
      slower than buffer.static_alloc() on allocations of size > 128,
      and on size <= 128 when the size is not inlined.

      To better understand what is meant by 'inlined': this

          ffi.new('char[?]', < <=128 >)

      works ~100 times faster than this:

          local size = <= 128
          ffi.new('char[?]', size)

      ffi.new() with an inlined size <= 128 works faster than light,
      and even the static allocator can't beat it.
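
      A hedged usage sketch of the Lua side, assuming the allocator is
      exposed as buffer.static_alloc(type, count) as this commit
      suggests:

        buffer = require('buffer')
        ffi = require('ffi')
        p = buffer.static_alloc('char', 256)   -- 256 bytes from the cyclic BSS arena
        ffi.fill(p, 256, 0)                    -- use it as a temporary scratch buffer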
    • vinyl: fix assertion while recovering dumped statement · 9566f14c
      Vladimir Davydov authored
      Certain kinds of DML requests don't update secondary indexes, e.g.
      UPDATE that doesn't touch secondary index parts or DELETE for which
      generation of secondary index statements is deferred. For such a request
      vy_is_committed(env, space) may return false on recovery even if it has
      actually been dumped: since such a statement is not dumped for secondary
      indexes, secondary index's vy_lsm::dump_lsn may be less than such
      statement's signature, which makes vy_is_committed() assume that the
      statement hasn't been dumped. Further in the code we have checks that
      ensure that if we execute a request on recovery, it must not be dumped
      for the primary index (as the primary index is always dumped after
      secondary indexes for the sake of recovery), which fires in this case.
      
      To fix that, let's refactor the code based on the following two facts:
       - Primary index is always updated by a DML request.
       - Primary index may only be dumped after secondary indexes.
      
      Closes #4222
    • build: fix strip_core with in-source build · 3cf75bc8
      Alexander Turenko authored
      Yet another fix for building the small library as part of
      tarantool. Before this commit the slab_arena test failed:
      
       | [019] Test failed! Result content mismatch:
       | [019] --- small/slab_arena.result	Mon May 20 21:37:46 2019
       | [019] +++ small/slab_arena.reject	Mon May 20 21:47:01 2019
       | [019] @@ -23,3 +23,4 @@
       | [019]  arena->maxalloc = 2000896
       | [019]  arena->used = 0
       | [019]  arena->slab_size = 65536
       | [019] +ERROR: Expected dd flag on VMA address 0x7f3ec2080000
      
      See the corresponding commit in the small submodule for more info.
    • box: fix autoincrement for json path indexes · a6a33e9f
      Vladimir Davydov authored
      The autoincrement code was written when there were no nested fields.
      Now, it isn't enough to just skip to the autoincrement field - we also
      need to descend deeper if key_part->path is set.
      
      Note, the code expects the nested field to be present and set to NULL.
      That is, if field path is [1].a.b, the tuple must have all intermediate
      fields set: {{a = {b = box.NULL}}} (usage of box.NULL is mandatory to
      create a tuple like that in Lua).
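
      A hedged illustration (the space and path are made up; it assumes
      a sequence attached to a JSON-path part, which is what this fix
      enables):

        s = box.schema.space.create('test')
        s:create_index('pk', {parts = {{field = 1, type = 'unsigned', path = 'a.b'}},
                              sequence = true})
        s:insert{{a = {b = box.NULL}}}   -- box.NULL is replaced by the next sequence value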
      
      Closes #4210
    • schema: allow to set sequence for any index part, not just the first · 60111894
      Vladimir Davydov authored
      Closes #4009
      
      @TarantoolBot document
      Title: Sequence can now be set for an index part other than the first
      
      Initially one could attach a sequence (aka autoincrement) only to the
      first index part. Now it's possible to attach a sequence to any primary
index part. The part must still be an integer, though.
      
      Syntax:
      
      ```
      box.schema.space.create('test')
      box.space.test:create_index('primary', {
          parts = {{1, 'string'}, {2, 'unsigned'}, {3, 'unsigned'}},
          sequence = true, sequence_part = 2
      })
      box.space.test:insert{'a', box.null, 1} -- inserts {'a', 1, 1}
      ```
      
      Note, the `sequence_part` option is 1-based.
      
      If `sequence_part` is omitted, 1 is used, which assures backward
      compatibility with the original behavior.
      
      One can also attach a sequence to another index part using
      `index.alter` (the code below continues the example above):
      
      ```
      box.space.test.index.primary:alter{sequence_part = 3}
      box.space.test:insert{'a', 1, box.null, 'x'} -- inserts {'a', 1, 2, 'x'}
      ```
    • schema: fix error while altering index with sequence · 7d778de6
      Vladimir Davydov authored
      A check was missing in index.alter. This resulted in an attempt to drop
      the sequence attached to the altered index even if the sequence was not
      modified.
      
      Closes #4214
    • schema: use tuple field names in Lua · f47f2004
      Vladimir Davydov authored
      When schema.lua was introduced, there was no such thing as a space
      format and we had to access tuple fields by number. Now we can use
      human-readable names. Let's do it - this should improve code
      readability.

      A note about box/alter.test.lua: for some reason it clears the
      format of the _space and _index system spaces, which apparently
      breaks our assumption about field names. Let's zap those pointless
      test cases.
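
      A hedged example of the style change (the field names follow the
      _space system space format):

        t = box.space._space.index.name:get('_index')
        t.id      -- instead of t[1]
        t.name    -- instead of t[3]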
  5. May 20, 2019
    • build: fix out of source build · 612f05b2
      Alexander Turenko authored
      Updated small submodule with the corresponding fix.
    • coio: fix getaddrinfo assertion on 0 timeout · b6466ac7
      Vladislav Shpilevoy authored
      Background. Coio provides a way to schedule execution of
      arbitrary tasks in worker threads. A task consists of a function
      to execute and a custom destructor.

      To push a task, the function coio_task_post(task, timeout) was
      used. When the function returns 0, a caller can obtain a result
      and should free the task manually. But the trick is that if the
      timeout was 0, the task was posted in a detached state. A
      detached task frees its memory automatically regardless of the
      coio_task_post() result, and does not even yield. Such a task
      object can't be accessed, much less freed manually.

      coio_getaddrinfo() used coio_task_post() and freed the task when
      the latter function returned 0. This led to a double free when
      the timeout was set to 0. The bug was introduced in commit
      800cec73 in an attempt not to
      yield in say_logrotate, because it is not fiber-safe.
      
      Now there are two functions: coio_task_execute(task, timeout),
      which never detaches a task completed successfully, and
      coio_task_post(task), which posts a task in a detached state.
      
      Closes #4209