Commits · f5c8b825cf194ee9bf927a1a727adfbabe1a354e · core / tarantool

Dec 06, 2018

test: replication parallel mode on · f5c8b825

Sergei Voronezhskii authored 6 years ago

Part of #2436, #3232

test: use wait_cond to check follow status · f41548b7

Sergei Voronezhskii authored 6 years ago

After setting timeouts in `box.cfg` and before making a `replace`, the test
needs to wait for the replicas to reach `follow` status. Otherwise, if
`wait_follow()` finds a status other than `follow`, it returns true,
which immediately causes an error.

Fixes #3734
Part of #2436, #3232

test: put require in proper places · d2f28afa

Sergei Voronezhskii authored 6 years ago

* put `require('fiber')` after each switch server command, because
  the tests sometimes failed with a 'fiber' not defined error
* use `require('fio')` after `require('test_run').new()`, because
  the tests sometimes failed with a 'fio' not defined error

Part of #2436, #3232

test: errinj for pause relay_send · 1c34c91f

Sergei Voronezhskii authored 6 years ago

Instead of using a timeout, we just need to pause `relay_send`. We can't
rely on a timeout because system load varies in parallel mode. Add a new
errinj that checks a boolean in a loop and, while it is `true`, does not
let `relay_send` proceed to the next statement.

To check the read-only mode, the test needs to modify a tuple, and calling
the `replace` method is enough. This replaces the previous `delete`
followed by a useless `get` check that the tuple had not been deleted.

Also, look up the xlog files in a loop with a short sleep until the file
count matches the expected one.

Update box/errinj.result because a new errinj was added.

Part of #2436, #3232

test: cleanup replication tests · 848a0b03

Sergei Voronezhskii authored 6 years ago

- at the end of tests that create any replication config, call:
  * `test_run:cmd('delete server ...')`, which removes the server object
    from the `TestState.servers` list (this behaviour was taken
    from the `test_run:drop_cluster()` function)
  * `test_run:cleanup_cluster()`, which clears `box.space._cluster`
- switch on `use_unix_sockets` because of 'Address already in use'
  problems
- the `once` test needs to clean up the `once*` schemas

Part of #2436, #3232

sql: fix tarantoolSqlite3TupleColumnFast · 2bfe8ac5

Kirill Shcherbatov authored 6 years ago

The tarantoolSqlite3TupleColumnFast routine used to look up
offset_slot in unallocated memory in some cases.
The assert with exact_field_count, like the motivation for replacing
the old, correct assert with field_count in 7a8de281, is not correct:
assert(format->exact_field_count == 0 ||
       fieldno < format->exact_field_count);
The tarantoolSqlite3TupleColumnFast routine requires an offset_slot
that has been allocated during the tuple_format_create call. This
value is stored in an indexed field whose number is limited by
index_field_count, which is <= field_count. See
tuple_format_alloc for more details.

The format in the cursor that triggers the (valid) assertion has
this structure because the first 4 tuples in _space (257, 272, 276
and 280) have an old format of _space with only one field
(format->field_count == 1).
This happens because these 4 tuples are not recovered after the tuple
with id 280, which stores the actual format of _space. After tuple
280 is recovered, the actual format is set in struct space of
_space, and all subsequent tuples have full-featured formats.

So for these 4 tuples tarantoolSqlite3TupleColumnFast can fail
even if a field exists, is indexed and has a name: those
features are only described in the newer format.
(Thanks to Gerold103 for explaining the problem.)

Closes #3772

sql: fix parser.parse_only mode for triggers · ac73e345

Kirill Shcherbatov authored 6 years ago

As the parse_only flag did not work correctly for SQL triggers,
sql_trigger_compile leaked a Vdbe.

Closes #3838

box: fix checkpoint_delete · 3c8330ea

Kirill Shcherbatov authored 6 years ago

The rlist_foreach_entry iterator was used for freeing resources.
As a result, the next step of the for-loop accessed already freed
memory.
Replace it with rlist_foreach_entry_safe, which is safe to use with
destructors.

Closes #3858

Dec 04, 2018

info: remove false comments from src/info.h · c6e5bf48

Vladislav Shpilevoy authored 6 years ago

box: move info_handler interface into src/info · f1a114ca

Vladislav Shpilevoy authored 6 years ago

box/info.h defines the info_handler interface with a set
of virtual functions. It allows hiding Lua from code
that does not depend on this language, and is used in things
like index:info() and box.info() to build a Lua table with
some info. But it does not depend on box/, so move it
to src/.

Also, this API is needed for the forthcoming SWIM
module, which is going to be placed into src/lib and
needs info to dump its state to Lua from C without a
strict Lua dependency.

@locker:
 - remove pointless _GNU_SOURCE definition from
   box/lua/info.c
 - remove luaT_info_handler_create declaration from
   box/lua/info.h

Needed for #3234

Dec 03, 2018

lua: getpwall/getgrall error handling - follow-up fixes · a1606e91

Vladimir Davydov authored 6 years ago

 - Add the forgotten errno(0) to getgrall.
 - Throw errors from getgrall/getpwall instead of returning nil in
   case the underlying system function fails.
 - Fix the error message in getgr.
 - Remove pointless and confusing asterisk sign from error messages.
 - Do not hide a stack frame on error.

Follow-up efccac69 ("lua: fix error handling in getpwall and
getgrall").

lua: fix error handling in getpwall and getgrall · efccac69

Alexander Turenko authored 6 years ago

This commit fixes the app-tap/pwd.test.lua test. The problem seems to
have appeared after updating to glibc-2.28.

The usual way to handle errors in Unix seems to be to check errno only
when a return value indicates the possibility of an error.

Related to #3766.

Remove deprecated getaddrinfo() flags · b601d0be

Alexander Turenko authored 6 years ago

The AI_IDN_ALLOW_UNASSIGNED and AI_IDN_USE_STD3_ASCII_RULES flags are
deprecated in glibc-2.28, and the deprecation warnings caused the
Debug build to fail because of -Werror.

Fixes #3766.

box: move port to src/ · 1730b39a

Vladislav Shpilevoy authored 6 years ago

Basic port structure does not depend on anything but
standard types. It just gives an interface and calls
virtual functions.

Its location in box/ was fine since it was not used
anywhere in src/. But the next commits will add a new
method to mpstream to dump a port. Mpstream is
implemented in src/, so let's move port there.

Needed for #3505

test: fix app/fiber.test.lua flaky fails · 0e19478c

Alexander Turenko authored 6 years ago

Fixes #3852.

test: fix hardcoded port in box/net.box.test.lua · f36568c0

Alexander Turenko authored 6 years ago

This allows running the test many times in parallel to investigate flaky
test failures, and decreases the probability that the test fails because
the port is already in use by, say, some other test.

test: fix http_client.test.lua with curl-7.62 · 10518cc1

Alexander Turenko authored 6 years ago

curl-7.61.1

```
tarantool> require('http.client').new():get('http://localhost:0')
---
- status: 595
  reason: Couldn't connect to server
```

curl-7.62

```
tarantool> require('http.client').new():get('http://localhost:0')
---
- error: 'curl: URL using bad/illegal format or missing URL'
...
```

curl-7.62 returns CURLE_URL_MALFORMAT in case of a zero port, and tarantool
raises an error in this case. I think this behaviour is valid, so I fixed
the test.

Nov 29, 2018

gc: run garbage collection in background · 07191842

Vladimir Davydov authored 6 years ago

Currently, garbage collection is executed synchronously by functions
that may trigger it, such as gc_consumer_advance or gc_add_checkpoint.
As a result, one has to be very cautious when using those functions as
they may yield at will. For example, we can't shoot off stale
consumers right in the tx_prio handler - we have to use the rather clumsy
WAL watcher interface instead. Besides, in the future, when the garbage
collector state is persisted, we will need to call those functions from
an on_commit trigger callback, where yielding is not normally allowed.

Actually, there's no reason to remove old files synchronously - we could
as well do it in the background. So this patch introduces a background
garbage collection fiber that executes gc_run when woken up. Now all
functions that might trigger garbage collection wake up this fiber
instead of executing gc_run directly.

recovery: restore garbage collector vclock after restart · baf28a59

Vladimir Davydov authored 6 years ago

After restart the garbage collector vclock is reset to the vclock of the
oldest preserved checkpoint, which is incorrect - it may be less in case
there is a replica that lagged behind, and it may be greater as well in
case the WAL thread hit ENOSPC and had to remove some WAL files to
continue. Fix it.

A note about the xlog/panic_on_wal_error test. To check that replication
stops if some xlogs are missing, the test first removes xlogs on the
master, then restarts the master, then tries to start the replica
expecting that replication should fail. Well, it shouldn't - the replica
should rebootstrap instead. It didn't rebootstrap before this patch
though, because the master reported wrong garbage collector vclock (as
it didn't recover it on restart). After this patch the replica would
rebootstrap and the test would hang. Fix this by restarting the master
before removing xlog files.

wal: remove files needed for recovery from backup checkpoints on ENOSPC · bd7f7116

Vladimir Davydov authored 6 years ago

Tarantool always keeps the box.cfg.checkpoint_count latest checkpoints. It
also never deletes WAL files needed for recovery from any of them for
the sake of redundancy, even if it gets ENOSPC while trying to write to
WAL. This patch changes that behavior: now the WAL thread is allowed to
delete backup WAL files in case of emergency ENOSPC - after all, it's
better than stopping operation.

Closes #3822

wal: separate checkpoint and flush paths · 74d8db74

Vladimir Davydov authored 6 years ago

Currently, wal_checkpoint() is used for two purposes. First, to make a
checkpoint (rotate = true). Second, to flush all pending WAL requests
(rotate = false). Since checkpointing has to fail if a cascading rollback
is in progress, so does flushing. This is confusing. Let's separate the
two paths.

While we are at it, let's also rewrite WAL checkpointing using cbus_call
instead of cpipe_push as it's a more convenient way of exchanging simple
two-hop messages between two threads.

json: some renames · b56103f5

Kirill Shcherbatov authored 6 years ago

We are planning to link json_path_node objects in a tree and attach some
extra information to them so that they could be used to describe a json
document structure. Let's rename it to json_token as it sounds more
appropriate for the purpose.

Also, rename json_path_parser to json_lexer as it isn't a parser,
really, it's rather a tokenizer or lexer. Besides, the new name is
shorter.

Needed for #1012

test: fix vinyl/errinj spurious failure · 8e13153b

Vladimir Davydov authored 6 years ago

The failing test case checks that modifications done to the space during
the final dump of a newly built index are recovered properly. It assumes
that a series of operations will complete in 0.1 seconds, but it may not
happen if the disk is slow (like on Travis CI). This results in spurious
failures. To fix this issue, let's replace ERRINJ_VY_RUN_WRITE_TIMEOUT
used by the test with ERRINJ_VY_RUN_WRITE_DELAY, which blocks index
creation until it is disabled instead of injecting a time delay as its
predecessor did.

Closes #3756

Don't repeat SQL stress tests with vinyl engine. · 6e07131d

Konstantin Osipov authored 6 years ago

These stress-test some of the parser/vdbe features; there is no point
in replaying them against vinyl. They could just as well run in
wal_mode="none".

Disable gh-3332-tuple-format-leak.test, gh-3083-ephemeral-unref-tuples.test · 52a212f3

Konstantin Osipov authored 6 years ago

Disable these tests in the regular suite until they are sped up in the
scope of gh-3845.

lua: moving lua error functions to separate file · 27a04953

Ilya Markov authored 6 years ago

Refactoring. Move lua error functions to a separate file.

A prerequisite for #677

test: skip test backtrace if no libunwind support · 2aa25ba5

Sergei Voronezhskii authored 6 years ago

Closes #3824

iproto: remove iproto functions from execute.c · 474bdf36

Mergen Imeev authored 6 years ago

To make functions in execute.h more universal we should reduce
their dependence on IPROTO. This patch removes IPROTO functions
from execute.c.

Needed for #3505

box: add method dump_lua to port · 6ecd7ee1

Mergen Imeev authored 6 years ago

The new method dump_lua dumps the tuples saved in a port to the Lua
stack. It will allow us to call this method without any other
interaction with the port.

Needed for #3505

box: store sql text and length in sql_request · bc9e41e9

Kirill Shcherbatov authored 6 years ago

Refactored the sql_request structure to store a pointer to the SQL string
data and its length instead of a pointer to the msgpack
representation.
This is required to use this structure in sql.c, where the query
has different semantics and can be obtained from the stack as a C
string.

Needed for #3505.

Nov 28, 2018

lua: fix tonumber64() for strings containing "ULL" · d935d64b

Serge Petrenko authored 6 years ago

tonumber64() doesn't understand strings with "ULL", like "123ULL". The
expected output for tonumber64("123ULL") is 123, since 123ULL is a
correct number notation in Lua. However, our function returns null.
This happens because the suffix isn't trimmed in tonumber64.

Trim the ULL/LLU and LL suffixes, but only when no base is specified or
the base is equal to 2, 10 or 16.

Closes #3431

test: enable parallel mode for xlog tests · 4d47162d

Sergei Voronezhskii authored 6 years ago

Part of #2436, #3232

Nov 27, 2018

test: enable parallel mode for wal_off tests · d837c94b

Sergei Voronezhskii authored 6 years ago

- The box configuration parameter `memtx_memory` is increased, because the
  `lua` test, run after `tuple`, failed with the error
  `Failed to allocate 368569 bytes in slab allocator for memtx_tuple`
  despite `collectgarbage('collect')` calls after cases with huge/many
  tuples.
  The statistics before the allocation failure give the following values:
  ```
  box.slab.info()
  ---
  - items_size: 72786472
    items_used_ratio: 4.43%
    quota_size: 107374592
    quota_used_ratio: 93.75%
    arena_used_ratio: 6.1%
    items_used: 3222376
    quota_used: 100663296
    arena_size: 100663296
    arena_used: 6105960
  ```
  The cause of the failure seems to be slab memory fragmentation. It is
  not clear yet whether we should consider this a tarantool
  issue.

- The `snapshot_stress` test counts the snapshot files present in the
  working directory and can reach the default 'checkpoint_count' value
  `2` if a previous test wrote its snapshots there before.

- Restarting the default server without cleaning the working directory
  can leave a snapshot that holds a state saved in the middle of a test,
  before the space 'tweedledum' was dropped (because WAL is disabled).
  This can cause the error `Space 'tweedledum' already exists` in a
  following test.

- Use unix sockets because of `Address already in use` errors.
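
Reading the slab statistics in the first item as used/size ratios makes the fragmentation argument concrete. This is an assumption about how `box.slab.info()` derives them, though it reproduces the printed percentages. Slab arenas have taken nearly the whole quota (`arena_size` equals `quota_used`, 96 MiB), yet only a few percent of the arena holds tuple data.

```latex
\begin{aligned}
\text{quota\_used\_ratio} &= \frac{\text{quota\_used}}{\text{quota\_size}}
  = \frac{100663296}{107374592} \approx 93.75\,\% \\
\text{arena\_used\_ratio} &= \frac{\text{arena\_used}}{\text{arena\_size}}
  = \frac{6105960}{100663296} \approx 6.07\,\% \\
\text{items\_used\_ratio} &= \frac{\text{items\_used}}{\text{items\_size}}
  = \frac{3222376}{72786472} \approx 4.43\,\%
\end{aligned}
```

The printed `6.1%` matches 6.07% rounded to one decimal place.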

Part of #2436

sql: remove fiber_gc() from sqlite3VdbeHalt() · e3d931e0

Mergen Imeev authored 6 years ago

Too many autogenerated ids lead to a SEGFAULT. This problem
appeared because the region was cleaned twice: once in
sqlite3VdbeHalt() and once in sqlite3VdbeDelete(), which was
executed during sqlite3_finalize(). Autogenerated ids saved in the
region were fetched after sqlite3VdbeHalt() and before
sqlite3_finalize(), i.e. after the region had already been cleaned.
This patch removes the region cleanup from sqlite3VdbeHalt().

Follow up #2618
Follow up #3199

sql: decode ARRAY and MAP types after SELECT · 135de5b5

Mergen Imeev authored 6 years ago

Before this patch, ARRAY and MAP values (MSGPACK) received from a SELECT
statement through net.box were not decoded. Fixed in this patch.

sql: fix error handling in sql_analysis_load() · fee95cf7

Serge Petrenko authored 6 years ago

Previously, if an error occurred in box_index_len() called from
sql_analysis_load(), the return code (-1 on error) was cast to uint32_t
and later used as the size of memory to be allocated. This led to assertion
failures in slab_order() since the allocation size was too big. This was
discovered during the investigation of #3779.
Fix error handling and add some error logging.

Follow-up #3779

box: use replicaset.vclock in replica join/subscribe · f50f0b29

Vladimir Davydov authored 6 years ago

Again, this is something that was introduced by commit f2bccc18
("Use WAL vclock instead of TX vclock in most places") without any
justification.

TX has its own copy of the current vclock - there's absolutely no need
to request it from the WAL thread. Actually, we already use the TX local
vclock in box_process_vote(). There's no reason to treat join/subscribe any
differently. Moreover, it's even harmful - there may be a gap at the end
of a WAL file, in which case the WAL vclock will be slightly ahead of the
TX vclock, so a replica trying to subscribe would never finish
syncing, see #3830.

Closes #3830

box: do not rotate WAL when replica subscribes · 7439529d

Vladimir Davydov authored 6 years ago

Because this is pointless and confusing. This "feature" was silently
introduced by commit f2bccc18 ("Use WAL vclock instead of TX vclock
in most places"). Let's revert this change. This will allow us to
clearly separate WAL checkpointing from WAL flushing, which will in turn
facilitate implementation of the checkpoint-on-WAL-threshold feature.

There are two problems here, however. First, not rotating the log breaks
the expectations of the replication/gc test: as a consequence, an xlog
file doesn't get deleted in time. This happens because we don't delete xlogs
relayed to a replica after the join stage is complete - we only do it during
the subscribe stage - and if we don't rotate WAL on subscribe, the garbage
collector won't be invoked. This is actually a bug - we should advance
the WAL consumer associated with a replica once the join stage is complete.
This patch fixes it, but it unveils another problem - this time in the
WAL garbage collection procedure.

It turns out that, when passed a vclock, the WAL garbage collection procedure
removes all WAL files that were created before the vclock. Apparently,
this isn't quite correct - if a consumer is in the middle of a WAL file,
we must not delete the WAL file, but we do. This works as long as
consumers never track vclocks inside WAL files - currently they are
advanced only when a WAL file is closed, and naturally they are advanced
to the beginning of the next WAL file. However, if we want to advance
the consumer associated with a replica when the join stage ends (this is
what the previous paragraph is about), we might advance it to the middle
of a WAL file. If that happens, the WAL garbage collector might remove
a file which is actually in use by a replica.
Fix this as well.

engine: pass vclock instead of lsn to collect_garbage callback · ca1eb666

Vladimir Davydov authored 6 years ago

First, this is consistent with other engine callbacks, such as
checkpoint or backup.

Second, a vclock can be used as a search key in a vclock set,
which in turn can make code more straightforward, e.g. look how
this patch simplifies vy_log_prev_checkpoint().

Update small submodule · 6bc47d90

Vladimir Davydov authored 6 years ago

In the updated version, rb_proto/rb_gen use the const qualifier for the key
argument, which allows passing pointers to const objects to the search
methods.
