Commits · 35db70fac55c86f44b822104f361ddb825a3a48d · core / tarantool

May 17, 2018

vinyl: remove runs not referenced by any checkpoint immediately · 35db70fa

If a compacted run was created after the last checkpoint, it is not
needed to recover from any checkpoint and hence can be deleted right
away to save disk space.

Closes #3407

35db70fa

May 15, 2018

test: improve vinyl/select_consistency · 47fe6ced

Vladimir Davydov authored 6 years ago

Improve the test by decreasing range_size so that it creates a lot of
ranges for test indexes, not just one. This helped find bugs causing
the crash described in #3393.

Follow-up #3393

47fe6ced

vinyl: do not panic if secondary index is inconsistent with primary · 1558c538

Vladimir Davydov authored 6 years ago

Although the bug in vy_task_dump_complete() due to which a tuple could
be lost during dump was fixed, there still may be affected deployments
as the bug was persisted on disk. To avoid occasional crashes on such
deployments, let's make vinyl_iterator_secondary_next() skip tuples that
are present in a secondary index but missing in the primary.

Closes #3393

1558c538

vinyl: fix lost key on dump completion · 1f0023ad

Vladimir Davydov authored 6 years ago

vy_task_dump_complete() creates a slice per each range overlapping with
the newly written run. It uses vy_range_tree_psearch(min_key) to find
the first overlapping range and nsearch(max_key) to find the range
immediately following the last overlapping range. This is incorrect as
nsearch rb tree method returns the element matching the search key if it
is present in the tree. That is, if the max key written to a run turns
out to be equal to the beginning of a range, no slice will be created for
that range and the key will be silently and persistently lost.
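
A minimal Python sketch of the off-by-one described above, not the vinyl
code itself. It assumes ranges are identified by their sorted begin keys;
the `begins` list and both helper names are hypothetical. bisect_left
stands in for nsearch (first element >= key) and bisect_right for a
strict "first element > key" search.

```python
import bisect

# Hypothetical range layout: range i covers [begins[i], begins[i + 1]).
begins = [0, 10, 20, 30]

def overlapping_buggy(min_key, max_key):
    # psearch(min_key): last range whose begin is <= min_key.
    first = bisect.bisect_right(begins, min_key) - 1
    # nsearch(max_key): first range whose begin is >= max_key. If a range
    # begins exactly at max_key, it is treated as "past the end" and skipped.
    end = bisect.bisect_left(begins, max_key)
    return list(range(first, end))

def overlapping_fixed(min_key, max_key):
    first = bisect.bisect_right(begins, min_key) - 1
    # Strict search: first range whose begin is > max_key.
    end = bisect.bisect_right(begins, max_key)
    return list(range(first, end))
```

With a run spanning keys 5..20, the buggy variant yields ranges 0 and 1
only, so the range beginning at 20 gets no slice; the strict variant
also includes range 2.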

The issue manifests itself as a crash in vinyl_iterator_secondary_next(),
when we fail to find the tuple in the primary index corresponding to a
statement found in a secondary index.

Part of #3393

1f0023ad

vinyl: fix EQ check in run iterator · 7ee79a0a

Vladimir Davydov authored 6 years ago

vy_run_iterator_seek() is supposed to check that the resulting statement
matches the search key in case of ITER_EQ, but if the search key lies at
the beginning of the slice, it doesn't. As a result, vy_point_lookup()
may fail to find an existing tuple as demonstrated below.

Suppose we are looking for key {10} in the primary index which consists
of an empty mem and two runs:

    run 1: DELETE{15}
    run 2: INSERT{10}

vy_run_iterator_next() returns DELETE{15} for run 1 because of the
missing EQ check and vy_point_lookup() stops at run 1 (since the
terminal statement is found) and mistakenly returns NULL.
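
The scenario above can be modelled with a short Python sketch. This is
an illustration under simplifying assumptions, not the vinyl iterator:
a run is a sorted list of (key, op) pairs, and `seek` and `point_lookup`
are hypothetical stand-ins for the functions named in this message.

```python
import bisect

def seek(run, key, check_eq):
    """Return the first statement in `run` with key >= `key`, or None."""
    keys = [k for k, _ in run]
    i = bisect.bisect_left(keys, key)
    if i == len(run):
        return None
    stmt = run[i]
    # The EQ check: a statement for a different key must not be returned.
    if check_eq and stmt[0] != key:
        return None
    return stmt

def point_lookup(runs, key, check_eq=True):
    """Scan sources from newest to oldest and stop at the first statement."""
    for run in runs:
        stmt = seek(run, key, check_eq)
        if stmt is None:
            continue
        # Both DELETE and INSERT are terminal: the scan stops here.
        return None if stmt[1] == "DELETE" else stmt
    return None

runs = [[(15, "DELETE")], [(10, "INSERT")]]  # run 1, then run 2
```

Without the check, `point_lookup(runs, 10, check_eq=False)` stops at
DELETE{15} and returns None; with it, the lookup reaches run 2 and
returns the INSERT{10} statement.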

The issue manifests itself as a crash in vinyl_iterator_secondary_next(),
when we fail to find the tuple in the primary index corresponding to a
statement found in a secondary index.

Part of #3393

7ee79a0a

Add test case for fiber safety of digest.pbkdf2 · ec9ec946

Alexander Turenko authored 6 years ago

Follows up #3396.

ec9ec946

May 14, 2018

Use automatic storage for digest.pbkdf2 results · 06ec3d50

Alexander Turenko authored 6 years ago

It prevents the result from being overwritten by another thread after
coio_call(), but before lua_pushlstring(). This is possible because
libeio uses a thread pool internally, so static __thread storage can be
reused before lua_pushlstring() if many parallel digest.pbkdf2() calls
are in flight.
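
The hazard can be shown with a deterministic Python model. It is only an
analogy for the C code: the `Worker` class and its methods are
hypothetical, with a per-worker bytearray standing in for static
__thread storage and a fresh bytearray for automatic storage.

```python
class Worker:
    """Models a pool thread that owns one reusable ("static") result buffer."""

    def __init__(self):
        self.static_buf = bytearray(4)

    def compute_static(self, value):
        # The result is written into storage shared by all calls on this worker.
        self.static_buf[:] = value
        # The caller receives a reference to that storage, not a copy.
        return self.static_buf

    def compute_automatic(self, value):
        # Each call gets its own storage, so later calls cannot touch it.
        return bytearray(value)

worker = Worker()

first = worker.compute_static(b"AAAA")
worker.compute_static(b"BBBB")     # reuses the buffer before `first` is consumed
clobbered = bytes(first)           # the first result has been overwritten

first = worker.compute_automatic(b"AAAA")
worker.compute_automatic(b"BBBB")
intact = bytes(first)              # the first result is unaffected
```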

Fixes #3396.

06ec3d50

May 08, 2018

socket: Fix socket test · 2b973c05

Ilya Markov authored 6 years ago

When app-tap/console.test was launched sequentially, tests failed with
"User exists" and binding errors.

Make the socket paths relative.
Add users cleanup.

Relates #3168

2b973c05

May 07, 2018

Don't try to lock a ddl latch in a multistatement tx · c7012534

Georgy Kirichenko authored 7 years ago

Any DDL is prohibited in a multistatement transaction, so there is no
reason to try to lock the DDL latch in this case. Trying to lock an
already locked latch causes a yield and a silent transaction rollback,
which makes the tarantool server crash or fail an assertion.

Fixes #2783

c7012534

May 05, 2018

console: fix a bug in interactive readline usage · 92e130cb

Vladislav Shpilevoy authored 6 years ago

Spurious wakeups are possible in the console, which makes readline
think that there is some data on stdin. A woken-up readline
returns garbage instead of a string, which crashes the server on
an assertion in Lua.

Closes #3343

92e130cb

May 03, 2018

digest: fix error in base64 encode options · 6e1ac12e

Vladislav Shpilevoy authored 6 years ago

Any base64 option leads to urlsafe encoding. This is wrong and is
caused by incorrect flag checking. Fix it.
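
The commit message does not show the faulty check, so the following
Python sketch only illustrates this class of bug. The option bits
(NOPAD, NOWRAP, URLSAFE) and both function names are assumptions made
for the example, not the actual digest module code.

```python
import base64

# Hypothetical option bits, chosen for this illustration only.
NOPAD, NOWRAP, URLSAFE = 1, 2, 4

def encode_buggy(data, flags=0):
    # Bug pattern: any non-zero flag selects the URL-safe alphabet.
    if flags:
        return base64.urlsafe_b64encode(data)
    return base64.b64encode(data)

def encode_fixed(data, flags=0):
    # Test the specific bit instead of the whole flag word.
    if flags & URLSAFE:
        out = base64.urlsafe_b64encode(data)
    else:
        out = base64.b64encode(data)
    if flags & NOPAD:
        out = out.rstrip(b"=")
    # NOWRAP needs no handling here: Python's encoders never insert newlines.
    return out
```

For the input b"\xfb\xff", the buggy variant called with NOPAD returns
the URL-safe form b"-_8=", while the fixed variant returns b"+/8".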

Closes #3358

6e1ac12e

iproto: follow up patch for the fix for blocked connection · 1dcdc98e

Konstantin Osipov authored 6 years ago

* rename request_limit.test.lua to net_msg_max.test.lua
* make net_msg_max.test.lua stable (courtesy of @Gerold103)
* exclude disconnect messages from iproto_msg_max limit
* add a separate warning for throttling based on readahead buffer overflow

1dcdc98e

iproto: connection could block forever after a CALL request · f4d66dae

Vladislav Shpilevoy authored 6 years ago

Starting with 1.9, a CALL request which yields releases
the input buffer in the net thread before the CALL is complete.
A release trigger is fired when the CALL fiber yields.

The problem is that by default the input socket is not
included into poll() list of the event loop: thanks to an
optimization by @kostja for strict request/response scenario,
the socket is included into poll() list only after the response
is sent to the client. Thus, the following could happen:

* a client sends a long-polling request
* the request yields and maybe never finishes
* the socket is not being read until the long-polling request
  is finished

The patch is to explicitly feed EV_READ event to the event
loop on the client socket whenever we release the input buffer
for a long-polling request.

We may remove iproto_resume() from net_discard_input() along
with this patch since iproto_resume() will be called by
iproto_connection_on_input().

f4d66dae

Apr 18, 2018

wal: Update request header after sequence update · 41589229

Ilya Markov authored 6 years ago

When a tuple in an insert/replace request has a NULL value
in the field incremented by a sequence,
the request body is changed: NULL is replaced by the value taken from
the sequence.
But the request header is not updated.
So the redo log, which takes the body from the header if the header exists,
writes the old version of the request to the WAL.

Fix this by updating the header value after handling the sequence.

Closes #3247

41589229

replication: fix broken cases with quorum=0 · 01b8ebc3

Konstantin Belyavskiy authored 6 years ago

This commit is related to 6d81fa99.
With replication_connect_quorum=0 set, the previous commit broke
replication because it skipped the applier_resume() and applier_start()
parts.
Fix it and add more test cases.

Closes #3278

01b8ebc3

replication: fix bug with read-only replica as a bootstrap leader · a8ecd1e1

Konstantin Belyavskiy authored 6 years ago

When bootstrapping a new cluster, any replica from the replicaset can
be chosen as the leader, but if it is 'read-only', bootstrap will
fail with an error.
Fix it by excluding read-only replicas from voting, which is done by
adding access rights information to the IPROTO_REQUEST_VOTE reply.

Closes #3257

a8ecd1e1

Apr 11, 2018

Fix backtrace typo · b8bff028

Georgy Kirichenko authored 6 years ago

There was an invalid refactoring with forgotten renaming.

b8bff028

Fix PyPI TLS connection failures in CI on OS X … (#3336) · 531c80f0

Arseny Antonov authored 6 years ago

The failures are caused by the TLSv1.0/TLSv1.1 brownout on the PyPI side,
see [1] for more information.

[1]: pypa/packaging-problems#130

531c80f0

Apr 10, 2018

Added new coveralls options to sync repos with coveralls site (#3327) (#3331) · 73d75ed1

Arseny Antonov authored 6 years ago

* Added new coveralls options to sync repo
* Pass travis job to coverage docker

73d75ed1

Apr 09, 2018

replication: fix bug with zero replication_connect_quorum · 6d81fa99

Konstantin Belyavskiy authored 6 years ago

If 'box.cfg.read_only' is false, 'replication' defines at least one
replica (other than itself), but none of them is available at the time
of box.cfg execution and replication_connect_quorum is set to zero, the
master displays 'orphan' status instead of 'running', since the logic
which changes this state is executed only after a successful connection.

Closes #3278

6d81fa99

Apr 07, 2018

vinyl: fix crash if index is dropped while read task is in progress · 2a7cf7f5

Vladimir Davydov authored 6 years ago

If a fiber waiting for a read task to complete is cancelled, it will
leave the read iterator immediately, leaving the read task pending.
If the index is dropped before the read task is complete, the task
will attempt to dereference a deleted run upon completion:

    0  0x560b4007dbbc in print_backtrace+9
    1  0x560b3ff80a1d in _ZL12sig_fatal_cbiP9siginfo_tPv+1e7
    2  0x7f52b09190c0 in __restore_rt+0
    3  0x7f52af6ea30a in bzero+5a
    4  0x560b3ffc7a99 in mempool_free+2a
    5  0x560b3ffcaeb7 in vy_page_read_cb_free+47
    6  0x560b400806a2 in cbus_call_done+3f
    7  0x560b400805ea in cmsg_deliver+30
    8  0x560b40080e4b in cbus_process+51
    9  0x560b4003046b in _ZL10tx_prio_cbP7ev_loopP10ev_watcheri+2b
    10 0x560b4023d86e in ev_invoke_pending+ca
    11 0x560b4023e772 in ev_run+5a0
    12 0x560b3ff822dc in main+5ed
    13 0x7f52af6862b1 in __libc_start_main+f1
    14 0x560b3ff801da in _start+2a
    15 (nil) in +2a

Fix this by elevating the run reference counter per each read task.

Note, currently we use vy_run::refs not only as a reference counter, but
also as a counter of slices created for the run - see how we compare it
to vy_run::compacted_slice_count in vy_task_compact_complete(). This
isn't going to work anymore, obviously. Now we need to count slices
created per each run in a separate counter, vy_run::slice_count. Anyway,
it was a rather dubious hack to abuse reference counter for counting
slices and it's good to finally get rid of it.
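
A small Python model of the idea, assuming a plain counter-based
lifetime. The `Run` class, its methods and `read_task_survives_drop` are
hypothetical; only the field names `refs` and `slice_count` come from
this message.

```python
class Run:
    """Toy run object with separate lifetime and slice counters."""

    def __init__(self):
        self.refs = 1          # reference held by the owning index
        self.slice_count = 0   # number of slices created for the run
        self.freed = False

    def ref(self):
        self.refs += 1

    def unref(self):
        self.refs -= 1
        if self.refs == 0:
            self.freed = True

    def add_slice(self):
        # Slices are counted on their own, independent of `refs`.
        self.slice_count += 1

def read_task_survives_drop():
    run = Run()
    run.add_slice()
    run.ref()                           # the read task pins the run
    run.unref()                         # index dropped: owner lets go
    alive_during_task = not run.freed   # the task can still use the run
    run.unref()                         # task completes and unpins the run
    return alive_during_task, run.freed, run.slice_count
```

Because the read task holds its own reference, dropping the index does
not free the run until the task has completed, and the extra reference
no longer distorts the slice count.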

2a7cf7f5

vinyl: use ERRINJ_DOUBLE for ERRINJ_VY_READ_PAGE_TIMEOUT · 8dc9895f

Vladimir Davydov authored 6 years ago

We use ERRINJ_DOUBLE for all other timeout injections. This makes them
more flexible as we can inject an arbitrary timeout in tests, not just
enable some hard-coded timeout. Besides, it makes tests easier to
follow. So let's use ERRINJ_DOUBLE for ERRINJ_VY_READ_PAGE_TIMEOUT too.

8dc9895f

alter: do not crash if sequence is created for space with no indexes · 95aefec3

Vladimir Davydov authored 6 years ago

If a space has no indexes, index_find() will return NULL, which will be
happily dereferenced by on_replace_dd_sequence(). Looks like this bug
goes back to the time when we made index_find() exception-free and
introduced index_find_xc() wrapper. Fix it and add a test case.

95aefec3

Apr 05, 2018

log: Fix syslog logger · 7c7a2fa1

Ilya Markov authored 7 years ago

* Remove rewriting format of default logger in case of syslog option.
* Add facility option parsing and use parsed results in format message
  according to RFC3164. Possible values and default value of syslog
  facility are taken from nginx (https://nginx.ru/en/docs/syslog.html)
* Move initialization of logger type and format function before
  initialization of descriptor in log_XXX_init, so that we can test
  format function of syslog logger.

Closes gh-3244.

7c7a2fa1

Apr 04, 2018

Add 'key_def_new_with_parts' (temporary) · 7d089bbd

Alexander Turenko authored 6 years ago

Filed gh-3311 to remove this export soon. Fixes #3310.

7d089bbd

Apr 03, 2018

vinyl: fail transaction immediately if it does not fit in memory · 8f63d5d9

Vladimir Davydov authored 6 years ago

If the size of a transaction is greater than the configured memory
limit (box.cfg.vinyl_memory), the transaction will hang on commit
for 60 seconds (box.cfg.vinyl_timeout) and then fail with the
following error message:

  Timed out waiting for Vinyl memory quota

This is confusing. Let's fail such transactions immediately with
OutOfMemory error.
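
The fail-fast behaviour can be sketched as follows. This is a
simplified Python model, not the vinyl quota code: `reserve`, the
exception classes and the `wait_for_space` callback are hypothetical
names introduced for the example.

```python
class OutOfMemory(Exception):
    pass

class QuotaTimeout(Exception):
    pass

def reserve(tx_size, used, limit, wait_for_space):
    """Return the new usage, or raise if the transaction cannot be admitted."""
    # A transaction larger than the whole limit can never fit, so waiting
    # for memory to be freed is pointless: fail right away.
    if tx_size > limit:
        raise OutOfMemory("transaction does not fit in the memory limit")
    # Otherwise waiting may help; wait_for_space returns False on timeout.
    if used + tx_size > limit and not wait_for_space(tx_size):
        raise QuotaTimeout("Timed out waiting for Vinyl memory quota")
    return used + tx_size
```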

Closes #3291

8f63d5d9

Apr 02, 2018

Removed precise due to EOL and added artful (#3302) · 886be65a

Arseny Antonov authored 6 years ago

886be65a

Mar 30, 2018

replication: recover missing local data from replica · eae84efb

Konstantin Belyavskiy authored 6 years ago

In case of a sudden power loss, if data was not written to the WAL but
was already sent to a remote replica, the local instance can't recover
properly and the datasets diverge. Fix it by using the remote replica's
data and LSN comparison.

Based on @GeorgyKirichenko proposal and @locker race free check.

Closes #3210

eae84efb

replication: stay in orphan mode until replica is synced by vclock · 7ebc8ae4

Konstantin Belyavskiy authored 6 years ago

Stay in orphan (read-only) mode while the local vclock is lower than
the master's to make sure that datasets are the same across the replicaset.
Update replication/catch test to reflect the change.
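
A rough Python sketch of such a check, assuming a vclock can be modelled
as a mapping from replica id to LSN and compared component-wise. The
function names and the dict representation are assumptions for
illustration, not the applier code.

```python
def is_synced(local, master):
    """True if the local vclock has caught up with every master component."""
    # A component missing locally is treated as LSN 0.
    return all(local.get(rid, 0) >= lsn for rid, lsn in master.items())

def status(local, master):
    # The instance stays read-only ("orphan") until it has caught up.
    return "running" if is_synced(local, master) else "orphan"
```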

Suggested by @kostja

Needed for #3210

7ebc8ae4

Update LuaRocks · 3171288c

Vladimir Davydov authored 6 years ago

Closes #3148

3171288c

libev: use clock_gettime on OS X if available · 10af1cb1

Vladimir Davydov authored 6 years ago

EV_USE_REALTIME and EV_USE_MONOTONIC, which force libev to use
clock_gettime, are enabled automatically on Linux, but not on OS X. We
used to forcefully enable them for performance reasons, but this broke
compilation on certain OS X versions and so was disabled by commit
d36ba279 ("Fix gh-1777: clock_gettime detected but unavailable in
macos"). Today we need these features enabled not just because of
performance, but also to avoid crashes when time changes on the host -
see issue #2527 and commit a6c87bf9 ("Use ev_monotonic_now/time
instead of ev_now/time for timeouts"). Fortunately, we have this cmake
defined macro HAVE_CLOCKGETTIME_DECL, which is set if clock_gettime is
available. Let's enable EV_USE_REALTIME and EV_USE_MONOTONIC if this
macro is defined.

Closes #3299

10af1cb1

Mar 29, 2018

log: Fix logging large objects · 5ab4581d

Ilya Markov authored 6 years ago

The bug was that, when logging, we passed to the write function
a number of bytes which may exceed the size of the buffer.
This may happen because we format the log string with vsnprintf, which
returns the number of bytes that would have been written to the buffer,
not the number actually written.

Fix this by limiting the number of bytes passed to the write function.
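
The length mismatch can be modelled in Python as below. This is a sketch
of the arithmetic only: `vsnprintf_model` imitates the C return-value
convention, and both function names are hypothetical.

```python
def vsnprintf_model(buf_size, message):
    """Imitate vsnprintf: keep at most buf_size - 1 bytes, report the full length."""
    data = message.encode("utf-8")
    stored = data[: max(buf_size - 1, 0)]
    return stored, len(data)

def bytes_to_write(buf_size, message):
    stored, would_write = vsnprintf_model(buf_size, message)
    # Clamp to what actually landed in the buffer; using would_write
    # directly would read past the end of a real C buffer.
    return min(would_write, len(stored))
```

For an 8-byte buffer and a 100-byte message the model reports 100, but
only 7 bytes were stored, so 7 is the count that may be passed on.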

Close #3248

5ab4581d

say: Fix log_rotate · 26a4effe

Ilya Markov authored 7 years ago

* Refactor tests.
* Add ev_async and fiber_cond for thread-safe log_rotate usage.

Follow up #3015

26a4effe

log: Fix logger.test.lua · d0dcc8b9

Ilya Markov authored 7 years ago

Fix a race condition in the log_rotate test.
The test opened the file that must be created by log_rotate and read from it.
But as log_rotate is executed in a separate thread, the file may not yet be
created, or the log may not yet be written, by the time the test opens it.

Fix this by waiting for the file to be created and for the line to be read.

d0dcc8b9

netbox: show is_nullable and collation fields · cc935d24

Kirill Shcherbatov authored 7 years ago

Netbox does not need nullability or collation info, but some
customers do. Let's fill index parts with these fields.

Fixes #3256

cc935d24

Mar 27, 2018

Clear session storage on session stop · cd48321d

Georgy Kirichenko authored 7 years ago

* session_run_on_disconnect_triggers is called only if there are
corresponding triggers, so move session_storage_cleanup to
session_destroy.
* fix the session storage cleanup path: use
"box.session.aggregate_storage[sid]" instead of
"session.aggregate_storage[sid]" (which was wrong)

Fixed #3279

cd48321d

Mar 22, 2018

test: fix a failing test after merging new fio.pathjoin() implementation · 6267bc6c

Konstantin Osipov authored 7 years ago

6267bc6c

[fio] allow empty path part in pathjoin (#3260) · 49c1de8f

Alec Larson authored 7 years ago

Empty strings should be ignored rather than throw an error.

Passing only empty strings (or nothing) to `pathjoin` will return '.', which means the current directory.

Every path part passed to `pathjoin` is now converted to a string.

The `gsub('/+', '/')` call already does what the removed code does, so avoid the unnecessary work.
Simply check if the result equals '/' before removing a trailing '/'.

The previous check did extra work for no gain.
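
The rules listed above can be approximated in Python as follows. This is
a sketch of the described behaviour, not a port of the Lua
implementation; joining the parts with '/' before collapsing is an
assumption about how the parts are combined.

```python
import re

def pathjoin(*parts):
    # Convert every part to a string and ignore empty ones.
    strings = [str(p) for p in parts]
    path = "/".join(s for s in strings if s != "")
    # Only empty strings (or nothing) were passed: current directory.
    if path == "":
        return "."
    # Collapse runs of slashes, the counterpart of gsub('/+', '/').
    path = re.sub("/+", "/", path)
    # Remove a trailing slash unless the whole result is the root.
    if path != "/" and path.endswith("/"):
        path = path[:-1]
    return path
```

For example, `pathjoin("a", "", "b")` gives "a/b" and `pathjoin("/", "")`
keeps the root "/" intact.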

49c1de8f

Mar 21, 2018

Remove empty function declaration · e954dcf7

Vladislav Shpilevoy authored 7 years ago

e954dcf7

Mar 20, 2018

iproto: get iproto obuf only right before usage · 7a147583

Vladislav Shpilevoy authored 7 years ago

It is possible to discard non-sent responses using a special
sequence of requests and yields. In detail: if DML requests
yield on commit for too long, there are fast read requests, and the
network is saturated, then some non-sent DML responses are
discarded.

Closes #3255

7a147583