Commits · 28fcdaa038aa2f11aace69614ba67cccdadd4f52 · core / tarantool

Dec 14, 2018

test: run full testing only on long-term branches · 28fcdaa0

Disabled LTO builds and the building of tarballs and packages on
short-term branches.

Removed 'allow_failures' on coverage / debug build.

Replaced matrix expansion with the list of jobs (because Travis-CI
documentation says it does not support conditional jobs with matrix
expansion).

Fixes #3755.

28fcdaa0

Dec 13, 2018

Add tarantoolctl rocks doc subcommand · 7af4aeb8

Yaroslav Dynnikov authored 6 years ago

Allow exploring rocks documentation locally with
`tarantoolctl rocks doc ...`

In context of #3753

7af4aeb8

Dec 12, 2018

fio: cleanup error messages · 802672c6

Roman Khabibov authored 6 years ago

Otherwise it is hard to debug code throwing exceptions.
fio.pathjoin() was just one example among many.

@locker:
 - slightly edit error messages
 - filter out line numbers from error messages in the test

Closes #3580

802672c6

gc: fix panic if checkpoint_interval is updated during checkpoint · 1e73b71d

Vladimir Davydov authored 6 years ago

To wait for a thread writing a snap file memtx_engine_wait_checkpoint()
uses cord_cojoin(), which doesn't tolerate spurious wakeups. Hence we
must not wake up the checkpoint fiber after reconfiguring the checkpoint
interval if it's currently making a checkpoint.

Fixes 4c04808a ("Rewrite checkpoint daemon in C").

Closes #3878

1e73b71d

Rate limit certain warning messages · e6ebd5eb

Vladimir Davydov authored 6 years ago

There are a few warning messages that can easily flood the log, making
it more difficult to figure out what causes the problem. Those are
 - too long WAL write
 - waited for ... bytes of vinyl memory quota for too long
 - get/select(...) => ... took too long
 - readahead limit is reached
 - net_msg_max limit is reached

Actually, it's pointless to print each and every one of them, because all
messages of the same kind are similar and don't convey any additional
information.

So this patch limits the rate at which those messages may be printed.
To achieve that, it introduces say_ratelimited() helper, which works
exactly like say() except it does nothing if too many messages of
the same kind have already been printed in the last few seconds.
The implementation is trivial - say_ratelimited() defines a static
ratelimit state variable at its call site (it's a macro) and checks it
before logging anything. If the ratelimit state says that an event may
be emitted, it will log the message, otherwise it will skip it and
eventually print the total number of skipped messages instead.

The rate limit is set to 10 messages per 5 seconds for each kind of
a warning message enumerated above.

Here's how it looks in the log:

2018-12-11 18:07:21.830 [30404] iproto iproto.cc:524 W> stopping input on connection fd 15, aka 127.0.0.1:12345, peer of 127.0.0.1:59212, readahead limit is reached
2018-12-11 18:07:21.831 [30404] iproto iproto.cc:524 W> stopping input on connection fd 15, aka 127.0.0.1:12345, peer of 127.0.0.1:59212, readahead limit is reached
2018-12-11 18:07:21.831 [30404] iproto iproto.cc:524 W> stopping input on connection fd 15, aka 127.0.0.1:12345, peer of 127.0.0.1:59212, readahead limit is reached
2018-12-11 18:07:21.831 [30404] iproto iproto.cc:524 W> stopping input on connection fd 15, aka 127.0.0.1:12345, peer of 127.0.0.1:59212, readahead limit is reached
2018-12-11 18:07:21.831 [30404] iproto iproto.cc:524 W> stopping input on connection fd 15, aka 127.0.0.1:12345, peer of 127.0.0.1:59212, readahead limit is reached
2018-12-11 18:07:21.832 [30404] iproto iproto.cc:524 W> stopping input on connection fd 15, aka 127.0.0.1:12345, peer of 127.0.0.1:59212, readahead limit is reached
2018-12-11 18:07:21.832 [30404] iproto iproto.cc:524 W> stopping input on connection fd 15, aka 127.0.0.1:12345, peer of 127.0.0.1:59212, readahead limit is reached
2018-12-11 18:07:21.832 [30404] iproto iproto.cc:524 W> stopping input on connection fd 15, aka 127.0.0.1:12345, peer of 127.0.0.1:59212, readahead limit is reached
2018-12-11 18:07:21.832 [30404] iproto iproto.cc:524 W> stopping input on connection fd 15, aka 127.0.0.1:12345, peer of 127.0.0.1:59212, readahead limit is reached
2018-12-11 18:07:21.832 [30404] iproto iproto.cc:524 W> stopping input on connection fd 15, aka 127.0.0.1:12345, peer of 127.0.0.1:59212, readahead limit is reached
2018-12-11 18:07:26.851 [30404] iproto iproto.cc:524 W> 9635 messages suppressed
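
The behaviour described above can be sketched as follows. This is only an illustration in Python, not the C macro from the patch: the names `RateLimit`, `check` and the list-based `log` are hypothetical, and the window-reset policy is an assumption chosen to reproduce the "10 messages per 5 seconds, then a suppressed count" pattern shown in the log.

```python
import time


class RateLimit:
    """Allow at most `burst` events per `interval` seconds (hypothetical names)."""

    def __init__(self, burst=10, interval=5.0, clock=time.monotonic):
        self.burst = burst
        self.interval = interval
        self.clock = clock
        self.window_start = clock()
        self.emitted = 0
        self.suppressed = 0

    def check(self):
        """Return (allowed, suppressed_to_report).

        suppressed_to_report is non-zero only when a new interval starts
        and messages were dropped during the previous one.
        """
        now = self.clock()
        report = 0
        if now - self.window_start >= self.interval:
            # A new interval begins: report and reset the counters.
            report = self.suppressed
            self.window_start = now
            self.emitted = 0
            self.suppressed = 0
        if self.emitted < self.burst:
            self.emitted += 1
            return True, report
        self.suppressed += 1
        return False, report


def say_ratelimited(limit, log, message):
    """Append `message` to `log` unless the limit is exceeded."""
    allowed, report = limit.check()
    if report:
        log.append("%d messages suppressed" % report)
    if allowed:
        log.append(message)
```

In this sketch each call site would own its own `RateLimit` object, which mirrors the idea of a static state variable defined where the macro is used.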

Closes #2218

e6ebd5eb

Introduce basic rate limiter · c1198e26

Vladimir Davydov authored 6 years ago

We will use it to limit the rate of log messages.

Needed for #2218

c1198e26

Dec 11, 2018

sio: revert semantical changes made in scope of conversion to C · f041f2d0

Konstantin Osipov authored 6 years ago

As described by Vlad in gh-3875, it's incorrect to rely on the state
of the diagnostics area to determine whether there was an EINTR-style
error on sio or not. The area may be dirty with some previous
error.

f041f2d0

test: fix box/backup spurious failure · 3030f27f

Vladimir Davydov authored 6 years ago

Follow-up 07191842 ("gc: run garbage collection in background").

Closes #3855

3030f27f

iproto: fire on_disconnect right after disconnect · a032ce7c

Vladislav Shpilevoy authored 6 years ago

Before the patch, on_disconnect triggers were called
only after the last outstanding request had finished. It
was enough for most goals. But after box.session.push
was implemented, it turned out that it is impossible to
return an error from push() if a connection is closed.

Tx session just does not know that a connection is
already closed. The patch splits destroy and
disconnect phases of an iproto connection lifecycle.
Disconnect calls on_disconnect triggers and resets
iproto socket fd. Destroy frees resources.
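
A toy model of the two phases, written in Python purely for illustration. The class and method names are hypothetical and do not correspond to iproto's C structures; the sketch only shows why resetting the fd at disconnect time lets push() fail while requests are still outstanding.

```python
class Connection:
    """Toy connection with separate disconnect and destroy phases."""

    def __init__(self, fd, on_disconnect):
        self.fd = fd
        self.on_disconnect = on_disconnect
        self.pending = 0        # number of outstanding requests
        self.destroyed = False

    def start_request(self):
        self.pending += 1

    def push(self, data):
        # After disconnect the fd is reset, so push can fail right away.
        if self.fd is None:
            raise ConnectionError("connection is closed")
        return len(data)

    def disconnect(self):
        # Phase 1: reset the fd and fire the triggers immediately.
        if self.fd is None:
            return
        self.fd = None
        self.on_disconnect(self)
        self._maybe_destroy()

    def finish_request(self):
        self.pending -= 1
        self._maybe_destroy()

    def _maybe_destroy(self):
        # Phase 2: free resources once disconnected and idle.
        if self.fd is None and self.pending == 0:
            self.destroyed = True
```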

Needed for #3859

a032ce7c

iproto: rename disconnect cmsg to destroy · 4d8d2a04

Vladislav Shpilevoy authored 6 years ago

Disconnect cmsg is a message that is used by the iproto
thread to notify the tx thread that a connection is dead
and has no outstanding requests. That is, its
tx-related resources are freed and the connection is
deleted.

But the text above is a clear definition of destroy. The
patch harmonizes the cmsg name and its purpose.

Secondly, the patch is motivated by #3859 which
requires separate disconnect and destroy phases.

Needed for #3859

4d8d2a04

box: manage format fields with json tree class · 42cc6779

Kirill Shcherbatov authored 6 years ago

As we are going to work with format fields in a unified way, we should use
the json tree class for managing top-level tuple format fields.

@locker: comments and style fixes.

Needed for #1012

42cc6779

evio: turn into C · 4b8bf88c

Vladislav Shpilevoy authored 6 years ago

Needed for #3234

4b8bf88c

coio: fix file descriptor leak on accept · f64d79f3

Vladislav Shpilevoy authored 6 years ago

coio_accept() calls evio_setsockopt_client, which
may throw an exception, in which case the just-accepted
socket leaks. Yes, the server socket is protected, but not
the new client socket.

The bug existed even before exceptions were removed
from evio.

f64d79f3

evio: remove exceptions · de7588fa

Vladislav Shpilevoy authored 6 years ago

Remove them to be able to convert evio to C - final
step to make it usable in SWIM.

Needed for #3234

de7588fa

coio: fix double close of a file descriptor · f755748f

Vladislav Shpilevoy authored 6 years ago

coio_service_on_accept is called by evio via the
on_accept pointer. If evio gets a non-zero value from
on_accept, it closes the accepted socket. But
coio_service_on_accept closes it too when fiber_new
fails. This is a double close.

Note that the bug existed even when on_accept was able
to throw.
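
The ownership problem can be reproduced with a small Python sketch. All names here are hypothetical stand-ins, and letting only the caller close the socket is one possible fix, assumed for illustration rather than taken from the patch.

```python
class FakeSocket:
    """Counts close() calls so a double close becomes visible."""

    def __init__(self):
        self.close_calls = 0

    def close(self):
        self.close_calls += 1


def on_accept_buggy(sock, start_worker):
    if not start_worker(sock):
        sock.close()    # the callback closes the socket on failure...
        return -1
    return 0


def on_accept_fixed(sock, start_worker):
    # Report the failure only; the caller owns the socket.
    return 0 if start_worker(sock) else -1


def accept_one(sock, on_accept, start_worker):
    if on_accept(sock, start_worker) != 0:
        sock.close()    # ...and the caller closes it on a non-zero result
```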

f755748f

evio: make on_accept be nothrow · 97dd87fa

Vladislav Shpilevoy authored 6 years ago

Evio is going to be converted to C, because it is needed in SWIM to
1) support UNIX sockets in the future;
2) not care about setting SocketError directly. It
is not possible via bare sio, because sio_bind ignores
EADDRINUSE.

The first step in making evio C is to eliminate exceptions
from its callbacks available to other modules. The
only callback it has is on_accept.

Needed for #3234

97dd87fa

sio: turn into C · dafa197b

Vladislav Shpilevoy authored 6 years ago

Needed for #3234

dafa197b

Dec 10, 2018

sio: remove exceptions · 73237320

Konstantin Osipov authored 6 years ago

Propagate exceptions up to the caller. Function sio_writev().

In scope of #3234

73237320

sio: remove exceptions · 03c72165

Konstantin Osipov authored 6 years ago

Propagate exceptions up to the caller. Functions sio_read(),
sio_write().

Add an assert for sio_write() with zero bytes.

In scope of #3234

03c72165

sio: remove exceptions · 2c741648

Konstantin Osipov authored 6 years ago

Propagate exceptions up to the caller. Functions sio_sendto(),
sio_recvfrom().

In scope of #3234

2c741648

sio: remove exceptions · 4382f7fe

Konstantin Osipov authored 6 years ago

Propagate exceptions up to the caller.

In scope of #3234

4382f7fe

sio: prepare for conversion to plain C. · a9fc08b0

Konstantin Osipov authored 6 years ago

Describe in module comments the principles of API construction
for sio.c. Add necessary comments. Add a helper function
which makes the code a little simpler.

In scope of gh-3234.

a9fc08b0

Rename json_path lib to json · d1536f9a

Vladimir Davydov authored 6 years ago

It is now suitable not only for handling JSON paths, but also for
building complex JSON structures.

Follow-up b56103f5 ("json: some renames").

d1536f9a

json: introduce json_path_cmp, json_path_validate · 78438b8a

Kirill Shcherbatov authored 6 years ago

Introduced json_path_validate routine to ensure user-defined
JSON path is valid. This will be required to raise an error if
an incorrect user-defined JSON path is detected.

Introduced json_path_cmp routine to compare JSON paths that may
have different representation.
Note that:
 - in case of paths that have same token-sequence prefix,
   the path having more tokens is assumed to be greater
 - both paths to compare must be valid
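
A rough Python sketch of comparing paths token by token rather than as strings. The tokenizer below is a simplified, hypothetical grammar (`[N]`, `["name"]`, `['name']`, `.name`), and the ordering between numeric and string tokens is an arbitrary assumption; only the "longer path with a common prefix is greater" rule comes from the text above.

```python
import re

# Simplified token grammar; not the real lexer.
_TOKEN = re.compile(r"""\[(\d+)\]|\["([^"]*)"\]|\['([^']*)'\]|\.?([A-Za-z_]\w*)""")


def tokenize(path):
    """Split a path into ("num", int) and ("str", name) tokens."""
    tokens, pos = [], 0
    while pos < len(path):
        m = _TOKEN.match(path, pos)
        if m is None:
            raise ValueError("invalid path at offset %d" % pos)
        num, dq, sq, ident = m.groups()
        if num is not None:
            tokens.append(("num", int(num)))
        elif dq is not None:
            tokens.append(("str", dq))
        elif sq is not None:
            tokens.append(("str", sq))
        else:
            tokens.append(("str", ident))
        pos = m.end()
    return tokens


def path_cmp(a, b):
    """Return -1, 0 or 1; both paths must be valid."""
    ta, tb = tokenize(a), tokenize(b)
    for x, y in zip(ta, tb):
        if x != y:
            return -1 if x < y else 1
    # Same token prefix: the path with more tokens is greater.
    return (len(ta) > len(tb)) - (len(ta) < len(tb))
```

With this sketch, `[1].a` and `[1]["a"]` compare equal even though their textual forms differ.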

@locker: move json_path_validate to the object file (no need to
pollute the header with this cold routine).

Needed for #1012

78438b8a

json: implement json_tree class · e2d72a96

Kirill Shcherbatov authored 6 years ago

The new JSON tree class will store JSON paths for tuple fields
of registered non-plain indexes. It is a hierarchical data
structure that organizes JSON nodes produced by the parser.
The class provides an API to look up a node by path and iterate over
the tree.

The JSON indexes patch requires this functionality to look up
tuple_fields by path, initialize the field map, and
build vinyl_stmt msgpack for a secondary index via JSON tree
iteration.
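
The general shape of such a structure, sketched in Python. `Node`, `tree_add`, `tree_lookup` and `tree_preorder` are hypothetical names, and paths are passed as already-parsed token lists; the real class is written in C and differs in layout and API.

```python
class Node:
    """One token of a path; children are keyed by token."""

    def __init__(self, token=None, parent=None):
        self.token = token
        self.parent = parent
        self.children = {}


def tree_add(root, tokens):
    """Insert a path, creating missing nodes, and return the last node."""
    node = root
    for tok in tokens:
        child = node.children.get(tok)
        if child is None:
            child = Node(tok, node)
            node.children[tok] = child
        node = child
    return node


def tree_lookup(root, tokens):
    """Return the node for a path, or None if it is not in the tree."""
    node = root
    for tok in tokens:
        node = node.children.get(tok)
        if node is None:
            return None
    return node


def tree_preorder(node):
    """Yield the node and all of its descendants, parents first."""
    yield node
    for child in node.children.values():
        yield from tree_preorder(child)
```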

@locker:
 - Forbid root = NULL in json_tree_lookup_path.
 - Make loop variable first in foreach macros argument list.
 - Use int instead of uint32_t unless uint32_t is specifically
   required.
 - Set max_child_idx to -1 if children array is empty to avoid
   ambiguity.
 - Fix child index allocation for JSON_TOKEN_STR.
 - Rework json_tree_add and json_tree_del to make them more
   straightforward.
 - Add comments and cleanup code.

Needed for #1012

e2d72a96

json: make index_base support for json_lexer · ab96fa26

Kirill Shcherbatov authored 6 years ago

Introduced a new index_base field for json_lexer class - this
value is a base field offset for emitted JSON_TOKEN_NUM tokens.
Thus, we get rid of the need to perform manual casts using the
TUPLE_INDEX_BASE constant in the majority of cases. This will
also ensure that the extracted tuples are correctly inserted
into the numerical level of JSON tree.
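
A tiny Python illustration of the idea, assuming that the lexer subtracts index_base from every numeric token it emits so that callers always see zero-based indexes. The function name and the regex-based scanning are hypothetical.

```python
import re


def lex_indexes(path, index_base=0):
    """Yield array indexes from "[N]" tokens, shifted by index_base."""
    for m in re.finditer(r"\[(\d+)\]", path):
        yield int(m.group(1)) - index_base
```

For example, with index_base=1 the path `[1][3]` would yield 0 and 2, so no manual adjustment is needed at the call site.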

@locker: use int instead of unsigned for index_base.

Needed for #1012

ab96fa26

json: use int instead of uint64_t for array indexes · ed23ef3a

Vladimir Davydov authored 6 years ago

Unsigned types are like a plague - use one once, and it will
quickly spread throughout your whole code base, because comparing an
unsigned value with a signed one without type conversion will make the
compiler whine. However, they are less efficient, because the compiler
has to guarantee that integer overflow works predictably for them.
That said let's make json_token::num signed. Let's also use a plain int
for it rather than int64_t, because it's highly unlikely that the
capacity of int won't be enough to store a tuple index.

ed23ef3a

Dec 09, 2018

sio: remove unused functions · 4ed1c41d

Vladislav Shpilevoy authored 6 years ago

The next patches remove exceptions from sio and convert it
to C. To avoid having to care about unused functions, they
are deleted first.

4ed1c41d

wal: trigger checkpoint if there are too many WALs · db8c7aa3

Vladimir Davydov authored 6 years ago

Closes #1082

@TarantoolBot document
Title: Document box.cfg.checkpoint_wal_threshold

Tarantool makes checkpoints every box.cfg.checkpoint_interval seconds
and keeps last box.cfg.checkpoint_count checkpoints. It also keeps all
intermediate WAL files. Currently, it isn't possible to set a checkpoint
trigger based on the total size of WAL files, which makes it difficult to
estimate the minimal amount of disk space that needs to be allotted to a
Tarantool instance for storing WALs to eliminate the possibility of
ENOSPC errors. For example, under normal conditions a Tarantool instance
may write 1 GB of WAL files every box.cfg.checkpoint_interval seconds
and so one may assume that 1 GB times box.cfg.checkpoint_count should be
enough for the WAL partition, but there's no guarantee it won't write 10
GB between checkpoints when the load is extreme.

So we've agreed that we must provide users with one more configuration
option that could be used to impose a limit on the total size of WAL
files. The new option is called box.cfg.checkpoint_wal_threshold. Once
the configured threshold is exceeded, the WAL thread notifies the
checkpoint daemon that it's time to make a new checkpoint and delete
old WAL files. Note, the new option only limits the size of WAL files
created since the last checkpoint, because backup WAL files are not
needed for recovery and can be deleted in case of emergency ENOSPC, for
more details see tarantool/tarantool#1082, tarantool/tarantool#3397,
tarantool/tarantool#3822.

The default value of the new option is 1 exabyte (10^18 bytes), which
actually means that the feature is disabled.
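
The triggering logic can be sketched like this in Python. The class and callback names are hypothetical; the sketch only models a counter of bytes written since the last checkpoint that fires a notification once when the threshold is exceeded and is reset when a checkpoint completes.

```python
class WalSizeTracker:
    """Tracks WAL bytes written since the last checkpoint."""

    def __init__(self, threshold, trigger_checkpoint):
        self.threshold = threshold
        self.trigger_checkpoint = trigger_checkpoint
        self.size_since_checkpoint = 0
        self.triggered = False

    def on_write(self, nbytes):
        self.size_since_checkpoint += nbytes
        # Notify only once per checkpoint cycle.
        if not self.triggered and self.size_since_checkpoint > self.threshold:
            self.triggered = True
            self.trigger_checkpoint()

    def on_checkpoint(self):
        # A checkpoint was made: start counting from zero again.
        self.size_since_checkpoint = 0
        self.triggered = False
```

With the default threshold of 10^18 bytes the condition in on_write would practically never hold, which matches the statement that the feature is disabled by default.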

db8c7aa3

wal: pass struct instead of vclock to checkpoint methods · f33fbbf8

Vladimir Davydov authored 6 years ago

Currently, we only need to pass a vclock between TX and WAL during
checkpointing. However, in order to implement auto-checkpointing
triggered when WAL size exceeds a certain threshold, we will need to
pass some extra info so that we can properly reset the counter
accounting the WAL size in the WAL thread. To make it possible, let's
move wal_checkpoint struct, which is used internally by WAL to pass a
checkpoint vclock, to the header and require the caller to pass it to
wal_begin/commit_checkpoint instead of just a vclock.

f33fbbf8

Rewrite checkpoint daemon in C · 4c04808a

Vladimir Davydov authored 6 years ago

A long time ago, when the checkpoint daemon was added to Tarantool, it was
responsible not only for making periodic checkpoints, but also for
maintaining the configured number of checkpoints and removing old snap
and xlog files, so it was much easier to implement it in Lua than in C.
However, over time, all its responsibilities have been reimplemented in
C and moved to the server code so that now it just calls box.snapshot()
periodically. Let's rewrite this simple procedure in C as well - this
will allow us to easily add more complex logic there, e.g. triggering
checkpoint when WAL files exceed a configured threshold.

Note, this patch removes a few cases from xlog/checkpoint_daemon test
that tested the internal state of the checkpoint daemon, which isn't
available in Lua anymore. This is OK as those cases are covered by
unit/checkpoint_schedule test.

4c04808a

Introduce checkpoint schedule module · 382568b1

Vladimir Davydov authored 6 years ago

This is a very simple module that incorporates the logic for calculating
the time of the next scheduled checkpoint given the configured interval
between checkpoints. It doesn't have any dependencies, which allows
covering it with a unit test. It will be used by the checkpoint daemon once
we rewrite it in C. Rationale: in the future we might want to introduce more
complex rules for scheduling checkpoints (cron-like, maybe) and it will
be really nice to have this logic neatly separated and tested.
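
A minimal sketch of such a calculation in Python. The function name, its arguments and the convention that a non-positive interval disables scheduling are all assumptions made for illustration; the actual module is written in C and may differ.

```python
import math


def next_checkpoint_timeout(start_time, interval, now):
    """Return seconds until the next scheduled checkpoint.

    Checkpoints are assumed to fall on start_time + k * interval for
    k = 1, 2, ... Returns None if scheduling is disabled.
    """
    if interval <= 0:
        return None
    if now < start_time:
        # The schedule has not started yet.
        return start_time - now + interval
    elapsed = now - start_time
    periods_done = math.floor(elapsed / interval)
    return (periods_done + 1) * interval - elapsed
```

Because the function depends only on its arguments, it can be unit-tested without a running server, which is the property the message above highlights.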

382568b1

gc: some renames · bdb6825b

Vladimir Davydov authored 6 years ago

GC module is responsible not only for garbage collection, but also for
tracking consumers and making checkpoints. Soon it will also incorporate
the checkpoint daemon. Let's prefix all members related to the cleanup
procedure accordingly to avoid confusion.

bdb6825b

box: move checkpointing to gc module · 4ef765d5

Vladimir Davydov authored 6 years ago

Garbage collection module seems to be the best way to accommodate the
checkpoint daemon, but to move it there, we first need to move the code
performing checkpoints there to avoid cross-dependency between box.cc
and gc.c.

4ef765d5

box: don't use box_checkpoint_is_in_progress outside box.cc · b4650492

Vladimir Davydov authored 6 years ago

We only use box_checkpoint_is_in_progress in SIGUSR1 signal handler to
print a warning in case checkpointing cannot be started, because it's
already done by another fiber. Actually, it's not necessary to use it
there - instead we can simply log the error returned by box_checkpoint,
which will be the self-explanatory ER_CHECKPOINT_IN_PROGRESS in this case.
So let's make box_checkpoint_is_in_progress private to box.cc - this
will simplify moving the checkpoint daemon to the gc module.

While we are at it, remove the unused snapshot_version declaration.

b4650492

box: fix certain cfg options initialized twice on recovery · a6f22d19

Vladimir Davydov authored 6 years ago

Certain dynamic configuration options are initialized right in box.cc,
because they are needed for recovery. All such options are supposed to
be present in dynamic_cfg_skip_at_load table so that load_cfg.lua won't
try to set them again upon recovery completion. However, not all of them
happen to be there - sometimes we simply forgot to patch this table after
introduction of a new configuration option. This patch adds all the
missing ones except checkpoint_count - there's no point to initialize
checkpoint_count in box.cc so it removes it from box.cc instead.

a6f22d19

wal: simplify watcher API · f2db4075

Vladimir Davydov authored 6 years ago

This patch reverts changes done in order to make WAL watcher API
suitable for notifying TX about WAL garbage collection triggered on
ENOSPC, namely:

 b073b017 wal: add event_mask to wal_watcher
 7077341e wal: pass wal_watcher_msg to wal_watcher callback

We don't need them anymore, because now we piggyback the notification
on the WAL request message that triggered ENOSPC.

f2db4075

gc: do not use WAL watcher API for deactivating stale consumers · d32166a6

Vladimir Davydov authored 6 years ago

The WAL thread may delete old WAL files if it gets ENOSPC error.
Currently, we use WAL watcher API to notify the TX thread about it so
that it can shoot off stale replicas. This looks ugly, because WAL
watcher API was initially designed to propagate WAL changes to relay
threads and the new event WAL_EVENT_GC, which was introduced for
notifying about ENOSPC-driven garbage collection, isn't used anywhere
else. Besides, there's already a pipe from WAL to TX - we could reuse it
instead of opening another one.

If we followed down that path, then in order to trigger a checkpoint
from the WAL thread (see #1082), we would have to introduce yet another
esoteric WAL watcher event, making the whole design look even uglier.
That said, let's rewrite the garbage collection notification procedure
using a plain callback instead of abusing WAL watcher API.

d32166a6

netbox: fix wait_connected ignorance · a539fdc2

Vladislav Shpilevoy authored 6 years ago

After patch d2468dac it became possible to
wrap an existing connection into the netbox API. The regular
netbox.connect function was refactored so as to reuse
the connection establishment code.

But a connection should be established in a worker
fiber, not in the caller's one. Otherwise it is
impossible not to wait for the connect result.

The patch just moves connection establishment into a
worker fiber, without any functional changes.

Closes #3856

a539fdc2

Fix premature cdata collecting in luaT_pusherror() · 88de7c34

Alexander Turenko authored 6 years ago

This is a follow-up to 28c7e667 to fix
luaT_pusherror() itself, not only luaT_error().

Fixes #1955 (again).

88de7c34