  1. Apr 14, 2021
• build: install libcurl headers · 38d0b0c1
      Roman Khabibov authored
Ship libcurl headers to the system path "${PREFIX}/include/tarantool"
when libcurl is included as a bundled library or in a static
build. This is needed to use the SMTP client with tarantool's
libcurl instead of the system libcurl.
      
      See related issue: https://github.com/tarantool/smtp/issues/24
      
      Closes #4559
• build: enable smtp · 4bde1dbc
      Roman Khabibov authored
Enable the smtp and smtps protocols in the bundled libcurl. This is
needed to use the SMTP client with tarantool's libcurl instead of
the system libcurl.
      
      See related issue: https://github.com/tarantool/smtp/issues/24
      
      Part of #4559
• luajit: bump new version · ef55e488
      Sergey Kaplun authored
      
      LuaJIT submodule is bumped to introduce the following changes:
      * tools: introduce --leak-only memprof parser option
      
This changeset also introduces a new Lua module providing
post-processing routines for parsed memory events:
* memprof/process.lua: post-process the collected events

The changes add an option to show only the heap difference. One can
launch the memory profiler parser with the introduced option via the
following command:
      $ tarantool -e 'require("memprof")(arg)' - --leak-only filename.bin
      
      Closes #5812
      
Reviewed-by: Igor Munkin <imun@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>
• box: implement box.lib module · f463b5fa
      Cyrill Gorcunov authored
      
Currently, to run a "C" function from some external module, one
has to register it first in the "_func" system space. This is
a problem if the node is in read-only mode (a replica).

Still, people would like to have a way to run such functions
even in read-only mode. For this sake we implement the "box.lib" Lua module.

Unlike the `box.schema.func` interface, `box.lib` does not defer the
module loading procedure until the first call of a function. Instead,
a module is loaded immediately, and if some error happens (say, the
shared library is corrupted or not found) it pops up early.

The need to use stored C procedures implies that the application is
running under serious load, and most likely there is a modular
structure at the Lua level (i.e. the same shared library is loaded in
different sources), thus we cache the loaded library and reuse it on
subsequent load attempts. To verify that the cached library is up to
date, the module_cache engine tests file attributes (device, inode,
size, modification time) on every load attempt.
      
Since both `box.schema.func` and `box.lib` use caching to minimize
the module loading procedure, a pass-through caching scheme is
implemented:

 - box.lib relies on the module_cache engine for caching;
 - box.schema.func snoops into the box.lib hash table when attempting
   to load a new module: if the module is present in the box.lib hash,
   it is simply referenced from there and added into box.schema.func's
   own hash table; if the module is not present, it is loaded from
   scratch and put into both hashes;
 - the module_reload action in box.schema.func invalidates the
   module_cache or fills it if the entry is not present.
      
      Closes #4642
      
Co-developed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      
      @TarantoolBot document
      Title: box.lib module
      
      Overview
      ========
      
The `box.lib` module provides a way to create, delete and execute
`C` procedures from shared libraries. Unlike the `box.schema.func`
methods, functions created with the help of `box.lib` are not
persistent and live purely in memory. Once a node is turned off,
they vanish. Their initial purpose is to be executed on nodes
running in read-only mode.
      
      Module functions
      ================
      
      `box.lib.load(path) -> obj | error`
      -----------------------------------
      
Loads a module from `path` and returns an object instance
associated with the module; otherwise an error is thrown.

The `path` should not end with a shared library extension
(such as `.so`); only the file name shall be there.
      
      Possible errors:
      
      - IllegalParams: module path is either not supplied
        or not a string.
      - SystemError: unable to open a module due to a system error.
      - ClientError: a module does not exist.
      - OutOfMemory: unable to allocate a module.
      
      Example:
      
      ``` Lua
-- Without error handling
m = box.lib.load('path/to/library')

-- With error handling
ok, m = pcall(box.lib.load, 'path/to/library')
if not ok then
    print(m) -- on failure, `m` holds the error
end
      ```
      
      `module:unload() -> true | error`
      ---------------------------------
      
Unloads a module. Returns `true` on success, otherwise an error
is thrown. Once the module is unloaded, one can't load new
functions from this module instance.
      
      Possible errors:
      
      - IllegalParams: a module is not supplied.
      - IllegalParams: a module is already unloaded.
      
      Example:
      
      ``` Lua
      m = box.lib.load('path/to/library')
      --
      -- do something with module
      --
      m:unload()
      ```
      
If functions from this module are referenced somewhere else in the
Lua code, they can still be executed, because the module stays in
memory until the last reference to it is closed.

If the module becomes a target of Lua's garbage collector,
unload is called implicitly.
      
      `module:load(name) -> obj | error`
      ----------------------------------
      
Loads a new function with the name `name` from the previously
loaded `module` and returns a callable object instance
associated with the function. On failure an error is thrown.
      
      Possible errors:
       - IllegalParams: function name is either not supplied
         or not a string.
       - IllegalParams: attempt to load a function but module
         has been unloaded already.
       - ClientError: no such function in the module.
       - OutOfMemory: unable to allocate a function.
      
      Example:
      
      ``` Lua
-- Load a module if not loaded yet.
      m = box.lib.load('path/to/library')
      -- Load a function with the `foo` name from the module `m`.
      func = m:load('foo')
      ```
      
If there is no need to load other functions from the same
module, the module might be unloaded immediately:
      
      ``` Lua
      m = box.lib.load('path/to/library')
      func = m:load('foo')
      m:unload()
      ```
      
      `function:unload() -> true | error`
      -----------------------------------
      
      Unloads a function. Returns `true` on success, otherwise
      an error is thrown.
      
      Possible errors:
       - IllegalParams: function name is either not supplied
         or not a string.
 - IllegalParams: the function is already unloaded.
      
      Example:
      
      ``` Lua
      m = box.lib.load('path/to/library')
      func = m:load('foo')
      --
      -- do something with function and cleanup then
      --
      func:unload()
      m:unload()
      ```
      
If the function becomes a target of Lua's garbage collector,
unload is called implicitly.
      
      Executing a loaded function
      ===========================
      
Once a function is loaded, it can be executed as an ordinary Lua
call. Let's consider the following example: we have a `C` function
which takes two numbers and returns their sum.
      
      ``` C
      int
      cfunc_sum(box_function_ctx_t *ctx, const char *args, const char *args_end)
      {
      	uint32_t arg_count = mp_decode_array(&args);
      	if (arg_count != 2) {
      		return box_error_set(__FILE__, __LINE__, ER_PROC_C, "%s",
      				     "invalid argument count");
      	}
      	uint64_t a = mp_decode_uint(&args);
      	uint64_t b = mp_decode_uint(&args);
      
      	char res[16];
      	char *end = mp_encode_uint(res, a + b);
      	box_return_mp(ctx, res, end);
      	return 0;
      }
      ```
      
The name of the function is `cfunc_sum` and the function is built into
the `cfunc.so` shared library.

First we should load it:
      
      ``` Lua
      m = box.lib.load('cfunc')
      cfunc_sum = m:load('cfunc_sum')
      ```
      
Once it is successfully loaded we can execute it. Let's call
`cfunc_sum` with a wrong number of arguments:
      
      ``` Lua
      cfunc_sum()
       | ---
       | - error: invalid argument count
      ```
      
We will see the `"invalid argument count"` message in the output.
The error message has been set by `box_error_set` in the `C`
code above.

On success the sum of the arguments is printed out:
      
      ``` Lua
      cfunc_sum(1, 2)
       | ---
       | - 3
      ```
      
Functions may return multiple results. For example, a trivial
echo function returns the arguments passed in:
      
      ``` Lua
      cfunc_echo(1,2,3)
       | ---
       | - 1
       | - 2
       | - 3
      ```
      
      Module and function caches
      ==========================
      
Loading a module is a relatively slow procedure, because the
operating system needs to read the library, resolve its symbols,
and so on. Thus, to speed this up, when a module is loaded for the
first time we put it into an internal cache. If the module is
already sitting in the cache and a new load request comes in, we
simply reuse the previous copy. If the module has been updated on
the storage device, then on the next load attempt we detect that
the file attributes (such as device number, inode, size,
modification time) have changed and reload the module from scratch.
Note that a newly loaded module does not intersect with previously
loaded modules; they continue operating with the code previously
read from the cache.

Thus, if there is a need to update a module, all module instances
should be unloaded (together with their functions) and loaded again.

A similar caching technique is applied to functions: only the first
function allocation causes symbol resolving; subsequent ones are
simply obtained from the function cache.
• box/func: fix modules functions restore · b9f2bf4e
      Cyrill Gorcunov authored
      
In commit 96938faf ('Add hot function reload for C procedures')
the ability to hot-reload modules was introduced. When a module is
reloaded, its functions are resolved to new symbols, but if
something goes wrong, the old symbols from the old module are
supposed to be restored.

Actually, the current code restores only one function and may
crash if there is a bunch of functions to restore. Let's fix it.
      
      Fixes #5968
      
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
• applier: process synchro rows after WAL write · b259e930
      Vladislav Shpilevoy authored
Applier used to process the synchronous rows CONFIRM and ROLLBACK
right after receipt, before they were written to WAL.
      
That led to a bug: the confirmed data became visible and might be
accessed by user requests; then the node restarted before CONFIRM
finished its WAL write, and the data was not visible again. That
is just as if it had been rolled back, which is not acceptable.
      
      Another case - CONFIRM WAL write could simply fail due to any
      reason (no disk space, OOM), but the transactions would remain
      confirmed anyway.
      
      Also that produced some hacks in the limbo's code to support the
      confirmation and rollback of transactions not yet written to WAL.
      
      The patch makes the synchro rows processed only after they are
      written to WAL. Although the 'rollback' case above might still
      happen if the xlogs were in the kernel caches, and the machine was
      powered off before they were flushed to disk. But that is not
      related to qsync specifically.
      
      To handle the synchro rows after WAL write the patch makes them go
      to WAL in a blocking way (journal_write() instead of
      journal_write_try_async()). Otherwise it could happen that a
      CONFIRM/ROLLBACK is being written to WAL and would clear the limbo
      afterwards, but a new transaction arrives with a different owner,
      and it conflicts with the current limbo owner.
      
      Closes #5213
• update: allow update absent nullable fields · 2bb373b9
      Mary Feofanova authored
      Update operations could not insert with gaps. This patch changes
      the behavior so that the update operation fills the missing fields
      with nulls.
      Part of #3378
      
      @TarantoolBot document
      Title: Allow update absent nullable fields
Update operations could not insert with gaps. The behavior is
changed so that the update operation fills the missing fields with
nulls. For example, we create a space `s = box.schema.create_space('s')`,
then create an index for it `pk = s:create_index('pk')`, and
then insert a tuple into the space: `s:insert{1, 2}`. After all of
this we try to update this tuple: `s:update({1}, {{'!', 5, 6}})`.
In the previous version this operation failed with the
ER_NO_SUCH_FIELD_NO error; now it finishes successfully, and the
tuple [1, 2, null, null, 6] is in the space, as shown below.
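A minimal console sketch of the example above (a hypothetical fresh instance; the resulting tuple is taken from the description):

``` Lua
s = box.schema.create_space('s')
pk = s:create_index('pk')
s:insert{1, 2}
-- Insert into the absent field no. 5: the gap is filled with nulls.
s:update({1}, {{'!', 5, 6}})
-- The tuple is now [1, 2, null, null, 6].
```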
  2. Apr 13, 2021
• iproto: implement ability to run multiple iproto threads · 2ede3be3
      mechanik20051988 authored
There are users with specific workloads where the iproto thread
is the throughput bottleneck: the iproto thread's core is 100% loaded
while the TX thread's core is not. For such cases it would be nice to
have the capability to create several iproto threads.
      
      Closes #5645
      
      @TarantoolBot document
      Title: implement ability to run multiple iproto threads
Implement the ability to run multiple iproto threads, which is useful
for specific workloads where the iproto thread is the throughput
bottleneck. To specify the count of iproto threads, use the
`iproto_threads` option in box.cfg. For example, to start 8 iproto
threads, enter `box.cfg{iproto_threads=8}`. The default iproto
thread count is 1. This option is not dynamic, so it can't be
changed after the first setting until a server restart. Distribution of
connections between threads is managed by the OS kernel.
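A minimal configuration sketch (the thread count is an example value):

``` Lua
-- Static option: takes effect on start and can't be changed
-- without a server restart.
box.cfg{iproto_threads = 8}
```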
• build: fix configuring using cmake3 command · 820d2be6
      Alexander Turenko authored
The `cmake` command was hardcoded for configuring libcurl; however, only
`cmake3` may be installed on a system. Now we use the same cmake command
for configuring libcurl as the one used for configuring tarantool
itself.
      
      The problem exists since 2.6.0-196-g2b0760192 ('build: enable cmake in
      curl build').
      
      Fixes #5955
  3. Apr 12, 2021
• qsync: provide box.info.synchro interface for monitoring · bce3b581
      Cyrill Gorcunov authored
      
In commit 14fa5fd8 ('cfg: support symbolic evaluation of
replication_synchro_quorum') we implemented support for
symbolic evaluation of the `replication_synchro_quorum` parameter,
but there is no easy way to obtain its current run-time value,
i.e. the evaluated numeric value.

Moreover, we would like to fetch the queue length of the transaction
limbo for tests and to extend these statistics in the future. Thus,
let's add them.
      
      Closes #5191
      
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      
      @TarantoolBot document
      Title: Provide `box.info.synchro` interface
      
The `box.info.synchro` leaf provides information about the details of
synchronous replication.

In particular, `quorum` represents the current value of the
synchronous replication quorum defined by the
`replication_synchro_quorum` configuration parameter; since it can be
set as a dynamic formula such as `N/2+1`, the value depends on the
current number of replicas.

Since synchronous replication does not commit data immediately
but waits for its propagation to replicas, the data sits in a queue
gathering `commit` responses from remote nodes. The current number of
entries waiting in the queue is shown via the `queue.len` member.
      
A typical output is the following:
      
      ``` Lua
      tarantool> box.info.synchro
      ---
      - queue:
          len: 0
        quorum: 1
      ...
      ```
      
The `len` member shows the current number of entries in the queue,
and the `quorum` member shows the evaluated value of the
`replication_synchro_quorum` parameter.
  4. Apr 05, 2021
• recovery: make it transactional · 9311113d
      Vladislav Shpilevoy authored
Recovery used to be performed row by row. It was fine because
all the persisted rows are supposed to be committed and should not
meet any problems during recovery, so a transaction could safely be
applied partially.

But that stopped being true after the introduction of synchronous
replication. Synchronous transactions might be in the log, but
can be followed by a ROLLBACK record which is supposed to delete
them.

During row-by-row recovery, firstly, each synchro row turned
into a sync transaction, which is probably fine. But rows on
non-sync spaces which were part of a sync transaction could be
applied right away, bypassing the limbo, leading to all kinds of
sweet errors like duplicate keys, or inconsistency of a partially
applied transaction.
      
      The patch makes the recovery transactional. Either an entire
      transaction is recovered, or it is rolled back which normally
      happens only for synchro transactions followed by ROLLBACK.
      
      In force recovery of a broken log the consistency is not
      guaranteed though.
      
      Closes #5874
• replication: do not ignore replica vclock on register · f42fee5a
      Serge Petrenko authored
There was a bug in box_process_register: it decoded the replica's
vclock but never used it when sending the registration stream. So the
replica might lose the data in the range (replica_vclock, start_vclock).
      
      Follow-up #5566
• replication: tolerate synchro rollback during final join · 3ec0e87f
      Serge Petrenko authored
Both box_process_register and box_process_join had guards ensuring that
not a single rollback occurred for transactions residing in WAL around
the replica's _cluster registration.
Both functions would error on a rollback and make the replica retry
the final join.

The reason for that was that the replica couldn't process synchronous
transactions correctly during the final join, because it applied the
final join stream row by row.

This path with retrying the final join was a dead end, because even if
the master manages to receive no ROLLBACK messages around the N-th retry
of box.space._cluster:insert{}, the replica would still have to receive
and process all the data dating back to its first _cluster registration
attempt.
In other words, the guard against sending synchronous rows to the
replica didn't work.

Let's remove the guard altogether, since now the replica is capable of
processing synchronous txs in the final join stream and even retrying
the final join in case the _cluster registration was rolled back.
      
      Closes #5566
• applier: fix not releasing the latch on apply_synchro_row() fail · 9ad1bd15
      Serge Petrenko authored
Once apply_synchro_row() failed, applier_apply_tx() would simply raise
an error without unlocking the replica latch. This led to all the
appliers hanging indefinitely on trying to lock the latch for this
replica.
      
      In scope of #5566
• swim: check types in __serialize methods · 1d121c12
      Vladislav Shpilevoy authored
In the swim Lua code none of the __serialize methods checked the
argument type, assuming that nobody would call them directly and
mess with the types. But it happened, and it is not hard to fix, so
the patch does it.
      
      The serialization functions are sanitized for the swim object,
      swim member, and member event.
      
      Closes #5952
• swim: fix crash on bad member_by_uuid() call · fe33a108
      Vladislav Shpilevoy authored
In Lua, the swim object's method member_by_uuid() could crash if called
with no arguments. The UUID was then passed as NULL and dereferenced.

The patch makes member_by_uuid() treat NULL like a nil UUID and
return NULL (member not found). The reason is that
swim_member_by_uuid() can't fail. It can only return a member or
not. It never sets a diag error.
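A short sketch of the fixed behaviour, assuming a configured swim instance `s`:

``` Lua
-- No arguments: the UUID is treated as a nil UUID, so the lookup
-- simply finds no member instead of crashing.
s:member_by_uuid()    -- nil
s:member_by_uuid(nil) -- nil
```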
      
      Closes #5951
• lua: fix tuple leak in <key_def>.compare_with_key · db766c52
      Alexander Turenko authored
The key difference between lbox_encode_tuple_on_gc() and
luaT_tuple_encode() is that the latter never raises a Lua error, but
passes an error using the diagnostics area.

Aside from the tuple leak, the patch fixes the fiber region's memory
'leak' (till fiber_gc()). Before the patch, the memory that is used for
serialization of the key was not freed (region_truncate()) when the
serialization failed. It is verified in the gh-5388-<...> test.

While I'm here, I added a test case that just verifies correct behaviour
in case of a key serialization failure (added into key_def.test.lua).
The case does not verify whether a tuple leaks, and it is successful
before this patch as well as after it. I don't see a simple way to
check the tuple leak within a test. Verified manually using the
reproducer from the linked issue.
      
      Fixes #5388
  5. Apr 02, 2021
• vinyl: skip vylog if it's newer than snap · 149ccce9
      Nikita Pettik authored
With data in different engines, the checkpoint process is handled this way:
       - wait_checkpoint memtx
       - wait_checkpoint vinyl
       - commit_checkpoint memtx
       - commit_checkpoint vinyl
      
In contrast to commit_checkpoint, which does not tolerate failures (if
something goes wrong, e.g. renaming of the snapshot file, the instance
simply crashes), wait_checkpoint may fail. As a part of wait_checkpoint
for the vinyl engine, vy_log rotation takes place: the old vy_log is
closed and a new one is created. At this moment, wait_checkpoint of the
memtx engine has already created a new *inprogress* snapshot featuring
a bumped vclock. While recovering from this configuration, the vclock
of the latest snapshot is used as a reference.

At the initial recovery stage (vinyl_engine_begin_initial_recovery),
we check that the snapshot's vclock matches the vylog's one (they should
be the same, since normally the vylog is rotated along with the
snapshot). On the other hand, in the directory we have the old snapshot
and the new vylog (and the new .inprogress snapshot). In such a
situation, recovery (even in force mode) was aborted. The only way out
of this dead end was for the user to manually delete the last vy_log
file.

Let's proceed with the same resolution when the user runs in
force_recovery mode: delete the last vy_log file and update the vclock
value. If the user uses casual recovery, let's print a verbose message
on how to fix this situation manually.
      
      Closes #5823
• sql: ignore \0 in string passed to Lua-function · 22e2e4ea
      Mergen Imeev authored
Prior to this patch, a string passed to a user-defined Lua function
from SQL was cropped if it contained '\0'. At the same time, it wasn't
cropped if it was passed to the function from BOX. After this patch,
the string won't be cropped when passed from SQL if it contains '\0'.
      
      Closes #5938
  6. Mar 31, 2021
• gc/xlog: delay xlog cleanup until relays are subscribed · 2fd51aea
      Cyrill Gorcunov authored
      
If a replica managed to fall far behind the master node
(so that a number of xlog files are present after the
master's last snapshot), then once the master node is restarted it
may clean up the xlogs needed by the replica to subscribe
in a fast way, and instead the replica will have to rejoin,
reading a large amount of data back.

Let's try to address this by delaying xlog file cleanup
until replicas are subscribed and relays are up
and running. For this sake, we start with the cleanup fiber
spinning in a nop cycle ("paused" mode) and use a delay
counter that relays decrement once they are subscribed.

This implies that if the `_cluster` system space is not empty
upon restart and a registered replica somehow vanished
completely and won't ever come back, then the node
administrator has to drop this replica from `_cluster`
manually.

Note that this delayed cleanup start doesn't prevent
the WAL engine from removing old files if there is no
space left on the storage device. The WAL will simply
drop old data without asking.

We need to take into account that some administrators
might not need this functionality at all; for this
sake we introduce the "wal_cleanup_delay" configuration
option, which allows enabling or disabling the delay.
      
      Closes #5806
      
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      
      @TarantoolBot document
      Title: Add wal_cleanup_delay configuration parameter
      
The `wal_cleanup_delay` option defines a delay, in seconds,
before write-ahead log files (`*.xlog`) start being pruned
upon a node restart.

This option is ignored if a node is running as
an anonymous replica (`replication_anon = true`). Similarly,
if replication is unused or there are no plans to use
replication at all, then this option should not be considered.
      
The initial problem to solve is the case where a node is operating
so fast that its replicas do not manage to reach the node's state,
and if the node is restarted at this moment (for various
reasons, for example due to a power outage), then `*.xlog` files might
be pruned during the restart. As a result, replicas will not find these
files on the main node and will have to reread all the data back, which
is a very expensive procedure.

Since replicas are tracked via the `_cluster` system space, we use
its content to count subscribed replicas, and when all of them are
up and running, the cleanup procedure is enabled automatically even
if `wal_cleanup_delay` has not expired.

The `wal_cleanup_delay` should be set to:

 - `0` to disable the cleanup delay;
 - `> 0` to wait for the specified number of seconds.

By default it is set to `14400` seconds (i.e. `4` hours).
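A minimal configuration sketch (the one-hour value is an example, not the default):

``` Lua
-- Delay *.xlog pruning for an hour after restart, or until all
-- registered replicas are subscribed, whichever happens first.
box.cfg{wal_cleanup_delay = 3600}

-- Disable the delay entirely.
box.cfg{wal_cleanup_delay = 0}
```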
      
If a registered replica is lost forever and the timeout is set to
infinity, then the preferred way to enable the cleanup procedure is not
to set a small timeout value but rather to delete this replica from the
`_cluster` space manually.

Note that the option does *not* prevent the WAL engine from removing
old `*.xlog` files if there is no space left on a storage device;
the WAL engine can remove them in a forced way.

The current state of the `*.xlog` garbage collector can be found in
the `box.info.gc()` output. For example:
      
      ``` Lua
       tarantool> box.info.gc()
       ---
         ...
         is_paused: false
      ```
      
The `is_paused` field shows whether the cleanup fiber is paused or not.
  7. Mar 29, 2021
  8. Mar 24, 2021
• buffer: remove Lua registers · 911ca60e
      Vladislav Shpilevoy authored
      Lua buffer module used to have a couple of preallocated objects of
      type 'union c_register'. It was a bunch of C scalar and array
      types intended for use instead of ffi.new() where it was needed to
      allocate a temporary object like 'int[1]' just to be able to pass
      'int *' into a C function via FFI.
      
It was a bit faster than ffi.new() even for small sizes. For
instance (when JIT works), getting a register to use it as
'int[1]' costs around 0.2-0.3 ns while ffi.new('int[1]') costs
around 0.4 ns. Also the code looked cleaner.

But Lua registers were global and therefore had the same issue as
IBUF_SHARED and static_alloc() in Lua: no ownership, and sudden
reuse when GC starts right while the register is still in use in some
Lua code. __gc handlers could wipe the register values, making the
original code behave unpredictably.

IBUF_SHARED was fixed by a proper ownership implementation, but that
is not necessary for Lua registers. It could be done with the
buffer.ffi_stash_new() feature, but its performance is about 0.8
ns, which is worse than plain ffi.new() for simple scalar types.
      
      This patch eliminates Lua registers, and uses ffi.new() instead
      everywhere.
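For illustration, a minimal sketch of the pattern that replaces the registers (the C function name is hypothetical):

``` Lua
local ffi = require('ffi')
-- Allocate a temporary 'int[1]' to pass an 'int *' out-parameter
-- into a C function via FFI, instead of using a shared register.
local len = ffi.new('int[1]')
-- some_c_func(buf, len); the result is then read from len[0].
```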
      
      Closes #5632
  9. Mar 22, 2021
  10. Mar 19, 2021
• lua: separate sched and script diag · f4e248c0
      Vladislav Shpilevoy authored
When the Lua main script was launched, the sched fiber passed its own
diag to the script's fiber. When the script finished, it put
its error into that diag. The sched fiber then checked if the diag
was empty to detect an error.
      
      But it wasn't really correct. The error could also happen right in
      the scheduler fiber in a libev callback. For example, in one of
      ev_io callbacks in SWIM. Then the process would end with an error
      even if the script was finished successfully.
      
      These errors were not related to the main fiber executing the
      script.
      
The patch makes it so the scheduler fiber's diag is no longer used as
an indication of an error in the script. Instead, a new diag is
created on the stack of the scheduler's fiber, where the Lua
script saves its error.
      
      Closes #5864
• wal: introduce limits on simultaneous writes · de93b448
      Serge Petrenko authored
Since the introduction of asynchronous commit, which doesn't wait for a
WAL write to succeed, it's quite easy to clog WAL with huge amounts of
write requests. For now, it's only possible from an applier, since it's
the only user of async commit at the moment.

This happens when a replica is syncing with the master and reads new
transactions at a pace higher than it can write them to WAL (see the
docbot request for a detailed explanation).

To ameliorate such behavior, we need to introduce some limit on
not-yet-finished WAL write requests. This is what this commit is trying
to do.
A new counter is added to the wal writer: queue_size (in bytes), together
with a corresponding configuration setting: `wal_queue_max_size`.
The counter is increased on every new submitted request and decreased
once the tx thread receives a confirmation that a specific request was
written.

Actually, the limit is added to an abstract journal queue, but it
currently works only for the wal writer, since it's the only possible
journal when the applier is working.

Once the size reaches its maximum value, the applier is blocked until
some of the write requests are finished.

The size limit isn't strict, i.e. if there's at least one free byte, the
whole write request fits and no blocking is involved.

The feature is ready for `box.commit{is_async=true}`. Once that is
implemented, it should check whether the queue is full and let the user
decide what to do next: either wait or roll the tx back.
      
      Closes #5536
      
      @TarantoolBot document
      Title: new configuration option: 'wal_queue_max_size'
      
`wal_queue_max_size` puts a limit on the amount of concurrent write
requests submitted to WAL.
It is measured in the number of bytes to be written (0 means unlimited,
which was the default behaviour before).
The option only affects replica behaviour at the moment and defaults
to 16 megabytes. It limits the pace at which a replica reads new
transactions from the master.
      
      Here's when the option comes in handy:
      
Before this option was introduced, the following situation was possible:
there are 2 servers, a master and a replica, and the replica is down for
some period of time. While the replica is down, the master serves
requests at a reasonable pace, possibly close to its WAL throughput
limit. Once the replica reconnects, it has to receive all the data the
master has piled up, and there's no limit on the speed at which the
master sends the data to the replica; and, without the option, there was
no limit on the speed at which the replica submitted the corresponding
write requests to WAL.

This led to a situation when the replica's WAL was never in time to
serve the requests, and the amount of pending requests was constantly
growing. There was no limit on the memory WAL write requests take, and
this clogging of the WAL write queue could even lead to the replica
using up all the available memory.

Now, when `wal_queue_max_size` is set, appliers will stop reading new
transactions once the limit is reached. This will let WAL process all the
requests that have piled up and free all the excess memory.
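A minimal configuration sketch (16 megabytes is the default mentioned above):

``` Lua
-- Cap pending WAL write requests at 16 MB; 0 removes the limit.
box.cfg{wal_queue_max_size = 16 * 1024 * 1024}
```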
• Implement on_shutdown API · 3010f024
      mechanik20051988 authored
Implemented the on_shutdown API, which allows registering functions
that will be called when tarantool stops. The functions will
be called in the reverse order of their registration, so a module
developer registers one function that starts the module's termination
and waits for its completion. This function should be fast, or use an
asynchronous waiting mechanism (coio_wait or cord_cojoin, for example).
      
      Closes #5723
      
      @TarantoolBot document
      Title: Implement on_shutdown API
Implemented the on_shutdown API, which allows registering functions
that will be called when tarantool stops. The functions will
be called in the reverse order of their registration, so a module
developer registers one function that starts the module's termination
and waits for its completion. This function should be fast, or use an
asynchronous waiting mechanism (coio_wait or cord_cojoin, for example).
See the sketch below.
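A minimal sketch, assuming `box.ctl.on_shutdown` as the Lua-level registration entry point:

``` Lua
-- Register a trigger that runs when tarantool stops; triggers
-- registered later run earlier (reverse order of registration).
box.ctl.on_shutdown(function()
    -- Start module termination and wait for its completion here.
    print('module shutdown')
end)
```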
• lua: change on_shutdown triggers behaviour · 357f1551
      mechanik20051988 authored
Previously, Lua on_shutdown triggers were started sequentially; now
each trigger starts in a separate fiber. Tarantool waits 3.0
seconds for their completion by default. The user has the option to
change this value using the newly implemented
box.ctl.set_on_shutdown_timeout function.
If the timeout expires, tarantool stops immediately without waiting
for the remaining triggers to complete.

Also, ev_break is moved from the trigger to the on_shutdown_f function,
after calling all on_shutdown Lua triggers, because now all triggers are
started asynchronously in fibers, and we should call ev_break only
after all triggers are finished.
      
      Part of #5723
      
      @TarantoolBot document
      Title: Changed Lua on_shutdown triggers behaviour.
Previously, Lua on_shutdown triggers were started sequentially; now
each trigger starts in a separate fiber. Tarantool waits 3.0
seconds for their completion by default. The user has the option to
change this value using the newly implemented
box.ctl.set_on_shutdown_timeout function, as shown below.
If the timeout expires, tarantool stops immediately without waiting
for the remaining triggers to complete.
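A minimal sketch (the 10-second value is an example):

``` Lua
-- Give shutdown triggers up to 10 seconds instead of the default
-- 3.0 before tarantool stops without waiting for them.
box.ctl.set_on_shutdown_timeout(10)
```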
• box: rename granularity option in box.cfg{} to slab_alloc_granularity · 501da2bf
      mechanik20051988 authored
Renamed the granularity option to slab_alloc_granularity, in line with
the names of the other options for the small allocator.
      
      Follow-up #5518
  11. Mar 15, 2021
  12. Mar 12, 2021
• lua: fix tarantool -e always enters interactive mode · 0787483c
      Artem Starshov authored
The reason why tarantool -e always enters interactive mode is that
the statement after the -e option isn't considered a script.

In the PUC-Rio Lua man page there are different names for an -e
statement (stat) and a script, but they behave the same regarding
interactive mode. (Cases when the interpreter loads stdin also behave
the same.)

NOTE: the test for this fix uses error injections, which should work only
in debug mode, so `release_disabled` was added to suite.ini. But there is
a bug in test-run: `release_disabled` disables tests in every build type.
This problem is partially described in tarantool/test-run#199.
      
      Fixes #5040
  13. Mar 11, 2021
• memtx: add granularity option to box.cfg{} · 53c0e910
      mechanik20051988 authored
Granularity is an option that allows the user to set the
multiplicity of memory allocation in the small allocator.
Granularity must be a power of two and >= 4. By default the
granularity value == sizeof(intptr_t), as it was before,
when this option was not provided.
      
      @TarantoolBot document
      Title: Add 'granularity' option to box.cfg{}
Add the granularity option, which allows the user to set the
multiplicity of memory allocation in the small allocator. Granularity
determines not only the alignment of objects, but also the size of the
objects in the pool. Thus, the greater the granularity, the greater the
memory loss per memory allocation, but tuples with different
sizes are allocated from the same mempool, and we do not lose
memory on the slabs when tuple sizes are highly distributed.
This is somewhat similar to a large alloc factor. The smaller the
granularity, the less memory is lost per allocation; if the user has
many small tuples of approximately the same size, it will be nice
to set granularity == 4 to save memory (see the sketch below).

This option must be set once during start; the default value
== sizeof(intptr_t) (8 on 64-bit platforms), as it was before, when
this option was not provided. Granularity must be a power of two
and >= 4. Together with slab_alloc_factor, this option gives you
full control over the behavior of the small allocator.
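A minimal sketch (note that the commit above renames this option to `slab_alloc_granularity`):

``` Lua
-- Allocate small objects with 4-byte granularity to save memory
-- when there are many small tuples of roughly the same size.
box.cfg{granularity = 4}
```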
      
      Closes #5518
  14. Mar 04, 2021
  15. Mar 02, 2021
  16. Feb 28, 2021
• build: adjust LuaJIT build system · 07c83aab
      Igor Munkin authored
      LuaJIT submodule is bumped to introduce the following changes:
      * test: run luacheck static analysis via CMake
      * test: fix warnings found with luacheck in misclib*
      * test: run LuaJIT tests via CMake
      * build: replace GNU Make with CMake
      * build: preserve the original build system
      
Since the LuaJIT build system is ported to CMake in the scope of the
changesets mentioned above, the module building the LuaJIT bundled in
Tarantool is completely reworked. There is no option to build Tarantool
against another prebuilt LuaJIT due to a91962c0
('Until Bug#962848 is fixed, don't try to compile with external
LuaJIT'), so all redundant options defining the libluajit to be used in
Tarantool are dropped, along with the related auxiliary files.
      
      To run LuaJIT related tests or static analysis for Lua files within
      LuaJIT repository, <LuaJIT-test> and <LuaJIT-luacheck> targets are used
      respectively as a dependency of the corresponding Tarantool targets.
      
      As an additional dependency to run LuaJIT tests, prove[1] utility is
      required, so the necessary binary packages are added to the lists with
      build requirements.
      
[1]: https://metacpan.org/pod/TAP::Harness#prove

Closes #4862
      Closes #5470
      Closes #5631
      
Reviewed-by: Sergey Kaplun <skaplun@tarantool.org>
Reviewed-by: Timur Safin <tsafin@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>
  17. Feb 24, 2021
  18. Feb 15, 2021
  19. Feb 12, 2021
• memtx: fix test for gh5304 issue and memtx_space_is_recovering function · 8ac47898
      mechanik20051988 authored
In the previous version of the patch we compared the memtx state with
MEMTX_FINAL_RECOVERY to check that memtx recovery had completed.
This is not quite right: memtx_state == MEMTX_FINAL_RECOVERY
means that recovery from the snapshot is finished, but recovery
from the WALs is not. We need to compare memtx_state with MEMTX_OK
to check that recovery has totally finished.

In the previous test version the on_replace trigger (created on the
_user space) was never called. That's because is_recovery_finished()
always returned false: on_schema_init is invoked BEFORE the
user's data recovery process (so the trigger is not created at all
at that moment).

In the new test version you can see a correct user case:
we create an on_replace trigger on the _index system space,
which replaces/inserts/updates tuples in the temp and loc spaces.
So each time the user creates a new space and an index for it,
the trigger replaces/inserts/updates tuples in the temp and loc spaces.
Because the trigger replaces/inserts/updates a tuple with the same
primary key, we get an error when the insert trigger is called.
      
      Follow-up #5304
• Add changelog for #5764 (de6c76b6) · df99d492
      Nikita Pettik authored