Commits · fa7e6f7df8745a6f40bf4cb6a8e01be3fffbc93d · core / tarantool

Apr 02, 2021

sql: ignore \0 in string passed to C-function · fa7e6f7d

Mergen Imeev authored 4 years ago

Prior to this patch, a string passed to a user-defined C function from
SQL was cropped if it contained '\0'. At the same time, it wasn't
cropped when passed to the function from BOX. Now it isn't cropped
when passed from SQL either.
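
The cropping described above is what happens whenever an argument is treated as a NUL-terminated string instead of a (pointer, size) pair. A minimal sketch of the difference, assuming hypothetical helper names (`size_as_cstring`, `copy_arg`) that are not taken from the Tarantool sources:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: number of bytes that survive when `data`
 * (`size` bytes long) is handled as a NUL-terminated C string.
 */
static size_t
size_as_cstring(const char *data, size_t size)
{
	assert(data != NULL);
	const char *nul = memchr(data, '\0', size);
	return nul != NULL ? (size_t)(nul - data) : size;
}

/*
 * Hypothetical helper: copy an argument using its explicit size,
 * so embedded '\0' bytes are preserved. Returns the bytes copied.
 */
static size_t
copy_arg(char *dst, size_t dst_size, const char *src, size_t src_size)
{
	assert(dst != NULL && src != NULL);
	size_t n = src_size < dst_size ? src_size : dst_size;
	memcpy(dst, src, n);
	return n;
}
```

For the 5-byte value "ab\0cd", `size_as_cstring()` reports 2 bytes while `copy_arg()` keeps all 5.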

Part of #5938

fa7e6f7d

Mar 31, 2021

github-ci: switch off swap use · fd6ee6d5

Alexander V. Tikhonov authored 4 years ago

GitHub Actions provides hosts for Linux-based runners in the following
configuration:

2 cores
7 GB memory
4 GB swap memory

To avoid issues with hanging or slow tests under high memory use,
like [1], the host configuration must avoid using swap memory. All
of the test workflows run inside Docker containers. This patch
sets memory limits in the docker run configuration based on the current
GitHub Actions hosts: 7 GB of memory with no additional swap.

Checked 10 full runs (29 workflows in each run used the change) and
got a single failed test, on the gevent() routine in test-run. This result
is much better than without this patch, when 3-4 workflows fail on each
full run.

This could happen because swappiness is set to the default value:

cat /sys/fs/cgroup/memory/memory.swappiness
60

From documentation on swappiness [2]:

This control is used to define the rough relative IO cost of swapping
and filesystem paging, as a value between 0 and 200. At 100, the VM
assumes equal IO cost and will thus apply memory pressure to the page
cache and swap-backed pages equally; lower values signify more
expensive swap IO, higher values indicates cheaper.
Keep in mind that filesystem IO patterns under memory pressure tend to
be more efficient than swap's random IO. An optimal value will require
experimentation and will also be workload-dependent.

We may try to tune how often anonymous pages are swapped using the
swappiness parameter, but our goal is to stabilize timings (and make
them as predictable as possible), so the best option is to disable swap
altogether and work on decreasing memory consumption for huge tests.

For GitHub Actions hosts with 7 GB of RAM this means that swap came
into use once 2.8 GB of RAM was used. But in testing we have some tests
that use 2.5 GB of RAM, like 'box/net_msg_max.test.lua', and memory
fragmentation could cause swap use after such a test runs [3].

We also found that the disk cache could use some RAM, which was another
cause of fast memory use and the start of swapping. It can be periodically
dropped from memory [4] using the 'drop_caches' system setting, but that
won't fix the overall issue with swap use.

Once cached pages are freed from RAM, another kernel option can be
tuned [5][6]: 'vfs_cache_pressure'. This percentage value controls the
tendency of the kernel to reclaim the memory which is used for caching
of directory and inode objects. Increasing it significantly beyond the
default value of 100 may have a negative performance impact. Reclaim code
needs to take various locks to find freeable directory and inode
objects. With 'vfs_cache_pressure=1000', it will look for ten times more
freeable objects than there are. This patch doesn't make this change, but
it can be done as a follow-up.

To fix the issue the following changes were made:

- For jobs that run tests and use actions/environment and don't use
  the GitHub Actions container tag, the 'sudo swapoff -a' command
  was added to the actions/environment action.

- For jobs that run tests and use the GitHub Actions container tag the
  previous solution doesn't work. It was decided to hard-code the
  memory value based on the memory size found on GitHub Actions
  hosts, 7 GB. It was set for the GitHub container tag as additional
  options:
    options: '--init --memory=7G --memory-swap=7G'
  These changes are temporary, until these container tags are
  removed as part of resolving tarantool/tarantool-qa#101, for
  workflows:
    debug_coverage
    release
    release_asan_clang11
    release_clang
    release_lto
    release_lto_clang11
    static_build
    static_build_cmake_linux

- For VMware VMs, such as the FreeBSD ones, the 'sudo swapoff -a'
  command was added before the build commands.

- For OSX on GitHub Actions hosts swapping is already disabled:
    sysctl vm.swapusage
    vm.swapusage: total = 0.00M used = 0.00M free = 0.00M (encrypted)
  Also, switching swap off manually is currently not possible, because
  System Integrity Protection (SIP) must be disabled for that [7], and
  we don't have such access on GitHub Actions hosts. On local hosts
  it must be done manually with [8]:
    sudo nvram boot-args="vm_compressor=2"
  A swap status check was added to be sure that the host is correctly
  configured:
    sysctl vm.swapusage
Closes tarantool/tarantool-qa#99

[1]: https://github.com/tarantool/tarantool-qa/issues/93
[2]: https://github.com/torvalds/linux/blob/1e43c377a79f9189fea8f2711b399d4e8b4e609b/Documentation/admin-guide/sysctl/vm.rst#swappiness
[3]: https://unix.stackexchange.com/questions/2658/why-use-swap-when-there-is-more-than-enough-free-space-in-ram
[4]: https://kubuntu.ru/node/13082
[5]: https://www.kernel.org/doc/Documentation/sysctl/vm.txt
[6]: http://devhead.ru/read/uskorenie-raboty-linux
[7]: https://osxdaily.com/2010/10/08/mac-virtual-memory-swap/
[8]: https://gist.github.com/dan-palmer/3082266#gistcomment-3667471

fd6ee6d5

test: box-tap/gc -- add test for is_paused field · 83ec719c

Cyrill Gorcunov authored 4 years ago


Once a simple bootstrap is complete and there are no
replicas in use, we should run with gc unpaused.

Part-of #5806

Acked-by: Serge Petrenko <sergepetrenko@tarantool.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

83ec719c

test: add a test for wal_cleanup_delay option · 5437afe2

Cyrill Gorcunov authored 4 years ago

Part-of #5806

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

5437afe2

gc/xlog: delay xlog cleanup until relays are subscribed · 2fd51aea

Cyrill Gorcunov authored 4 years ago


If a replica falls far behind the master node
(so there are a number of xlog files present after the last
master's snapshot), then once the master node gets restarted it
may clean up the xlogs the replica needs to subscribe
quickly, and the replica will instead have to rejoin,
reading a large amount of data back.

Let's try to address this by delaying xlog file cleanup
until replicas are subscribed and relays are up
and running. For this sake we start with the cleanup fiber
spinning in a nop cycle ("paused" mode) and use a delay
counter to wait until relays decrement it.

This implies that if `_cluster` system space is not empty
upon restart and the registered replica somehow vanished
completely and won't ever come back, then the node
administrator has to drop this replica from `_cluster`
manually.

Note that this delayed cleanup start doesn't prevent
WAL engine from removing old files if there is no
space left on a storage device. The WAL will simply
drop old data without a question.

We need to take into account that some administrators
might not need this functionality at all; for this
sake we introduce the "wal_cleanup_delay" configuration
option, which allows enabling or disabling the delay.

Closes #5806

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

@TarantoolBot document
Title: Add wal_cleanup_delay configuration parameter

The `wal_cleanup_delay` option defines a delay in seconds
before write-ahead log files (`*.xlog`) start being
pruned upon a node restart.

This option is ignored if a node is running as
an anonymous replica (`replication_anon = true`). Similarly,
if replication is unused or there are no plans to use
replication at all, this option should not be considered.

The initial problem to solve is the case where a node is operating
so fast that its replicas do not manage to reach the node state,
and if the node is restarted at this moment (for various
reasons, for example due to a power outage), `*.xlog` files might
be pruned during restart. As a result, replicas will not find these
files on the main node and will have to reread all data back, which
is a very expensive procedure.

Since replicas are tracked via the `_cluster` system space, we use
its content to count subscribed replicas, and when all of them are
up and running the cleanup procedure is automatically enabled even
if `wal_cleanup_delay` has not expired.

The `wal_cleanup_delay` should be set to:

 - `0` to disable the cleanup delay;
 - `> 0` to wait for the specified number of seconds.

By default it is set to `14400` seconds (i.e. `4` hours).

In case a registered replica is lost forever and the timeout is set to
infinity, the preferred way to enable the cleanup procedure is not
setting a small timeout value but rather deleting this replica from the
`_cluster` space manually.

Note that the option does *not* prevent the WAL engine from removing
old `*.xlog` files if there is no space left on a storage device;
the WAL engine can remove them forcibly.

The current state of the `*.xlog` garbage collector can be found in
the `box.info.gc()` output. For example:

``` Lua
 tarantool> box.info.gc()
 ---
   ...
   is_paused: false
```

The `is_paused` field shows whether the cleanup fiber is paused.

2fd51aea

Mar 29, 2021

luajit: bump new version · cec37a6b

Igor Munkin authored 4 years ago

* tools: make memprof parser output user-friendly

Closes #5811
Part of #5657

cec37a6b

luajit: bump new version · f5a9ee1d

Igor Munkin authored 4 years ago

* memprof: report stack resizing as internal event

Closes #5842
Follows up #5442

f5a9ee1d

hotfix: change aligned_alloc to posix_memalign · 3c25c667

mechanik20051988 authored 4 years ago

Changed aligned_alloc to posix_memalign because on some
macOS systems the aligned_alloc function is not available.
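
A minimal sketch of such a replacement, assuming a hypothetical wrapper name (`aligned_alloc_compat`) that does not come from the patch itself. Note that posix_memalign() reports failure through its return value rather than errno, and requires the alignment to be a power of two and a multiple of sizeof(void *):

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper with an aligned_alloc()-like signature built on
 * posix_memalign(). Returns NULL on failure; release with free().
 */
static void *
aligned_alloc_compat(size_t alignment, size_t size)
{
	void *ptr = NULL;
	/* posix_memalign() preconditions on the alignment value. */
	assert(alignment % sizeof(void *) == 0);
	assert((alignment & (alignment - 1)) == 0);
	if (posix_memalign(&ptr, alignment, size) != 0)
		return NULL;
	return ptr;
}
```

Unlike C11 aligned_alloc(), this variant does not require `size` to be a multiple of `alignment`.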

3c25c667

Mar 26, 2021

test: update test-run (--memtx-allocator) · 6d6a153b

Alexander Turenko authored 4 years ago

Added the `--memtx-allocator <string>` test-run option, which just sets
the `MEMTX_ALLOCATOR` environment variable. The variable is available in
the testing code (including instance files) and supposed to be used to
verify the upcoming `box.cfg{memtx_allocator = <small|system>}` option.

Alternatively one can just set the `MEMTX_ALLOCATOR` environment
variable manually.

Beware: The option does not set the allocator in tarantool automatically
in some way. Nope. A test should read the variable and set the box.cfg
option.

[1]: https://github.com/tarantool/test-run/pull/281

Part of #5419

6d6a153b

Mar 25, 2021

ssl_cert_paths_discover: delete unused headers · 23447252

HustonMmmavr authored 4 years ago

* Remove unnecessary `#include "tt_static.h"` from
  src/ssl_cert_paths_discover.c
* Fix typo at test/app-tap/ssl-cert-paths-discover.test.lua
  call `os.exit` instead of `os:exit`

A follow up on #5615

23447252

rfc: describe an inter-fiber debugger · c0e6748d

Sergey Ostanevich authored 4 years ago


Resolves #5857

Reviewed-by: Igor Munkin <imun@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>

c0e6748d

Mar 24, 2021

lib: fix memory leak in rope_insert · 51940800

Iskander Sagitov authored 4 years ago

Found that in case of exiting the rope_insert function with an error
some nodes are created but not deleted.

This commit fixes it and adds the test.

The test checks that in case of this error the number of
allocated nodes and the number of freed nodes are the same.

Closes #5788

51940800

buffer: remove Lua registers · 911ca60e

Vladislav Shpilevoy authored 4 years ago

Lua buffer module used to have a couple of preallocated objects of
type 'union c_register'. It was a bunch of C scalar and array
types intended for use instead of ffi.new() where it was needed to
allocate a temporary object like 'int[1]' just to be able to pass
'int *' into a C function via FFI.

It was a bit faster than ffi.new() even for small sizes. For
instance (when JIT works), getting a register to use it as
'int[1]' cost around 0.2-0.3 ns while ffi.new('int[1]') costs
around 0.4 ns. Also the code looked cleaner.

But Lua registers were global and therefore had the same issue as
IBUF_SHARED and static_alloc() in Lua - no ownership, and sudden
reuse when GC starts right while the register is still in use in some
Lua code. __gc handlers could wipe the register values making the
original code behave unpredictably.

IBUF_SHARED was fixed by proper ownership implementation, but it
is not necessary with Lua registers. It could be done with the
buffer.ffi_stash_new() feature, but its performance is about 0.8
ns which is worse than plain ffi.new() for simple scalar types.

This patch eliminates Lua registers, and uses ffi.new() instead
everywhere.

Closes #5632

911ca60e

sio: introduce and use sio_snprintf() · fde44b56

Vladislav Shpilevoy authored 4 years ago

sio_strfaddr() can't be used in the places where a static buffer
is not acceptable: in any code which wants to push the value to
Lua, or where the address string must be long-lived.

The patch introduces sio_snprintf(), which does the same, but
saves the result into a provided buffer with a limited size.

In the Lua C code the patch saves the address string on the stack
which makes it safe against Lua GC interruptions.

Part of #5632

fde44b56

sio: increase SERVICE_NAME_MAXLEN size · 6b331f7a

Vladislav Shpilevoy authored 4 years ago

It was 32, and couldn't fit long IPv6 and Unix socket addresses.

The patch makes it 200 so now it fits any supported address
family used in the code.

Having SERVICE_NAME_MAXLEN valid is necessary to be able to save
a complete address string on the stack in the places where the
static buffer returned by sio_strfaddr() can't be used safely. For
instance, in the code working with Lua, because Lua GC might
be invoked at any moment and a __gc handler could overwrite the
static buffer.

Needed for #5632

6b331f7a

sio: rework sio_strfaddr() · 441cb814

Vladislav Shpilevoy authored 4 years ago

The function was overcomplicated, which made it harder to update
in the next patches with functional changes.

The main source of the complication was usage of both inet_ntoa()
and getnameinfo(). The latter is more universal, it can cover the
case of the former.

The patch makes it use only getnameinfo() for IP addresses
regardless of v4 or v6.
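
A minimal sketch of formatting both address families through getnameinfo() only, writing into a caller-provided buffer. The function names, the local buffer sizes, and the output format ("host:port", with brackets around IPv6 hosts) are assumptions for illustration, not the actual sio code:

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical formatter: numeric host and port of an IPv4 or IPv6
 * address. Returns the snprintf() result, or -1 on failure.
 */
static int
format_ip_addr(char *buf, size_t size, const struct sockaddr *addr,
	       socklen_t addrlen)
{
	char host[64]; /* assumed large enough for a numeric host */
	char serv[8];  /* a numeric port has at most 5 digits */
	assert(buf != NULL && addr != NULL);
	if (addr->sa_family != AF_INET && addr->sa_family != AF_INET6)
		return -1;
	/* One call covers both families; no inet_ntoa() is needed. */
	if (getnameinfo(addr, addrlen, host, sizeof(host), serv, sizeof(serv),
			NI_NUMERICHOST | NI_NUMERICSERV) != 0)
		return -1;
	if (addr->sa_family == AF_INET6)
		return snprintf(buf, size, "[%s]:%s", host, serv);
	return snprintf(buf, size, "%s:%s", host, serv);
}

/* Demo helper: fill an IPv4 socket address from text, 0 on success. */
static int
make_ipv4(struct sockaddr_in *out, const char *ip, unsigned short port)
{
	memset(out, 0, sizeof(*out));
	out->sin_family = AF_INET;
	out->sin_port = htons(port);
	return inet_pton(AF_INET, ip, &out->sin_addr) == 1 ? 0 : -1;
}
```

Because the numeric flags are set, no name resolution is performed, so the call does not depend on DNS.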

Needed for #5632

441cb814

lua: use lua_pushfstring() instead of tt_sprintf() · b3872a38

Vladislav Shpilevoy authored 4 years ago

In a few places two calls were used to push a formatted string:
tt_sprintf() + lua_pushstring(). It wasn't necessary because the Lua
API has lua_pushfstring() with a big enough subset of printf
format features.

But more importantly - it was a bug. lua_pushstring() is a GC
point. Before copying the passed string it tries to invoke Lua GC,
which might invoke a __gc handler for some cdata, where static
alloc might be used, and it can rewrite the string passed to
lua_pushstring() in the beginning of the stack.

Part of #5632

b3872a38

buffer: remove static_alloc() from Lua · ae1821fe

Vladislav Shpilevoy authored 4 years ago

static_alloc() uses a fixed-size circular BSS memory buffer. It is
often used in C when something smaller than the static buffer
needs to be allocated temporarily. And it was thought that it
might also be useful in Lua when backed up by ffi.new() for large
allocations.

It was useful, and faster than ffi.new() on sizes > 128 and less
than the static buffer size, but it wasn't correct to use it, for
the same reason why the IBUF_SHARED global variable should not have
been used as is: without proper ownership the buffer
might be reused in some unexpected way.

Just like with IBUF_SHARED, the static buffer could be reused
during Lua GC in one of __gc handlers. Essentially, at any moment
on almost any line of a Lua script.
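
The reuse problem can be seen in a toy model of a circular static allocator. This is a sketch under assumptions: the 64-byte size and the name `static_alloc_sketch` are illustrative and are not the real static_alloc() implementation:

```c
#include <assert.h>
#include <stddef.h>

enum { STATIC_SIZE = 64 };

static char static_storage[STATIC_SIZE];
static size_t static_pos;

/*
 * Hypothetical circular allocator: hands out slices of one static
 * buffer and wraps to the start when the tail is too short. Nothing
 * records who still uses an earlier slice.
 */
static char *
static_alloc_sketch(size_t size)
{
	assert(size > 0);
	if (size > STATIC_SIZE)
		return NULL;
	if (static_pos + size > STATIC_SIZE)
		static_pos = 0; /* wrap: old slices get overwritten */
	char *res = &static_storage[static_pos];
	static_pos += size;
	return res;
}
```

If the first caller takes 48 bytes and a __gc handler then asks for another 48, the second call wraps and returns the same address while the first caller is still using it.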

IBUF_SHARED was fixed by proper ownership implementation, but it
is not possible with the static buffer, because there is no such
thing as a static buffer object which can be owned, and even if
there were, the cost of its support wouldn't be much better than
for the new cord_ibuf API. That would make the static buffer close
to pointless.

This patch eliminates static_alloc() from Lua, and uses cord_ibuf
instead almost everywhere except a couple of places where
ffi.new() is good enough.

Part of #5632

ae1821fe

uri: replace static_alloc with ffi stash and ibuf · 7175b43e

Vladislav Shpilevoy authored 4 years ago

static_alloc() appears not to be safe to use in Lua, because it
does not provide any ownership protection for the returned values.

The problem appears when something is allocated, then Lua GC
starts, and some __gc handlers might also use static_alloc(). In
Lua and in C - both lead to the buffer being corrupted in its
original usage place.

The patch is a part of the effort to get rid of static_alloc()
in Lua. It removes it from the uri Lua module and makes it use the
new FFI stash feature, which helps to cache frequently used and
heavy-to-allocate FFI values.

In one place static_alloc() was used for an actual buffer - it was
replaced with cord_ibuf which is equally fast when preallocated.

ffi.new() for temporary struct uri is not used, because

- It produces a new GC object;

- ffi.new('struct uri') costs around 20ns while FFI stash
  costs around 0.8ns. The hack with 'struct uri[1]' does not help
  because size of uri is > 128 bytes;

- Without JIT ffi.new() costs about the same as the stash, not
  better as well;

The patch makes uri perf a bit better in the places where
static_alloc() was used, because its cost was around 7ns for one
allocation.

7175b43e

uuid: drop tt_uuid_str() from Lua · acf8745e

Vladislav Shpilevoy authored 4 years ago

The function converts struct tt_uuid * to a string. The string is
allocated on the static buffer, which can't be used in Lua due to
unpredictable GC behaviour. GC can start at any moment, even
after tt_uuid_str() has returned but before its result is passed to
ffi.string(). Then the buffer might be overwritten.

Lua uuid now uses tt_uuid_to_string() which does the same but
takes the buffer pointer. The buffer is stored in an ffi stash,
because it is x4 times faster than ffi.new('char[37]') (where 37
is length of a UUID string + terminating 0) (2.4 ns vs 0.8 ns).

After this patch UUID is supposed to be fully compatible with Lua
GC handlers.

Part of #5632

acf8745e

uuid: replace static_alloc with ffi stash · 5cbb6907

Vladislav Shpilevoy authored 4 years ago

static_alloc() appears not to be safe to use in Lua, because it
does not provide any ownership protection for the returned values.

The problem appears when something is allocated, then Lua GC
starts, and some __gc handlers might also use static_alloc(). In
Lua and in C - both lead to the buffer being corrupted in its
original usage place.

The patch is a part of the effort to get rid of static_alloc()
in Lua. It removes it from the uuid Lua module and makes it use the
new FFI stash feature, which helps to cache frequently used and
heavy-to-allocate FFI values.

ffi.new() is not used, because

- It produces a new GC object;

- ffi.new('struct tt_uuid') costs around 300ns while FFI stash
  costs around 0.8ns (although it is magically fixed when
  ffi.new('struct tt_uuid[1]') is used);

- Without JIT ffi.new() costs about the same as the stash, ~280ns
  for small objects like tt_uuid.

The patch makes uuid perf a bit better in the places where
static_alloc() was used, because its cost was around 7ns for one
allocation.

5cbb6907

buffer: implement ffi stash · a5549a44

Vladislav Shpilevoy authored 4 years ago

The buffer module now exposes the ffi_stash_new() function, which
returns 2 functions: take() and put().

FFI stash implements proper ownership of global heavy-to-create
objects which can only be created via FFI, such as structs,
pointers, arrays.

It should help to fix buffer's registers (buffer.reg1,
buffer.reg2, buffer.reg_array), and other global FFI objects such
as 'struct port_c' in schema.lua.

The issue is that when these objects are global, they might be
re-used right during usage in case Lua starts GC and invokes
__gc handlers. Just like it happened with IBUF_SHARED and
static_alloc().

Part of #5632

a5549a44

cord_buf: introduce ownership management · c20e0449

Vladislav Shpilevoy authored 4 years ago

The global ibuf used for hot Lua and Lua C code didn't have
ownership management. As a result, it could be reused in some
unexpected ways during Lua GC via __gc handlers, even if it was
currently in use in some code below the stack.

The patch makes cord_ibuf_take() steal the global buffer from its
global stash and assign it to the current fiber. cord_ibuf_put()
puts it back to the stash and detaches it from the fiber. If a yield
happens before cord_ibuf_put(), the buffer is detached
automatically.

Fiber attach/detach is done via on_yield/on_stop triggers. The
buffer is not supposed to survive a yield, so this allows the buffer
to be freed or put back to the stash even if the owner didn't do
that. For instance, if a Lua exception was raised before
cord_ibuf_put() was called.

This makes the cord buffer safe to use in any yield-free code,
even if Lua GC might be started. And in non-Lua code as well.
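
The take/put idea can be sketched as a one-slot stash. This is a simplified model under assumptions: `buf_take()` and `buf_put()` are hypothetical names, the buffer type is a stand-in for ibuf, and the fiber attachment with on_yield/on_stop triggers is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct buf_sketch {
	char data[256];
	size_t used;
};

/* The cached buffer that nobody owns right now, or NULL. */
static struct buf_sketch *stash;

/* Take ownership: steal the cached buffer or create a new one. */
static struct buf_sketch *
buf_take(void)
{
	struct buf_sketch *b = stash;
	if (b != NULL)
		stash = NULL; /* nobody else can get this one now */
	else
		b = calloc(1, sizeof(*b));
	if (b != NULL)
		b->used = 0;
	return b;
}

/* Give ownership back: cache the buffer, or free it if one is cached. */
static void
buf_put(struct buf_sketch *b)
{
	assert(b != NULL);
	if (stash == NULL)
		stash = b;
	else
		free(b);
}
```

A nested taker, such as a __gc handler running in the middle of the owner's code, receives a different buffer, so the owner's data stays intact.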

Part of #5632

c20e0449

cord_buf: introduce cord_buf API · ade45685

Vladislav Shpilevoy authored 4 years ago

There was a global ibuf object called tarantool_lua_ibuf. It was
used in all the places working with Lua which didn't have yields,
and where fiber's region could be potentially slower due to not
being able to guarantee the allocated memory is contiguous.

Yields during the ibuf usage were prohibited because another fiber
would take the same ibuf and override its previous content which
was still used by another fiber.

But it wasn't taken into account that there is Lua GC. It can be
invoked from any Lua function in Lua C code, and almost on any
line in the Lua scripts. During GC some deleted objects might have
GC handlers installed as __gc metamethods. From the handler they
could call Tarantool functions, including the ones using the
global ibuf.

Therefore ibuf could be overridden not only at yields, but at almost
any moment. Because with the Lua GC at hand, the multitasking
is not strictly "cooperative" anymore.

It is necessary to implement ownership for the global buffer. The
patch prepares the API for this: the buffer is moved to its own
file, and has methods take(), put(), and drop().

Take() is supposed to make the current fiber own the buffer. Put()
makes it available again. Drop() does the same but also clears the
buffer (frees its memory). The ownership itself is a subject for
the next patches. Here only the API is prepared.

The patch "hits" performance a little. Previously the get of
buffer.IBUF_SHARED cost around 1 ns. Now cord_ibuf_take() +
cord_ibuf_put() cost around 5 ns together. The next patches will
make it worse, up to 15 ns until #5871 is done.

Part of #5632

ade45685

iconv: take errno before reseting the context · 8c489744

Vladislav Shpilevoy authored 4 years ago

In Lua, iconv_convert() tried to clear the iconv context when
ffi.C.tnt_iconv() with normal arguments failed, by calling the
function again with all arguments NULL. Then it looked at errno.

But the second call could do anything with errno. For instance, it
could also fail, and change errno.

The patch saves errno into a variable before calling tnt_iconv()
a second time.
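
The pattern is the usual one for error paths that call further functions. A minimal sketch in C, assuming hypothetical stand-ins (`failing_convert`, `failing_reset`) for the two tnt_iconv() calls:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*step_fn)(void);

/*
 * Run `op`; on failure run `cleanup`, but report the errno of `op`
 * even if `cleanup` fails too and sets its own errno.
 */
static int
run_with_cleanup(step_fn op, step_fn cleanup)
{
	assert(op != NULL && cleanup != NULL);
	if (op() == 0)
		return 0;
	int saved_errno = errno; /* capture before any other call */
	(void)cleanup();         /* may overwrite errno */
	errno = saved_errno;
	return -1;
}

/* Demo stand-in for the failed conversion call. */
static int
failing_convert(void)
{
	errno = EILSEQ;
	return -1;
}

/* Demo stand-in for a context reset that also fails. */
static int
failing_reset(void)
{
	errno = ERANGE;
	return -1;
}
```

After `run_with_cleanup(failing_convert, failing_reset)` the caller still observes EILSEQ, the error of the original call.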

It still does not give a perfect protection as it was discovered
in scope of #5632, but still better.

The patch is mostly motivated by the next patches about #5632
which will add another call to the error path, and it should
better be after errno save.

Needed for #5632

8c489744

tuple: pass global ibuf explicitly where possible · fabf0d57

Vladislav Shpilevoy authored 4 years ago

Code in lua/tuple.c used global tarantool_lua_ibuf in many places
relying on it never being changed and not reused by other code
until a yield.

But it is not so. In fact, as it was discovered in #5632, GC may
start in any Lua function. Any GC handler might touch some
API also using tarantool_lua_ibuf inside.

This makes the first usage in lua/tuple.c invalid - the buffer
could be reset or reallocated or its wpos/rpos could change during
GC.

In order to fix this, first of all there should be clear points
where the buffer is taken, and where it becomes not needed
anymore.

The patch makes code in lua/tuple.c take tarantool_lua_ibuf when
it is first needed, not during usage. The same is done for
the fiber region for the API symmetry.

Part of #5632

fabf0d57

test: don't use IBUF_SHARED in the tests · d0f0fc47

Vladislav Shpilevoy authored 4 years ago

In msgpack test it is used only to check that 'struct ibuf *' can
be passed to encode() functions. But soon IBUF_SHARED will be
deleted, and its alternative won't be yield-tolerant. This means
it can't be used in this test. There are yields between the buffer
usages.

In varbinary test it is used in a too complicated way to be able
to put it back normally. And otherwise its usage does not make
much sense - without put() it is going to be created from the
scratch on non-first usage until a yield.

In the module_api test it is used to check if some function works
with 'struct ibuf *'. Can be done without IBUF_SHARED.

Part of #5632

d0f0fc47

fio: don't use shared buffer in pread() · 24d86294

Vladislav Shpilevoy authored 4 years ago

fio:pread() used buffer.IBUF_SHARED, which might be reused after a
yield. As a result, if pread() was called from 2 different fibers
or in parallel with something else using IBUF_SHARED, it would
turn the buffer into garbage for all parallel usages.

The same problem existed for read(), and was fixed in
c7c24f84 ("fio: Fix race condition in fio.read"). But apparently
pread() was missed.

What is worse, the original commit's test passed even without the
fix from that commit. Because it didn't check the results of
read()s called from 2 fibers.

The patch fixes pread() and adds a test covering both read() and
pread(). The old test from the original commit is dropped.

Follow up #3187

24d86294

Mar 22, 2021

test: update test-run (hang worker fix) · 680990a0

Alexander Turenko authored 4 years ago

This update fixes a sporadic problem with hanging test-run workers. The
reason is an incorrect garbage collector handler. See [1] for details.

This is not the last test-run problem that leads to a hanging worker: at
least one more problem is known [2].

[1]: https://github.com/tarantool/test-run/pull/275
[2]: https://github.com/tarantool/test-run/issues/276

Part of tarantool/tarantool-qa#96

680990a0

small: fix flaky test in small · 9e72c48a

mechanik20051988 authored 4 years ago

There was an error in the test: when rand() % OSCILLATION_MAX returned 0,
no memory allocation was made, so the fail_unless(obuf_capacity(&buf) > 0)
check failed. A small refactoring was also done: added slab_arena_destroy
for graceful resource release, removed the global seed value, removed an
unused value from the enum.
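
The message does not say how the zero case was handled in the test. One common way to keep a random size strictly positive is to shift the modulo result by one; the helper name below is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: map a raw random value to an allocation size
 * in the range [1, max], so a size of 0 can never be produced.
 */
static size_t
nonzero_size(unsigned raw, size_t max)
{
	assert(max > 0);
	return (size_t)raw % max + 1;
}
```

With `max` playing the role of OSCILLATION_MAX, a raw value of 0 (or any multiple of `max`) yields 1 instead of 0, so at least one byte is always requested.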

Closes #5345

9e72c48a

Add changelog entry for #5451 · a87a8ed4

Oleg Babin authored 4 years ago

This patch adds previously missing changelog entry.

Follow-up #5451

a87a8ed4

Mar 19, 2021

lua: separate sched and script diag · f4e248c0

Vladislav Shpilevoy authored 4 years ago

When Lua main script was launched, the sched fiber passed its own
diag to the script's fiber. When the script was finished, it put
its error into the diag. The sched fiber then checked if the diag
is empty to detect an error.

But it wasn't really correct. The error could also happen right in
the scheduler fiber in a libev callback. For example, in one of
ev_io callbacks in SWIM. Then the process would end with an error
even if the script was finished successfully.

These errors were not related to the main fiber executing the
script.

The patch makes the scheduler fiber's diag no longer used as
an indication of an error in the script. Instead, a new diag is
created on the stack of the scheduler's fiber, where the Lua
script saves the error.

Closes #5864

f4e248c0

swim: add SO_BROADCAST option · 1844eb6f

Vladislav Shpilevoy authored 4 years ago

Swim node couldn't talk to broadcast network interfaces because
the option SO_BROADCAST wasn't set.

It worked fine for localhost broadcast, but failed for all the
other IPs. There is no test, because the tests work for
localhost only anyway.

It still fails on Mac though in case the swim node was bound to
127.0.0.1. Then for some reason sendto() raises EADDRNOTAVAIL on an
attempt to broadcast beyond the local machine. It happens on Linux too,
but with an EINVAL error. These errors are ignored because they are not
critical.

Part of #5864

1844eb6f

base64: fix decoder output buffer overrun (reads) · 778d34e8

Sergey Nikiforov authored 4 years ago

Was caught by base64 test with enabled ASAN.

It also caused data corruption - garbage instead of "extra bits" was
saved into state->result if there was no space in output buffer.

Decode state removed along with helper functions.

Added test for "zero-sized output buffer" case.

Fixes: #3069
(cherry picked from commit 7214add2c7f2a86265a5e08f2184029a19fc184d)

778d34e8

wal: introduce limits on simultaneous writes · de93b448

Serge Petrenko authored 4 years ago

Since the introduction of asynchronous commit, which doesn't wait for a
WAL write to succeed, it's quite easy to clog WAL with huge amounts of
write requests. For now, it's only possible from an applier, since it's
the only user of async commit at the moment.

This happens when replica is syncing with master and reads new
transactions at a pace higher than it can write them to WAL (see docbot
request for detailed explanation).

To ameliorate such behavior, we need to introduce some limit on
not-yet-finished WAL write requests. This is what this commit is trying
to do.
A new counter is added to wal writer: queue_size (in bytes) together with a
corresponding configuration setting: `wal_queue_max_size`.
The counter is increased on every new submitted request, and decreased once
the tx thread receives a confirmation that a specific request was written.

Actually, the limit is added to an abstract journal queue, but
currently works only for wal writer, since it's the only possible journal
when applier is working.

Once size reaches its maximum value, applier is blocked until
some of the write requests are finished.

The size limit isn't strict, i.e. if there's at least one free byte, the
whole write request fits and no blocking is involved.
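
The non-strict limit can be sketched as follows. The struct and function names are hypothetical, and the real journal queue blocks the applier fiber rather than returning a flag; a `max_size` of 0 is treated as unlimited, matching the option description in this message:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct queue_sketch {
	int64_t size;     /* bytes submitted but not yet confirmed */
	int64_t max_size; /* limit in bytes; 0 means unlimited */
};

static bool
queue_is_full(const struct queue_sketch *q)
{
	assert(q != NULL);
	return q->max_size > 0 && q->size >= q->max_size;
}

/*
 * Try to submit a request. Returns false if the caller has to wait.
 * If at least one byte is free, the whole request is accepted even
 * when it pushes the size past the limit.
 */
static bool
queue_submit(struct queue_sketch *q, int64_t req_size)
{
	if (queue_is_full(q))
		return false;
	q->size += req_size;
	return true;
}

/* Called once the write of a request is confirmed. */
static void
queue_complete(struct queue_sketch *q, int64_t req_size)
{
	assert(q != NULL && q->size >= req_size);
	q->size -= req_size;
}
```

With a 10-byte limit, a 25-byte request on an empty queue is accepted, and the next submitter waits until enough requests complete.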

The feature is ready for `box.commit{is_async=true}`. Once it's
implemented, it should check whether the queue is full and let the user
decide what to do next. Either wait or roll the tx back.

Closes #5536

@TarantoolBot document
Title: new configuration option: 'wal_queue_max_size'

`wal_queue_max_size` puts a limit on the amount of concurrent write requests
submitted to WAL.
`wal_queue_max_size` is measured in number of bytes to be written (0
means unlimited, which was the default behaviour before).
The option only affects replica behaviour at the moment, and defaults
to 16 megabytes. The option limits the pace at which replica reads new
transactions from master.

Here's when the option comes in handy:

Before this option was introduced such a situation could be possible:
there are 2 servers, a master and a replica, and the replica is down for
some period of time. While the replica is down, master serves requests
at a reasonable pace, possibly close to its WAL throughput limit. Once the
replica reconnects, it has to receive all the data master has piled up and
there's no limit in speed at which master sends the data to replica, and,
without the option, there was no limit in speed at which replica submitted
corresponding write requests to WAL.

This led to a situation where the replica's WAL could never keep up with
the requests and the number of pending requests was constantly growing.
There was no limit for memory WAL write requests take, and this clogging
of WAL write queue could even lead to replica using up all the available
memory.

Now, when `wal_queue_max_size` is set, appliers will stop reading new
transactions once the limit is reached. This will let WAL process all the
requests that have piled up and free all the excess memory.

de93b448

Implement on_shutdown API · 3010f024

mechanik20051988 authored 4 years ago

Implemented the on_shutdown API, which allows registering functions
that will be called when tarantool is stopped. Functions will
be called in the reverse order of their registration. So the module
developer registers one function that starts module termination and
waits for its completion. This function should be fast or use an
asynchronous waiting mechanism (coio_wait or cord_cojoin for example).
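
The reverse-order rule can be sketched with a plain array of handlers. The names and signatures below are hypothetical and are not the API added by this patch; fibers and timeouts are left out:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLERS 8

typedef void (*shutdown_fn)(void *arg);

struct handler_sketch {
	shutdown_fn fn;
	void *arg;
};

static struct handler_sketch handlers[MAX_HANDLERS];
static size_t handler_count;

/* Register a handler; returns 0 on success, -1 if rejected. */
static int
register_on_shutdown(shutdown_fn fn, void *arg)
{
	if (fn == NULL || handler_count == MAX_HANDLERS)
		return -1;
	handlers[handler_count].fn = fn;
	handlers[handler_count].arg = arg;
	handler_count++;
	return 0;
}

/* Call handlers last-registered-first, removing each one. */
static void
run_on_shutdown(void)
{
	while (handler_count > 0) {
		struct handler_sketch h = handlers[--handler_count];
		h.fn(h.arg);
	}
}

/* Demo handler: append the first character of its name to a log. */
static char order_log[MAX_HANDLERS + 1];
static size_t order_len;

static void
log_name(void *arg)
{
	assert(arg != NULL && order_len < MAX_HANDLERS);
	order_log[order_len++] = *(const char *)arg;
}
```

Registering "a", "b", "c" and then running the handlers logs "cba", so a module registered later is shut down before the ones it may depend on.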

Closes #5723

@TarantoolBot document
Title: Implement on_shutdown API
Implemented the on_shutdown API, which allows registering functions
that will be called when tarantool is stopped. Functions will
be called in the reverse order of their registration. So the module
developer registers one function that starts module termination and
waits for its completion. This function should be fast or use an
asynchronous waiting mechanism (coio_wait or cord_cojoin for example).

3010f024

lua: change on_shutdown triggers behaviour · 357f1551

mechanik20051988 authored 4 years ago

Previously Lua on_shutdown triggers were started sequentially; now
each trigger starts in a separate fiber. Tarantool waits 3.0
seconds for their completion by default. The user can change
this value using the newly implemented box.ctl.set_on_shutdown_timeout
function. If the timeout expires, tarantool stops immediately, without
waiting for the other triggers to complete.
Also moved ev_break from trigger to the on_shutdown_f function, after
calling all on_shutdown lua triggers, because now all triggers are
started asynchronously in fibers, and we should call ev_break only
after all triggers are finished.

Part of #5723

@TarantoolBot document
Title: Changed Lua on_shutdown triggers behaviour.
Previously Lua on_shutdown triggers were started sequentially; now
each trigger starts in a separate fiber. Tarantool waits 3.0
seconds for their completion by default. The user can change
this value using the newly implemented box.ctl.set_on_shutdown_timeout
function. If the timeout expires, tarantool stops immediately, without
waiting for the other triggers to complete.

357f1551

Rename on_shutdown triggers list head · 3b7fe7d6

mechanik20051988 authored 4 years ago

Since the function for registering on_shutdown triggers for
tarantool modules was decided to be named box_on_shutdown,
the head of the trigger list with a similar name was renamed.

Part of #5723

3b7fe7d6

Implement trigger_fiber_run function · c1020a69

mechanik20051988 authored 4 years ago

Implemented function for starting a chain of triggers
in separate fibers, which is required for on_shutdown
API implementation.

Part of #5723

c1020a69

Implement fiber_join_timeout function · 4ac5a3cb

mechanik20051988 authored 4 years ago

Implemented the fiber_join_timeout function, which allows waiting
for the completion of a fiber for a specified period of time.
The function returns the fiber execution status to the caller, or -1
with diag set if the timeout is exceeded. Needed for the further
on_shutdown API implementation.

Part of #5723

4ac5a3cb