  1. Apr 14, 2021
    • update: allow update(delete) absent nullable fields · 64514113
      mechanik20051988 authored
      The previous patch allowed the update(insert) operation for absent
      nullable fields. This patch also allows the update(delete) operation
      for absent nullable fields.
      Closes #3378
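      A minimal sketch of the new behavior (the field number and the
      no-op result are assumptions based on the message above):

      ``` Lua
      s = box.schema.create_space('s')
      pk = s:create_index('pk')
      s:insert{1, 2}
      -- Deleting an absent nullable field used to fail with
      -- ER_NO_SUCH_FIELD_NO; after this patch it is expected to succeed:
      s:update({1}, {{'#', 4, 1}})
      ```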
    • update: allow update absent nullable fields · 2bb373b9
      Mary Feofanova authored
      Update operations could not insert with gaps. This patch changes
      the behavior so that the update operation fills the missing fields
      with nulls.
      Part of #3378
      
      @TarantoolBot document
      Title: Allow update absent nullable fields
      Update operations could not insert with gaps. The behavior is
      changed so that the update operation fills the missing fields
      with nulls.
      For example, we create a space with `s = box.schema.create_space('s')`,
      then create an index for it with `pk = s:create_index('pk')`, and
      then insert a tuple: `s:insert{1, 2}`. After all of this we try to
      update the tuple: `s:update({1}, {{'!', 5, 6}})`. Previously this
      operation failed with an ER_NO_SUCH_FIELD_NO error; now it succeeds,
      and the space contains the tuple [1, 2, null, null, 6].
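      The same scenario as a runnable snippet (the console output is
      paraphrased):

      ``` Lua
      s = box.schema.create_space('s')
      pk = s:create_index('pk')
      s:insert{1, 2}
      s:update({1}, {{'!', 5, 6}})
      -- Before the patch: ER_NO_SUCH_FIELD_NO error.
      -- After the patch:  [1, 2, null, null, 6]
      ```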
    • test: change compact.test.lua bad upsert · 4b0f81ea
      Mary Feofanova authored
      Prepare this test for the upcoming #3378 fix: bad upserts will
      become valid, so we need another way to produce them.
  2. Apr 13, 2021
    • iproto: implement ability to run multiple iproto threads · 2ede3be3
      mechanik20051988 authored
      Some users have workloads where the iproto thread is the throughput
      bottleneck: the iproto thread's core is 100% loaded while the TX
      thread's core is not. For such cases it would be nice to be able to
      create several iproto threads.
      
      Closes #5645
      
      @TarantoolBot document
      Title: implement ability to run multiple iproto threads
      Implement the ability to run multiple iproto threads, which is
      useful for specific workloads where the iproto thread is the
      throughput bottleneck. To set the number of iproto threads, use the
      `iproto_threads` option in `box.cfg`. For example, to start 8 iproto
      threads, call `box.cfg{iproto_threads=8}`. The default iproto thread
      count is 1. This option is not dynamic, so it cannot be changed after
      it is first set until the server is restarted. Distribution of
      connections across the threads is managed by the OS kernel.
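      For example, to run 8 iproto threads (the option must be set on the
      first `box.cfg` call; changing it later requires a restart):

      ``` Lua
      box.cfg{iproto_threads = 8}
      ```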
    • box: change ER_TUPLE_FOUND message · d11fb306
      Iskander Sagitov authored
      The ER_TUPLE_FOUND message showed only the space and index; let's
      also show the old tuple and the new tuple.

      This commit changes the error message in the code and in the tests.
      The sql/checks and sql-tap/alter tests remain the same due to
      problems with showing their old and new tuples in the error message.
      
      Closes #5567
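      A sketch of the change from the user's point of view; the exact
      wording of the new message is an assumption:

      ``` Lua
      s = box.schema.create_space('test')
      s:create_index('pk')
      s:insert{1, 'old'}
      s:insert{1, 'new'}
      -- Before: Duplicate key exists in unique index 'pk' in space 'test'
      -- After (approximate wording): Duplicate key exists in unique index
      -- "pk" in space "test" with old tuple - [1, "old"] and new tuple -
      -- [1, "new"]
      ```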
    • box: add field name to field mismatch errors · 12b7155d
      Iskander Sagitov authored
      Add field name to field mismatch error message.
      
      Part of #4707
    • box: add info to mismatch errors · 24b90815
      Iskander Sagitov authored
      Add the actual ("got") type to the field mismatch error message.
      
      Part of #4707
    • box: fix uint32_t overflow bug · dea91629
      Iskander Sagitov authored
      Previously tuple_field_u32 and tuple_next_u32 stored a uint64_t
      value in a uint32_t field. This commit fixes that.
      
      Part of #4707
  3. Apr 12, 2021
    • feedback_daemon: count and report some events · aa97a185
      Serge Petrenko authored
      Bump `feedback_version` to 7 and introduce a new field: `feedback.events`.
      It holds a counter for every event we may choose to register later on.
      
      Currently the possible events are "create_space", "drop_space",
      "create_index", "drop_index".
      
      All the registered events and the corresponding counters are sent in
      a report in the `feedback.events` field.

      Also, the first registered event triggers sending the report right
      away, so we can follow events like "first space/index created/dropped".
      
      Closes #5750
    • feedback_daemon: rename `send_test` to `send` · 670acf0d
      Serge Petrenko authored
      feedback_daemon.send() will come in handy once we implement triggers to
      dispatch feedback after some events, for example, right on initial
      instance configuration.
      
      So, it's not a testing method anymore, hence the new name.
      
      Part of #5750
    • feedback_daemon: include server uptime in the report · c5d595bc
      Serge Petrenko authored
      We are going to send feedback right after initial `box.cfg{}` call, so
      include server uptime in the report to filter out short-living CI
      instances.
      
      Also, while we're at it, fix a typo in feedback_daemon test.
      
      Prerequisite #5750
    • qsync: provide box.info.synchro interface for monitoring · bce3b581
      Cyrill Gorcunov authored
      
      In commit 14fa5fd8 (cfg: support symbolic evaluation of
      replication_synchro_quorum) we implemented support for symbolic
      evaluation of the `replication_synchro_quorum` parameter, but there
      is no easy way to obtain its current run-time value, i.e. the
      evaluated numeric value.

      Moreover, we would like to fetch the queue length of the transaction
      limbo for tests and to extend these statistics in the future. Thus
      let's add them.
      
      Closes #5191
      
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      
      @TarantoolBot document
      Title: Provide `box.info.synchro` interface
      
      The `box.info.synchro` leaf provides details about synchronous
      replication.

      In particular, `quorum` represents the current value of the
      synchronous replication quorum defined by the
      `replication_synchro_quorum` configuration parameter. Because the
      parameter can be set as a dynamic formula such as `N/2+1`, the value
      depends on the current number of replicas.

      Since synchronous replication does not commit data immediately but
      waits for its propagation to replicas, the data sits in a queue
      gathering `commit` responses from remote nodes. The current number
      of entries waiting in the queue is shown via the `queue.len` member.
      
      A typical output is the following:
      
      ``` Lua
      tarantool> box.info.synchro
      ---
      - queue:
          len: 0
        quorum: 1
      ...
      ```
      
      The `len` member shows the current number of entries in the queue,
      and the `quorum` member shows the evaluated value of the
      `replication_synchro_quorum` parameter.
  4. Apr 07, 2021
    • test: fix and refactor force_recovery test · 75e65b6a
      mechanik20051988 authored
      The test checks the possibility of recovery with the force_recovery
      option. For this purpose the snapshot is damaged in the test, and
      the possibility of recovery is checked. Previously the snapshot size
      was too small, so sometimes the system data needed for recovery got
      corrupted in the test. Also moved repeated code into functions and
      deleted one of two identical test cases (previously we wrote
      different amounts of garbage into the middle of the snapshot, which
      is meaningless because the result is almost the same; only the
      amount of data that can be read from the snapshot differs).
      Follow-up #5422
  5. Apr 05, 2021
    • recovery: make it transactional · 9311113d
      Vladislav Shpilevoy authored
      Recovery used to be performed row by row. That was fine because all
      the persisted rows are supposed to be committed and should not meet
      any problems during recovery, so a transaction could be applied
      partially.

      But this stopped being true after the introduction of synchronous
      replication. Synchronous transactions might be in the log, but can
      be followed by a ROLLBACK record which is supposed to delete them.

      During row-by-row recovery, firstly, the synchro rows each turned
      into a sync transaction, which is probably fine. But the rows of
      non-sync spaces which were part of a sync transaction could be
      applied right away, bypassing the limbo, leading to all kinds of
      sweet errors like duplicate keys or inconsistency of a partially
      applied transaction.
      
      The patch makes the recovery transactional. Either an entire
      transaction is recovered, or it is rolled back which normally
      happens only for synchro transactions followed by ROLLBACK.
      
      In force recovery of a broken log the consistency is not
      guaranteed though.
      
      Closes #5874
    • replication: do not ignore replica vclock on register · f42fee5a
      Serge Petrenko authored
      There was a bug in box_process_register: it decoded the replica's
      vclock but never used it when sending the registration stream. So
      the replica might lose the data in the range (replica_vclock,
      start_vclock).
      
      Follow-up #5566
    • replication: tolerate synchro rollback during final join · 3ec0e87f
      Serge Petrenko authored
      Both box_process_register and box_process_join had guards ensuring
      that not a single rollback occurred for transactions residing in the
      WAL around the replica's _cluster registration.
      Both functions would error on a rollback and make the replica retry
      the final join.

      The reason for that was that the replica couldn't process
      synchronous transactions correctly during the final join, because it
      applied the final join stream row by row.
      
      This path of retrying the final join was a dead end: even if the
      master manages to receive no ROLLBACK messages around the N-th retry
      of box.space._cluster:insert{}, the replica would still have to
      receive and process all the data dating back to its first _cluster
      registration attempt.
      In other words, the guard against sending synchronous rows to the
      replica didn't work.

      Let's remove the guard altogether, since now the replica is capable
      of processing synchronous txs in the final join stream and even of
      retrying the final join in case the _cluster registration was
      rolled back.
      
      Closes #5566
    • applier: fix not releasing the latch on apply_synchro_row() fail · 9ad1bd15
      Serge Petrenko authored
      Once apply_synchro_row() failed, applier_apply_tx() would simply
      raise an error without unlocking the replica latch. This led to all
      the appliers hanging indefinitely on trying to lock the latch for
      this replica.
      
      In scope of #5566
    • swim: check types in __serialize methods · 1d121c12
      Vladislav Shpilevoy authored
      In the swim Lua code none of the __serialize methods checked the
      argument type, assuming that nobody would call them directly and
      mess with the types. But it happened, and it is not hard to fix, so
      the patch does it.
      
      The serialization functions are sanitized for the swim object,
      swim member, and member event.
      
      Closes #5952
    • swim: fix crash on bad member_by_uuid() call · fe33a108
      Vladislav Shpilevoy authored
      In Lua, the swim object's method member_by_uuid() could crash if
      called with no arguments: the UUID was then passed as NULL and
      dereferenced.
      
      The patch makes member_by_uuid() treat NULL like nil UUID and
      return NULL (member not found). The reason is that
      swim_member_by_uuid() can't fail. It can only return a member or
      not. It never sets a diag error.
      
      Closes #5951
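      A hedged reproducer, assuming a minimally configured swim instance
      (the uuid/uri values are illustrative):

      ``` Lua
      swim = require('swim')
      s = swim.new({uuid = '00000000-0000-1000-8000-000000000001',
                    uri = '127.0.0.1:0'})
      -- Used to crash by dereferencing a NULL uuid; now it just finds
      -- no member and returns nil:
      assert(s:member_by_uuid() == nil)
      ```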
    • lua: fix tuple leak in <key_def>.compare_with_key · db766c52
      Alexander Turenko authored
      The key difference between lbox_encode_tuple_on_gc() and
      luaT_tuple_encode() is that the latter never raises a Lua error, but
      passes an error using the diagnostics area.
      
      Aside from the tuple leak, the patch fixes the fiber region's memory
      'leak' (till fiber_gc()). Before the patch, the memory that is used
      for serialization of the key was not freed (region_truncate()) when
      the serialization failed. This is verified in the gh-5388-<...> test.

      While I'm here, I added a test case that just verifies correct
      behaviour in case of a key serialization failure (added into
      key_def.test.lua). The case does not verify whether a tuple leaks;
      it passes both before and after this patch. I didn't find a simple
      way to check the tuple leak within a test, so it was verified
      manually using the reproducer from the linked issue.
      
      Fixes #5388
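      A sketch of the fixed error path (the non-serializable key is
      illustrative):

      ``` Lua
      key_def = require('key_def')
      kd = key_def.new({{fieldno = 1, type = 'unsigned'}})
      t = box.tuple.new({1, 'a'})
      kd:compare_with_key(t, {1})  -- 0: the keys are equal
      -- A key that fails msgpack serialization used to leak the tuple
      -- reference (and region memory); now the error is raised cleanly:
      ok = pcall(kd.compare_with_key, kd, t, {function() end})
      assert(not ok)
      ```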
  6. Apr 02, 2021
    • vinyl: remove vylog newer than snap in casual recovery · 33254d91
      Nikita Pettik authored
      As a follow-up to the previous patch, let's also check the emptiness
      of the vylog being removed. During vylog rotation all entries are
      squashed (e.g. "delete range" annihilates "insert range"), written
      to the new vylog, and at the end of the new vylog a SNAPSHOT marker
      is placed. If the last entry in the vylog is SNAPSHOT, we can safely
      remove it without hesitation. So it is OK to remove it even during a
      casual recovery process. However, if the vylog contains rows after
      the SNAPSHOT marker, its removal may cause data loss. In that case
      we can still remove it, but only in force_recovery mode.
      
      Follow-up #5823
    • vinyl: skip vylog if it's newer than snap · 149ccce9
      Nikita Pettik authored
      When data is stored in different engines, the checkpoint process is
      handled this way:
       - wait_checkpoint memtx
       - wait_checkpoint vinyl
       - commit_checkpoint memtx
       - commit_checkpoint vinyl
      
      In contrast to commit_checkpoint, which does not tolerate failures
      (if something goes wrong, e.g. renaming of the snapshot file, the
      instance simply crashes), wait_checkpoint may fail. As a part of
      wait_checkpoint for the vinyl engine, vy_log rotation takes place:
      the old vy_log is closed and a new one is created. At this moment,
      wait_checkpoint of the memtx engine has already created a new
      *inprogress* snapshot featuring a bumped vclock. While recovering
      from this configuration, the vclock of the latest snapshot is used
      as a reference.

      At the initial recovery stage (vinyl_engine_begin_initial_recovery),
      we check that the snapshot's vclock matches the vylog's one (they
      should be the same, since normally the vylog is rotated along with
      the snapshot). On the other hand, in the directory we have the old
      snapshot and the new vylog (and the new .inprogress snapshot). In
      such a situation recovery (even in force mode) was aborted. The only
      way out of this dead end was for the user to manually delete the
      last vy_log file.

      Let's proceed with the same resolution when the user runs in
      force_recovery mode: delete the last vy_log file and update the
      vclock value. If the user runs a casual recovery, let's print a
      verbose message on how to fix the situation manually.
      
      Closes #5823
    • sql: ignore \0 in string passed to Lua-function · 22e2e4ea
      Mergen Imeev authored
      Prior to this patch, a string passed from SQL to a user-defined Lua
      function was cropped if it contained '\0'. At the same time, it
      wasn't cropped if it was passed to the function from BOX. After this
      patch the string is no longer cropped when passed from SQL even if
      it contains '\0'.
      
      Closes #5938
    • sql: ignore \0 in string passed to C-function · fa7e6f7d
      Mergen Imeev authored
      Prior to this patch, a string passed from SQL to a user-defined C
      function was cropped if it contained '\0'. At the same time, it
      wasn't cropped if it was passed to the function from BOX. Now it is
      not cropped when passed from SQL.
      
      Part of #5938
  7. Mar 31, 2021
    • test: box-tap/gc -- add test for is_paused field · 83ec719c
      Cyrill Gorcunov authored
      
      Once a simple bootstrap is complete and no replicas are used, we
      should run with gc unpaused.
      
      Part-of #5806
      
      Acked-by: Serge Petrenko <sergepetrenko@tarantool.org>
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
    • test: add a test for wal_cleanup_delay option · 5437afe2
      Cyrill Gorcunov authored
      
      Part-of #5806
      
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
    • gc/xlog: delay xlog cleanup until relays are subscribed · 2fd51aea
      Cyrill Gorcunov authored
      
      If a replica manages to fall far behind the master node (so that a
      number of xlog files are present after the master's last snapshot),
      then once the master node gets restarted it may clean up the xlogs
      needed by the replica to subscribe in a fast way, and the replica
      will instead have to rejoin, reading a large amount of data back.

      Let's try to address this by delaying xlog file cleanup until the
      replicas are subscribed and the relays are up and running. For this
      sake we start with the cleanup fiber spinning in a nop cycle
      ("paused" mode) and use a delay counter, waiting until the relays
      decrement it.

      This implies that if the `_cluster` system space is not empty upon
      restart and a registered replica somehow vanished completely and
      won't ever come back, then the node administrator has to drop this
      replica from `_cluster` manually.

      Note that this delayed cleanup start doesn't prevent the WAL engine
      from removing old files if there is no space left on the storage
      device. The WAL will simply drop old data without a question.

      We need to take into account that some administrators might not need
      this functionality at all; for this sake we introduce the
      "wal_cleanup_delay" configuration option which allows enabling or
      disabling the delay.
      
      Closes #5806
      
      Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
      
      @TarantoolBot document
      Title: Add wal_cleanup_delay configuration parameter
      
      The `wal_cleanup_delay` option defines a delay in seconds before
      write-ahead log files (`*.xlog`) start to be pruned after a node
      restart.

      This option is ignored if a node is running as an anonymous replica
      (`replication_anon = true`). Similarly, if replication is unused or
      there are no plans to use replication at all, then this option
      should not be considered.

      The initial problem to solve is the case where a node is operating
      so fast that its replicas do not manage to reach the node's state,
      and if the node is restarted at this moment (for various reasons,
      for example due to a power outage), then `*.xlog` files might be
      pruned during the restart. As a result, the replicas will not find
      these files on the main node and will have to reread all the data
      back, which is a very expensive procedure.

      Since replicas are tracked via the `_cluster` system space, we use
      its content to count the subscribed replicas, and when all of them
      are up and running the cleanup procedure is enabled automatically
      even if `wal_cleanup_delay` has not expired.
      
      The `wal_cleanup_delay` should be set to:

       - `0` to disable the cleanup delay;
       - `> 0` to wait for the specified number of seconds.

      By default it is set to `14400` seconds (i.e. `4` hours).
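      For example (the values are illustrative):

      ``` Lua
      -- Prune xlogs after one hour even if some replicas
      -- did not resubscribe:
      box.cfg{wal_cleanup_delay = 3600}

      -- Disable the delay completely:
      box.cfg{wal_cleanup_delay = 0}
      ```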
      
      If a registered replica is lost forever and the timeout is set to
      infinity, then the preferred way to enable the cleanup procedure is
      not to set a small timeout value, but rather to delete this replica
      from the `_cluster` space manually.

      Note that the option does *not* prevent the WAL engine from removing
      old `*.xlog` files if there is no space left on the storage device;
      the WAL engine can remove them forcibly.
      
      The current state of the `*.xlog` garbage collector can be found in
      the `box.info.gc()` output. For example:
      
      ``` Lua
       tarantool> box.info.gc()
       ---
         ...
         is_paused: false
      ```
      
      The `is_paused` field shows whether the cleanup fiber is paused.
  8. Mar 25, 2021
    • ssl_cert_paths_discover: delete unused headers · 23447252
      HustonMmmavr authored
      * Remove unnecessary `#include "tt_static.h"` from
        src/ssl_cert_paths_discover.c
      * Fix a typo in test/app-tap/ssl-cert-paths-discover.test.lua:
        call `os.exit` instead of `os:exit`
      
      A follow-up on #5615
  9. Mar 24, 2021
    • lib: fix memory leak in rope_insert · 51940800
      Iskander Sagitov authored
      It was found that when the rope_insert function exits with an error,
      some nodes are created but not deleted.

      This commit fixes it and adds a test.

      The test checks that in case of this error the number of allocated
      nodes and the number of freed nodes are the same.
      
      Closes #5788
    • buffer: remove Lua registers · 911ca60e
      Vladislav Shpilevoy authored
      Lua buffer module used to have a couple of preallocated objects of
      type 'union c_register'. It was a bunch of C scalar and array
      types intended for use instead of ffi.new() where it was needed to
      allocate a temporary object like 'int[1]' just to be able to pass
      'int *' into a C function via FFI.
      
      It was a bit faster than ffi.new() even for small sizes. For
      instance (when JIT works), getting a register to use it as
      'int[1]' cost around 0.2-0.3 ns while ffi.new('int[1]') costs
      around 0.4 ns. Also the code looked cleaner.
      
      But Lua registers were global and therefore had the same issue as
      IBUF_SHARED and static_alloc() in Lua - no ownership, and sudden
      reuse when GC starts while the register is still in use in some Lua
      code. __gc handlers could wipe the register values, making the
      original code behave unpredictably.
      
      IBUF_SHARED was fixed by proper ownership implementation, but it
      is not necessary with Lua registers. It could be done with the
      buffer.ffi_stash_new() feature, but its performance is about 0.8
      ns which is worse than plain ffi.new() for simple scalar types.
      
      This patch eliminates Lua registers, and uses ffi.new() instead
      everywhere.
      
      Closes #5632
    • buffer: remove static_alloc() from Lua · ae1821fe
      Vladislav Shpilevoy authored
      Static_alloc() uses a fixed-size circular BSS memory buffer. It is
      often used in C when something smaller than the static buffer needs
      to be allocated temporarily. And it was thought that it might also
      be useful in Lua when backed up by ffi.new() for large allocations.

      It was useful, and faster than ffi.new() on sizes > 128 and less
      than the static buffer size, but it wasn't correct to use it - for
      the same reason why the IBUF_SHARED global variable should not have
      been used as is: without proper ownership the buffer might be reused
      in some unexpected way.
      
      Just like with IBUF_SHARED, the static buffer could be reused
      during Lua GC in one of __gc handlers. Essentially, at any moment
      on almost any line of a Lua script.
      
      IBUF_SHARED was fixed by a proper ownership implementation, but that
      is not possible with the static buffer. There is no such thing as a
      static buffer object which could be owned, and even if there were,
      the cost of its support wouldn't be much better than for the new
      cord_ibuf API. That would make the static buffer close to pointless.
      
      This patch eliminates static_alloc() from Lua, and uses cord_ibuf
      instead almost everywhere except a couple of places where
      ffi.new() is good enough.
      
      Part of #5632
    • uri: replace static_alloc with ffi stash and ibuf · 7175b43e
      Vladislav Shpilevoy authored
      static_alloc() appears not to be safe to use in Lua, because it
      does not provide any ownership protection for the returned values.
      
      The problem appears when something is allocated, then Lua GC
      starts, and some __gc handlers might also use static_alloc(). In
      Lua and in C - both lead to the buffer being corrupted in its
      original usage place.
      
      The patch is a part of activity of getting rid of static_alloc()
      in Lua. It removes it from uri Lua module and makes it use the
      new FFI stash feature, which helps to cache frequently used and
      heavy to allocate FFI values.
      
      In one place static_alloc() was used for an actual buffer - it was
      replaced with cord_ibuf which is equally fast when preallocated.
      
      ffi.new() for a temporary struct uri is not used, because

      - It produces a new GC object;

      - ffi.new('struct uri') costs around 20ns while the FFI stash
        costs around 0.8ns. The hack with 'struct uri[1]' does not help
        because the size of struct uri is > 128 bytes;

      - Without JIT, ffi.new() costs about the same as the stash, so it
        is not better there either.
      
      The patch makes uri perf a bit better in the places where
      static_alloc() was used, because its cost was around 7ns for one
      allocation.
      7175b43e
    • uuid: drop tt_uuid_str() from Lua · acf8745e
      Vladislav Shpilevoy authored
      The function converts a struct tt_uuid * to a string. The string is
      allocated on the static buffer, which can't be used in Lua due to
      unpredictable GC behaviour. GC can start working at any moment, even
      after tt_uuid_str() has returned but before its result was passed to
      ffi.string(). Then the buffer might be overwritten.
      
      Lua uuid now uses tt_uuid_to_string(), which does the same but takes
      the buffer pointer. The buffer is stored in an ffi stash, because
      that is about 4 times faster than ffi.new('char[37]') (where 37 is
      the length of a UUID string + terminating 0): 2.4 ns vs 0.8 ns.
      
      After this patch UUID is supposed to be fully compatible with Lua
      GC handlers.
      
      Part of #5632
    • cord_buf: introduce ownership management · c20e0449
      Vladislav Shpilevoy authored
      The global ibuf used for hot Lua and Lua C code didn't have
      ownership management. As a result, it could be reused in some
      unexpected ways during Lua GC via __gc handlers, even if it was
      currently in use in some code below the stack.
      
      The patch makes cord_ibuf_take() steal the global buffer from its
      global stash, and assign to the current fiber. cord_ibuf_put()
      puts it back to the stash, and detaches from the fiber. If yield
      happens before cord_ibuf_put(), the buffer is detached
      automatically.
      
      Fiber attach/detach is done via on_yield/on_stop triggers. The
      buffer is not supposed to survive a yield, so this allows freeing/
      putting the buffer back to the stash even if the owner didn't do
      that. For instance, if a Lua exception was raised before
      cord_ibuf_put() was called.
      
      This makes cord buffer being safe to use in any yield-free code,
      even if Lua GC might be started. And in non-Lua code as well.
      
      Part of #5632
    • cord_buf: introduce cord_buf API · ade45685
      Vladislav Shpilevoy authored
      There was a global ibuf object called tarantool_lua_ibuf. It was
      used in all the places working with Lua which didn't have yields,
      and where fiber's region could be potentially slower due to not
      being able to guarantee the allocated memory is contiguous.
      
      Yields during the ibuf usage were prohibited because another fiber
      would take the same ibuf and override its previous content which
      was still used by another fiber.
      
      But it wasn't taken into account that there is Lua GC. It can be
      invoked from any Lua function in Lua C code, and almost on any
      line in the Lua scripts. During GC some deleted objects might have
      GC handlers installed as __gc metamethods. From the handler they
      could call Tarantool functions, including the ones using the
      global ibuf.
      
      Therefore the ibuf could be overridden not only at yields, but at
      almost any moment. Because with Lua GC at hand, the multitasking is
      not strictly "cooperative" anymore.
      
      It is necessary to implement ownership for the global buffer. The
      patch prepares the API for this: the buffer is moved to its own
      file, and has methods take(), put(), and drop().
      
      Take() is supposed to make the current fiber own the buffer. Put()
      makes it available again. Drop() does the same but also clears the
      buffer (frees its memory). The ownership itself is a subject for
      the next patches. Here only the API is prepared.
      
      The patch "hits" performance a little. Previously the get of
      buffer.IBUF_SHARED cost around 1 ns. Now cord_ibuf_take() +
      cord_ibuf_put() cost around 5 ns together. The next patches will
      make it worse, up to 15 ns until #5871 is done.
      
      Part of #5632
    • test: don't use IBUF_SHARED in the tests · d0f0fc47
      Vladislav Shpilevoy authored
      In the msgpack test it is used only to check that 'struct ibuf *'
      can be passed to encode() functions. But soon IBUF_SHARED will be
      deleted, and its alternative won't be yield-tolerant, so it can't be
      used in this test: there are yields between the buffer usages.
      
      In the varbinary test it is used in too complicated a way to put it
      back normally. And otherwise its usage does not make much sense -
      without put() it would be created from scratch on each non-first
      usage until a yield.

      In the module_api test it is used to check that some function works
      with 'struct ibuf *'. That can be done without IBUF_SHARED.
      Part of #5632
    • fio: don't use shared buffer in pread() · 24d86294
      Vladislav Shpilevoy authored
      fio:pread() used buffer.IBUF_SHARED, which might be reused after a
      yield. As a result, if pread() was called from 2 different fibers
      or in parallel with something else using IBUF_SHARED, it would
      turn the buffer into garbage for all parallel usages.
      
      The same problem existed for read(), and was fixed in
      c7c24f84 ("fio: Fix race condition
      in fio.read"). But apparently pread() was missed.
      
      What is worse, the original commit's test passed even without the
      fix from that commit. Because it didn't check the results of
      read()s called from 2 fibers.
      
      The patch fixes pread() and adds a test covering both read() and
      pread(). The old test from the original commit is dropped.
      
      Follow up #3187
  10. Mar 19, 2021
    • base64: fix decoder output buffer overrun (reads) · 778d34e8
      Sergey Nikiforov authored
      This was caught by the base64 test with ASAN enabled.

      It also caused data corruption - garbage instead of "extra bits" was
      saved into state->result if there was no space in the output buffer.

      The decode state was removed along with the helper functions.

      Added a test for the "zero-sized output buffer" case.
      
      Fixes: #3069
      (cherry picked from commit 7214add2c7f2a86265a5e08f2184029a19fc184d)
    • wal: introduce limits on simultaneous writes · de93b448
      Serge Petrenko authored
      Since the introduction of asynchronous commit, which doesn't wait
      for a WAL write to succeed, it's quite easy to clog the WAL with
      huge amounts of write requests. For now it's only possible from an
      applier, since it's the only user of async commit at the moment.
      
      This happens when a replica is syncing with the master and reads new
      transactions at a pace higher than it can write them to the WAL (see
      the docbot request for a detailed explanation).
      
      To ameliorate such behavior, we need to introduce some limit on
      not-yet-finished WAL write requests. This is what this commit is trying
      to do.
      A new counter is added to wal writer: queue_size (in bytes) together with a
      corresponding configuration setting: `wal_queue_max_size`.
      The counter is increased on every new submitted request, and decreased once
      the tx thread receives a confirmation that a specific request was written.
      
      Actually, the limit is added to an abstract journal queue, but
      currently works only for wal writer, since it's the only possible journal
      when applier is working.
      
      Once the size reaches its maximum value, the applier is blocked
      until some of the write requests are finished.
      
      The size limit isn't strict, i.e. if there's at least one free byte, the
      whole write request fits and no blocking is involved.
      
      The feature is ready for `box.commit{is_async=true}`. Once it's
      implemented, it should check whether the queue is full and let the user
      decide what to do next. Either wait or roll the tx back.
      
      Closes #5536
      
      @TarantoolBot document
      Title: new configuration option: 'wal_queue_max_size'
      
      `wal_queue_max_size` puts a limit on the amount of concurrent write requests
      submitted to WAL.
      `wal_queue_max_size` is measured in number of bytes to be written (0
      means unlimited, which was the default behaviour before).
      The option only affects replica behaviour at the moment, and defaults
      to 16 megabytes. The option limits the pace at which replica reads new
      transactions from master.
      
      Here's when the option comes in handy:
      
      Before this option was introduced the following situation was
      possible: there are 2 servers, a master and a replica, and the
      replica is down for some period of time. While the replica is down,
      the master serves requests at a reasonable pace, possibly close to
      its WAL throughput limit. Once the replica reconnects, it has to
      receive all the data the master has piled up. There is no limit on
      the speed at which the master sends the data to the replica and,
      without the option, there was no limit on the speed at which the
      replica submitted the corresponding write requests to the WAL.
      
      This led to a situation when the replica's WAL was never in time to
      serve the requests, and the amount of pending requests was
      constantly growing. There was no limit on the memory the WAL write
      requests take, and this clogging of the WAL write queue could even
      lead to the replica using up all the available memory.
      
      Now, when `wal_queue_max_size` is set, appliers will stop reading new
      transactions once the limit is reached. This will let WAL process all the
      requests that have piled up and free all the excess memory.
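      For example, setting the limit explicitly to its default value:

      ``` Lua
      -- 16 MB of in-flight WAL write requests; 0 would mean unlimited:
      box.cfg{wal_queue_max_size = 16 * 1024 * 1024}
      ```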
    • Implement on_shutdown API · 3010f024
      mechanik20051988 authored
      Implemented the on_shutdown API, which allows registering functions
      that will be called when Tarantool stops. The functions will be
      called in the reverse order of their registration. So the module
      developer registers one function that starts module termination and
      waits for its completion. This function should be fast, or it should
      use an asynchronous waiting mechanism (coio_wait or cord_cojoin, for
      example).
      
      Closes #5723
      
      @TarantoolBot document
      Title: Implement on_shutdown API
      Implemented the on_shutdown API, which allows registering functions
      that will be called when Tarantool stops. The functions will be
      called in the reverse order of their registration. So the module
      developer registers one function that starts module termination and
      waits for its completion. This function should be fast, or it should
      use an asynchronous waiting mechanism (coio_wait or cord_cojoin, for
      example).
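      A hedged sketch, assuming the Lua-level entry point is
      `box.ctl.on_shutdown()` (the message itself does not name it);
      `stop_my_module()` is a hypothetical module teardown function:

      ``` Lua
      box.ctl.on_shutdown(function()
          -- Triggers run in the reverse order of registration when
          -- Tarantool stops; keep this fast or wait asynchronously.
          stop_my_module()
      end)
      ```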