- Apr 05, 2021
-
-
Serge Petrenko authored
There was a bug in box_process_register: it decoded the replica's vclock but never used it when sending the registration stream. So the replica might lose data in the range (replica_vclock, start_vclock). Follow-up #5566
-
Serge Petrenko authored
Both box_process_register and box_process_join had guards ensuring that not a single rollback occurred for transactions residing in WAL around the replica's _cluster registration. Both functions would error on a rollback and make the replica retry final join. The reason was that the replica couldn't process synchronous transactions correctly during final join, because it applied the final join stream row-by-row. This path of retrying final join was a dead end: even if the master managed to receive no ROLLBACK messages around the N-th retry of box.space._cluster:insert{}, the replica would still have to receive and process all the data dating back to its first _cluster registration attempt. In other words, the guard against sending synchronous rows to the replica didn't work. Let's remove the guard altogether, since the replica is now capable of processing synchronous txs in the final join stream and even retrying final join in case the _cluster registration was rolled back. Closes #5566
-
Serge Petrenko authored
Now the applier assembles rows into transactions not only at the subscribe stage, but also during final join / register. This is necessary for correct handling of rolled back synchronous transactions in the final join stream. Part of #5566
-
Serge Petrenko authored
applier->last_row_time is updated in applier_read_tx_row, which is called at least once per subscribe loop iteration. So there's no need for a separate last_row_time update inside the loop body itself. Part of #5566
-
Serge Petrenko authored
Once apply_synchro_row() failed, applier_apply_tx() would simply raise an error without unlocking the replica latch. This led to all the appliers hanging indefinitely while trying to lock the latch for this replica. In scope of #5566
-
Serge Petrenko authored
The new routine, called apply_plain_tx(), may be used not only by applier_apply_tx(), but also by final join, once we make it transactional, and by recovery, once it's also turned transactional. Also, while we're at it, remove an excess fiber_gc() call from the applier_subscribe loop and instead make sure fiber_gc() is called on any return from applier_apply_tx(). Prerequisite #5874 Part of #5566
-
Serge Petrenko authored
Introduce a new routine, set_next_tx_row(), which checks for tx boundary violations and appends the new row to the current tx in case everything is OK. set_next_tx_row() is extracted from applier_read_tx() because it's a common part of transaction assembly for both recovery and applier. The only difference for recovery is that the routine responsible for tx assembly won't read rows: it'll be a callback run on each new row read from WAL. Prerequisite #5874 Part-of #5566
-
Serge Petrenko authored
Since the introduction of synchronous replication it became possible for final join to fail on the master side due to not being able to gather acks for some tx around _cluster registration. A replica receives an error in this case: either ER_SYNC_ROLLBACK or ER_SYNC_QUORUM_TIMEOUT. The errors lead to the applier retrying final join, but with the wrong state, APPLIER_REGISTER, which should be used only on an anonymous replica. This led to a hang in the fiber executing box.cfg, because it waited for the APPLIER_JOINED state, which was never entered. Part-of #5566
-
Vladislav Shpilevoy authored
In swim Lua code none of the __serialize methods checked the argument type, assuming that nobody would call them directly and mess with the types. But it happened, and it is not hard to fix, so the patch does it. The serialization functions are sanitized for the swim object, swim member, and member event. Closes #5952
-
Vladislav Shpilevoy authored
In Lua, the swim object's method member_by_uuid() could crash if called with no arguments: the UUID was then passed as NULL and dereferenced. The patch makes member_by_uuid() treat NULL like a nil UUID and return NULL (member not found). The reason is that swim_member_by_uuid() can't fail: it can only return a member or not. It never sets a diag error. Closes #5951
-
Alexander V. Tikhonov authored
We should remove a tag after fetching a remote repository. Ported the following commits: 0f575e01 ('gitlab-ci: fix tag removal for a branch push job') 0f564f34 ('gitlab-ci: remove tag from pushed branch commit'). Drop a tag that points to the current commit (if any) on a job triggered by pushing to a branch (as opposed to pushing a tag). Otherwise we may get two jobs for the same x.y.z-0-gxxxxxxxxx build: one run by pushing a branch and another by pushing a tag. The idea is to hide the new tag from the branch job, as if the tag were pushed strictly after all branch jobs for the same commit. Closes tarantool/tarantool-qa#103
-
Alexander Turenko authored
The key difference between lbox_encode_tuple_on_gc() and luaT_tuple_encode() is that the latter never raises a Lua error, but passes an error using the diagnostics area. Aside from the tuple leak, the patch fixes the fiber region's memory 'leak' (till fiber_gc()). Before the patch, the memory used for serialization of the key was not freed (region_truncate()) when the serialization failed. This is verified in the gh-5388-<...> test. While I'm here, added a test case that just verifies correct behaviour in case of a key serialization failure (added into key_def.test.lua). The case does not verify whether a tuple leaks, and it is successful before this patch as well as after it. I don't see a simple way to check the tuple leak within a test. Verified manually using the reproducer from the linked issue. Fixes #5388
-
Alexander V. Tikhonov authored
Got a warning message: [014] WARGING: unix socket's "/home/ubuntu/actions-runner/_work/tarantool/tarantool/test/var/014_box/gh-5422-broken_snapshot.socket-iproto" path has length 108 symbols that is longer than 107. That likely will cause failing of tests. It caused the following failure: [038] Starting instance autobootstrap_guest1... [038] Start failed: builtin/box/console.lua:865: failed to create server unix/:/home/ubuntu/actions-runner/_work/tarantool/tarantool/test/var/038_replication/autobootstrap_guest1.socket-admin: No buffer space available. To avoid this, use the vardir option in test runs to decrease path length. Closes tarantool/tarantool-qa#104
-
Alexander V. Tikhonov authored
Changed the following workflows: luacheck, debug_coverage, release*, static_build, static_build_cmake_linux. The OS on which the tests run was changed from Debian to Ubuntu. Also changed the way this OS is booted: before the change it was booted as a Docker container using a GitHub Actions tag from inside the workflows, which caused all the workflow steps to run inside it. After the change no container runs on the host anymore; the GitHub Actions host is now used with its native OS set in the 'runs' tag. It was decided to use the latest OS, ubuntu-20.04, which is already the default for the 'ubuntu-latest' tag. This change gave us the ability to: - Remove the extra container step in the workflow. - Switch off swap using the 'swapoff' command. - Use the same OS as GitHub Actions uses by default. - Set up our local hosts using the GitHub Actions image snapshot. - Enable use of actions/checkout@v2.3.4, which is better than v1. - Lightly bootstrap packages in the local .*.mk makefile: for build, libreadline-dev libunwind-dev; for tests, pip install -r test-run/requirements.txt. Closes tarantool/tarantool-qa#101
-
- Apr 02, 2021
-
-
Nikita Pettik authored
As a follow-up to the previous patch, let's also check the emptiness of the vylog being removed. During vylog rotation all entries are squashed (e.g. "delete range" annihilates "insert range") and written to the new vylog, and at the end of the new vylog a SNAPSHOT marker is placed. If the last entry in the vylog is SNAPSHOT, we can safely remove it without hesitation, so it is OK to remove it even during a casual recovery process. However, if it contains rows after the SNAPSHOT marker, removal of the vylog may cause data loss. In this case we can still remove it, but only in force_recovery mode. Follow-up #5823
-
Nikita Pettik authored
With data in different engines, the checkpoint process is handled this way: - wait_checkpoint memtx - wait_checkpoint vinyl - commit_checkpoint memtx - commit_checkpoint vinyl. In contrast to commit_checkpoint, which does not tolerate failures (if something goes wrong, e.g. renaming of the snapshot file, the instance simply crashes), wait_checkpoint may fail. As a part of wait_checkpoint for the vinyl engine, vy_log rotation takes place: the old vy_log is closed and a new one is created. At this moment, wait_checkpoint of the memtx engine has already created a new *inprogress* snapshot featuring a bumped vclock. While recovering from this configuration, the vclock of the latest snapshot is used as a reference. At the initial recovery stage (vinyl_engine_begin_initial_recovery), we check that the snapshot's vclock matches the vylog's one (they should be the same since normally the vylog is rotated along with the snapshot). On the other hand, in the directory we have the old snapshot and the new vylog (and the new .inprogress snapshot). In such a situation recovery (even in force mode) was aborted; the only way out of this dead end was for the user to manually delete the last vy_log file. Let's proceed with the same resolution when the user runs in force_recovery mode: delete the last vy_log file and update the vclock value. If the user uses casual recovery, let's print a verbose message on how to fix this situation manually. Closes #5823
-
Nikita Pettik authored
It is a conditional injection that terminates execution by calling assert(0) if the given condition is true. It is quite useful since it allows us to emulate situations when an instance is suddenly shut down, e.g. due to SIGKILL. Needed for #5823
-
Nikita Pettik authored
Needed for #5823
-
Nikita Pettik authored
Needed for #5823
-
Mergen Imeev authored
Prior to this patch, a string passed to a user-defined Lua function from SQL was cropped if it contained '\0'. At the same time, it wasn't cropped if it was passed to the function from BOX. After this patch the string won't be cropped when passed from SQL even if it contains '\0'. Closes #5938
-
Mergen Imeev authored
Prior to this patch, a string passed to a user-defined C function from SQL was cropped if it contained '\0'. At the same time, it wasn't cropped if it was passed to the function from BOX. Now it isn't cropped when passed from SQL. Part of #5938
-
- Mar 31, 2021
-
-
Alexander V. Tikhonov authored
GitHub Actions provides hosts for Linux-based runners in the following configuration: 2 cores, 7 GB memory, 4 GB swap. To avoid issues with hanging/slowing tests on high memory use like [1], host configurations must avoid swap use. All of the test workflows run inside Docker containers. This patch sets memory limits in the docker run configurations based on the current GitHub Actions hosts: 7 GB memory w/o swap memory increase. Checked 10 full runs (29 workflows in each run used the change) and got a single failed test, on the gevent() routine in test-run. This result is much better than without this patch, when 3-4 workflows failed on each full run. It could happen because swappiness is set to the default value: cat /sys/fs/cgroup/memory/memory.swappiness 60. From the documentation on swappiness [2]: This control is used to define the rough relative IO cost of swapping and filesystem paging, as a value between 0 and 200. At 100, the VM assumes equal IO cost and will thus apply memory pressure to the page cache and swap-backed pages equally; lower values signify more expensive swap IO, higher values indicates cheaper. Keep in mind that filesystem IO patterns under memory pressure tend to be more efficient than swap's random IO. An optimal value will require experimentation and will also be workload-dependent. We may try to tune how often anonymous pages are swapped using the swappiness parameter, but our goal is to stabilize timings (and make them as predictable as possible), so the best option is to disable swap entirely and work on decreasing memory consumption for huge tests. For GitHub Actions host configurations with 7 GB RAM this means that after 2.8 GB RAM was used, swap began to be used. But in testing we have some tests that use 2.5 GB of RAM, like 'box/net_msg_max.test.lua', and memory fragmentation could cause swap use after the test run [3]. Also found that the disk cache could use some RAM, which was another cause of fast memory use and the start of swapping. 
It can be periodically dropped from memory [4] using the 'drop_cache' system value setup, but that won't fix the overall issue with swap use. After freeing cached pages in RAM, another system kernel option can be tuned [5][6]: 'vfs_cache_pressure'. This percentage value controls the tendency of the kernel to reclaim the memory used for caching of directory and inode objects. Increasing it significantly beyond the default value of 100 may have a negative performance impact: reclaim code needs to take various locks to find freeable directory and inode objects, and with 'vfs_cache_pressure=1000' it will look for ten times more freeable objects than there are. This patch doesn't make that change, but it can be done as a next change. To fix the issue the following changes were made: - For jobs that run tests and use actions/environment and don't use the GitHub Actions container tag, the 'sudo swapoff -a' command was added to the actions/environment action. - For jobs that run tests and use the GitHub Actions container tag the previous solution doesn't work. It was decided to hard-code the memory value based on the 7 GB memory size found on GitHub Actions hosts. It was set for the GitHub container tag as additional options: options: '--init --memory=7G --memory-swap=7G'. These changes were made temporarily, until these container tags are removed within resolving the tarantool/tarantool-qa#101 issue, for workflows: debug_coverage release release_asan_clang11 release_clang release_lto release_lto_clang11 static_build static_build_cmake_linux. - For VMware VMs, like the one with FreeBSD, added the 'sudo swapoff -a' command before the build commands. - For OSX on GitHub Actions hosts swapping is already disabled: sysctl vm.swapusage vm.swapusage: total = 0.00M used = 0.00M free = 0.00M (encrypted). Also, manually switching off swap is currently not possible because System Integrity Protection (SIP) must be disabled [7], and we don't have such access on GitHub Actions hosts. 
For local hosts it must be done manually with [8]: sudo nvram boot-args="vm_compressor=2". Added a swap status check to be sure the host is configured correctly: sysctl vm.swapusage. Closes tarantool/tarantool-qa#99 [1]: https://github.com/tarantool/tarantool-qa/issues/93 [2]: https://github.com/torvalds/linux/blob/1e43c377a79f9189fea8f2711b399d4e8b4e609b/Documentation/admin-guide/sysctl/vm.rst#swappiness [3]: https://unix.stackexchange.com/questions/2658/why-use-swap-when-there-is-more-than-enough-free-space-in-ram [4]: https://kubuntu.ru/node/13082 [5]: https://www.kernel.org/doc/Documentation/sysctl/vm.txt [6]: http://devhead.ru/read/uskorenie-raboty-linux [7]: https://osxdaily.com/2010/10/08/mac-virtual-memory-swap/ [8]: https://gist.github.com/dan-palmer/3082266#gistcomment-3667471
-
Cyrill Gorcunov authored
Once the simple bootstrap is complete and there are no replicas in use, we should run with gc unpaused. Part-of #5806 Acked-by:
Serge Petrenko <sergepetrenko@tarantool.org> Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
Part-of #5806 Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com>
-
Cyrill Gorcunov authored
In case a replica managed to fall far behind the master node (so there are a number of xlog files present after the last master snapshot), then once the master node gets restarted it may clean up the xlogs needed by the replica to subscribe in a fast way, and instead the replica will have to rejoin, reading a lot of data back. Let's try to address this by delaying xlog file cleanup until replicas have subscribed and relays are up and running. For this sake we start with the cleanup fiber spinning in a nop cycle ("paused" mode) and use a delay counter that relays decrement. This implies that if the `_cluster` system space is not empty upon restart and a registered replica somehow vanished completely and won't ever come back, then the node administrator has to drop this replica from `_cluster` manually. Note that this delayed cleanup start doesn't prevent the WAL engine from removing old files if there is no space left on the storage device: the WAL will simply drop old data without a question. We need to take into account that some administrators might not need this functionality at all; for this sake we introduce the "wal_cleanup_delay" configuration option which allows enabling or disabling the delay. Closes #5806 Signed-off-by:
Cyrill Gorcunov <gorcunov@gmail.com> @TarantoolBot document Title: Add wal_cleanup_delay configuration parameter The `wal_cleanup_delay` option defines a delay in seconds before write ahead log files (`*.xlog`) start to be pruned after a node restart. This option is ignored if a node is running as an anonymous replica (`replication_anon = true`). Similarly, if replication is unused or there are no plans to use replication at all, then this option should not be considered. The initial problem to solve is the case where a node is operating so fast that its replicas do not manage to reach the node's state, and if the node is restarted at this moment (for various reasons, for example due to a power outage), then `*.xlog` files might be pruned during restart. As a result, replicas will not find these files on the main node and will have to reread all the data back, which is a very expensive procedure. Since replicas are tracked via the `_cluster` system space, we use its content to count subscribed replicas, and when all of them are up and running the cleanup procedure is automatically enabled even if `wal_cleanup_delay` has not expired. The `wal_cleanup_delay` should be set to: - `0` to disable the cleanup delay; - `> 0` to wait for the specified number of seconds. By default it is set to `14400` seconds (i.e. `4` hours). In case a registered replica is lost forever and the timeout is set to infinity, the preferred way to enable the cleanup procedure is not setting a small timeout value but rather deleting this replica from the `_cluster` space manually. Note that the option does *not* prevent the WAL engine from removing old `*.xlog` files if there is no space left on the storage device; the WAL engine can remove them forcibly. The current state of the `*.xlog` garbage collector can be found in `box.info.gc()` output. For example ``` Lua tarantool> box.info.gc() --- ... is_paused: false ``` The `is_paused` field shows whether the cleanup fiber is paused or not.
-
- Mar 29, 2021
-
-
Igor Munkin authored
* tools: make memprof parser output user-friendly Closes #5811 Part of #5657
-
Igor Munkin authored
* memprof: report stack resizing as internal event Closes #5842 Follows up #5442
-
mechanik20051988 authored
Changed aligned_alloc to posix_memalign because on some macOS systems the aligned_alloc function is not available.
-
- Mar 26, 2021
-
-
Alexander Turenko authored
Added the `--memtx-allocator <string>` test-run option, which just sets the `MEMTX_ALLOCATOR` environment variable. The variable is available in the testing code (including instance files) and is supposed to be used to verify the upcoming `box.cfg{memtx_allocator = <small|system>}` option. Alternatively one can just set the `MEMTX_ALLOCATOR` environment variable manually. Beware: the option does not set the allocator in tarantool automatically in some way. Nope. A test should read the variable and set the box.cfg option. [1]: https://github.com/tarantool/test-run/pull/281 Part of #5419
-
- Mar 25, 2021
-
-
HustonMmmavr authored
* Remove unnecessary `#include "tt_static.h"` from src/ssl_cert_paths_discover.c * Fix a typo in test/app-tap/ssl-cert-paths-discover.test.lua: call `os.exit` instead of `os:exit`. A follow-up on #5615
-
Sergey Ostanevich authored
Resolves #5857 Reviewed-by:
Igor Munkin <imun@tarantool.org> Signed-off-by:
Igor Munkin <imun@tarantool.org>
-
- Mar 24, 2021
-
-
Iskander Sagitov authored
It was found that when the rope_insert function exits with an error, some nodes are created but not deleted. This commit fixes that and adds a test. The test checks that in case of this error the number of allocated nodes and the number of freed nodes are the same. Closes #5788
-
Vladislav Shpilevoy authored
Lua buffer module used to have a couple of preallocated objects of type 'union c_register'. It was a bunch of C scalar and array types intended for use instead of ffi.new() where it was needed to allocate a temporary object like 'int[1]' just to be able to pass 'int *' into a C function via FFI. It was a bit faster than ffi.new() even for small sizes. For instance (when JIT works), getting a register to use it as 'int[1]' cost around 0.2-0.3 ns while ffi.new('int[1]') costs around 0.4 ns. Also the code looked cleaner. But Lua registers were global and therefore had the same issue as IBUF_SHARED and static_alloc() in Lua - no ownership, and sudden reuse when GC starts while the register is still in use in some Lua code. __gc handlers could wipe the register values making the original code behave unpredictably. IBUF_SHARED was fixed by a proper ownership implementation, but that is not worth it for Lua registers. It could be done with the buffer.ffi_stash_new() feature, but its performance is about 0.8 ns which is worse than plain ffi.new() for simple scalar types. This patch eliminates Lua registers, and uses ffi.new() instead everywhere. Closes #5632
-
Vladislav Shpilevoy authored
sio_strfaddr() can't be used in places where a static buffer is not acceptable - in any code which wants to push the value to Lua, or where the address string must be long-living. The patch introduces sio_snprintf(), which does the same, but saves the result into a provided buffer with a limited size. In the Lua C code the patch saves the address string on the stack, which makes it safe against Lua GC interruptions. Part of #5632
-
Vladislav Shpilevoy authored
It was 32, and couldn't fit long IPv6 and Unix socket addresses. The patch makes it 200, so now it fits any supported address family used in the code. Having SERVICE_NAME_MAXLEN valid is necessary to be able to save a complete address string on the stack in the places where the static buffer returned by sio_strfaddr() can't be used safely. For instance, in the code working with Lua, due to Lua GC which might be invoked at any moment and in a __gc handler could overwrite the static buffer. Needed for #5632
-
Vladislav Shpilevoy authored
The function was overcomplicated, and made it harder to update it in the next patches with functional changes. The main source of the complication was usage of both inet_ntoa() and getnameinfo(). The latter is more universal, it can cover the case of the former. The patch makes it use only getnameinfo() for IP addresses regardless of v4 or v6. Needed for #5632
-
Vladislav Shpilevoy authored
In a few places two calls were used to push a formatted string: tt_sprintf() + lua_pushstring(). This wasn't necessary, because the Lua API has lua_pushfstring() with a big enough subset of printf format features. But more importantly, it was a bug: lua_pushstring() is a GC point. Before copying the passed string it tries to invoke Lua GC, which might invoke a __gc handler for some cdata, where static alloc might be used, and it can rewrite the string passed to lua_pushstring() in the beginning of the stack. Part of #5632
-
Vladislav Shpilevoy authored
Static_alloc() uses a fixed-size circular BSS memory buffer. It is often used in C when one needs to temporarily allocate something smaller than the static buffer. And it was thought that it might also be useful in Lua when backed up by ffi.new() for large allocations. It was useful, and faster than ffi.new() on sizes > 128 and less than the static buffer size, but it wasn't correct to use it, for the same reason the IBUF_SHARED global variable should not have been used as is: without proper ownership the buffer might be reused in some unexpected way. Just like with IBUF_SHARED, the static buffer could be reused during Lua GC in one of the __gc handlers - essentially, at any moment on almost any line of a Lua script. IBUF_SHARED was fixed by a proper ownership implementation, but that is not possible with the static buffer. There is no such thing as a static buffer object which can be owned, and even if there were, the cost of its support wouldn't be much better than that of the new cord_ibuf API. That would make the static buffer close to pointless. This patch eliminates static_alloc() from Lua, and uses cord_ibuf instead almost everywhere, except a couple of places where ffi.new() is good enough. Part of #5632
-
Vladislav Shpilevoy authored
static_alloc() appears not to be safe to use in Lua, because it does not provide any ownership protection for the returned values. The problem appears when something is allocated, then Lua GC starts, and some __gc handlers also use static_alloc(). In Lua and in C - both lead to the buffer being corrupted in its original usage place. The patch is part of the effort to get rid of static_alloc() in Lua. It removes it from the uri Lua module and makes it use the new FFI stash feature, which helps to cache frequently used and heavy-to-allocate FFI values. In one place static_alloc() was used for an actual buffer - it was replaced with cord_ibuf, which is equally fast when preallocated. ffi.new() for a temporary struct uri is not used, because: - It produces a new GC object; - ffi.new('struct uri') costs around 20 ns while the FFI stash costs around 0.8 ns. The hack with 'struct uri[1]' does not help because the size of uri is > 128 bytes; - Without JIT ffi.new() costs about the same as the stash, not better as well. The patch makes uri perf a bit better in the places where static_alloc() was used, because its cost was around 7 ns for one allocation.
-
Vladislav Shpilevoy authored
The function converts a struct tt_uuid * to a string. The string is allocated on the static buffer, which can't be used in Lua due to unpredictable GC behaviour: GC can start working at any moment, even after tt_uuid_str() has returned but before its result was passed to ffi.string(). Then the buffer might be overwritten. Lua uuid now uses tt_uuid_to_string(), which does the same but takes the buffer pointer. The buffer is stored in an FFI stash, because it is 4 times faster than ffi.new('char[37]') (where 37 is the length of a UUID string + terminating 0) (2.4 ns vs 0.8 ns). After this patch UUID is supposed to be fully compatible with Lua GC handlers. Part of #5632
-