Commits · 6a60768cc7cd70aad3999dbb4caa371078cd28c0 · core / tarantool

Sep 13, 2022

fiber: add fiber_set_ctx & fiber_get_ctx functions · 6a60768c

Georgy Moshkin authored 3 years ago

Before this change there was no way to create a fiber that accepts
parameters without yielding from the current fiber using the C API. You
could pass the function arguments when calling fiber_start, but that
forces you to yield, which is not acceptable in some scenarios (e.g.
within a transaction).

This commit introduces 2 new functions to the API: fiber_set_ctx for
setting a pointer to a context of the given fiber and fiber_get_ctx for
accessing that context.

Closes https://github.com/tarantool/tarantool/issues/7669

@TarantoolBot document
Title: fiber: add fiber_set_ctx & fiber_get_ctx functions

Add 2 API functions: `fiber_set_ctx` & `fiber_get_ctx` which can be used
for passing data to a fiber. Previously this could be done via the
`fiber_start` function, except that this would force the current fiber
to yield, which is not acceptable in some scenarios (e.g. during a
transaction). Now you can create a fiber with `fiber_new`, set its
context with `fiber_set_ctx`, make it ready for execution with
`fiber_wakeup` and keep executing the current fiber.

test: slight refactoring of replication-py tests · d13b06bd

Yaroslav Lobankov authored 2 years ago

- Remove unused imports
- Remove unnecessary creation of 'replica' instance objects
- Use `<instance>.iproto.uri` object attribute instead of calling
  `box.cfg.listen` via admin connection

NO_DOC=testing stuff
NO_TEST=testing stuff
NO_CHANGELOG=testing stuff

test: bump test-run to new version · 4335b442

Yaroslav Lobankov authored 2 years ago

Bump test-run to new version with the following improvements:

- Report job summary on GitHub Actions [1]
- Free port auto resolving for TarantoolServer and AppServer [2]

Also, this patch includes the following changes:

- removing `use_unix_sockets` option from all suite.ini config files
  because test-run recently switched to always using Unix sockets for
  the admin connection
- switching replication-py tests to Unix sockets for iproto connection
- fixing replication-py/swap.test.py and swim/swim.test.lua tests

[1] tarantool/test-run#341
[2] tarantool/test-run#348

NO_DOC=testing stuff
NO_TEST=testing stuff
NO_CHANGELOG=testing stuff

Sep 12, 2022

read_view: add option to include temporary spaces · fa56fbf2

Vladimir Davydov authored 2 years ago

 - Filter out temporary spaces on read view creation unless the
   read_view_opts::needs_temporary_spaces flag is set and drop
   temporary space filter from checkpoint and join code.
 - Pass read_view_opts to engine_create_read_view. In case of memtx,
   delay garbage collection of temporary tuples if the flag is set.

Needed for https://github.com/tarantool/tarantool-ee/issues/213

NO_DOC=internal
NO_TEST=ee
NO_CHANGELOG=internal

httpc: decode body in http response · 6a364cbc

Sergey Bronnikov authored 2 years ago

@TarantoolBot document
Title: Document a body decoding in http response

A new method `response:decode()` has been introduced for an HTTP response.
It decodes the body into a Lua object. Decoding depends on the content
type passed in the HTTP header and on the decoding function defined for
it in the table `http.decoders` or `http.new().decoders`. The function
`response:decode()` raises an error when a content type is present in
the response but there is no appropriate decoder.

By default, decoders for the following content types are defined:

- JSON for content type "application/json"
- MsgPack for content type "application/msgpack"
- YAML for content type "application/yaml"

If a content type is not present in the response, :decode() assumes
"application/json".

```
tarantool> resp = require('http.client').put(
    'http://127.0.0.1:8080', '{"productId": 123456, "quantity": 100}')
---
...

tarantool> resp.body
---
- '{"productId": 123456, "quantity": 100}'
...

tarantool> resp:decode()
---
- productId: 123456
  quantity: 100
...
```

For content types missing from `http.decoders`, a user can define their
own decoder by adding a record whose key is the desired MIME type in
lowercase and whose value is a decoding function that accepts the HTTP
body and content type and returns the decoded value:

```
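-- The HTTP client is loaded as 'http.client', as in the example above.
local http = require("http.client")
local xml = require("luarapidxml")

-- Register a decoder for XML response bodies.
http.decoders = {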
    ['application/xml'] = function(body, _content_type)
        return xml.decode(body)
    end,
}
```

Closes #6833

httpc: encode body in http request · 4818e988

Sergey Bronnikov authored 2 years ago

client_object:request() could only pass a body to an HTTP request as a
string value, so the user had to encode the body to a string as a
preparation step. However, we can do the encoding for the user
automatically when it is possible.

@TarantoolBot document
Title: Document body encoding in http request

Now the user can pass the "body" option as a value of a standard Lua type
(table, nil, string, userdata, cdata, boolean or number) as well as one
of Tarantool's own data types like decimal, uuid, datetime. In that case
the value is encoded to a string automatically where possible.

Encoding depends on the body's data type: `nil` is converted to an empty
string; `string`, `number` and `boolean` values are converted to a
string using the standard Lua function `tostring()`; cdata, userdata and
tables are encoded by functions defined in the `http.encoders` or
`http.new().encoders` table. In the latter case the encoding function
depends on the content type passed to the HTTP request in an HTTP
header. The default content type "application/json" and the appropriate
encoder are used when the content type is missing from the HTTP request.

By default encoders for the following content types are defined:

- "application/json"
- "application/msgpack"
- "application/yaml"

For content types missing from `http.encoders`, a user can define their
own encoder by adding a record whose key is the desired MIME type in
lowercase and whose value is an encoding function that accepts the HTTP
body and content type and returns a string with serialized data:

```
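-- The HTTP client is loaded as 'http.client'.
local http = require("http.client")
local xml = require("luarapidxml")

-- Register an encoder for XML request bodies.
http.encoders = {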
    ['application/xml'] = function(body, _content_type)
        return xml.encode(body)
    end,
}
```

Be careful when defining your own encoding function: the header with the
content type can contain a bare content type like "text/html" as well as
header options [1], like "text/html; charset=UTF-8". The content type is
passed to the encoder function *with* options.
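
An encoder that dispatches on the bare MIME type therefore has to strip
the options first. The parsing step is sketched below in C for
illustration only: `media_type_of` is a hypothetical helper, not part of
the http client, and a Lua encoder would do the same with a string match.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the media type of a Content-Type value into
 * `out`, lowercased and without options such as "; charset=UTF-8".
 * Returns 0 on success, -1 if `out` is too small.
 */
static int
media_type_of(const char *content_type, char *out, size_t out_size)
{
	assert(content_type != NULL && out != NULL);
	/* Skip leading whitespace. */
	while (*content_type == ' ' || *content_type == '\t')
		content_type++;
	/* The media type ends at the first ';' or at the end of the value. */
	size_t len = strcspn(content_type, ";");
	/* Drop whitespace between the media type and the ';'. */
	while (len > 0 && (content_type[len - 1] == ' ' ||
			   content_type[len - 1] == '\t'))
		len--;
	if (len + 1 > out_size)
		return -1;
	for (size_t i = 0; i < len; i++)
		out[i] = (char)tolower((unsigned char)content_type[i]);
	out[len] = '\0';
	return 0;
}
```

With this helper, "text/html; charset=UTF-8" and "text/html" both map to
the key "text/html".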

The JSON encoder uses the default configuration settings defined in the
JSON module (see `json.cfg()`). The content type header is set to
"application/json" if the user did not define it in the request's options.

```
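local http = require("http.client")
local client = http.new()

-- Any reachable endpoint; this address is only an example.
local uri = "http://127.0.0.1:8080"

-- A table body is encoded as JSON by default.
local body = {
    a = 1,
    b = 2,
    c = 3,
}
client:request("POST", uri, body)
client:post(uri, body)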
```

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type

Part of #6833

Use MT-Safe strerror_r instead of strerror · 44f46dc8

Vladimir Davydov authored 2 years ago

strerror() is MT-Unsafe, because it uses a static buffer under the hood.
We should use strerror_r() instead, which takes a user-provided buffer.
The problem is there are two implementations of strerror_r(): XSI and
GNU. The first one returns an error code and always writes the message
to the beginning of the buffer while the second one returns a pointer to
a location within the buffer where the message starts. Let's introduce a
macro HAVE_STRERROR_R_GNU set if the GNU version is available and define
tt_strerror() which writes the message to the static buffer, like
tt_cstr() or tt_sprintf().

Note, we have to export tt_strerror(), because it is used by Lua via
FFI. We also need to make it available in the module API header, because
the say_syserror() macro uses strerror() directly. In order to avoid
adding tt_strerror() to the module API, we introduce an internal helper
function _say_strerror(), which calls tt_strerror().
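
For illustration, the difference between the two flavours can be hidden
behind one wrapper. The sketch below is not the code of this commit:
instead of a build-time macro like HAVE_STRERROR_R_GNU it picks the
handler with C11 `_Generic` on the return type, and `portable_strerror`
and both helpers are hypothetical names.

```c
/* Ask for POSIX declarations so strerror_r() is visible in strict modes. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* GNU flavour: the return value points to the message (maybe not in buf). */
static const char *
strerror_from_gnu(char *ret, char *buf, size_t len, int errnum)
{
	(void)buf;
	(void)len;
	(void)errnum;
	return ret;
}

/* XSI flavour: returns 0 on success, the message starts at buf[0]. */
static const char *
strerror_from_xsi(int rc, char *buf, size_t len, int errnum)
{
	if (rc != 0)
		snprintf(buf, len, "Unknown error %d", errnum);
	return buf;
}

/* Hypothetical wrapper: returns the message for errnum using a caller buffer. */
static const char *
portable_strerror(int errnum, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	/* The controlling expression is only typed, never evaluated. */
	return _Generic(strerror_r(errnum, buf, len),
			char *: strerror_from_gnu,
			int: strerror_from_xsi)(strerror_r(errnum, buf, len),
						buf, len, errnum);
}
```

A caller passes its own buffer, e.g. `char buf[256]`, and uses only the
returned pointer, because with the GNU flavour the message is not
guaranteed to be stored at the start of that buffer.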

NO_DOC=bug fix
NO_TEST=code is covered by existing tests

Sep 09, 2022

popen: fix a race between setpgrp() and killpg() · e2207fdc

Alexander Turenko authored 2 years ago

In brief: `vfork()` on Mac OS 12 and newer doesn't suspend the parent
process, so we should wait for `setpgrp()` before using `killpg()`. See a
more detailed description of the problem in the comment to the
`popen_wait_group_leadership()` function.

The solution is to spin in a loop and check the child's process group. It
looks like the simplest and most direct solution. Other possible
solutions require weighing the pros and cons of using an extra file
descriptor or assigning a signal number for the child -> parent
communication.
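
A minimal sketch of such a wait loop, assuming a plain `fork()` instead
of `vfork()` and a 1 ms pause between checks. The function names and the
attempt limit are hypothetical and are not taken from
`popen_wait_group_leadership()`.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Poll until `child` leads its own process group, i.e. until its
 * setpgid()/setpgrp() call has completed. Returns false if the child
 * is gone or the attempts are exhausted.
 */
static bool
wait_group_leadership(pid_t child, int max_attempts)
{
	assert(child > 0);
	const struct timespec pause_1ms = { .tv_sec = 0, .tv_nsec = 1000000 };
	for (int i = 0; i < max_attempts; i++) {
		pid_t pgid = getpgid(child);
		if (pgid == child)
			return true;
		if (pgid == -1 && errno == ESRCH)
			return false;
		nanosleep(&pause_1ms, NULL);
	}
	return false;
}

/*
 * Spawn a child that moves itself into a new process group, wait for
 * that to happen and only then signal the group. Returns 0 if the
 * child was terminated by our SIGKILL, -1 otherwise.
 */
static int
spawn_and_kill_group(void)
{
	int status = 0;
	pid_t child = fork();
	if (child == -1)
		return -1;
	if (child == 0) {
		setpgid(0, 0);
		for (;;)
			pause();
	}
	if (!wait_group_leadership(child, 5000) ||
	    killpg(child, SIGKILL) != 0) {
		/* Fall back to signalling the process itself. */
		kill(child, SIGKILL);
		waitpid(child, &status, 0);
		return -1;
	}
	if (waitpid(child, &status, 0) != child)
		return -1;
	return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL ? 0 : -1;
}
```

Without the wait, `killpg(child, ...)` may run before the group with the
child's id exists and fail.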

There are the following alternatives and variations:

* Create a pipe and notify the parent from the child about the
  `setpgrp()` call.

  It costs an extra file descriptor, so I decided not to do that.
  However, if we need some channel to deliver information from the
  child to the parent for another task, it will be worth reimplementing
  this function too.

  One possible place where we may need such a channel is delivery of
  the child's errors to the parent. Now the child writes them directly to
  the logger's fd and it requires some tricky code to keep and close the
  descriptor at the right points. Also it doesn't allow catching those
  errors in the parent, but we may need it for #4925.
* Notify the parent about `setpgrp()` using a signal.

  It seems too greedy to assign a specific signal for such a local
  problem. It is also unclear how to guarantee that it won't break any
  user's code: a user can load a dynamic library, which uses some
  signals on its own.

  However, we can consider using this approach here if we design some
  common interprocess notification system.
* We can use the fiber cond or the `popen_wait_timeout()` function from
  PR #7648 to react to the child termination instantly.

  It would complicate the code and anyway wouldn't allow reacting
  instantly to `setpgrp()` in the child.

  Also it assumes yielding during the wait (see below).
* Wait for `setpgrp()` in `popen_send_signal()` instead of
  `popen_new()`.

  It would add yielding/waiting inside `popen_send_signal()` and would
  likely extend the set of its possible exit situations. It is
  undesirable: this function should have simple and predictable behavior.
* Finally, we considered yielding in `popen_wait_group_leadership()`
  instead of sleeping the whole tx thread.

  `<popen handle>:new()` doesn't yield at the moment and a user's code
  may rely on this fact.

  Yielding would allow achieving better throughput (amount of parallel
  requests per second), but we don't care much about performance on
  Mac OS. The primary goal for this platform is to offer the same
  behavior as on Linux to allow development of applications.

I didn't replace `vfork()` with `fork()` on Mac OS, because `vfork()`
works and I don't know the consequences of calling `pthread_atfork()`
handlers in a child created by popen. See the comment in `popen_new()`
near the `vfork()` call: it warns about possible mutex double locks. This
topic will be investigated further in #6674.

Fixes #7658

NO_DOC=fixes incorrect behavior, no need to document the bug
NO_TEST=already tested by app-tap/popen.test.lua

Sep 07, 2022

raft: persist new term and vote separately · c9155ac8

Vladislav Shpilevoy authored 2 years ago

If a node persisted a foreign term + vote request at the same
time, it increased split-brain probability. A node could vote for
a candidate having smaller vclock than the local one. For example,
via the following scenario:

- Node1, node2, node3 are started;
- Node1 becomes a leader;
- The topology becomes node1 <-> node2 <-> node3 due to network
    issues;
- Node1 sends a synchro txn to node2. The txn starts a WAL write;
- Node3 bumps term and votes for self. Sends it all to node2;
- Node2 votes for node3, because their vclocks are equal;
- Node2 finishes all pending WAL writes, including the txn from
    node1. Now its vclock is > node3's one and the vote was wrong.
- Node3 wins, writes PROMOTE, and it conflicts with node1 writing
    CONFIRM.

This patch makes it so that a node can't persist a vote in a new term in
the same WAL write as the term bump. The term bump is written first
and alone. It serves as a WAL sync after which the node's vclock
is not supposed to change except for the 0 (local) component.

The vote requests are re-checked after the term bump is persisted to
see if they can still be applied.

Part of #7253

NO_DOC=bugfix

qsync: fix txn fiber hang on fencing at CONFIRM · ec628100

Vladislav Shpilevoy authored 2 years ago

If the limbo was fenced during CONFIRM WAL write, then the
confirmed txn was committed just fine, but its author-fiber kept
hanging. This is because when it was woken up, it checked if the
limbo is frozen and went to infinite waiting before actually
checking if the txn is completed.

As a workaround, the fiber would unfreeze if it was woken up
explicitly.

The fix is simple - change the order of the checks.

Part of #7253

NO_DOC=bugfix

promote: abort it when become non-candidate · ab08dad9

Vladislav Shpilevoy authored 2 years ago

box.ctl.promote() bumps the term, makes the node a candidate, and
waits for the term outcome. The waiting used to be until there is
a leader elected or the node lost connection quorum or the term
was bumped again.

There was a bug where a node could hang in box.ctl.promote() even
after it became a voter. It could happen if the quorum was still there
and a leader couldn't be elected in the current term at all. For
instance, others could have `election_mode='off'`.

The fix is to stop waiting for the term outcome if the node can't
win anyway.

NO_DOC=bugfix

promote: fix infinite elections with multi-promote · dd89c57e

Vladislav Shpilevoy authored 2 years ago

If box.ctl.promote() was called on more than one instance, then it
could lead to infinite or extremely long elections bumping
thousands of terms in just a few seconds.

This was because box.ctl.promote() used to be a loop. The loop
retried term bump + voted for self until the node won. Retry
happened immediately as the node saw the term was bumped again
and there was no leader elected or the connection quorum was lost.

If 2 nodes started box.ctl.promote() almost at the same time,
they could bump each other's terms, not see any winner, bump them
again, and so on. For example:

- Node1 term=1, node2 term=2;
- Promote is called on both;
- Node1 term=2, node2 term=3. They receive the messages. Node2
    ignores node1's old term. Node1 term is bumped and it votes
    for node2, but it didn't win, so box.ctl.promote() bumps its
    term to 4.
- Node2 receives term 4 from node1. Its own box.ctl.promote() sees
    the term was bumped and no winner, so it bumps it to 5 and the
    process continues for a long time.

It worked well enough in tests - the problem happened sometimes,
terms could roll like 80k times in a few seconds, but the tests
ended fine anyway.

One of the next commits will make term bump + vote written in
separate WAL records. That aggravates the problem drastically.

Basically, this mutual term bump loop could end only if one node
received a vote for itself from another node and sent back the
message 'I am a leader' before the other node's box.ctl.promote()
noticed the term was bumped externally. This will get much harder
to achieve.

The patch simply drops the loop. Let box.ctl.promote() fail if the
term was bumped outside.

There was an alternative to keep running it in a loop with a
randomized election timeout like it works inside of raft. But the
current solution is just simpler.

NO_DOC=bugfix
NO_TEST=election_split_vote_test.lua catches it already

read_view: add space upgrade function to space_read_view · fe594e3c

Vladimir Davydov authored 2 years ago

There may be a space upgrade in progress at the time when a read view is
created. In this case, we should apply the upgrade function to tuples
retrieved from the space read view. We can't just use the space upgrade
function as is, because it may be dropped while the read view is still
in use. So we need to create a special upgrade function for the read
view. This commit adds stubs for this, which will be implemented in the
EE repository.

Needed for https://github.com/tarantool/tarantool-ee/issues/163

NO_DOC=internal
NO_TEST=ee
NO_CHANGELOG=internal

read_view: assert that read view is used only in one thread · 4156440a

Vladimir Davydov authored 2 years ago

Although currently feasible, using a read view from multiple threads
simultaneously is dangerous on its own because of inevitably arising
object lifetime issues. With the introduction of space upgrade handling,
it will hardly be possible, because we'll have to attach a Lua function
to a read view, which can only be used from one thread. Let's add some
assertions to guarantee that a read view is used only from one thread.
To do that, we need to introduce the concept of an active read view:
from now on, a read view must be activated before it can be used and
deactivated before it can be closed; both operations must be done from
the same thread - the thread that accesses the read view.
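
The activation rule can be sketched with pthreads as follows. The struct
and function names are hypothetical and only mirror the idea of
remembering the owner thread and asserting on it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical miniature of a read view that tracks its owner thread. */
struct rv_sketch {
	bool is_active;
	pthread_t owner;
};

/* Bind the read view to the calling thread. */
static void
rv_sketch_activate(struct rv_sketch *rv)
{
	assert(!rv->is_active);
	rv->owner = pthread_self();
	rv->is_active = true;
}

/* Must be called from the thread that activated the read view. */
static void
rv_sketch_deactivate(struct rv_sketch *rv)
{
	assert(rv->is_active);
	assert(pthread_equal(rv->owner, pthread_self()));
	rv->is_active = false;
}

/* True if the calling thread may access the read view right now. */
static bool
rv_sketch_is_usable(const struct rv_sketch *rv)
{
	return rv->is_active && pthread_equal(rv->owner, pthread_self());
}
```

Every accessor would assert `rv_sketch_is_usable()` first, so a use from
a foreign thread fails in a debug build instead of racing silently.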

https://github.com/tarantool/tarantool-ee/issues/163

NO_DOC=debug
NO_TEST=debug
NO_CHANGELOG=debug

fiber: make fiber_set_cancellable a no-op · 20b06656

Ilya Verbin authored 2 years ago

Currently this function is not used inside Tarantool, however it is
available via C module API. Deprecate it, because it is very confusing
and has nothing to do with the fiber cancellation.

Closes #7166

@TarantoolBot document
Title: fiber: get rid of fiber_set_cancellable
Product: Tarantool
Since: 2.11
Audience/target: dev
Root document: https://www.tarantool.io/en/doc/latest/dev_guide/reference_capi/fiber/#c.fiber_set_cancellable
SME: @Gumix

This function is a no-op since 2.11 and should be dropped from the
documentation.

main: allow spurious wakeups in on_shutdown_f · dd65d5c6

Ilya Verbin authored 2 years ago

It will yield until the is_shutting_down flag is set by tarantool_exit().
This makes it possible to get rid of the FIBER_IS_CANCELLABLE flag, which
is no longer used anywhere in Tarantool.

Part of #7166

NO_DOC=internal
NO_CHANGELOG=internal

Sep 06, 2022

ci: add RedOS 7.3 rpm package build (x86_64) · a6b48f14

Sergey Vorontsov authored 2 years ago

Add the redos_7.3.yml workflow to build Tarantool packages (x86_64) for
the RedOS 7.3 system.

Packages are created by https://github.com/packpack/packpack.

NO_DOC=ci
NO_TEST=ci

sql: fix drop of constraints during ADD COLUMN · ccb39117

Mergen Imeev authored 2 years ago

Prior to this patch, ALTER TABLE ADD COLUMN dropped all modified space
field constraints, which is a bug. This patch fixes it.

Part of #6986

NO_DOC=will be added later
NO_CHANGELOG=will be added later

Sep 05, 2022

test: use less unexpected name for non existent file · 393ffa82

Nikolay Shirokovskiy authored 2 years ago

NO_TEST=test changes
NO_CHANGELOG=test changes
NO_DOC=test changes

test: fix readline dependent tests · 55bab98c

Nikolay Shirokovskiy authored 2 years ago

The tests fail on a dev installation if its readline configuration
(.inputrc) changes the prompt. Use the default readline configuration
for the tests.

Closes #7620

NO_DOC=test changes
NO_TEST=test changes
NO_CHANGELOG=test changes

uri: fix resolve with only port specification · 96d8dcec

Ilya Grishnov authored 2 years ago

Supplemented the implementation of the `src/lib/uri` parser.
Before this fix, the call `uri.parse(uri.format(uri.parse(3301)))`
returned an 'Incorrect URI' error.
Now this call returns the correct `service: '3301'`.
As a result, the possibility of using host=localhost by default
has been restored for `tarantoolctl connect` as well as for
`console.connect`.

Fixes #7479

NO_DOC=bugfix

test: always perform assertions in module API test · aaf3bf91

Alexander Turenko authored 2 years ago

This commit pursues several goals:

* Eliminate unused parameter/variable warnings when building module_api.c
  in a non-debug configuration. The problem was introduced in commit
  5c1bc3da ("decimal: add the library into the module API").
* Eliminate the need to check newly added tests in two build
  configurations (Debug and RelWithDebInfo) and to remember to add
  `(void)x;` statements in addition to a test condition check.
* Fail the testing if conditions required by the
  app-tap/module_api.test.lua test are not met -- not only in the Debug
  build, but also in RelWithDebInfo.
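
The general technique behind these goals is a check that is not compiled
out together with `assert()`. A minimal sketch, assuming a hypothetical
`check_always` macro (the name used in module_api.c may differ):

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical macro: unlike assert(), it survives -DNDEBUG builds. */
#define check_always(expr) do {                                    \
	if (!(expr)) {                                             \
		fprintf(stderr, "%s:%d: check failed: %s\n",       \
			__FILE__, __LINE__, #expr);                \
		abort();                                           \
	}                                                          \
} while (0)

/* A tiny function under test. */
static int
parse_digit(char c)
{
	return (c >= '0' && c <= '9') ? c - '0' : -1;
}

/* Returns 0 if all checks pass; aborts otherwise in any build type. */
static int
test_parse_digit(void)
{
	int rc = parse_digit('7');
	/* With NDEBUG this line vanishes and rc would look unused. */
	assert(rc == 7);
	/* This check always uses rc, so no (void)rc; is needed. */
	check_always(rc == 7);
	check_always(parse_digit('x') == -1);
	return 0;
}
```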

Fixes #7625

NO_DOC=a change in a test, purely development matter
NO_CHANGELOG=see NO_DOC

replication: retry join automatically · f2ad1dee

Yan Shtunder authored 2 years ago

If the error is non-critical, the instance retries join
automatically.

@TarantoolBot document
Title: Retry join automatically (for a timeout)

There are two types of errors: critical and non-critical.
You can recover from non-critical errors. For example, the
master being connected to turned out to be read-only. It looks like
a configuration error. If the error is non-critical, the
instance retries join automatically. After a critical error
there is no way to recover, because such errors
are irreparable anyway. For example, vinyl can create some
files. It's not clear what to do with them to try bootstrap
again.

Closes #6126

replication: reshuffle names of the state · 218a62c4

Yan Shtunder authored 2 years ago

A new state is going to be introduced for which the name
`APPLIER_FETCH_SNAPSHOT` is suitable, but that name is already
taken. So the names of the states are reshuffled a bit.

    `APPLIER_FETCH_SNAPSHOT -> APPLIER_WAIT_SNAPSHOT;`
    `APPLIER_INITIAL_JOIN -> APPLIER_WAIT_SNAPSHOT;`

Part of #6126

NO_DOC=preparatory commit
NO_CHANGELOG=preparatory commit
NO_TEST=preparatory commit

box: fix high CPU usage while on_shutdown triggers are running · 6d91e44b

Ilya Verbin authored 2 years ago


Currently this script causes 100% CPU usage for 10 sec, because
os.exit() infinitely yields to the scheduler until on_shutdown
fiber completes and breaks the event loop. Fix this by a sleep.

```
box.ctl.set_on_shutdown_timeout(100)
box.ctl.on_shutdown(function() require('fiber').sleep(10) end)
os.exit()
```

Closes #6801

NO_DOC=bugfix
NO_TEST=don't know how to catch this by a test

Co-authored-by: Georgy Moshkin <louielouie314@gmail.com>

main: run an event loop for on_shutdown triggers · cdd5674c

Ilya Verbin authored 2 years ago

When Tarantool is stopped by Ctrl+D or by reaching the end of the
script, run_script_f() breaks the event loop, then tarantool_exit()
is called from main(). However, the fibers that execute on_shutdown
triggers can no longer be scheduled, because the event loop is
already stopped. Fix this by starting an auxiliary event loop for
such cases.

Closes #7434

NO_DOC=bugfix

Sep 02, 2022

func: copy function definition in func_new · ac5f303d

Vladimir Davydov authored 2 years ago

We need to duplicate a function for handling space upgrade in read view.
We can't just use func_new(func->def) to do this, because func_new sets
the given func_def to func->def, without copying. Usually, foo_new
duplicates the provided foo_def, e.g. see space_new. Let's make func_new
do the same.

Needed for https://github.com/tarantool/tarantool-ee/issues/163

NO_DOC=internal
NO_TEST=internal
NO_CHANGELOG=internal

func: factor out func_def_new and func_def_delete · 1beb6891

Vladimir Davydov authored 2 years ago

func_def_new takes function id, name, body, comment, and owner id and
allocates a new func_def struct, setting the rest of the members to
their default values. We need this function to create a new func_def
object for handling space upgrade in read view.

Note, this isn't a pure refactoring - before this patch, we used
FUNC_LANGUAGE_LUA for SQL builtin functions, which were deprecated in
2.9. This worked fine, because we never actually called them - it was
needed solely for upgrade from older versions. In this commit, we create
an SQL builtin function just like any other function, but set its vtab
to a dummy, which raises an error on an attempt to call it. This should
make the code clearer.

Needed for https://github.com/tarantool/tarantool-ee/issues/163

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

func: add const qualifier to func_def where applicable · 0c3246de

Vladimir Davydov authored 2 years ago

There's a bunch of places where func_def is used read-only. Let's mark
them with const for the sake of the code clarity.

NO_DOC=code cleanup
NO_TEST=code cleanup
NO_CHANGELOG=code cleanup

func: don't set func->def in func_lua_new · 267016c9

Vladimir Davydov authored 2 years ago

func->def is supposed to be set by func_new, but func_lua_new sets it
for func_persistent_lua_load to work. Actually, there's no need to do
this, because we can simply pass func_def to func_persistent_lua_load
instead. Let's do this - this is needed to clean up func_def handling
in func_new, which in turn is required to make a copy of a space upgrade
function for read view.

Needed for https://github.com/tarantool/tarantool-ee/issues/163

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

Revert "log: free resources while event loop is running" · 5cb688ed

Vladimir Davydov authored 2 years ago

This reverts commit 0c3f9b37.

If log_destroy and log_boot use the same fd (STDERR_FILENO), say()
called after say_logger_free() will write to a closed fd. What's worse,
the fd may be reused, in which case say() will write to a completely
unrelated file or socket (maybe a data file!). This is what happened
with flightrec - flightrec finalization info message was written to
an xlog file. Let's move say_logger_free() back to where it belongs -
after the other subsystems have been finalized.

Reopens #4450
Needed for https://github.com/tarantool/tarantool-ee/issues/223

NO_DOC=bug fix
NO_TEST=revert
NO_CHANGELOG=unreleased

Sep 01, 2022

memtx: reuse read views to prevent read_view_version wrap around · fe102ff7

Vladimir Davydov authored 2 years ago

The total number of read views that we can possibly (not necessarily
simultaneously) ever create is limited by UINT32_MAX, because we use
uint32_t for read view versioning and read view version must never wrap
around.

If read views were only used for making snapshots or joining replicas,
this would be fine, because even if we made a snapshot every second
(which is hardly possible), it'd take more than one hundred years for
the read view version to wrap around. However, if read views could be
created by users (which is our ultimate goal), they could get created as
often as every millisecond, which would reduce the wrap around window
down to one month, which is unacceptable.

Let's fix this issue by reusing the most recent read view if it
was created less than 100 ms ago. The algorithm is described in the
comments to the code.
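
The numbers above can be checked with a few lines of C. The reuse logic
below is a simplified illustration and not the algorithm from the code
comments; `version_source` and `acquire_version` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Seconds until a uint32_t version wraps if bumped once per interval. */
static double
wraparound_seconds(double bump_interval_sec)
{
	assert(bump_interval_sec > 0);
	return (double)UINT32_MAX * bump_interval_sec;
}

/* Hypothetical state of a version generator with reuse. */
struct version_source {
	uint32_t version;
	double last_bump_time;
	bool has_version;
};

/*
 * Return the current version if it was issued less than
 * reuse_window_sec ago, otherwise bump the counter first.
 */
static uint32_t
acquire_version(struct version_source *src, double now,
		double reuse_window_sec)
{
	if (src->has_version &&
	    now - src->last_bump_time < reuse_window_sec)
		return src->version;
	src->version++;
	src->last_bump_time = now;
	src->has_version = true;
	return src->version;
}
```

One bump per second gives about 136 years before wrap around, one bump
per millisecond gives about 50 days, and a 100 ms reuse window limits
the rate to 10 bumps per second, which is more than 13 years.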

Closes #7189

NO_DOC=internal
NO_CHANGELOG=internal

memtx: use MemtxAllocator::collect_garbage in unit test · d146adfe

Vladimir Davydov authored 2 years ago

Once we start reusing read views, we won't be able to allocate and free
a tuple to trigger garbage collection in tests, because the tuple may
get attached to the last read view. Let's make the collect_garbage()
method public and use it in tests.

While we are at it, let's also rewrite the destroy() method using
collect_garbage().

Needed for #7189

NO_DOC=refactoring
NO_CHANGELOG=refactoring

memtx: allow to delay deletion of temporary tuples · 16e892cd

Vladimir Davydov authored 2 years ago

Irrespective of whether there's an open read view or not, we always free
memtx tuples that come from temporary spaces immediately (see #3432).
This is acceptable if read views are only used for snapshotting or
replication, but to reuse the read view infrastructure for user read
views, we need to delay deletion of temporary tuples until all read
views that may access them have been closed.

The idea is to maintain independent lists of tuple garbage collection
arrays for temporary and normal tuples. If a read view doesn't need to
access temporary tuples, we create one garbage collection array for it,
otherwise we create two garbage collection arrays. When we free a tuple,
we choose a garbage collection array for it looking at its type.

Closes #7412

NO_DOC=internal
NO_CHANGELOG=internal

memtx: optimize tuple garbage collection · 04e25c09

Vladimir Davydov authored 2 years ago

Currently, tuples are never garbage collected if the number of open read
views stays above zero, even if they can't possibly be accessed from any
read view (e.g. were freed before the oldest read view was created).
This commit fixes this issue by introducing per read view tuple garbage
collection lists. The algorithm is described in the comments to the
code.

Closes #7185

NO_DOC=internal
NO_CHANGELOG=internal

memtx: rename MemtxAllocator::snapshot_version to read_view_version · 399d0026

Vladimir Davydov authored 2 years ago

Because we will use this version for user read views, not just
snapshots. Comments are updated as well.

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

Aug 31, 2022

read_view: add tuple_format to space_read_view · 08070d45

Vladimir Davydov authored 2 years ago

To support accessing tuple fields by name, we create a runtime tuple
format for each space read view, using the space field names to
initialize the field dictionary.

Note, we can't reuse the space tuple format as is, because it allocates
tuples from the engine arena, which is single-threaded, while a read
view may be used from any thread, not just tx. The runtime arena is
single-threaded as well, but we will make it per-thread in future.

Note, we can't even reuse tuple field dictionary - we create a new one
using the space definition instead - because the dictionary is in fact
mutable - it may be changed when the space definition is altered, see
tuple_dictionary_swap.

Good news is runtime tuple formats are reusable so if we create several
read views of the same space, they will all use the same tuple format.

Since this feature isn't required for snapshots/replication, we add
a read view option to enable it - read_view_opts::needs_field_names.

Needed for https://github.com/tarantool/tarantool-ee/issues/207

NO_DOC=internal
NO_TEST=ee
NO_CHANGELOG=internal

tuple: add helper for creating runtime tuple format with field names · ab5010c6

Vladimir Davydov authored 2 years ago

Currently, the helper is used only for creation of a tuple format for
Lua (needed for net.box schema). Later on, we will reuse this helper for
creating tuple formats for user read views.

Needed for https://github.com/tarantool/tarantool-ee/issues/207

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

index: add space_read_view pointer to index_read_view · 48950020

Vladimir Davydov authored 2 years ago

This will let us access space read view format (added later) from
the code that uses an index read view.

Needed for https://github.com/tarantool/tarantool-ee/issues/207

NO_DOC=internal
NO_TEST=ee
NO_CHANGELOG=internal

index: add index_read_view pointer to index_read_view_iterator · f4f1659d

Vladimir Davydov authored 2 years ago

We store a pointer to index read view in each implementation class,
anyway. Let's move it to the base class - this way we'll be able to
access space read view format (added later) from the code that uses
a read view iterator.

Needed for https://github.com/tarantool/tarantool-ee/issues/207

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring
