Commits · 9d42ad47720f4a7f1ec2d7e90889ded4bd9c5f1c · core / tarantool

Aug 14, 2021

txm: avoid excess conflict while reading gaps · 9d42ad47

During iteration, a memtx tree index must write gap records to the TX
manager. This is done in order to detect later writes into those
gaps and to run the logic that prevents phantom reads.

There are two cases in which a gap is stored:
 * The iterator reads the next tuple; the gap is between two tuples.
 * The iterator has finished reading; the gap is between the previous
tuple and the key boundary.

By mistake, these two cases were not distinguished correctly, and
that led to excess conflicts.

This patch fixes it.

Part of #6206

9d42ad47

txm: simplify construction of tx_read_tracker · b6fab015
Aleksandr Lyapunov authored 3 years ago
```
Just add a function that allocates and initializes the structure.
No logical changes.

Part of #6206
```
b6fab015
txm: split memtx_tx_track_read method into two parts · 2bf4484f
Aleksandr Lyapunov authored 3 years ago
```
No logical changes; this only simplifies the next commit.

Part of #6206
```
2bf4484f

txm: simplify code with check_dup_common function · eebf7ba0

Aleksandr Lyapunov authored 3 years ago

Implement check_dup_common function that calls either
check_dup_clean or check_dup_dirty.

No logical changes.

Follow up #6132

eebf7ba0

txm: rewrite and refactor mvcc code · d0bc565c

Aleksandr Lyapunov authored 3 years ago

There were several problems connected with broken
pointers in tuple history. Another problem is that the code
was quite large and difficult to understand.

This patch refactors all the code that is connected to lists of
stories in history. A bunch of helper functions were added, and in
fact these functions were carefully rewritten:
 * memtx_tx_history_add_stmt
 * memtx_tx_history_rollback_stmt
 * memtx_tx_history_prepare_stmt
 * memtx_tx_history_commit_stmt

In addition to refactoring, a couple of significant changes were
made to the logic:
 * Now del_story in a statement points to the story of the tuple that
was effectively deleted by this statement.
 * Conflicts in secondary indexes (previously named
'cross conflicts') are now handled transparently during statement
preparation.

Closes #6132
Closes #6021

d0bc565c

txm: remove redundant _story suffix from memtx_tx_story_link · b392839b

Aleksandr Lyapunov authored 3 years ago

Once there were two types of story links: with a clean tuple
and with another story. That's why there were two similar functions
used for linking, with different suffixes in their names. After
some refactoring the linkage with tuples was removed, so now
a story can only be linked with another story. The time has come
to remove the meaningless suffix too.

Part of #6132

b392839b

txm: add more comments and rename a bit · ef59b082
Aleksandr Lyapunov authored 3 years ago
```
Part of #6132
```
ef59b082

txm: link all transaction into list · 29f4e457

Nikita Pettik authored 3 years ago

We are going to abort all transactions on any DDL commit, except for the
TX owning that change. So let's link all transactions into an rlist.

Needed for #5998

29f4e457

test: move all ddl operations outside of transactions · 393330f9

EvgenyMekhanik authored 3 years ago

To fix some problems in the transaction manager we disallow
yields after a DDL operation in a TX. Thus, we can no longer perform
DDL operations in streams.

Needed for #5998

393330f9

Aug 13, 2021

lua: refactor port_lua_do_dump and encode_lua_call · 027775ff

Sergey Kaplun authored 3 years ago

The old code flow was the following:

1) `struct port_lua` given to `port_lua_do_dump()` has Lua stack with
   arguments to encode to MessagePack.

2) The main coroutine `tarantool_L` is used to call `encode_lua_call()`
   or `encode_lua_call_16()` via `lua_cpcall()`.

3) Objects on port coroutine are encoded via `luamp_encode()` or
   `luamp_encode_call16()`.

4) This encoding may raise an error on unprotected `port->L` coroutine.
   This coroutine has no protected frame on it and this call should fail
   in pure Lua.

Calling anything on unprotected coroutine is not allowed in Lua [1]:

| If an error happens outside any protected environment, Lua calls a
| panic function

Lua 5.1 sets protection only for specific lua_State [2] and calls a
panic function if we raise an error on unprotected lua_State [3].

Nevertheless, no panic occurs now due to two facts:
* The first one is LuaJIT's support of C++ exception handling [4] that
  allows raising an error in Lua and catching it in C++ or vice versa. But
  the documentation still doesn't allow raising errors on unprotected
  coroutines (at least we must use a try-catch block).
* The second one is the patch made in LuaJIT to restore currently
  executed coroutine, when C function or fast function raises an
  error [5][6] (see the related issue here [7][8]).

For these reasons, when an error occurs, the unwinder searches and finds
the C-protected stack frame from the `lua_cpcall()` for `tarantool_L`
coroutine and unwinds until that point (without aforementioned patches
LuaJIT just calls a panic function and exits).

If an error is raised, and `lua_cpcall()` returns not `LUA_OK`, then the
error from `port->L` coroutine is converted into a Tarantool error and a
diagnostic is set.

Such auxiliary usage of `tarantool_L` is not idiomatic for Lua.
The internal unwinder used on M1 is not as flexible, so such misuse leads
to a panic call. Also, the `tarantool_L` usage is redundant. So this patch
drops it and uses only the port coroutine instead, with `lua_pcall()`.

Functions to encode are saved to the `LUA_REGISTRY` table to
reduce GC pressure, like it is done for other handlers [9].

[1]: https://www.lua.org/manual/5.2/manual.html#4.6
[2]: https://www.lua.org/source/5.1/lstate.h.html#lua_State
[3]: https://www.lua.org/source/5.1/ldo.c.html#luaD_throw
[4]: https://luajit.org/extensions.html#exceptions
[5]: https://github.com/tarantool/luajit/commit/ed412cd9f55fe87fd32a69c86e1732690fc5c1b0
[6]: https://github.com/tarantool/luajit/commit/97699d9ee2467389b6aea21a098e38aff3469b5f
[7]: https://github.com/tarantool/tarantool/issues/1516
[8]: https://www.freelists.org/post/luajit/Issue-with-PCALL-in-21
[9]: https://github.com/tarantool/tarantool/commit/e88c0d21ab765d4c53bed2437c49d77b3ffe4216



Closes #6248
Closes #4617

Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
Reviewed-by: Igor Munkin <imun@tarantool.org>
Signed-off-by: Igor Munkin <imun@tarantool.org>

027775ff

net.box: add interactive transaction support in net.box · f9ca802a

mechanik20051988 authored 3 years ago

Implement `begin`, `commit` and `rollback` methods for the stream object
in `net.box`, which allow beginning, committing and rolling back a
transaction, respectively.

Closes #5860

@TarantoolBot document
Title: add interactive transaction support in net.box
Implement `begin`, `commit` and `rollback` methods for the stream object
in `net.box`, which allow beginning, committing and rolling back a
transaction, respectively. Now there are multiple ways to begin, commit
and roll back a transaction from `net.box`: using the appropriate stream
methods, using the `call` or `eval` methods, or using the `execute`
method with SQL transaction syntax.
The user can mix these methods, for example, start a transaction using
`stream:begin()`, and commit it using `stream:call('box.commit')`
or `stream:execute('COMMIT')`.
A simple example of using interactive transactions via iproto from net.box:
```lua
stream = conn:new_stream()
space = stream.space.test
space_not_from_stream = conn.space.test

stream:begin()
space:replace({1})
-- returns the previously inserted tuple, because the request
-- belongs to the transaction.
space:select({})
-- empty select, because this select doesn't belong to
-- the transaction
space_not_from_stream:select({})
stream:call('box.commit')
-- now the transaction is committed, so all requests
-- return the tuple.
```
More examples of using streams can be found in
gh-5860-implement-streams-in-iproto.test.lua

f9ca802a

iproto: implement interactive transactions over iproto streams · 48c8dc18

mechanik20051988 authored 3 years ago

Implement interactive transactions over iproto streams. Each stream
can start its own transaction, so streams allow multiplexing several
transactions over one connection. If any request fails during the
transaction, it will not affect the other requests in the transaction.
If a disconnect occurs while there is an active transaction in a stream,
this transaction will be rolled back, unless it managed to commit
before that moment.

Part of #5860

@TarantoolBot document
Title: interactive transactions were implemented over iproto streams.
The main purpose of streams is transactions via iproto. Each stream
can start its own transaction, so streams allow multiplexing several
transactions over one connection. There are multiple ways to begin,
commit and roll back a transaction: using IPROTO_CALL and IPROTO_EVAL
with the corresponding function (box.begin, box.commit and box.rollback),
IPROTO_EXECUTE with the corresponding SQL request ('START TRANSACTION',
'COMMIT', 'ROLLBACK'), or IPROTO_BEGIN, IPROTO_COMMIT and IPROTO_ROLLBACK,
respectively. If a disconnect occurs while there is an active transaction
in a stream, this transaction will be rolled back, unless it managed
to commit before that moment. Add new command codes for beginning,
committing and rolling back transactions: `IPROTO_BEGIN 14`,
`IPROTO_COMMIT 15` and `IPROTO_ROLLBACK 16`, respectively.

48c8dc18

iproto: add RAFT prefix for all requests related to 'raft'. · 6c2eb11a

mechanik20051988 authored 3 years ago

Adding interactive transactions over iproto streams requires
adding new request types for beginning, committing and rolling them back.
The type names of these new requests conflict with the existing
names for the 'raft' requests. Adding the RAFT prefix to all requests
related to 'raft' resolves this problem.

Part of #5860

@TarantoolBot document
Title: add RAFT prefix for all requests related to 'raft'.
Rename IPROTO_PROMOTE, IPROTO_DEMOTE, IPROTO_CONFIRM and
IPROTO_ROLLBACK to IPROTO_RAFT_PROMOTE, IPROTO_RAFT_DEMOTE,
IPROTO_RAFT_CONFIRM and IPROTO_RAFT_ROLLBACK, respectively.

6c2eb11a

net.box: add stream support to net.box · 0084f903

mechanik20051988 authored 3 years ago

Add stream support to `net.box`. In `net.box`, a stream
is an object over a connection that has the same methods,
but sends all its requests with a non-zero stream ID.
Since there can be a lot of streams, we do not copy the
spaces from the connection to the stream immediately when
creating a stream, but do it only when a space is first accessed.
Also, when the schema is updated, we update the spaces lazily:
each stream has its own schema_version, and when a stream
space is accessed we compare the stream schema_version with
the connection schema_version. If they differ, we clear the
stream space cache and wrap the space being accessed into
the stream cache.
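
The lazy cache described above can be modelled roughly as follows. This is
only a Python sketch of the idea, not the net.box code: the class names, the
`space()` accessor and the tuple used as a "wrapped" space are all
hypothetical.

```python
class Connection:
    """Toy connection: holds the spaces and a schema version (hypothetical)."""

    def __init__(self, spaces):
        self.spaces = dict(spaces)
        self.schema_version = 1

    def reload_schema(self, spaces):
        # A schema update replaces the spaces and bumps the version.
        self.spaces = dict(spaces)
        self.schema_version += 1


class Stream:
    """Toy stream: wraps connection spaces on first access only."""

    def __init__(self, conn, stream_id):
        self.conn = conn
        self.stream_id = stream_id
        self.schema_version = conn.schema_version
        self.space_cache = {}  # nothing is copied at creation time

    def space(self, name):
        # Lazy invalidation: compare versions only when a space is accessed.
        if self.schema_version != self.conn.schema_version:
            self.space_cache.clear()
            self.schema_version = self.conn.schema_version
        if name not in self.space_cache:
            # "Wrapping" is modelled as pairing the space with the stream ID.
            self.space_cache[name] = (self.stream_id, self.conn.spaces[name])
        return self.space_cache[name]


conn = Connection({"test": "space definition v1"})
stream = Stream(conn, stream_id=1)
cache_size_at_creation = len(stream.space_cache)
first = stream.space("test")
conn.reload_schema({"test": "space definition v2"})
second = stream.space("test")
```

Here `first` still refers to the old definition, while `second` is rebuilt
from the reloaded schema because the versions no longer match.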

Part of #5860

@TarantoolBot document
Title: stream support was added to net.box
In `net.box`, a stream is an object over a connection that
has the same methods, but sends all its requests
with a non-zero stream ID. The stream ID is generated on the
client automatically. A simple example of stream creation
using net.box:
```lua
stream = conn:new_stream()
-- all connection methods are valid, but requests are sent
-- with a non-zero stream_id.
```

0084f903

iproto: implement streams in iproto · 711cca10

mechanik20051988 authored 3 years ago

Implement streams in iproto. There is a hash table of streams for
each connection. When a new request comes with a non-zero stream ID,
we look for the stream with such ID in this table and if it does not
exist, we create it. The request is placed in the queue of pending
requests, and if this queue was empty at the time of its receipt, it
is pushed to the tx thread for processing. When a request belonging to a
stream returns to the network thread after processing is completed, we
take the next request out of the queue of pending requests and send it
to the tx thread for processing. If there are no pending requests, we
remove the stream object from the hash table and destroy it. Requests with
a zero stream ID are processed in the old way.
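
The bookkeeping described above can be sketched in Python as follows. This
is a simplified model, not the iproto code: all names are hypothetical, the
tx thread is replaced by a plain list, and it is an assumption of the sketch
that the in-flight request stays at the head of the queue until it completes.

```python
from collections import deque


class Connection:
    """Toy model of per-connection stream bookkeeping (hypothetical names)."""

    def __init__(self):
        self.streams = {}     # stream ID -> queue of requests of that stream
        self.sent_to_tx = []  # requests handed over to the "tx thread"

    def on_request(self, stream_id, request):
        if stream_id == 0:
            # Requests outside of streams are processed the old way.
            self.sent_to_tx.append(request)
            return
        # Look the stream up; create it if it does not exist yet.
        queue = self.streams.setdefault(stream_id, deque())
        queue.append(request)
        if len(queue) == 1:
            # The queue was empty on receipt: process right away.
            self.sent_to_tx.append(request)

    def on_request_done(self, stream_id):
        queue = self.streams[stream_id]
        queue.popleft()  # drop the request that has just completed
        if queue:
            self.sent_to_tx.append(queue[0])  # send the next pending one
        else:
            del self.streams[stream_id]  # no pending requests: destroy


conn = Connection()
conn.on_request(1, "a")
conn.on_request(1, "b")  # waits for "a"
conn.on_request(2, "c")  # another stream, not blocked by stream 1
conn.on_request(0, "d")  # not in a stream
order_before = list(conn.sent_to_tx)
conn.on_request_done(1)  # "a" is done, "b" may start
order_after = list(conn.sent_to_tx)
conn.on_request_done(1)  # "b" is done, stream 1 is destroyed
```

In this model requests of one stream never overlap, while requests of
different streams, or outside of streams, are not delayed by each other.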

Part of #5860

@TarantoolBot document
Title: streams are implemented in iproto
A distinctive feature of streams is that all requests in them
are processed sequentially. The execution of the next request
in stream will not start until the previous one is completed.
To separate requests belonging to and not belonging to streams
we use the stream ID field in the binary iproto protocol: requests with
a non-zero stream ID belong to some stream. Stream ID is unique
within the connection and indicates which stream the request
belongs to. For streams from different connections, the IDs may
be the same.

711cca10

salad: fix segfault in case when mhash table allocation failure · a18741b0

mechanik20051988 authored 3 years ago

There was no check for successful memory allocation in the `new` and
`clear` functions for the mhash table. If the memory was not allocated,
a null pointer dereference occurred.

a18741b0

iproto: implement stream id in binary iproto protocol · e0bac737

mechanik20051988 authored 3 years ago

For further implementation of streams, we need to separate
requests belonging to and not belonging to streams. For this
purpose, the stream ID field was added to the iproto binary
protocol. For requests that do not belong to stream, this field
is omitted or equal to zero. For requests belonging to stream,
we use this field to determine which stream the request belongs to.

Part of #5860

@TarantoolBot document
Title: new field in binary iproto protocol

Add a new field to the binary iproto protocol.
`IPROTO_STREAM_ID 0x0a` determines whether a request
belongs to a stream or not. If this field is omitted
or equal to zero, the request doesn't belong to a stream.

e0bac737

iproto: clear request::header for client requests · 4fefb519

Vladimir Davydov authored 3 years ago

To apply a client request, we only need to know its type and body. All
the meta information, such as LSN, TSN, or replica id, must be set by
WAL. Currently, however, it isn't necessarily true: iproto leaves a
request header received over iproto as is, and tx will reuse the header
instead of allocating a new one in this case, which is needed to process
replication requests, see txn_add_redo().

Unless a client actually sets one of those meta fields, this causes no
problems. However, if we added transaction support to the replication
protocol, reusing the header would result in broken xlog, because
currently, all requests received over iproto have the is_commit field
set in xrow_header for the lack of TSN, while is_commit must only be set
for the final statement in a transaction. One way to fix it would be
clearing is_commit explicitly in iproto, but ignoring the whole header
received over iproto looks more logical and error-proof.

Needed for #5860

4fefb519

xrow: remove unused call_request::header · a397562e
Vladimir Davydov authored 3 years ago

a397562e

Aug 12, 2021

export: remove "error_unpack_unsafe" from "exports" · f2569702

Leonid Vasiliev authored 3 years ago

"error_unpack_unsafe" was removed from export in commit [1]
and accidentally reanimated during rebase [2].
Let's remove "error_unpack_unsafe" from "exports".

1. https://github.com/tarantool/tarantool/commit/6aafa697e1ec8166df721573195711cea5ec3135
2. https://github.com/tarantool/tarantool/commit/5ceabb378d0169dc776449e45577515114e39f12

Follow-up #5932

f2569702

build: fix build on Linux ARM64 with CMAKE_BUILD_TYPE=Debug · 224cb68c

Andrey Saranchin authored 3 years ago

Fix build errors on arm64 with CMAKE_BUILD_TYPE=Debug.

Despite doubts about the correctness of http parser, keep the principle
of its work and unify behavior whether plain char is signed or unsigned.

Closes #6143

224cb68c

replication: fix flaky gh-3055-election-promote test · 1df99600

Serge Petrenko authored 3 years ago

Found the following error in our CI:

[001] Test failed! Result content mismatch:
[001] --- replication/gh-3055-election-promote.result	Mon Aug  2 17:52:55 2021
[001] +++ var/rejects/replication/gh-3055-election-promote.reject	Mon Aug  9 10:29:34 2021
[001] @@ -88,7 +88,7 @@
[001]   | ...
[001]  assert(not box.info.ro)
[001]   | ---
[001] - | - true
[001] + | - error: assertion failed!
[001]   | ...
[001]  assert(box.info.election.term > term)
[001]   | ---
[001]

The problem was the same as in recently fixed election_qsync.test
(commit 096a0a7d): PROMOTE is written to
WAL asynchronously, and box.ctl.promote() returns earlier than this
happens.

Fix the issue by waiting for the instance to become writeable.

Follow-up #6034

1df99600

box: implement compact mode in tuples · 74177dd8

Aleksandr Lyapunov authored 3 years ago

Tuples are designed to store msgpack data of (almost) any size
and a rather large number of field offsets. That requires the data_offset
and bsize members of tuples to be rather large: 16 and 32 bits.

That is good, but the problem is that when the majority
of tuples are small, that price is significant.

This patch introduces compact tuples: if tuple data size and its
offset table are small - both tuple_offset and bsize are stored in
one 16 bit integer and that saves 4 bytes per tuple.
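
The packing idea can be illustrated with a few lines of Python. The bit
split used here (8 bits for the offset, 8 bits for bsize) is a hypothetical
choice made for the sketch and is not taken from the patch; only the member
widths (16 and 32 bits versus a single 16 bit integer) come from the
description above.

```python
OFFSET_BITS = 8  # hypothetical split, not taken from the patch
BSIZE_BITS = 8   # OFFSET_BITS + BSIZE_BITS == 16


def fits_compact(data_offset, bsize):
    """True if both values are small enough for the compact form."""
    return data_offset < (1 << OFFSET_BITS) and bsize < (1 << BSIZE_BITS)


def pack_compact(data_offset, bsize):
    """Store both values in one 16 bit integer."""
    if not fits_compact(data_offset, bsize):
        raise ValueError("tuple is too big for the compact form")
    return (data_offset << BSIZE_BITS) | bsize


def unpack_compact(packed):
    """Recover (data_offset, bsize) from the packed value."""
    return packed >> BSIZE_BITS, packed & ((1 << BSIZE_BITS) - 1)


packed = pack_compact(12, 40)
restored = unpack_compact(packed)

regular_bytes = 2 + 4  # 16 bit offset + 32 bit bsize
compact_bytes = 2      # one 16 bit integer
saved_bytes = regular_bytes - compact_bytes
```

A tuple that does not pass `fits_compact()` would keep using the regular
16 and 32 bit members.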

Compact tuples are used for memtx and runtime tuples. They are not
implemented for vinyl, because in contrast to memtx vinyl stores
engine specific fields after struct tuple and thus requires
different approach for compact tuple.

Part of #5385

74177dd8

box: make a pair of struct tuple's members private · 65c4d37e

Aleksandr Lyapunov authored 3 years ago

There are tuple_offset and bsize members in the tuple. For better code
encapsulation in general and for further changes in particular, they
should be encapsulated within struct tuple. This commit provides
that.

There are no functional changes in this commit.

Part of #5385

65c4d37e

box: move is_dirty flag in tuple to appropriate section · 24c31bff
Aleksandr Lyapunov authored 3 years ago
```
Now we have a place in tuple for different flags, move is_dirty
there.

Part of #5385
```
24c31bff

box: rework tuple reference count · 45269211

Aleksandr Lyapunov authored 3 years ago

Tuples usually have a very low reference counter (I bet the
majority of tuples have it less than 10), and we may rely on this
fact for optimization. On the other hand, it is not actually
prohibited for a tuple to have a big reference counter, thus the
code must handle it properly.

The obvious solution is to store a narrow reference counter right
in struct tuple, and store it somewhere else if it hits a threshold.

The previous implementation had a 15 bit counter and a 1 bit flag
indicating that the actual counter is stored in a separate array. That
worked fine, except that 15 bits are still overkill for real reference
counts. And that solution introduced unions into struct tuple, which in
turn, generally speaking, causes UB, since by the standard it is
UB to access one union member after setting another.

The new solution is to store an 8 bit counter and a 1 bit flag. The
external storage is a hash table to which a portion of the
counter is uploaded (or from which it is acquired) very seldom. That
makes the counter in the tuple more compact, rather fast (and even the
fastest for low reference counter values), and free of limitations such
as a limited count of tuples that can have big reference counts.
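
A rough Python model of such a counter is given below. It is a sketch of the
general scheme only: the names, the portion size of 128 and the exact moments
at which a portion is uploaded or acquired are assumptions made for the
illustration, not details taken from the patch.

```python
LOCAL_MAX = 255  # the largest value an 8 bit counter can hold
PORTION = 128    # hypothetical size of the chunk moved to/from the table

external_refs = {}  # the "hash table": tuple identity -> parked references


class Tuple:
    def __init__(self):
        self.local_refs = 0             # models the 8 bit counter
        self.has_uploaded_refs = False  # models the 1 bit flag

    def total_refs(self):
        return self.local_refs + external_refs.get(id(self), 0)

    def ref(self):
        if self.local_refs == LOCAL_MAX:
            # Rare slow path: upload a portion of the counter.
            external_refs[id(self)] = external_refs.get(id(self), 0) + PORTION
            self.local_refs -= PORTION
            self.has_uploaded_refs = True
        self.local_refs += 1

    def unref(self):
        """Drop one reference; return True if the tuple can be freed."""
        if self.local_refs == 1 and self.has_uploaded_refs:
            # Rare slow path: acquire a portion back from the table.
            remaining = external_refs[id(self)] - PORTION
            self.local_refs += PORTION
            if remaining == 0:
                del external_refs[id(self)]
                self.has_uploaded_refs = False
            else:
                external_refs[id(self)] = remaining
        self.local_refs -= 1
        return self.total_refs() == 0


t = Tuple()
for _ in range(300):
    t.ref()
local_after_refs = t.local_refs
total_after_refs = t.total_refs()
freed = [t.unref() for _ in range(300)]
```

The fast path touches only the small in-tuple counter; the table is visited
once per `PORTION` references in the worst case.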

Part of #5385

45269211

vinyl: use simple struct member for n_upserts · 60a2e9b9

Aleksandr Lyapunov authored 3 years ago

Now it is stored in memory right before struct vy_stmt,
consuming 8 bytes per tuple (due to alignment).

It is much simpler and completely free to store it right in
a member of vy_stmt.

Part of #5385

60a2e9b9

vinyl: save 8 bytes in struct vy_stmt · b9c598ec

Aleksandr Lyapunov authored 3 years ago

Due to the C/C++ standard layout, sizeof(struct vy_stmt) was 32 bytes.
That is a pity since it has only 20 bytes of payload (10 bytes for the base
struct tuple and 10 for lsn (8) + type (1) + flags (1)).

Repack struct vy_stmt to be 24 bytes long.
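
The effect of member order on the structure size can be reproduced with
Python's ctypes, which follows the platform C layout rules. The field order
in both structures is an assumption made for the illustration (the 10 byte
base is modelled as a byte array), and the sizes hold on platforms where a
64 bit integer is 8 byte aligned.

```python
import ctypes


class VyStmtBefore(ctypes.Structure):
    # Hypothetical original order: the 8 byte lsn right after a 10 byte base.
    _fields_ = [
        ("base", ctypes.c_uint8 * 10),  # stands in for struct tuple
        ("lsn", ctypes.c_int64),        # aligned to 8: 6 bytes of padding
        ("type", ctypes.c_uint8),
        ("flags", ctypes.c_uint8),      # followed by 6 bytes of tail padding
    ]


class VyStmtAfter(ctypes.Structure):
    # Hypothetical repacked order: small members fill the hole after base.
    _fields_ = [
        ("base", ctypes.c_uint8 * 10),
        ("type", ctypes.c_uint8),
        ("flags", ctypes.c_uint8),      # followed by 4 bytes of padding
        ("lsn", ctypes.c_int64),
    ]


size_before = ctypes.sizeof(VyStmtBefore)
size_after = ctypes.sizeof(VyStmtAfter)
lsn_offset_before = VyStmtBefore.lsn.offset
lsn_offset_after = VyStmtAfter.lsn.offset
```

Both layouts carry the same 20 bytes of payload; only the padding differs.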

Part of #5385

b9c598ec

perf: introduce tuple perf test · 3dea259c
Aleksandr Lyapunov authored 3 years ago
```
Part of #5385
```
3dea259c

Aug 11, 2021

luajit: bump new version · e42d116f

Igor Munkin authored 3 years ago

* ARM64: Fix write barrier in BC_USETS.
* Linux/ARM64: Make mremap() non-moving due to VA space woes.
* Add support for full-range 64 bit lightuserdata.

Closes #2712
Needed for #6154
Part of #5629

e42d116f

replication: fill replicaset.applier.vclock after local recovery · 68851b35

Yan Shtunder authored 3 years ago

replicaset.applier.vclock is initialized in replication_init(),
which happens before local recovery. If some changes
come from an instance via replication, applier.vclock will be equal to 0.
This means that if some wild master sends this node already applied
data, the node will apply the same data twice.
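
The problem can be shown with a small Python model. It assumes that an
applier skips every row whose LSN is not greater than the LSN stored in its
vclock for the same replica id; the function and variable names are
hypothetical.

```python
def apply_rows(applier_vclock, rows):
    """Apply rows newer than applier_vclock; return the rows applied."""
    applied = []
    for replica_id, lsn in rows:
        if lsn <= applier_vclock.get(replica_id, 0):
            continue  # already applied, skip the row
        applied.append((replica_id, lsn))
        applier_vclock[replica_id] = lsn
    return applied


# Local recovery has already restored rows 1..5 of replica 1,
# and a master sends rows 4..6 again.
incoming = [(1, 4), (1, 5), (1, 6)]

# The applier vclock was set before recovery and is still zero.
applied_with_zero_vclock = apply_rows({}, incoming)

# The applier vclock is filled after local recovery.
applied_with_recovered_vclock = apply_rows({1: 5}, incoming)
```

With the zero vclock rows 4 and 5 are applied a second time; with the
recovered vclock only row 6 gets through.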

Closes #6028

68851b35

Aug 10, 2021

sql: remove unnecessary function initialization · dd7fa342

Mergen Imeev authored 3 years ago

After removing the SQL built-in functions from _func, the code used to
initialize these SQL built-in functions is no longer used and should be
removed.

Follow-up #6106

dd7fa342

alter: disallow creation of SQL built-in function · c49eab90
Mergen Imeev authored 3 years ago
```
This patch prohibits creation of user-defined functions with SQL_BUILTIN
engine.

Closes #6106
```
c49eab90

alter: parse data dictionary version · e925537d

Vladislav Shpilevoy authored 7 years ago

Version is needed to disallow creation of SQL built-in functions using
_func starting with 2.9.0.

Needed for #6106

e925537d

sql: remove SQL built-in functions from _func · 970062d7

Mergen Imeev authored 3 years ago

This patch removes SQL built-in functions from _func. These functions
could be called directly from Lua, however all they did was return an
error. After this patch, no SQL built-in functions can be called
directly from Lua.

Part of #6106

970062d7

sql: introduce sql_func_find() · c8c56b14

Mergen Imeev authored 3 years ago

This patch introduces the sql_func_find() function. This function allows
us to centralize the look up of functions during parsing, which
simplifies code and fixes some incorrect error messages.

Part of #6106

c8c56b14

sql: introduce sql_func_flags() · 2abb5732

Mergen Imeev authored 3 years ago

This function returns a set of parameters for the function with the
given name. This function is used when we do not need to call a
function, but we need its parameters.

In addition, this function will allow us to split the parameters
between those that are the same for all implementations, and the
parameters, the value of which is implementation-dependent.

Needed for #6105
Part of #6106

2abb5732

Aug 09, 2021

test: remove unused cases · ed7da7e6

Leonid Vasiliev authored 3 years ago

After changing the way symbols are exported, handling several
cases in the "ssl-cert-paths-discover" test is no longer necessary.
Let's remove it.

Part of #5932

ed7da7e6

cmake: wrap the symbols used in the "ssl-cert-paths-discover" test · 64807be3

Leonid Vasiliev authored 3 years ago

Wrap the symbols used in the "ssl-cert-paths-discover" test to
avoid clashes.

Symbols from openssl have been wrapped to:

    crypto_X509_get_default_cert_dir_env
    crypto_X509_get_default_cert_file_env

Tarantool symbols have been prefixed by "tnt_":

    tnt_ssl_cert_paths_discover
    tnt_default_cert_dir_paths
    tnt_default_cert_file_paths

Part of #5932

64807be3

cmake: wrap the exported readline function · ffdb9f24
Leonid Vasiliev authored 3 years ago
```
Wrap the exported readline function to avoid clash of symbols.

Part of #5932
```
ffdb9f24