Commits · 26f7056f9c129d93308683c29e32bd602302f59f · core / tarantool

Aug 17, 2022

replication: fix downstream lag growing when there's no new transactions · a167a070

Serge Petrenko authored 2 years ago

Downstream lag is the difference in time between the moment a
transaction was written to master's WAL and the moment an ack for it
arrived.

Its calculation is supported by replicas sending the last applied row
timestamp. When there is no replication, the last applied row timestamp
stays the same, so in this case downstream lag grows as time passes.

Once an old master is replaced by a new one, it notices changes in peer
vclocks and tries to update downstream lag unconditionally. This makes
the lag appear to be growing indefinitely, showing the time since the
last transaction on the old master:

```
 downstream:
   status: follow
   idle: 0.018218606001028
   vclock: {1: 3, 2: 2}
   lag: 34.623061401367
```
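
The guard against a stale timestamp can be sketched as follows. This is a minimal illustration, not the code of the patch: `struct lag_state`, `lag_on_ack()` and the rule "refresh the lag only when the ack confirms a new row" are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-replica state; not Tarantool's actual relay structure. */
struct lag_state {
	/* Last LSN acknowledged by the replica. */
	int64_t acked_lsn;
	/* Last computed downstream lag, in seconds. */
	double lag;
};

/*
 * Handle an ack from a replica.
 * row_tm is the time the last applied row was written to the master's WAL,
 * now is the time the ack arrived.
 * The lag is refreshed only if the ack confirms a row that was not
 * confirmed before. Returns true if the lag was updated.
 */
bool
lag_on_ack(struct lag_state *state, int64_t ack_lsn, double row_tm, double now)
{
	assert(now >= row_tm);
	if (ack_lsn <= state->acked_lsn)
		return false;
	state->acked_lsn = ack_lsn;
	state->lag = now - row_tm;
	return true;
}
```

With such a guard, a repeated ack for an already confirmed row leaves the lag untouched instead of reporting the time since the last transaction.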

The commit 56571d83 ("raft: make followers notice leader hang")
made relay exchange information with tx even when there are no new
transactions, so the issue became even easier to reproduce.

The issue itself was present since downstream lag introduction in commit
29025bce ("relay: provide information about downstream lag").

Closes #7581

NO_DOC=bugfix

log: free resources while event loop is running · 0c3f9b37

Cyrill Gorcunov authored 2 years ago


The 'log' module uses fibers internally for log rotation, and
before we can free the log's resources (on program exit) we need to wait
until rotation is complete, which implies that the event loop is still
running. But we break the event loop in the `on_shutdown_f` trigger, and
calling any event-based functionality later causes unexpected results
because fibers are no longer valid to use. Thus move the `say_logger_free`
call into the `on_shutdown_f` body, where fibers are still alive.

N.B. Testing the issue is sensitive to timings: local tests showed that
a minimal delay of 1ms is enough to trigger it, thus ERRINJ_LOG_ROTATE
is increased.

Fixes #4450

NO_DOC=bugfix

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

Aug 16, 2022

fiber: do not crash on concurrent fiber:join() · 8f4538cb

Ilya Verbin authored 2 years ago

If two or more fibers are yielding in fiber_join_timeout(), one of them
will eventually join and recycle the fiber, while the rest will crash
on accessing the recycled fiber's struct. Fix this by doing fiber_find()
again after each waiting attempt in lbox_fiber_join().

Closes #7489
Closes #7531

NO_DOC=bugfix

fiber: introduce fiber_wait_on_deadline() · 73e1059d

Ilya Verbin authored 2 years ago

It is separated from fiber_join_timeout(), and will be used
in lbox_fiber_join() too.

Part of #7489
Part of #7531

NO_DOC=internal
NO_CHANGELOG=internal

Aug 15, 2022

test: extend the fiber.info() backtraces test · fc11ce06

Ilya Verbin authored 2 years ago

Test that an expected Lua function can be found in one of the frames.
The C function is already covered by this test.

Closes #7535

NO_DOC=test
NO_CHANGELOG=test

cmake: normalize ENABLE_BACKTRACE option · 3a6021ea

Ilya Verbin authored 2 years ago

CMake accepts the following case-insensitive values as true: 1, ON, YES,
TRUE, Y, or a non-zero number (including floating point numbers). This
complicates the parsing of ENABLE_BACKTRACE in `tarantool.build.options`.
Fix this by defining it to TRUE for any true value.
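
To show why every consumer of the raw option would otherwise have to repeat this logic, here is a rough C sketch of the truth test described above. `cmake_value_is_true()` and `str_ieq()` are hypothetical helpers written for this example, and only plain decimal notation is recognized as a number, which is a simplification of what CMake accepts.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Case-insensitive comparison of two NUL-terminated ASCII strings. */
static bool
str_ieq(const char *a, const char *b)
{
	while (*a != '\0' && *b != '\0') {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return false;
		a++;
		b++;
	}
	/* Equal only if both strings ended at the same position. */
	return *a == *b;
}

/* Return true if the value is one of the true constants listed above. */
bool
cmake_value_is_true(const char *value)
{
	static const char *const words[] = {"ON", "YES", "TRUE", "Y"};
	for (size_t i = 0; i < sizeof(words) / sizeof(words[0]); i++) {
		if (str_ieq(value, words[i]))
			return true;
	}
	/* Assumption: only plain decimal notation counts as a number. */
	if (value[0] == '\0' ||
	    strspn(value, "+-.0123456789") != strlen(value))
		return false;
	char *end;
	double number = strtod(value, &end);
	/* The whole string must be consumed and the number must be non-zero. */
	return end != value && *end == '\0' && number != 0.0;
}
```

Normalizing the option to TRUE once, at configuration time, removes the need for a check like this in every place that reads the value.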

Part of #7535

NO_DOC=internal
NO_CHANGELOG=internal

console: remove ERRINJ_STDIN_ISATTY injection · 16d6e9d2

Gleb Kashkin authored 2 years ago

As the underlying problem behind this injection is fixed in #7357, it can
be removed, and the `-i` flag can be used as initially intended.

Closes #7554
Requires #7357
NO_DOC=refactoring
NO_CHANGELOG=refactoring

Aug 11, 2022

raft: add strict fencing · 64ae9a08

Boris Stepanenko authored 2 years ago

With the current leader fencing implementation, the old leader doesn't
resign its leadership before a new leader may be elected. Because of
this, several "leaders" might coexist in a replicaset for some time.

This commit changes replication_disconnect_timeout so that it is twice
as short for the current raft leader (2*replication_timeout) if strict
fencing is enabled. Assuming that replication_timeout is the same for
every replica in the replicaset, this makes it less probable that a new
leader can be elected before the old one resigns its leadership.

Old fencing behaviour can be enabled by setting fencing to soft mode.
This is useful when connection death timeouts shouldn't be affected
(e.g. different replication_timeouts are set to prioritize some
replicas as leader over the others).

Closes #7110

@TarantoolBot document
Title: Strict fencing

In `box.cfg` option `election_fencing_enabled` is deprecated in favor
of `election_fencing_mode`. `election_fencing_mode` can be set to one
of the following values:
'off' - fencing turned off (same as `election_fencing_enabled` set to
false before).
Connection death timeout is 4*replication_timeout for all nodes.

'soft' (default) - fencing turned on, but connection death timeout is
the same for leader and followers in replicaset. This is enough to
solve cluster being readonly and not being able to elect a new leader
in some situations because of pre-vote.
Connection death timeout is 4*replication_timeout for all nodes.

'strict' - fencing turned on. In this mode leader tries its best to
resign leadership before new leader can be elected. This is achieved
by halving death timeout on leader.
Connection death timeout is 4*replication_timeout for followers and
2*replication_timeout for current leader.
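
The timeouts listed above reduce to a small function. The sketch below is illustrative only: the enum, the function name and its signature are assumptions made for this example, not Tarantool's internal API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three election_fencing_mode values. */
enum fencing_mode {
	FENCING_OFF,
	FENCING_SOFT,
	FENCING_STRICT,
};

/*
 * Connection death timeout, in seconds, for a node.
 * Only the current leader in strict mode gets the halved timeout.
 */
double
connection_death_timeout(enum fencing_mode mode, bool is_leader,
			 double replication_timeout)
{
	assert(replication_timeout > 0);
	if (mode == FENCING_STRICT && is_leader)
		return 2 * replication_timeout;
	return 4 * replication_timeout;
}
```

With `replication_timeout` equal to 1 second, a strict-mode leader gives up after 2 seconds, while its followers wait 4 seconds before considering the connection dead.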

luatest_helpers: add replication proxy · b907e726

Boris Stepanenko authored 2 years ago

Before, we used to modify box.cfg.replication to reproduce network
problems in our tests. This worked fine in most situations, but doesn't
work in others: when an instance gets disconnected by modifying
box.cfg.replication, it closes its connection immediately (in terms of
real time), and this is noticed almost immediately by its neighbours in
the replica set (because they receive EOF). This made it impossible to
test some things that rely on specific timeouts in our code (e.g. strict
fencing).

This commit adds a helper, which acts as a UNIX socket proxy and can
block a connection transparently for tarantool instances. It makes it
possible to write some tests that were not possible before. It is also
possible to inject arbitrary packets between instances that are
interconnected via the proxy.

Usage:

                  +-------------------+
                  |tarantool server 1 |
                  +-------------------+
                            |
                            |
                            |
                   .-----------------.
                  (   /tmp/test-out   )
                   `-----------------'
                            |
                            |
                            |
                  +-------------------+
                  |       proxy       |
                  +-------------------+
                            |
                            |
                            |
                   .-----------------.
          +-------(   /tmp/test-in    )--------+
          |        `-----------------'         |
          |                                    |
          |                                    |
          |                                    |
+-------------------+                +-------------------+
|tarantool server 2 |                |tarantool server 3 |
+-------------------+                +-------------------+

tarantool server 1 init.lua:
box.cfg{listen = '/tmp/test-out'}
box.once("schema", function()
    box.schema.user.grant('guest', 'super')
end)

tarantool server 2 and tarantool server 3 init.lua:
box.cfg{replication = '/tmp/test-in'}

proxy init.lua:
-- Import proxy helper
Proxy = require('test.luatest_helpers.proxy.proxy')

-- Create proxy, which will (when started) listen on client_socket_path
-- and accept connection when client tries to connect. The accepted
-- socket connection is then passed to new Connection instance.
proxy = Proxy:new({
    -- Path to UNIX socket, where proxy will await new connections.
    client_socket_path = '/tmp/test-in',

    -- Path to UNIX socket where tarantool server is listening.
    server_socket_path = '/tmp/test-out',

    -- Table, describing how to process client socket. Optional.
    -- Defaults used and described:
    process_client = {
        -- function(connection) which, if not nil, will be called once
        -- before client socket processing loop.
        pre = nil,

        -- function(connection, data) which, if not nil, will be called
        -- in loop, when new data is received from client socket.
        -- Connection.forward_to_server(connection, data) will:
        -- 1) Connect server socket to server_socket_path, if server
        --    socket is not connected.
        -- 2) Write data to server socket, if connected and writable.
        func = Connection.forward_to_server,

        -- function(connection) which, if not nil, will be called once
        -- after client socket processing loop.
        -- Connection.close_client_socket(connection) will shutdown and
        -- close client socket, if it is connected.
        post = Connection.close_client_socket,
    },

    -- Table, describing how to process server socket. Optional.
    -- Defaults used and described:
    process_server = {
        -- function(connection) which, if not nil, will be called once
        -- before server socket processing loop.
        pre = nil,

        -- function(connection, data) which, if not nil, will be called
        -- in loop, when new data is received from server socket.
        -- Connection.forward_to_client(connection, data) will write data
        -- to client socket, if it is connected and writable
        func = Connection.forward_to_client,

        -- function(connection) which, if not nil, will be called once
        -- after server socket processing loop.
        -- Connection.close_server_socket(connection) will shutdown and
        -- close server socket, if it is connected.
        post = Connection.close_server_socket,
    }

})

-- Bind client socket (defined by proxy.client_socket_path) and start
-- accepting connections on it in a new fiber. If opts.force is set to
-- true, it will remove proxy.client_socket_path file before binding to
-- it. After proxy is started it will accept client connections and
-- create Connection instance for each connection.
proxy:start({force = false})

-- Stop accepting new connections on client socket and join the fiber,
-- created by proxy:start(), and close client socket. Also stop all
-- active connections (see Connection:stop()).
proxy:stop()

-- Pause accepting new connections and pause all active connections (see
-- Connection:pause()).
proxy:pause()

-- Resume accepting new connections and resume all paused connections
-- (see Connection:resume())
proxy:resume()

-- Connection class:
Connection:new({
    {
        -- Socket which is already created (by Proxy class for example).
        -- Optional, may be nil.
        client_socket = '?table',

        -- Path to connect server socket to. Will try to connect on
        -- initialization, and in Connection.forward_to_server.
        -- Can connect manually by calling
        -- Connection:connect_server_socket().
        server_socket_path = 'string',

        -- See Proxy:new()
        process_client = '?table',

        -- See Proxy:new()
        process_server = '?table',
    },
})

-- Start processing client socket, using functions from
-- Connection:process_client.
Connection:start()

-- Connect server socket to Connection.server_socket_path (if not
-- connected already). Start processing server socket, if successfully
-- connected (using functions from Connection.process_server).
Connection:connect_server_socket()

-- Pause processing packets (both incoming from client socket and server
-- socket).
Connection:pause()

-- Resume processing packets (both incoming from client socket and
-- server socket).
Connection:resume()

-- Close server socket, if open.
Connection:close_server_socket()

-- Close client socket, if open.
Connection:close_client_socket()

-- Close client and server sockets, if open, and wait for processing
-- fibers to die.
Connection:stop()

NO_DOC=test helpers
NO_CHANGELOG=test helpers

Aug 09, 2022

console: fix -i being overruled by !isatty() · 9965e3fe

Gleb Kashkin authored 2 years ago

Interactive mode used to be ignored when stdin was not a tty; this is no
longer the case. Now the output of another command can be handled by
tarantool.
Before the patch:
```
$ echo 42 | tarantool -i
LuajitError: stdin:1: unexpected symbol near '42'
fatal error, exiting the event loop
```

After the patch:
```
$ echo 42 | tarantool -i
Tarantool 2.5.0-130-ge3cf64a6c
type 'help' for interactive help
tarantool> 42
---
- 42
...

```

Closes #5064

NO_DOC=bugfix

test: fix reading STDIN command on openSUSE · ea07854e

Gleb Kashkin authored 2 years ago

Inspired by gh-5064, which breaks the previous version of the test on
openSUSE. When using `io.popen:write()` on tarantool with the `-i` flag,
it failed to run the command on openSUSE. This happened because before
the gh-5064 patch tarantool used `luaL_loadfile()`, which interprets EOF
as the end of the command, whereas when the command is loaded as a
string, openSUSE expects it to end with '\n'.

Needed for #5064
NO_DOC=test fix
NO_TEST=test fix
NO_CHANGELOG=test fix

Aug 08, 2022

util: introduce strlcat utility function · dcd9be4a

Ilya Verbin authored 2 years ago

strlcat is a function from BSD, which is designed to be a safer, more
consistent, and less error-prone replacement for strcat and strncat.
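
For reference, the BSD semantics can be sketched as below. This is not Tarantool's implementation; the function is named `sketch_strlcat()` here to avoid clashing with a libc that already provides `strlcat`. It NUL-terminates the result whenever `dst` is a valid string within `size` bytes, and returns the length the full concatenation would have had, so truncation can be detected by comparing the return value with `size`.

```c
#include <stddef.h>
#include <string.h>

/*
 * Append src to dst, where size is the full size of the dst buffer.
 * At most size - strlen(dst) - 1 bytes are copied.
 * Returns strlen(src) plus the initial length of dst (bounded by size).
 */
size_t
sketch_strlcat(char *dst, const char *src, size_t size)
{
	size_t src_len = strlen(src);
	/* Length of dst, bounded by size in case dst is not terminated. */
	size_t dst_len = 0;
	while (dst_len < size && dst[dst_len] != '\0')
		dst_len++;
	if (dst_len == size)
		return size + src_len;
	/* Room left for new bytes, keeping one byte for the terminator. */
	size_t room = size - dst_len - 1;
	size_t copy_len = src_len < room ? src_len : room;
	memcpy(dst + dst_len, src, copy_len);
	dst[dst_len + copy_len] = '\0';
	return dst_len + src_len;
}
```

A return value greater than or equal to `size` means the output was truncated, which is the check that plain `strcat` and `strncat` do not offer.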

NO_DOC=internal
NO_CHANGELOG=internal

Part of #7534

Aug 05, 2022

memtx: fix dirty data written to snapshot for hash index · 64d87e88

Vladimir Davydov authored 2 years ago

The hash index doesn't create a snapshot clarifier, which is used for
filtering out uncommitted tuples from a snapshot. Fix this. Also fix
a bug in hash_snapshot_iterator_next, where we passed a wrong argument
to tuple_data_range. It hasn't fired, because the clarifier didn't work.

Fixes commit ee8ed065 ("txm: clarify all fetched tuples").
Fixes commit f167c1af ("memtx: decompress tuples in snapshot
iterator").

Closes #7539

NO_DOC=bug fix

memtx: fix handling of corner cases gap tracking in transaction manager · 7360281e

Georgiy Lebedev authored 2 years ago


Gap tracking does not handle gap writes when the key has the same value as
the gap item: review the whole gap write handling logic, refactor it and
fix handling of corner cases along the way.

Co-authored-by: Alexander Lyapunov <alyapunov@tarantool.org>

Closes #7375

NO_DOC=bugfix

memtx: fix tree iterator `next` result clarification · 542f9525

Georgiy Lebedev authored 2 years ago

The problem is described in #7073. It was fixed only for
`tree_iterator_start_raw` next method, but other methods used for reverse
iterators are also subject to this bug: move tuple clarification from the
wrapper of iterator `next` methods to individual iterator methods.

Closes #7432

NO_DOC=bugfix

lua/decimal: add Lua value accessors to module API · c75fbce1

Alexander Turenko authored 2 years ago

The Rust module (see the issue) needs a getter and a setter for decimal
values on the Lua stack. Let's make them part of the module API.

Part of #7228

@TarantoolBot document
Title: Lua/C functions for decimals in the module API

The following functions are added into the module API:

```c
/**
 * Allocate a new decimal on the Lua stack and return
 * a pointer to it.
 */
API_EXPORT box_decimal_t *
luaT_newdecimal(struct lua_State *L);

/**
 * Allocate a new decimal on the Lua stack with copy of given
 * decimal and return a pointer to it.
 */
API_EXPORT box_decimal_t *
luaT_pushdecimal(struct lua_State *L, const box_decimal_t *dec);

/**
 * Check whether a value on the Lua stack is a decimal.
 *
 * Returns a pointer to the decimal on a successful check,
 * NULL otherwise.
 */
API_EXPORT box_decimal_t *
luaT_isdecimal(struct lua_State *L, int index);
```

raft: make followers notice leader hang · 56571d83

Serge Petrenko authored 2 years ago

It's possible to hang an instance by some non-yielding request. The
simplest example is `while true do end`. A more true to life one would
be a `select{}` from a large space, or `pairs` iteration over a space
without yields.

Any such request makes the instance unresponsive - it can serve neither
reads nor writes. At the same time, the instance appears alive to other
cluster members: relay thread used to communicate with others is not
hung and continues to send heartbeats every replication_timeout.

The problem is the most severe with Raft leader elections: followers
believe the leader is fine and do not start elections despite leader
being unable to serve reads or writes.

Closes #7512

NO_DOC=bugfix

Aug 04, 2022

salad: rework bps tree read view API · 91caa388

Vladimir Davydov authored 2 years ago

Currently, there's no notion of a BPS tree read view per se - one can
create an iterator over a regular tree and then "freeze" it. This works
just fine for snapshotting and joining replicas, but this spartan API
doesn't let us implement user read views, because to do that we need to
do lookups and create iterators over a frozen tree as many times as we
want, not just once.

So this patch introduces a concept of bps_tree_view, which contains a
frozen image of a bps_tree and implements a subset of non-modifying
bps_tree methods:

 - bps_tree_view_size
 - bps_tree_view_find
 - bps_tree_view_first
 - bps_tree_view_last
 - bps_tree_view_lower_bound
 - bps_tree_view_lower_bound_elem
 - bps_tree_view_upper_bound
 - bps_tree_view_upper_bound_elem
 - bps_tree_view_iterator_get_elem
 - bps_tree_view_iterator_prev
 - bps_tree_view_iterator_next
 - bps_tree_view_iterator_is_equal

Note, bps_tree and bps_tree_view share bps_tree_iterator, because
iterator methods (get_elem, next, prev, is_equal) take bps_tree or
bps_tree_view. The bps_tree_iterator now contains only block index and
offset.

We could also implement the rest of non-modifying methods, but didn't do
that, because they are not needed to implement user read views:

 - bps_tree_random
 - bps_tree_approximate_count
 - bps_tree_debug_check
 - bps_tree_print

To create a bps_tree_view from a bps_tree, one is supposed to call
bps_tree_view_create. If a bps_tree_view is no longer needed, it should
be destroyed with bps_tree_view_destroy.

Old methods used for creating frozen iterators were dropped:

 - bps_tree_iterator_freeze
 - bps_tree_iterator_destroy

To avoid code duplication, we factored out the common part of bps_tree
and bps_tree_view into a new structure, named bps_tree_common.
Basically, the new structure contains all bps_tree members except
matras, which is stored in bps_tree. The difference between
bps_tree_view and bps_tree is that the former stores matras_view
instead of matras. The common part contains pointers to matras and
matras_view, which are used by internal implementation to look up
bps_tree blocks.

All internal methods now take bps_tree_common instead of bps_tree.
For all public methods that are implemented both for bps_tree and
bps_tree_view, we have the common implementation defined in _impl
suffixed private function, which is called by the corresponding public
functions.

To ensure that a modifying method isn't called on bps_tree_common object
corresponding to a bps_tree_view because of a bug in the bps_tree
implementation, we added !matras_is_read_view_created assertion to
bps_tree_touch_block.

Closes #7191

NO_DOC=refactoring
NO_CHANGELOG=refactoring

Use bps_tree_size instead of accessing size directly · 5855fd30

Vladimir Davydov authored 2 years ago

We have a method for getting the number of elements stored in a BPS
tree. Let's use it instead of accessing BPS tree internals directly
so that we can freely refactor BPS tree internals.

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

salad: rename a few bps_tree methods · c84990ad

Vladimir Davydov authored 2 years ago

 - Rename bps_tree_iterator_are_equal to bps_tree_iterator_is_equal for
   consistency with other methods that check two objects for equality
   (for example, tt_uuid_is_equal).

 - Rename bps_tree_iterator_first and bps_tree_iterator_last to
   bps_tree_first and bps_tree_last, because these are methods of
   bps_tree, not bps_tree_iterator. Omitting _iterator is also
   consistent with bps_tree_lower_bound and bps_tree_upper_bound
   methods, which also create bps_tree_iterator objects.

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

memtx: fix reverse iterators gap tracking · fda38e66

Georgiy Lebedev authored 2 years ago

In case of reverse iterators, due to index limitations, we need to clarify
the successor tuple early: this implies that the successor's story is not
always at the top of the history chain, whilst we need to add the gap item
to the story currently present in index — fix this by reusing the
iterators' check logic to set the current iterator's tuple (which is
considered the successor) to a tuple in index.

Closes #7409

NO_DOC=bugfix

memtx: fix HASH index 'GT' iterator `next` method set incorrectly · 5c0d7117

Georgiy Lebedev authored 2 years ago

The `next` method of memtx HASH index 'GT' iterator is initially set to
'GT' and is supposed to be set to 'GE' after first iteration: it is
mistakenly set to the 'base' method instead of the full method which also
does tuple clarification — this allows dirty reads. Move the `next` method
change on first iteration to `WRAP_ITERATOR_METHOD` for clarity and
correctness.

Closes #7477

NO_DOC=bugfix

salad: rework light hash read view API · b595f212

Vladimir Davydov authored 2 years ago

Currently, there's no notion of a LIGHT hash table read view per se -
one can create an iterator over a regular hash table and then "freeze"
it. This works just fine for snapshotting and joining replicas, but this
spartan API doesn't let us implement user read views, because to do that
we need to do lookups and create iterators over a frozen hash table as
many times as we want, not just once.

So this patch introduces a concept of LIGHT(view), which contains a
frozen image of a LIGHT(core) and implements a subset of non-modifying
LIGHT(core) methods:

 - LIGHT(view_count)
 - LIGHT(view_find)
 - LIGHT(view_find_key)
 - LIGHT(view_get)
 - LIGHT(view_iterator_begin)
 - LIGHT(view_iterator_key)
 - LIGHT(view_iterator_get_and_next)

Note, LIGHT(core) and LIGHT(view) share LIGHT(iterator), because
iterator methods (begin, key, get_and_next) take LIGHT(core) or
LIGHT(view). The LIGHT(iterator) now contains only a hash table slot.

We could also implement the rest of non-modifying methods, but didn't do
that, because they are not needed to implement user read views:

 - LIGHT(random)
 - LIGHT(selfcheck)

To create a LIGHT(view) from a LIGHT(core), one is supposed to call
LIGHT(view_create). If a LIGHT(view) is no longer needed, it should be
destroyed with LIGHT(view_destroy).

Old methods used for creating frozen iterators were dropped:

 - LIGHT(iterator_freeze)
 - LIGHT(iterator_destroy)

To avoid code duplication, we factored out the common part of
LIGHT(core) and LIGHT(view) into a new structure, named LIGHT(common).
Basically, the new structure contains all LIGHT(core) members except
matras, which is stored in LIGHT(core). The difference between
LIGHT(view) and LIGHT(core) is that the former stores matras_view
instead of matras. The common part contains pointers to matras and
matras_view, which are used by internal implementation to look up
LIGHT(record).

All internal methods now take LIGHT(common) instead of LIGHT(core).
For all public methods that are implemented both for LIGHT(core) and
LIGHT(view), we have the common implementation defined in _impl suffixed
private function, which is called by the corresponding public functions.

To ensure that a modifying method isn't called on LIGHT(common) object
corresponding to a LIGHT(view) because of a bug in the LIGHT code, we
added !matras_is_read_view_created assertion to LIGHT(touch_record),
LIGHT(prepare_first_insert), and LIGHT(grow).

Closes #7192

NO_DOC=refactoring
NO_CHANGELOG=refactoring

salad: add LIGHT(count) method · 77b552fe

Vladimir Davydov authored 2 years ago

This commit adds a function that retrieves the number of records stored
in a light hash table and makes light users use it instead of accessing
the light count directly. This gives us more freedom of refactoring the
light internals without modifying the code using it.

Needed for #7192

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

prbuf: fix prbuf_open for empty buffer · 943ce3ca

Vladimir Davydov authored 2 years ago

prbuf_check, which is called by prbuf_open, proceeds to scan the
buffer even if it's empty. On a debug build, this results in prbuf_open
reporting that the buffer is corrupted, because we trash the buffer in
prbuf_create. On a release build, this may lead to a hang, in case the
buffer is zeroed out. Let's fix this by returning success from
prbuf_check if the buffer is empty. Note, prbuf_iterator_next doesn't
call prbuf_first_record if the buffer is empty, either.

Needed for https://github.com/tarantool/tarantool-ee/issues/187

NO_DOC=bug fix
NO_CHANGELOG=will be added to EE

decimal: add the library into the module API · 5c1bc3da

Alexander Turenko authored 2 years ago

The main decision made in this patch is how large the public
`box_decimal_t` type should be. Let's look at some calculations.

We're interested in the following values.

* How many decimal digits are stored?
* Size of an internal decimal type (`sizeof(decimal_t)`).
* Size of a buffer to store a string representation of any valid
  `decimal_t` value.
* Largest signed integer type fully represented in decimal_t (number of
  bits).
* Largest unsigned integer type fully represented in decimal_t (number
  of bits).

Now `decimal_t` is defined to store 38 decimal digits. It means the
following values:

| digits | sizeof | string | int???_t | uint???_t |
| ------ | ------ | ------ | -------- | --------- |
| 38     | 36     | 52     | 127      | 126       |

In fact, decNumber (the library we currently use under the hood) allows
varying the 'decimal digits per unit' parameter, which is 3 by default,
so we can choose density of the representation. For example, for given
38 digits the sizeof is 36 by default, but it may vary from 28 to 47
bytes:

| digits | sizeof     | string | int???_t | uint???_t |
| ------ | ---------- | ------ | -------- | --------- |
| 38     | 36 (28-47) | 52     | 127      | 126       |

If we want to store `int128_t` and `uint128_t` ranges, we'll need 39
digits:

| digits | sizeof     | string | int???_t | uint???_t |
| ------ | ---------- | ------ | -------- | --------- |
| 39     | 36 (29-48) | 53     | 130      | 129       |

If we want to store `int256_t` and `uint256_t` ranges:

| digits | sizeof     | string | int???_t | uint???_t |
| ------ | ---------- | ------ | -------- | --------- |
| 78     | 62 (48-87) | 92     | 260      | 259       |

If we want to store `int512_t` and `uint512_t` ranges:

| digits | sizeof       | string | int???_t | uint???_t |
| ------ | ------------ | ------ | -------- | --------- |
| 155    | 114 (84-164) | 169    | 515      | 514       |

The decision here is what we consider possible and what unlikely.
The patch freezes the maximum amount of bytes in `decimal_t` as 64. So
we'll be able to store 256 bit integers and will NOT be able to store
512 bit integers in the future (without ABI breakage at least).

The script, which helps to calculate those tables, is at the end of the
commit message.

Next, how else does the `box_decimal_*()` library differ from the
internal `decimal_*()`?

* Added a structure that may hold any decimal value from any current or
  future tarantool version.
* Added `box_decimal_copy()`.
* Left `strtodec()` out of scope -- we can add it later.
* Left `decimal_str()` out of scope -- it looks dangerous without at
  least a good explanation when data in the static buffer are
  invalidated. There is `box_decimal_to_string()` that writes to an
  explicitly provided buffer.
* Added `box_decimal_mp_*()` for encoding to/decoding from msgpack.
  Unlike `mp_decimal.h` functions, here we always have `box_decimal_t`
  as the first parameter.
* Left `decimal_pack()` out of scope, because a user is unlikely to want
  to serialize a decimal value piece-by-piece.
* Exposed `decimal_unpack()` as `box_decimal_mp_decode_data()` to keep a
  consistent terminology around msgpack encoding/decoding.
* More detailed API description, grouping by functionality.

The script, which helps to calculate sizes around `decimal_t`:

```lua
-- See notes in decNumber.h.

-- DECOPUN: DECimal Digits Per UNit
local function unit_size(DECOPUN)
    assert(DECOPUN > 0 and DECOPUN < 10)
    if DECOPUN <= 2 then
        return 1
    elseif DECOPUN <= 4 then
        return 2
    end
    return 4
end

function sizeof_decimal_t(digits, DECOPUN)
    -- int32_t digits;
    -- int32_t exponent;
    -- uint8_t bits;
    -- <..padding..>
    -- <..units..>
    local us = unit_size(DECOPUN)
    local padding = us - 1
    local unit_count = math.ceil(digits / DECOPUN)
    return 4 + 4 + 1 + padding + us * unit_count
end

function string_buffer(digits)
    -- -9.{9...}E+999999999# (# is '\0')
    -- ^ ^      ^^^^^^^^^^^^
    return digits + 14
end

function binary_signed(digits)
    local x = 1
    while math.log10(2 ^ (x - 1)) < digits do
        x = x + 1
    end
    return x - 1
end

function binary_unsigned(digits)
    local x = 1
    while math.log10(2 ^ x) < digits do
        x = x + 1
    end
    return x - 1
end

function digits_for_binary_signed(x)
    return math.ceil(math.log10(2 ^ (x - 1)))
end

function digits_for_binary_unsigned(x)
    return math.ceil(math.log10(2 ^ x))
end

function summary(digits)
    print('digits', digits)
    local sizeof_min = math.huge
    local sizeof_max = 0
    local DECOPUN_sizeof_min
    local DECOPUN_sizeof_max
    for DECOPUN = 1, 9 do
        local sizeof = sizeof_decimal_t(digits, DECOPUN)
        print('sizeof', sizeof, 'DECOPUN', DECOPUN)
        if sizeof < sizeof_min then
            sizeof_min = sizeof
            DECOPUN_sizeof_min = DECOPUN
        end
        if sizeof > sizeof_max then
            sizeof_max = sizeof
            DECOPUN_sizeof_max = DECOPUN
        end
    end
    print('sizeof min', sizeof_min, 'DECOPUN', DECOPUN_sizeof_min)
    print('sizeof max', sizeof_max, 'DECOPUN', DECOPUN_sizeof_max)
    print('string', string_buffer(digits))
    print('int???_t', binary_signed(digits))
    print('uint???_t', binary_unsigned(digits))
end
```

Part of #7228

@TarantoolBot document
Title: Module API for decimals

See the declarations in `src/box/decimal.h` in tarantool sources.

Aug 02, 2022

sql: always treat NaN as NULL · 7c5651af

Mergen Imeev authored 2 years ago

In most cases, NaN was treated as NULL. But in case NaN was returned as
a result of a Lua or C user defined function, it was considered a
double. After this patch, NaN will also be considered NULL in the
specified cases.
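
The rule can be pictured with a tiny conversion helper. The type and function names below are hypothetical and exist only for this sketch; they do not correspond to Tarantool's internal SQL value representation.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical value tags for the example. */
enum sql_value_type {
	SQL_VALUE_NULL,
	SQL_VALUE_DOUBLE,
};

/* Hypothetical SQL value holding either NULL or a double. */
struct sql_value_sketch {
	enum sql_value_type type;
	double d;
};

/*
 * Wrap a double returned by a user-defined function.
 * NaN is mapped to NULL, any other double is kept as is.
 */
struct sql_value_sketch
sql_value_from_double(double d)
{
	struct sql_value_sketch value;
	if (isnan(d)) {
		value.type = SQL_VALUE_NULL;
		value.d = 0;
	} else {
		value.type = SQL_VALUE_DOUBLE;
		value.d = d;
	}
	return value;
}
```

Doing the check at the single point where function results enter the SQL engine keeps NaN handling uniform with the other code paths.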

Closes #6374
Closes #6572

NO_DOC=bugfix

sql: fix wrong flag is_res_neg in sql_rem_int() · 1f4bb194

Mergen Imeev authored 2 years ago

This patch makes the is_res_neg flag false in the sql_rem_int() function
if the left value is negative and the result is 0. Prior to this patch,
the value of the flag was true, which resulted in an assertion failure
when encoding 0 as MP_INT.
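
A simplified sketch of the corrected behaviour follows. The signature is an assumption made for this example and differs from the real sql_rem_int(); the point is only that the sign flag is derived from the result rather than from the sign of the left operand.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute lhs % rhs with C semantics (truncation toward zero).
 * Returns 0 on success and -1 on division by zero.
 */
int
rem_int_sketch(int64_t lhs, int64_t rhs, int64_t *res, bool *is_res_neg)
{
	if (rhs == 0)
		return -1;
	/* INT64_MIN % -1 overflows in C; the remainder is 0 anyway. */
	*res = (rhs == -1) ? 0 : lhs % rhs;
	/* A zero result is never negative, even if lhs is negative. */
	*is_res_neg = *res < 0;
	return 0;
}
```

For example, -4 % 2 yields 0 with the flag cleared, while -5 % 2 yields -1 with the flag set.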

Closes #6575

NO_DOC=bugfix

sql: do nothing in ROUND() if precision is too big · 4c216c4c

Mergen Imeev authored 2 years ago

The smallest positive double value is 2.225E-307, and the value before
the exponent has a maximum of 15 digits after the decimal point. This
means that double values cannot have more than 307 + 15 digits after
the decimal point.

After this patch, ROUND() will return its first argument unchanged if
the first argument is DOUBLE and the second argument is INTEGER greater
than 322.
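
A minimal sketch of the early return described above. The helper
`round_double` and the constant `ROUND_MAX_PRECISION` are hypothetical
names, and the rounding itself (half away from zero, negative precision
treated as 0) is an assumption for illustration only. The threshold of
322 is the one stated in this commit message.

```c
#include <stdint.h>

/* Threshold taken from the commit message (307 + 15 digits). */
enum { ROUND_MAX_PRECISION = 322 };

/* Hypothetical helper, not the actual SQL ROUND() implementation. */
double
round_double(double value, int64_t precision)
{
	/* Too many digits requested: nothing to round, return as is. */
	if (precision > ROUND_MAX_PRECISION)
		return value;
	/* Assumption: a negative precision is treated as 0. */
	if (precision < 0)
		precision = 0;
	double scale = 1.0;
	for (int64_t i = 0; i < precision; i++)
		scale *= 10.0;
	double scaled = value * scale;
	/* At 2^53 and beyond a double is already integral (or not finite). */
	if (!(scaled > -9007199254740992.0 && scaled < 9007199254740992.0))
		return value;
	/* Round half away from zero using an integer cast. */
	long long rounded =
		(long long)(scaled < 0 ? scaled - 0.5 : scaled + 0.5);
	return (double)rounded / scale;
}
```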

Closes #6650

NO_DOC=bugfix

4c216c4c

Aug 01, 2022

fiber: allow to reset fiber slice with SIGURG · 1a3b710d

Andrey Saranchin authored 3 years ago

This patch lets the user reset the execution slice of the
current fiber. This makes it possible to limit iteration
over a space with SIGURG.

NO_CHANGELOG=see later commits
NO_DOC=see later commits

1a3b710d

box: allow to limit space iteration with timeout · bc053c55

Andrey Saranchin authored 3 years ago

Currently, there is no way to interrupt a long-running
request (such as s:select(nil)). This patch adds one.

Box will use the fiber deadline as the timeout for DML requests.
Thus, once the deadline of the current fiber has passed, all DML
requests will fail with a specific error.
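
The idea of a deadline-limited iteration can be sketched like this. It
is a simplified illustration, not the box implementation: the names
`iterate_with_deadline`, `clock_fn` and `fake_clock` are hypothetical,
and the clock is passed in as a function pointer so that the example
stays deterministic.

```c
#include <stddef.h>

/* Hypothetical clock callback returning the current time in seconds. */
typedef double (*clock_fn)(void);

/* Deterministic clock for the example: advances by 1 on each call. */
static double fake_time;

double
fake_clock(void)
{
	fake_time += 1.0;
	return fake_time;
}

/*
 * Visits up to `count` items and checks the deadline before each one.
 * Returns 0 if all items were visited, -1 if the deadline passed.
 * The number of visited items is stored in *visited.
 */
int
iterate_with_deadline(size_t count, double deadline, clock_fn now,
		      size_t *visited)
{
	*visited = 0;
	for (size_t i = 0; i < count; i++) {
		if (now() > deadline)
			return -1;
		(*visited)++;
	}
	return 0;
}
```

A real implementation would likely check the clock less often than once
per item to keep the overhead low.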

Closes #6085

NO_CHANGELOG=see later commits
NO_DOC=see later commits

bc053c55

test: adapt tests to iteration limit · b33ea6ea

Andrey Saranchin authored 2 years ago

Part of #6085

NO_TEST=no behavior changes
NO_CHANGELOG=no behavior changes
NO_DOC=no behavior changes

b33ea6ea

fiber: introduce fiber slice · e9bd2250

Andrey Saranchin authored 3 years ago

This patch introduces an execution time slice for fibers. Later, we will
use this mechanism to limit iteration over a space.

Part of #6085

NO_CHANGELOG=see later commits
NO_DOC=see later commits

e9bd2250

fiber_channel: add accessor to internal functions · 395c30e8

Alexander Turenko authored 2 years ago

The Rust module [1] relies on several internal symbols. They were exported
in Tarantool 2.8 (see #2971 and #5932), but were never part of the public
API. Tarantool 2.10.0 hides the symbols, and we need a way to get them
back for use in the module.

We have the following options:

1. Design and expose a module API for fiber channels.
2. Export the symbols with a prefix like `tnt_internal_` (so as not to
   pollute the global namespace).
3. Provide a `dlsym()`-like function to get the address of an internal
   symbol for users who know what they're doing.

I think that the third way offers the best compromise between the amount
of effort, the quality of the result and the opportunities to extend. In
this commit I hardcoded the list of functions to make the change as safe
as possible. Later I'll return here to autogenerate the list.

Exported the following function from the tarantool executable:

```c
void *
tnt_internal_symbol(const char *name);
```
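
A hardcoded lookup of this kind can be sketched as a plain name-to-address
table. This is only an illustration under assumptions, not the code from
the commit: `lookup_internal_symbol`, `internal_foo` and `internal_bar`
are hypothetical names standing in for the real function and the real
symbol list.

```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for internal functions; the real list is not shown here. */
int internal_foo(void) { return 1; }
int internal_bar(void) { return 2; }

struct symbol_def {
	const char *name;
	void *addr;
};

/*
 * Hardcoded list of exposed symbols. Converting a function pointer
 * to void * follows the dlsym() convention (POSIX, not strict ISO C).
 */
static const struct symbol_def symbols[] = {
	{"internal_foo", (void *)internal_foo},
	{"internal_bar", (void *)internal_bar},
};

/* Returns the address of a listed symbol or NULL if it is unknown. */
void *
lookup_internal_symbol(const char *name)
{
	size_t count = sizeof(symbols) / sizeof(symbols[0]);
	for (size_t i = 0; i < count; i++) {
		if (strcmp(symbols[i].name, name) == 0)
			return symbols[i].addr;
	}
	return NULL;
}
```

A caller casts the returned pointer back to the expected function type
and must handle NULL for symbols that are not in the list.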

I don't add it to the module API headers, because the function performs
dark magic and we don't recommend it to users.

While I'm here, added `static` to a couple of fiber channel functions,
which are only used within the compilation unit.

[1]: https://github.com/picodata/tarantool-module

Part of #7228
Related to #6372

NO_DOC=don't advertize the dangerous API
NO_CHANGELOG=don't advertize the dangerous API

395c30e8

Jul 27, 2022

box: fix thread_id check in box.stat.net.thread[] · 969b76ac

Ilya Verbin authored 2 years ago

The valid range for thread_id is [0, iproto_threads_count - 1].

Closes #7196

NO_DOC=bugfix

969b76ac
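
The range stated in the thread_id fix above corresponds to a check of
the following shape. This is a minimal sketch: `thread_id_is_valid` is a
hypothetical helper, not a function from the Tarantool sources.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: returns true if thread_id lies within
 * [0, iproto_threads_count - 1].
 */
bool
thread_id_is_valid(int thread_id, int iproto_threads_count)
{
	return thread_id >= 0 && thread_id < iproto_threads_count;
}
```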

box: check for foreign keys on space:truncate() · 7ec71b4f

Ilya Verbin authored 2 years ago

Add a missing check to on_replace_dd_truncate, similar to the ones in
on_replace_dd_space and on_replace_dd_index.

Closes #7309

NO_DOC=bugfix

7ec71b4f

core: introduce helper tt_sigaction · 9839812c

Andrey Saranchin authored 2 years ago

The problem is that even if we block all signals on all
threads except the main thread, the signals can still be
delivered to other threads (#7206). Another problem is
that a user can spawn their own thread without blocking
signals.

That is why the patch introduces the tt_sigaction function, which
guarantees that all signals are handled only by the main
thread. We use this helper in the clock_lowres module.
This is supposed to solve the problem described in #7408.

NO_CHANGELOG=internal
NO_DOC=internal

9839812c

Jul 26, 2022

tuple: add JSON path field accessor to module API · bcca0b2b

Alexander Turenko authored 2 years ago

Added a function (see the API in the documentation request below), which
reflects the `tuple[json_path]` Lua API (see #1285).

Part of #7228

@TarantoolBot document
Title: tuple: access a field using JSON path via module API

The following function is added into the module API:

```c
/**
 * Return a raw tuple field in the MsgPack format pointed by
 * a JSON path.
 *
 * The JSON path includes the outermost field. For example, "c" in
 * ["a", ["b", "c"], "d"] can be accessed using "[2][2]" path (if
 * index_base is 1, as in Lua). If index_base is set to 0, the
 * same field will be pointed by the "[1][1]" path.
 *
 * The first JSON path token may be a field name if the tuple
 * has an associated format with named fields. A field of a nested
 * map can be accessed in the same way: "foo.bar" or ".foo.bar".
 *
 * The return value is valid until the tuple is destroyed, see
 * box_tuple_ref().
 *
 * Return NULL if the field does not exist or if the JSON path is
 * malformed or invalid. Multikey JSON path token [*] is treated
 * as invalid in this context.
 *
 * \param tuple a tuple
 * \param path a JSON path
 * \param path_len a length of @a path
 * \param index_base 0 if array element indexes in @a path are
 *        zero-based (like in C) or 1 if they're one-based (like
 *        in Lua)
 * \retval a pointer to a field data if the field exists or NULL
 */
API_EXPORT const char *
box_tuple_field_by_path(box_tuple_t *tuple, const char *path,
			uint32_t path_len, int index_base);
```
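
The effect of `index_base` on array indexes in a path can be illustrated
with a small standalone parser. This sketch does not use or reproduce
the Tarantool JSON path code: `parse_index_path` is a hypothetical
helper that only understands `[N]` tokens and converts them to
zero-based indexes.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parses a path made only of "[N]" tokens and
 * stores zero-based indexes in out[]. index_base must be 0 or 1.
 * Returns the number of tokens or -1 if the path is invalid.
 */
int
parse_index_path(const char *path, int index_base, unsigned long *out,
		 int max_out)
{
	int count = 0;
	while (*path != '\0') {
		/* Every token must look like "[<digits>]". */
		if (*path != '[' || !isdigit((unsigned char)path[1]))
			return -1;
		char *end;
		unsigned long index = strtoul(path + 1, &end, 10);
		if (*end != ']' || index < (unsigned long)index_base ||
		    count == max_out)
			return -1;
		/* Normalize to a zero-based index. */
		out[count++] = index - (unsigned long)index_base;
		path = end + 1;
	}
	return count;
}
```

With this helper, `"[2][2]"` with `index_base` 1 and `"[1][1]"` with
`index_base` 0 produce the same zero-based indexes, which matches the
example in the comment above.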

bcca0b2b

Jul 25, 2022

box: return 1-based fkey field numbers to Lua · 014f5aa1

Ilya Verbin authored 2 years ago

In Lua, field numbers are 1-based. However, space:format() and
space.foreign_key currently return zero-based foreign key fields,
which leads to an error on space:format(space:format()).

Closes #7350

NO_DOC=bugfix

014f5aa1

box: do not modify format arg by normalize_format · a8b6fd0c

Ilya Verbin authored 2 years ago

Currently a foreign_key field in the `format` argument, passed to
normalize_format, can be changed inside normalize_foreign_key_one.
Fix this by using a local copy of def.field.

NO_DOC=bugfix
NO_CHANGELOG=minor bug

a8b6fd0c