Commits · d2271ec0e2a9a9e2ddaf8c28f0bcbb9842353cdb · core / tarantool

Aug 23, 2022

console: fix multiline commands saved as oneline · d2271ec0

Gleb Kashkin authored 2 years ago

When multiline commands were loaded from .tarantool_history, they were
treated as a bunch of oneline commands. Now readline is configured to
write timestamps in .tarantool_history as delimiters and multiline
commands are handled correctly.

If a .tarantool_history file already exists, readline will add
timestamps to it automatically, so nothing will be lost.
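
To illustrate why timestamp lines work as delimiters, here is a minimal
sketch of a history reader. It is not readline's parser: it assumes a
delimiter line has the form `#<digits>`, and the function names are
hypothetical. With timestamps, every line after the first one following a
delimiter continues the same command; without them, each line is a
separate command.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed delimiter format: '#' immediately followed by a digit. */
static bool
is_timestamp_line(const char *line)
{
    return line[0] == '#' && isdigit((unsigned char)line[1]);
}

/*
 * Count the commands stored in a history buffer. If the buffer starts
 * with a timestamp line, timestamps delimit (possibly multiline)
 * commands; otherwise every line counts as its own command.
 */
size_t
count_history_entries(const char *text)
{
    assert(text != NULL);
    size_t count = 0;
    bool has_timestamps = is_timestamp_line(text);
    bool after_timestamp = false;
    const char *line = text;
    while (*line != '\0') {
        if (is_timestamp_line(line)) {
            after_timestamp = true;
        } else if (!has_timestamps || after_timestamp) {
            /* First line of a new command. */
            count++;
            after_timestamp = false;
        }
        /* Otherwise the line continues a multiline command. */
        while (*line != '\0' && *line != '\n')
            line++;
        if (*line == '\n')
            line++;
    }
    return count;
}
```

In this model, a three-line function definition followed by a call is two
commands when timestamps are present and four commands when they are not.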

Closes #7320
NO_DOC=bugfix
NO_TEST=impossible to check readline history from lua

d2271ec0

Aug 18, 2022

Introduce internal database read view API · 26f7056f

Vladimir Davydov authored 2 years ago

Currently, we create a database read view only to create a memtx
snapshot or join a replica, but there's already quite a bit of code
duplication between these two scenarios. In the future, we will need
the same functionality to create a user read view. So let's factor out
this code into a separate module - read_view.

The API of the read_view module is quite simple - there are just two
methods: open and close a read view. The user can pass a space and index
filter while opening a read view to skip certain spaces. E.g. we skip
all temporary spaces and secondary indexes when we create a memtx
snapshot. A read_view object has a list of space_read_view objects, one
per space included in the read view. A space_read_view object, in
turn, has a map of all index_read_view objects (introduced earlier)
corresponding to space indexes. There's nothing like a space cache - the
user can create one if required.

An engine that supports creation of a read view (currently, only memtx)
is supposed to set the ENGINE_SUPPORTS_READ_VIEW flag and implement the
create_read_view engine method in addition to the create_read_view index
method. The engine method should do some engine-wide read view related
preparations. For example, in case of memtx, it suspends tuple garbage
collection.
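
The open/close API with a space filter could look roughly like the sketch
below. All names (`toy_space`, `toy_read_view_open`, and so on) are
hypothetical and chosen for illustration; the real read_view module works
with actual spaces and index read views, which are not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOY_MAX_SPACES = 8 };

/* Hypothetical stand-in for a space; not the real struct space. */
struct toy_space {
    const char *name;
    bool is_temporary;
};

/* Returns true if the space should be included in the read view. */
typedef bool (*toy_space_filter_f)(const struct toy_space *space);

/* Hypothetical read view: only remembers which spaces were included. */
struct toy_read_view {
    const struct toy_space *spaces[TOY_MAX_SPACES];
    size_t space_count;
    bool is_open;
};

/* Filter that mimics skipping temporary spaces for a snapshot. */
bool
toy_filter_skip_temporary(const struct toy_space *space)
{
    return !space->is_temporary;
}

/* Open a read view over the spaces accepted by the filter. */
void
toy_read_view_open(struct toy_read_view *rv, const struct toy_space *spaces,
                   size_t count, toy_space_filter_f filter)
{
    assert(count <= TOY_MAX_SPACES);
    rv->space_count = 0;
    for (size_t i = 0; i < count; i++) {
        if (filter == NULL || filter(&spaces[i]))
            rv->spaces[rv->space_count++] = &spaces[i];
    }
    rv->is_open = true;
}

/* Close the read view; it must not be used afterwards. */
void
toy_read_view_close(struct toy_read_view *rv)
{
    assert(rv->is_open);
    rv->space_count = 0;
    rv->is_open = false;
}
```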

Closes #7363

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

26f7056f

Aug 17, 2022

replication: fix downstream lag growing when there's no new transactions · a167a070

Serge Petrenko authored 2 years ago

Downstream lag is the difference in time between the moment a
transaction was written to master's WAL and the moment an ack for it
arrived.

Its calculation is supported by replicas sending the last applied row
timestamp. When there is no replication, the last applied row timestamp
stays the same, so in this case downstream lag grows as time passes.

Once an old master is replaced by a new one, it notices changes in peer
vclocks and tries to update downstream lag unconditionally. This makes
the lag appear to be growing indefinitely, showing the time since the
last transaction on the old master:

```
 downstream:
   status: follow
   idle: 0.018218606001028
   vclock: {1: 3, 2: 2}
   lag: 34.623061401367
```

The commit 56571d83 ("raft: make followers notice leader hang")
made relay exchange information with tx even when there are no new
transactions, so the issue became even easier to reproduce.

The issue itself was present since downstream lag introduction in commit
29025bce ("relay: provide information about downstream lag").
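
The idea of not recomputing the lag when nothing new was acknowledged can
be sketched as follows. This is not taken from the patch: the structure,
the function name and the use of an LSN as the "something new" signal are
assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-replica state kept by the master. */
struct toy_downstream {
    long long acked_lsn; /* last acknowledged row */
    double lag;          /* seconds between WAL write and ack */
};

/*
 * Update the lag only when the ack covers a row that was not
 * acknowledged before. An ack that repeats the old position leaves
 * the lag untouched instead of letting it grow with wall-clock time.
 */
void
toy_downstream_on_ack(struct toy_downstream *ds, long long ack_lsn,
                      double row_written_at, double ack_received_at)
{
    assert(ds != NULL);
    if (ack_lsn <= ds->acked_lsn)
        return;
    ds->acked_lsn = ack_lsn;
    ds->lag = ack_received_at - row_written_at;
}
```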

Closes #7581

NO_DOC=bugfix

a167a070

log: free resources while event loop is running · 0c3f9b37

Cyrill Gorcunov authored 2 years ago


The 'log' module uses fibers internally for log rotation, and before we
can free the log's resources (on program exit) we need to wait until
rotation is complete, which implies that the event loop is still
running. But we break the event loop in the `on_shutdown_f` trigger, and
calling any event-based functionality later causes unexpected results
because fibers are no longer valid to use. Thus move the
`say_logger_free` call into the `on_shutdown_f` body, where fibers are
still alive.

N.B. Testing the issue is sensitive to timings: local tests showed that
a minimal delay of 1ms is enough to trigger it, thus ERRINJ_LOG_ROTATE
is increased.

Fixes #4450

NO_DOC=bugfix

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

0c3f9b37

box: declare local functions as static · 3e83a505

Cyrill Gorcunov authored 2 years ago


Some functions in src/main.cc are declared as global
while they are used in file scope only. Declare them as
static.

NO_DOC=cleanup
NO_CHANGELOG=cleanup
NO_TEST=cleanup

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

3e83a505

Aug 16, 2022

fiber: do not crash on concurrent fiber:join() · 8f4538cb

Ilya Verbin authored 2 years ago

If two or more fibers are yielding in fiber_join_timeout(), one of them
will eventually join and recycle the fiber, while the rest will crash
on accessing the recycled fiber's struct. Fix this by doing fiber_find()
again after each waiting attempt in lbox_fiber_join().
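
The "look the fiber up again after each wait" pattern can be sketched with
a toy registry. Everything here is hypothetical (there is no scheduler,
and the `toy_` names do not exist in Tarantool); the point is only that a
joiner re-resolves the fiber by id rather than reusing a pointer saved
before it yielded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOY_MAX_FIBERS = 4 };

/* Hypothetical fiber record; id 0 marks a free (recycled) slot. */
struct toy_fiber {
    unsigned id;
    bool is_dead;
};

static struct toy_fiber toy_fibers[TOY_MAX_FIBERS];

/* Look a fiber up by id; returns NULL if it no longer exists. */
struct toy_fiber *
toy_fiber_find(unsigned id)
{
    assert(id != 0);
    for (size_t i = 0; i < TOY_MAX_FIBERS; i++) {
        if (toy_fibers[i].id == id)
            return &toy_fibers[i];
    }
    return NULL;
}

/* Register a running fiber with the given id in a free slot. */
void
toy_fiber_spawn(unsigned id)
{
    for (size_t i = 0; i < TOY_MAX_FIBERS; i++) {
        if (toy_fibers[i].id == 0) {
            toy_fibers[i].id = id;
            toy_fibers[i].is_dead = false;
            return;
        }
    }
    assert(false && "no free fiber slots");
}

/* Mark a fiber as finished so that it can be joined. */
void
toy_fiber_finish(unsigned id)
{
    struct toy_fiber *f = toy_fiber_find(id);
    assert(f != NULL);
    f->is_dead = true;
}

/*
 * What a joiner does after each wakeup. Returns 0 if this caller
 * joined and recycled the fiber, 1 if the fiber is still running,
 * -1 if another joiner has already recycled it.
 */
int
toy_fiber_join_step(unsigned id)
{
    struct toy_fiber *f = toy_fiber_find(id);
    if (f == NULL)
        return -1;
    if (!f->is_dead)
        return 1;
    f->id = 0; /* recycle the slot */
    f->is_dead = false;
    return 0;
}
```

The second joiner gets -1 instead of touching a recycled structure.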

Closes #7489
Closes #7531

NO_DOC=bugfix

8f4538cb

fiber: introduce fiber_wait_on_deadline() · 73e1059d

Ilya Verbin authored 2 years ago

It is separated from fiber_join_timeout(), and will be used
in lbox_fiber_join() too.

Part of #7489
Part of #7531

NO_DOC=internal
NO_CHANGELOG=internal

73e1059d

Aug 15, 2022

console: remove ERRINJ_STDIN_ISATTY injection · 16d6e9d2

Gleb Kashkin authored 2 years ago

As the underlying problem behind this injection is fixed in #7357, it can
be removed and the `-i` flag can be used as initially intended.

Closes #7554
Requires #7357
NO_DOC=refactoring
NO_CHANGELOG=refactoring

16d6e9d2

cmake: add ENABLE_READ_VIEW option · 6947ed76

Vladimir Davydov authored 2 years ago

We will add all source files related to user read views under this
option.

Needed for https://github.com/tarantool/tarantool-ee/issues/191

NO_DOC=internal
NO_TEST=internal
NO_CHANGELOG=internal

6947ed76

key_def: make tuple_compare_field and hint_cmp public · 1a6c1bb4

Vladimir Davydov authored 2 years ago

We need these functions to implement format-less tuple comparison.

Needed for https://github.com/tarantool/tarantool-ee/issues/191

NO_DOC=internal
NO_TEST=internal
NO_CHANGELOG=internal

1a6c1bb4

Aug 11, 2022

index: refactor snapshot iterator API · 7572b5c6

Vladimir Davydov authored 2 years ago

To make a memtx snapshot, we use the create_snapshot_iterator index
method. The method creates a 'frozen' iterator over an index - changes
done to the index after the iterator was created don't affect the
iterator output. Also, the iterator is safe to use from any thread.
This API works just fine for snapshots, but it's too limited to allow
creation of user read views so we need to rework it.

To make the existing snapshot infrastructure suitable for user read
views, this commit replaces the create_snapshot_iterator method with
create_read_view. The new method returns an index_read_view object,
which has an API similar to the read-only API of an index. A read view
object may only be created and destroyed in the tx thread, but it may be
used in any thread.

Currently, index_read_view has only one method - create_iterator, which
takes iterator type and key and returns an index_read_view_iterator
object. The iterator type and key arguments are ignored and we always
assume the iterator type to be ITER_ALL (asserted), but later on we will
fix this and also add a method to look up a tuple by key.

Closes #7194

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

7572b5c6

sequence: get rid of sequence_data_iterator::tuple · f1e7a91d

Vladimir Davydov authored 2 years ago

Since commit f167c1af ("memtx: decompress tuples in snapshot
iterator") a snapshot iterator may allocate the result tuple on the
fiber region - the caller is supposed to clean the region after usage.
So we don't need to store the tuple in sequence_data_iterator anymore -
we can allocate it on the fiber region instead, which is simpler and
more straightforward.

NO_DOC=internal
NO_TEST=internal
NO_CHANGELOG=internal

f1e7a91d

vinyl: drop vinyl_index_create_snapshot_iterator · 4e277eb5

Vladimir Davydov authored 2 years ago

The create_snapshot_iterator index callback is used by the memtx engine
to create a consistent read view of data stored in memtx so that it can
be written to a snapshot or sent to a remote replica. We also define and
use this callback internally in vinyl to implement initial join.

Actually, there's no need to have this code wrapped in a callback in
vinyl, because it's never called from outside the vinyl internals. Let's
inline it and drop the callback for vinyl. This will simplify further
refactoring of the internal index read view API.

Needed for #7194

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

4e277eb5

index: rename get_raw to get_internal · a7722dad

Vladimir Davydov authored 2 years ago

'get_raw' is a misleading name, because usually we append the '_raw'
suffix to functions that work with raw MsgPack while 'get_raw' actually
returns a formatted tuple. The function is used internally in memtx to
implement tuple compression. Let's call it 'get_internal' to emphasize
that.

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

a7722dad

raft: add strict fencing · 64ae9a08

Boris Stepanenko authored 2 years ago

With the current leader fencing implementation, the old leader doesn't
resign its leadership before a new leader may be elected. Because of
this, several "leaders" might coexist in a replicaset for some time.

This commit changes replication_disconnect_timeout so that it is twice
as short for the current raft leader (2*replication_timeout) if strict
fencing is enabled. Assuming that replication_timeout is the same for
every replica in the replicaset, this makes it less probable that a new
leader can be elected before the old one resigns its leadership.

Old fencing behaviour can be enabled by setting fencing to soft mode.
This is useful when connection death timeouts shouldn't be affected
(e.g. different replication_timeouts are set to prioritize some
replicas as leader over the others).

Closes #7110

@TarantoolBot document
Title: Strict fencing

In `box.cfg` option `election_fencing_enabled` is deprecated in favor
of `election_fencing_mode`. `election_fencing_mode` can be set to one
of the following values:
'off' - fencing turned off (same as `election_fencing_enabled` set to
false before).
Connection death timeout is 4*replication_timeout for all nodes.

'soft' (default) - fencing turned on, but connection death timeout is
the same for leader and followers in replicaset. This is enough to
solve cluster being readonly and not being able to elect a new leader
in some situations because of pre-vote.
Connection death timeout is 4*replication_timeout for all nodes.

'strict' - fencing turned on. In this mode leader tries its best to
resign leadership before new leader can be elected. This is achieved
by halving death timeout on leader.
Connection death timeout is 4*replication_timeout for followers and
2*replication_timeout for current leader.
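
The timeouts listed above can be summarized in a small function. This is a
sketch, not the actual replication_disconnect_timeout implementation: the
enum and function names are hypothetical, and only the multipliers come
from the description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the `election_fencing_mode` values. */
enum toy_fencing_mode {
    TOY_FENCING_OFF,
    TOY_FENCING_SOFT,
    TOY_FENCING_STRICT
};

/*
 * Connection death timeout for a node: halved for the current leader
 * in strict mode, 4*replication_timeout in every other case.
 */
double
toy_disconnect_timeout(enum toy_fencing_mode mode, bool is_leader,
                       double replication_timeout)
{
    assert(replication_timeout > 0);
    if (mode == TOY_FENCING_STRICT && is_leader)
        return 2 * replication_timeout;
    return 4 * replication_timeout;
}
```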

64ae9a08

raft: return NULL from box_raft when raft isn't initialized · 0da98773

Boris Stepanenko authored 2 years ago

Currently box_raft asserts that raft is initialized when it is called.
For strict fencing box_raft will be called in
replication_disconnect_timeout to set different timeouts for leader and
follower. Sometimes replication_disconnect_timeout is called before raft
is initialized.

This commit changes box_raft behaviour, removing the assertion and
returning NULL instead of pointer to global raft state, if raft isn't
initialized. This makes it possible to call box_raft even before raft
has been initialized, checking that return value isn't NULL.

Assuming that this assertion didn't trigger anywhere else, there is no
need to check for box_raft returning NULL anywhere except in the new
calls. Even if this changes in the future, it will trigger a
segmentation fault and the problem can be easily localized.
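
A minimal sketch of the "return NULL instead of asserting" accessor and of
a caller that checks the result. The `toy_` names and the single
`is_leader` field are hypothetical; the real raft state is much richer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the global raft state. */
struct toy_raft {
    bool is_leader;
};

static struct toy_raft toy_raft_global;
static bool toy_raft_is_initialized = false;

/* Initialize the global state; may be called only once. */
void
toy_raft_init(bool is_leader)
{
    assert(!toy_raft_is_initialized);
    toy_raft_global.is_leader = is_leader;
    toy_raft_is_initialized = true;
}

/* Returns the global state, or NULL if it is not initialized yet. */
struct toy_raft *
toy_box_raft(void)
{
    return toy_raft_is_initialized ? &toy_raft_global : NULL;
}

/* A new-style caller: safe to invoke before initialization. */
bool
toy_is_raft_leader(void)
{
    struct toy_raft *raft = toy_box_raft();
    return raft != NULL && raft->is_leader;
}
```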

Part of #7110

NO_DOC=internal changes
NO_TEST=internal changes
NO_CHANGELOG=internal changes

0da98773

Aug 09, 2022

console: fix -i being overruled by !isatty() · 9965e3fe

Gleb Kashkin authored 2 years ago

Interactive mode used to be ignored when stdin was not a tty; this is no
longer the case. Now the output of another command can be piped into
tarantool.
Before the patch:
```
$ echo 42 | tarantool -i
LuajitError: stdin:1: unexpected symbol near '42'
fatal error, exiting the event loop
```

After the patch:
```
$ echo 42 | tarantool -i
Tarantool 2.5.0-130-ge3cf64a6c
type 'help' for interactive help
tarantool> 42
---
- 42
...

```

Closes #5064

NO_DOC=bugfix

9965e3fe

feedback: hide from box.cfg if disabled at build · 51a9cef3

Alexander Turenko authored 2 years ago

It is counter-intuitive to see options of a component that is disabled
at build time. Especially, when the returned value means that the
component is enabled (while it is not so).

Before this patch (on `-DENABLE_FEEDBACK_DAEMON=OFF` build):

```yaml
tarantool> box.cfg()
tarantool> box.cfg.feedback_enabled
---
- true
...
```

After this patch (on `-DENABLE_FEEDBACK_DAEMON=OFF` build):

```yaml
tarantool> box.cfg()
tarantool> box.cfg.feedback_enabled
---
- null
...
```

NB: The following test cases in cartridge fail with
`-DENABLE_FEEDBACK_DAEMON=OFF` (both before and after the patch):

* integration.feedback.test_feedback
* integration.feedback.test_rocks

Since they verify cartridge's additions for the feedback daemon, this is
an expected outcome of disabling the component entirely. Ideally we should
conditionally disable those test cases, but that is out of scope here.

Follows up #3308

NO_DOC=I think it is expected behavior and unlikely it requires any
       change in the documentation
NO_TEST=a test would verify behavior of the particular build type, but
        we have no such configuration in CI, so the test would be pretty
        useless
NO_CHANGELOG=seems too minor to highlight it for users

51a9cef3

Aug 08, 2022

Use size-bounded versions of sprintf, strcpy and strcat · 243f9ebd

Ilya Verbin authored 2 years ago

To avoid potential buffer overflows and to make static analyzers happy.

Fixed CWE-120:
- sprintf: does not check for buffer overflows
- strcpy: does not check for buffer overflows when copying to destination
- strcat: does not check for buffer overflows when concatenating to
  destination
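
As an illustration of the size-bounded style, the sketch below formats a
string with snprintf and reports truncation. The helper name
`toy_format_addr` is hypothetical and is not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format "host:port" into buf without writing past buf[size - 1].
 * Returns true if the whole string fit, false if it was truncated
 * or a formatting error occurred.
 */
bool
toy_format_addr(char *buf, size_t size, const char *host, int port)
{
    assert(buf != NULL && size > 0);
    int rc = snprintf(buf, size, "%s:%d", host, port);
    return rc >= 0 && (size_t)rc < size;
}
```

Unlike sprintf, snprintf always leaves the buffer NUL-terminated when the
size is non-zero, and its return value tells the caller how long the
untruncated result would have been.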

Closes #7534

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

243f9ebd

util: introduce strlcat utility function · dcd9be4a

Ilya Verbin authored 2 years ago

strlcat is a function from BSD, which is designed to be safer, more
consistent, and less error prone replacement for strcat and strncat.
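
For readers unfamiliar with the BSD semantics, here is a minimal sketch of
a strlcat-like function. It is named `toy_strlcat` to avoid clashing with
a libc that already provides strlcat, and it is not the implementation
added by this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Append src to the NUL-terminated string in dst, where size is the
 * full size of the dst buffer. At most size - 1 bytes end up in dst,
 * and the result is NUL-terminated as long as dst was. Returns the
 * length of the string it tried to create; a return value >= size
 * means the output was truncated.
 */
size_t
toy_strlcat(char *dst, const char *src, size_t size)
{
    assert(dst != NULL && src != NULL);
    size_t dst_len = 0;
    while (dst_len < size && dst[dst_len] != '\0')
        dst_len++;
    size_t src_len = strlen(src);
    if (dst_len == size) {
        /* No terminator within size bytes: nothing can be appended. */
        return size + src_len;
    }
    size_t room = size - dst_len - 1;
    size_t n = src_len < room ? src_len : room;
    memcpy(dst + dst_len, src, n);
    dst[dst_len + n] = '\0';
    return dst_len + src_len;
}
```

Unlike strncat, the size argument is the size of the whole destination
buffer, which makes truncation checks a single comparison.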

NO_DOC=internal
NO_CHANGELOG=internal

Part of #7534

dcd9be4a

Aug 05, 2022

memtx: fix dirty data written to snapshot for hash index · 64d87e88

Vladimir Davydov authored 2 years ago

The hash index doesn't create a snapshot clarifier, which is used for
filtering out uncommitted tuples from a snapshot. Fix this. Also fix
a bug in hash_snapshot_iterator_next, where we passed a wrong argument
to tuple_data_range. It hasn't fired, because the clarifier didn't work.

Fixes commit ee8ed065 ("txm: clarify all fetched tuples").
Fixes commit f167c1af ("memtx: decompress tuples in snapshot
iterator").

Closes #7539

NO_DOC=bug fix

64d87e88

memtx: fix handling of corner cases gap tracking in transaction manager · 7360281e

Georgiy Lebedev authored 2 years ago


Gap tracking does not handle gap writes when the key has the same value as
the gap item: review the whole gap write handling logic, refactor it and
fix handling of corner cases along the way.

Co-authored-by: Alexander Lyapunov <alyapunov@tarantool.org>

Closes #7375

NO_DOC=bugfix

7360281e

memtx: denormalize `ITER_ALL` to `ITER_GE` in TREE index · 5a5e72b9

Georgiy Lebedev authored 2 years ago

Since `ITER_ALL` is an alias to `ITER_GE` in context of TREE index,
denormalize it during iterator creation.
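
In other words, iterator creation maps one type onto the other before any
further logic runs. A sketch with a hypothetical, reduced set of iterator
types:

```c
#include <assert.h>

/* Hypothetical subset of iterator types; names mirror the text above. */
enum toy_iterator_type {
    TOY_ITER_EQ,
    TOY_ITER_ALL,
    TOY_ITER_GE,
    TOY_ITER_GT,
    toy_iterator_type_MAX
};

/* Replace ITER_ALL with ITER_GE; other types are left as is. */
enum toy_iterator_type
toy_tree_iterator_type(enum toy_iterator_type type)
{
    assert(type < toy_iterator_type_MAX);
    return type == TOY_ITER_ALL ? TOY_ITER_GE : type;
}
```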

Needed for #7375

NO_CHANGELOG=refactoring
NO_DOC=refactoring
NO_TEST=refactoring

5a5e72b9

memtx: fix tree iterator `next` result clarification · 542f9525

Georgiy Lebedev authored 2 years ago

The problem is described in #7073. It was fixed only for
`tree_iterator_start_raw` next method, but other methods used for reverse
iterators are also subject to this bug: move tuple clarification from the
wrapper of iterator `next` methods to individual iterator methods.

Closes #7432

NO_DOC=bugfix

542f9525

lua/decimal: add Lua value accessors to module API · c75fbce1

Alexander Turenko authored 2 years ago

The Rust module (see the issue) needs a getter and a setter for decimal
values on the Lua stack. Let's make them part of the module API.

Part of #7228

@TarantoolBot document
Title: Lua/C functions for decimals in the module API

The following functions are added into the module API:

```c
/**
 * Allocate a new decimal on the Lua stack and return
 * a pointer to it.
 */
API_EXPORT box_decimal_t *
luaT_newdecimal(struct lua_State *L);

/**
 * Allocate a new decimal on the Lua stack with copy of given
 * decimal and return a pointer to it.
 */
API_EXPORT box_decimal_t *
luaT_pushdecimal(struct lua_State *L, const box_decimal_t *dec);

/**
 * Check whether a value on the Lua stack is a decimal.
 *
 * Returns a pointer to the decimal on a successful check,
 * NULL otherwise.
 */
API_EXPORT box_decimal_t *
luaT_isdecimal(struct lua_State *L, int index);
```

c75fbce1

lua/interval: rework luaT_{new,push}interval() · 225a6213

Alexander Turenko authored 2 years ago

This change follows the previous commits regarding decimal, uuid and
datetime functions. See them for details.

Part of #7228

NO_DOC=refactoring, no user-visible changes
NO_TEST=refactoring, no behavior changes
NO_CHANGELOG=refactoring, no user-visible changes

225a6213

lua/datetime: rework luaT_{new,push}datetime() · 1eadf531

Alexander Turenko authored 2 years ago

This change follows the previous commits regarding
`luaT_{new,push}decimal()` and `luaT_{new,push}uuid()`. See them for
details.

Part of #7228

NO_DOC=refactoring, no user-visible changes
NO_TEST=refactoring, no behavior changes
NO_CHANGELOG=refactoring, no user-visible changes

1eadf531

lua/uuid: rework luaT_{new,push}uuid() API · 21f6a4b7

Alexander Turenko authored 2 years ago

This change follows the previous commit regarding `luaT_newdecimal()`
and `luaT_pushdecimal()`, see explanation and details there.

Also changed the `luaL_` prefix to more appropriate `luaT_`. The `struct
tt_uuid` is our own type, the functions are specific to tarantool. So
`luaT_`.

Part of #7228

NO_DOC=refactoring, no user-visible changes
NO_TEST=refactoring, no behavior changes
NO_CHANGELOG=refactoring, no user-visible changes

21f6a4b7

lua/decimal: rework luaT_{new,push}decimal() API · b4f6675a

Alexander Turenko authored 2 years ago

`luaT_pushdecimal()` now accepts a decimal argument to copy into the Lua
managed memory.

`luaT_newdecimal()` now does what `luaT_pushdecimal()` did before: it just
allocates storage for a decimal on the Lua stack.

This naming looks much more friendly. It also seems to follow Lua API
names: `lua_push*()` accepts what to push, `lua_new*()` doesn't.

A couple of notes around the change:

* At first glance it seems that `luaT_pushdecimal()` is redundant,
  because it can be written using `luaT_newdecimal()` + copying. That's
  true in contexts where we know the size of the internal `decimal_t`
  structure. A user of the module API doesn't know it and should pass a
  `box_decimal_t *` pointer to `luaT_pushdecimal()` to write the value.
* I use `memcpy()` instead of just `*a = *b` in `luaT_pushdecimal()` to
  copy the padding byte content. Who knows, maybe this not-so-legal way
  to hold extra information may be crucial for some use case or will
  allow us to add one field into the structure.
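
The new/push split can be sketched without Lua at all. Below, a fixed
array stands in for the Lua stack and `struct toy_decimal` for the
internal decimal type; both, and the `toy_` function names, are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TOY_STACK_SIZE = 4 };

/* Hypothetical decimal layout; the real decimal_t differs. */
struct toy_decimal {
    int digits;
    int exponent;
    unsigned char lsu[8];
};

/* Hypothetical stand-in for memory managed by the Lua stack. */
struct toy_stack {
    struct toy_decimal slots[TOY_STACK_SIZE];
    size_t top;
};

/* "new": allocate uninitialized storage and return a pointer to it. */
struct toy_decimal *
toy_newdecimal(struct toy_stack *st)
{
    assert(st->top < TOY_STACK_SIZE);
    return &st->slots[st->top++];
}

/* "push": allocate storage and copy the given value into it. */
struct toy_decimal *
toy_pushdecimal(struct toy_stack *st, const struct toy_decimal *dec)
{
    struct toy_decimal *res = toy_newdecimal(st);
    /* memcpy copies the whole object, padding bytes included. */
    memcpy(res, dec, sizeof(*res));
    return res;
}
```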

This is a preparatory commit for exposing `luaT_*decimal()` functions
into the module API.

Next commits will change uuid, datetime, interval functions in the same
way.

Part of #7228

NO_DOC=refactoring, no user-visible changes
NO_TEST=will be tested in a next commit, after exposing to the
        module API
NO_CHANGELOG=refactoring, no user-visible changes

b4f6675a

lua/decimal: return pointer from luaT_isdecimal() · 450e2664

Alexander Turenko authored 2 years ago

This way we can use just luaT_isdecimal() instead of two calls:
luaT_isdecimal() + luaT_checkdecimal() or luaT_isdecimal() +
luaL_checkcdata(). It is convenient and we already follow this way in
luaT_istuple().

The difference from luaT_checkdecimal() is that luaT_isdecimal() does
not raise a Lua exception. Raising one may be undesirable and/or
complicated to handle in some contexts.

This is the preparation for exposing luaT_isdecimal() into the module
API.

Part of #7228

NO_DOC=refactoring, no user-visible changes
NO_TEST=will be tested in a next commit, after exposing to the
        module API
NO_CHANGELOG=refactoring, no user-visible changes

450e2664

lua/decimal: use luaT_ prefix instead of lua_ · 3061bea9

Alexander Turenko authored 2 years ago

We use the tarantool specific prefix for functions that are working with
tarantool specific types. lua_ or luaL_ prefix may be confusing, because
it is not always clear what is the origin of the function and where to
find its documentation.

This change is the preparation for exposing luaT_pushdecimal() and
luaT_isdecimal() into the module API.

While I'm here, I made several tidy changes:

* Added `static` where appropriate.
* Removed luaT_pushdecimalstr() from the header file, because it is not
  used outside of the compilation unit.

Part of #7228

NO_DOC=refactoring, no user-visible changes
NO_TEST=refactoring, nothing new to test
NO_CHANGELOG=refactoring, no user-visible changes

3061bea9

raft: make followers notice leader hang · 56571d83

Serge Petrenko authored 2 years ago

It's possible to hang an instance by some non-yielding request. The
simplest example is `while true do end`. A more true to life one would
be a `select{}` from a large space, or `pairs` iteration over a space
without yields.

Any such request makes the instance unresponsive - it can serve neither
reads nor writes. At the same time, the instance appears alive to other
cluster members: relay thread used to communicate with others is not
hung and continues to send heartbeats every replication_timeout.

The problem is the most severe with Raft leader elections: followers
believe the leader is fine and do not start elections despite leader
being unable to serve reads or writes.

Closes #7512

NO_DOC=bugfix

56571d83

Aug 04, 2022

salad: rework bps tree read view API · 91caa388

Vladimir Davydov authored 2 years ago

Currently, there's no notion of a BPS tree read view per se - one can
create an iterator over a regular tree and then "freeze" it. This works
just fine for snapshotting and joining replicas, but this spartan API
doesn't let us implement user read views, because to do that we need to
do lookups and create iterators over a frozen tree as many times as we
want, not just once.

So this patch introduces a concept of bps_tree_view, which contains a
frozen image of a bps_tree and implements a subset of non-modifying
bps_tree methods:

 - bps_tree_view_size
 - bps_tree_view_find
 - bps_tree_view_first
 - bps_tree_view_last
 - bps_tree_view_lower_bound
 - bps_tree_view_lower_bound_elem
 - bps_tree_view_upper_bound
 - bps_tree_view_upper_bound_elem
 - bps_tree_view_iterator_get_elem
 - bps_tree_view_iterator_prev
 - bps_tree_view_iterator_next
 - bps_tree_view_iterator_is_equal

Note, bps_tree and bps_tree_view share bps_tree_iterator, because
iterator methods (get_elem, next, prev, is_equal) take bps_tree or
bps_tree_view. The bps_tree_iterator now contains only block index and
offset.

We could also implement the rest of non-modifying methods, but didn't do
that, because they are not needed to implement user read views:

 - bps_tree_random
 - bps_tree_approximate_count
 - bps_tree_debug_check
 - bps_tree_print

To create a bps_tree_view from a bps_tree, one is supposed to call
bps_tree_view_create. If a bps_tree_view is no longer needed, it should
be destroyed with bps_tree_view_destroy.

Old methods used for creating frozen iterators were dropped:

 - bps_tree_iterator_freeze
 - bps_tree_iterator_destroy

To avoid code duplication, we factored out the common part of bps_tree
and bps_tree_view into a new structure, named bps_tree_common.
Basically, the new structure contains all bps_tree members except
matras, which is stored in bps_tree. The difference between
bps_tree_view and bps_tree is that the former stores matras_view
instead of matras. The common part contains pointers to matras and
matras_view, which are used by internal implementation to look up
bps_tree blocks.

All internal methods now take bps_tree_common instead of bps_tree.
For all public methods that are implemented both for bps_tree and
bps_tree_view, we have the common implementation defined in _impl
suffixed private function, which is called by the corresponding public
functions.
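
The common-part plus `_impl` layout can be sketched as follows. The
structures are hypothetical and reduced to a single field, and the view is
"frozen" here by plain copying, whereas the real code relies on matras
read views.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical common part shared by the tree and its view. */
struct toy_tree_common {
    size_t size;
};

struct toy_tree {
    struct toy_tree_common common;
};

struct toy_tree_view {
    struct toy_tree_common common;
};

/* Shared private implementation, written once. */
static size_t
toy_tree_size_impl(const struct toy_tree_common *common)
{
    return common->size;
}

/* Public method of the tree. */
size_t
toy_tree_size(const struct toy_tree *tree)
{
    return toy_tree_size_impl(&tree->common);
}

/* Public method of the view with the same semantics. */
size_t
toy_tree_view_size(const struct toy_tree_view *view)
{
    return toy_tree_size_impl(&view->common);
}

/* Capture the current state of the tree into a view. */
void
toy_tree_view_create(struct toy_tree_view *view, const struct toy_tree *tree)
{
    assert(view != NULL && tree != NULL);
    view->common = tree->common;
}
```

Later changes to the tree do not affect what the view reports.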

To ensure that a modifying method isn't called on bps_tree_common object
corresponding to a bps_tree_view because of a bug in the bps_tree
implementation, we added !matras_is_read_view_created assertion to
bps_tree_touch_block.

Closes #7191

NO_DOC=refactoring
NO_CHANGELOG=refactoring

91caa388

Use bps_tree_size instead of accessing size directly · 5855fd30

Vladimir Davydov authored 2 years ago

We have a method for getting the number of elements stored in a BPS
tree. Let's use it instead of accessing BPS tree internals directly
so that we can freely refactor BPS tree internals.

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

5855fd30

salad: minor bps tree code cleanup · 2fa28b41

Vladimir Davydov authored 2 years ago

 - Add bps_tree_delete_value to the comment and declarations.
   (All other public methods are there.)
 - Fix typo in a comment: approxiamte -> approximate.
 - Fix comment to bps_tree_random.
 - Remove repeated word 'count' from comments.

NO_DOC=no change
NO_TEST=no change
NO_CHANGELOG=no change

2fa28b41

salad: rename a few bps_tree methods · c84990ad

Vladimir Davydov authored 2 years ago

 - Rename bps_tree_iterator_are_equal to bps_tree_iterator_is_equal for
   consistency with other methods that check two objects for equality
   (for example, tt_uuid_is_equal).

 - Rename bps_tree_iterator_first and bps_tree_iterator_last to
   bps_tree_first and bps_tree_last, because these are methods of
   bps_tree, not bps_tree_iterator. Omitting _iterator is also
   consistent with bps_tree_lower_bound and bps_tree_upper_bound
   methods, which also create bps_tree_iterator objects.

NO_DOC=refactoring
NO_TEST=refactoring
NO_CHANGELOG=refactoring

c84990ad

memtx: fix reverse iterators gap tracking · fda38e66

Georgiy Lebedev authored 2 years ago

In case of reverse iterators, due to index limitations, we need to clarify
the successor tuple early: this implies that the successor's story is not
always at the top of the history chain, whilst we need to add the gap item
to the story currently present in index — fix this by reusing the
iterators' check logic to set the current iterator's tuple (which is
considered the successor) to a tuple in index.

Closes #7409

NO_DOC=bugfix

fda38e66

memtx: remove `tree_iterator_set_current_tuple` on clarified tuples · 837e0524

Georgiy Lebedev authored 2 years ago

All the 'base' TREE index iterator `next` methods (including
`tree_iterator_start`) internally set the iterator's current tuple to the
one found in the index satisfying the conditions: setting the iterator's
current tuple to the clarified one is redundant and moreover gives a
performance penalty each iteration because of the iterator check logic.

Needed for #7409

NO_CHANGELOG=refactoring
NO_DOC=refactoring
NO_TEST=refactoring

837e0524

memtx: fix HASH index 'GT' iterator `next` method set incorrectly · 5c0d7117

Georgiy Lebedev authored 2 years ago

The `next` method of memtx HASH index 'GT' iterator is initially set to
'GT' and is supposed to be set to 'GE' after first iteration: it is
mistakenly set to the 'base' method instead of the full method which also
does tuple clarification — this allows dirty reads. Move the `next` method
change on first iteration to `WRAP_ITERATOR_METHOD` for clarity and
correctness.

Closes #7477

NO_DOC=bugfix

5c0d7117

salad: rework light hash read view API · b595f212

Vladimir Davydov authored 2 years ago

Currently, there's no notion of a LIGHT hash table read view per se -
one can create an iterator over a regular hash table and then "freeze"
it. This works just fine for snapshotting and joining replicas, but this
spartan API doesn't let us implement user read views, because to do that
we need to do lookups and create iterators over a frozen hash table as
many times as we want, not just once.

So this patch introduces a concept of LIGHT(view), which contains a
frozen image of a LIGHT(core) and implements a subset of non-modifying
LIGHT(core) methods:

 - LIGHT(view_count)
 - LIGHT(view_find)
 - LIGHT(view_find_key)
 - LIGHT(view_get)
 - LIGHT(view_iterator_begin)
 - LIGHT(view_iterator_key)
 - LIGHT(view_iterator_get_and_next)

Note, LIGHT(core) and LIGHT(view) share LIGHT(iterator), because
iterator methods (begin, key, get_and_next) take LIGHT(core) or
LIGHT(view). The LIGHT(iterator) now contains only a hash table slot.

We could also implement the rest of non-modifying methods, but didn't do
that, because they are not needed to implement user read views:

 - LIGHT(random)
 - LIGHT(selfcheck)

To create a LIGHT(view) from a LIGHT(core), one is supposed to call
LIGHT(view_create). If a LIGHT(view) is no longer needed, it should be
destroyed with LIGHT(view_destroy).

Old methods used for creating frozen iterators were dropped:

 - LIGHT(iterator_freeze)
 - LIGHT(iterator_destroy)

To avoid code duplication, we factored out the common part of
LIGHT(core) and LIGHT(view) into a new structure, named LIGHT(common).
Basically, the new structure contains all LIGHT(core) members except
matras, which is stored in LIGHT(core). The difference between
LIGHT(view) and LIGHT(core) is that the former stores matras_view
instead of matras. The common part contains pointers to matras and
matras_view, which are used by internal implementation to look up
LIGHT(record).

All internal methods now take LIGHT(common) instead of LIGHT(core).
For all public methods that are implemented both for LIGHT(core) and
LIGHT(view), we have the common implementation defined in _impl suffixed
private function, which is called by the corresponding public functions.

To ensure that a modifying method isn't called on LIGHT(common) object
corresponding to a LIGHT(view) because of a bug in the LIGHT code, we
added !matras_is_read_view_created assertion to LIGHT(touch_record),
LIGHT(prepare_first_insert), and LIGHT(grow).

Closes #7192

NO_DOC=refactoring
NO_CHANGELOG=refactoring

b595f212