  1. Nov 08, 2023
  2. Oct 19, 2023
    • box: call tuple_free from box_free · 05751e6c
      Vladimir Davydov authored
      There are four problems we have to address to make this possible:
      
       1. memtx_engine_shutdown may delete the tuple referenced by
          box_tuple_last so that tuple_free, which is called later by
          box_free, will crash trying to free it. Fix this by clearing
          box_tuple_last in memtx_engine_shutdown.
      
       2. tuple_format_destroy and tuple_field_delete, called by it, expect
          all constraints to be detached. Let's destroy the constraints if
          this isn't the case. This effectively reverts commit 7a87b9a5
          ("box: do not call constraint[i].destroy() in
          tuple_field_delete()").
      
       3. tuple_field_delete, called by tuple_format_destroy, expects the
          default value function to be unpinned. Let's unpin it if this isn't
          the case. To avoid linking dependencies between the tuple and box
          libraries, we have to introduce a virtual destructor for
          field_default_func.
      
       4. The tuple_format unit test calls tuple_free after box_free. If
          box_free calls tuple_free by itself, this leads to a crash. Fix this
          by removing tuple_free and tuple_init calls from the test.
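
A heavily simplified sketch of fix 1 above — clearing the cached tuple reference during engine shutdown so the later `tuple_free` call from `box_free` cannot touch freed memory. All types and function bodies here are hypothetical stand-ins, not the actual Tarantool code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified model of the shutdown ordering
 * described above; the real Tarantool types and functions differ. */

struct tuple { int refs; };

/* Cache of the last tuple returned through the box API. */
static struct tuple *box_tuple_last = NULL;

static void
memtx_engine_shutdown(void)
{
	/*
	 * The engine may delete the tuple referenced by box_tuple_last,
	 * so clear it here; otherwise tuple_free() would later try to
	 * free a dangling pointer.
	 */
	if (box_tuple_last != NULL) {
		free(box_tuple_last);
		box_tuple_last = NULL;
	}
}

static void
tuple_free(void)
{
	/* Safe: the stale reference was cleared during shutdown. */
	if (box_tuple_last != NULL) {
		free(box_tuple_last);
		box_tuple_last = NULL;
	}
}

static void
box_free(void)
{
	memtx_engine_shutdown();
	tuple_free(); /* now called from box_free itself */
}
```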
      
      Closes #9174
      
      NO_DOC=code health
      NO_CHANGELOG=code health
      NO_TEST=checked by existing tests
  3. Oct 09, 2023
    • perf: add memtx benchmark · 2b7d9027
      Georgiy Lebedev authored
      This first version is quite basic and only benchmarks random `get`s of
      existing keys and `select`s of all keys for a tree index (these benchmarks
      are needed for #6964) — its main goal is to provide a foundation (i.e., all
      the necessary initialization logic) for benchmarking memtx. Extending this
      benchmark using the provided memtx singleton and fixture should be fairly
      simple.
      
      The results of running this benchmark compiled with clang-16 on my Intel
      MacBook Pro (13-inch, 2020) laptop [1]:
      
      NO_WRAP
      georgiy.lebedev@georgiy-lebedev perf % ./memtx.perftest --benchmark_min_warmup_time=10 --benchmark_repetitions=10 --benchmark_report_aggregates_only=true --benchmark_display_aggregates_only=true
      2023-10-02T12:59:36+03:00
      Running ./memtx.perftest
      Run on (8 X 2000 MHz CPU s)
      CPU Caches:
        L1 Data 48 KiB
        L1 Instruction 32 KiB
        L2 Unified 512 KiB (x4)
        L3 Unified 6144 KiB
      Load Average: 5.67, 10.05, 7.89
      mapping 4398046511104 bytes for memtx tuple arena...
      Actual slab_alloc_factor calculated on the basis of desired slab_alloc_factor = 1.090508
      fiber has not yielded for more than 0.500 seconds
      --------------------------------------------------------------------------------------------------------
      Benchmark                                              Time             CPU   Iterations UserCounters...
      --------------------------------------------------------------------------------------------------------
      MemtxFixture/TreeGetRandomExistingKeys_mean          682 ns          667 ns           10 items_per_second=1.51504M/s
      MemtxFixture/TreeGetRandomExistingKeys_median        704 ns          693 ns           10 items_per_second=1.44387M/s
      MemtxFixture/TreeGetRandomExistingKeys_stddev       81.7 ns         72.7 ns           10 items_per_second=169.696k/s
      MemtxFixture/TreeGetRandomExistingKeys_cv          11.97 %         10.90 %            10 items_per_second=11.20%
      MemtxFixture/TreeGet1RandomExistingKey_mean          253 ns          241 ns           10 items_per_second=4.20104M/s
      MemtxFixture/TreeGet1RandomExistingKey_median        233 ns          229 ns           10 items_per_second=4.36911M/s
      MemtxFixture/TreeGet1RandomExistingKey_stddev       46.7 ns         29.7 ns           10 items_per_second=464.187k/s
      MemtxFixture/TreeGet1RandomExistingKey_cv          18.43 %         12.34 %            10 items_per_second=11.05%
      MemtxFixture/TreeSelectAll_mean                  4766728 ns      4705622 ns           10 items_per_second=27.978M/s
      MemtxFixture/TreeSelectAll_median                4605936 ns      4580478 ns           10 items_per_second=28.6184M/s
      MemtxFixture/TreeSelectAll_stddev                 447495 ns       349499 ns           10 items_per_second=1.85573M/s
      MemtxFixture/TreeSelectAll_cv                       9.39 %          7.43 %            10 items_per_second=6.63%
      NO_WRAP
      
      [1]: https://support.apple.com/kb/SP819?locale=en_US
      
      Needed for #6964
      
      NO_CHANGELOG=benchmark
      NO_DOC=benchmark
      NO_TEST=benchmark
  4. Oct 03, 2023
    • ci: run performance tests · 5edcb712
      Sergey Bronnikov authored
       Performance tests added to the perf/ directory are not automated;
       currently we run them manually from time to time. On the other
       hand, rarely used source code can lead to software rot [1].
       
       The patch adds a CMake target "test-perf" and a GitHub workflow
       that runs these tests in CI. The workflow is based on release.yml:
       it builds the performance tests and runs them.
      
      1. https://en.wikipedia.org/wiki/Software_rot
      
      NO_CHANGELOG=testing
      NO_DOC=testing
      NO_TEST=testing
    • perf: add targets for running C performance tests · 68623381
      Sergey Bronnikov authored
       The patch adds a target for each C performance test in the perf/
       directory and a separate target "test-c-perf" that runs all C
       performance tests at once.
      
      NO_CHANGELOG=testing
      NO_DOC=testing
      NO_TEST=test infrastructure
    • perf: add targets for running Lua performance tests · 49d9a874
      Sergey Bronnikov authored
       The patch adds a target for each Lua performance test in the
       perf/lua/ directory (1mops_write_perftest, box_select_perftest,
       uri_escape_unescape_perftest) and a separate target
       "test-lua-perf" that runs all Lua performance tests at once.
      
      NO_CHANGELOG=testing
      NO_DOC=testing
      NO_TEST=test infrastructure
  5. Sep 29, 2023
    • box: implement sort_order in indexes · b1990b21
      Magomed Kostoev authored
       The `sort_order` parameter was introduced earlier but had no effect
       until now. Now it allows specifying a sort (iteration) order for
       each key part.
      
      The parameter is only applicable to ordered indexes, so any value
      except 'undef' for the `sort_order` is disallowed for all indexes
      except TREE. The 'undef' value of the `sort_order` field of the
      `key_part_def` is translated to 'asc' on `key_part` creation.
      
       In order to make the key def aware of whether its index is
       unordered, the signature of `key_def_new` has been changed: the
       `for_func_index` parameter has been moved into the new `flags`
       parameter, and an `is_unordered` flag has been introduced.
       
       Alternative iterator names have been introduced (aliases for the
       regular iterators): box.index.FORWARD_[INCLUSIVE/EXCLUSIVE],
       box.index.REVERSE_[INCLUSIVE/EXCLUSIVE].
       
       Along the way, fixed the `key_hint_stub` overload name, which was
       supposed to be `tuple_hint_stub`.
       
       The `tuple_hint` and `key_hint` template declarations have been
       changed because of the checkpatch diagnostics.
      
      Closes #5529
      
      @TarantoolBot document
       Title: Now it's possible to specify the sort order of each index part.
      
       Sort order specifies the way indexes iterate over tuples with
       different fields in the same part. It can be either ascending
       (the default) or descending.
       
       Tuples with different ascending parts are sorted in indexes from
       lesser to greater, whereas tuples with different descending parts
       are sorted in the opposite order: from greater to lesser.
      
       Consider this example:
      
      ```lua
      box.cfg{}
      
      s = box.schema.create_space('tester')
      pk = s:create_index('pk', {parts = {
        {1, 'unsigned', sort_order = 'desc'},
        {2, 'unsigned', sort_order = 'asc'},
        {3, 'unsigned', sort_order = 'desc'},
      }})
      
      s:insert({1, 1, 1})
      s:insert({1, 1, 2})
      s:insert({1, 2, 1})
      s:insert({1, 2, 2})
      s:insert({2, 1, 1})
      s:insert({2, 1, 2})
      s:insert({2, 2, 1})
      s:insert({2, 2, 2})
      s:insert({3, 1, 1})
      s:insert({3, 1, 2})
      s:insert({3, 2, 1})
      s:insert({3, 2, 2})
      ```
      
       In this case fields 1 and 3 are descending, whereas field 2 is
       ascending, so `s:select()` will return this result:
      
      ```yaml
      ---
      - [3, 1, 2]
      - [3, 1, 1]
      - [3, 2, 2]
      - [3, 2, 1]
      - [2, 1, 2]
      - [2, 1, 1]
      - [2, 2, 2]
      - [2, 2, 1]
      - [1, 1, 2]
      - [1, 1, 1]
      - [1, 2, 2]
      - [1, 2, 1]
      ...
      ```
      
       Beware that when using a sort order other than 'asc' for any
       field, the 'GE', 'GT', 'LE' and 'LT' iterators lose their usual
       meaning and instead specify the 'forward inclusive', 'forward
       exclusive', 'reverse inclusive' and 'reverse exclusive' iteration
       directions respectively. Given the example above,
       `s:select({2}, {iterator = 'GT'})` will return this:
      
      ```yaml
      ---
      - [1, 1, 2]
      - [1, 1, 1]
      - [1, 2, 2]
      - [1, 2, 1]
      ...
      ```
      
      And `s:select({1}, {iterator = 'LT'})` will give us:
      
      ```yaml
      ---
      - [2, 2, 1]
      - [2, 2, 2]
      - [2, 1, 1]
      - [2, 1, 2]
      - [3, 2, 1]
      - [3, 2, 2]
      - [3, 1, 1]
      - [3, 1, 2]
      ...
      ```
      
       To be more explicit, the alternative iterator aliases can be
       used: 'FORWARD_INCLUSIVE', 'FORWARD_EXCLUSIVE',
       'REVERSE_INCLUSIVE', 'REVERSE_EXCLUSIVE':
      
      ```
      > s:select({1}, {iterator = 'REVERSE_EXCLUSIVE'})
      ---
      - [2, 2, 1]
      - [2, 2, 2]
      - [2, 1, 1]
      - [2, 1, 2]
      - [3, 2, 1]
      - [3, 2, 2]
      - [3, 1, 1]
      - [3, 1, 2]
      ...
      ```
  6. Aug 08, 2023
    • perf: initial version of 1M operations test · 10870343
      Sergey Ostanevich authored
       The test can be used for regression testing. It is advisable to
       tune the machine: check the NUMA configuration and pin the
       P-state or similar CPU autotuning. Still, running the test a
       dozen times gives a more or less stable peak-performance result,
       which should be enough for regression identification.
      
      NO_DOC=adding an internal test
      NO_CHANGELOG=ditto
      NO_TEST=ditto
  7. Jul 25, 2023
    • perf: add test for box select · 114d09f5
      Vladimir Davydov authored
       The test runs the get, select, and pairs space methods with
       various arguments in a loop and prints the average method run
       time in nanoseconds (lower is better).
      
      Usage:
      
        tarantool box_select.lua
      
      Output format:
      
        <test-case> <run-time>
      
      Example:
      
        $ tarantool box_select.lua --pattern 'get|select_%d$'
        get_0 155
        get_1 240
        select_0 223
        select_1 335
        select_5 2321
      
      Options:
      
        --pattern <string>  run only tests matching the pattern; use '|'
                            to specify more than one pattern, for example,
                            'get|select'
        --read_view         use a read view (EE only)
      
      Apart from the test, this patch also adds a script that compares test
      results:
      
        $ tarantool box_select.lua --pattern get > base
        $ tarantool box_select.lua --pattern get > patched1
        $ tarantool box_select.lua --pattern get > patched2
        $ tarantool compare.lua base patched1 patched2
               base          patched1          patched2
        get_0   149       303 (+103%)       147 (-  1%)
        get_1   239       418 (+ 74%)       238 (-  0%)
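
The percentage column appears to be the run-time delta relative to the base column, truncated toward zero (note "418 (+ 74%)" where rounding would give 75%). A minimal sketch of that computation — an assumption about the formula, not the actual compare.lua code:

```c
#include <assert.h>

/*
 * Percent difference of a patched run time versus the base run
 * time, truncated toward zero, as in "303 (+103%)". This is a
 * guess at the formula compare.lua uses, not its actual code.
 */
static int
percent_delta(long base, long patched)
{
	return (int)((patched - base) * 100 / base);
}
```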
      
      NO_DOC=perf test
      NO_TEST=perf test
      NO_CHANGELOG=perf test
  8. Mar 23, 2023
  9. Jan 24, 2023
    • perf/cmake: add a function for generating perf test targets · ca58d6c9
      Sergey Bronnikov authored
       Commit 2be74a65 ("test/cmake: add a function for generating unit
       test targets") added a function for generating unit test targets
       in CMake. This function makes the code simpler and less
       error-prone.
       
       The proposed patch adds a similar function for generating
       performance test targets in CMake.
      
      NO_CHANGELOG=build infrastructure updated
      NO_DOC=build infrastructure updated
      NO_TEST=build infrastructure updated
  10. Dec 27, 2022
    • perf: add uri.escape/unescape test · 3cc0b3cf
      Sergey Bronnikov authored
      Added a simple benchmark for URI escape/unescape.
      
      Part of #3682
      
      NO_DOC=documentation is not required for performance test
      NO_CHANGELOG=performance test
      NO_TEST=performance test
  11. Aug 26, 2022
    • perf: introduce Light benchmark · 9818bba4
      Nikita Pettik authored
       The benchmark is implemented using the Google Benchmark library.
       Here are the benchmark settings:
        - values: we use a structure (tuple) containing a pointer to
                  heap memory and a size (all payloads are of the same
                  size: 32 bytes);
        - keys: unsigned char (the first byte of the tuple memory);
        - hash function: FNV-1a;
        - value comparator: std::memcmp();
        - value count: 10k, 100k, 1M.
       
       Before each test we prepare a vector of tuples storing truly
       random values.
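
For reference, the FNV-1a hash named in the settings can be sketched in a few lines. This is the standard 32-bit variant with the usual offset basis and prime; the benchmark's actual implementation may differ in width or interface:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a over an arbitrary byte buffer. */
static uint32_t
fnv1a(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t h = 2166136261u;       /* FNV offset basis */
	for (size_t i = 0; i < len; i++) {
		h ^= p[i];              /* xor in the next byte... */
		h *= 16777619u;         /* ...then multiply by the FNV prime */
	}
	return h;
}
```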
      
      Here's the list of results obtained on my PC (i7-8700 12 X 4600 MHz):
      
      Insertions: ~20-12M per second;
      Find (no misses): ~58-16M* per second (find by key gives the same result);
      Find (many misses): ~84-30M per second;
      Iteration with dereference: ~450M per second;
      Insertions after erase: ~50-17M* per second;
      Find after erase: ~52-17M* per second (the same as without erase);
      Delete: ~32-8M* per second.
      
      * The first value is for 10k values in hash table; second - is for 1M.
      
       Just to have a baseline, here are the results of a quite similar
       benchmark for std::unordered_map (it is also included in the
       source file):
      
      Insertions: ~26-8M per second;
      Find (no misses): ~44-11M per second;
      Iteration with dereference: ~265-56M per second;
      Find after erase: ~37-13M per second.
      
      Part of #7338
      
      NO_TEST=<Benchmark>
      NO_DOC=<Benchmark>
      NO_CHANGELOG=<Benchmark>
    • perf: use C++ 14 standard · e48835fd
      Nikita Pettik authored
       There are a lot of nice things introduced in the C++14 standard,
       so let's use it.
      
      NO_DOC=<Build change>
      NO_TEST=<Build change>
      NO_CHANGELOG=<Build change>
    • perf: move debug warning to a separate header · 0a7764a7
      Nikita Pettik authored
       It's useful in all performance tests, so let's move it to a
       separate header.
      
      NO_TEST=<Refactoring>
      NO_DOC=<Refactoring>
      NO_CHANGELOG=<Refactoring>
  12. Jun 28, 2022
    • tuple: refactor flags · 9da70207
      Nikita Pettik authored
       Before this patch struct tuple had two boolean bit fields:
       is_dirty and has_uploaded_refs. It is worth mentioning that
       sizeof(bool) is implementation-dependent. However, the code
       assumes it to be 1 byte (there's a static assertion restricting
       the whole struct tuple size to 10 bytes). So, strictly speaking,
       this may lead to a compilation error on some non-conventional
       system. Secondly, bit fields consume at least one size of their
       type anyway (i.e., there's no space benefit in using two uint8_t
       bit fields: they occupy 1 byte in total either way). There are
       several known pitfalls concerning bit fields:
        - a bit field's memory layout is implementation-dependent;
        - sizeof() can't be applied to such members;
        - the compiler may introduce unexpected side effects
          (https://lwn.net/Articles/478657/).
       
       Finally, our code base as a rule uses explicit masks:
       txn flags, vy stmt flags, sql flags, fiber flags.
       
       So, let's replace the bit fields in struct tuple with a single
       member called `flags` and several enum values corresponding to
       masks (to be more precise, bit positions in the tuple flags).
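
The resulting pattern can be sketched like this. The mask names and helpers are illustrative, not the actual Tarantool identifiers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions packed into a single explicit flags byte, replacing
 * the two implementation-defined boolean bit fields. Names here are
 * illustrative, not the real enum values from the tuple code. */
enum tuple_flag {
	TUPLE_IS_DIRTY = 1 << 0,
	TUPLE_HAS_UPLOADED_REFS = 1 << 1,
};

struct tuple {
	uint8_t flags;
	/* ... other fields ... */
};

static inline void
tuple_set_flag(struct tuple *t, enum tuple_flag f)
{
	t->flags |= f;
}

static inline void
tuple_clear_flag(struct tuple *t, enum tuple_flag f)
{
	t->flags &= ~f;
}

static inline bool
tuple_has_flag(const struct tuple *t, enum tuple_flag f)
{
	return (t->flags & f) != 0;
}
```

Unlike bit fields, the mask layout is fully defined by the enum values, so the 10-byte static assertion on struct tuple holds on any conforming compiler.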
      
      NO_DOC=<Refactoring>
      NO_CHANGELOG=<Refactoring>
      NO_TEST=<Refactoring>
  13. May 18, 2022
    • replication: fix race in accessing vclock by applier and tx threads · ddec704e
      Serge Petrenko authored
       When the applier ack writer was moved to the applier thread, it
       was overlooked that it would start sharing replicaset.vclock
       between two threads.
       
       This could lead to the following replication errors on the master:
       
        relay//102/main:reader V> Got a corrupted row:
        relay//102/main:reader V> 00000000: 81 00 00 81 26 81 01 09 02 01
       
       Such a row has an incorrectly encoded vclock: `81 01 09 02 01`.
       When the writer fiber encoded the vclock length (`81`), there was
       only one vclock component: {1: 9}, but by the time it iterated
       over the components, another WAL write had been reported to the
       TX thread, which bumped the vclock to a second component:
       {1: 9, 2: 1}.
      
       Let's fix the race by delivering a copy of the current replicaset
       vclock to the applier thread.
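
The essence of the fix — handing the applier thread a snapshot rather than the live vclock — can be sketched as follows. The struct layout and `vclock_copy` here are simplified stand-ins for the real Tarantool types:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A tiny stand-in for the real vclock: one LSN per replica id.
 * The actual struct vclock and vclock_copy() in Tarantool differ. */
#define VCLOCK_MAX 32

struct vclock {
	int64_t lsn[VCLOCK_MAX];
};

/*
 * Take a snapshot before handing the vclock to another thread, so a
 * concurrent bump in the TX thread cannot change the component set
 * while the applier thread is encoding it.
 */
static void
vclock_copy(struct vclock *dst, const struct vclock *src)
{
	memcpy(dst, src, sizeof(*src));
}
```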
      
      Also add a perf test to the perf/ directory.
      
      Closes #7089
      Part-of tarantool/tarantool-qa#166
      
      NO_DOC=internal fix
      NO_TEST=hard to test
  14. Mar 24, 2022
    • box: introduce a pair of tuple_format_new helpers · 4b8dc6b7
      Aleksandr Lyapunov authored
       tuple_format_new has lots of arguments, all of them indeed
       necessary. But a small analysis showed that there are almost
       always only two kinds of usage of that function: with lots of
       zeros as arguments, or with lots of values taken from space_def.
       
       Make two versions of tuple_format_new:
       simple_tuple_format_new, with all those zeros omitted, and
       space_tuple_format_new, which takes space_def as an argument.
      
      NO_DOC=refactoring
      NO_CHANGELOG=refactoring
  15. Mar 23, 2022
  16. Mar 03, 2022
    • alter: implement ability to set compression for tuple fields · a51313a4
      mechanik20051988 authored
       Implement the ability to set compression for tuple fields. The
       compression type for tuple fields is set in the space format and
       can be set during space creation or when setting a new space
       format.
      ```lua
      format = {{name = 'x', type = 'unsigned', compression = 'none'}}
      space = box.schema.space.create('memtx_space', {format = format})
      space:drop()
      space = box.schema.space.create('memtx_space')
      space:format(format)
      ```
       For the open-source build, only one compression type ('none') is
       supported. This type means the absence of compression, so it
       doesn't affect anything.
      
      Part of #2695
      
      NO_CHANGELOG=stubs for enterprise version
      NO_DOC=stubs for enterprise version
  17. Feb 03, 2022
    • test: fix incorrect resource release · 438ce64e
      mechanik20051988 authored
       There were two problems with resource release in the performance
       test:
       - because of the manual zeroing of `box_tuple_last`, the
         tuple_format structure was not deleted; `box_tuple_last` should
         be zeroed in the `tuple_free` function;
       - an invalid loop for resource release in one of the test cases.
       This patch fixes both problems.
      
      NO_CHANGELOG=test fix
      NO_DOC=test fix
  18. Dec 09, 2021
  19. Aug 18, 2021
  20. Aug 12, 2021