- Nov 27, 2019
-
-
Ilya Kosarev authored
Clear triggers from freshly occurred exceptions. Trivial replacements: `diag_raise` by `return -1`, an _xc function by its non-_xc version. Part of #4247
-
Ilya Kosarev authored
In the refactoring 'use non-_xc versions of functions in triggers' (b75d5f85), sequence_cache_find was replaced by sequence_by_id. This led to the loss of diagnostics in case of sequence_by_id failure. Now it is fixed. Part of #4247
-
Ilya Kosarev authored
Some error conditions in triggers and underlying functions were combined to read better. Besides that, in on_replace_dd_fk_constraint we now return an error immediately if the child space is not found, instead of searching for both the child and parent spaces before inspecting the search results. Part of #4247
-
Ilya Kosarev authored
In case of allocation problems in region alloc we were setting diagnostics using a "new slab" stub. Now we specify the concrete name of the struct that was going to be allocated. Part of #4247
-
Ilya Kosarev authored
The standard operator new might throw, so we need to wrap it in triggers to keep them non-throwing. It also means alter_space_move_indexes now returns an error code. Its usages are updated. Part of #4247
-
Nikita Pettik authored
Some time ago, when there was no support for the boolean type in SQL, boolean values passed as parameters to be bound were converted to the integer values 0 and 1. This takes place in lua_sql_bind_decode(). However, now we can avoid this conversion and store booleans as booleans. Note that the patch does not include a test case since the type of the value is preserved correctly, so when the binding is extracted from struct sql_bind it will be assigned the right value.
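A minimal sketch of the user-visible effect, assuming the standard box.execute() parameter binding (the query itself is illustrative, not taken from the patch):
```lua
-- Bind a boolean parameter; after the change the bound value stays a
-- boolean instead of being converted to the integer 1.
box.cfg{}
local res = box.execute('SELECT ?;', {true})
print(type(res.rows[1][1])) -- boolean
```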
-
- Nov 26, 2019
-
-
Vladislav Shpilevoy authored
A yield in the binary session's disconnect trigger could lead to a use-after-free of the session object. That happened because the iproto thread sent two requests to the TX thread at disconnect:
- close the session and run its on-disconnect triggers;
- if all requests are handled, destroy the session.
When a connection is idle, all requests are handled, so both of these requests are sent. If the first one yielded in the TX thread, the second one arrived and destroyed the session right under the feet of the first one. This can be solved in two ways: in the TX thread, or in the iproto thread. The iproto thread solution (which is chosen in the patch): just don't send the destroy request until disconnect returns back to the iproto thread. The TX thread solution (alternative): add a flag which says whether disconnect has been processed by TX. When the destroy request arrives, it checks the flag. If disconnect is not done, the destroy request waits on a condition variable until it is. The iproto solution is a bit trickier to implement, but it looks more correct. Closes #4627
-
- Nov 22, 2019
-
-
Kirill Yukhin authored
Add LUAJIT_ENABLE_PAIRSMM flag as a build option for luajit. If the flag is set, pairs/ipairs metamethods are available in Lua 5.1. For Tarantool this option is enabled by default.
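A small illustration of what the flag enables, with a hypothetical proxy table redirecting iteration via the __pairs metamethod:
```lua
-- With pairs/ipairs metamethods enabled, pairs() on a proxy table can be
-- redirected to another table via the __pairs metamethod.
local hidden = {a = 1, b = 2}
local proxy = setmetatable({}, {
    __pairs = function(self)
        -- iterate over 'hidden' instead of the (empty) proxy itself
        return pairs(hidden)
    end,
})
for k, v in pairs(proxy) do
    print(k, v) -- a 1 / b 2 (order unspecified)
end
```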
-
- Nov 21, 2019
-
-
Vladislav Shpilevoy authored
The warning is observed when tarantool is compiled with GCC 9.1.0. Warnings are treated as errors during a debug build or when the -DENABLE_WERROR=ON option is passed to cmake, which is usual for our testing jobs in CI. The commit that introduced the problem is 3a8adccf ('access: fix invalid error type for not found user').
Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org>
-
Vladislav Shpilevoy authored
Replication's applier encoded an auth request with exactly the same parameters as extracted by the URI parser. I.e. when no password was specified, the parser returned it as NULL, and it was not encoded. The relay, having received such an auth request, complained that the IPROTO_TUPLE field (the password) is not specified. Such an error is confusing: the user didn't do anything illegal, they just used a URI like 'login@host:port', without a password after the login. The patch makes the applier use an empty string as the default password. An alternative was to force users to always set a password, even an empty one, like this: 'login:@host:port', and to reject an auth request with a password mismatch error if it carries no password. But in that case a URI of the kind 'login@host:port' becomes useless - it could never pass. In addition, netbox already uses an empty string as the default password. So the only way to make it consistent and not break anything is to repeat the netbox logic for replication URIs. Closes #4605 Conflicts: test/replication/suite.cfg
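For illustration, both spellings below are now accepted by the replication configuration and authenticate with an empty-string password (host and port are made up):
```lua
-- before the patch only the second form, with an explicit empty
-- password, passed authentication
box.cfg{replication = {'login@192.168.0.1:3301'}}
box.cfg{replication = {'login:@192.168.0.1:3301'}}
```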
-
Vladislav Shpilevoy authored
Box.info.replication shows the applier/relay's latest error message. But it didn't include the errno description for system errors, even though it was included in the logs. Now box.info shows the errno description as well, when possible. Closes #4402 Conflicts: test/replication/suite.cfg
-
Vladislav Shpilevoy authored
The only error type having an errno as a part of it was SystemError (and its descendants SocketError, TimedOut, OOM, ...). That was used in the logs (the SystemError::log() method), and exposed to Lua (if the type was SystemError, an error object had an 'errno' field). But actually errno might be useful not only there. For example, box.info.replication exposes the latest error message of the applier/relay as the 'message' field of the 'upstream'/'downstream' fields, lacking the errno description. Before the patch, to obtain an errno code from C it was necessary to check whether an error has the SystemError type, cast it to the SystemError class, and call the SystemError::get_errno() method. Now errno is available as a part of the struct error object (accessible from C), and is not 0 for system errors. Part of #4402
-
Vladislav Shpilevoy authored
Box.session.su() raised a 'SystemError' when a user was not found because the user name was too long. That was obviously wrong, because SystemError is always something related to libraries (standard, curl, etc.), and it has an errno code. Now a ClientError is raised.
-
Vladislav Shpilevoy authored
Functions are stored in lists inside module objects. Module objects are stored in a hash table, where the key is a package name. But the key was a pointer into one of the module's function definition objects. Therefore, when that function was deleted, its freed package name memory was still used as the hash key and could be accessed when another function was deleted. Now a module does not use the memory of its functions and keeps its own copy of the package name.
-
Serge Petrenko authored
fiber.top() fills in statistics on every event loop iteration, so if it was just enabled, fiber.top() returns zeros in fiber cpu usage statistics, because the total time consumed by the main thread has not been accounted for yet. The same stands for viewing top() results for a freshly created fiber: its metrics will be zero since it hasn't lived through a full ev loop iteration yet. Fix this by delaying the test until top() results are meaningful, and add minor refactoring. Follow-up #2694
-
Serge Petrenko authored
When a fiber's EMA is 0 and the first non-zero observation is added to it, we assumed that the EMA should be equal to this observation (i.e. the average value should be the same as the observed one). This breaks the following invariant: the sum of the clock EMAs of all fibers equals the clock EMA of the thread. If one of the fibers was just spawned and has a big clock delta, it will assign this delta to its EMA, while the thread will calculate the new EMA as 15 * EMA / 16 + delta / 16, which may lead to a situation when the fiber EMA is greater than the cord EMA. This caused occasional test failures:
```
[001] Test failed! Result content mismatch:
[001] --- app/fiber.result Mon Nov 18 17:00:48 2019
[001] +++ app/fiber.reject Mon Nov 18 17:33:10 2019
[001] @@ -1511,7 +1511,7 @@
[001]  -- not exact due to accumulated integer division errors
[001]  sum_avg > 99 and sum_avg < 101 or sum_avg
[001]  ---
[001] -- true
[001] +- 187.59585601717
[001]  ...
[001]  tbl = nil
[001]  ---
```
Follow-up #2694
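A worked example with made-up numbers showing how the old rule broke the invariant (assuming a single fresh fiber produces essentially the whole clock delta of the iteration):
```lua
-- Hypothetical numbers: the thread EMA was 100 ticks, and a newly spawned
-- fiber produced a clock delta of 1600 ticks in one event loop iteration.
local delta = 1600
local cord_ema = 100

-- Old fiber rule: the first non-zero observation became the fiber's EMA.
local fiber_ema = delta                      -- 1600

-- Thread rule (always exponential): 15 * EMA / 16 + delta / 16.
cord_ema = 15 * cord_ema / 16 + delta / 16   -- 193.75

print(fiber_ema > cord_ema)                  -- true: fiber EMA exceeds cord EMA
```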
-
Serge Petrenko authored
Unify all the members related to a fiber's clock statistics into struct clock_stat, and all the members related to the cord's knowledge of cpu state and clocks into struct cpu_stat. Reset the stats of all alive fibers on fiber.top_enable(). Follow-up #2694
-
- Nov 15, 2019
-
-
Alexander Turenko authored
The problem appears after 6c627af3 ('test: tarantoolctl: verify delayed box.cfg()'), where the test case was changed and no longer assumes an error at instance start. So we need to stop the instance to prevent a situation when instances stay around after `make test`. Fixes #4600.
Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
-
- Nov 14, 2019
-
-
Alexander Turenko authored
Before commit 03f85d4c ('app: fix boolean handling in argparse module') the module did not expect a value after a 'boolean' argument. However, there was a problem: a 'boolean' argument could be passed only at the end of an argument list, otherwise it wrongly consumed the next argument and gave a confusing error message. The mentioned commit fixed this behaviour in the following way: it still allows passing a 'boolean' argument at the end of the list without a value, but requires a value ('true', 'false', '1', '0') provided using {'--foo=true'} or {'--foo', 'true'} syntax if a 'boolean' argument is not at the end. Here this behaviour is changed: a 'boolean' argument does not take an explicitly passed value, regardless of its position in the argument list. If a 'boolean' argument appears in the list, then argparse.parse() returns `true` for its value (a list of `true` values in case of a 'boolean+' argument), otherwise it is not added to the result. This change also makes the behaviour of long (--foo) and short (-f) 'boolean' options consistent. The motivation of the change is simple: it is easier and more natural to type, say, `tarantoolctl cat --show-system 00000000000000000000.snap` than `tarantoolctl cat --show-system true 00000000000000000000.snap`. This commit adds several new test cases, but it does not mean that we guarantee that the module behaviour will not change around some corner cases, say, handling of 'boolean+' arguments. This is an internal module. Follows up #4076.
Reviewed-by: Vladislav Shpilevoy <v.shpilevoy@tarantool.org>
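A rough sketch of the new behaviour. The parameter-list shape passed to parse() is an assumption about this internal module, not something stated in the patch:
```lua
-- Hypothetical call into the internal argparse module: the 'boolean'
-- parameter no longer consumes the file name that follows it.
local argparse = require('internal.argparse')
local res = argparse.parse(
    {'--show-system', '00000000000000000000.snap'},
    {{'show-system', 'boolean'}})  -- assumed {name, type} parameter spec
-- expected under these assumptions: res['show-system'] == true, and the
-- .snap file remains a positional argument in the result
```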
-
Ilya Kosarev authored
ck_constraint_def_new_from_tuple is used in on_replace_dd_ck_constraint, therefore it has to be cleared from exceptions. Now it doesn't throw any more. Its usages are updated. Some _xc functions, not needed any more, are removed. Part of #4247 Conflicts: src/box/alter.cc
-
Ilya Kosarev authored
fk_constraint_check_dup_links is used in on_replace_dd_fk_constraint, therefore it has to be cleared from exceptions. Now it doesn't throw any more. Its usages are updated. Part of #4247
-
Ilya Kosarev authored
During the replacement of tuple_field_bool_xc with its non-_xc version it turned out that it might be called even if there are not enough fields in the processed tuple. Now it is fixed. Part of #4247
-
- Nov 13, 2019
-
-
Ilya Kosarev authored
fk_constraint_def_new_from_tuple is used in on_replace_dd_fk_constraint, therefore it has to be cleared from exceptions. Now it doesn't throw any more. This means we also need to clear its subsidiary function, decode_fk_links, from exceptions. Their usages are updated. Some _xc functions, not needed any more, are removed. Part of #4247
-
Ilya Kosarev authored
coll_id_def_new_from_tuple is used in on_replace_dd_collation, therefore it has to be cleared from exceptions. Now it doesn't throw any more. Its usages are updated. Part of #4247
-
Ilya Kosarev authored
user_def_new_from_tuple is used in on_replace_dd_user & user_cache_alter_user, therefore it has to be cleared from exceptions. Now it doesn't throw any more. This means we also need to clear its subsidiary function, user_def_fill_auth_data, from exceptions. Their usages are updated. Part of #4247
-
Ilya Kosarev authored
alter_space_new doesn't throw anymore. Its usages are updated. Part of #4247
-
Ilya Kosarev authored
user_has_data is used in on_replace_dd_user, therefore it has to be cleared from exceptions. Now it doesn't throw any more. This means we also need to clear its subsidiary function, space_has_data, from exceptions. Their usages are updated. Part of #4247
-
Ilya Kosarev authored
space_def_new_from_tuple is used in on_replace_dd_space, therefore it has to be cleared from exceptions. Now it doesn't throw any more. This means we also need to clear its subsidiary functions, space_opts_decode, field_def_decode & space_format_decode, from exceptions. Their usages are updated. Part of #4247
-
- Nov 12, 2019
-
-
Nikita Pettik authored
    int
    json_path_cmp(const char *a, int a_len, const char *b, int b_len,
                  int index_base)
    {
            ...
            /* Paths a and b must be valid. */
            assert(rc_b == 0 && rc_b == 0);

Obviously (according to the comment) the author implied that both rc_a == 0 and rc_b == 0. Let's fix this small typo.
-
Vladislav Shpilevoy authored
Before the patch a struct xrow_update_field object didn't account for the array header in its .size and .data members. Indeed, that was not needed, because updates could only be 'flat' anyway. For example, consider the tuple:

    [mp_array, mp_uint, mp_uint, mp_uint]
               ^                        ^
              pos1                     pos2

Struct xrow_update_field.size and .data accounted for the memory from pos1 to pos2, without the array header. The number of fields was stored inside a rope object. This is why it made no sense to keep an array header pointer. But now updates are going to be not flat, and not only for arrays. There will be an update tree. Each node of that tree will describe an update of some part of a tuple. Some of the nodes will need to know the exact borders of their children, including headers. That is going to be used for fast copying of the neighbours of such children. Consider an example. A tuple with one field consisting of nested maps:

    tuple = {}
    tuple[1] = { a = { b = { c = { d = {1, 2, 3} } } } }

Update:

    {{'+', '[1].a.b.c.d[1]', 1}, {'+', '[1].a.b.c.d[2]', 1}}

To update such a tuple a simple tree will be built:

    root:           [ [1] ]
                       |
    isolated path:  [ 'a.b.c' ]
                       |
    leaves:         [ [1] [2] [3] ]
                      +1   +1

- The root node keeps the whole tuple borders. It is a rope with a single field. This single field is a deeply updated map. Such deep multiple updates with long common prefixes are stored as an isolated path + a map/array in the end. Here the isolated path is 'a.b.c'. It ends with the terminal array update.

Assume that the operations are applied and it is time to save the result. Saving starts from the root. The root rope will encode the root array header and will try to save the single field. The single field is an isolated update. It needs to save everything before the old {1,2,3}, the new array {2,2,3}, and everything after the old array. The simplest way to do it is to know the exact borders of the old array {1,2,3} and memcpy all the memory before and after it. This is exactly what this patch allows to do. Everything before xrow_update_field.data, and after xrow_update_field.data + .size, can be safely copied and is not related to the field. To copy adjacent memory it is not even necessary to know the field type. xrow_update_field.data and .size have the same meaning for all field types. Part of #1261
-
Vladislav Shpilevoy authored
They are needed in the incoming JSON updates, which are going to solve the task of comparing two JSON paths, parsing them simultaneously, and digging into a tuple. json_token_cmp() existed before this patch, but it was trying to compare parent pointers too, which is not needed in the JSON updates, since they won't use JSON trees. Needed for #1261
-
Ilya Kosarev authored
There were some _xc functions used in triggers. Now they are all replaced with their non-_xc versions. If the corresponding _xc version had no other usages, it was removed. Part of #4247
-
Ilya Kosarev authored
sequence_field_from_tuple is used in set_space_sequence & on_replace_dd_space_sequence, therefore it has to be cleared from exceptions. Now it doesn't throw any more. Its usages are updated. Part of #4247
-
Ilya Kosarev authored
sequence_def_new_from_tuple is used in on_replace_dd_sequence, therefore it has to be cleared from exceptions. Now it doesn't throw any more. Its usages are updated. Part of #4247
-
Vladislav Shpilevoy authored
Bootstrap and recovery work on behalf of admin. Without universe access they are not able to even fill system spaces with data. It is better to forbid this ability before someone makes their cluster unrecoverable.
-
Vladislav Shpilevoy authored
The admin user has universal privileges before bootstrap or recovery are done. That allows, for example, bootstrapping from a remote master, because to do that the admin should be able to insert into system spaces, such as _priv. But after the patch on online credentials update was implemented (#2763, 48d00b0e) the admin could lose its universal access if, for example, a role was granted to him before universal access was recovered. That happened for two reasons:
- any change in access rights, even in granted roles, led to a rebuild of universal access;
- any change in access rights updated the universal access in all existing sessions, thanks to #2763.
What happened: two tarantools were started. One of them, the master, granted the 'replication' role to admin. The second node, the slave, tried to bootstrap from the master. The slave created an admin session and started loading data. Once it loaded the 'grant replication role to admin' command, admin's universal access was nullified everywhere, including this session. The next rows could not be applied. Closes #4606
-
- Nov 11, 2019
-
-
Alexander V. Tikhonov authored
After issue #4537 with the data segment size limit was fixed, the tests temporarily blocked because of it were unblocked. Part of #4271
-
- Nov 09, 2019
-
-
Ilya Kosarev authored
func_def_new_from_tuple is used in on_replace_dd_func, therefore it has to be cleared from exceptions. Now it doesn't throw any more. This means we also need to clear its subsidiary function, func_def_get_ids_from_tuple, from exceptions. Their usages are updated. Part of #4247
-
Ilya Kosarev authored
index_def_new_from_tuple is used in on_replace_dd_index, therefore it has to be cleared from exceptions. Now it doesn't throw any more. This means we also need to clear its subsidiary functions from exceptions: index_def_check_sequence, index_def_check_tuple, index_opts_decode, func_index_check_func. Their usages are updated. Part of #4247
-
Serge Petrenko authored
Implement a new function in the Lua fiber library: top(). It returns a table containing fiber cpu usage stats. The table has two entries: "cpu_misses" and "cpu". "cpu" itself is a table listing all the alive fibers and their cpu consumption. The patch relies on the CPU timestamp counter to measure each fiber's time share. Closes #2694

@TarantoolBot document
Title: fiber: new function `fiber.top()`

`fiber.top()` returns a table of all alive fibers and lists their cpu consumption. Let's take a look at the example:

```
tarantool> fiber.top()
---
- cpu:
    107/lua:
      instant: 30.967324490456
      time: 0.351821993
      average: 25.582738345233
    104/lua:
      instant: 9.6473633128437
      time: 0.110869897
      average: 7.9693406131877
    101/on_shutdown:
      instant: 0
      time: 0
      average: 0
    103/lua:
      instant: 9.8026528631511
      time: 0.112641118
      average: 18.138387232255
    106/lua:
      instant: 20.071174377224
      time: 0.226901357
      average: 17.077908441831
    102/interactive:
      instant: 0
      time: 9.6858e-05
      average: 0
    105/lua:
      instant: 9.2461986412164
      time: 0.10657528
      average: 7.7068458630827
    1/sched:
      instant: 20.265286315108
      time: 0.237095335
      average: 23.141537169257
  cpu_misses: 0
...
```

The two entries in the table returned by `fiber.top()` are `cpu_misses` and `cpu`. `cpu` itself is a table whose keys are strings containing fiber ids and names. The three metrics available for each fiber are:
1) instant (per cent), which indicates the share of time the fiber was executing during the previous event loop iteration;
2) average (per cent), which is calculated as an exponential moving average of `instant` values over all previous event loop iterations;
3) time (seconds), which estimates how much cpu time each fiber spent processing during its lifetime.

More info on the `cpu_misses` field returned by `fiber.top()`: `cpu_misses` indicates the number of times the tx thread detected it was rescheduled on a different cpu core during the last event loop iteration. fiber.top() uses the cpu timestamp counter to measure each fiber's execution time. However, each cpu core may have its own counter value (you can only rely on counter deltas if both measurements were taken on the same core, otherwise the delta may even get negative). When the tx thread is rescheduled to a different cpu core, tarantool just assumes the cpu delta was zero for the latest measurement. This lowers the precision of the computations, so the bigger the `cpu_misses` value, the lower the precision of fiber.top() results.

fiber.top() doesn't work on the arm architecture at the moment. Please note that enabling fiber.top() slows down fiber switching by about 15 per cent, so it is disabled by default. To enable it you need to issue `fiber.top_enable()`. You can disable it back after you have finished debugging using `fiber.top_disable()`.

A "time" entry is also added to each fiber's output in fiber.info() (it duplicates the "time" entry from fiber.top().cpu per fiber). Note that "time" is only counted while fiber.top is enabled.
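A minimal usage sketch relying only on the functions and fields named above:
```lua
local fiber = require('fiber')

-- Profiling is off by default: enable it, let at least one event loop
-- iteration pass, read the stats, then switch it back off.
fiber.top_enable()
fiber.sleep(0.1)

local stats = fiber.top()
print('cpu misses:', stats.cpu_misses)
for name, metrics in pairs(stats.cpu) do
    -- name looks like '107/lua'; metrics has instant, average and time
    print(name, metrics.instant, metrics.average, metrics.time)
end

fiber.top_disable()
```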
-