Commits · 856b4c3c1c666288cd2b4cec5a8c1382dcfcae8d · core / tarantool

Mar 16, 2020

sql: trust column duplicate check to box · 856b4c3c

Vladislav Shpilevoy authored 5 years ago


CREATE TABLE used to check column name duplicates before going to
box. But it is not necessary, because the same check is done by
box.

Reviewed-by: Nikita Pettik <korablev@tarantool.org>

test: use test_run.grep_log in panic_on_broken_lsn · 54606356

Olga Arkhangelskaia authored 5 years ago


Use 'filename' grep_log() option instead of custom log search.

Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org>

Mar 12, 2020

test: update test-run · c401880b

Alexander Turenko authored 5 years ago

Add filename option to grep_log() (#106).

The option can be used when it is not possible to grab 'box.cfg.log'
value from a tarantool instance: say, when the instance is stopped or
crashed.

Usage:

 | local test_run = require('test_run').new()
 | test_run:grep_log(node, what, bytes, {filename = <...>})

Mar 10, 2020

box: fix struct port_tuple.size wrong type in Lua · 8bea83ed

Vladislav Shpilevoy authored 5 years ago

Original port_tuple in C has 'int size;' field. It was
'size_t size' in Lua. Since sizeof(size_t) usually is
8, and sizeof(int) is 4, this was a really bad typo.
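
To see why the mismatch matters, here is a minimal sketch that compares two declarations of the same record. The struct names and the neighbouring `flags` field are hypothetical and much simpler than the real struct port_tuple; only the `int` versus `size_t` difference comes from the message above.

```c
#include <stddef.h>

/* Hypothetical, simplified view of the struct as declared in C. */
struct port_view_c {
    int size;
    int flags; /* illustrative neighbour field, not from the source */
};

/* The same record as a mismatched declaration would describe it. */
struct port_view_wrong {
    size_t size;
    int flags;
};

/*
 * Return 1 when both declarations agree on the width of 'size' and on
 * the offset of the field that follows it, 0 otherwise.
 */
static int
port_views_agree(void)
{
    return sizeof(((struct port_view_c *)0)->size) ==
           sizeof(((struct port_view_wrong *)0)->size) &&
           offsetof(struct port_view_c, flags) ==
           offsetof(struct port_view_wrong, flags);
}
```

On a platform where `int` is 4 bytes and `size_t` is 8, `port_views_agree()` returns 0: a reader using the wider declaration takes 8 bytes for `size` and so pulls in whatever follows the real 4-byte field.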

memtx: fix out of memory handling for rtree · 3b4fbdc0

Olga Arkhangelskaia authored 5 years ago

When tarantool recovers an rtree index from a snapshot and the memtx_memory value
is lower than it was when the snapshot was created, the server crashes with a
segmentation fault. This happens because there is no out of memory error
handling in the rtree lib. In other words, we do not check the result of
the malloc operation.
The execution flow during recovery takes a different path, in which the secondary
keys are built in batches. That path has no checks and reservations.
The patch adds a memtx_rtree_index_reserve implementation to make sure that no
memory allocation in rtree will fail. Although this gives us no additional
optimization as in the case of memtx_tree, the memory reservation prevents
tarantool from hitting a segmentation fault. If there is not enough memory to reserve,
the server fails gracefully with the "Failed to allocate" error message.

Closes #4619

Mar 08, 2020

box: netbox.self and connect should work interchangeably · 0bcf9a59

Maria authored 5 years ago

Despite what was stated in the documentation, netbox.connect was not always
equivalent to netbox.self. In particular, they converted tuple to different
types - table and cdata respectively.

The patch fixes the issue and covers all cases where netbox.self and connect
perform conversion of types - e.g., for box.error.

Closes #4513

Mar 06, 2020

test: fix fio.test.lua flakiness · d8a9f1d9

Vladislav Shpilevoy authored 5 years ago

In 89c73e64 ("fio: respect
$TMPDIR in fio.tempdir(), when it is set") a test was added
checking that fio.tempdir() returns a path to a folder located
under the path specified in the $TMPDIR environment variable.

The check was done by calling returned_path:find(tmpdir_path) in Lua.
If the tmpdir path contained 'special' characters such as '.', it
didn't match, because string.find() takes a pattern by default,
not just a plain string.

string.startswith() works fine.

Follow-up #4794

Hotfix for : don't account :execute() call twice · c05f5d4a

Maria authored 5 years ago

The patch fixes a bug in the commit b0f588f6 where statistics on
box.execute were collected twice. This happened because
sql_prepare_and_execute calls sql_execute under the hood, so there's
no need to do rmean_collect in both of them.

Follow-up #4756

Mar 05, 2020

lua: expose checktuple function · 82cc5631

Chris Sosnin authored 5 years ago

All lua types feature check, push and is functions. We expose lua_checktuple
for full consistency.

Closes #2553

luajit: bump new version · 12694d72

Kirill Yukhin authored 5 years ago

- Make string to number conversions fail on NUL char.

luajit: bump new version · a81453b9

Kirill Yukhin authored 5 years ago

- gdb: adjust the extension to be used with Python 2
- gdb: introduce luajit-gdb extension

fio: respect $TMPDIR in fio.tempdir(), when it is set · 89c73e64

Vladislav Shpilevoy authored 5 years ago

TMPDIR is an environment variable that tells which directory
should be used to create temporary files. It is described in the
POSIX standard, and should be used by programs creating temporary
files.

Closes #4794

@TarantoolBot document
Title: fio.tempdir() $TMPDIR

fio.tempdir() creates a temporary directory in /tmp by
default. This can be changed by setting the TMPDIR environment
variable, either before starting Tarantool or at runtime with
os.setenv().
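
The lookup order described here can be sketched in C as follows. This is only an illustration of the "use $TMPDIR, else /tmp" rule, not the code from the patch; the helper name `tmpdir_base` is hypothetical.

```c
#include <stdlib.h>

/*
 * Pick the base directory for temporary files: the value of $TMPDIR
 * when the variable is set, "/tmp" otherwise.
 */
static const char *
tmpdir_base(void)
{
    const char *dir = getenv("TMPDIR");
    return dir != NULL ? dir : "/tmp";
}
```

Because the environment is read on every call, a change made at runtime is picked up by the next call.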

build: link bundled libcurl with c-ares · 23837076

Serge Petrenko authored 5 years ago

libcurl has a built-in threaded resolver used for asynchronous DNS
requests. However, when a DNS server is slow to respond, the request still
hangs tarantool until it is finished. The reason is that curl calls
thread_join on the resolving thread internally upon timeout, making the
calling thread hang until resolution has ended.
Use c-ares as an asynchronous resolver instead to eliminate the problem.

Closes #4591

box: replication shouldn't leak user password · a806549d

Maria authored 5 years ago

It was possible to leak a user password by setting the 'replication'
configuration option in the first box.cfg invocation. This happened due
to unconditional logging in the load_cfg function. The patch introduces
conditional logging.

Closes #4493

Mar 04, 2020

sql: support constraint drop · 85adac03

Roman Khabibov authored 5 years ago

Extend <ALTER TABLE> statement to drop table constraints by their
names.

Closes #4120

@TarantoolBot document
Title: Drop table constraints in SQL
Now, it is possible to drop table constraints (PRIMARY KEY,
UNIQUE, FOREIGN KEY, CHECK) using
<ALTER TABLE table_name DROP CONSTRAINT constraint_name> statement
by their names.

For example:

tarantool> box.execute([[CREATE TABLE test (
                             a INTEGER PRIMARY KEY,
                             b INTEGER,
                             CONSTRAINT cnstr CHECK (a >= 0)
                        );]])
---
- row_count: 1
...

tarantool> box.execute('ALTER TABLE test DROP CONSTRAINT cnstr;')
---
- row_count: 1
...

The same for all the other constraints.

sql: don't select from _index during parsing · 4bdcb3fa

Roman Khabibov authored 5 years ago

Remove function box_index_by_name() from parser to avoid selects
during parsing. Add the ability to choose index during VDBE code
compilation which will be used to find the tuple to drop from a
system space.

Needed for #4120

sql: improve "no such constraint" error message · 7d558ae8

Roman Khabibov authored 5 years ago

Clarify the error message for better user handling. Add the name
of the space in which the constraint being dropped was not found.

Part of #4120

Mar 03, 2020

replication: fix rebootstrap in case the instance is listed in box.cfg.replication · dbcfaf70

Serge Petrenko authored 5 years ago

When checking whether rejoin is needed, the replica loops through all the
instances in box.cfg.replication. Since it counts itself just like
all the other instances, it believes that there is a master holding
the files it needs.
So make the replica skip itself when looking for an instance that holds the
files it needs and when determining whether rebootstrap is needed.

We already have a test for the issue, but it missed the issue due to the
replica.lua replication settings. Fix replica.lua to optionally include
itself in box.cfg.replication, so that the corresponding test works
correctly.

Closes #4759

Mar 02, 2020

replication: do not relay rows coming from a remote instance back to it · ed2e1430

Serge Petrenko authored 5 years ago

We have a mechanism for restoring rows originating from an instance that
suffered a sudden power loss: remote masters resend the instance's rows
received before a certain point in time, defined by the remote master's vclock
at the moment of subscribe.
However, this is useful only on initial replication configuration, when
an instance has just recovered, so that it can receive what it has
relayed but hasn't synced to disk.
In other cases, when an instance is operating normally and master-master
replication is configured, the mechanism described above may lead to
the instance re-applying its own rows, coming from a master it has just
subscribed to.
To fix the problem, do not relay rows coming from a remote instance if
the instance has already recovered.

Closes #4739

replication: implement an instance id filter for relay · 45de9907

Serge Petrenko authored 5 years ago

Add a filter for relay to skip rows coming from unwanted instances.
A list of instance ids whose rows replica doesn't want to fetch is encoded
together with SUBSCRIBE request after a freshly introduced flag IPROTO_ID_FILTER.

Filtering rows is needed to prevent an instance from fetching its own
rows from a remote master, which is useful on initial configuration and
harmful on resubscribe.

Prerequisite #4739, #3294

@TarantoolBot document

Title: document new binary protocol key and subscribe request changes

Add key `IPROTO_ID_FILTER = 0x51` to the internals reference.
This is an optional key used in SUBSCRIBE request followed by an array
of ids of instances whose rows won't be relayed to the replica.

SUBSCRIBE request is supplemented with an optional field of the
following structure:
```
+====================+
|      ID_FILTER     |
|   0x51 : ID LIST   |
| MP_INT : MP_ARRAY  |
|                    |
+====================+
```
The field is encoded only when the id list is not empty.
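
As an illustration of the layout above, the following sketch encodes the key and the id list by hand using single-byte MsgPack forms. It is not the code from the patch (Tarantool has its own MsgPack library); the function name and the limits of at most 15 ids, each below 128, are assumptions chosen so that one byte per element is enough.

```c
#include <stddef.h>
#include <stdint.h>

enum { IPROTO_ID_FILTER = 0x51 };

/*
 * Encode the ID_FILTER key followed by a MsgPack array of instance ids.
 * Returns the number of bytes written, or 0 when nothing is encoded:
 * the list is empty, or an input falls outside this sketch's limits.
 * buf must have room for count + 2 bytes.
 */
static size_t
encode_id_filter(uint8_t *buf, const uint8_t *ids, size_t count)
{
    if (count == 0 || count > 15)
        return 0;
    for (size_t i = 0; i < count; i++) {
        if (ids[i] > 0x7f)
            return 0;
    }
    size_t pos = 0;
    buf[pos++] = IPROTO_ID_FILTER;        /* key as a positive fixint */
    buf[pos++] = (uint8_t)(0x90 | count); /* fixarray header */
    for (size_t i = 0; i < count; i++)
        buf[pos++] = ids[i];              /* element as a positive fixint */
    return pos;
}
```

For the ids 1 and 3 this produces the four bytes `0x51 0x92 0x01 0x03`. An empty list produces nothing, matching the rule that the field is encoded only when the list is not empty.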

wal: warn when trying to write a record with a broken lsn · e0750262

Serge Petrenko authored 5 years ago

There is an assertion in vclock_follow `lsn > prev_lsn`, which doesn't
fire in release builds, of course. Let's at least warn the user on an
attempt to write a record with a duplicate or otherwise broken lsn, and
not follow such an lsn.
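
A minimal sketch of the idea, assuming a single lsn counter instead of a full vclock. The function name and the message text are hypothetical; the point is only that the check also runs in release builds, where an assert would be compiled out.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Advance *prev_lsn to lsn only when lsn is strictly greater.
 * Otherwise print a warning, leave *prev_lsn untouched and return -1.
 */
static int
follow_lsn_checked(int64_t *prev_lsn, int64_t lsn)
{
    if (lsn <= *prev_lsn) {
        fprintf(stderr, "warning: broken lsn %lld, last known is %lld\n",
                (long long)lsn, (long long)*prev_lsn);
        return -1;
    }
    *prev_lsn = lsn;
    return 0;
}
```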

Follow-up #4739

box: expose box_is_orphan method · 7b83b73d

Serge Petrenko authored 5 years ago

is_orphan status check is needed by applier in order to tell relay
whether to send the instance's own rows back or not.

Prerequisite #4739

Feb 28, 2020

Revert "test: unit/popen" · 5e5d5a4a

Alexander Turenko authored 5 years ago

Found another problem with the test:

 | /builds/DtQXhC5e/0/tarantool/tarantool/test/unit/popen.c:63:6:
 | error: variable 'rc' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
 |        if (handle == NULL)
 |            ^~~~~~~~~~~~~~

Decided to revert it and fix in background.

This reverts commit 40a51647.

Revert "test: disable popen.test" · 7ae50146

Alexander Turenko authored 5 years ago

Found another problem with the test:

 | /builds/DtQXhC5e/0/tarantool/tarantool/test/unit/popen.c:63:6:
 | error: variable 'rc' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
 |        if (handle == NULL)
 |            ^~~~~~~~~~~~~~

Decided to revert the test and so revert its disabling.

This reverts commit bceaf05c.

test: disable popen.test · bceaf05c

Cyrill Gorcunov authored 5 years ago


This test is buggy and needs to be rewritten. To avoid blocking other
developers who rely on the master branch, just disable it.

The test was added in 40a51647 ('test:
unit/popen').

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org>

popen: fix empty envp type for nonlinux builds · 5c0420ca

Cyrill Gorcunov authored 5 years ago


Fix for commit f58cb606 ('popen:
introduce a backend engine').

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org>

Feb 27, 2020

test: unit/popen · 40a51647

Cyrill Gorcunov authored 5 years ago


Basic tests for popen engine

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

popen: introduce a backend engine · f58cb606

Cyrill Gorcunov authored 5 years ago


In the patch we introduce a popen backend engine which provides
a way to execute external programs and communicate with their
stdin/stdout/stderr streams.

It is possible to run a child process with:

 a) completely closed stdX descriptors
 b) provide /dev/null descriptors to appropriate stdX
 c) pass new transport into a child (currently we use
    pipes for this sake, but may extend to tty/sockets)
 d) inherit stdX from a parent, iow do nothing

On tarantool start we create the @popen_pids_map hash, which maps
created processes' PIDs to popen_handle structures; this structure
keeps everything needed to control and communicate with the children.
The hash will allow us to find a child process quickly from inside
of a signal handler.

Each handle links into the @popen_head list, which is needed to be able
to destroy child processes in the exit procedure (i.e. when we exit
tarantool and need to clean up the resources used).

Every new process is born by vfork() call - we can't use fork()
because of at_fork() handlers in libeio which cause deadlocking
in internal mutex usage. Thus the caller waits until vfork()
finishes its work and runs exec (or exit with error).
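
The vfork-then-exec pattern described above looks roughly like this in plain POSIX C. This is a simplified sketch, not the engine's code: the function name is hypothetical, the parent waits synchronously with waitpid() instead of using a libev child watcher, and no descriptors are set up for the child.

```c
#define _DEFAULT_SOURCE /* for vfork() on glibc under strict -std modes */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Start argv[0] via vfork() + execve() and wait for it to finish.
 * Returns the child's exit code, or -1 on failure or abnormal exit.
 */
static int
spawn_and_wait(char *const argv[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: only exec or _exit are safe after vfork(). */
        execve(argv[0], argv, environ);
        _exit(127);
    }
    /* Parent resumes here once the child has called exec or exited. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling it with `{"/bin/sh", "-c", "exit 3", NULL}` returns 3.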

Because child processes run without any limitations,
they can exit on their own or can be killed by some third party
(say, a user of a hardware node), so we need to watch their state, which is
done by setting a hook with the ev_child_start() helper. This helper
allows us to catch SIGCHLD when a child exits or is signaled
and unregister it from the pool of currently running children.
Note that libev's wait() reaps the zombie child by itself. Another
interesting detail is that libev catches the signal asynchronously,
but our SIGCHLD hook is called synchronously, before the child is reaped.

This engine provides the following API:
 - popen_init
	to initialize engine
 - popen_free
	to finalize engine and free all resources
	allocated so far
 - popen_new
	to create a new child process and start it
 - popen_delete
	to release resources occupied and
	terminate a child process
 - popen_stat
	to fetch statistics about a child process
 - popen_command
	to fetch command line string formerly used
	on the popen object creation
 - popen_write_timeout
	to write data into child's stdin with
	timeout
 - popen_read_timeout
	to read data from child's stdout/stderr
	with timeout
 - popen_state
	to fetch state (alive, exited or killed) and
	exit code of a child process
 - popen_state_str
	to get state of a child process in string
	form, for Lua usage mostly
 - popen_send_signal
	to send signal to a child process (for
	example to kill it)

Known issues to fix in next series:

 - environment variables for non-linux systems do not support
   inheritance for now due to lack of testing on my side;

 - for Linux-based systems we use the pipe2 system call, passing the
   O_CLOEXEC flag so that two concurrent popen_create calls
   would not affect each other with pipes inheritance (while
   currently we don't have a case where concurrent calls could
   be done as far as I know, it is still better to be on the safe side
   from the beginning);

 - there are some files (such as xlog) which tarantool opens
   for its own needs without setting the O_CLOEXEC flag, so they get
   propagated to a child process; for Linux-based systems
   we use the close_inherited_fds helper, which walks over the open
   files of a process and closes them. But for other targets
   like MachO or FreeBSD this helper is simply left out, because
   I don't have such machines to experiment with; we should
   investigate this in more detail later, once the base
   code is merged in;

 - need to consider a case where we will be using piping for
   descriptors (for example, we might be writing into the stdin
   of a child from another pipe; for this sake we could use the
   splice() syscall, which is going to be way faster than copying
   data inside the kernel between processes). Still the question
   is -- do we really need it? Since we use internal flags
   in the popen handle, it should not be a big problem to extend
   these interfaces;

   this particular feature is considered to have a very low
   priority but I left it here just to not forget.

Part-of #4031

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

coio: export helpers · c81d9aa4

Cyrill Gorcunov authored 5 years ago


There is no reason to hide functions. In particular
we will use these helpers in popen code.

Part-of #4031

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

test: add clean up for box/access test · e33216af

Maria authored 5 years ago

The commit e8009f41 ('box: user.grant
error should be versatile') did not do proper clean-up: it grants
non-default privileges for user 'guest' and does not revoke them at the
end. That caused occasional failures of other tests, all with the same
error saying user 'guest' already had access on universe.

This case should be handled by test-run in the future, see [1].

[1]: https://github.com/tarantool/test-run/issues/156

Follows up #714

test: stabilize flaky fiber memory leak detection · d6cf327f

Alexander Turenko authored 5 years ago

After the #4736 regression fix (in fact it just reverts the new logic in
small) it is possible again that a fiber's region may hold memory for
a while, but release it eventually. When the used memory exceeds 128 KiB
threshold, fiber_gc() puts 'garbage' slabs back to slab_cache and
subtracts them from region_used() metric. But until this point those
slabs are accounted in region_used() and so in fiber.info() metrics.

This commit fixes flakiness of test cases of the following kind:

 | fiber.info()[fiber.self().id()].memory.used -- should be zero
 | <...workload...>
 | fiber.info()[fiber.self().id()].memory.used -- should be zero

The problem is that the first `<...>.memory.used` value may be non-zero.
It depends on previous tests that were executed on this tarantool
instance.

The obvious way to solve it would be to print differences between
`<...>.memory.used` values before and after a workload instead of
absolute values. This however does not work, because the first slab in a
region can be almost full at the point where a test case starts, and the
next slab will be acquired from a slab_cache. This means that the
previous slab will become 'garbage' and will not be collected until the
128 KiB threshold is exceeded: the latter `<...>.memory.used` check will
return a bigger value than the former one. However, if the threshold
is reached during the workload, the latter check may show a lesser
value than the former one. In short, the test case would be unstable
after this change.

It is resolved by restarting a tarantool instance before such test
cases to ensure that there are no 'garbage' slabs in the current fiber's
region.

Note: This works only if a test case reserves only one slab at a
time: otherwise some memory may be held after the case (and so a
memory check after a workload will fail). However it seems that our
cases are small enough not to trigger this situation.

Call of region_free() would be enough, but we have no Lua API for it.

Fixes #4750.

Feb 25, 2020

box: sql prepare and execute statistics should be reflected in box.stat() · 9d7686fc

Maria authored 5 years ago

Calling prepare and execute did not update corresponding request statistics
in the box.stat table. This happened because methods that collect stats were
never called where they should have been.

Closes #4756

box: user.grant error should be versatile · e8009f41

Maria authored 5 years ago

Error message on granted privileges was not flexible and
did not distinguish between universal or any other privileges
leaving either 'nil' or simply '' at the end.

Closes #714

Feb 24, 2020

upgrade: fix generated sequence upgrade from 2.1 · 6d45a41e

Vladislav Shpilevoy authored 5 years ago

The bug was in an attempt to update a record in _space_sequence
in-place, to add field path and number. This was not properly
supported by the system space's trigger, and was banned in the
previous patch of this series.

But delete + tuple update + insert work fine. The patch uses them.

To test it the old disabled and heavily outdated
xlog/upgrade.test.lua was replaced with a smaller analogue, which
is supposed to be created separately for each upgrade bug,
according to the new policy of creating test files.

The patch tries to make it easy to add new upgrade tests and
snapshots. A new test should consist of a fill.lua script to
populate spaces, a snapshot, the needed xlogs, and a .test.lua file.
The fill script and binaries should be in a folder named after the test
file, which is located in the version folder. Like this:

 xlog/
 |
 + <test_name>.test.lua
 |
 +- upgrade/
    |
    +- <version>/
    |   |
    |   +-<test_name>/
    |     |
    |     +- fill.lua
    |     +- *.snap
    |     +- *.xlog

Version is supposed to say explicitly which version the files in
there have.

Closes #4771

box: forbid to update/replace _space_sequence · 1a84b80e

Vladislav Shpilevoy authored 5 years ago

Anyway, this does not work for generated sequences. Proper
support of update would complicate the code and wouldn't give
anything useful.

Part of #4771

upgrade: add missing sys triggers off and erasure · e1c7d25f

Vladislav Shpilevoy authored 5 years ago

box.internal.bootstrap() turns off system space triggers before
doing anything, because it is likely to make some hard changes
violating existing rules. It also erases data from all system
spaces to fill them from scratch.

Each time when a new space is added, its erasure and turning off
its triggers should have been called explicitly here. As a result
it was not done sometimes, by accident. For example, triggers
were not turned off for _sequence_data, _sequence,
_space_sequence.

Content removal wasn't done for _space_sequence.

The patch makes a generic solution which does not require manual
patching of trigger manipulation and truncation anymore.

The bug was discovered while working on #4771, although it is not
related.

fiber: leak slab if unable to bring prots back · 8d53fadc

Cyrill Gorcunov authored 5 years ago


If we are unable to revert a guard page back to
read|write, we should never use such a slab again.

Initially I thought of just putting a panic here and
exiting, but it is too destructive. I think it is better to
print an error and continue. If the node admin ignores
this message, then at some moment in the future there won't
be any slab left for use and creating new fibers will be
prohibited.

In the future (hopefully the near one) we plan to drop
guard pages to prevent VMA fracturing and use
stack marks instead.

Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

fiber: set diagnostics at madvise/mprotect failure · c6752297

Cyrill Gorcunov authored 5 years ago


Both madvise and mprotect calls can fail due to various
reasons, mostly because of lack of free memory in the
system.

We log such cases via say_x helpers but this is not enough.
In particular tarantool/memcached relies on diag error to be
set to detect an error condition:

 | expire_fiber = fiber_new(name, memcached_expire_loop);
 | const box_error_t *err = box_error_last();
 | if (err) {
 |	say_error("Can't start the expire fiber");
 |	say_error("%s", box_error_message(err));
 |	return -1;
 | }

Thus let's use the diag_set() helper here and, instead of macros,
use inline functions for better readability.

Fixes #4722

Reported-by: Alexander Turenko <alexander.turenko@tarantool.org>
Reviewed-by: Alexander Turenko <alexander.turenko@tarantool.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>

app: verify unix socket path length in socket.tcp_server() · aae93514

Chris Sosnin authored 5 years ago

Providing a socket pathname longer than UNIX_PATH_MAX to socket.tcp_server()
does not cause any error: lbox_socket_local_resolve just truncates the
name according to the limit, causing bad behavior (tarantool will try to
access a socket which doesn't exist). Thus, let's verify that the pathname
fits into the buffer.
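
A check of this kind can be sketched against the standard `struct sockaddr_un`. The helper name is hypothetical and the actual patch may perform the check differently; the sketch only shows comparing the path length with the size of the `sun_path` buffer, leaving room for the terminating NUL.

```c
#include <string.h>
#include <sys/un.h>

/*
 * Return 1 when path, including its terminating NUL, fits into
 * sockaddr_un.sun_path, 0 when it would have to be truncated.
 */
static int
unix_path_fits(const char *path)
{
    struct sockaddr_un addr;
    return strlen(path) < sizeof(addr.sun_path);
}
```

Rejecting the path up front avoids silently binding to or connecting to a different, truncated name.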

Closes #4634

Feb 21, 2020

gitlab-ci: adjust base URL of RPM/Deb repositories · 4dee6890

Alexander V. Tikhonov authored 5 years ago

Our S3 based repositories now reflect packagecloud.io repositories
structure.

It will allow us to migrate from packagecloud.io without overly complicating
the redirection rules on the web server serving download.tarantool.org.

Deploy source packages (*.src.rpm) into separate 'SRPM' repository
like packagecloud.io does.

Changed repository signing key from its subkey to public and moved it
to gitlab-ci environment.

Follows up #3380
