Commits · 5a586fda2c06ec0154e625cf56d3b83a13c312f8 · core / tarantool

Aug 29, 2019

travis-ci: remove obsolete jobs on travis · 5a586fda

Alexander V. Tikhonov authored 5 years ago

Removed the jobs that are better checked on gitlab-ci
instead of travis-ci:
  - osx 13 "Sierra"
  - all LTO jobs
  - ASAN job

Part of #4410

(cherry picked from commit 05477787)

5a586fda

Destroy port after iproto eval if transaction isn't finished · 4792f256
Georgy Kirichenko authored 5 years ago
```
This is a followup for 7691154a

(cherry picked from commit ff14626d)
```
4792f256

Iproto call won't leak if transaction isn't committed · 585cd458

Maria Khaydich authored 5 years ago

In case of throwing client error because of inactive function
we did not destroy used port. It could possibly cause huge
memory leaks as could be seen with top or its analogues when
performing net.box test run in a loop.

Closes #4388

(cherry picked from commit 7691154a)

585cd458

Aug 28, 2019

Add missing test for _say patch · 8e2d5551

Kirill Yukhin authored 5 years ago

It turned out that patch d0e38d59 was not accompanied
by a test. Add a proper check.

(cherry picked from commit c80e9416)

8e2d5551

replication: enter orphan mode on manual replication configuration change · cd627f26

Serge Petrenko authored 5 years ago

Currently we only enter orphan mode when an instance fails to connect to
replication_connect_quorum remote instances during local recovery.
On bootstrap and manual replication configuration change an error is
thrown. It is better to enter orphan mode on manual config change, and leave
it only if we manage to sync with replication_connect_quorum
instances.

Closes #4424

@TarantoolBot document
Title: document reaction on error in replication configuration change.

Now when issuing `box.cfg{replication={uri1, uri2, ...}}` and failing to
sync with replication_connect_quorum remote instances, the server will
throw an error, if it is bootstrapping, and just set its state to orphan
in all other cases (recovering from existing xlog/snap files or manually
changing box.cfg.replication on the fly).
To leave orphan mode, the server needs to sync with enough instances.
You may wait until it manages to sync with replication_connect_quorum
instances, or do one of the following:
1) set replication_connect_quorum to a lower value
2) reset box.cfg.replication to exclude instances that cannot
   be reached or synced with
3) just set box.cfg.replication to "" (empty string)

(cherry picked from commit 5a0cfe02)

cd627f26

replication: do not abort replication on ER_UNKNOWN_REPLICA · 0976d20a

Konstantin Osipov authored 5 years ago

In a 3-node replica set, registering with the leader node
does not guarantee that the registration record arrives at the second
peer immediately. The joining node may bootstrap faster than
the registration record arrives at the second peer, in which
case replication will fail to create a full mesh.

(cherry picked from commit a850acfd)

0976d20a

systemd: Do nothing if NOTIFY_SOCKET env variable is not set. · 9652c757
Konstantin Osipov authored 5 years ago
```
A follow up on gh-4305, fix failing args.test.py

(cherry picked from commit 83ef5a17)
```
9652c757
Fix build failure on Linux. · 4db10976
Konstantin Osipov authored 5 years ago
```
(cherry picked from commit ef14050f)
```
4db10976

Aug 27, 2019

Enable support for NOTIFY_SOCKET in envs without systemd · 0641cc7b

Max Melentiev authored 5 years ago

To make it possible to develop and test related features on
systems without systemd.

WITH_SYSTEMD cmake flag is used to generate systemd related files:
unit, generator script, etc. To keep this behavior and make it possible
to use NOTIFY_SOCKET without other systemd-related stuff,
I added WITH_NOTIFY_SOCKET cmake flag.

It also required some changes to support other OS:

SOCK_CLOEXEC (not available on macOS) flag for socket()
is replaced with `fcntl(fd, F_SETFD, FD_CLOEXEC)` which has the same effect.

The MSG_NOSIGNAL flag for sendmsg is also not available on macOS.
However, macOS has the SO_NOSIGPIPE flag for setsockopt, which disables SIGPIPE.
So different OSes require different solutions. Inspired by
https://nwat.xyz/blog/2014/01/16/porting-msg_more-and-msg_nosigpipe-to-osx/

The send-buffer size had to be reduced to 4MB because larger values
are not supported on macOS by default. This value should be enough
for all systems because notification messages are usually less than 1KB.
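
The portability points above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the helper names `fd_set_cloexec`, `fd_disable_sigpipe` and `send_flags_nosigpipe` are hypothetical, and unlike the one-line `fcntl(fd, F_SETFD, FD_CLOEXEC)` quoted above, the sketch reads the existing descriptor flags first.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: set close-on-exec on an already opened
 * descriptor, for systems where socket() has no SOCK_CLOEXEC flag.
 * Returns 0 on success, -1 on error.
 */
int
fd_set_cloexec(int fd)
{
	assert(fd >= 0);
	int flags = fcntl(fd, F_GETFD);
	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}

/*
 * Hypothetical helper: where SO_NOSIGPIPE exists (e.g. macOS),
 * disable SIGPIPE once per socket. Elsewhere this is a no-op and
 * the caller relies on MSG_NOSIGNAL per send call instead.
 */
int
fd_disable_sigpipe(int fd)
{
#ifdef SO_NOSIGPIPE
	int on = 1;
	return setsockopt(fd, SOL_SOCKET, SO_NOSIGPIPE, &on, sizeof(on));
#else
	(void)fd;
	return 0;
#endif
}

/* Flags to pass to send*(): MSG_NOSIGNAL where it is defined, else 0. */
int
send_flags_nosigpipe(void)
{
#ifdef MSG_NOSIGNAL
	return MSG_NOSIGNAL;
#else
	return 0;
#endif
}
```

A caller would open the socket, call both helpers once, and then pass `send_flags_nosigpipe()` as the flags argument of every send call.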

Fixes #4436

(cherry picked from commit 1e509dde)

0641cc7b

build: fix linking with static openssl library · b1b7fdf9

Alexander Turenko authored 5 years ago

System-wide dynamic libraries usually (always?) have NEEDED and RUNPATH
tags in a dynamic section (as `readelf -d /usr/lib/lib<...>.so` shows),
so when we link, say, with libssl.so, which depends on libz.so, a linker
does not complain against unresolved symbols that can be found in Z
library (if it is installed within a system).

Things are different when we link with a static library. Say, when we
link with libssl.a, which contains an unresolved symbol from the Z
library, a linker reports an error. It is not possible to store
information about where to find unresolved symbols (NEEDED / RUNPATH) in a
static library (AFAIK).

We depend on three libraries that depend on the Z library: libcurl,
libssl and libcrypto (the latter two are part of OpenSSL). When one of those
libraries is linked statically we should link with libz.so or libz.a
(depending on the BUILD_STATIC flag). The patch does exactly this.

The patch changes OPENSSL_LIBRARIES variable to fix the issue with
static linking of OpenSSL libraries. It also changes CURL_LIBRARIES in
the same way, however this does not alter any visible behaviour, because
OPENSSL_LIBRARIES is added to CURL_LIBRARIES. The latter change was made
to unify the way to choose libraries to link with: it is pure
refactoring part.

Fixes #4437.

(cherry picked from commit 2cdfaf3b)

b1b7fdf9

Use forked tarantool/curl repository · 84af355d

Alexander Turenko authored 5 years ago

A manual action is needed after pulling of this commit to actually use
the downstream repository instead of the upstream one:

sed -i -e 's@https://github.com/curl/curl.git@https://github.com/tarantool/curl.git@' .git/config

It is part of our processes to use forked repositories for submodules.

Suggested by Konstantin Osipov.

(cherry picked from commit 5a46639f)

84af355d

gitlab-ci: clean up .gitlab-ci.yml · 52a8f5c4

Alexander V. Tikhonov authored 5 years ago

Removed duplicated code from the .gitlab-ci.yml file by
adding templates that store the needed
configuration values for different jobs.
Also moved the static_build from the 'deploy' tag to the
'deploy_test' tag to be sure that tests will not
be run under high load.

52a8f5c4

test: update test-run · e1cd1639

Alexander Turenko authored 5 years ago

* Added "fragile" option with a list of tests that are not intended to
  be run in parallel with others. The format of the list is the same as
  for "disabled" option (#187, PR #188).
* Ensured that non-parallel test suites and fragile tests will be run
  only when all parallel ones will be finished (PR #188).
* Fixed reporting in case of lack of temporary result (PR #172).

(cherry picked from commit 418ab172)

e1cd1639

Aug 26, 2019

lua: pwd: fix passwd and group traversal · 1c9d817a

Alexander Turenko authored 5 years ago

CentOS 6 and FreeBSD 12 implementations of getpwuid() rewind the
setpwent()-{getpwent()}-endpwent() loop to the start, which leads to a hang
when pwd.getpwall() is invoked. The same is true for a getgrgid() call
during a setgrent()-{getgrent()}-endgrent() loop.

The commit modifies pwd module to avoid getpwuid() calls during passwd
database traversal and to avoid getgrgid() calls when traversing groups.

The commit also fixes an important regression on CentOS 6 introduced by
f5d8331e ('lua: workaround pwd.getpwall() issue on Fedora 29'): tarantool
hangs during startup due to the added getpwall() call. This made tarantool
completely unusable on CentOS 6.

Aside from that, the commit fixes another problem, in pwd.getgrall(): the
function gave password entries instead of group entries.

Fixes #4428.
Fixes #4447.
Part of #4271.

(cherry picked from commit 7753d350)

1c9d817a

Aug 23, 2019

test: tarantoolctl: verify delayed box.cfg() · 3e4dea42

Alexander Turenko authored 5 years ago

app-tap.tarantoolctl.test.lua fails after
17df9edf ('tarantoolctl: allow to start
instances with delayed box.cfg{}').

The commit fixes the test case that did check that an error is reported
if box.cfg() was not called in an instance script.

Follows up #4435.
Fixes #4448.

(cherry picked from commit 6c627af3)

3e4dea42

Aug 22, 2019

feedback: unify payload generation logic · d66ca2c1

Yaroslav Dynnikov authored 5 years ago

This change is related to #4391. The objective was to collect additional
information about modules, but it's hard to do without changing API.

This patch makes it possible to monkey-patch report generation and achieve the
same results without interfering with the daemon behavior.

(cherry picked from commit 5b9f207d)

d66ca2c1

systemd: replace sendmsg with sendto shortcut · 1cbd136a

Max Melentiev authored 5 years ago

There is a problem with calculating .msg_namelen field
of msghdr struct. Instead of

    .msg_name   = &sa,
    .msg_namelen = sizeof(sa.sun_family) + strlen(sd_unix_path),

it must be set as

    .msg_namelen = sizeof(sa) // larger value than current invalid one

It works on linux but when I tried to enable this feature for macOS
it didn't (maybe because of different order of fields in the struct).

Instead of fixing calculation, I've replaced original sendmsg call
with sendto, because it's a convenient shortcut which
simplifies code and can prevent such mistakes.
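
A minimal sketch of the sendto variant, assuming a pathname (non-abstract) UNIX datagram socket. `notify_send` is a hypothetical name and this is not the code from the patch; the point is that passing `sizeof(sa)` avoids computing the address length by hand.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>

/*
 * Hypothetical helper: send `msg` as one datagram to the UNIX socket
 * at `path`. Returns the number of bytes sent, or -1 on error.
 * Abstract socket names (leading NUL byte) are not handled here.
 */
ssize_t
notify_send(int fd, const char *path, const char *msg)
{
	assert(path != NULL && msg != NULL);
	struct sockaddr_un sa;
	/* Zero the whole struct so that sun_path is NUL-terminated. */
	memset(&sa, 0, sizeof(sa));
	if (strlen(path) >= sizeof(sa.sun_path))
		return -1;
	sa.sun_family = AF_UNIX;
	strcpy(sa.sun_path, path);
	/* sizeof(sa) does not depend on the field layout of the struct. */
	return sendto(fd, msg, strlen(msg), 0,
		      (struct sockaddr *)&sa, sizeof(sa));
}
```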

Required for #4436

(cherry picked from commit 89aae30c)

1cbd136a

tarantoolctl: allow to start instances with delayed box.cfg{} · b6e7eb4b

Max Melentiev authored 5 years ago

`tarantoolctl start` patches box.cfg two times:
1) before the init script to set default values and enforce some others,
2) after the init script to prevent changing a pid_file in runtime.

The second patching fails if an init file does not call
box.cfg{} before it finishes. This can happen in apps with
managed instances which receive configuration from an external server.

This patch moves the second patching into the box.cfg
wrapper created during the first patching. So the second patching
is performed only after box.cfg{} was invoked, so it does not fail anymore.

However, there is a relatively minor flaw for applications that
invoke box.cfg{} after the init script is finished:
`tarantoolctl start` goes to background only when box.cfg{} is called.
Though this is not the case for daemon management systems like systemd,
as they handle backgrounding on their side.

Fixes #4435

@TarantoolBot document
Title: tarantoolctl allows to start instances without a box.cfg{}

tarantoolctl now works for instances without box.cfg{} or
with a delayed box.cfg{} call. These can be managed instances which receive
configuration from an external server.

For such instances `tarantoolctl start` goes to background when
box.cfg{} is called, so it will wait until options for box.cfg are received.
However this is not the case for daemon management systems like systemd,
as they handle backgrounding on their side.

b6e7eb4b

Move mp_compare_double_uint64() to trivia/util.h · 75b14a35

Nikita Pettik authored 5 years ago

This function implements common way of precise comparison between
unsigned integer and floating point values (doubles). Currently, it is
used in tuple comparators, but we need the same thing in SQL. Hence,
let's move it to header containing set of utilities.

(cherry picked from commit 72dbeb0e)

75b14a35

sql: use double_compare_uint64() for int<->float cmp · 12431ed4

Nikita Pettik authored 5 years ago

To compare floating point values and integers in SQL, the functions
compare_uint_float() and compare_int_float() are used. Unfortunately,
they contain a bug in a border case check: it is not correct to cast
UINT64_MAX (2^64 - 1) to double, because that value is not exactly
representable as a double and is rounded to 2^64. The proper way is to
use exp2(64) or a predefined floating point constant. Instead of fixing
these functions, which may contain other tricky places, let's use the
already verified double_compare_uint64(), so that we have a unified way of
integer<->float comparison.
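
One way to compare a double with a uint64_t without losing precision is sketched below. This is an illustration of the border case only, not Tarantool's double_compare_uint64(); the name `compare_double_uint64` and the constant `TWO_POW_64` are hypothetical, and NaN is excluded by assumption.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* 2^64 is exactly representable as a double; UINT64_MAX is not. */
#define TWO_POW_64 18446744073709551616.0

/*
 * Hypothetical sketch: returns -1, 0 or 1 if lhs is less than,
 * equal to or greater than rhs. NaN is not supported.
 */
int
compare_double_uint64(double lhs, uint64_t rhs)
{
	assert(!isnan(lhs));
	if (lhs < 0)
		return -1;
	/* Anything at or above 2^64 exceeds every uint64_t value. */
	if (lhs >= TWO_POW_64)
		return 1;
	/* Now 0 <= lhs < 2^64, so truncation to uint64_t is safe. */
	uint64_t whole = (uint64_t)lhs;
	if (whole != rhs)
		return whole < rhs ? -1 : 1;
	/* Integer parts are equal: a fractional part makes lhs larger. */
	return lhs > (double)whole ? 1 : 0;
}
```

With the default rounding mode, `(double)UINT64_MAX == TWO_POW_64` holds, which is why a naive cast-and-compare reports 2^64 and UINT64_MAX as equal while the sketch reports 2^64 as greater.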

(cherry picked from commit 73a4a525)

12431ed4

gitlab-ci: fix building of Debian Buster image · 882068ee

Alexander Turenko authored 5 years ago

`apt-get update <...>` fails on Debian Buster on docker_bootstrap goal
(see #4331 for the similar issue).

Added a description how to change dependencies in .travis.mk.

(cherry picked from commit 45c2576d)

882068ee

txn: erase old savepoint in case of name collision · 03cbc365

Nikita Pettik authored 5 years ago

Name duplicates are allowed for savepoints (both in our SQL
implementation and in ANSI specification). ANSI SQL states that previous
savepoint should be deleted. What is more, our doc confirms this fact
and says that "...it is released before the new savepoint is set."
Unfortunately, it's not true - currently old savepoint remains in the
list. For instance:

SAVEPOINT t;
SAVEPOINT t;
RELEASE SAVEPOINT t;
RELEASE SAVEPOINT t; -- no error is raised

Let's fix this and remove old savepoint from the list.

(cherry picked from commit 8b8b6895)

03cbc365

sql: use struct txn_savepoint as anonymous savepoint · 12619335

Nikita Pettik authored 5 years ago

This allows us to completely remove SQL specific struct Savepoint and
use instead original struct txn_savepoint.

(cherry picked from commit 0a92ec7e)

12619335

txn: merge struct sql_txn into struct txn · c9046042

Nikita Pettik authored 5 years ago

This procedure is processed in several steps. Firstly, we add name
to struct txn_savepoint since we should be capable of operating on named
savepoints (which in turn is SQL feature). Still, anonymous (in the sense
of name absence) savepoints are also valid. Then, we add list (as
implementation of stailq) of savepoints to struct txn: it allows us to
find savepoint by its name. Finally, we patch rollback to/release
savepoint routines: for rollback tail of the list containing savepoints
is cut (but subject of rollback routine remains in the list); for
release routine we cut tail including node being released.
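
The rollback/release rules above can be sketched on a simplified structure. This is an illustration under stated assumptions: a fixed-size array ordered from oldest to newest stands in for the stailq, "tail" is taken to mean the savepoints newer than the target, and all `svp_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { SVP_MAX = 16, SVP_NAME_MAX = 32 };

/* Simplified stand-in for the savepoint list, oldest entry first. */
struct svp_list {
	char names[SVP_MAX][SVP_NAME_MAX];
	size_t count;
};

/* Index of the newest savepoint called `name`, or -1 if absent. */
int
svp_find(const struct svp_list *l, const char *name)
{
	assert(l != NULL && name != NULL);
	for (size_t i = l->count; i > 0; i--) {
		if (strcmp(l->names[i - 1], name) == 0)
			return (int)(i - 1);
	}
	return -1;
}

/* Append a savepoint; returns 0 on success, -1 if it does not fit. */
int
svp_add(struct svp_list *l, const char *name)
{
	assert(l != NULL && name != NULL);
	if (l->count == SVP_MAX || strlen(name) >= SVP_NAME_MAX)
		return -1;
	strcpy(l->names[l->count++], name);
	return 0;
}

/* ROLLBACK TO: drop everything newer; the savepoint itself stays. */
int
svp_rollback_to(struct svp_list *l, const char *name)
{
	int pos = svp_find(l, name);
	if (pos < 0)
		return -1;
	l->count = (size_t)pos + 1;
	return 0;
}

/* RELEASE: drop the savepoint together with everything newer. */
int
svp_release(struct svp_list *l, const char *name)
{
	int pos = svp_find(l, name);
	if (pos < 0)
		return -1;
	l->count = (size_t)pos;
	return 0;
}
```

The only difference between the two operations is whether the named savepoint survives the cut.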

(cherry picked from commit 56096ff2)

c9046042

txn: move fk_deferred_count from psql_txn to txn · 8b45c38d

Nikita Pettik authored 5 years ago

We are going to merge struct psql_txn with struct txn as a part of SQL
integration into NoSQL, so let's move counter of deferred foreign key
violations directly to struct txn.

(cherry picked from commit 5259274d)

8b45c38d

Aug 21, 2019

build: link libcurl statically from a submodule · 5fcca9dd

Mergen Imeev authored 5 years ago

Hold libcurl-7.65.3. This version is not affected by the following
issues:

* #4180 ('httpc: redirects are broken with libcurl-7.30 and older');
* #4389 ('libcurl memory leak');
* #4397 ('HTTPS seem to be unstable').

After this patch libcurl will be statically linked when
ENABLE_BUNDLED_LIBCURL option is set. This option is set by default.

Closes #4318

@TarantoolBot document
Title: Tarantool dependency list was changed

* Added build dependencies: autoconf, automake, libtool, zlib-devel
  (zlib1g-dev on Debian).
* Added runtime dependencies: zlib (zlib1g on Debian).
* Removed build dependencies: libcurl-devel (libcurl4-openssl-dev on
  Debian).
* Removed runtime dependencies: curl.

The reason is that now we use compiled-in libcurl: so we don't depend on
a system libcurl, but inherit its dependencies.

(cherry picked from commit 7e51aebb)

5fcca9dd

lua: workaround pwd.getpwall() issue on Fedora 29 · 5806c124

Alexander Turenko authored 5 years ago

This is a workaround for systemd-nss issue:
https://github.com/systemd/systemd/issues/9585

The following error is observed on app-tap/pwd.test.lua on Fedora 29
(glibc-2.28-26.fc29, systemd-239-12.git8bca462.fc29) when tarantool is
linked with libcurl w/o GSS-API support:

 | builtin/pwd.lua:169: getpwall failed [errno 2]: No such file or directory

Such a tarantool build lacks the libselinux.so.1 transitive dependency
(tarantool -> libcurl.so.4 -> libgssapi_krb5.so.2 -> libkrb5support.so.0
-> libselinux.so.1) and strace shows the following calls when
pwd.getpwall() is invoked first time:

 | openat(AT_FDCWD, "/lib64/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 7
 | <...>
 | access("/etc/selinux/config", F_OK)     = -1 ENOENT (No such file or directory)

It looks like a part of the libselinux initialization code and is invoked
during execution of the last ffi.C.getpwent() call, which returns `nil` as a
result and leaves errno set to ENOENT. Our pwd module sets errno to zero
before the getpwent() call and expects that it will be preserved if no
unrecoverable errors occur. It seems that this expectation is not met
due to the systemd-nss issue linked above.

Second and next getpwall() calls will succeed, so the commit adds an
extra getpwall() during pwd module load. This workaround is disabled on
FreeBSD due to another issue: #4428 ('getpwall() hangs on FreeBSD 12').

See also the previous related commit:
efccac69 ('lua: fix error handling in
getpwall and getgrall').

Follows up #3766.
Part of #4318.

(cherry picked from commit f5d8331e)

5806c124

gitlab-ci: add static build · 100e3028

Alexander V. Tikhonov authored 5 years ago

Added a static build using a Dockerfile on CentOS 7 for release
commit criteria only. Added cleanup of the CMakeCache.txt files
and CMakeFiles directories generated by cmake, so that a locally
created cmake setup does not fail inside docker after the whole
tarantool path is copied into it. Added testing to the static
build; it runs only when the RUN_TESTS environment variable is
set to a non-empty value, which the gitlab-ci job uses to run
the tests after the build.

Closes #3668

(cherry picked from commit f7509186)

100e3028

Aug 19, 2019

log: fix segfault on _say without filename · e3b886f6

Mons Anderson authored 5 years ago

(cherry picked from commit d0e38d59)

e3b886f6

relay: set `last_row_time' to `now' in `relay_new' and `relay_start'. (#4431) · e448cfed

rtokarev authored 5 years ago

(cherry picked from commit 507f3721)

e448cfed

Aug 16, 2019

gc: randomize the next checkpoint time also after a manual box.snapshot(). · f9950440

Konstantin Osipov authored 5 years ago

Before this patch, snapshot interval was set randomly within
checkpoint_interval period. However, after box.snapshot(), the next
snapshot was scheduled exactly checkpoint_interval from the current time.
Many orchestration scripts snapshot entire cluster right after deployment,
to take a backup. This kills randomness, since all instances begin to
count the next checkpoint time from the current time.

Randomize the next checkpoint time after a manual snapshot as well.
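
The idea can be sketched as follows. The exact formula used by the gc module is not given in the message, so this is an assumption: the next checkpoint is placed one full interval after `now` plus a random offset of less than one interval. The names `next_checkpoint_time` and `random_fraction` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Random fraction in [0, 1), derived from rand(). */
double
random_fraction(void)
{
	return rand() / ((double)RAND_MAX + 1.0);
}

/*
 * Hypothetical sketch: schedule the next checkpoint at
 * now + interval + jitter, where jitter is in [0, interval).
 * The fraction is passed in so that callers can test the math.
 */
double
next_checkpoint_time(double now, double interval, double jitter_fraction)
{
	assert(interval > 0);
	assert(jitter_fraction >= 0 && jitter_fraction < 1);
	return now + interval + jitter_fraction * interval;
}
```

Calling `next_checkpoint_time(now, interval, random_fraction())` after both automatic and manual snapshots keeps instances that were snapshotted at the same moment from checkpointing in lockstep.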

Fixes gh-4432

(cherry picked from commit 6277f48a)

f9950440

test: update test-run · 9f1beecc

Alexander Turenko authored 5 years ago

pretest_clean: preserve GREATEST and LEAST built-in functions.

Needed for #4405.

(cherry picked from commit 05fb6faa)

9f1beecc

Hotfix for · 94479e8d

Nikita Pettik authored 5 years ago

The result file of sql/bind.test.lua was not updated
in the previous patch. Let's fix that and refresh sql/bind.result with
up-to-date results.

(cherry picked from commit 0894bec2)

94479e8d

Aug 15, 2019

sql: fix type in meta for unsigned binding · c5246686

Nikita Pettik authored 5 years ago

It was decided that for all integer literals we would return the "INTEGER"
type, not "UNSIGNED". Accidentally, after substitution of an unsigned
binding, the value type was set to "UNSIGNED". Let's fix that and set
the "INTEGER" type.

(cherry picked from commit b7d595ac)

c5246686

luajit: Bump luajit version · f0e6c87b
Kirill Yukhin authored 5 years ago
```
(cherry picked from commit 03a39c3d)
```
f0e6c87b
test: new test for LuaJIT fold machinery · ade32483
Sergey Ostanevich authored 5 years ago
```
https://github.com/LuaJIT/LuaJIT/issues/505
(cherry picked from commit 26303604)
```
ade32483
luajit: Bump luajit version · f20e692d
Kirill Yukhin authored 5 years ago
```
(cherry picked from commit a634bd7d)
```
f20e692d

test: fix flaky swim/errinj.test.lua · d5b2d614

Vladislav Shpilevoy authored 5 years ago

In one place that test sends a packet and expects that it has
arrived two lines below. Under high load it may take more time.
The patch makes the test explicitly wait for the packet arrival.

Closes #4392

d5b2d614

Aug 14, 2019

test: app/socket flaky fails at 1118 line · 109a304d

Alexander V. Tikhonov authored 5 years ago

Found that on highly loaded hosts the test fails intermittently at:

[004] --- app/socket.result	Mon Jul 15 07:18:57 2019
[004] +++ app/socket.reject	Tue Jul 16 16:37:35 2019
[004] @@ -1118,7 +1118,7 @@
[004]  ...
[004]  ch:get(1)
[004]  ---
[004] -- true
[004] +- null
[004]  ...
[004]  s:error()
[004]  ---

Previously the test was used to check the timeout of the channel
get() function and the error that occurred on it, but later the
checked error changed to
"builtin/socket.lua: attempt to use closed socket" and the
test became incorrect. For now it passes when the socket read
function runs before the socket is closed, but in that case the
read call doesn't wait. On highly loaded hosts, on the other hand,
the close call may occur before the read call; then the read call
halts and the get call returns 'null'. Neither way checks the
error correctly, so the subtest was removed.

Check commit ba7a4fee ("Add tests for socket:close closes #360")

Fixes #4354

(cherry picked from commit 952d8d1d)

109a304d

Aug 13, 2019

json: detect a new invalid json path case · b70aae59

Vladislav Shpilevoy authored 5 years ago

JSON paths have no strict standard, but there is definitely no
implementation that allows omitting '.' after [] if the next token
is a key. For example:
    [1]key

is invalid. It should be written like that:

    [1].key

Strangely, we even had tests on the invalid case.
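
The rule can be shown with a tiny checker. This is only a sketch of that single rule, not the JSON path lexer: `json_path_bracket_rule_ok` is a hypothetical name, and the function does not understand quoted keys that themselves contain ']'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical checker for one rule only: a closing ']' must be
 * followed by the end of the path, another '[' or a '.'.
 * Everything else about the path is left unchecked.
 */
bool
json_path_bracket_rule_ok(const char *path)
{
	assert(path != NULL);
	for (size_t i = 0; path[i] != '\0'; i++) {
		if (path[i] != ']')
			continue;
		char next = path[i + 1];
		if (next != '\0' && next != '[' && next != '.')
			return false;
	}
	return true;
}
```

Under this rule `[1]key` is rejected, while `[1].key` and `[1][2]` are accepted.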

Closes #4419

(cherry picked from commit ef64ee51)

b70aae59