Commits · 507f37215ebcb5df882576d14eb5c135811b5fd9 · core / tarantool

Aug 19, 2019
- relay: set `last_row_time' to `now' in `relay_new' and `relay_start'. (#4431) · 507f3721
  rtokarev authored 5 years ago
  
  507f3721
Aug 16, 2019

pretest_clean: preserve GREATEST and LEAST built-in functions.

Needed for #4405.

(cherry picked from commit 05fb6faa)

16323e53

Aug 15, 2019
- luajit: Bump luajit version · 144b1e7c
  Kirill Yukhin authored 5 years ago
  
  (cherry picked from commit 03a39c3d)
  144b1e7c
- luajit: Bump luajit version · 8bc69524
  Kirill Yukhin authored 5 years ago
  
  (cherry picked from commit a634bd7d)
  8bc69524
- test: new test for LuaJIT fold machinery · 04feb23f
  Sergey Ostanevich authored 5 years ago
  
  https://github.com/LuaJIT/LuaJIT/issues/505 (cherry picked from commit 26303604)
  04feb23f
Aug 14, 2019

test: app/socket flaky fails at 1118 line · 251fc1ad

Alexander V. Tikhonov authored 5 years ago

On heavily loaded hosts the test fails intermittently at:

[004] --- app/socket.result	Mon Jul 15 07:18:57 2019
[004] +++ app/socket.reject	Tue Jul 16 16:37:35 2019
[004] @@ -1118,7 +1118,7 @@
[004]  ...
[004]  ch:get(1)
[004]  ---
[004] -- true
[004] +- null
[004]  ...
[004]  s:error()
[004]  ---

The subtest was originally meant to check the channel get() timeout
and the error that occurred on it, but the expected error was later
changed to "builtin/socket.lua: attempt to use closed socket" and the
test stopped being correct. It now passes only when the socket read
function runs before the socket is closed, in which case the read call
does not wait. On heavily loaded hosts the close call may occur before
the read call; the read call then halts and the get call returns
'null'. Neither ordering is a correct way to check the error, so the
subtest is removed.

Check commit ba7a4fee ("Add tests for socket:close closes #360")

Fixes #4354

(cherry picked from commit 952d8d1d)

251fc1ad

Aug 13, 2019

json: detect a new invalid json path case · fa420571

Vladislav Shpilevoy authored 5 years ago

JSON paths have no strict standard, but there is definitely no
implementation that allows omitting '.' after [] when the next token
is a key. For example:

    [1]key

is invalid. It should be written like this:

    [1].key

Strangely, we even had tests for the invalid case.
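
The rule can be illustrated with a minimal Python sketch. It is not
tarantool's JSON path lexer: `tokenize_path` is a hypothetical helper
that handles only `[N]` indexes and plain identifier keys, which is
enough to show why `[1]key` is rejected while `[1].key` is accepted.

```python
import re

# Hypothetical tokenizer, NOT tarantool's json path lexer: it models
# only the rule described above for a tiny subset of the path syntax.
_INDEX = re.compile(r"\[(\d+)\]")
_KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize_path(path):
    """Split a path such as '[1].key' into tokens or raise ValueError."""
    tokens = []
    pos = 0
    while pos < len(path):
        match = _INDEX.match(path, pos)
        if match:
            tokens.append(("index", int(match.group(1))))
            pos = match.end()
            continue
        if path[pos] == ".":
            pos += 1
        elif pos != 0:
            # A key that is not the first token must follow a '.'.
            raise ValueError("invalid path at position %d" % (pos + 1))
        match = _KEY.match(path, pos)
        if not match:
            raise ValueError("invalid path at position %d" % (pos + 1))
        tokens.append(("key", match.group(0)))
        pos = match.end()
    return tokens


print(tokenize_path("[1].key"))
try:
    tokenize_path("[1]key")
except ValueError as exc:
    print("rejected:", exc)
```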

Closes #4419

(cherry picked from commit ef64ee51)

fa420571

Aug 11, 2019

test: update test-run · 41a7601f

Alexander Turenko authored 5 years ago

Disable a check whether yaml responses are well-formed for 'core =
tarantool' tests in test-run. The check was unable to handle complex
(dictionary / list) keys in a dictionary, because pyyaml does not
support them.

See also https://github.com/yaml/pyyaml/issues/88

Fixes #4421.

(cherry picked from commit 940a673e)

41a7601f

Aug 02, 2019

relay: stop relay on subscribe error · 17db2717

Vladimir Davydov authored 5 years ago

In case an error occurs between relay_start() and cord_costart() in
relay_subscribe(), the relay status won't be reset to STOPPED. As a
result, any further attempt to re-subscribe will fail with ER_CFG:
duplicate connection with the same replica UUID. This may happen, for
example, if the WAL directory happens to be temporarily inaccessible on
the master.
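
The failure mode can be modelled with a toy state machine. This is a
minimal Python sketch, not tarantool's C code: `ToyRelay`, its state
strings and the `start_thread` callback are hypothetical stand-ins for
the relay status, relay_start() and the steps up to cord_costart().

```python
class ToyRelay:
    """Toy model of a relay status; all names here are hypothetical."""

    def __init__(self):
        self.state = "OFF"

    def subscribe(self, start_thread):
        if self.state == "FOLLOW":
            # Models the ER_CFG error seen on a re-subscribe attempt.
            raise RuntimeError(
                "duplicate connection with the same replica UUID")
        self.state = "FOLLOW"  # stands in for relay_start()
        try:
            start_thread()  # stands in for the steps before cord_costart()
        except Exception:
            # Without this reset the relay would stay in FOLLOW and
            # every later subscribe() call would be rejected.
            self.state = "STOPPED"
            raise


def failing_start():
    raise OSError("WAL directory is temporarily inaccessible")


relay = ToyRelay()
try:
    relay.subscribe(failing_start)
except OSError:
    pass
relay.subscribe(lambda: None)  # the retry is accepted after the reset
```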

Closes #4399

(cherry picked from commit 35ef3320)

17db2717

Jul 31, 2019

httpc: allow only strings for opts.headers values · 4faa1038

Roman Khabibov authored 5 years ago

There are two reasons to disable support of tables with __tostring
metamethods as opts.headers values:

* It seems this feature is not much needed: a type conversion can be
  easily done on a user's end with an explicit tostring() call if needed.
* It never worked since its introduction in 2.1.1-311-g85e1d78bc ('httpc:
  add checking of headers in httpc:request'): __tostring presence was
  verified, but the function in the field was not invoked.

Now the http client accepts only a Lua string as a header key or a value.

Closes #3679 (again).

(cherry picked from commit 28e688294b8ae3868c138a9de965479e8de24c3e)

4faa1038

Jul 29, 2019

httpc: verify "headers" option stronger · 67e5f9a5

Roman Khabibov authored 5 years ago

Added the following checks:

* opts.headers is a table.
* opts.headers keys are strings.

Clarified an error message re Lua type of opts.headers values.

Found and fixed a memory leak that appears in the http client when an
invalid opts.headers is passed.

Closes #4281

(cherry picked from commit fc355a2c)

67e5f9a5

Jul 26, 2019

travis-ci: deploy packages from tagged revisions · 1aa8b927

Alexander Turenko authored 5 years ago

The problem was that a tagged revision was not deployed, so after a
release we made an empty commit to trigger deployment. This is now worked
around by adding extra deployment rules that deploy tagged revisions.
The workaround was suggested by Hiro Asari in [1].

[1]: https://github.com/travis-ci/travis-ci/issues/7780#issuecomment-302389370

Fixes #3745.

(cherry picked from commit 2fe7036d)

1aa8b927

Revert "travis-ci: freeze curl version on 7.65.0" · ef7ed7b0

Alexander V. Tikhonov authored 5 years ago

curl 7.65.3 was released on 2019-07-19, so the temporary
workaround that downgraded curl to 7.65.0 is removed.

This reverts commit 2e880af0.

Follows up #4288

(cherry picked from commit abd5ccee)

ef7ed7b0

Jul 25, 2019

txn: fix txn::sub_stmt_begin array size · 2b0f929e

Vladimir Davydov authored 5 years ago

We may write to txn->sub_stmt_begin[TXN_SUB_STMT_MAX] so the array size
must be TXN_SUB_STMT_MAX+1 (see txn_begin_stmt). This didn't lead to any
problems, because we would only overwrite txn::signature, which wouldn't
break anything. However, should we change the txn struct, we could get
an unexpected error or even a crash.

(cherry picked from commit 4e72874a)

2b0f929e

Jul 22, 2019

test: update test-run · f481f5d6

Alexander Turenko authored 5 years ago

pretest_clean: extended a list of predefined functions with registered
SQL builtins and persistent Lua function 'LUA'.

Follows up #4182.

(cherry picked from commit e5e23ce2)

f481f5d6

Jul 19, 2019

test: fix another net.box failure · 1d5e8ddc

Serge Petrenko authored 5 years ago

This last error
```
[035]  ...
[035]  disconnected_cnt
[035]  ---
[035] -- 1
[035] +- 2
[035]  ...
[035]  conn:close()
[035]  ---
[035]  ...
[035]  disconnected_cnt
[035]  ---
[035] -- 2
[035] +- 3
[035]  ...
[035]  test_run:cmd('stop server connecter')
[035]  ---
[035]
```
Happens because net.box is able to connect to tarantool before it has
finished bootstrap. When connecting, net.box tries to fetch schema
executing a couple of selects, but fails to pass access check since
grants aren't applied yet. This is described in detail in
https://github.com/tarantool/tarantool/issues/2763#issuecomment-499046998
So, alter the test so that it tolerates multiple connection failures.

Closes #4273

(cherry picked from commit 1a2addb8)

1d5e8ddc

Jul 17, 2019

auth: fix empty password authentication · 64d4b754

Vladimir Davydov authored 5 years ago

We are supposed to authenticate guest user without a password. This
used to work before commit 076a8420 ("Permit empty passwords in
net.box"), when guest didn't have any password. Now it has an empty
password and the check in authenticate turns out to be broken, which
breaks assumptions made by certain connectors. This patch fixes the
check.

Closes #4327

(cherry picked from commit c185a387)

64d4b754

Jul 16, 2019

test: box/net.box test flaky fails on grep_log (#4330) · a4fe86b6

avtikhon authored 5 years ago

The box/net.box test intermittently failed on heavily loaded
hosts when grepping the log file for the 'ER_NO_SUCH_PROC'
pattern. The issue is resolved by replacing grep_log with
wait_log, which waits for some time until the needed message
appears.

[008] Test failed! Result content mismatch:
[008] --- box/net.box.result	Tue Jul  9 17:00:24 2019
[008] +++ box/net.box.reject	Tue Jul  9 17:03:34 2019
[008] @@ -1376,7 +1376,7 @@
[008]  ...
[008]  test_run:grep_log("default", "ER_NO_SUCH_PROC")
[008]  ---
[008] -- ER_NO_SUCH_PROC
[008] +- null
[008]  ...
[008]  box.schema.user.revoke('guest', 'execute', 'universe')
[008]  ---

Closes #4329

(cherry picked from commit 9bde3406)

a4fe86b6

box: do not check state in case of reconnect · 45c7fb1d

Mergen Imeev authored 5 years ago

Test box/net.box.test.lua checks state of the connection in case
of an error. It should be 'error_reconnect'. But, in cases where
testing was performed on a slow computer or in the case of a very
large load, it is possible that the connection status may change
from the 'error_reconnect' state to another state. This led to the
failure of the test. Since this check is not the main purpose of
the test, it is better to simply delete the check.

Closes #4335

(cherry picked from commit 77051a11)

45c7fb1d

Jul 15, 2019

test: vinyl/recover fails on dump counter check · bbb5ea48

Alexander V. Tikhonov authored 5 years ago

On a heavily loaded host the test vinyl/recover failed in the waiting
loop for the dump counter check. It expected the value to be exactly
"2", while on a heavily loaded host the dump could have already run
more times and the counter could be "3" or even bigger. To fix it, the
counter check was changed from an exact match to greater than or equal
to the expected value. The error message before the fix was:

[003] --- vinyl/recover.result	Mon Jul 15 10:46:00 2019
[003] +++ vinyl/recover.reject	Mon Jul 15 10:58:10 2019
[003] @@ -517,7 +517,7 @@
[003]  ...
[003]  test_run:wait_cond(function() return pk:stat().disk.dump.count == 2 end)
[003]  ---
[003] -- true
[003] +- false
[003]  ...
[003]  sk:stat().disk.dump.count -- 1
[003]  ---

Closes #4345

(cherry picked from commit 2d3f299e)

bbb5ea48

Jul 09, 2019

test: fix net.box occasional failure. Again · a7d003a3

Serge Petrenko authored 5 years ago

The test regarding logging corrupted rows failed occasionally with
```
[016]  test_run:grep_log('default', 'Got a corrupted row.*')
[016]  ---
[016] -- 'Got a corrupted row:'
[016] +- null
[016]  ...
```
The logs then had
```
[010] 2019-07-06 19:36:16.857 [13046] iproto sio.c:261 !> SystemError writev(1),
called on fd 23, aka unix/:(socket), peer of unix/:(socket): Broken pipe
```
instead of the expected message.

This happened, because we closed a socket before tarantool could write a
greeting to the client, the connection was then closed, and execution
never got to processing the malformed request and thus printing the
desired message to the log.

To fix this, actually read the greeting prior to writing new data and
closing the socket.
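
The ordering described above can be sketched in Python. This is not the
actual test code (the test itself is written in Lua): `recv_exact` and
`send_after_greeting` are hypothetical helpers, and the sketch assumes
the 128-byte iproto greeting.

```python
import socket

GREETING_SIZE = 128  # size of the iproto greeting, in bytes


def recv_exact(sock, size):
    """Read exactly `size` bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < size:
        chunk = sock.recv(size - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf += chunk
    return buf


def send_after_greeting(sock, payload):
    """Consume the greeting first, then send the payload and close."""
    greeting = recv_exact(sock, GREETING_SIZE)
    sock.sendall(payload)
    sock.close()
    return greeting


# Usage with a local socket pair standing in for server and client.
server, client = socket.socketpair()
server.sendall(b"T" * GREETING_SIZE)
greeting = send_after_greeting(client, b"malformed request")
request = server.recv(64)
server.close()
```

Because the client reads the greeting before it writes anything, the
server's write cannot fail with a broken pipe caused by an early close.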

Follow-up #4273

(cherry picked from commit eb0cc50c)

a7d003a3

test: net.box: fix case re invalid msgpack warning · d690002a

Alexander V. Tikhonov authored 5 years ago

The test case has two problems that appear from time to time and lead to
flaky fails. Those fails look as shown below in a test-run output.

 | Test failed! Result content mismatch:
 | --- box/net.box.result	Mon Jun 24 17:23:49 2019
 | +++ box/net.box.reject	Mon Jun 24 17:51:52 2019
 | @@ -1404,7 +1404,7 @@
 |  ...
 |  test_run:grep_log('default', 'ER_INVALID_MSGPACK.*')
 | ---
 | -- 'ER_INVALID_MSGPACK: Invalid MsgPack - packet body'
 | +- 'ER_INVALID_MSGPACK: Invalid MsgPack - packet length'
 | ...
 | -- gh-983 selecting a lot of data crashes the server or hangs the
 | -- connection

'ER_INVALID_MSGPACK.*' regexp should match 'ER_INVALID_MSGPACK: Invalid
MsgPack - packet body' log message, but if it is not in the log file at
the time of the grep_log() call (just not flushed to the file yet), a
message produced by another test case can be matched
('ER_INVALID_MSGPACK: Invalid MsgPack - packet length'). The fix here is
to match the entire message and check for the message periodically during
several seconds (use wait_log() instead of grep_log()).
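
The difference between a one-shot grep and a polling wait can be
sketched as follows. These are hypothetical Python helpers written for
illustration, not test-run's implementation of grep_log() or
wait_log(); the timeout and delay defaults are assumptions.

```python
import re
import time


def grep_log_once(path, pattern):
    """Return the last match of `pattern` in the file, or None."""
    with open(path, "r") as log:
        matches = [m.group(0) for m in re.finditer(pattern, log.read())]
    return matches[-1] if matches else None


def wait_for_log(path, pattern, timeout=5.0, delay=0.1):
    """Poll the file until `pattern` appears or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        found = grep_log_once(path, pattern)
        if found is not None or time.monotonic() >= deadline:
            return found
        time.sleep(delay)
```

Passing the entire message as the pattern, for example
'ER_INVALID_MSGPACK: Invalid MsgPack - packet body', keeps a message
left by another test case from matching.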

Another problem is the race between writing a response to an iproto
socket on the server side and closing the socket on the client end. If
tarantool is unable to write a response, it does not produce the warning
re invalid msgpack, but shows a 'broken pipe' message instead. We need to
first grep for the message in the logs and only then close the socket on
the client. A similar problem (with another test case) is described in
[1].

[1]: https://github.com/tarantool/tarantool/issues/4273#issuecomment-508939695

Closes: #4311

(cherry picked from commit 0f9fdd72)

d690002a

Jul 08, 2019

vinyl: fix vy_range_update_compaction_priority hang · df8f1fda

Vladimir Davydov authored 5 years ago

Under certain circumstances vy_slice_new() may create an empty slice,
e.g. on range split:

   |------------------ Slice ---------------|
                         |---- Run -----|
                     +
                  split key
   |---- Slice 1 ----||------ Slice 2 ------|
         ^^^^^^^
          Empty

vy_range_update_compaction_priority() uses the size of the last slice in
a range as a base for LSM tree level sizing. If the slice size happens
to be 0, it will simply hang in an infinite loop. Fix this potential
hang by using 1 if the last slice size is 0.
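
Why a zero base size hangs can be shown with a simplified model. This
is a Python sketch, not the vinyl code: it assumes a sizing loop that
multiplies a level size by a ratio until it covers a run, and the
function names, the ratio of 2 and the `max_steps` guard are
hypothetical.

```python
def levels_needed(run_size, last_slice_size, ratio=2, max_steps=64):
    """Count multiplications until the level size covers run_size.

    Returns None when the level size never grows enough within
    max_steps, which models a hang in an unbounded loop.
    """
    level_size = last_slice_size
    steps = 0
    while run_size > level_size:
        if steps == max_steps:
            return None
        level_size *= ratio  # 0 * ratio stays 0 forever
        steps += 1
    return steps


def levels_needed_fixed(run_size, last_slice_size, ratio=2):
    """Same loop, but a zero base size is replaced with 1."""
    return levels_needed(run_size, max(last_slice_size, 1), ratio)


print(levels_needed(100, 0))        # None: the loop would never end
print(levels_needed_fixed(100, 0))  # 7: 1 -> 2 -> ... -> 128
```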

(cherry picked from commit 75dc3e64)

df8f1fda

Jul 06, 2019

test: use unix sockets for iproto connections · 047ee845

avtikhon authored 5 years ago

Enabled the use_unix_sockets_iproto option. Using unix sockets
for iproto provides a new way to handle the 'Address already
in use' error: it lets test-run assign unix sockets as
LISTEN environment variable values.

Before this change the TcpPortDispatcher was used to
eliminate the race condition when two workers try to use
the same port: the idea was that each worker used its own
port range. In practice these ports could still race with
client ports (from, say, net.box or replication), which
typically didn't use bind() and so were bound to a random
available port (regardless of any dispatched ranges) and
could produce the 'Address already in use' error.
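
The client side of this race can be observed with plain sockets. This
is a minimal Python sketch, unrelated to test-run's code:
`ephemeral_client_port` is a hypothetical helper that connects without
calling bind() and reports the port the OS picked for the client.

```python
import socket


def ephemeral_client_port():
    """Return (server_port, client_port) for a loopback connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())  # no bind() on the client
    server_port = listener.getsockname()[1]
    client_port = client.getsockname()[1]
    client.close()
    listener.close()
    return server_port, client_port


server_port, client_port = ephemeral_client_port()
print("server:", server_port, "client:", client_port)
```

The client port is chosen by the OS and is not restricted to any range
a test harness reserves, so it may land on a port that a worker intends
to listen on.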

Closes: #4008

(cherry picked from commit 60f84cbf)

047ee845

test: enable parallel mode for wal_off tests · 52124093

Sergei Voronezhskii authored 6 years ago

- Box configuration parameter `memtx_memory` is increased, because the
  test `lua` after `tuple` failed with the error:
  `Failed to allocate 368569 bytes in slab allocator for memtx_tuple`
  despite `collectgarbage('collect')` calls after cases with huge/many
  tuples.
  The statistics before the allocation failure give the following values:
  ```
  box.slab.info()
  ---
  - items_size: 72786472
    items_used_ratio: 4.43%
    quota_size: 107374592
    quota_used_ratio: 93.75%
    arena_used_ratio: 6.1%
    items_used: 3222376
    quota_used: 100663296
    arena_size: 100663296
    arena_used: 6105960
  ```
  The reason for the failure seems to be slab memory fragmentation. It
  is not clear for now whether we should consider this a tarantool
  issue.

- Test `snapshot_stress` counts snapshot files present in the
  working directory and can reach the default 'checkpoint_count' value
  `2` if a previous test wrote its snapshots earlier.

- Restarting the default server w/o cleaning a working directory
  can leave a snapshot that holds a state saved at the middle of a test,
  before dropping of the space 'tweedledum' (because WAL is disabled),
  that can cause the error `Space 'tweedledum' already exists` for a
  following test.

- Use unix sockets because of errors `Address already in use`.

Part of #2436

(cherry picked from commit d837c94b)

52124093

gitlab-ci: merge test and deploy stages · 540c9563

Alexander V. Tikhonov authored 5 years ago

Some tests currently produce flaky results, which blocks the
deploy stage. The deploy stage is temporarily merged into the
test stage to work around this.

Follows up #4156

(cherry picked from commit b37959be)

540c9563

Jul 05, 2019

travis-ci: freeze curl version on 7.65.0 on OS X · 2990949a

Alexander V. Tikhonov authored 5 years ago

Homebrew now contains curl-7.65.1, which is affected by curl/curl#3995
(this problem leads to segfaults). The next version is not released yet.
The current commit downgrades the curl version to 7.65.0.

Close #4288

(cherry picked from commit 2e880af0)

2990949a

Jul 04, 2019

Enable GitLab CI testing · 347e2d4b

Alexander V. Tikhonov authored 5 years ago

Implemented a GitLab CI testing process in addition to the existing
Travis CI. The new testing process is added to run tests faster. It
requires controlling the load on the machines to avoid flaky fails on
timeouts. GitLab CI allows us to run testing on our machines.

Created 2 stages for testing and deploying packages.

The testing stage contains the following jobs that are run for all
branches:

* Debian 9 (Stretch): release/debug gcc.
* Debian 10 (Buster): release clang8 + lto.
* OSX 14 (Mojave): release.
* FreeBSD 12: release gcc.

And the following jobs that are run on long-term branches (release
branches: for now it is 1.10, 2.1 and master):

* OSX 13 (Sierra): release clang.
* OSX 14 (Mojave): release clang + lto.

The deployment stage contains the same jobs as we have in Travis CI.
They however just build tarballs and packages: don't push them to S3 and
packagecloud.

In order to run full testing on a short-term branch one can name it with
'-full-ci' suffix.

The additional manual work is needed when dependencies are changed in
.travis.mk file ('deps_debian' or 'deps_buster_clang_8' goals):

 | make GITLAB_USER=foo -f .gitlab.mk docker_bootstrap

This command pushes docker images into GitLab Registry and then they are
used in testing. Pre-built images speed up testing.

Fixes #4156

(cherry picked from commit ce623a23)

347e2d4b

travis-ci: needs blind install for brew python2 · 97d8a1be

Alexander V. Tikhonov authored 5 years ago

Travis-CI failed on 'brew install python2' because python2 was
already installed on OSX 10.13 Sierra, so the install step
must not fail when the package already exists.

Follow up #4254

(cherry picked from commit ceefd0c8)

97d8a1be

test: Enable http_client test · b9446309

Alexander V. Tikhonov authored 5 years ago

Removed the skip flag file to switch on the http_client
test. Enabled the http_client test on OSX and fixed the
missing python2 symlink. Removed the subtest on
'595 error return' from the 'error' suite, because it may
hang forever. To enable the test on travis-ci, reverted commit:

1d7285c4 ('Disable flaky http_client.test.lua')

Closes #4254

(cherry picked from commit 33254bd6)

b9446309

Disable flaky http_client.test.lua · f45ff1fe

Konstantin Osipov authored 5 years ago

Issue pending, gh-4254.

(cherry picked from commit 1d7285c4)

f45ff1fe

test: run full testing only on long-term branches · 8893926f

Alexander Turenko authored 5 years ago

Disabled tarballs and packages building on short-term branches.

Removed 'allow_failures' on coverage / debug build.

Replaced matrix expansion with the list of jobs (because Travis-CI
documentation says it does not support conditional jobs with matrix
expansion).

Fixes #3755.

(backported from 28fcdaa0)

8893926f

Jul 03, 2019

test: update test-run · 6a0b0cff

Alexander Turenko authored 5 years ago

Fixed an app/strict.test.lua failure after a test that sets the 'a'
global variable. It was due to the way the 'strict' module tracks
declared global variables and was fixed in test-run's pretest_clean.lua
module.

(cherry picked from commit 1df9c3d2)

6a0b0cff

Jul 01, 2019

test: update test-run · bae8ca01

Alexander Turenko authored 5 years ago

* Implement SQL driver (#4123).
* Support line carrying with backslash.
* Support new result file format (for new tests).

See the link below for a full description of the new features.

https://github.com/tarantool/test-run/commit/a04b5b096c607172ce4fc86a84e3531c9f3a7304

Fixes #4123.

(cherry picked from commit d81d7b1c)

bae8ca01

Jun 25, 2019

test: enable parallel run for python test suites · 759f0a80

Sergei Voronezhskii authored 6 years ago

Fixed issues:

- box-py/iproto.test.py
  1) Fixed receive_response() to wait for whole response.
  2) Clean up _cluster space.

- replication-py/cluster.test.py
  1) Clean up _cluster space.

- replication-py/multi.test.py
  1) Removed vclock checking because it fails if a previous test performs
     some DML and the vclock was incremented. Looks like it was used for
     debug and is not part of this test case.
  2) Fixed typo in 'Synchronize' block.

The following test sequences did fail due to unexpected IDs in _cluster
space:

- [box-py/iproto.test.py, null]
- [box-py/bootstrap.test.py, null]

- [replication-py/cluster.test.py, null]
- [replication-py/multi.test.py, null]

Part of #3232

(cherry picked from commit 4ea7d729)

759f0a80

test: skip test backtrace if no libunwind support · 67fda078

Sergei Voronezhskii authored 6 years ago

Closes #3824

(cherry picked from commit 2aa25ba5)

67fda078

vinyl: free region on vylog commit instead of resetting it · db40dec3

Vladimir Davydov authored 5 years ago

region_reset() only frees memory from the last slab. As a result, if
a vylog transaction happens to use more than one slab, memory used by
vy_log.pool won't be freed, neither will it be reused, i.e. we'll get
a memory leak. Fix it by using region_free() instead of region_reset().
It's okay from performance point of view, because vylog transactions
are rare events.
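
The leak can be illustrated with a toy model. This is a Python sketch
of the behaviour described above, not the allocator's C code:
`ToyRegion` and its methods are hypothetical, and the model tracks only
how many bytes each slab holds.

```python
class ToyRegion:
    """Toy region allocator: a list of used-byte counters, newest last."""

    def __init__(self, slab_size):
        self.slab_size = slab_size
        self.slabs = []

    def alloc(self, size):
        if size > self.slab_size:
            raise ValueError("allocation is larger than a slab")
        if not self.slabs or self.slabs[-1] + size > self.slab_size:
            self.slabs.append(0)  # take a new slab
        self.slabs[-1] += size

    def reset(self):
        """Models the reset described above: only the last slab is rewound."""
        if self.slabs:
            self.slabs[-1] = 0

    def free(self):
        """Models a full free: every slab is released."""
        self.slabs.clear()

    def used(self):
        return sum(self.slabs)


region = ToyRegion(slab_size=100)
for _ in range(3):
    region.alloc(60)  # each allocation lands in its own slab
region.reset()
print(region.used())  # 120: two slabs still hold memory
region.free()
print(region.used())  # 0
```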

Note, the master branch isn't affected by this issue, because the vylog
memory management was completely reworked there. 2.1 isn't affected
either, because there region_reset() was modified to free all slabs.
However, rather than backporting any of those commits, I think it's
more appropriate to simply use region_free().

db40dec3

vinyl: clean up region after allocating surrogate statement · 0e7c933d

Vladimir Davydov authored 5 years ago

vy_stmt_new_surrogate_from_key() and vy_stmt_new_surrogate_delete_raw()
allocate temporary objects on the region, but don't clean up after
themselves. Those functions may be called by vinyl reader threads:

  vy_page_read_cb
    vy_page_find_key
      vy_page_stmt
        vy_stmt_decode

In this case the region will grow infinitely, because reader threads
never call fiber_gc().

The leak was introduced to 1.10 by commit b9072317 ("vinyl: lookup
key in reader thread"), which moved vy_page_find_key() invocation to
reader threads for the sake of performance. The fix is trivial - call
region_truncate() from those functions.

Note, neither the master branch nor 2.1 is affected by this issue,
because region_truncate() was added there in the scope of the multikey
index feature.

0e7c933d

Jun 19, 2019

test: update test-run · 0acce52a

Alexander Turenko authored 5 years ago

Add 'decimal' to a built-in modules list to preserve
package.loaded.decimal when pretest_clean is set in suite.ini.

Needed for #692.

(cherry picked from commit 0b7cc526)

0acce52a

Jun 18, 2019

lua: escape trigraphs in bundled lua sources · ed050296

Alexander Turenko authored 5 years ago

Built-in modules are bundled into tarantool in the following way. A lua
file from src/lua or src/box/lua is stored as a string literal in a C
file, then built and linked into tarantool. During startup tarantool
calls luaL_loadbuffer() on this string.

When a Lua source is converted to a C literal, proper escaping is
performed. However there is one case, which was not covered: trigraphs.
The patch adds escaping of question mark symbols to avoid matching ??X
sequences as trigraphs by C preprocessor.
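
The escaping step can be sketched in Python. This is an illustration
only, not the build tooling tarantool uses: `to_c_literal` is a
hypothetical converter that handles a few characters, with `?` mapped
to the standard C escape sequence `\?`.

```python
def to_c_literal(source):
    """Return `source` as a C string literal without '??' sequences."""
    escapes = {
        "\\": "\\\\",
        '"': '\\"',
        "\n": "\\n",
        "?": "\\?",  # breaks up ??X so it cannot form a trigraph
    }
    return '"' + "".join(escapes.get(ch, ch) for ch in source) + '"'


print(to_c_literal("return '??('"))
```

Every question mark in the output is preceded by a backslash, so no two
question marks are adjacent and the preprocessor has nothing to replace.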

The simplest way to check that it works is to apply the following
patch:

 | diff --git a/src/lua/string.lua b/src/lua/string.lua
 | index 6e12c59ae..2da2dbf4d 100644
 | --- a/src/lua/string.lua
 | +++ b/src/lua/string.lua
 | @@ -425,3 +425,6 @@ string.fromhex    = string_fromhex
 |  string.strip      = string_strip
 |  string.lstrip      = string_lstrip
 |  string.rstrip      = string_rstrip
 | +string.foo = function()
 | +    return '??('
 | +end

And call the function like so:

 | ./src/tarantool -e 'print(string.foo()) os.exit()'

If it prints `??(`, then everything is okay. If it prints `[`, then
`??(` was preprocessed as the trigraph.

We hit this problem when we tried to bundle luarocks-3: it contains the
"^(.-)(%??)$" regexp, where `??)` was interpreted as `]`. A Debug build
or a build with -DENABLE_WERROR reports an error in this case, but a
usual RelWithDebInfo build passes (with -Wtrigraphs warnings) and can
show this unexpected behaviour.

Fixes #4291.

(cherry picked from commit 177a1713)

ed050296