Commits · 8c06a069d764740b3ed5759c6b9a1ddd1d60f133 · core / tarantool

Aug 08, 2018

test: fix box/errinj.test.lua sporadic failure · 8c06a069

Mergen Imeev authored 6 years ago

In some cases box.snapshot() takes longer than expected.
This leads to situations where the previous error is reported instead
of the new one. Now these errors are completely separated.

Closes #3599

Aug 07, 2018

test: update test-run submodule · 264fb34b
Kirill Yukhin authored 6 years ago
```
Print reproduce file.
```

test: Switch CI test-run to one job · 0e29a71c

Sergei Voronezhskii authored 6 years ago

The -j -1 option used the legacy sequential mode. Reducing the number of
jobs to one by switching to -j 1 uses the same code path as parallel
mode, and the parallel mode code kills hung tests.

Part of https://github.com/tarantool/test-run/issues/106

box: serialize calls to box.cfg · 1d3a6cb0

Vladimir Davydov authored 6 years ago

It is dangerous to call box.cfg() concurrently from different fibers.
For example, replication configuration uses static variables and yields,
so calling it concurrently can result in a crash. To make sure it never
happens, let's protect box.cfg() with a lock.

Closes #3606
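
A minimal sketch of the idea follows. It assumes POSIX threads as a stand-in for fibers; Tarantool fibers are cooperative, and the actual patch wraps box.cfg() itself in a lock rather than using a pthread mutex. `cfg_apply()`, `cfg_lock` and the counters are hypothetical. `cfg_apply()` yields halfway through, like replication configuration does. Because the lock is held for the whole call, no second caller can enter in the meantime.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical global state touched by a configuration call. */
static pthread_mutex_t cfg_lock = PTHREAD_MUTEX_INITIALIZER;
static int cfg_active;      /* callers currently inside the critical section */
static int cfg_max_active;  /* highest value cfg_active ever reached */
static int cfg_calls;       /* total number of completed calls */

/* Hypothetical stand-in for box.cfg(): yields in the middle, but holds
 * the lock throughout, so calls are serialized. */
static void cfg_apply(void)
{
	pthread_mutex_lock(&cfg_lock);
	cfg_active++;
	if (cfg_active > cfg_max_active)
		cfg_max_active = cfg_active;
	sched_yield();          /* give other threads a chance to interleave */
	cfg_active--;
	cfg_calls++;
	pthread_mutex_unlock(&cfg_lock);
}

static void *cfg_worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < 1000; i++)
		cfg_apply();
	return NULL;
}

/* Run `n` threads (at most 8) that call cfg_apply() concurrently. */
static void run_concurrent_cfg(int n)
{
	pthread_t threads[8];
	assert(n > 0 && n <= 8);
	for (int i = 0; i < n; i++)
		pthread_create(&threads[i], NULL, cfg_worker, NULL);
	for (int i = 0; i < n; i++)
		pthread_join(threads[i], NULL);
}
```

After `run_concurrent_cfg(4)`, `cfg_max_active` stays at 1, meaning no two calls ever overlapped.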

Aug 03, 2018
- Fix csv crash with ending space and empty field · a9695b98
  Alexander Turenko authored 6 years ago
  
  Fixes #3489.
Aug 02, 2018

test: update test-run · f29466cd

Alexander Turenko authored 6 years ago

* support expected fail of non-default server
* fix format_process function: prevent crash on some machines (#92)
* print whole reject file when a test failed (#102)

Exclude install targets of libyaml from CMake · 3c26b86d
Eugine Blikh authored 6 years ago
```
No more `include/yaml.h` and `lib/libyaml_static.a` installs.

closes gh-3547
```

Jul 26, 2018
- lua: fix fio.rmtree to work with non empty dirs · 9917edc7
  Konstantin Belyavskiy authored 6 years ago
  
  Fix 'fio.rmtree' to remove non-empty directories, and update the test. Closes #3258
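
One common way to remove a non-empty directory tree in C is a depth-first walk that deletes children before their parent. The sketch below does that with POSIX nftw() and FTW_DEPTH. It is not necessarily how fio.rmtree is implemented. `rmtree()` and `make_sample_tree()` are hypothetical helpers.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Called for every entry; with FTW_DEPTH children are visited before
 * their parent, so directories are already empty when removed. */
static int rmtree_entry(const char *path, const struct stat *sb,
			int typeflag, struct FTW *ftwbuf)
{
	(void)sb;
	(void)typeflag;
	(void)ftwbuf;
	return remove(path) == 0 ? 0 : -1;
}

/* Remove `path` and everything below it. Returns 0 on success. */
static int rmtree(const char *path)
{
	assert(path != NULL);
	/* FTW_PHYS: do not follow symlinks, remove the links themselves. */
	return nftw(path, rmtree_entry, 16, FTW_DEPTH | FTW_PHYS);
}

/* Demo helper: create <tmpl>/sub/file under a fresh temporary dir. */
static int make_sample_tree(char *tmpl)
{
	char buf[256];
	FILE *f;
	if (mkdtemp(tmpl) == NULL)
		return -1;
	snprintf(buf, sizeof(buf), "%s/sub", tmpl);
	if (mkdir(buf, 0700) != 0)
		return -1;
	snprintf(buf, sizeof(buf), "%s/sub/file", tmpl);
	if ((f = fopen(buf, "w")) == NULL)
		return -1;
	fputs("data", f);
	return fclose(f);
}
```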
Jul 22, 2018

replication: unregister replica with gc if deleted from cluster · ea28a925

Vladimir Davydov authored 6 years ago

When a replica is removed from the cluster table, the corresponding
replica struct isn't destroyed unless both the relay and the applier
attached to it are stopped, see replica_clear_id(). Since replica struct
is a holder of the garbage collection state, this means that in case an
evicted replica has an applier or a relay that fails to exit for some
reason, garbage collection will hang.

A relay thread stops as soon as the replica it was started for receives
a row that tries to delete it from the cluster table (because this isn't
allowed by the cluster space trigger, see on_replace_dd_cluster()).
If a replica isn't running, the corresponding relay can't run either,
because writing to a closed socket isn't allowed. In other words, a
relay can't block garbage collection.

An applier, however, is deleted only when replication is reconfigured.
So if a replica that was evicted from the cluster was configured as a
master, its replica struct will hang around blocking garbage collection
for as long as the replica remains in box.cfg.replication. This is what
happens in #3546.

Fix this issue by forcefully unregistering a replica with the garbage
collector when it is deleted from the cluster table. This is OK as it
won't be able to resubscribe and so we don't need to keep WALs for it
any longer. Note that the relay thread may still be running when a replica
is deleted from the cluster table, in which case we can't unregister it
with the garbage collector right away, because the relay may need to
access the garbage collection state. In such a case, leave the job to
replica_clear_relay, which is called as soon as the relay thread exits.

Closes #3546

Jul 19, 2018
- say: fix invalid arguments · 1046f851
  Kirill Shcherbatov authored 6 years ago
  
  The _say function was called with invalid arguments. Thanks to @sorc1 for the patch. Closes #3433.
- say: add missing strdup failure check · c422b267
  Olga Arkhangelskaia authored 6 years ago
  
  strdup() may fail silently, without any message from tarantool. This patch adds the missing checks.
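
The sketch below shows the kind of check the strdup patch adds. It assumes a hypothetical `checked_strdup()` wrapper. The fprintf() call is only a stand-in for whatever error reporting the real code uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate `s` and report allocation failure
 * instead of silently propagating NULL. Returns NULL on failure. */
static char *checked_strdup(const char *s, const char *what)
{
	char *copy;
	assert(s != NULL && what != NULL);
	copy = strdup(s);
	if (copy == NULL)
		fprintf(stderr, "failed to allocate %zu bytes for %s\n",
			strlen(s) + 1, what);
	return copy;
}
```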
Jul 17, 2018

net.box: fix invalid index:count() with iterator · 25b9f0f0

Kirill Shcherbatov authored 6 years ago

net.box didn't pass options containing an iterator to
the server side.
There were also invalid results for two :count tests in
the net.box.result file.

Thanks to @ademenev for reporting the problem and helping
to locate it.

Closes #3262.

Jul 16, 2018

Do not recycle a fiber if it is canceled · 1f187cac

Georgy Kirichenko authored 6 years ago

If a fiber pool reuses an already canceled fiber, the fiber reports an
error for every subsequent request. Now a canceled fiber exits instead
of being recycled, and the fiber pool creates a new one.

Fixes #3527
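
The recycling rule can be sketched as follows. `struct worker`, `struct pool`, `pool_get()` and `pool_put()` are hypothetical stand-ins for fibers and the fiber pool. A worker marked as cancelled is destroyed when it finishes instead of being returned to the idle list, so the next request gets a fresh worker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical worker, standing in for a fiber. */
struct worker {
	bool is_cancelled;
	struct worker *next_idle;
};

/* Hypothetical pool, standing in for the fiber pool. */
struct pool {
	struct worker *idle;   /* stack of reusable workers */
	int created;           /* total workers ever created */
	int alive;             /* workers not yet destroyed */
};

/* Take an idle worker, or create a fresh one if none is available. */
static struct worker *pool_get(struct pool *p)
{
	struct worker *w = p->idle;
	if (w != NULL) {
		p->idle = w->next_idle;
		return w;
	}
	w = calloc(1, sizeof(*w));
	if (w != NULL) {
		p->created++;
		p->alive++;
	}
	return w;
}

/* Return a worker after a request. A cancelled worker is destroyed
 * instead of being recycled, so it never serves another request. */
static void pool_put(struct pool *p, struct worker *w)
{
	assert(w != NULL);
	if (w->is_cancelled) {
		free(w);
		p->alive--;
		return;
	}
	w->next_idle = p->idle;
	p->idle = w;
}
```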

Jul 13, 2018
- Update libyaml version · 687cf3b6
  Kirill Yukhin authored 6 years ago
  
  New commit in third_party/libyaml downgrades required cmake version.
- export api functions for sequences · 2d52eda4
  Ivan Kosenko authored 6 years ago
  
Jul 12, 2018

third-party: update libyaml submodule · aeabe633

Kirill Shcherbatov authored 6 years ago

Update tests after the following upstream fixup:

    commit baf636a74b4b6d055d93e2d01366d6097eb82d90
    Author: Tina Müller <cpan2@tinita.de>
    Date:   Thu Jun 14 19:27:04 2018 +0200

        The closing single quote needs to be indented...
        if it's on its own line.

Closes #3275.

Update libyaml submodule · b6111ab2
Kirill Yukhin authored 6 years ago
```
Closes #3275.
```
Remove redundant '}' from box/lua/info.h · 52775d91
Vladislav Shpilevoy authored 6 years ago
```
Found by @ImeevMA
```

Jul 10, 2018

recovery: recount offset on checkpoint_interval change · 94ced96b

Konstantin Belyavskiy authored 6 years ago

The next checkpoint time is set by the formula:

    period = self.checkpoint_interval + offset,

where offset is defined as follows:

    offset = random % self.checkpoint_interval

So the offset must be recalculated at least when the new interval is
less than the old one, since the old offset may then exceed the new
interval.

Closes #3370
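
The sketch below shows the recalculation under the formulas above. It uses a hypothetical `struct checkpoint_sched`, with rand() standing in for the random source. The offset must stay below the interval, so it is recomputed whenever the interval shrinks. When the interval grows, the old offset is still valid.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical checkpoint scheduler state (intervals in seconds). */
struct checkpoint_sched {
	long interval;
	long offset;   /* always in [0, interval) */
};

static void checkpoint_sched_init(struct checkpoint_sched *s, long interval)
{
	assert(interval > 0);
	s->interval = interval;
	s->offset = rand() % interval;
}

/* Delay before the next checkpoint: period = interval + offset. */
static long checkpoint_sched_period(const struct checkpoint_sched *s)
{
	return s->interval + s->offset;
}

/* On reconfiguration, recompute the offset when the interval shrinks,
 * because the old offset may no longer be below the new interval. */
static void checkpoint_sched_set_interval(struct checkpoint_sched *s,
					  long interval)
{
	assert(interval > 0);
	if (interval < s->interval)
		s->offset = rand() % interval;
	s->interval = interval;
}
```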

app: fix parsing integers with exponent in json · f9f89acb

Kirill Shcherbatov authored 6 years ago

Now it is possible to specify a number in exponential
form in all formats allowed by the JSON standard:

    json.decode('{"remained_amount":2.0e+3}')
    json.decode('{"remained_amount":2.0E+3}')
    json.decode('{"remained_amount":2e+3}')
    json.decode('{"remained_amount":2E+3}')     <-- fixed

Closes #3514.
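
For reference, the JSON number grammar in RFC 8259 has four parts: an optional minus sign, an integer part without leading zeros, an optional fraction, and an optional exponent introduced by `e` or `E`. The validator below is a standalone sketch of that grammar, not Tarantool's JSON decoder. `json_number_valid()` and `json_number_parse()` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Check `s` against the JSON number grammar (RFC 8259):
 *   [-] (0 | [1-9][0-9]*) [. [0-9]+] [(e|E) [+|-] [0-9]+]
 * Returns true only if the whole string matches. */
static bool json_number_valid(const char *s)
{
	assert(s != NULL);
	if (*s == '-')
		s++;
	if (*s == '0') {
		s++;
	} else if (*s >= '1' && *s <= '9') {
		while (isdigit((unsigned char)*s))
			s++;
	} else {
		return false;
	}
	if (*s == '.') {
		s++;
		if (!isdigit((unsigned char)*s))
			return false;
		while (isdigit((unsigned char)*s))
			s++;
	}
	if (*s == 'e' || *s == 'E') {
		s++;
		if (*s == '+' || *s == '-')
			s++;
		if (!isdigit((unsigned char)*s))
			return false;
		while (isdigit((unsigned char)*s))
			s++;
	}
	return *s == '\0';
}

/* Parse a validated number; strtod() accepts every form above. */
static bool json_number_parse(const char *s, double *out)
{
	if (!json_number_valid(s))
		return false;
	*out = strtod(s, NULL);
	return true;
}
```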

Jul 09, 2018

Do not update schema_version on space:truncate(). · 2407e389

Serge Petrenko authored 6 years ago

Schema version is used by both clients and internal modules to check
whether there were any updates in spaces and indices. While clients
only need to be notified when there is a noticeable change, e.g. a
space is removed, internal components also need to be notified when
something like space:truncate() happens: even though this operation
doesn't change the space id or any of its indices, it creates a new
space object, so all the pointers to the old object have to be updated.
Currently, clients and internals share the same schema version, which
leads to unnecessary updates on the client side.

Fix this by implementing 2 separate counters for internal and public use:
schema_state gets updated on every change, including recreation of the same
space object, while schema_version is updated only when there are noticeable
changes for the clients. Introduce a new AlterOp in alter.cc to update the
public schema_version.
Now all the internals reference schema_state, while all the clients use
schema_version. box.internal.schema_version() returns schema_version
(the public one).

Closes: #3414
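
The two-counter scheme can be sketched as follows. Apart from `schema_state` and `schema_version`, all names are hypothetical. Every DDL change bumps the internal counter, so cached pointers get revalidated. Only client-visible changes bump the public one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t schema_state;    /* internal: bumped on every change */
static uint32_t schema_version;  /* public: bumped on visible changes */

/* Record a DDL change. `client_visible` is false for operations such as
 * truncate, which recreate the space object but keep its id and indexes. */
static void schema_on_change(bool client_visible)
{
	schema_state++;              /* internal caches must be refreshed */
	if (client_visible)
		schema_version++;        /* clients must reload their schema */
}

/* Hypothetical internal cache: a pointer plus the schema_state it saw. */
struct cached_space {
	const void *space;
	uint32_t seen_state;
};

static bool cached_space_is_stale(const struct cached_space *c)
{
	return c->seen_state != schema_state;
}
```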

Jul 05, 2018

lib/bitset: rename bitset structs · befd4ee1

Kirill Shcherbatov authored 6 years ago

Fix the FreeBSD build: the bitset types declared in lib/bitset
conflicted with the ones declared in _cpuset.h, which is part of
pthread_np.h used on FreeBSD.

Resolves #3046.

test: fix box-tap/cfg.test · 965ada65

Kirill Yukhin authored 6 years ago

After the read-only flag is dropped, a test space
is created successfully, and on the next launch its creation
fails because the space is never dropped.
Drop the space.

Closes #3507

Jul 04, 2018

Fix nested calls to box.session.su() · 566e066c

Serge Petrenko authored 6 years ago

After executing the function, box.session.su() set the
effective user back to the session user, which broke nested
calls to it. Fix this by saving the current effective user
and restoring it after the sudo execution. This exposed a bug
in box.schema.user.drop(): it had an unnecessary check for
the PRIV_REVOKE privilege, which is never granted to anyone
but admin. Fix this as well by adding one extra
box.session.su() call.

Closes #3090, #3492
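
The save-and-restore pattern can be sketched in C like this. `struct session`, `session_su()` and the demo callbacks are hypothetical. A real implementation also has to restore the saved user when the called function raises an error, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical session state: the session's own user and the user
 * whose privileges are currently in effect. */
struct session {
	int session_uid;
	int effective_uid;
};

typedef void (*su_fn)(struct session *s, void *arg);

/* Run `fn` as `uid`. Restoring the previous effective user (rather
 * than resetting to session_uid) keeps nested calls correct. */
static void session_su(struct session *s, int uid, su_fn fn, void *arg)
{
	int saved = s->effective_uid;
	assert(fn != NULL);
	s->effective_uid = uid;
	fn(s, arg);
	s->effective_uid = saved;
}

/* Demo: record the effective user before, inside and after a nested su. */
struct trace { int seen[3]; };

static void inner(struct session *s, void *arg)
{
	((struct trace *)arg)->seen[1] = s->effective_uid;
}

static void outer(struct session *s, void *arg)
{
	struct trace *t = arg;
	t->seen[0] = s->effective_uid;
	session_su(s, 2, inner, arg);
	/* After the nested call we must still run as the outer user. */
	t->seen[2] = s->effective_uid;
}
```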

Jul 03, 2018

memtx: vocally abort a transaction in case of implicit yield · 131121c9

Konstantin Osipov authored 6 years ago

Before this patch, memtx would silently roll back a multi-statement
transaction on yield, switching the session to autocommit mode.

It would do nothing in case yield happened in a sub-statement
in auto-commit mode.

This could lead to nasty, hard-to-debug side effects in
malformed Lua programs.

Fix this by adding a special transaction state, aborted, and entering
this state in case of an implicit yield.

Check what happens when a sub-statement yields.
Check that the yield trigger is removed by a rollback.

Fixes gh-2631
Fixes gh-2528

Jun 29, 2018

fiber: invoke on_yield triggers from fiber_call(). · 2239e023

Konstantin Osipov authored 6 years ago

fiber->on_yield triggers were not invoked in fiber_call(),
which meant that a memtx transaction was not rolled back by
fiber.create().

Fixes gh-3493

Jun 28, 2018

http: Fix parse long headers names · 3d121dd4

Ilya Markov authored 6 years ago

Bug: when parsing HTTP headers, long header names are truncated
to zero length, but their values are not ignored.

Fix this by adding the max_header_name_length parameter to HTTP requests.
If a header name is longer than this value, it is truncated to
this length. The default value of max_header_name_length is 32.

Also do some refactoring by renaming long names in http_parser.

Closes #3451
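
The truncation rule can be sketched with a hypothetical `header_name_store()` helper. Only the default of 32 comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Default taken from the commit message; the macro name is hypothetical. */
#define MAX_HEADER_NAME_LENGTH 32

/* Copy at most `max_len` bytes of a header name into `dst`, which must
 * hold max_len + 1 bytes, and return the stored length. Long names are
 * cut to max_len instead of becoming empty, so the value keeps a key. */
static size_t header_name_store(char *dst, size_t max_len,
				const char *name, size_t name_len)
{
	size_t n = name_len < max_len ? name_len : max_len;
	assert(dst != NULL && name != NULL);
	memcpy(dst, name, n);
	dst[n] = '\0';
	return n;
}
```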

http: Remove parsed status line from headers · 139aa814

Ilya Markov authored 6 years ago

Bug: the header parser validates the HTTP status line and, besides saving
the HTTP status, saves its valid characters to the header name, which is wrong.

Fix this by skipping the status line after validation instead of saving it as
a header.

In scope of #3451

xdir: remove inprogress files after restart · f41aac61

Vladimir Davydov authored 6 years ago

If tarantool is stopped while writing a snapshot or a vinyl run file,
inprogress files will never be removed. Fix this by collecting those
files on recovery completion.

Original patch by @IlyaMarkovMipt. Reworked by @locker.

Closes #3406
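
The sketch below shows such a cleanup pass. It assumes in-progress files carry a `.inprogress` suffix. `collect_inprogress()` and `touch()` are hypothetical helpers, not Tarantool's xdir code, and the file names in the example are illustrative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Return 1 if `name` ends with `suffix`. */
static int has_suffix(const char *name, const char *suffix)
{
	size_t n = strlen(name), k = strlen(suffix);
	return n >= k && strcmp(name + n - k, suffix) == 0;
}

/* Remove every "*.inprogress" file in `dir_path`, as left behind by an
 * interrupted write. Returns the number of files removed, or -1 if the
 * directory cannot be opened. */
static int collect_inprogress(const char *dir_path)
{
	char path[4096];
	struct dirent *de;
	int removed = 0;
	DIR *dir;
	assert(dir_path != NULL);
	dir = opendir(dir_path);
	if (dir == NULL)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		if (!has_suffix(de->d_name, ".inprogress"))
			continue;
		snprintf(path, sizeof(path), "%s/%s", dir_path, de->d_name);
		if (unlink(path) == 0)
			removed++;
	}
	closedir(dir);
	return removed;
}

/* Demo helper: create an empty file <dir>/<name>. */
static int touch(const char *dir, const char *name)
{
	char path[4096];
	FILE *f;
	snprintf(path, sizeof(path), "%s/%s", dir, name);
	f = fopen(path, "w");
	return f != NULL && fclose(f) == 0 ? 0 : -1;
}
```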

xdir: change log messages in gc functions · 93a50580

Ilya Markov authored 6 years ago

To log only the files that were actually removed, change the log
messages from "removing <name of file>" to "removed <name of file>" in
the vy_run_remove_files and xdir_collect_garbage functions.

Needed for #3406

Fix for https://github.com/tarantool/doc/issues/491 · cdc454b8
LapaevPavel authored 6 years ago

test: update test results · aaa9bdbe
Konstantin Osipov authored 6 years ago
```
A minor follow up on the fix for gh-3452 (http.client timeout bug)
```

http.client: Fix waiting after received result · 7dcc8b42

Ilya Markov authored 6 years ago

The current implementation of http.client relies on a fiber_cond that is
waited on only after the request is registered, and doesn't take into
account that the response may be handled before the wait on the
fiber_cond starts.

So we may have the following situation:
1. Register the request in libcurl (curl_multi_add_handle in curl_execute).
2. Receive and process the response, and call fiber_cond_signal on a cond_var
that no one is waiting on yet.
3. Call fiber_cond_wait on the cond whose signal has already been sent. Wait
until the timeout fires.

In this case the user has to wait for the timeout, even though the data was
received earlier.

Fix this by adding an extra in_progress flag to the curl_request struct.
Set this flag to true before registering the request in libcurl and set it
to false when the request is finished, before fiber_cond_signal.
When the in_progress flag is false, don't wait on the cond variable.

Add 1 error injection.

Closes #3452
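
The ordering described above is the usual predicate-guarded condition wait. The sketch below uses POSIX threads, so the flag is protected by a mutex. With cooperative fibers no mutex is needed, but the ordering is the same. `struct request` and its functions are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical request object: in_progress plays the role of the flag
 * added to curl_request, guarded by the mutex paired with the cond. */
struct request {
	pthread_mutex_t mutex;
	pthread_cond_t done;
	bool in_progress;
	int status;
};

static void request_start(struct request *r)
{
	assert(r != NULL);
	pthread_mutex_init(&r->mutex, NULL);
	pthread_cond_init(&r->done, NULL);
	r->in_progress = true;   /* set before the request is registered */
	r->status = 0;
}

/* Completion side: clear the flag first, then signal. */
static void request_finish(struct request *r, int status)
{
	pthread_mutex_lock(&r->mutex);
	r->status = status;
	r->in_progress = false;
	pthread_cond_signal(&r->done);
	pthread_mutex_unlock(&r->mutex);
}

/* Waiting side: wait only while the request is still in progress, so a
 * response that arrived before we got here is not missed. */
static int request_wait(struct request *r)
{
	int status;
	pthread_mutex_lock(&r->mutex);
	while (r->in_progress)
		pthread_cond_wait(&r->done, &r->mutex);
	status = r->status;
	pthread_mutex_unlock(&r->mutex);
	return status;
}

/* Demo: complete a request from another thread with status 200. */
static void *finish_thread(void *arg)
{
	request_finish(arg, 200);
	return NULL;
}
```

If the response is handled before `request_wait()` is called, the flag is already false and the caller returns immediately instead of sleeping until a timeout.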

Jun 27, 2018

net.box: update a test case after cherry-pick · e04b5b23
Konstantin Osipov authored 6 years ago
```
schema_version must be passed to perform_request in 1.9
```

iproto: on input discard do nothing for closed con · a60c8dff

Vladislav Shpilevoy authored 6 years ago

When a connection is closed, some long-poll requests may still
be in the TX thread with non-discarded input. If a connection is
closed and its input is discarded afterwards, the connection must
not try to read new data.

The bug was introduced by me in f4d66dae.

Closes #3400

Jun 25, 2018

socket: fix race between unix tcp server stop and start · 80d379ee

Vladimir Davydov authored 6 years ago

If called on a unix socket, bind(2) creates a new file, see unix(7).
When we stop a unix tcp server, we should remove that file. Currently,
we do it from the tcp server fiber, after the server loop is broken,
which happens when the socket is closed, see tcp_server_loop(). This
opens a time window for another tcp server to reuse the same path:

    main fiber                  tcp server loop
    ----------                  ---------------

    -- Start a tcp server.
    s = socket.tcp_server('unix/', sock_path, ...)
    -- Stop the server.
    s:close()

                                socket_readable? => no, break loop

    -- Start a new tcp server. Use the same path as before.
    -- This function succeeds, because the socket is closed
    -- so tcp_server_bind_addr() will clean up by itself.
    s = socket.tcp_server('unix/', sock_path, ...)

     tcp_server_bind
      tcp_server_bind_addr
       socket_bind => EADDRINUSE
       tcp_connect => ECONNREFUSED
       -- Remove dead unix socket.
       fio.unlink(addr.port)
       socket_bind => success

                                -- Deletes unix socket used
                                -- by the new server.
                                fio.unlink(addr.port)

In particular, the race results in sporadic failures of the app-tap/console
test, which restarts a tcp server using the same file path.

To fix this issue, let's close the socket after removing the socket
file. This is perfectly legal on any UNIX system, and it eliminates
the race shown above, because a new server that tries to bind on the
same path as the one still used by a dying server will not receive
ECONNREFUSED until the socket fd is closed, and by then the file has
already been removed.

A note about the app-tap/console test. After this patch is applied,
socket.close() takes a little longer for a unix tcp server, because it
yields twice, once for removing the socket file and once for closing the
socket file descriptor. As a result, the on_disconnect() trigger left over
from the previous test case has time to run after the session.type() check.
Actually, those triggers have already been tested and we should have
cleared them before proceeding to the next test case. So instead of
adding two new on_disconnect checks to the test plan, let's clear the
triggers before the session.type() test case and remove 3 on_connect and 5
on_auth checks from the test plan.

Closes #3168
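
The sketch below shows this shutdown order for a plain AF_UNIX listening socket. `unix_server_start()` and `unix_server_stop()` are hypothetical helpers, not Tarantool's socket module.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Start a listening unix socket server on `path`. Returns fd or -1. */
static int unix_server_start(const char *path)
{
	struct sockaddr_un addr;
	int fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	assert(strlen(path) < sizeof(addr.sun_path));
	strcpy(addr.sun_path, path);
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
	    listen(fd, 16) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Stop the server: unlink the socket file before closing the fd, so the
 * file never outlives the listening socket. Closing first would open a
 * window in which a new server sees a dead socket file, removes it and
 * binds, only for our late unlink() to delete the new server's file. */
static int unix_server_stop(int fd, const char *path)
{
	int rc = unlink(path);
	close(fd);
	return rc;
}
```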

iproto: protect from false-correct size in msg header · c6951c92

Vladislav Shpilevoy authored 6 years ago

Consider this packet:

    msgpack = require('msgpack')
    data = msgpack.encode(18400000000000000000)..'aaaaaaa'

Tarantool interprets 18400000000000000000 as the size of an incoming
iproto request and tries to allocate a buffer of that size without any
checks. It calculates the needed capacity like this:

    capacity = start_value;
    while (capacity < size)
        capacity *= 2;

Here it is possible that on the i-th iteration 'capacity' < 'size',
but 'capacity * 2' overflows 64 bits and becomes < 'size' again,
so this loop never ends and occupies 100% CPU.

Strictly speaking, the effect of the overflow depends on the type of
'capacity': unsigned arithmetic wraps around, while signed overflow is
undefined behavior. On the original system it led to 'capacity'
becoming zero.

Such a size is unrealistic for a real packet, but it can appear as a
result of parsing an invalid packet whose first bytes accidentally
happen to form a valid MessagePack uint. This is how the bug emerged
on the real system.

Let's restrict the maximal packet size to 2GB.

Closes #3464
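
The sketch below shows the capacity computation with an upper bound checked before the loop. `MAX_PACKET_SIZE` and `buffer_capacity()` are hypothetical names, and 2GB is taken as 2 GiB. Bounding `size` also bounds the doubling, so the loop can no longer overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit matching the 2GB cap described above. */
#define MAX_PACKET_SIZE ((size_t)2 * 1024 * 1024 * 1024)

/* Compute a capacity >= size by doubling `start`. Returns 0 for sizes
 * above the limit. Since capacity < size <= MAX_PACKET_SIZE inside the
 * loop, capacity * 2 always fits in size_t. */
static size_t buffer_capacity(size_t start, size_t size)
{
	size_t capacity = start;
	assert(start > 0);
	if (size > MAX_PACKET_SIZE)
		return 0;            /* reject instead of looping forever */
	while (capacity < size)
		capacity *= 2;
	return capacity;
}
```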

Jun 14, 2018
- memtx: don't delay deletion of temporary tuples during snapshot · f9299c43
  Vladimir Davydov authored 6 years ago
  
  Since tuples stored in temporary spaces are never written to disk, we can always delete them immediately, even when a snapshot is in progress. Closes #3432
- Remove unused space_noop · 93ed36ea
  Vladimir Davydov authored 6 years ago
  
Jun 08, 2018

debian: don't install systemd service file twice · e38d2762

Alexander Turenko authored 6 years ago

This fixes the following error during tarantool installation from
packages on Debian / Ubuntu:

```
Unpacking tarantool (1.9.1.23.gacbd91c-1) ...
dpkg: error processing archive /var/cache/apt/archives/tarantool_1.9.1.23.gacbd91c-1_amd64.deb (--unpack):
 trying to overwrite '/lib/systemd/system/tarantool.service', which is also in package tarantool-common 1.9.1.23.gacbd91c-1
```

The problem is that the tarantool.service file was shipped with both the
tarantool-common and tarantool packages. This is a regression introduced
by 8925b862.

The way to avoid installing / enabling the service file within the
tarantool package is to pass the `--name` option to dh_systemd_enable,
but not the service file name. In that case dh_systemd_enable does not
find the service file and does not enforce its existence.

Hopefully there is a less hacky way to do this, but I haven't found one
yet.
