Commits · 3d121dd4a82d4199e053b09a8059bf445240f749 · core / tarantool

Jun 28, 2018

http: Fix parse long headers names · 3d121dd4

Ilya Markov authored 6 years ago

Bug: During parsing http headers, long headers names are truncated
to zero length, but values are not ignored.

Fix this with adding parameter  max_header_name_length to http request.
If header name is bigger than this value, header name is truncated to
this length. Default value of max_header_name_length is 32.

Do some refactoring with renaming long names in http_parser.

Closes #3451

3d121dd4

http: Remove parsed status line from headers · 139aa814

Ilya Markov authored 6 years ago

Bug: Header parser validates http status line and besides saving http
status, saves valid characters to header name, which is wrong.

Fix this with skipping status line after validation without saving it as
a header.

In scope of #3451

139aa814

xdir: remove inprogress files after restart · f41aac61

Vladimir Davydov authored 6 years ago

If tarantool is stopped while writing a snapshot or a vinyl run file,
inprogress files will never be removed. Fix this by collecting those
files on recovery completion.

Original patch by @IlyaMarkovMipt. Reworked by @locker.

Closes #3406

f41aac61

xdir: change log messages in gc functions · 93a50580

Ilya Markov authored 6 years ago

In order to log only about files that were actually removed change log
messages from "removing <name of file>" to "removed <name of file>" in
vy_run_remove_files and xdir_collect_garbage functions.

Needed for #3406

93a50580

Fix for https://github.com/tarantool/doc/issues/491 · cdc454b8
LapaevPavel authored 6 years ago

cdc454b8
test: update test results · aaa9bdbe
Konstantin Osipov authored 6 years ago
```
A minor follow up on the fix for gh-3452 (http.client timeout bug)
```
aaa9bdbe

http.client: Fix waiting after received result · 7dcc8b42

Ilya Markov authored 6 years ago

Current implementation of http.client relies on fiber_cond which is set
after the request was registered and doesn't consider the fact that
response may be handled before the set of fiber_cond.

So we may have the following situation:
1. Register request in libcurl(curl_multi_add_handle in curl_execute).
2. Receive and process response, fiber_cond_signal on cond_var which no
one waits.
3. fiber_cond_wait on cond which is already signaled. Wait until timeout
is fired.

In this case user have to wait timeout, though data was received
earlier.

Fix this with adding extra flag in_progress to curl_request struct.
Set this flag true before registering request in libcurl and set it
false when request is finished before fiber_cond_signal.
When in_progress flag is false, don't wait on cond variable.

Add 1 error injection.

Closes #3452

7dcc8b42

Jun 27, 2018

net.box: update a test case after cherry-pick · e04b5b23
Konstantin Osipov authored 6 years ago
```
schema_version must be passed to perform_request in 1.9
```
e04b5b23

iproto: on input discard do nothing for closed con · a60c8dff

Vladislav Shpilevoy authored 6 years ago

When a connection is closed, some of long-poll requests still may
by in TX thread with non-discarded input. If a connection is
closed, and then an input is discarded, then connection must not
try to read new data.

The bug was introduced here:
f4d66dae by me.

Closes #3400

a60c8dff

Jun 25, 2018

socket: fix race between unix tcp server stop and start · 80d379ee

Vladimir Davydov authored 6 years ago

If called on a unix socket, bind(2) creates a new file, see unix(7).
When we stop a unix tcp server, we should remove that file. Currently,
we do it from the tcp server fiber, after the server loop is broken,
which happens when the socket is closed, see tcp_server_loop(). This
opens a time window for another tcp server to reuse the same path:

    main fiber                  tcp server loop
    ----------                  ---------------

    -- Start a tcp server.
    s = socket.tcp_server('unix/', sock_path, ...)
    -- Stop the server.
    s:close()

                                socket_readable? => no, break loop

    -- Start a new tcp server. Use the same path as before.
    -- This function succeeds, because the socket is closed
    -- so tcp_server_bind_addr() will clean up by itself.
    s = socket.tcp_server('unix/', sock_path, ...)

     tcp_server_bind
      tcp_server_bind_addr
       socket_bind => EADDRINUSE
       tcp_connect => ECONNREFUSED
       -- Remove dead unix socket.
       fio.unlink(addr.port)
       socket_bind => success

                                -- Deletes unix socket used
                                -- by the new server.
                                fio.unlink(addr.port)

In particular, the race results in sporadic failures of app-tap/console
test, which restarts a tcp server using the same file path.

To fix this issue, let's close the socket after removing the socket
file. This is absolutely legit on any UNIX system, and this eliminates
the race shown above, because a new server that tries to bind on the
same path as the one already used by a dying server will not receive
ECONNREFUSED until the socket fd is closed and hence the file is
removed.

A note about the app-tap/console test. After this patch is applied,
socket.close() takes a little longer for unix tcp server, because it
yields twice, once for removing the socket file and once for closing the
socket file descriptor. As a result, on_disconnect() trigger left from
the previous test case has time to run after session.type() check.
Actually, those triggers have already been tested and we should have
cleared them before proceeding to the next test case. So instead of
adding two new on_disconnect checks to the test plan, let's clear the
triggers before session.type() test case and remove 3 on_connect and 5
on_auth checks from the test plan.

Closes #3168

80d379ee

iproto: protect from false-correct size in msg header · c6951c92

Vladislav Shpilevoy authored 6 years ago

Consider this packet:

    msgpack = require('msgpack')
    data = msgpack.encode(18400000000000000000)..'aaaaaaa'

Tarantool interprets 18400000000000000000 as size of a coming
iproto request, and tries with no any checks to allocate buffer
of such size. It calculates needed capacity like this:

    capacity = start_value;
    while (capacity < size)
        capacity *= 2;

Here it is possible that on i-th iteration 'capacity' < 'size',
but 'capacity * 2' overflows 64 bits and becomes < 'size' again,
so this loop never ends and occupies 100% CPU.

Strictly speaking overflow has undefined behavior. On the
original system it led to nullifying 'capacity'.

Such size is improbable as a real packet gabarits, but can appear
as a result of parsing of some invalid packet, first bytes of
which accidentally appears to be valid MessagePack uint. This is
how the bug emerged on the real system.

Lets restrict the maximal packet size to 2GB.

Closes #3464

c6951c92

Jun 14, 2018
- memtx: don't delay deletion of temporary tuples during snapshot · f9299c43
  Vladimir Davydov authored 6 years ago
  
  Since tuples stored in temporary spaces are never written to disk, we can always delete them immediately, even when a snapshot is in progress. Closes #3432
  f9299c43
- Remove unused space_noop · 93ed36ea
  Vladimir Davydov authored 6 years ago
  
  93ed36ea
Jun 08, 2018

debian: don't install systemd service file twice · e38d2762

Alexander Turenko authored 6 years ago

It fixes the following errors during tarantool installation from
packages on debian / ubuntu:

```
Unpacking tarantool (1.9.1.23.gacbd91c-1) ...
dpkg: error processing archive /var/cache/apt/archives/tarantool_1.9.1.23.gacbd91c-1_amd64.deb (--unpack):
 trying to overwrite '/lib/systemd/system/tarantool.service', which is also in package tarantool-common 1.9.1.23.gacbd91c-1
```

The problem is that tarantool.service file was shipped with
tarantool-common and tarantool packages both. It is the regression after
8925b862.

The way to avoid installing / enabling the service file within tarantool
package is to pass `--name` option to dh_systemd_enable, but do not pass
the service file name. In that case dh_systemd_enable does not found the
service file and does not enforce existence of the file.

Hope there is less hacky way to do so, but I don't found one at the
moment.

Unverified

e38d2762

Fix libunwind segfault · 5c3b3001

Georgy Kirichenko authored 6 years ago

Use volatile asm modifier to prevent unwanted and awkward optimizations
causing segfault while backtracing

5c3b3001

Jun 07, 2018

test: update test-run · acbd91cc

Alexander Turenko authored 6 years ago

* added --verbose to show output of successful TAP13 test (#73)
* allow to call create_cluster(), drop_cluster() multiple times (#83)
* support configurations (*.cfg files) in core = app tests
* added return_listen_uri = <boolean> option for create_cluster()
* save and print at fail tarantool log for core = app tests (#87)

Unverified

acbd91cc

Fix debian package build with recent debhelper · 8925b862

Alexander Turenko authored 6 years ago

It is necessary for build on Ubuntu Bionic.

Debian bugreport:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=881481

Debhelper commit:
https://github.com/Debian/debhelper/commit/740c628a1e571acded7e2aac5d6e7058e61da37f

Unverified

8925b862

Travis CI: add Ubuntu Bionic · ab4416f8
lifemaker authored 6 years ago

Unverified

ab4416f8

Jun 02, 2018
- vclock: clang 6.0 fix · 1f9aa176
  Konstantin Osipov authored 6 years ago
  
  Fix a compiler warning with clang 6
  1f9aa176
Jun 01, 2018

vinyl: fix compaction vs checkpoint race resulting in invalid gc · b25e3168

Vladimir Davydov authored 6 years ago

The callback invoked upon compaction completion uses checkpoint_last()
to determine whether compacted runs may be deleted: if the max LSN
stored in a compacted run (run->dump_lsn) is greater than the LSN of the
last checkpoint (gc_lsn) then the run doesn't belong to the last
checkpoint and hence is safe to delete, see commit 35db70fa ("vinyl:
remove runs not referenced by any checkpoint immediately").

The problem is checkpoint_last() isn't synced with vylog rotation - it
returns the signature of the last successfully created memtx snapshot
and is updated in memtx_engine_commit_checkpoint() after vylog is
rotated. If a compaction task completes after vylog is rotated but
before snap file is renamed, it will assume that compacted runs do not
belong to the last checkpoint, although they do (as they have been
appended to the rotated vylog), and delete them.

To eliminate this race, let's use vylog signature instead of snap
signature in vy_task_compact_complete().

Closes #3437

b25e3168

May 31, 2018

vinyl: fix false-positive assertion at exit · ff02157f

Vladimir Davydov authored 6 years ago

latch_destroy() and fiber_cond_destroy() are basically no-op. All they
do is check that latch/cond is not used. When a global latch or cond
object is destroyed at exit, it may still have users and this is OK as
we don't stop fibers at exit. In vinyl this results in the following
false-positive assertion failures:

  src/latch.h:81: latch_destroy: Assertion `l->owner == NULL' failed.

  src/fiber_cond.c:49: fiber_cond_destroy: Assertion `rlist_empty(&c->waiters)' failed.

Remove "destruction" of vy_log::latch to suppress the first one. Wake up
all fibers waiting on vy_quota::cond before destruction to suppress the
second one. Add some test cases.

Closes #3412

ff02157f

May 29, 2018

Don't raise on box.cfg for instance_uuid and replicaset_uuid · 8ed45cef

Georgy Kirichenko authored 6 years ago

Handle cases if instance_uuid and replicaset_uuid are present in
box.cfg and have same values as already set.

Fixes #3421

8ed45cef

May 25, 2018

test: update replication_connect_timeout in tests to a lower value · e9bf00fc
Konstantin Osipov authored 6 years ago
```
replication: make replication_connect_timeout dynamic
```
e9bf00fc
test: update replication_connect_timeout in tests to a lower value · c2220fe4
Konstantin Osipov authored 6 years ago

c2220fe4

replication: fix log message in case of sync failure · 6c35bf9b

Vladimir Davydov authored 6 years ago

replicaset_sync() returns not only if the instance synchronized to
connected replicas, but also if some replicas have disconnected and
the quorum can't be formed any more. Nevertheless, it always prints
that sync has been completed. Fix it.

See #3422

6c35bf9b

replication: do not stop syncing if replicas are loading · 1785e79c

Vladimir Davydov authored 6 years ago

If a replica disconnects while sync is in progress, box.cfg{} may stop
syncing leaving the instance in 'orphan' mode. This will happen if not
enough replicas are connected to form a quorum. This makes sense e.g. on
network error, but not when a replica is loading, because in the latter
case it should be up and running quite soon. Let's account replicas that
disconnected because they haven't completed initial configuration yet
and continue syncing if connected + loading > quorum.

Closes #3422

1785e79c

replication: use applier_state to check quorum · ca53ab91

Konstantin Belyavskiy authored 6 years ago

Small refactoring: remove 'enum replica_state' since reuse a subset
from applier state machine 'enum replica_state' to check if we have
achieved replication quorum and hence can leave read-only mode.

ca53ab91

replication: change default replication_connect_timeout to 30 seconds · 06a63686
Konstantin Osipov authored 6 years ago
```
The default of 4 seconds is too low to bootstrap a large cluster.
```
06a63686

May 24, 2018

replication: add strict ordering for appliers operating in a full mesh · edd76a2a

Georgy Kirichenko authored 6 years ago

In some cases when an applier processing yielded, other applier might
start some conflicting operation and break replication and database
consistency.
Now applier locks a per-server-id latch before processing a transaction.
This guarantees that there is only one applier request for each server
in progress at each given moment.

The problem was very rare until full mesh topologies in vinyl
became a commonplace.

Fixes gh-3339

edd76a2a

May 22, 2018

replication: a minor refactoring in leader election · 77dfe1b0
Konstantin Osipov authored 6 years ago
```
Avoid goto, a follow up on gh-3257.
```
77dfe1b0

replication: fix bug with read-only replica as a bootstrap leader · 77098294

Konstantin Belyavskiy authored 6 years ago

Another broken case. Adding a new replica to cluster:
+		if (replica->applier->remote_is_ro &&
+		    replica->applier->vclock.signature == 0)
In this case we may got an ER_READONLY, since signature is not 0.
So leader election now has two phases:
 1. To select among read-write replicas.
 2. If no such found, try old algorithm for backward compatibility
    (case then all replicas exist in cluster table).

Closes #3257

77098294

replication: add logging to replication connect/sync · 861803ee
Konstantin Osipov authored 6 years ago

861803ee

May 17, 2018

vinyl: remove runs not referenced by any checkpoint immediately · 35db70fa

Vladimir Davydov authored 6 years ago

If a compacted run was created after the last checkpoint, it is not
needed to recover from any checkpoint and hence can be deleted right
away to save disk space.

Closes #3407

35db70fa

May 15, 2018

test: improve vinyl/select_consistency · 47fe6ced

Vladimir Davydov authored 6 years ago

Improve the test by decreasing range_size so that it creates a lot of
ranges for test indexes, not just one. This helped find bugs causing
the crash described in #3393.

Follow-up #3393

47fe6ced

vinyl: do not panic if secondary index is inconsistent with primary · 1558c538

Vladimir Davydov authored 6 years ago

Although the bug in vy_task_dump_complete() due to which a tuple could
be lost during dump was fixed, there still may be affected deployments
as the bug was persisted on disk. To avoid occasional crashes on such
deployments, let's make vinyl_iterator_secondary_next() skip tuples that
are present in a secondary index but missing in the primary.

Closes #3393

1558c538

vinyl: fix lost key on dump completion · 1f0023ad

Vladimir Davydov authored 6 years ago

vy_task_dump_complete() creates a slice per each range overlapping with
the newly written run. It uses vy_range_tree_psearch(min_key) to find
the first overlapping range and nsearch(max_key) to find the range
immediately following the last overlapping range. This is incorrect as
nsearch rb tree method returns the element matching the search key if it
is present in the tree. That is, if the max key written to a run turns
out to be equal the beginning of a range, the slice won't be created for
it and it will be silently and persistently lost.

The issue manifests itself as crash in vinyl_iterator_secondary_next(),
when we fail to find the tuple in the primary index corresponding to a
statement found in a secondary index.

Part of #3393

1f0023ad

vinyl: fix EQ check in run iterator · 7ee79a0a

Vladimir Davydov authored 6 years ago

vy_run_iterator_seek() is supposed to check that the resulting statement
matches the search key in case of ITER_EQ, but if the search key lies at
the beginning of the slice, it doesn't. As a result, vy_point_lookup()
may fail to find an existing tuple as demonstrated below.

Suppose we are looking for key {10} in the primary index which consists
of an empty mem and two runs:

    run 1: DELETE{15}
    run 2: INSERT{10}

vy_run_iterator_next() returns DELETE{15} for run 1 because of the
missing EQ check and vy_point_lookup() stops at run 1 (since the
terminal statement is found) and mistakenly returns NULL.

The issue manifests itself as crash in vinyl_iterator_secondary_next(),
when we fail to find the tuple in the primary index corresponding to a
statement found in a secondary index.

Part of #3393

7ee79a0a

Add test case for fiber safety of digest.pbkdf2 · ec9ec946
Alexander Turenko authored 6 years ago
```
Follows up #3396.
```
Unverified

ec9ec946

May 14, 2018

Use automatic storage for digest.pbkdf2 results · 06ec3d50

Alexander Turenko authored 6 years ago

It prevents rewriting result by an another thread after coio_call(), but
before lua_pushlstring(). Such case is possible because libeio uses
thread pool internally and static __thread storage can be reused before
lua_pushlstring() if many parallel digest.pbkdf2() calls are on the fly.

Fixes #3396.

Unverified

06ec3d50

May 08, 2018

socket: Fix socket test · 2b973c05

Ilya Markov authored 6 years ago

In sequential launch of app-tap/console.test, tests failed with "User
exists" and binding errors.

Make sockets path relative.
Add users cleanup.

Relates #3168

2b973c05