- Oct 25, 2018
Alexander Turenko authored
Upload tarballs of alpha and beta tarantool versions (*.0 and *.1 branches) into 2x (3x, 4x...) buckets. See more details about the release process in the documentation: [1]. [1]: https://tarantool.io/en/doc/2.0/dev_guide/release_management/
- Sep 20, 2018
Alexander Turenko authored
The problem is that clang does not support the -Wno-cast-function-type flag. It is a regression from 8c538963. Follow-up of #3685. Fixes #3701.
- Sep 15, 2018
Alexander Turenko authored
Fixed a false positive -Wimplicit-fallthrough in http_parser.c by adding a break. The code jumps anyway, so the execution flow is not changed. Fixed a false positive -Wparentheses in reflection.h by removing the parentheses. The argument 'method' of the macro 'type_foreach_method' is just the name of the loop variable and is passed to the macro for readability reasons. Fixed a false positive -Wcast-function-type triggered by reflection.h by adding -Wno-cast-function-type for sources and unit tests. We cast a pointer to a member function to another pointer to member function to store it in a structure, but we cast it back before making a call. It is legal and does not lead to undefined behaviour. Fixes #3685.
- Sep 04, 2018
Vladimir Davydov authored
Now box.cfg() doesn't return until 'quorum' appliers are in sync not only on initial configuration, but also on replication configuration update. If it fails to synchronize within replication_sync_timeout, box.cfg() returns without an error, but the instance enters 'orphan' state, which is basically read-only mode. In the meantime, appliers will keep trying to synchronize in the background, and the instance will leave 'orphan' state as soon as enough appliers are in sync.

Note, this patch also changes logging a bit:
- 'ready to accept request' is printed on startup before syncing with the replica set, because although the instance is read-only at that time, it can indeed accept all sorts of ro requests.
- For 'connecting', 'connected', 'synchronizing' messages, we now use the 'info' logging level, not 'verbose' as before, because those messages are important, they give the admin an idea of what's going on with the instance, and they can't flood logs.
- The 'sync complete' message is also printed as 'info', not 'crit', because there's nothing critical about it (it's not an error).

Also note that we only enter 'orphan' state if we failed to synchronize. In particular, if the instance manages to synchronize with all replicas within the timeout, it will jump from 'loading' straight into 'running', bypassing 'orphan' state. This is done for the sake of consistency between initial configuration and reconfiguration.

Closes #3427

@TarantoolBot document
Title: Sync on replication configuration update
The behavior of box.cfg() on replication configuration update is now consistent with initial configuration: box.cfg() will not return until it synchronizes with as many masters as specified by the replication_connect_quorum configuration option, or until the timeout specified by replication_sync_timeout occurs. On timeout it returns without an error, but the instance enters 'orphan' state. It leaves 'orphan' state as soon as enough appliers have synced.
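For illustration only (the URIs and numeric values below are made up, not part of the patch), the resulting behavior looks roughly like this:

    box.cfg{
        listen = 3301,
        replication = {'replicator:pass@host1:3301', 'replicator:pass@host2:3301'},
        replication_connect_quorum = 2,
        replication_sync_timeout = 60,
    }
    -- If the instance could not sync within replication_sync_timeout,
    -- box.cfg() has still returned; box.info.status stays 'orphan'
    -- (read-only) until enough appliers are in sync, then becomes 'running'.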
Olga Arkhangelskaia authored
In the scope of #3427 we need a timeout in case an instance waits for synchronization for too long, or even forever. The default value is 300. Closes #3674

@locker: moved dynamic config check to box/cfg.test.lua; code cleanup

@TarantoolBot document
Title: Introduce new configuration option replication_sync_timeout
After the initial bootstrap or after a replication configuration change we need to sync up with the replication quorum. Sometimes syncing can take too long, or replication_sync_lag can be smaller than the network latency, in which case the replica would get stuck in a sync loop that can't be cancelled. To avoid such situations replication_sync_timeout can be used. When the time set in replication_sync_timeout has passed, the replica enters orphan state. Can be set dynamically. The default value is 300 seconds.
Olga Arkhangelskaia authored
In #3427 replication_sync_lag should be taken into account during replication reconfiguration. In order to configure replication properly, this parameter is made dynamic and can be changed on demand.

@locker: moved dynamic config check to box/cfg.test.lua

@TarantoolBot document
Title: replication_sync_lag option can be set dynamically
box.cfg.replication_sync_lag can now be set at any time.
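A minimal sketch of using both of these options at runtime; the values are arbitrary examples, not defaults from these patches:

    box.cfg{replication_sync_lag = 0.5}       -- acceptable lag, in seconds
    box.cfg{replication_sync_timeout = 120}   -- stop waiting for sync after two minutes and enter 'orphan' state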
- Aug 30, 2018
Konstantin Belyavskiy authored
There are two different pipes: 'tx' and 'tx_prio'. The latter does not support yield(). Rename it to avoid misunderstanding. Needed for #3397
Vladimir Davydov authored
The new version marks more file descriptors used by test-run internals as CLOEXEC. Needed to make replication/misc test pass (it lowers RLIMIT_NOFILE).
- Aug 29, 2018
Georgy Kirichenko authored
It is an error to throw an error out of a cbus message handler because it breaks cbus message delivery. In the case of replication, a thrown error prevents iproto from closing the replication socket. Closes #3642
Vladimir Davydov authored
So that instances started by test-run don't inherit file descriptors corresponding to logs and sockets of all running instances. Needed for testing #3642
- Aug 24, 2018
Alexander Turenko authored
In C, the recvfrom() function sets the addrlen parameter to zero when called on a TCP socket (at least on Linux). The src_addr parameter can contain garbage in that case, so we should not dereference it. Before this commit socket:recvfrom() could return a 'from' table with only the family field (not sure why, but addr->sa_family often contained the PF_INET value in my case) or return nil, depending on the garbage at the address. Now it always returns nil.
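An illustrative snippet (address, port and size are placeholders) of the fixed behavior on a TCP socket:

    sock = require('socket').tcp_connect('127.0.0.1', 3301)
    data, from = sock:recvfrom(512)
    -- On TCP the kernel does not fill in the peer address for recvfrom(),
    -- so 'from' is now always nil instead of a table built from garbage.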
Serge Petrenko authored
When replication is configured via some user created in a box.once() function and box.once() takes more than replication_timeout seconds to execute, appliers receive an ER_NO_SUCH_USER error, which they don't handle. This leads to occasional test failures in the replication suite. Fix this by handling the aforementioned case in applier_f() and adding a test case. Closes #3637
Mergen Imeev authored
The tuple method 'tomap' in some cases worked improperly if the tuple length is less than what the space format requires. Fixed in this patch. Closes #3631
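A hedged sketch of the fixed case; the space layout and field names are invented for illustration:

    s = box.schema.space.create('test')
    s:format({{name = 'id', type = 'unsigned'},
              {name = 'data', type = 'string', is_nullable = true}})
    s:create_index('pk')
    t = s:insert{1}   -- tuple is shorter than the space format
    t:tomap()         -- no longer misbehaves when fields from the format are missing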
- Aug 21, 2018
Konstantin Belyavskiy authored
During startup tarantoolctl ignores the 'pid_file' option and sets it to a default value. This causes a fault if the user tries to execute a config with the option set. When the instance is started with tarantoolctl, shadow this option with an additional wrapper around box.cfg. Closes #3214
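A sketch of an instance file for tarantoolctl (the paths are illustrative) that used to trigger the fault:

    -- /etc/tarantool/instances.available/example.lua
    box.cfg{
        listen = 3301,
        pid_file = '/var/run/tarantool/example.pid',  -- setting this no longer causes a fault
                                                       -- when the instance is started via tarantoolctl
    }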
Konstantin Belyavskiy authored
Fix the build under FreeBSD: Undefined symbol "iconv_open". Add a compile-time build check with FindICONV.cmake and a wrapper include file to import the relevant symbol names. Closes #3441
Serge Petrenko authored
lua_pushapplier() had an inexplicably small buffer for the URI representation. Enlarged the buffer. Also, lua_pushapplier() didn't take into account that uri_format() could return a value larger than the buffer size. Fixed. Closes #3630
- Aug 14, 2018
Serge Petrenko authored
On bootstrap and after initial configuration replication_connect_quorum was ignored. The instance tried to connect to every replica listed in the replication parameter, and failed if it wasn't possible. The patch alters this behaviour. An instance still tries to connect to every node listed in box.cfg.replication, but does not raise an error if it was able to connect to at least replication_connect_quorum instances. Closes #3428

@TarantoolBot document
Title: replication_connect_quorum is not ignored
Now on replica set bootstrap and in case of replication reconfiguration (e.g. calling box.cfg{replication=...} for the second time) tarantool doesn't fail if it couldn't connect to every replica, but could connect to replication_connect_quorum replicas. If after replication_connect_timeout seconds the instance is not connected to at least replication_connect_quorum other instances, we throw an error.
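A hedged configuration sketch of the new quorum semantics; the URIs and numbers are placeholders:

    box.cfg{
        listen = 3301,
        replication = {'host1:3301', 'host2:3301', 'host3:3301'},
        replication_connect_quorum = 2,    -- connecting to any two of the three is now enough
        replication_connect_timeout = 30,  -- error out only if the quorum is not reached in time
    }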
Serge Petrenko authored
Add start arguments to replication test instances to control replication_timeout and replication_connect_timeout settings between restarts. Needed for #3428
Serge Petrenko authored
Allow passing arguments to servers started with create_cluster().
- Aug 13, 2018
Olga Arkhangelskaia authored
During syslog reconnect we lose the nonblock flag. This leads to misbehavior while logging: Tarantool hangs forever. Closes #3615
- Aug 11, 2018
Vladimir Davydov authored
Reproduce file:
- [box/access.test.lua, null]
- [box/iterator.test.lua, null]
- [box/bitset.test.lua, null]

The issue happens because box/bitset.lua:dump() uses iterate(), which gets cleared by the box/iterator test. Fix this by using utils.iterate() instead.
- Aug 10, 2018
Vladimir Davydov authored
index.update() looks up the old tuple in the primary index, applies update operations to it, then writes a DELETE statement to secondary indexes to delete the old tuple and a REPLACE statement to all indexes to insert the new tuple. It also sets a column mask for both DELETE and REPLACE statements. The column mask is a bit mask which has a bit set if the corresponding field is updated by update operations. It is used by the write iterator for two purposes. First, the write iterator skips REPLACE statements that don't update key fields. Second, the write iterator turns a REPLACE that has a column mask that intersects with key fields into an INSERT (so that it can get annihilated with a DELETE when the time comes). The latter is correct, because if an update() does update secondary key fields, then it must have deleted the old tuple and hence the new tuple is unique in terms of the extended key (merged primary and secondary key parts, i.e. cmp_def).

The problem is that a bit may be set in a column mask even if the corresponding field does not actually get updated. Consider the following example:

    s = box.schema.space.create('test', {engine = 'vinyl'})
    s:create_index('pk')
    s:create_index('sk', {parts = {2, 'unsigned'}})
    s:insert{1, 10}
    box.snapshot()
    s:update(1, {{'=', 2, 10}})

The update() doesn't modify the secondary key field, so it only writes REPLACE{1, 10} to the secondary index (actually it writes DELETE{1, 10} too, but it gets overwritten by the REPLACE). However, the REPLACE has a column mask that says that update() does modify the key field, because the column mask is generated solely from update operations, before applying them. As a result, the write iterator will not skip this REPLACE on dump. This won't have any serious consequences, because this is a mere optimization. What is worse, the write iterator will also turn the REPLACE into an INSERT, which is absolutely wrong, as the REPLACE is preceded by INSERT{1, 10}. If the tuple gets deleted, the DELETE statement and the INSERT created by the write iterator from the REPLACE will get annihilated, leaving the old INSERT{1, 10} visible. The issue may result in invalid select() output as demonstrated in the issue description. It may also result in crashes, because the tuple cache is very sensitive to invalid select() output.

To fix this issue, let's clear key bits in the column mask if we detect that an update() doesn't actually update secondary key fields although the column mask says it does.

Closes #3607
- Aug 08, 2018
Mergen Imeev authored
In some cases the box.snapshot() operation takes longer than expected. This leads to situations when the previous error is reported instead of the new one. Now these errors are completely separated. Closes #3599
- Aug 07, 2018
Kirill Yukhin authored
Print reproduce file.
Sergei Voronezhskii authored
The -j -1 option used to enable the legacy consistent mode. Reducing the number of jobs to one by switching to -j 1 uses the same part of the code as the parallel mode, and the code in parallel mode kills hung tests. Part of https://github.com/tarantool/test-run/issues/106
Vladimir Davydov authored
It is dangerous to call box.cfg() concurrently from different fibers. For example, replication configuration uses static variables and yields so calling it concurrently can result in a crash. To make sure it never happens, let's protect box.cfg() with a lock. Closes #3606
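An illustrative sketch (the replication URIs are placeholders) of the situation the lock now serializes:

    fiber = require('fiber')
    fiber.create(function() box.cfg{replication = {'host1:3301'}} end)
    fiber.create(function() box.cfg{replication = {'host2:3301'}} end)
    -- The two reconfigurations are now serialized by the box.cfg() lock
    -- instead of racing with each other.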
- Aug 03, 2018
Alexander Turenko authored
Fixes #3489.
- Aug 02, 2018
Alexander Turenko authored
* support expected fail of non-default server
* fix format_process function: prevent crash on some machines (#92)
* print whole reject file when a test failed (#102)
Eugine Blikh authored
No more `include/yaml.h` and `lib/libyaml_static.a` installs. Closes gh-3547
- Jul 26, 2018
Konstantin Belyavskiy authored
Fix 'fio.rmtree' to remove non-empty directories. Also update the test. Closes #3258
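A minimal usage sketch; the path is arbitrary:

    fio = require('fio')
    fio.mktree('/tmp/example/subdir')   -- a directory that is not empty
    fio.rmtree('/tmp/example')          -- now removes it recursively instead of failing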
- Jul 22, 2018
Vladimir Davydov authored
When a replica is removed from the cluster table, the corresponding replica struct isn't destroyed unless both the relay and the applier attached to it are stopped, see replica_clear_id(). Since replica struct is a holder of the garbage collection state, this means that in case an evicted replica has an applier or a relay that fails to exit for some reason, garbage collection will hang.

A relay thread stops as soon as the replica it was started for receives a row that tries to delete it from the cluster table (because this isn't allowed by the cluster space trigger, see on_replace_dd_cluster()). If a replica isn't running, the corresponding relay can't run as well, because writing to a closed socket isn't allowed. That said, a relay can't block garbage collection. An applier, however, is deleted only when replication is reconfigured. So if a replica that was evicted from the cluster was configured as a master, its replica struct will hang around blocking garbage collection for as long as the replica remains in box.cfg.replication. This is what happens in #3546.

Fix this issue by forcefully unregistering a replica with the garbage collector when it is deleted from the cluster table. This is OK as it won't be able to resubscribe and so we don't need to keep WALs for it any longer. Note, the relay thread may still be running when a replica is deleted from the cluster table, in which case we can't unregister it with the garbage collector right away, because the relay may need to access the garbage collection state. In such a case, leave the job to replica_clear_relay, which is called as soon as the relay thread exits.

Closes #3546
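For illustration (the replica id below is hypothetical), evicting a replica on the master looks like this; its garbage collection state is now released even if the evicted instance is still listed in box.cfg.replication:

    box.space._cluster:delete{2}   -- drop the evicted replica's record;
                                   -- old WALs it pinned can now be collected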
- Jul 19, 2018
Kirill Shcherbatov authored
The _say function was called with invalid arguments. Thanks to @sorc1 for the patch. Closes #3433.
Olga Arkhangelskaia authored
strdup() may silently fail without any message from tarantool. This patch adds the missing checks.
- Jul 17, 2018
Kirill Shcherbatov authored
net.box didn't pass options containing an iterator to the server side. There were also invalid results for two :count tests in the net.box.result file. Thanks to @ademenev for reporting the problem and helping to locate it. Closes #3262.
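A hedged example (connection URI, space and index names are made up) of a request whose iterator option now reaches the server:

    conn = require('net.box').connect('localhost:3301')
    conn.space.test.index.pk:count({1}, {iterator = 'GE'})
    -- The iterator option is now forwarded to the server, so this counts
    -- all tuples with key >= {1} rather than only exact matches.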
- Jul 16, 2018
Georgy Kirichenko authored
If a fiber pool reuses an already canceled fiber, the fiber reports an error for any subsequent request. Now a canceled fiber returns and the fiber pool creates a new one. Fixes #3527
- Jul 13, 2018
Kirill Yukhin authored
A new commit in third_party/libyaml downgrades the required cmake version.
Ivan Kosenko authored
- Jul 12, 2018
Kirill Shcherbatov authored
Need to update tests to match a fixup in the upstream commit:

commit baf636a74b4b6d055d93e2d01366d6097eb82d90
Author: Tina Müller <cpan2@tinita.de>
Date:   Thu Jun 14 19:27:04 2018 +0200

    The closing single quote needs to be indented... if it's on its own line.

Closes #3275.