- Oct 25, 2018
Alexander Turenko authored
Upload tarballs of alpha and beta tarantool versions (*.0 and *.1 branches) into 2x (3x, 4x...) buckets. See more details about the release process in the documentation: [1]. [1]: https://tarantool.io/en/doc/2.0/dev_guide/release_management/
- Sep 20, 2018
Alexander Turenko authored
The problem is that clang does not support the -Wno-cast-function-type flag. It is a regression from 8c538963. Follow-up of #3685. Fixes #3701.
- Sep 15, 2018
Alexander Turenko authored
Fixed a false positive -Wimplicit-fallthrough in http_parser.c by adding a break. The code jumps anyway, so the execution flow is not changed. Fixed a false positive -Wparentheses in reflection.h by removing the parentheses. The argument 'method' of the macro 'type_foreach_method' is just the name of the loop variable and is passed to the macro for readability reasons. Fixed a false positive -Wcast-function-type triggered by reflection.h by adding -Wno-cast-function-type for sources and unit tests. We cast a pointer to a member function to another pointer to member function to store it in a structure, but we cast it back before making a call. It is legal and does not lead to undefined behaviour. Fixes #3685.
- Sep 04, 2018
Vladimir Davydov authored
Now box.cfg() doesn't return until 'quorum' appliers are in sync not only on initial configuration, but also on replication configuration update. If it fails to synchronize within replication_sync_timeout, box.cfg() returns without an error, but the instance enters 'orphan' state, which is basically read-only mode. In the meantime, appliers will keep trying to synchronize in the background, and the instance will leave 'orphan' state as soon as enough appliers are in sync.

Note, this patch also changes logging a bit:
- 'ready to accept request' is printed on startup before syncing with the replica set, because although the instance is read-only at that time, it can indeed accept all sorts of ro requests.
- For 'connecting', 'connected', 'synchronizing' messages, we now use the 'info' logging level, not 'verbose' as before, because those messages are important, they give the admin an idea of what's going on with the instance, and they can't flood logs.
- The 'sync complete' message is also printed as 'info', not 'crit', because there's nothing critical about it (it's not an error).

Also note that we only enter 'orphan' state if we failed to synchronize. In particular, if the instance manages to synchronize with all replicas within the timeout, it will jump from 'loading' straight into 'running', bypassing 'orphan' state. This is done for the sake of consistency between initial configuration and reconfiguration.

Closes #3427

@TarantoolBot document
Title: Sync on replication configuration update
The behavior of box.cfg() on replication configuration update is now consistent with initial configuration: box.cfg() will not return until it synchronizes with as many masters as specified by the replication_connect_quorum configuration option, or until the timeout specified by replication_sync_timeout occurs. On timeout it returns without an error, but the instance enters 'orphan' state. It leaves 'orphan' state as soon as enough appliers have synced.
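For illustration only (the URIs and numeric values below are made up, not part of the patch), the resulting behavior looks roughly like this:

    box.cfg{
        listen = 3301,
        replication = {'replicator:pass@host1:3301', 'replicator:pass@host2:3301'},
        replication_connect_quorum = 2,
        replication_sync_timeout = 60,
    }
    -- If the instance could not sync within replication_sync_timeout,
    -- box.cfg() has still returned; box.info.status stays 'orphan'
    -- (read-only) until enough appliers are in sync, then becomes 'running'.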
Olga Arkhangelskaia authored
In the scope of #3427 we need a timeout in case an instance waits for synchronization for too long, or even forever. The default value is 300. Closes #3674

@locker: moved dynamic config check to box/cfg.test.lua; code cleanup

@TarantoolBot document
Title: Introduce new configuration option replication_sync_timeout
After the initial bootstrap or after a replication configuration change we need to sync up with the replication quorum. Sometimes syncing can take too long, or replication_sync_lag can be smaller than the network latency, in which case the replica would get stuck in a sync loop that can't be cancelled. To avoid such situations replication_sync_timeout can be used. When the time set in replication_sync_timeout has passed, the replica enters orphan state. Can be set dynamically. The default value is 300 seconds.
Olga Arkhangelskaia authored
In #3427 replication_sync_lag should be taken into account during replication reconfiguration. In order to configure replication properly, this parameter is made dynamic and can be changed on demand.

@locker: moved dynamic config check to box/cfg.test.lua

@TarantoolBot document
Title: replication_sync_lag option can be set dynamically
box.cfg.replication_sync_lag can now be set at any time.
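A minimal sketch of using both of these options at runtime; the values are arbitrary examples, not defaults from these patches:

    box.cfg{replication_sync_lag = 0.5}       -- acceptable lag, in seconds
    box.cfg{replication_sync_timeout = 120}   -- stop waiting for sync after two minutes and enter 'orphan' state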
- Aug 30, 2018
Konstantin Belyavskiy authored
There are two different pipes: 'tx' and 'tx_prio'. The latter does not support yield(). Rename it to avoid misunderstanding. Needed for #3397
Vladimir Davydov authored
The new version marks more file descriptors used by test-run internals as CLOEXEC. Needed to make replication/misc test pass (it lowers RLIMIT_NOFILE).
- Aug 29, 2018
Georgy Kirichenko authored
It is an error to throw an error out of a cbus message handler because it breaks cbus message delivery. In the case of replication, a thrown error prevents iproto from closing the replication socket. Closes #3642
Vladimir Davydov authored
So that instances started by test-run don't inherit file descriptors corresponding to logs and sockets of all running instances. Needed for testing #3642
- Aug 24, 2018
Alexander Turenko authored
In C, the recvfrom() function sets the addrlen parameter to zero when called on a TCP socket (at least on Linux). The src_addr parameter can contain garbage in that case, so we should not dereference it. Before this commit socket:recvfrom() could return a 'from' table with only the family field (not sure why, but addr->sa_family often contained the PF_INET value in my case) or return nil, depending on the garbage at the address. Now it always returns nil.
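An illustrative snippet (address, port and size are placeholders) of the fixed behavior on a TCP socket:

    sock = require('socket').tcp_connect('127.0.0.1', 3301)
    data, from = sock:recvfrom(512)
    -- On TCP the kernel does not fill in the peer address for recvfrom(),
    -- so 'from' is now always nil instead of a table built from garbage.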
Serge Petrenko authored
When replication is configured via some user created in a box.once() function and box.once() takes more than replication_timeout seconds to execute, appliers receive an ER_NO_SUCH_USER error, which they don't handle. This leads to occasional test failures in the replication suite. Fix this by handling the aforementioned case in applier_f() and adding a test case. Closes #3637
Mergen Imeev authored
The tuple method 'tomap' in some cases worked improperly if the tuple length is less than what the space format requires. Fixed in this patch. Closes #3631
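A hedged sketch of the fixed case; the space layout and field names are invented for illustration:

    s = box.schema.space.create('test')
    s:format({{name = 'id', type = 'unsigned'},
              {name = 'data', type = 'string', is_nullable = true}})
    s:create_index('pk')
    t = s:insert{1}   -- tuple is shorter than the space format
    t:tomap()         -- no longer misbehaves when fields from the format are missing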
- Aug 21, 2018
Konstantin Belyavskiy authored
During startup tarantoolctl ignores the 'pid_file' option and sets it to a default value. This causes a fault if the user tries to execute a config with the option set. When the instance is started with tarantoolctl, shadow this option with an additional wrapper around box.cfg. Closes #3214
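A sketch of an instance file for tarantoolctl (the paths are illustrative) that used to trigger the fault:

    -- /etc/tarantool/instances.available/example.lua
    box.cfg{
        listen = 3301,
        pid_file = '/var/run/tarantool/example.pid',  -- setting this no longer causes a fault
                                                       -- when the instance is started via tarantoolctl
    }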
Konstantin Belyavskiy authored
Fix the build under FreeBSD: Undefined symbol "iconv_open". Add a compile-time build check with FindICONV.cmake and a wrapper include file to import the relevant symbol names. Closes #3441
Serge Petrenko authored
lua_pushapplier() had an inexplicably small buffer for the URI representation. Enlarged the buffer. Also, lua_pushapplier() didn't take into account that uri_format() could return a value larger than the buffer size. Fixed. Closes #3630
- Aug 14, 2018
Serge Petrenko authored
On bootstrap and after initial configuration replication_connect_quorum was ignored. The instance tried to connect to every replica listed in the replication parameter, and failed if it wasn't possible. The patch alters this behaviour. An instance still tries to connect to every node listed in box.cfg.replication, but does not raise an error if it was able to connect to at least replication_connect_quorum instances. Closes #3428

@TarantoolBot document
Title: replication_connect_quorum is not ignored
Now on replica set bootstrap and in case of replication reconfiguration (e.g. calling box.cfg{replication=...} for the second time) tarantool doesn't fail if it couldn't connect to every replica, but could connect to replication_connect_quorum replicas. If after replication_connect_timeout seconds the instance is not connected to at least replication_connect_quorum other instances, we throw an error.
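A hedged configuration sketch of the new quorum semantics; the URIs and numbers are placeholders:

    box.cfg{
        listen = 3301,
        replication = {'host1:3301', 'host2:3301', 'host3:3301'},
        replication_connect_quorum = 2,    -- connecting to any two of the three is now enough
        replication_connect_timeout = 30,  -- error out only if the quorum is not reached in time
    }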
Serge Petrenko authored
Add start arguments to replication test instances to control replication_timeout and replication_connect_timeout settings between restarts. Needed for #3428
Serge Petrenko authored
Allow passing arguments to servers started with create_cluster().
- Aug 13, 2018
Olga Arkhangelskaia authored
During syslog reconnect we lose the nonblock flag. This leads to misbehavior while logging: Tarantool hangs forever. Closes #3615
- Aug 11, 2018
Vladimir Davydov authored
Reproduce file:
- [box/access.test.lua, null]
- [box/iterator.test.lua, null]
- [box/bitset.test.lua, null]

The issue happens because box/bitset.lua:dump() uses iterate(), which gets cleared by the box/iterator test. Fix this by using utils.iterate() instead.
- Aug 10, 2018
Vladimir Davydov authored
index.update() looks up the old tuple in the primary index, applies update operations to it, then writes a DELETE statement to secondary indexes to delete the old tuple and a REPLACE statement to all indexes to insert the new tuple. It also sets a column mask for both DELETE and REPLACE statements. The column mask is a bit mask which has a bit set if the corresponding field is updated by update operations. It is used by the write iterator for two purposes. First, the write iterator skips REPLACE statements that don't update key fields. Second, the write iterator turns a REPLACE that has a column mask that intersects with key fields into an INSERT (so that it can get annihilated with a DELETE when the time comes). The latter is correct, because if an update() does update secondary key fields, then it must have deleted the old tuple and hence the new tuple is unique in terms of the extended key (merged primary and secondary key parts, i.e. cmp_def).

The problem is that a bit may be set in a column mask even if the corresponding field does not actually get updated. Consider the following example:

    s = box.schema.space.create('test', {engine = 'vinyl'})
    s:create_index('pk')
    s:create_index('sk', {parts = {2, 'unsigned'}})
    s:insert{1, 10}
    box.snapshot()
    s:update(1, {{'=', 2, 10}})

The update() doesn't modify the secondary key field, so it only writes REPLACE{1, 10} to the secondary index (actually it writes DELETE{1, 10} too, but it gets overwritten by the REPLACE). However, the REPLACE has a column mask that says that update() does modify the key field, because the column mask is generated solely from update operations, before applying them. As a result, the write iterator will not skip this REPLACE on dump. This won't have any serious consequences, because this is a mere optimization. What is worse, the write iterator will also turn the REPLACE into an INSERT, which is absolutely wrong, as the REPLACE is preceded by INSERT{1, 10}. If the tuple gets deleted, the DELETE statement and the INSERT created by the write iterator from the REPLACE will get annihilated, leaving the old INSERT{1, 10} visible. The issue may result in invalid select() output as demonstrated in the issue description. It may also result in crashes, because the tuple cache is very sensitive to invalid select() output.

To fix this issue, let's clear key bits in the column mask if we detect that an update() doesn't actually update secondary key fields although the column mask says it does.

Closes #3607
- Aug 08, 2018
Mergen Imeev authored
In some cases the box.snapshot() operation takes longer than expected. This leads to situations when the previous error is reported instead of the new one. Now these errors are completely separated. Closes #3599
- Aug 07, 2018
Kirill Yukhin authored
Print reproduce file.
Sergei Voronezhskii authored
The -j -1 option used to enable the legacy consistent mode. Reducing the number of jobs to one by switching to -j 1 uses the same part of the code as the parallel mode, and the code in parallel mode kills hung tests. Part of https://github.com/tarantool/test-run/issues/106
Vladimir Davydov authored
It is dangerous to call box.cfg() concurrently from different fibers. For example, replication configuration uses static variables and yields so calling it concurrently can result in a crash. To make sure it never happens, let's protect box.cfg() with a lock. Closes #3606
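An illustrative sketch (the replication URIs are placeholders) of the situation the lock now serializes:

    fiber = require('fiber')
    fiber.create(function() box.cfg{replication = {'host1:3301'}} end)
    fiber.create(function() box.cfg{replication = {'host2:3301'}} end)
    -- The two reconfigurations are now serialized by the box.cfg() lock
    -- instead of racing with each other.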
- Aug 03, 2018
Alexander Turenko authored
Fixes #3489.
- Aug 02, 2018
Alexander Turenko authored
* support expected fail of non-default server
* fix format_process function: prevent crash on some machines (#92)
* print whole reject file when a test failed (#102)
Eugine Blikh authored
No more `include/yaml.h` and `lib/libyaml_static.a` installs. Closes gh-3547
- Jul 26, 2018
Konstantin Belyavskiy authored
Fix 'fio.rmtree' to remove non-empty directories. Also update the test. Closes #3258
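A minimal usage sketch; the path is arbitrary:

    fio = require('fio')
    fio.mktree('/tmp/example/subdir')   -- a directory that is not empty
    fio.rmtree('/tmp/example')          -- now removes it recursively instead of failing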
- Jul 22, 2018
Vladimir Davydov authored
When a replica is removed from the cluster table, the corresponding replica struct isn't destroyed unless both the relay and the applier attached to it are stopped, see replica_clear_id(). Since replica struct is a holder of the garbage collection state, this means that in case an evicted replica has an applier or a relay that fails to exit for some reason, garbage collection will hang.

A relay thread stops as soon as the replica it was started for receives a row that tries to delete it from the cluster table (because this isn't allowed by the cluster space trigger, see on_replace_dd_cluster()). If a replica isn't running, the corresponding relay can't run as well, because writing to a closed socket isn't allowed. That said, a relay can't block garbage collection. An applier, however, is deleted only when replication is reconfigured. So if a replica that was evicted from the cluster was configured as a master, its replica struct will hang around blocking garbage collection for as long as the replica remains in box.cfg.replication. This is what happens in #3546.

Fix this issue by forcefully unregistering a replica with the garbage collector when it is deleted from the cluster table. This is OK as it won't be able to resubscribe and so we don't need to keep WALs for it any longer. Note, the relay thread may still be running when a replica is deleted from the cluster table, in which case we can't unregister it with the garbage collector right away, because the relay may need to access the garbage collection state. In such a case, leave the job to replica_clear_relay, which is called as soon as the relay thread exits.

Closes #3546
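For illustration (the replica id below is hypothetical), evicting a replica on the master looks like this; its garbage collection state is now released even if the evicted instance is still listed in box.cfg.replication:

    box.space._cluster:delete{2}   -- drop the evicted replica's record;
                                   -- old WALs it pinned can now be collected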
- Jul 19, 2018
Kirill Shcherbatov authored
The _say function was called with invalid arguments. Thanks to @sorc1 for the patch. Closes #3433.
Olga Arkhangelskaia authored
strdup() may silently fail without any message from tarantool. This patch adds the missing checks.
- Jul 17, 2018
Kirill Shcherbatov authored
net.box didn't pass options containing an iterator to the server side. There were also invalid results for two :count tests in the net.box.result file. Thanks to @ademenev for reporting the problem and helping to locate it. Closes #3262.
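A hedged example (connection URI, space and index names are made up) of a request whose iterator option now reaches the server:

    conn = require('net.box').connect('localhost:3301')
    conn.space.test.index.pk:count({1}, {iterator = 'GE'})
    -- The iterator option is now forwarded to the server, so this counts
    -- all tuples with key >= {1} rather than only exact matches.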
- Jul 16, 2018
Georgy Kirichenko authored
If a fiber pool reuses an already canceled fiber, the fiber reports an error for any subsequent request. Now a canceled fiber returns and the fiber pool creates a new one. Fixes #3527
- Jul 13, 2018
Kirill Yukhin authored
A new commit in third_party/libyaml downgrades the required cmake version.
Ivan Kosenko authored
- Jul 12, 2018
Kirill Shcherbatov authored
Need to update tests to match a fixup in the upstream commit:

commit baf636a74b4b6d055d93e2d01366d6097eb82d90
Author: Tina Müller <cpan2@tinita.de>
Date:   Thu Jun 14 19:27:04 2018 +0200

    The closing single quote needs to be indented... if it's on its own line.

Closes #3275.