- Feb 06, 2018
-
-
Vladimir Davydov authored
When a tarantool instance starts for the first time (the local directory is empty), it chooses the peer with the lowest UUID as the bootstrap master. As a result, one cannot reliably rebootstrap a cluster node (delete all local files and restart): if the node happens to have the lowest UUID in the cluster after restart, it will assume that it is the leader of a new cluster and bootstrap locally, splitting the cluster in two. To fix this problem, let's always give preference to peers with a higher vclock when choosing a bootstrap master and only fall back on selection by UUID if two or more peers have the same vclock. To achieve that, we need to introduce a new iproto request type for fetching the current vclock of a tarantool instance (we cannot squeeze the vclock in the greeting, because the latter is already packed). The new request type is called IPROTO_REQUEST_VOTE so that in the future it can be reused for a more sophisticated leader election algorithm. It has no body and does not require authentication. In reply to such a request, a tarantool instance will send IPROTO_OK and its current vclock. If the version of the master is >= 1.7.7, an applier will send IPROTO_REQUEST_VOTE to fetch the master's vclock before trying to authenticate. The vclock will then be used to determine the node to bootstrap from. Closes #3108
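The selection rule described above can be sketched as follows. This is a minimal illustration, not the code from the patch: the `Peer` class and both function names are hypothetical, and vclocks are ranked by the sum of their LSNs, which is a simplifying assumption (real vclocks are only partially ordered).

```python
from dataclasses import dataclass, field
from uuid import UUID


@dataclass
class Peer:
    """Hypothetical view of a peer: its instance UUID and reported vclock."""
    uuid: UUID
    # Maps replica id -> LSN; an empty dict models a freshly wiped node.
    vclock: dict = field(default_factory=dict)


def vclock_weight(vclock):
    # Assumption: rank vclocks by the sum of their LSNs. This is a
    # simplification chosen for the sketch, not the comparison in the patch.
    return sum(vclock.values())


def choose_bootstrap_master(peers):
    # Prefer the peer with the highest vclock; on a tie, fall back to the
    # lowest UUID, which was the only criterion before the change.
    return min(peers, key=lambda p: (-vclock_weight(p.vclock), p.uuid.int))
```

With this rule a rebootstrapped node that happens to have the lowest UUID is no longer picked, because its empty vclock ranks below the vclocks of the live peers.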
-
Vladimir Davydov authored
No functional changes, just a trivial cleanup: - Move all C functions inside extern "C" section. - Rename xrow_decode_join to xrow_decode_join_xc. - Make XXX_xc wrappers around XXX functions.
-
- Feb 05, 2018
-
-
Vladimir Davydov authored
Before commit 2788dc1b ("Add APPLIER_READY state") we only printed the 'authenticated' message to the log in case credentials were set in the replication URI. The commit changed that: now we print the message even in case of guest connections, when the applier does not send the AUTH command to the master at all. As a result, if guest connections are not permitted by the master, the applier will keep printing 'authenticated' after every unsuccessful attempt to subscribe. This is misleading. Let us revert to the behavior we had before commit 2788dc1b. Closes #3113
-
- Feb 02, 2018
-
-
Konstantin Nazarov authored
As there is now support for Alpine Linux in packpack, there is no longer any need for a custom Dockerfile builder.
-
Konstantin Nazarov authored
This patch is to get in line with the Alpine support in packpack: - don't rely on git, and use a source package instead - add subpackages with debug symbols, documentation and headers - don't build tarantool 3 times in a row
-
Vladimir Davydov authored
If one node of a cluster is rebootstrapped (i.e. restarted from an empty directory with the same configuration), other replicas will never try to reconnect to it - the appliers will simply stop with the ER_REPLICASET_UUID_MISMATCH error. The only way to fix this is to reconfigure replication on all other nodes. Let's fix this problem by reassigning an applier to a new replica in case its UUID mismatches the UUID of the replica it is currently assigned to. Cannot write a test, because rebootstrap is unreliable - see #3108. Closes #3112
-
Vladimir Davydov authored
If the master closes its end of the socket when there are still unread rows available for the replica to apply, we will get tons of EPIPE error messages on the replica's side, emitted every time it attempts to send an ACK back to the master (i.e. one per row left in the socket): main/107/applierw/ sio.cc:303 !> SystemError writev(2), called on fd 12, aka 127.0.0.1:50852: Broken pipe To avoid that, let's make the applier writer fiber (the one that sends ACKs) exit immediately if it receives an EPIPE error while trying to send an ACK. Closes #2945
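The idea of stopping at the first EPIPE can be sketched like this. It is only a model of the behavior: `send_acks` and the `sock_send` callable are hypothetical names, and the real writer is a fiber in the applier, not a Python loop.

```python
import errno


def send_acks(sock_send, acks):
    """Send ACKs one by one and stop at the first EPIPE.

    `sock_send` is a hypothetical callable that writes one ACK to the master.
    Returns the number of ACKs that were sent successfully.
    """
    sent = 0
    for ack in acks:
        try:
            sock_send(ack)
        except OSError as e:
            if e.errno == errno.EPIPE:
                # The master closed its end of the socket: every further
                # attempt would fail the same way, so exit immediately.
                break
            raise  # any other error is not handled by this sketch
        sent += 1
    return sent
```

Without the `break`, each remaining row would trigger one more failed write and one more "Broken pipe" line in the log.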
-
Konstantin Osipov authored
-
Konstantin Belyavskiy authored
* Fix force_recovery behaviour on empty xlog files and ones with corrupted header. * Add a test * Update xlog-py/empty.test.py, since corrupted xlog no longer leads to a broken startup. Closes #3026, #3076
-
Konstantin Osipov authored
For backward compatibility, automatically grant CREATE, DROP ACL to all users who have READ and WRITE access. Our automatic upgrade script automatically grants CREATE and ALTER to users with READ/WRITE access on universe, but this is insufficient, since new users could be created after upgrade. Follow up on gh-945 and gh-3089.
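The backward-compatibility rule amounts to a small bitmask check. In the sketch below the bit values and the function name `effective_access` are assumptions made for illustration only; they are not taken from the patch.

```python
# Hypothetical privilege bits, chosen only for this sketch.
READ = 1
WRITE = 2
CREATE = 32
DROP = 64


def effective_access(granted):
    """Return the access mask after the compatibility rule is applied."""
    if (granted & (READ | WRITE)) == (READ | WRITE):
        # A user with both READ and WRITE implicitly gets CREATE and DROP.
        granted |= CREATE | DROP
    return granted
```

Applying the rule when access is evaluated, rather than once in the upgrade script, also covers users created after the upgrade.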
-
IlyaMarkovMipt authored
* Add privileges Create, Drop, Alter on universe support. * Fix super role behavior, allowing users with this role to drop any objects. Relates #945 Closes #3089
-
IlyaMarkovMipt authored
* Allow calling file:read without the len parameter. In this case, the whole file is read. Closes #2925
-
Vladimir Davydov authored
11 was initially used for SQL EXECUTE in 1.8, but 1.7 commit b73030f2 ("iproto: add IPROTO_NOP request type") reassigned it to NOP so after the merge SQL EXECUTE landed at 12, which broke connectors. Let's shift NOP to 12 and move EXECUTE back to 11. This is OK as 1.7.7 which introduced the new iproto type hasn't been officially released yet.
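The resulting assignment of the two codes can be written down as a small enumeration. The class name `RequestType` is hypothetical, and only the two codes mentioned in this message are listed.

```python
from enum import IntEnum


class RequestType(IntEnum):
    # Only the two request codes discussed in this commit are shown.
    EXECUTE = 11  # SQL EXECUTE, moved back to its original 1.8 code
    NOP = 12      # NOP, shifted from 11 to 12
```

A connector that maps numeric codes to names then resolves `RequestType(11)` to `EXECUTE` again, as it did before the merge.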
-
- Feb 01, 2018
-
-
Kirill Yukhin authored
The fio.is_mount() routine does not work properly on Docker: Docker uses a non-transparent incremental filesystem, so each new file gets a new device ID, which in turn makes fio.is_mount() conclude that the file's parent is a mount point when it is not. Remove the routine and the corresponding test entries.
-
Vladimir Davydov authored
- Start a long call, which runs forever - Close the connection - Stop the fiber running the long call - Check that the connection does not leak, box.session.on_disconnect trigger is called once the fiber has been stopped Suggested by @kostja Follow-up #946
-
Konstantin Osipov authored
-
Konstantin Osipov authored
-
Roman Tsisyk authored
-
Roman Tsisyk authored
Try to fix coverage.
-
- Jan 31, 2018
-
-
Vladimir Davydov authored
This patch modifies the replication configuration procedure so as to fully conform to the specification presented in #2958. In a nutshell, now box.cfg() tries to synchronize all connected replicas before returning. If it fails to connect enough replicas to form a quorum, it leaves the server in a degraded 'orphan' mode, which is basically read-only. More details below.

First of all, it's worth mentioning that we already have 'orphan' status in Tarantool (between 'loading' and 'hot_standby'), but it has nothing to do with replication. Actually, it's unclear why it was introduced in the first place so we agreed to silently drop it.

We assume that a replica is synchronized if its lag is not greater than the value of new configuration option box.cfg.replication_sync_lag. Otherwise a replica is considered to be syncing and has "sync" status. If replication_sync_lag is unset (nil) or set to TIMEOUT_INFINITY, then a replica skips the "sync" state and switches to "follow" immediately. The default value of replication_sync_lag is 10 seconds, but it is ignored (assumed to be inf) in case the master is running tarantool older than 1.7.7, which does not send heartbeat messages.

If box.cfg() is called for the very first time (bootstrap) for a given instance, then
1. It tries to connect to all configured replicas for as long as it takes (replication_timeout isn't taken into account). If it fails to connect to at least one replica, bootstrap is aborted.
2. If this is a cluster bootstrap and the current instance turns out to be the new cluster leader, then it performs local bootstrap and switches to 'running' state and leaves box.cfg() immediately.
3. Otherwise (i.e. if this is bootstrap of a slave replica), then it bootstraps from a remote master and then stays in 'orphan' state until it synchronizes with all replicas before switching to 'running' state and leaving box.cfg().

If box.cfg() is called after bootstrap, in order to recover from the local storage, then
1. It recovers the last snapshot and xlogs stored in the local directory.
2. Then it switches to 'orphan' mode and tries to connect to at least as many replicas as specified by box.cfg.replication_connect_quorum for a time period which is a multiple of box.cfg.replication_timeout (4x). If it fails, it doesn't abort, but leaves box.cfg() in 'orphan' mode. The state will switch to 'running' asynchronously as soon as the instance has synced with 'replication_connect_quorum' replicas.
3. If it managed to connect to enough replicas to form a quorum at step 2, it synchronizes with them: box.cfg() doesn't return until at least 'replication_connect_quorum' replicas have been synchronized.

If box.cfg() is called after recovery to reconfigure replication, then it tries to connect to all specified replicas within a time period which is a multiple of box.cfg.replication_timeout (4x). The value of box.cfg.replication_connect_quorum isn't taken into account, neither is the value of box.cfg.replication_sync_lag - box.cfg() returns as soon as all configured replicas have been connected.

Just like any other status, the new one is reflected by box.info.status.

Suggested by @kostja Follow-up #2958 Closes #999
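The per-replica "sync"/"follow" decision and the 'orphan'/'running' switch of the recovery case can be modelled roughly as below. The function names `replica_state` and `instance_status` are hypothetical, lags are plain numbers in seconds, and `None` stands in for an unset (nil) replication_sync_lag; timeouts and connection handling are left out.

```python
import math


def replica_state(lag, sync_lag):
    """Return "follow" if the replica counts as synchronized, else "sync"."""
    # An unset or infinite replication_sync_lag skips the "sync" state.
    if sync_lag is None or math.isinf(sync_lag):
        return "follow"
    # Synchronized means the lag is not greater than replication_sync_lag.
    return "follow" if lag <= sync_lag else "sync"


def instance_status(replica_lags, sync_lag, quorum):
    """Model the recovery case: 'running' once `quorum` replicas are synced."""
    synced = sum(
        1 for lag in replica_lags if replica_state(lag, sync_lag) == "follow"
    )
    return "running" if synced >= quorum else "orphan"
```

For example, with a 10 second sync lag and a quorum of 2, lags of 1, 20 and 30 seconds leave the instance in 'orphan', while lags of 1, 2 and 30 seconds let it switch to 'running'.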
-
Kirill Yukhin authored
rb_gen used an incorrect order of function attributes (static MAYBE_UNUSED), which caused compilation failures with Clang. Change the order of these attributes.
-
Roman Tsisyk authored
-
Roman Tsisyk authored
-
Roman Tsisyk authored
- Remove old versions of Fedora and Ubuntu - Add Fedora 26 and Fedora 27
-
- Jan 30, 2018
-
-
IlyaMarkovMipt authored
* Add the following behavior: the owner of an object cannot use her own objects unless she has usage access. * Change access checks of space, sequence, function objects. Similar checks of other objects are performed in alter.cc. Closes gh-3089
-
IlyaMarkovMipt authored
* Fix typo in net_box.lua in rare error case
-
imarkov authored
* Delete constructor delegation in ClientError * Move code body from one constructor to another
-
Vladimir Davydov authored
Currently, a relay sends a heartbeat message to the replica only if there were no WAL events for 'replication_timeout' seconds. As a result, a replica that happens to be up to date on subscribe will not update the lag until the timeout passes, which may delay configuration. Let's make the relay send a heartbeat message right after subscribe in case the replica is up to date.
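The heartbeat condition before and after the change can be expressed as a single predicate. This is a sketch with hypothetical parameter names, not the relay code itself.

```python
def should_send_heartbeat(idle_seconds, replication_timeout,
                          just_subscribed, replica_up_to_date):
    """Decide whether the relay should send a heartbeat now."""
    # New rule: greet an up-to-date replica right after subscribe so that
    # it can update its lag without waiting for the timeout.
    if just_subscribed and replica_up_to_date:
        return True
    # Old rule, still in effect: send a heartbeat only after there were
    # no WAL events for replication_timeout seconds.
    return idle_seconds >= replication_timeout
```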
-
Vladimir Davydov authored
These operations are going to become more complicated than just setting a pointer so let's introduce helpers for them.
-
Vladimir Davydov authored
There is already a handful of global variables describing the replica set state and there are going to be more so let's consolidate them in a singleton struct: replicaset => replicaset.hash, replica_pool => replicaset.pool, anon_replicas => replicaset.anon, replicaset_vclock => replicaset.vclock. While we are at it, let's also move INSTANCE_UUID definition from xrow.c to replication.cc, where it truly belongs. The only reason I see for it to be defined in xrow.c is to compile vinyl unit tests without linking replication.o, but we can easily circumvent this by defining INSTANCE_UUID in vy_iterators_helpers.c. Suggested by @kostja
-
Vladimir Davydov authored
replicaset_connect() leaves appliers that failed to connect within the specified time period running. To prevent them from entering 'subscribe' stage prematurely (i.e. before replicaset_follow() is called), we set replica->pause_on_connect flag, which will force them to freeze upon successful connection. We clear this flag in replicaset_follow(). This juggling with flags looks ugly. Instead, let's stop failed appliers in replicaset_connect() and restart them in replicaset_follow(). Follow-up #2958
-
IlyaMarkovMipt authored
Introduce in fio new methods taken from Python os.path: * fio.path.exists() * fio.path.lexists() * fio.path.is_file() * fio.path.is_dir() * fio.path.is_mount()
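For reference, these are the Python os.path functions that the new methods are modelled on. The `describe` helper and its dictionary keys are hypothetical and only mirror the fio.path names; the sketch says nothing about how the Lua side is implemented.

```python
import os
import os.path
import tempfile


def describe(path):
    """Collect the os.path checks that the fio.path methods are named after."""
    return {
        "exists": os.path.exists(path),    # follows symlinks
        "lexists": os.path.lexists(path),  # true even for broken symlinks
        "is_file": os.path.isfile(path),
        "is_dir": os.path.isdir(path),
        "is_mount": os.path.ismount(path),
    }


# Usage: inspect a freshly created temporary directory.
tmp_dir = tempfile.mkdtemp()
info = describe(tmp_dir)
os.rmdir(tmp_dir)
```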
-
Konstantin Osipov authored
-
Ilya Konyukhov authored
It builds the latest stable tarantool release at the moment (1.7.6). It clones tarantool from the github repo, then updates submodules, then compiles tarantool, small and msgpuck. Then it packs everything into a package and cleans up after itself.
-
Vladimir Davydov authored
If vinyl fails to do memory dumps in time on a replica (e.g. it ran out of disk space), replication will stop forever with an error, and the admin will have to call box.cfg() to restart replication. Since replication is asynchronous anyway, we shouldn't stop it on vinyl timeout - it isn't critical as the replica will recover as soon as the admin fixes the problem (e.g. frees up some disk space). Let's ignore vinyl timeout altogether for applier fibers (currently, we ignore it only on join) - the admin can monitor how badly a replica lags behind the master via box.info.replication lag/idle. Closes #3087
-
Vladimir Davydov authored
To register a BEFORE trigger for a space, call space:before_replace() function. Similarly to space:on_replace(), this function takes a new trigger callback as the first argument and a function to remove from the registered trigger list as the second optional argument. Trigger callbacks are executed from space_execute_dml(), right before passing down a request to the engine implementation, but after resolving the space sequence. Just like on_replace, a before_replace callback is passed old and new tuples, but it can also return a tuple or nil, which will affect the current statement as follows: - If a callback function returns the old tuple, the statement is ignored and IPROTO_NOP is written to xlog to bump LSN. - If a callback function returns the new tuple or doesn't return anything, the statement is executed as is. - If a callback function returns nil, the statement is turned into DELETE. - If a callback function returns a tuple, the statement is turned into REPLACE for this tuple. Other return values result in ER_BEFORE_REPLACE_RET error. Note, the trigger must not change the primary key of the old tuple, because that would require splitting the resulting statement into two - DELETE and REPLACE. The new trigger can be used to resolve asynchronous replication conflicts as illustrated by replication/before_replace test. Closes #2993
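The way a callback's return value reshapes the statement can be modelled as below. This is a sketch under several assumptions: the function name, the result labels and the `NO_RETURN` sentinel are hypothetical, Python's `None` stands in for Lua nil, tuples are compared by identity, and the checks simply follow the order in which the rules are listed above (an insert with no old tuple is not treated specially).

```python
NO_RETURN = object()  # sentinel: the callback returned nothing at all


class BeforeReplaceError(Exception):
    """Stands in for the ER_BEFORE_REPLACE_RET error."""


def resolve_statement(old, new, result=NO_RETURN):
    """Map a before_replace return value to (statement kind, tuple)."""
    if result is NO_RETURN:
        return ("AS_IS", new)       # nothing returned: execute as is
    if result is old:
        return ("NOP", None)        # old tuple: statement is ignored
    if result is new:
        return ("AS_IS", new)       # new tuple: execute as is
    if result is None:
        return ("DELETE", None)     # nil: statement becomes DELETE
    if isinstance(result, tuple):
        return ("REPLACE", result)  # other tuple: statement becomes REPLACE
    raise BeforeReplaceError("unsupported return value: %r" % (result,))
```

A conflict-resolution trigger can therefore keep the existing row by returning the old tuple, which only bumps the LSN through a NOP.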
-
- Jan 29, 2018
-
-
Vladimir Davydov authored
To implement space:before_replace trigger, we need to introduce a new request type for bumping LSN, because the new trigger may turn any DML operation into a no-op. Let's call it IPROTO_NOP. It is treated as DML (passed to apply_row, etc), but it is ignored by space_execute_dml() and so doesn't actually modify anything, only bumps LSN on the server. The new request type has name "NOP" (for xlog reader), however it isn't reported via box.stat(). Needed for #2993
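The effect of a NOP row can be illustrated with a toy row applier. Everything here is hypothetical: rows are `(type, key, value)` triples, the space is a plain dict, and the LSN is a single counter rather than a vclock component.

```python
def apply_rows(rows):
    """Apply rows to an in-memory space and return (space, last LSN)."""
    space = {}
    lsn = 0
    for row_type, key, value in rows:
        lsn += 1  # every row, including NOP, bumps the LSN
        if row_type == "NOP":
            continue  # ignored by the DML dispatcher: no data is modified
        if row_type == "REPLACE":
            space[key] = value
        elif row_type == "DELETE":
            space.pop(key, None)
        else:
            raise ValueError("unknown row type: %r" % (row_type,))
    return space, lsn
```

Applying a REPLACE followed by a NOP leaves the data as the REPLACE wrote it but advances the LSN by two.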
-
Vladimir Davydov authored
This patch moves helpers used to fix requests after certain DML operations to a separate source file. Currently, there are only two of them, but there are going to be more so it seems to be a good idea to isolate them. No functional changes. Suggested by @kostja
-
Vladimir Davydov authored
I'm planning to call BEFORE triggers in space.c. Since a BEFORE trigger can change the request type, we can't call it from functions handling particular kinds of requests (space_execute_replace() and others). So let's move the switch-case statement that executes different space callbacks depending on the request type from process_rw() to a new function, space_execute_dml(), defined in space.c. We will execute BEFORE triggers from this new function, right before dispatching the request by its type. Needed for #2993
-
- Jan 27, 2018
-
-
IlyaMarkovMipt authored
-