- Jul 26, 2018
-
-
Konstantin Belyavskiy authored
Fix 'fio.rmtree' to remove non-empty directories and update the test. Closes #3258
-
- Jul 22, 2018
-
-
Vladimir Davydov authored
When a replica is removed from the cluster table, the corresponding replica struct isn't destroyed unless both the relay and the applier attached to it are stopped, see replica_clear_id(). Since replica struct is a holder of the garbage collection state, this means that in case an evicted replica has an applier or a relay that fails to exit for some reason, garbage collection will hang. A relay thread stops as soon as the replica it was started for receives a row that tries to delete it from the cluster table (because this isn't allowed by the cluster space trigger, see on_replace_dd_cluster()). If a replica isn't running, the corresponding relay can't run as well, because writing to a closed socket isn't allowed. That said, a relay can't block garbage collection. An applier, however, is deleted only when replication is reconfigured. So if a replica that was evicted from the cluster was configured as a master, its replica struct will hang around blocking garbage collection for as long as the replica remains in box.cfg.replication. This is what happens in #3546. Fix this issue by forcefully unregistering a replica with the garbage collector when it is deleted from the cluster table. This is OK as it won't be able to resubscribe and so we don't need to keep WALs for it any longer. Note, the relay thread may still be running when a replica is deleted from the cluster table, in which case we can't unregister it with the garbage collector right away, because the relay may need to access the garbage collection state. In such a case, leave the job to replica_clear_relay, which is called as soon as the relay thread exits. Closes #3546
-
- Jul 17, 2018
-
-
Kirill Shcherbatov authored
Net.box didn't pass options containing an iterator to the server side. There were also invalid results for two :count tests in the net.box.result file. Thanks to @ademenev for reporting the problem and helping to locate it. Closes #3262.
-
- Jul 16, 2018
-
-
Georgy Kirichenko authored
If a fiber pool reuses an already canceled fiber, the fiber reports an error for every subsequent request. Now a canceled fiber returns and the fiber pool creates a new one. Fixes #3527
-
- Jul 12, 2018
-
-
Kirill Shcherbatov authored
Tests need updating to match the fixup in upstream commit baf636a74b4b6d055d93e2d01366d6097eb82d90 (Author: Tina Müller <cpan2@tinita.de>, Date: Thu Jun 14 19:27:04 2018 +0200): "The closing single quote needs to be indented... if it's on its own line." Closes #3275.
-
- Jul 10, 2018
-
-
Kirill Shcherbatov authored
Now it is possible to specify a number in exponential form using all formats allowed by the JSON standard:
    json.decode('{"remained_amount":2.0e+3}')
    json.decode('{"remained_amount":2.0E+3}')
    json.decode('{"remained_amount":2e+3}')
    json.decode('{"remained_amount":2E+3}') <-- fixed
Closes #3514.
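The four spellings can be compared against the C library's strtod(), which accepts the same exponent forms. This is only an illustration of the accepted inputs, not the decoder's actual code: parse_json_number is a hypothetical helper, and strtod() is more permissive than the JSON grammar (it also accepts hex floats and "inf", for example).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of Tarantool: parse a number token with
 * strtod() and report whether the whole token was consumed.
 */
bool
parse_json_number(const char *token, double *out)
{
	assert(token != NULL && out != NULL);
	char *end;
	*out = strtod(token, &end);
	/* Reject empty input and trailing garbage. */
	return end != token && *end == '\0';
}
```

Under this sketch, "2.0e+3", "2.0E+3", "2e+3" and "2E+3" all parse to the same value.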
-
- Jul 09, 2018
-
-
Serge Petrenko authored
Schema version is used by both clients and internal modules to check whether there were any updates in spaces and indices. While clients only need to be notified when there is a noticeable change, e.g. space is removed, internal components also need to be notified when something like space:truncate() happens, because even though this operation doesn't change space id or any of its indices, it creates a new space object, so all the pointers to the old object have to be updated. Currently both clients and internals share the same schema version, which leads to unnecessary updates on the client side. Fix this by implementing 2 separate counters for internal and public use: schema_state gets updated on every change, including recreation of the same space object, while schema_version is updated only when there are noticeable changes for the clients. Introduce a new AlterOp to alter.cc to update public schema_version. Now all the internals reference schema_state, while all the clients use schema_version. box.internal.schema_version() returns schema_version (the public one). Closes: #3414
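A minimal sketch of the two-counter idea, assuming a simplified model: the struct, the function name and the visible_to_clients flag are hypothetical and do not mirror the real alter.cc code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the two counters described above. */
struct schema_counters {
	/* Bumped on every change, including space object re-creation. */
	uint64_t schema_state;
	/* Bumped only on changes that clients can observe. */
	uint64_t schema_version;
};

/*
 * Record a schema change. `visible_to_clients` is an assumed flag:
 * true for e.g. removing a space, false for e.g. space:truncate().
 */
void
schema_counters_on_change(struct schema_counters *c, bool visible_to_clients)
{
	assert(c != NULL);
	c->schema_state++;
	if (visible_to_clients)
		c->schema_version++;
}
```

Internal users would compare schema_state, while clients would only ever see schema_version move.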
-
- Jul 05, 2018
-
-
Kirill Shcherbatov authored
Fixed FreeBSD build: the bitset type declared in lib/bitset conflicted with the one declared in _cpuset.h, which is part of pthread_np.h used on FreeBSD. Resolves #3046.
-
Kirill Yukhin authored
After the read-only flag is dropped, a test space is created successfully, but on the next launch the creation fails because the space was never dropped. Drop the space. Closes #3507
-
- Jul 04, 2018
-
-
Serge Petrenko authored
box.session.su() reset the effective user to the session user after its execution, which broke nested calls to it. Fixed this by saving the current effective user and restoring it after the su call completes. This exposed a bug in box.schema.user.drop(): it has an unnecessary check for the PRIV_REVOKE privilege, which never gets granted to anyone but admin. Also fixed this by adding one extra box.session.su() call. Closes #3090, #3492
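The save-and-restore pattern can be sketched as follows. This is a simplified model under stated assumptions: session_model, session_su_enter and session_su_leave are hypothetical names, not Tarantool's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical session model; only the effective user id matters here. */
struct session_model {
	uint32_t effective_uid;
};

/* Switch to `uid` and return the previously effective user id. */
uint32_t
session_su_enter(struct session_model *s, uint32_t uid)
{
	assert(s != NULL);
	uint32_t saved_uid = s->effective_uid;
	s->effective_uid = uid;
	return saved_uid;
}

/* Restore the user id saved by the matching session_su_enter() call. */
void
session_su_leave(struct session_model *s, uint32_t saved_uid)
{
	assert(s != NULL);
	s->effective_uid = saved_uid;
}
```

Because each level restores the value it saved rather than a fixed session user, nested calls unwind to the correct effective user at every level.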
-
- Jul 03, 2018
-
-
Konstantin Osipov authored
Before this patch, memtx would silently roll back a multi-statement transaction on yield, switching the session to autocommit mode. It would do nothing in case yield happened in a sub-statement in auto-commit mode. This could lead to nasty/painful to debug side-effects in malformed Lua programs. Fix by adding a special transaction state - aborted, and enter this state in case of implicit yield. Check for what happens when a sub-statement yields. Check that yield trigger is removed by a rollback. Fixes gh-2631 Fixes gh-2528
-
- Jun 29, 2018
-
-
Konstantin Osipov authored
fiber->on_yield triggers were not invoked in fiber_call(), which meant that memtx transaction was not rolled back by fiber.create(). Fixes gh-3493
-
- Jun 28, 2018
-
-
Ilya Markov authored
Bug: While parsing HTTP headers, long header names are truncated to zero length, but their values are not ignored. Fix this by adding a max_header_name_length parameter to the HTTP request. If a header name is longer than this value, it is truncated to this length. The default value of max_header_name_length is 32. Also refactor http_parser by renaming long names. Closes #3451
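A rough sketch of truncating a header name to a configurable limit. The helper header_name_copy and the constant name are hypothetical; only the default of 32 comes from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Default limit taken from the commit message above. */
enum { MAX_HEADER_NAME_LENGTH = 32 };

/*
 * Hypothetical helper: copy a header name of `name_len` bytes into `dst`,
 * keeping at most `max_len` bytes, and NUL-terminate the result.
 * `dst` must have room for max_len + 1 bytes. Returns the stored length.
 */
size_t
header_name_copy(char *dst, const char *name, size_t name_len, size_t max_len)
{
	assert(dst != NULL && name != NULL);
	size_t n = name_len < max_len ? name_len : max_len;
	memcpy(dst, name, n);
	dst[n] = '\0';
	return n;
}
```

A name longer than the limit keeps its first max_len bytes instead of collapsing to an empty string, so its value stays attached to a recognizable key.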
-
Ilya Markov authored
Bug: The header parser validates the HTTP status line and, besides saving the HTTP status, saves valid characters to the header name, which is wrong. Fix this by skipping the status line after validation without saving it as a header. In scope of #3451
-
Vladimir Davydov authored
If tarantool is stopped while writing a snapshot or a vinyl run file, inprogress files will never be removed. Fix this by collecting those files on recovery completion. Original patch by @IlyaMarkovMipt. Reworked by @locker. Closes #3406
-
Konstantin Osipov authored
A minor follow up on the fix for gh-3452 (http.client timeout bug)
-
Ilya Markov authored
The current implementation of http.client relies on a fiber_cond which is set after the request was registered and doesn't consider that the response may be handled before the fiber_cond is set. So we may have the following situation:
1. Register the request in libcurl (curl_multi_add_handle in curl_execute).
2. Receive and process the response; fiber_cond_signal on a cond_var that no one waits on.
3. fiber_cond_wait on a cond which is already signaled. Wait until the timeout fires.
In this case the user has to wait for the timeout, though the data was received earlier. Fix this by adding an extra flag, in_progress, to the curl_request struct. Set this flag to true before registering the request in libcurl and set it to false when the request is finished, before fiber_cond_signal. When the in_progress flag is false, don't wait on the cond variable. Add 1 error injection. Closes #3452
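The flag logic can be modeled without fibers or libcurl. The sketch below is an assumption-laden simplification: request_model and its functions are hypothetical stand-ins, and the actual signaling and waiting are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the curl_request struct; only the flag is modeled. */
struct request_model {
	bool in_progress;
};

/* Called before the request is registered with libcurl. */
void
request_model_start(struct request_model *req)
{
	assert(req != NULL);
	req->in_progress = true;
}

/* Called when the response has been processed, before signaling the cond. */
void
request_model_finish(struct request_model *req)
{
	assert(req != NULL);
	req->in_progress = false;
}

/* The waiter blocks on the cond only while the request is still in flight. */
bool
request_model_must_wait(const struct request_model *req)
{
	assert(req != NULL);
	return req->in_progress;
}
```

If the response arrives before the waiter reaches the wait call, request_model_must_wait() already returns false and the waiter skips the wait instead of sleeping until the timeout.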
-
- Jun 27, 2018
-
-
Konstantin Osipov authored
schema_version must be passed to perform_request in 1.9
-
Vladislav Shpilevoy authored
When a connection is closed, some long-poll requests may still be in the TX thread with non-discarded input. If a connection is closed and then an input is discarded, the connection must not try to read new data. The bug was introduced here: f4d66dae by me. Closes #3400
-
- Jun 25, 2018
-
-
Vladimir Davydov authored
If called on a unix socket, bind(2) creates a new file, see unix(7). When we stop a unix tcp server, we should remove that file. Currently, we do it from the tcp server fiber, after the server loop is broken, which happens when the socket is closed, see tcp_server_loop(). This opens a time window for another tcp server to reuse the same path:

    main fiber                                         tcp server loop
    ----------                                         ---------------
    -- Start a tcp server.
    s = socket.tcp_server('unix/', sock_path, ...)
    -- Stop the server.
    s:close()
                                                       socket_readable? => no, break loop
    -- Start a new tcp server. Use the same path as before.
    -- This function succeeds, because the socket is closed
    -- so tcp_server_bind_addr() will clean up by itself.
    s = socket.tcp_server('unix/', sock_path, ...)
      tcp_server_bind
       tcp_server_bind_addr
        socket_bind => EADDRINUSE
        tcp_connect => ECONNREFUSED
        -- Remove dead unix socket.
        fio.unlink(addr.port)
        socket_bind => success
                                                       -- Deletes unix socket used
                                                       -- by the new server.
                                                       fio.unlink(addr.port)

In particular, the race results in sporadic failures of app-tap/console test, which restarts a tcp server using the same file path. To fix this issue, let's close the socket after removing the socket file. This is absolutely legit on any UNIX system, and this eliminates the race shown above, because a new server that tries to bind on the same path as the one already used by a dying server will not receive ECONNREFUSED until the socket fd is closed and hence the file is removed.

A note about the app-tap/console test. After this patch is applied, socket.close() takes a little longer for unix tcp server, because it yields twice, once for removing the socket file and once for closing the socket file descriptor. As a result, on_disconnect() trigger left from the previous test case has time to run after session.type() check. Actually, those triggers have already been tested and we should have cleared them before proceeding to the next test case. So instead of adding two new on_disconnect checks to the test plan, let's clear the triggers before session.type() test case and remove 3 on_connect and 5 on_auth checks from the test plan. Closes #3168
-
Vladislav Shpilevoy authored
Consider this packet:
    msgpack = require('msgpack')
    data = msgpack.encode(18400000000000000000)..'aaaaaaa'
Tarantool interprets 18400000000000000000 as the size of an incoming iproto request and tries, without any checks, to allocate a buffer of that size. It calculates the needed capacity like this:
    capacity = start_value;
    while (capacity < size)
        capacity *= 2;
Here it is possible that on the i-th iteration 'capacity' < 'size', but 'capacity * 2' overflows 64 bits and becomes < 'size' again, so this loop never ends and occupies 100% CPU. Strictly speaking overflow has undefined behavior. On the original system it led to 'capacity' becoming zero. Such a size is improbable for a real packet, but can appear as a result of parsing an invalid packet whose first bytes accidentally happen to form a valid MessagePack uint. This is how the bug emerged on the real system. Let's restrict the maximal packet size to 2GB. Closes #3464
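A bounded version of the doubling loop might look like the sketch below. It is an illustration under assumptions, not the actual patch: buffer_capacity_for and PACKET_SIZE_MAX are hypothetical names, and the 2 GB cap is taken from the message above. The sketch uses uint64_t, for which wraparound is well defined, and still refuses to double past the representable range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 2 GB cap, following the limit named in the commit message. */
#define PACKET_SIZE_MAX ((uint64_t)2 * 1024 * 1024 * 1024)

/*
 * Hypothetical helper: find the smallest value of the form start * 2^k
 * that is >= `size`. Returns false instead of looping forever when
 * `size` exceeds the cap or doubling would overflow.
 */
bool
buffer_capacity_for(uint64_t start, uint64_t size, uint64_t *capacity)
{
	assert(start > 0 && capacity != NULL);
	if (size > PACKET_SIZE_MAX)
		return false;
	uint64_t cap = start;
	while (cap < size) {
		/* Defensive check: never let the doubling wrap around. */
		if (cap > UINT64_MAX / 2)
			return false;
		cap *= 2;
	}
	*capacity = cap;
	return true;
}
```

With the size check up front, the oversized value from the packet above is rejected before the loop runs at all.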
-
- Jun 14, 2018
-
-
Vladimir Davydov authored
Since tuples stored in temporary spaces are never written to disk, we can always delete them immediately, even when a snapshot is in progress. Closes #3432
-
- Jun 01, 2018
-
-
Vladimir Davydov authored
The callback invoked upon compaction completion uses checkpoint_last() to determine whether compacted runs may be deleted: if the max LSN stored in a compacted run (run->dump_lsn) is greater than the LSN of the last checkpoint (gc_lsn) then the run doesn't belong to the last checkpoint and hence is safe to delete, see commit 35db70fa ("vinyl: remove runs not referenced by any checkpoint immediately"). The problem is checkpoint_last() isn't synced with vylog rotation - it returns the signature of the last successfully created memtx snapshot and is updated in memtx_engine_commit_checkpoint() after vylog is rotated. If a compaction task completes after vylog is rotated but before snap file is renamed, it will assume that compacted runs do not belong to the last checkpoint, although they do (as they have been appended to the rotated vylog), and delete them. To eliminate this race, let's use vylog signature instead of snap signature in vy_task_compact_complete(). Closes #3437
-
- May 31, 2018
-
-
Vladimir Davydov authored
latch_destroy() and fiber_cond_destroy() are basically no-ops. All they do is check that the latch/cond is not used. When a global latch or cond object is destroyed at exit, it may still have users and this is OK as we don't stop fibers at exit. In vinyl this results in the following false-positive assertion failures:
    src/latch.h:81: latch_destroy: Assertion `l->owner == NULL' failed.
    src/fiber_cond.c:49: fiber_cond_destroy: Assertion `rlist_empty(&c->waiters)' failed.
Remove "destruction" of vy_log::latch to suppress the first one. Wake up all fibers waiting on vy_quota::cond before destruction to suppress the second one. Add some test cases. Closes #3412
-
- May 29, 2018
-
-
Georgy Kirichenko authored
Handle the case where instance_uuid and replicaset_uuid are present in box.cfg and have the same values as those already set. Fixes #3421
-
- May 25, 2018
-
-
Konstantin Osipov authored
replication: make replication_connect_timeout dynamic
-
Konstantin Osipov authored
-
Vladimir Davydov authored
If a replica disconnects while sync is in progress, box.cfg{} may stop syncing, leaving the instance in 'orphan' mode. This will happen if not enough replicas are connected to form a quorum. This makes sense e.g. on network error, but not when a replica is loading, because in the latter case it should be up and running quite soon. Let's account for replicas that disconnected because they haven't completed initial configuration yet and continue syncing if connected + loading > quorum. Closes #3422
-
Konstantin Osipov authored
The default of 4 seconds is too low to bootstrap a large cluster.
-
- May 24, 2018
-
-
Georgy Kirichenko authored
In some cases, when an applier yielded during processing, another applier might start a conflicting operation and break replication and database consistency. Now an applier locks a per-server-id latch before processing a transaction. This guarantees that there is only one applier request for each server in progress at any given moment. The problem was very rare until full mesh topologies in vinyl became commonplace. Fixes gh-3339
-
- May 22, 2018
-
-
Konstantin Belyavskiy authored
Another broken case: adding a new replica to the cluster:
+ if (replica->applier->remote_is_ro &&
+     replica->applier->vclock.signature == 0)
In this case we may get an ER_READONLY, since the signature is not 0. So leader election now has two phases:
1. Select among read-write replicas.
2. If none is found, try the old algorithm for backward compatibility (the case when all replicas exist in the cluster table).
Closes #3257
-
- May 17, 2018
-
-
Vladimir Davydov authored
If a compacted run was created after the last checkpoint, it is not needed to recover from any checkpoint and hence can be deleted right away to save disk space. Closes #3407
-
- May 15, 2018
-
-
Vladimir Davydov authored
Improve the test by decreasing range_size so that it creates a lot of ranges for test indexes, not just one. This helped find bugs causing the crash described in #3393. Follow-up #3393
-
Alexander Turenko authored
Follows up #3396.
-
- May 08, 2018
-
-
Ilya Markov authored
When app-tap/console.test is launched sequentially, tests fail with "User exists" and binding errors. Make socket paths relative. Add user cleanup. Relates #3168
-
- May 07, 2018
-
-
Georgy Kirichenko authored
Any DDL is prohibited in a multi-statement transaction, so there is no reason to try to lock the DDL latch in this case. Trying to lock an already locked latch causes a yield and a silent transaction rollback, which will crash the tarantool server or trigger an assertion. Fixes #2783
-
- May 03, 2018
-
-
Vladislav Shpilevoy authored
Any base64 option leads to urlsafe encoding. This is wrong and is caused by incorrect flag checking. Fix it. Closes #3358
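One common way such a bug arises is testing the whole option word instead of one bit. The sketch below shows that pattern and its correction; it is a guess at the class of error, not the actual code, and the option names and values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option bits; the real names and values may differ. */
enum {
	B64_OPT_NOPAD = 1 << 0,
	B64_OPT_NOWRAP = 1 << 1,
	B64_OPT_URLSAFE = 1 << 2,
	B64_OPT_ALL = B64_OPT_NOPAD | B64_OPT_NOWRAP | B64_OPT_URLSAFE,
};

/* Buggy pattern: any non-zero option set is treated as urlsafe. */
bool
is_urlsafe_buggy(int options)
{
	return options != 0;
}

/* Corrected pattern: test only the urlsafe bit. */
bool
is_urlsafe(int options)
{
	/* Unknown bits indicate a caller error in this model. */
	assert((options & ~B64_OPT_ALL) == 0);
	return (options & B64_OPT_URLSAFE) != 0;
}
```

With the corrected check, passing only a padding or wrapping option no longer switches the alphabet.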
-
Konstantin Osipov authored
* rename request_limit.test.lua to net_msg_max.test.lua
* make net_msg_max.test.lua stable (courtesy of @Gerold103)
* exclude disconnect messages from iproto_msg_max limit
* add a separate warning for throttling based on readahead buffer overflow
-
Vladislav Shpilevoy authored
Starting with 1.9, a CALL request which yields releases the input buffer in the net thread before the CALL is complete. A release trigger is fired when the CALL fiber yields. The problem is that by default the input socket is not included into the poll() list of the event loop: thanks to an optimization by @kostja for the strict request/response scenario, the socket is included into the poll() list only after the response is sent to the client. Thus, the following could happen:
* a client sends a long-polling request
* the request yields and maybe never finishes
* the socket is not being read until the long-polling request is finished
The patch is to explicitly feed an EV_READ event to the event loop on the client socket whenever we release the input buffer for a long-polling request. We may remove iproto_resume() from net_discard_input() along with this patch since iproto_resume() will be called by iproto_connection_on_input().
-
- Apr 18, 2018
-
-
Ilya Markov authored
When a tuple in an insert/replace request has a NULL value in the field incremented by a sequence, the request body is changed: NULL is replaced by the value taken from the sequence. But the request header is not updated. So the redo log, which takes the body from the header if the header exists, writes the old version of the request to the WAL. Fix this by updating the header value after handling the sequence. Closes #3247
-