- Apr 06, 2018
Vladimir Davydov authored
space_vtab::commit_alter is implemented only by memtx, which uses it to set bsize for a new space. However, it isn't necessary to use commit_alter for this - instead we can set bsize in prepare_alter and reset it in drop_primary_key, along with memtx_space::replace. Let's do it and zap space_vtab::commit_alter altogether, because the callback is confusing: judging by its name it should be called after WAL write, but it is called before.
-
Vladimir Davydov authored
When the last index of a memtx space is dropped, we need to delete all tuples stored in the space. We do it in space_vtab::commit_alter, but this is wrong, because this function is not rolled back, so we may get a use-after-free error if we fail to write a DDL operation to WAL. To avoid that, let's delete tuples in index_vtab::commit_drop, which is called after WAL write. There's a nuance here: index_vtab::commit_drop is also called when an index is rebuilt (a rebuild is essentially drop + create), so we must elevate the reference counter of every tuple added to the new index during rebuild and, respectively, drop all these references in index_vtab::abort_create, which is called if index creation is aborted for some reason. This also means that we now iterate over all tuples twice when a primary key is rebuilt: first to build the new index, then to unreference all tuples stored in the old index. This is OK, as we can make the last step asynchronous, which will also speed up the more common case of space drop.
Closes #3289
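A minimal sketch of the reference-counting scheme described above, using a toy tuple with an integer reference counter and a toy index that is just an array of tuple pointers. All names (`toy_tuple`, `toy_index_build`, `toy_index_commit_drop`, `toy_index_abort_create`) are hypothetical and are not Tarantool's API. The point is that a rebuild takes an extra reference per tuple, so dropping the old index after WAL write does not free tuples the new index still uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy tuple: freed when its reference counter drops to zero. */
struct toy_tuple {
	int refs;
	int value;
};

static int toy_tuples_freed = 0;

static void
toy_tuple_ref(struct toy_tuple *t)
{
	t->refs++;
}

static void
toy_tuple_unref(struct toy_tuple *t)
{
	assert(t->refs > 0);
	if (--t->refs == 0) {
		free(t);
		toy_tuples_freed++;
	}
}

/* Hypothetical toy index: a fixed-size array of tuple pointers. */
struct toy_index {
	struct toy_tuple *tuples[16];
	int count;
};

/* Rebuild: every tuple added to the new index gets an extra reference. */
static void
toy_index_build(struct toy_index *new_index, const struct toy_index *old_index)
{
	assert(old_index->count <= 16);
	new_index->count = 0;
	for (int i = 0; i < old_index->count; i++) {
		toy_tuple_ref(old_index->tuples[i]);
		new_index->tuples[new_index->count++] = old_index->tuples[i];
	}
}

/* Called after a successful WAL write: release the dropped index's references. */
static void
toy_index_commit_drop(struct toy_index *index)
{
	for (int i = 0; i < index->count; i++)
		toy_tuple_unref(index->tuples[i]);
	index->count = 0;
}

/*
 * Called if creation is aborted: release exactly the references taken by
 * toy_index_build(). In this toy model that is the same as dropping it.
 */
static void
toy_index_abort_create(struct toy_index *index)
{
	toy_index_commit_drop(index);
}
```

Because the references are released only in `commit_drop` (after WAL write) or `abort_create`, a failed WAL write never leaves the old index pointing at freed tuples.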
-
Vladislav Shpilevoy authored
When a schema version change is detected, there is no reason to cancel and retry already sent requests. They may already have been executed by the server, and retrying them leads to repeated execution. A request must be retried only if the server responded to this very request with a WRONG_SCHEMA_VERSION error.
Closes #3325
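A minimal sketch of the retry rule, assuming a client keeps a per-request record and handles each response individually. The names and the numeric status values are hypothetical, not the real net.box implementation or Tarantool error codes. Only the request that itself was rejected with WRONG_SCHEMA_VERSION is resent; every other request completes with whatever the server returned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical response codes; the numeric values are illustrative only. */
enum toy_status {
	TOY_OK = 0,
	TOY_WRONG_SCHEMA_VERSION = 1,
	TOY_OTHER_ERROR = 2,
};

/* Hypothetical in-flight request as tracked by a client connection. */
struct toy_request {
	int id;
	int attempts;          /* how many times the request has been sent */
	bool completed;
	enum toy_status result;
};

/*
 * Handle the server's response to one request. The server did not execute
 * a request it rejected with WRONG_SCHEMA_VERSION, so only that request is
 * resent; resending it cannot cause duplicate execution.
 * Returns true if the request was resent.
 */
static bool
toy_handle_response(struct toy_request *req, enum toy_status status)
{
	assert(req != NULL && !req->completed);
	if (status == TOY_WRONG_SCHEMA_VERSION) {
		req->attempts++;   /* resend after refreshing the schema */
		return true;
	}
	req->completed = true;
	req->result = status;
	return false;
}
```

Note that detecting a new schema version does not touch other in-flight requests at all in this sketch.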
-
Konstantin Osipov authored
Introducing a formal, trackable process for server enhancement. Before working on a complex feature, please write an RFC document describing what you want to change and how, and get it approved. All historical RFCs are kept in doc/rfc.
-
- Apr 05, 2018
Vladimir Davydov authored
Currently, space_vtab::commit_alter is called before WAL write, so we can use it for preparing new indexes in vinyl. However, this is going to change soon, because space_vtab::commit_alter should actually be called after WAL write, like index_vtab::commit_drop or commit_create. Calling it before WAL write may result in use-after-free in memtx (see #3289). Besides, using this function for iterating over space indexes just feels wrong, as we have index methods invoked by AlterSpaceOp descendants for this. So let's move all the operations performed by vinyl_space_commit_alter somewhere else. Fortunately, it's easy to do without damaging code readability or efficiency:
- Update of vy_lsm::pk can be done by vinyl_space_swap_index and vinyl_build_secondary_key.
- vy_lsm->check_is_unique can be computed by vinyl_engine_create_space and then set by vinyl_space_swap_index.
- The call to key_def_update_optionality is implied by key_def_swap, which is already called by vinyl_space_swap_index, hence it can be removed.
Needed for #3289
-
Vladimir Davydov authored
Basically, index_begin_build() followed by index_end_build() is a no-op. There's absolutely no point in calling it for primary indexes after initial recovery has completed.
-
Vladimir Davydov authored
We read tuples from disk hence we should use disk_format, not mem_format. Fix it. While we are at it, let's also update the outdated comment to vy_run_rebuild_index.
-
Vladimir Davydov authored
The rtree is empty when this function is called (in fact, it is called right after creating the index), so there's no need to purge it.
-
Vladimir Davydov authored
To allow extending key definition for non-empty vinyl spaces, this patch performs the following steps:
- Revert commit c31dd19a ("vinyl: forbid vinyl index key definition alter") that forbade any key def alter. It isn't needed anymore.
- Update key_def and cmp_def in vinyl_space_swap_index(). We simply swap the definitions between the old and new indexes in memory. Since all vinyl objects reference either vy_lsm::key_def or vy_lsm::cmp_def or both, and the change is compatible (does not change the order for existing tuples), this should work just fine.
- Update key definition in vylog on ALTER. For this, we introduce a new vylog record type, VY_LOG_MODIFY_LSM, which updates key definition. To be able to replay it on recovery in case we failed to flush it before restart, we also store the LSN of the WAL record that triggered the ALTER.
It also adds the following test cases:
- Modify key definition of primary and secondary indexes of a non-empty space (engine/ddl).
- Modify key definition before snapshot and relay it to a newly joined replica (engine/replica_join).
- Make sure key definition is updated in vylog on ALTER (vinyl/layout).
-
Vladimir Davydov authored
There's no point in writing this record to snapshot, because we can store LSN of the last index dump right in VY_LOG_CREATE_LSM record.
-
Vladimir Davydov authored
So as to draw a line between the LSN of index creation and the LSN of the last index modification, which is introduced later in the series.
-
Vladimir Davydov authored
There are two functions in vy_stmt.c that blindly copy tuple field map, vy_stmt_dup() and vy_stmt_replace_from_upsert(). Both these functions take a tuple format to use for the new statement and require this format to be the same as the source tuple format in terms of fields definition, otherwise they'll just crash. The only reason why we did that is that back when these functions were written we used a separate format for UPSERT statements so we needed this extra argument for creating a REPLACE from UPSERT. Now it's not needed, and we can use the source tuple format instead. Moreover, passing the current tuple format to any of those functions is even harmful, because tuple format can be extended by ALTER, in which case these functions will crash if called on a statement created before ALTER. That being said, let's drop the tuple format argument.
-
Vladimir Davydov authored
space->format and cmp_def must be compatible, i.e. space->format has the is_nullable flag set for a field iff it is set for all key parts indexing this field. Therefore there's no point in setting is_nullable for disk_format, as it must have been initialized by tuple_format_create(). Remove the pointless loop. Also, while we are at it, fix a minor memory leak: disk_format is referenced twice for the primary key.
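A minimal sketch of the compatibility invariant stated above, written as a checker over hypothetical toy structures rather than Tarantool's `tuple_format` and `key_def`. A field that is indexed by at least one key part must be nullable in the format exactly when every part indexing it is nullable; fields no part refers to are not constrained by this check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical format field and key part, reduced to the nullability flag. */
struct toy_field {
	bool is_nullable;
};

struct toy_part {
	uint32_t fieldno;
	bool is_nullable;
};

static bool
toy_format_matches_parts(const struct toy_field *fields, uint32_t field_count,
			 const struct toy_part *parts, uint32_t part_count)
{
	/* Every key part must refer to an existing field. */
	for (uint32_t p = 0; p < part_count; p++)
		assert(parts[p].fieldno < field_count);

	for (uint32_t f = 0; f < field_count; f++) {
		bool indexed = false;
		bool all_parts_nullable = true;
		for (uint32_t p = 0; p < part_count; p++) {
			if (parts[p].fieldno != f)
				continue;
			indexed = true;
			if (!parts[p].is_nullable)
				all_parts_nullable = false;
		}
		/* is_nullable in the format iff it is set for all indexing parts. */
		if (indexed && fields[f].is_nullable != all_parts_nullable)
			return false;
	}
	return true;
}
```

If this invariant already holds when the format is created, copying the flags again for disk_format is redundant, which is the reasoning behind removing the loop.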
-
Vladimir Davydov authored
A piece of code left over from the inglorious past, which doesn't even have a forward declaration, let alone a caller. Remove it.
-
Vladimir Davydov authored
We allocate index formats in only one place, vy_lsm_new, so there's no point in this debug-only check anymore.
-
Vladimir Davydov authored
We create new formats for all indexes of the new space in vinyl_space_commit_alter() while we don't actually need to do this, because the new formats have already been created by vy_lsm_new() - all we need to do is reuse them somehow. This patch does the trick: it implements the swap_index() space virtual method for vinyl so that it swaps tuple formats between the old and new spaces.
-
Vladimir Davydov authored
This function is called by MoveIndex and ModifyIndex ALTER operations, i.e. when the index definition is not changed at all or is extended. Making this method virtual will allow us to avoid reallocating vinyl formats in vinyl_space_commit_alter().
-
Vladimir Davydov authored
If the new cmp_def of a secondary index is compatible with the old one after the primary key parts have changed, we don't need to rebuild it, we just need to update its definition.
-
Vladimir Davydov authored
If the primary key is modified, we schedule rebuild of all non-unique (including nullable) secondary TREE indexes. This is valid for memtx, but is not quite right for vinyl. For vinyl we have to rebuild all secondary indexes, because they are all non-clustered (i.e. point to tuples via primary key parts). This doesn't result in any bugs for now, because rebuild of vinyl indexes is not supported, but hopefully this is going to change soon. So let's introduce a new virtual index method, index_vtab::depends_on_pk, which returns true iff the index needs to be updated if the primary key changes, and define this new method for vinyl and memtx TREE indexes.
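A minimal sketch of such a virtual method, using a hypothetical one-entry vtab instead of Tarantool's real `index_vtab`. It assumes "non-unique (including nullable)" can be modeled as `!is_unique || is_nullable` for memtx TREE, that every vinyl secondary index depends on the primary key, and (an assumption) that other index types default to `false`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_index;

/* Hypothetical, trimmed-down index vtab containing only the new method. */
struct toy_index_vtab {
	bool (*depends_on_pk)(const struct toy_index *index);
};

struct toy_index {
	const struct toy_index_vtab *vtab;
	bool is_unique;
	bool is_nullable;  /* has nullable key parts */
};

/* memtx TREE: non-unique (including nullable) indexes embed pk parts. */
static bool
toy_memtx_tree_depends_on_pk(const struct toy_index *index)
{
	return !index->is_unique || index->is_nullable;
}

/* vinyl: every secondary index points to tuples via primary key parts. */
static bool
toy_vinyl_depends_on_pk(const struct toy_index *index)
{
	(void)index;
	return true;
}

/* Assumed default for index types that never depend on the primary key. */
static bool
toy_generic_depends_on_pk(const struct toy_index *index)
{
	(void)index;
	return false;
}

static const struct toy_index_vtab toy_memtx_tree_vtab = {
	.depends_on_pk = toy_memtx_tree_depends_on_pk,
};
static const struct toy_index_vtab toy_vinyl_vtab = {
	.depends_on_pk = toy_vinyl_depends_on_pk,
};
static const struct toy_index_vtab toy_generic_vtab = {
	.depends_on_pk = toy_generic_depends_on_pk,
};

/* Generic dispatch used by ALTER to decide which indexes to rebuild. */
static bool
toy_index_depends_on_pk(const struct toy_index *index)
{
	assert(index != NULL && index->vtab != NULL);
	return index->vtab->depends_on_pk(index);
}
```

Keeping the decision in the engine-specific vtab lets the generic ALTER code stay engine-agnostic.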
-
Vladimir Davydov authored
The new method is called after successful update of index definition. It is passed the signature of the WAL record that committed the operation. It will be used by Vinyl to update key definition in vylog.
-
Konstantin Osipov authored
-
Ilya Markov authored
* Remove rewriting of the default logger format in case of the syslog option.
* Add facility option parsing and use the parsed result to format messages according to RFC 3164. Possible values and the default value of syslog facility are taken from nginx (https://nginx.ru/en/docs/syslog.html).
* Move initialization of logger type and format function before initialization of the descriptor in log_XXX_init, so that we can test the format function of the syslog logger.
Closes gh-3244.
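A minimal sketch of facility parsing and the RFC 3164 priority value, using hypothetical function names rather than Tarantool's logger code. The facility names are the ones nginx accepts, listed in the order of their RFC 3164 numeric codes (0 to 23), and the priority written in the `<PRI>` prefix is `facility * 8 + severity`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Facility names in the order of their RFC 3164 numeric codes (0..23). */
static const char *const toy_syslog_facilities[] = {
	"kern", "user", "mail", "daemon", "auth", "intern", "lpr", "news",
	"uucp", "clock", "authpriv", "ftp", "ntp", "audit", "alert", "cron",
	"local0", "local1", "local2", "local3", "local4", "local5", "local6",
	"local7",
};

enum {
	TOY_SYSLOG_FACILITY_COUNT = (int)(sizeof(toy_syslog_facilities) /
					  sizeof(toy_syslog_facilities[0])),
};

/* Returns the facility code for a name, or -1 if the name is unknown. */
static int
toy_syslog_facility_parse(const char *name)
{
	for (int i = 0; i < TOY_SYSLOG_FACILITY_COUNT; i++) {
		if (strcmp(name, toy_syslog_facilities[i]) == 0)
			return i;
	}
	return -1;
}

/* RFC 3164: PRI = facility * 8 + severity, with severity in 0..7. */
static int
toy_syslog_pri(int facility, int severity)
{
	assert(facility >= 0 && facility < TOY_SYSLOG_FACILITY_COUNT);
	assert(severity >= 0 && severity <= 7);
	return facility * 8 + severity;
}
```

For example, an informational message (severity 6) with facility `local7` would be prefixed with `<190>`.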
-
- Apr 04, 2018
Vladimir Davydov authored
The only difference between format of UPSERT statements and format of other DML statements of the same index is that the former reserves one byte for UPSERT counter, which is needed to schedule UPSERT squashing. Since we store UPSERT counter on lsregion now, we don't need a special format for UPSERTs anymore. Remove it.
-
Vladimir Davydov authored
Currently, we store upsert counter in tuple metadata (that's what upsert_format is for), but since it's only relevant for tuples of the memory level, we can store it on lsregion, right before tuple data. Let's do it now so that we can get rid of upsert_format.
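A minimal sketch of keeping a one-byte counter in memory right before the statement data, so that the tuple format itself needs no extra metadata. `malloc()` stands in for the lsregion allocator, and all names are hypothetical. Saturating the counter at 255 instead of wrapping around is a choice made for this sketch only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Allocate the statement plus one hidden byte in front of it that holds
 * the UPSERT counter. Callers only ever see the pointer to the data.
 */
static char *
toy_stmt_alloc(const char *data, size_t size)
{
	char *buf = malloc(size + 1);
	if (buf == NULL)
		return NULL;
	buf[0] = 0;                /* UPSERT counter starts at zero */
	memcpy(buf + 1, data, size);
	return buf + 1;
}

static uint8_t
toy_upsert_count(const char *stmt)
{
	assert(stmt != NULL);
	return *(const uint8_t *)(stmt - 1);
}

static void
toy_upsert_count_inc(char *stmt)
{
	uint8_t *counter = (uint8_t *)(stmt - 1);
	if (*counter < UINT8_MAX)  /* saturate instead of wrapping */
		(*counter)++;
}

static void
toy_stmt_free(char *stmt)
{
	free(stmt - 1);            /* undo the one-byte offset */
}
```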
-
Kirill Yukhin authored
-
Alexander Turenko authored
Filed gh-3311 to remove this export soon. Fixes #3310.
-
- Apr 03, 2018
Konstantin Osipov authored
-
Vladimir Davydov authored
If the size of a transaction is greater than the configured memory limit (box.cfg.vinyl_memory), the transaction will hang on commit for 60 seconds (box.cfg.vinyl_timeout) and then fail with the error message "Timed out waiting for Vinyl memory quota". This is confusing. Let's fail such transactions immediately with an OutOfMemory error.
Closes #3291
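A minimal sketch of the fail-fast check, using a hypothetical quota structure rather than vinyl's real quota code. A request that could fit once memory is dumped is told to wait (up to the timeout), while a request larger than the whole limit can never fit and is rejected immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_quota_result {
	TOY_QUOTA_OK,            /* memory reserved */
	TOY_QUOTA_WAIT,          /* may fit after a dump: wait up to the timeout */
	TOY_QUOTA_OUT_OF_MEMORY, /* can never fit: fail right away */
};

struct toy_quota {
	size_t limit;  /* stand-in for box.cfg.vinyl_memory */
	size_t used;
};

static enum toy_quota_result
toy_quota_try_use(struct toy_quota *q, size_t size)
{
	assert(q->used <= q->limit);
	if (size > q->limit)
		return TOY_QUOTA_OUT_OF_MEMORY;
	if (size > q->limit - q->used)
		return TOY_QUOTA_WAIT;
	q->used += size;
	return TOY_QUOTA_OK;
}
```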
-
- Apr 02, 2018
Arseny Antonov authored
-
Arseny Antonov authored
-
- Mar 30, 2018
Konstantin Belyavskiy authored
In case of a sudden power loss, if data was not written to the local WAL but had already been sent to a remote replica, the local instance can't recover properly and the datasets diverge. Fix it by using the remote replica's data and LSN comparison. Based on @GeorgyKirichenko's proposal and @locker's race-free check.
Closes #3210
-
Konstantin Belyavskiy authored
Stay in orphan (read-only) mode while the local vclock is lower than the master's, to make sure that datasets are the same across the replicaset. Update the replication/catch test to reflect the change. Suggested by @kostja.
Needed for #3210
-
Vladimir Davydov authored
Closes #3148
-
Vladislav Shpilevoy authored
The text console tried to detect SIGPIPE before it could be raised, by reading from the socket before writing to it: if a socket is readable but read returns 0, the peer has closed it, and writing to it can raise SIGPIPE. But Tarantool ignores SIGPIPE, so the process is not terminated; write() just returns -1. The original code checked for SIGPIPE because, when Tarantool is run under a debugger (gdb or lldb), the debugger by default sets its own signal handlers, and SIGPIPE terminates the process. But debugger settings can be changed to ignore SIGPIPE too, so let's remove this overengineering from the console code.
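A minimal sketch of the POSIX behavior the commit relies on: once SIGPIPE is ignored, writing to a peer that has gone away does not kill the process, and `write()` instead fails with `EPIPE`. A pipe with a closed read end is used here instead of a socket to keep the example self-contained; the function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Write to a pipe whose read end is already closed, with SIGPIPE ignored.
 * Returns the errno value observed, or 0 if the write unexpectedly succeeded.
 */
static int
toy_write_to_closed_pipe(void)
{
	int fds[2];
	if (pipe(fds) != 0)
		return errno;
	close(fds[0]);               /* the "peer" goes away */
	signal(SIGPIPE, SIG_IGN);    /* ignore SIGPIPE, as the server process does */
	ssize_t rc = write(fds[1], "x", 1);
	int err = rc < 0 ? errno : 0;
	close(fds[1]);
	return err;
}
```

With SIGPIPE ignored, the caller simply handles the `-1`/`EPIPE` result like any other write error, so no read-before-write probing is needed.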
-
Vladislav Shpilevoy authored
If a remote host is unreachable on the first connection attempt, and reconnect_after is set, then netbox state machine enters error state, but it must enter error_reconnect. Do it. The bug was introduced by me in d2468dac.
-
Vladimir Davydov authored
EV_USE_REALTIME and EV_USE_MONOTONIC, which force libev to use clock_gettime, are enabled automatically on Linux, but not on OS X. We used to forcefully enable them for performance reasons, but this broke compilation on certain OS X versions and so was disabled by commit d36ba279 ("Fix gh-1777: clock_gettime detected but unavailable in macos"). Today we need these features enabled not just for performance, but also to avoid crashes when the time changes on the host (see issue #2527 and commit a6c87bf9 ("Use ev_monotonic_now/time instead of ev_now/time for timeouts")). Fortunately, there is the CMake-defined macro HAVE_CLOCKGETTIME_DECL, which is set if clock_gettime is available. Let's enable EV_USE_REALTIME and EV_USE_MONOTONIC if this macro is defined.
Closes #3299
-
- Mar 29, 2018
Vladislav Shpilevoy authored
-
Vladimir Davydov authored
When a vylog transaction is rolled back, we always reset vy_log.tx_size. Generally speaking, this is incorrect, as rollback doesn't necessarily remove all pending records from the tx buffer: there may still be records committed with vy_log_tx_try_commit() that were left in the buffer due to write errors. We don't roll back such records, but we still reset tx_size, which leads to a discrepancy between vy_log.tx_size and the actual length of the vy_log.tx list, which further on results in an assertion failure: src/box/vy_log.c:698: vy_log_flush: Assertion `i < vy_log.tx_size' failed. We need vy_log.tx_size to allocate an xrow_header array of a proper size so that we can flush pending vylog records to disk. This isn't a hot path, because vylog operations are rare. Besides, we iterate over all records anyway to fill the xrow_header array. So let's remove vy_log.tx_size altogether and instead calculate the length of the vy_log.tx list right in place.
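A minimal sketch of counting list entries on demand instead of maintaining a separate size counter that can drift out of sync. The list below is a hypothetical circular doubly linked list in the spirit of Tarantool's rlist, not the real `small/rlist.h` API. Since the count is derived from the list itself, removing records (for example on rollback) can never leave it stale.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical circular doubly linked list with a sentinel head. */
struct toy_rlist {
	struct toy_rlist *prev;
	struct toy_rlist *next;
};

static void
toy_rlist_create(struct toy_rlist *head)
{
	head->prev = head->next = head;
}

static void
toy_rlist_add_tail(struct toy_rlist *head, struct toy_rlist *item)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

static void
toy_rlist_del(struct toy_rlist *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
	item->prev = item->next = item;
}

/* Count entries by walking the list: O(n), acceptable on a rare path. */
static size_t
toy_rlist_length(const struct toy_rlist *head)
{
	assert(head != NULL);
	size_t n = 0;
	for (const struct toy_rlist *it = head->next; it != head; it = it->next)
		n++;
	return n;
}
```

A flush routine would call something like `toy_rlist_length()` right before allocating its array of row headers.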
-
Vladimir Davydov authored
Currently, we use mh_foreach, but each object is on an rlist, which is better suited for iteration.
-
Vladimir Davydov authored
The new method is called if index creation failed, either due to WAL write error or build error. It will be used by Vinyl to purge prepared LSM tree from vylog.
-