- Mar 28, 2019
-
-
Konstantin Osipov authored
Speed up txn_is_distributed() check by keeping track of the number of local rows of the transaction.
-
Konstantin Osipov authored
We need to keep count of different kinds of rows in a transaction:
- rows which already exist in some WAL and are replayed locally. These used to be called n_remote_rows and are renamed to n_applier_rows;
- rows which were created locally, on this server (previously called n_local_rows, renamed to n_new_rows).
Of the latter, we need to distinguish between GROUP_ID=LOCAL rows, i.e. rows which need not be replicated, and GROUP_ID=REPLICA rows, which need to be replicated. For example, a remote transaction can fire local triggers which generate local rows. If these triggers generate rows which need to be replicated, the transaction has to be aborted. In a subsequent patch I plan to add n_local_rows, which tracks the number of new rows with GROUP_ID=LOCAL, and use it in the txn_is_distributed() check.
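The counting scheme described above could look roughly like the sketch below. It is only an illustration: `struct txn_counters`, `txn_count_row()` and the group constants are hypothetical, and the body of `txn_is_distributed()` is an assumption about what "replayed rows mixed with new rows that must be replicated" means, not the actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Replication groups; the numeric values are assumptions of this sketch. */
enum { GROUP_DEFAULT = 0, GROUP_LOCAL = 1 };

/* Hypothetical, stripped-down transaction holding only the row counters. */
struct txn_counters {
	int n_new_rows;      /* rows created on this server */
	int n_local_rows;    /* subset of new rows that are not replicated */
	int n_applier_rows;  /* rows replayed from a remote WAL */
};

/* Account for one row; `from_applier` tells whether it is being replayed. */
static void
txn_count_row(struct txn_counters *txn, bool from_applier, int group_id)
{
	if (from_applier) {
		txn->n_applier_rows++;
		return;
	}
	txn->n_new_rows++;
	if (group_id == GROUP_LOCAL)
		txn->n_local_rows++;
}

/*
 * Constant-time check (assumed semantics): the transaction mixes replayed
 * rows with new rows that would have to be replicated.
 */
static bool
txn_is_distributed(const struct txn_counters *txn)
{
	return txn->n_applier_rows > 0 &&
	       txn->n_new_rows > txn->n_local_rows;
}
```

Keeping the counters up to date on every row makes the check a pair of integer comparisons instead of a walk over the statement list.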
-
Georgy Kirichenko authored
Disallow changes for non-local spaces during replication stream applying. As we do not support distributed transactions yet, we cannot provide transactional replication for such side effects if they are not NOPed. Needed for: #2798 Follow up for: 27283deb
-
- Mar 27, 2019
-
-
Georgy Kirichenko authored
Allow single-statement transactions within begin/commit in case of a DDL operation instead of requiring autocommit. This is essential for a transactional applier. Needed for: #2798
-
- Mar 11, 2019
-
-
Georgy Kirichenko authored
Form a separate transaction with local changes in case of replication. This is important because we should be able to replicate such changes (e.g. made within an on_replace trigger) back. Otherwise, local changes would be incorporated into the originating transaction and would be skipped by the originator replica. Needed for #2798
-
- Feb 22, 2019
-
-
Konstantin Osipov authored
Follow up on the patch adding transaction boundaries to xrow stream. Use tsn as an abbreviation for transaction identifier (transaction sequence number). It is an important enough concept to use a short and a convenient tag name for. Deploy the name across the code - in names and comments. Clarify comments. Still use box_txn_id() as API method since box_tsn() and box_txn() would be too easy to mistype.
-
- Dec 12, 2018
-
-
Vladimir Davydov authored
There are a few warning messages that can easily flood the log, making it more difficult to figure out what causes the problem. Those are
- too long WAL write
- waited for ... bytes of vinyl memory quota for too long
- get/select(...) => ... took too long
- readahead limit is reached
- net_msg_max limit is reached
Actually, it's pointless to print each and every one of them, because all messages of the same kind are similar and don't convey any additional information. So this patch limits the rate at which those messages may be printed. To achieve that, it introduces the say_ratelimited() helper, which works exactly like say() except it does nothing if too many messages of the same kind have already been printed in the last few seconds. The implementation is trivial - say_ratelimited() defines a static ratelimit state variable at its call site (it's a macro) and checks it before logging anything. If the ratelimit state says that an event may be emitted, it will log the message, otherwise it will skip it and eventually print the total number of skipped messages instead. The rate limit is set to 10 messages per 5 seconds for each kind of warning message enumerated above.
Here's how it looks in the log:
2018-12-11 18:07:21.830 [30404] iproto iproto.cc:524 W> stopping input on connection fd 15, aka 127.0.0.1:12345, peer of 127.0.0.1:59212, readahead limit is reached
2018-12-11 18:07:21.831 [30404] iproto iproto.cc:524 W> stopping input on connection fd 15, aka 127.0.0.1:12345, peer of 127.0.0.1:59212, readahead limit is reached
2018-12-11 18:07:21.831 [30404] iproto iproto.cc:524 W> stopping input on connection fd 15, aka 127.0.0.1:12345, peer of 127.0.0.1:59212, readahead limit is reached
2018-12-11 18:07:21.831 [30404] iproto iproto.cc:524 W> stopping input on connection fd 15, aka 127.0.0.1:12345, peer of 127.0.0.1:59212, readahead limit is reached
2018-12-11 18:07:21.831 [30404] iproto iproto.cc:524 W> stopping input on connection fd 15, aka 127.0.0.1:12345, peer of 127.0.0.1:59212, readahead limit is reached
2018-12-11 18:07:21.832 [30404] iproto iproto.cc:524 W> stopping input on connection fd 15, aka 127.0.0.1:12345, peer of 127.0.0.1:59212, readahead limit is reached
2018-12-11 18:07:21.832 [30404] iproto iproto.cc:524 W> stopping input on connection fd 15, aka 127.0.0.1:12345, peer of 127.0.0.1:59212, readahead limit is reached
2018-12-11 18:07:21.832 [30404] iproto iproto.cc:524 W> stopping input on connection fd 15, aka 127.0.0.1:12345, peer of 127.0.0.1:59212, readahead limit is reached
2018-12-11 18:07:21.832 [30404] iproto iproto.cc:524 W> stopping input on connection fd 15, aka 127.0.0.1:12345, peer of 127.0.0.1:59212, readahead limit is reached
2018-12-11 18:07:21.832 [30404] iproto iproto.cc:524 W> stopping input on connection fd 15, aka 127.0.0.1:12345, peer of 127.0.0.1:59212, readahead limit is reached
2018-12-11 18:07:26.851 [30404] iproto iproto.cc:524 W> 9635 messages suppressed
Closes #2218
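A macro with per-call-site static state, as described in this commit, might be sketched as follows. This is not the actual say_ratelimited() code: `struct ratelimit`, `ratelimit_check()` and `log_ratelimited()` are hypothetical names, the caller passes the current time explicitly to keep the sketch simple, and the limits (10 messages per 5 seconds) are taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-call-site state; field names are assumptions. */
struct ratelimit {
	double interval;  /* window length, seconds */
	int burst;        /* max events emitted per window */
	int emitted;      /* events emitted in the current window */
	int suppressed;   /* events skipped in the current window */
	double start;     /* start time of the current window */
};

/*
 * Return true if an event may be emitted at time `now`. When a new window
 * starts, the number of events skipped in the old one goes to *suppressed.
 */
static bool
ratelimit_check(struct ratelimit *rl, double now, int *suppressed)
{
	*suppressed = 0;
	if (now > rl->start + rl->interval) {
		*suppressed = rl->suppressed;
		rl->suppressed = 0;
		rl->emitted = 0;
		rl->start = now;
	}
	if (rl->emitted < rl->burst) {
		rl->emitted++;
		return true;
	}
	rl->suppressed++;
	return false;
}

/* A macro, so that every call site gets its own static state. */
#define log_ratelimited(now, ...) do {					\
	static struct ratelimit rl_ = {5.0, 10, 0, 0, 0.0};		\
	int skipped_;							\
	bool emit_ = ratelimit_check(&rl_, (now), &skipped_);		\
	if (skipped_ > 0)						\
		fprintf(stderr, "%d messages suppressed\n", skipped_);	\
	if (emit_)							\
		fprintf(stderr, __VA_ARGS__);				\
} while (0)
```

Because the state lives in a `static` variable inside the macro expansion, two different warnings never share a budget, which matches the "for each kind of warning message" wording above.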
-
- Nov 09, 2018
-
-
Vladislav Shpilevoy authored
Now txn_commit is judge, jury and executioner. It both commits and rolls back data, and collects it by calling fiber_gc, which destroys the region. But SQL wants to use some transactional data after commit, namely autogenerated identifiers: a list of sequence values generated for autoincrement columns and explicit sequence:next() calls. It is possible to store the list in malloc'ed memory inside Vdbe, but that complicates deallocation. It is much more convenient to store all transactional data on the transaction memory region, so that it is freed together with fiber_gc. After this patch is applied, Vdbe takes care of txn memory deallocation in a finalizer routine. Between commit and finalization, transactional data can be serialized wherever needed. Needed for #2618
-
- Oct 25, 2018
-
-
Vladimir Davydov authored
This patch introduces a new xlog method xlog_fallocate() that makes sure that the requested amount of disk space is available at the current write position. It does that with posix_fallocate(). The new method is called before writing anything to WAL, see wal_fallocate(). In order not to invoke the system call too often, wal_fallocate() allocates disk space in big chunks (1 MB). The reason why I'm doing this is that I want to have a single and clearly defined point in the code to handle ENOSPC errors, where I could delete old WALs and retry (this is what #3397 is about). Needed for #3397
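The chunked reservation idea can be sketched like this. It is a minimal illustration under stated assumptions, not the actual xlog_fallocate()/wal_fallocate() code: `struct log_file`, `log_file_reserve()` and `log_file_advance()` are hypothetical, and only posix_fallocate() and the 1 MB chunk size come from the commit message.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>

enum { FALLOC_CHUNK = 1024 * 1024 }; /* 1 MB, as in the commit message */

/* Hypothetical log file state; not the actual struct xlog. */
struct log_file {
	int fd;
	off_t offset;     /* current write position */
	off_t allocated;  /* bytes already reserved past `offset` */
};

/*
 * Make sure `len` bytes are reserved at the current write position.
 * Returns 0 on success or an errno value (e.g. ENOSPC) on failure,
 * which is the return convention of posix_fallocate() itself.
 */
static int
log_file_reserve(struct log_file *f, size_t len)
{
	if ((off_t)len <= f->allocated)
		return 0; /* enough space reserved, no system call needed */
	/* Round the shortfall up to whole chunks to limit system calls. */
	off_t need = (off_t)len - f->allocated;
	off_t grow = (need + FALLOC_CHUNK - 1) / FALLOC_CHUNK * FALLOC_CHUNK;
	int rc = posix_fallocate(f->fd, f->offset + f->allocated, grow);
	if (rc != 0)
		return rc;
	f->allocated += grow;
	return 0;
}

/* Call after writing `len` bytes to account for the consumed space. */
static void
log_file_advance(struct log_file *f, size_t len)
{
	f->offset += (off_t)len;
	f->allocated -= (off_t)len;
	if (f->allocated < 0)
		f->allocated = 0;
}
```

A caller would invoke `log_file_reserve()` before every write and treat a non-zero return as the single place to react to ENOSPC, for example by deleting old files and retrying.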
-
- Aug 01, 2018
-
-
Vladimir Davydov authored
Add txn_is_first_statement() function, which returns true if this is the first statement of the transaction. The function is supposed to be used from on_replace trigger to detect transaction boundaries. Needed for #2129
-
- Jul 23, 2018
-
-
Vladimir Davydov authored
Sysview is a special engine that is used for filtering out objects that a user can't access due to lack of privileges. Since it's treated as a separate engine by the transaction manager, we can't query sysview spaces from a memtx/vinyl transaction. In particular, if called from a transaction, space:format() will return the error "A multi-statement transaction can not use multiple storage engines", which is inconvenient. To fix this, let's mark sysview engine with a new ENGINE_BYPASS_TX flag and make the transaction manager skip binding a transaction to an engine in case this flag is set. Closes #3528
-
- Jul 21, 2018
-
-
Vladimir Davydov authored
Currently, the way txn_stmt::old_tuple and new_tuple are referenced depends on the engine. For vinyl, the rules are straightforward: if txn_stmt::{old_tuple,new_tuple} is not NULL, then the reference to the corresponding tuple is elevated. Hence when a transaction is committed or rolled back, vinyl calls tuple_unref on both txn_stmt::old_tuple and new_tuple. For memtx, things are different: the engine doesn't explicitly increment the reference counter of the tuples - it simply sets them to the newly inserted tuple and the replaced tuple. On commit, the reference counter of the old tuple is decreased to delete the replaced tuple, while on rollback the reference counter of the new tuple is decreased to delete the new tuple. Because of this, we can't implement the blackhole engine (aka /dev/null) without implementing commit and rollback engine methods - even though such an engine doesn't store anything it still has to set the new_tuple for on_replace trigger and hence it is responsible for releasing it on commit or rollback. Since commit/rollback are rather inappropriate for this kind of engine, let's instead unify txn_stmt reference counting rules and make txn.c unreference the tuples no matter what engine is. This doesn't change vinyl, because it already conforms. For memtx, this means that we need to increase the reference counter when we insert a new tuple into a space - not a big deal as tuple_ref is almost free.
-
- Jul 10, 2018
-
-
Vladimir Davydov authored
This patch introduces a new space option, group_id, which defines how the space is replicated. If it is 0 (default), the space is replicated throughout the entire cluster. If it is 1, the space is replica local, i.e. all changes made to it are invisible to other replicas in the cluster. Currently, no other value is permitted, but in future we will use this option for setting up arbitrary replication groups in a cluster. The option can only be set on space creation and cannot be altered. Since the concept of replication groups hasn't been established yet, group_id isn't exposed to Lua. Instead, we use is_local flag, both in box.schema.space.create arguments and in box.space output. Technically, to support this feature, we introduce a new header key, IPROTO_GROUP_ID, which is set to the space group id for all rows corresponding to a space, both in xlog and in snap. Relay won't send snapshot rows whose group_id is 1. As for xlog rows, they are transformed to IPROTO_NOP so as to promote vclock on replicas without any actual data modification. The feature is currently supported for memtx spaces only, but it should be easy to implement it for vinyl spaces as well. Closes #3443 @TarantoolBot document Title: Document new space option - is_local If a space is created with is_local flag set in options, changes made to the space will be persisted, but won't be replicated.
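The relay behaviour described above (skip replica-local rows in snapshots, turn them into NOPs in xlogs) could be sketched as a small filter function. All names here (`struct row`, `relay_filter_row()`, `ROW_NOP`, the group constants) are hypothetical stand-ins for illustration and are not the actual relay code or IPROTO constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Group ids from the commit message: 0 is replicated, 1 is replica local. */
enum { GROUP_DEFAULT = 0, GROUP_LOCAL = 1 };

/* Hypothetical row types standing in for DML requests and IPROTO_NOP. */
enum row_type { ROW_DML, ROW_NOP };

/* Hypothetical, simplified row header. */
struct row {
	enum row_type type;
	int group_id;
	long lsn;
	size_t body_len;
};

/*
 * Decide what a relay sends for `row`. Returns false if the row must be
 * skipped entirely. A local xlog row is sent as a body-less NOP so that
 * the replica still advances its vclock (the LSN is kept).
 */
static bool
relay_filter_row(struct row *row, bool from_snapshot)
{
	if (row->group_id != GROUP_LOCAL)
		return true;
	if (from_snapshot)
		return false;
	row->type = ROW_NOP;
	row->body_len = 0;
	return true;
}
```

Only the row header is inspected here, which is consistent with the goal of not decoding request bodies in the relay.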
-
Konstantin Osipov authored
-
- Jul 09, 2018
-
-
Vladimir Davydov authored
Currently, IPROTO_NOP can only be generated by a before_replace trigger, when it returns the old tuple thus turning the original operation into a NOP. In such a case we know the space id and we write it to the request body. This allows us to dispatch NOP requests via DML route. As a part of replica local spaces feature, we will substitute requests operating on local spaces with NOP in relay in order to promote vclock on replicas without actual data modification. Since space_id is stored in request body, sending it to replicas would mean decoding the request body in relay, which is an overkill. To avoid that, let's separate NOP and DML paths and remove space_id from NOP requests. Needed for #3443
-
- Jul 03, 2018
-
-
Konstantin Osipov authored
Before this patch, memtx would silently roll back a multi-statement transaction on yield, switching the session to autocommit mode. It would do nothing in case yield happened in a sub-statement in auto-commit mode. This could lead to nasty/painful to debug side-effects in malformed Lua programs. Fix by adding a special transaction state - aborted, and enter this state in case of implicit yield. Check for what happens when a sub-statement yields. Check that yield trigger is removed by a rollback. Fixes gh-2631 Fixes gh-2528
-
- Jun 14, 2018
-
-
Vladislav Shpilevoy authored
Replace it with more specific structures and pointers in order to prepare for adding `net` storage. This makes the code working with fiber storage simpler, removes useless wrappers and casts, and in the next patch allows removing the broken session.sync and adding fiber sync. Note that under no circumstances is fiber.h allowed to include application-specific headers like session.h or txn.h. It is only allowed to declare a struct and add an opaque pointer to it.
-
- May 11, 2018
-
-
Nikita Pettik authored
This patch makes it possible to start a transaction in Lua and continue operations in SQL, and vice versa. Previously, such transactions resulted in an assertion fault. To support them, deferred foreign key constraints must be held as attributes of the transaction, not of a particular VDBE. Thus, deferred foreign key counters have been completely removed from VDBE and transferred to the sql_txn struct. In turn, if there is at least one deferred foreign key violation, an error will be raised along with rollback - that is what ANSI SQL says. Note that in SQLite rollback doesn't occur: the transaction remains open until explicit rollback or until all FK violations are resolved. Also, 'PRAGMA defer_foreign_keys' has been slightly changed: now it is not automatically turned off after the transaction's rollback or commit. It can be turned off by an explicit PRAGMA statement only. This was done because execution of a PRAGMA statement occurs in auto-commit mode, so it ends with COMMIT. Hence, it would turn off right after turning on (outside the transaction). Closes #3237
-
- Feb 08, 2018
-
-
Vladimir Davydov authored
There are two issues in the rollback code:
- txn_rollback_stmt() rolls back the current autocommit transaction even if it is called from a sub-statement. As a result, if a sub-statement (i.e. a statement called from a before_replace or on_replace trigger) fails (e.g. due to a conflict), it will trash the current transaction, leading to a bad memory access upon returning from the trigger.
- txn_begin_stmt() calls txn_rollback_stmt() on failure even if it did not instantiate the statement. So if it is called from a trigger and fails (e.g. due to nesting limit), it may trash the parent statement, again leading to a crash.
Fix them both and add some tests. Closes #3127
-
- Jan 23, 2018
-
-
Vladimir Davydov authored
If there are multiple on_replace triggers installed for the same space and one of them creates a new statement (by issuing a DML request), then the trigger that is called next will get the statement created by the previous trigger instead of the original statement. To fix that, let's patch txn_current_stmt() so as to return the first statement at the current transaction level instead of the last statement and use it in txn_commit_stmt(). This will also allow us to issue DML requests from a before_replace trigger without disrupting the current statement. Needed for #2993 Follow-up #3020
-
Vladimir Davydov authored
If space.on_replace callback fails (throws), we must roll back all statements inserted by the callback before failing and the statement that triggered the callback, while currently we only roll back the last statement on the txn->stmts list. To fix this, let's remember the position to roll back to in case of failure for each sub-statement, similarly to how we do with savepoints. Note, there's a comment to txn_rollback_stmt() that says that it doesn't remove the last statement from the txn->stmts list, just clears it:

    /**
     * Void all effects of the statement, but
     * keep it in the list - to maintain
     * limit on the number of statements in a
     * transaction.
     */
    void
    txn_rollback_stmt()

It isn't going to hold after this patch, because this patch reuses the savepoint code to roll back statements on failure. Anyway, I haven't managed to figure out why we would ever need to keep statements on the list after rollback. The comment is clearly misleading as we don't have any limits on the number of statements, and even if we had, a statement counter would suffice. I guess the real reason why we don't delete statements from the list is that it is simply impossible to do in case of a singly linked list like stailq, but now it isn't going to be a problem. So remove the comment and the test case of engine/savepoint which relies on it. Needed for #2993 Closes #3020
-
Vladimir Davydov authored
stailq_splice(head1, item, head2) moves elements from list 'head1' starting from 'item' to list 'head2'. To follow the protocol, it needs to know the element previous to 'item' in 'head1' so as to make it the new last element of 'head1'. To achieve that, it has to loop over 'head1', which is inefficient. Actually, wherever we use this function, we know in advance the element stailq_splice() has to look up, but we still pass the next one, making its life difficult and obscuring the code at the caller's side. For example, look at how stailq_splice() is used in txn.c:

    if (stmt == NULL) {
            rollback_stmts = txn->stmts;
            stailq_create(&txn->stmts);
    } else {
            stailq_create(&rollback_stmts);
            stmt = stailq_next_entry(stmt, next);
            stailq_splice(&txn->stmts, &stmt->next, &rollback_stmts);
    }

while under the hood stailq_splice() has the loop to find 'stmt':

    stailq_splice(struct stailq *head1, struct stailq_entry *elem,
                  struct stailq *head2)
    {
            if (elem) {
                    *head2->last = elem;
                    head2->last = head1->last;
                    head1->last = &head1->first;
                    while (*head1->last != elem)
                            head1->last = &(*head1->last)->next;
                    *head1->last = NULL;
            }
    }

This is utterly preposterous. Let's replace stailq_splice() with a new method with the same signature, but with a slightly different semantics: move all elements from list 'head1' starting from the element *following* 'item' to list 'head2'; if 'item' is NULL, move all elements from 'head1' to 'head2'. This greatly simplifies the code for both parties, as the callee doesn't have to loop any more while the caller doesn't have to handle the case when 'item' is NULL. Also, let's change the name of this function, because stailq_splice() sounds kinda confusing: after all, this function tears a list in two first and only then splices the tail with another list. Let's remove the 'splice' part altogether (anyway, there's another function for splicing lists - stailq_concat()) and call it stailq_cut_tail().
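The loop-free semantics proposed in this commit can be sketched as follows. This is an independent illustration, not the code from the patch: the list layout mirrors the `first`/`last` fields visible in the snippet above, `stailq_add_tail()` is included only to build a list, and the sketch assumes that `tail` is overwritten rather than appended to.

```c
#include <assert.h>
#include <stddef.h>

struct stailq_entry {
	struct stailq_entry *next;
};

struct stailq {
	struct stailq_entry *first;
	struct stailq_entry **last;  /* points at the terminating `next` slot */
};

static void
stailq_create(struct stailq *head)
{
	head->first = NULL;
	head->last = &head->first;
}

static void
stailq_add_tail(struct stailq *head, struct stailq_entry *item)
{
	item->next = NULL;
	*head->last = item;
	head->last = &item->next;
}

/*
 * Move everything that follows `last` from `head` into `tail`. A NULL
 * `last` moves the whole list. No loop is needed because the caller
 * passes the element that becomes the new end of `head`.
 */
static void
stailq_cut_tail(struct stailq *head, struct stailq_entry *last,
		struct stailq *tail)
{
	struct stailq_entry **slot = last != NULL ? &last->next : &head->first;
	if (*slot == NULL) {
		stailq_create(tail);  /* nothing to move */
		return;
	}
	tail->first = *slot;
	tail->last = head->last;
	*slot = NULL;
	head->last = slot;
}
```

With this shape the caller in the txn.c example would no longer need a separate branch for a NULL statement, since passing NULL simply moves the entire list.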
-
Vladimir Davydov authored
This flag isn't necessary as we can set txn_savepoint->stmt to NULL when a savepoint is created inside an empty transaction. Using a separate flag for this purpose obscures the code flow and complicates further progress so let's remove it.
-
- Jan 05, 2018
-
-
AKhatskevich authored
This commit introduces a number of changes:
1. Move the transaction state to a fiber-local struct. This is important because `sqlite3` is a shared structure and it was used to store data related to the transaction. It worked nonetheless, because yield is called only on commit, which guaranteed unique access. (With possible effects on ddl.) NOTE: `nDeferredCons` and `nDeferredImmCons` are stored in vdbe during vdbe execution and moved to sql_txn when they need to be saved until execution of the next vdbe in the same transaction.
2. Support savepoints.
2.1. Support abort (anonymous savepoints). The abort mechanism was simplified. Instead of keeping track of all savepoints without a name, this commit introduces `anonymous_savepoint`. `anonymous_savepoint` is a structure which is stored in Vdbe and represents the state of the database at the beginning of the current statement. Tarantool disallows multistatement, so a vdbe can have one statement max. That is why having one savepoint is enough to perform abort.
2.2. Named savepoints. Key points:
- It uses Tarantool's savepoints
- It allocates savepoints on the "region" (they are destroyed automatically)
- There are some crutches around ddl (ddl should not be placed into a transaction)
Closes #2989 #2931 #2964
-
- Dec 06, 2017
-
-
Vladimir Davydov authored
Report the LSN and the number of rows that caused the delay so that the admin can find the problematic record in the xlog. Example: too long WAL write: 3 rows at LSN 65: 0.003 sec Closes #2743
-
- Oct 19, 2017
-
-
Vladimir Davydov authored
> src/box/txn.c:454:40: error: '_Alignof' applied to an expression is a GNU extension [-Werror,-Wgnu-alignof-expression]
> diag_set(OutOfMemory, sizeof(*svp) + alignof(*svp) - 1,
> ^

Do not try to be smart and guess allocation size using alignof.

> src/box/memtx_tree.c:391:11: error: comparison of unsigned enum expression < 0 is always false [-Werror,-Wtautological-compare]
> if (type < 0 || type > ITER_GT) { /* Unsupported type */
> ~~~~ ^ ~
> src/box/vinyl_index.c:184:29: error: comparison of unsigned enum expression < 0 is always false [-Werror,-Wtautological-compare]
> if (type > ITER_GT || type < 0) {
> ~~~~ ^ ~

Move the check for illegal params (i.e. 'type < 0') to the box API. In index callbacks, only check that the iterator type is supported by the index.
-
Vladimir Davydov authored
The current internal txn API is not particularly user-friendly with regard to error handling: if txn_begin_stmt(), txn_commit_stmt(), or txn_commit() fails, the txn state is undefined and the caller must roll back manually. To make the API easier to use, let's make these functions roll back automatically in case of failure. Needed for #2776
-
- Oct 15, 2017
-
-
Vladimir Davydov authored
Needed for #2776
-
Vladimir Davydov authored
Preparation for converting txn to C. This will help removing exceptions from txn methods. Needed for #2776
-
Vladimir Davydov authored
To convert engine and space infrastructure to C, we need to make txn usable from C code. The only reason why we can't convert txn.cc to C right now is triggers: they are set from alter.cc and may throw exceptions. To handle this, let's make trigger_run() catch all exceptions and return an error code instead. For C++ code, introduce trigger_run_xc() which calls trigger_run() under the hood and throws the exception suppressed by it. Needed for #2776
-
Vladimir Davydov authored
Preparation for converting engine implementation to C. This will help removing exceptions from engine callbacks. Needed for #2776
-
Vladimir Davydov authored
Preparation for converting class Engine to plain struct. Needed for #2776
-
Vladimir Davydov authored
Rename MemtxEngine, VinylEngine, and SysviewEngine to memtx_engine, vinyl_engine, and sysview_engine as well. Preparation for converting class Engine to plain struct. Needed for #2776
-
- Oct 06, 2017
-
-
Vladimir Davydov authored
space->handler->engine->... looks ugly. Since all Handler methods receive space as an argument anyway, let's move engine to struct space. This is a step towards converting class Handler to struct space_vtab. Needed for #2776
-
- Sep 08, 2017
-
-
Vladislav Shpilevoy authored
Closes #2746
-
- Sep 05, 2017
-
-
Konstantin Osipov authored
* update error messages * rename variables * add a few comments
-
Vladislav Shpilevoy authored
A savepoint allows partially rolling back a transaction. After creating a savepoint, a transaction owner can roll back all changes applied after the savepoint without rolling back the entire transaction. Multiple savepoints can be created in each transaction. Rollback to a savepoint cancels changes made after the savepoint and deletes all newer savepoints. It is impossible to roll back to a savepoint from a sub-statement level different from the savepoint's one. For example, a transaction cannot roll back to a savepoint created outside of a trigger from a trigger body. Closes #2025
-
Vladislav Shpilevoy authored
Vinyl cannot calculate bsize during transaction execution because of DELETE and UPSERT in vinyl spaces with a single index. Move space bsize into MemtxSpace, because Vinyl cannot calculate it now. In the future, Vinyl bsize can be calculated after dumps and compactions, but never during transaction execution.
-
- Aug 22, 2017
-
-
Vladimir Davydov authored
We should use ev_monotonic_now()/ev_monotonic_time() instead of ev_now()/ev_time() for calculating timeouts, because the latter are affected by system time changes so that using them for timeouts can lead to unexpected hangs in case system time changes. Needed for #2527
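The same idea can be illustrated outside of the event loop with the POSIX monotonic clock. This sketch swaps in clock_gettime(CLOCK_MONOTONIC) for the ev_monotonic_*() functions named above, and the helper names (`monotonic_now()`, `deadline_after()`, `deadline_expired()`) are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Seconds from a clock that does not jump when the wall clock is changed. */
static double
monotonic_now(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Compute an absolute deadline `timeout` seconds from now. */
static double
deadline_after(double timeout)
{
	return monotonic_now() + timeout;
}

/* Check the deadline against the same monotonic clock it was built from. */
static bool
deadline_expired(double deadline)
{
	return monotonic_now() >= deadline;
}
```

Had the deadline been computed from wall-clock time, setting the system clock back by an hour would keep `deadline_expired()` false for an extra hour, which is the kind of hang the commit message warns about.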
-
- Jul 28, 2017
-
-
Vladislav Shpilevoy authored
-