Commits · 7bf7a0d16325418a4a55f94c41945c74c3989e0c · core / tarantool

Sep 11, 2017

replication: add a test case for timeouts · 7bf7a0d1
Georgy Kirichenko authored 7 years ago
```
Fixes #2707
```

Configure applier timeouts via box.cfg · 42f8d9d8

Set the applier reconnect delay and ack interval (heartbeat interval) via
the box.cfg replication_timeout parameter. The relay timeout (the time interval
without heartbeat messages) is four times replication_timeout,
so up to three heartbeat messages can be skipped before the connection is closed.
Fixed #2708
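
The relation between the two timeouts can be sketched as a toy Python model. This is not Tarantool code: the function names and the strict comparison at the boundary are assumptions, and only the factor of four comes from the commit message.

```python
# Toy model of the timeouts described above; not Tarantool code.
RELAY_TIMEOUT_FACTOR = 4  # relay timeout is four times replication_timeout


def relay_timeout(replication_timeout: float) -> float:
    """Longest silence the relay tolerates before closing the connection."""
    return RELAY_TIMEOUT_FACTOR * replication_timeout


def connection_alive(last_heartbeat: float, now: float,
                     replication_timeout: float) -> bool:
    """True while the silence since the last heartbeat is below the relay timeout.

    The strict '<' is an assumption; the commit does not specify the boundary.
    """
    return now - last_heartbeat < relay_timeout(replication_timeout)
```

With replication_timeout set to 1 second, heartbeats expected at 1, 2 and 3 seconds may all be missing and the connection is still considered alive in this model; it is dropped once 4 seconds pass in silence.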


vinyl: fix read iterator restoration to a newer version of the same key · 61ffe793

Vladimir Davydov authored 7 years ago

After the read iterator selects the minimal key across all available
sources, it checks mutable sources for new statements using ->restore()
callback. If there is a new statement in a source, it uses it as the min
key provided it is *strictly* less than the current min key. If they are
equal, the min key isn't changed, but this is wrong, because the new
statement may be newer than the statement selected previously. If we
don't select it, we might end up with stale data in the cache. Fix this.
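
The difference between the old and the fixed comparison can be illustrated with a small Python model. It assumes forward iteration and that a larger LSN means a newer version; `Stmt`, `pick_min_old` and `pick_min_fixed` are hypothetical names, not functions from the Vinyl sources.

```python
from typing import NamedTuple


class Stmt(NamedTuple):
    key: int
    lsn: int  # assumption: a larger LSN means a newer version of the key


def pick_min_old(current: Stmt, restored: Stmt) -> Stmt:
    """Behaviour before the fix: only a strictly smaller key replaces the min."""
    return restored if restored.key < current.key else current


def pick_min_fixed(current: Stmt, restored: Stmt) -> Stmt:
    """Behaviour after the fix: an equal key with a newer version also wins."""
    if restored.key < current.key:
        return restored
    if restored.key == current.key and restored.lsn > current.lsn:
        return restored
    return current
```

For a current minimum `Stmt(10, 5)` and a restored statement `Stmt(10, 7)`, the old rule keeps the stale version with LSN 5, while the fixed rule selects the newer one.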


vinyl: simplify cache iterator restore · 557b4026

Vladimir Davydov authored 7 years ago

Since ->restore() is not used by the read iterator to start iteration
any more, we can remove the corresponding code from the cache iterator
->restore() callback. Although it might be tempting to simplify it even
more by doing a full lookup every time the cache version changes, as we
already do in case of memory and txw iterators, it doesn't seem to be a
sound idea, because the read iterator itself can change the cache
version on each iteration by inserting new elements into the cache, even
if there were no disk accesses.


vinyl: simplify txw iterator restore · d7342aa5

Vladimir Davydov authored 7 years ago

We don't need to handle iterator restart in the ->restore() callback, so
we can remove the corresponding code. Also, let's reuse the start
iteration function for restoration, because the two cases are in fact
equivalent.


vinyl: simplify run iterator restore · eebba30e

Vladimir Davydov authored 7 years ago

After the recent changes in the read iterator, the ->restore() callback
does not need to handle the case of iterator restart any more. Taking
this into account and keeping in mind that on-disk runs are immutable,
we can turn the run iterator ->restore() callback into no-op.


vinyl: simplify memory iterator restore · 44a62002

Vladimir Davydov authored 7 years ago

To avoid lookup in the memory tree, the memory iterator ->restore()
callback tries to walk from the current iterator position to the first
statement matching the restoration criteria. Such an optimization
complicates the restoration procedure beyond comprehension and makes it
extremely error prone. Ironically, all this complexity seems to be
pointless, because a change in the memory tree means either a disk
access, which is orders of magnitude more expensive than a memory
lookup, or an insertion of a new statement into the tree, which has
exactly the same complexity as a lookup. That said, let's rewrite the
restoration procedure so that it always does a full lookup in case the
version of the memory tree has changed.

Also, remove handling of iterator restart and the corresponding test
case, as the ->restore() callback does not need to handle it any more.
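
The "always do a full lookup when the version changed" approach can be sketched in Python. A sorted list stands in for the memory tree; `MemTree` and `MemIterator` are hypothetical names, and only forward iteration is modelled.

```python
import bisect


class MemTree:
    """Sorted list of keys with a version bumped on every insertion."""

    def __init__(self):
        self.keys = []
        self.version = 0

    def insert(self, key):
        bisect.insort(self.keys, key)
        self.version += 1


class MemIterator:
    """Forward iterator that repositions itself with a full lookup."""

    def __init__(self, tree):
        self.tree = tree
        self.version = tree.version
        self.pos = 0  # index of the next key to return

    def next(self):
        if self.pos >= len(self.tree.keys):
            return None
        key = self.tree.keys[self.pos]
        self.pos += 1
        return key

    def restore(self, last_key):
        """Reposition after last_key; return True if a lookup was needed."""
        if self.version == self.tree.version:
            return False  # nothing changed, the position is still valid
        self.version = self.tree.version
        # Full lookup: first key strictly greater than the last returned one.
        self.pos = bisect.bisect_right(self.tree.keys, last_key)
        return True
```

If the tree holds 10 and 30, the iterator has returned 10, and 20 is then inserted, `restore(10)` repositions the iterator so that the next key returned is 20. No attempt is made to walk from the old position.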


vinyl: do not use restore callback for starting read iterator source · 0feb32b8

Vladimir Davydov authored 7 years ago

Apart from restoring the iterator position in case the source changed,
the vy_stmt_iterator_iface->restore callback is also used for starting
iteration in vy_merge_iterator_next_key() even though next_key() can be
used instead. Let's rewrite the function so that it uses next_key()
instead of restore() where appropriate. This will allow us to simplify
restore() by making it handle nothing but iterator restoration.


Sep 10, 2017

vinyl: do not use restore callback for restarting read iterator · 7675313a

Vladimir Davydov authored 7 years ago

The read iterator has to restart (i.e. reopen all its sources) from the
position last returned to the caller when the current range or the whole
range tree changes as a result of dump or compaction. To reposition the
iterator, we use vy_stmt_iterator_iface->restore callback, which was
initially designed to restore an individual merge source (txw, mem, or
cache) after a statement is added to or removed from it. Abusing the
callback like that complicates its implementation as well as the read
iterator itself. We can avoid that by simply reopening merge sources
with the proper key when we need to restart the read iterator.


vinyl: zap vy_stmt_iterator_iface->cleanup · 8cb06ffe

Vladimir Davydov authored 7 years ago

The 'cleanup' callback is always called together with 'close'. The two
callbacks were separated a long time ago, when vy_merge_iterator was used
for writing runs. There is no point in keeping them apart any more.


Sep 08, 2017
- Allow multiple rollback to the same savepoint · e939c6ea
  Vladislav Shpilevoy authored 7 years ago
  
  Closes #2746
Sep 07, 2017

vinyl: fix column mask when statement is overwritten in transaction · 960aad6a

Vladimir Davydov authored 7 years ago

The statement generated by the following piece of code ({1, 1, 2}) isn't dumped
to the secondary index:

    s = box.schema.space.create('test', {engine = 'vinyl'})
    s:create_index('i1', {parts = {1, 'unsigned'}})
    s:create_index('i2', {parts = {2, 'unsigned'}})

    box.begin()
    s:insert{1, 1, 1}
    s:update(1, {{'+', 3, 1}})
    box.commit()

This happens because UPDATE is replaced with DELETE + REPLACE in the
transaction log, both of which have column_mask = 0x04 (field #3 is
updated). These statements overwrite the original INSERT in the memory
index on commit, but they are not dumped, because their column_mask does
not intersect with the column mask of the secondary index (0x02).

To avoid that, the new statement (UPDATE = DELETE + REPLACE in this
case) must inherit the column mask of the overwritten statement
(REPLACE).

Fixes #2745
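
The mask arithmetic behind the bug can be checked with a few lines of Python. Two things here are assumptions and not taken from the commit: the overwritten INSERT is given an all-ones mask, and "inheriting" is modelled as a bitwise OR of the two masks.

```python
def field_mask(*field_numbers: int) -> int:
    """Bitmask with bit (n - 1) set for each 1-based field number n."""
    mask = 0
    for n in field_numbers:
        mask |= 1 << (n - 1)
    return mask


def must_dump_to_index(stmt_mask: int, index_mask: int) -> bool:
    """A statement is dumped to an index only if the two masks intersect."""
    return (stmt_mask & index_mask) != 0


ALL_FIELDS = 0xFFFFFFFFFFFFFFFF  # assumed mask of a statement setting every field

update_mask = field_mask(3)  # 0x04: the UPDATE changes field #3
index_mask = field_mask(2)   # 0x02: index i2 is built over field #2
insert_mask = ALL_FIELDS     # the overwritten INSERT touches all fields

# Before the fix the overwriting statement keeps only its own mask.
skipped = not must_dump_to_index(update_mask, index_mask)

# After the fix it inherits the mask of the statement it overwrites
# (modelled as OR, which is an assumption about the real implementation).
inherited_mask = update_mask | insert_mask
dumped = must_dump_to_index(inherited_mask, index_mask)
```

Since 0x04 & 0x02 is zero, `skipped` is true, which matches the missing {1, 1, 2} in the secondary index; with the inherited mask the intersection is non-zero and the statement is dumped.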


Sep 06, 2017
- lua socket: add basic LuaSocket emulation · bb6170e4
  Roman Tsisyk authored 7 years ago
  
  Emulate the http://w3.impa.br/~diego/software/luasocket/tcp.html API. Needed for MobDebug. Closes #2727
- lua socket: add support for TCP options · d76ded40
  Roman Tsisyk authored 7 years ago
  
  Closes #598
- lua socket: group socket methods together · ba6c0c16
  Roman Tsisyk authored 7 years ago
  
  No semantic changes. In context of #2727
- lua socket: declare all methods as local functions · b9b5724b
  Roman Tsisyk authored 7 years ago
  
  No semantic changes. Needed for #2727
Sep 05, 2017

box: review fixes for gh-2025 (savepoints) · 09ba3ca3
Konstantin Osipov authored 7 years ago
```
* update error messages
* rename variables
* add a few comments
```

Implement box.savepoint · ff0ff603

Vladislav Shpilevoy authored 7 years ago

A savepoint allows a transaction to be partially rolled back. After
creating a savepoint, the transaction owner can roll back all changes applied
after the savepoint without rolling back the entire transaction.

Multiple savepoints can be created in each transaction. Rollback to
a savepoint cancels changes made after the savepoint, and deletes all
newer savepoints.

It is impossible to roll back to a savepoint from a substatement level
different from that of the savepoint. For example, a transaction cannot
roll back, from a trigger body, to a savepoint created outside of the trigger.

Closes #2025
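
The rollback semantics can be modelled in a few lines of Python. `Transaction` and its methods are hypothetical and are not the box API; the substatement-level restriction is not modelled.

```python
class Transaction:
    """Toy transaction: a list of applied changes plus savepoints."""

    def __init__(self):
        self.changes = []
        self.savepoints = []  # each savepoint stores the change count at creation

    def apply(self, change):
        self.changes.append(change)

    def savepoint(self):
        self.savepoints.append(len(self.changes))
        return len(self.savepoints) - 1  # handle = index in the savepoint list

    def rollback_to_savepoint(self, handle):
        if handle >= len(self.savepoints):
            raise ValueError("savepoint no longer exists")
        del self.changes[self.savepoints[handle]:]  # cancel later changes
        del self.savepoints[handle + 1:]            # delete newer savepoints
```

After applying "a", creating savepoint s1, applying "b", creating savepoint s2 and applying "c", rolling back to s1 leaves only "a" in the transaction, and s2 can no longer be used.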


Make space bsize virtual · 0ddfedcb

Vladislav Shpilevoy authored 7 years ago

Vinyl can not calculate bsize during transaction execution
because of DELETE and UPSERT in vinyl spaces with a single index.

Move space bsize into MemtxSpace, because Vinyl can not calculate
it now.

In the future, Vinyl bsize can be calculated after dumps and
compactions, but never during transaction execution.


vinyl: review fixes for tx interval tracking · ca31811a

Vladimir Davydov authored 7 years ago

Address issues spotted by Alex Lyapunov:

 - Fix key part count computation in vy_read_interval_cmp[lr]()
   and vy_read_interval_should_merge() and add the corresponding
   test case.
 - Simplify comparison in vy_read_interval_cmp[lr]().
 - Improve comment to vy_tx_track().

See #2671


CMake: detect missing submodules properly · 2d9afbcb
Roman Tsisyk authored 7 years ago
```
Fix misleading "C atomics not supported" when git submodules
are missing.

Closes #2088
```

Sep 04, 2017

Debian: install /usr/bin/tarantool as lua interpreter · a305d08e

Roman Tsisyk authored 7 years ago

Since #1265 tarantool is fully compatible with lua5.1.
Install /usr/bin/tarantool as /usr/bin/lua alternative.

Closes #2730


test: vinyl/mvcc: check uniqueness constraint for secondary indexes · bb358cbb

Vladimir Davydov authored 7 years ago

The check was accidentally broken by commit eb5cd536 ("vinyl: do not
track partial reads in tx manager"). Add a test case to avoid similar
screw-ups in future.

See #2716


test: vinyl/hermitage: test gap locks · baea6c3a

Vladimir Davydov authored 7 years ago

There are two cases in the hermitage test that check gap locks - PMP
(predicate with many preceders) and G4 (anti-dependency cycles). As we
didn't have gap locks, we used get() to put a non-existent value to the
conflict set. Now we can use select(*) instead.


test: vinyl/hermitage: use truncate on teardown · 45063496
Vladimir Davydov authored 7 years ago


vinyl: track key intervals in conflict manager · 785a1bdb

Vladimir Davydov authored 7 years ago

Currently, the conflict manager only tracks keys returned by the read
iterator, so Vinyl isn't really serializable as select() can return
phantom records, e.g.

  space: {10}, {20}, {30}, {40}, {50}

  Transaction 1                         Transaction 2
  -------------                         -------------
  box.begin()
  space:select({30}, {iterator='GE'})
  -- returns {30}, {40}, {50}
                                        box.begin()
                                        space:insert{35}
                                        space:insert{45}
                                        space:insert{55}
                                        box.commit()
  space:select({30}, {iterator='GE'})
  -- returns {30}, {35}, {40}, {45}, {50}, {55};
  -- were it serializable, the transaction would
  -- be sent to read view so that this select()
  -- would return the same set of values as the
  -- previous one
  box.commit()

Besides, tracking individual keys read by a transaction can be very
expensive from the memory consumption point of view: think of calling
select(*) on a big space.

So this patch makes the conflict manager track intervals instead of
individual keys. To achieve that it splits tx_manager->read_set in two:

 - vy_tx->read_set. Contains intervals read by a transaction. Needed to
   efficiently search intervals that should be merged with a new one.
   Intervals in this tree cannot intersect.

 - vy_index->read_set. Contains intervals read by all transactions from
   an index. Needed to efficiently search transactions that conflict
   with a write. Intervals can intersect.

When vy_tx_track() is called, it first looks up all intervals
intersecting with the new interval in vy_tx->read_set, removes them, and
extends the new interval to span them. Then it inserts the new interval
into both vy_index->read_set and vy_tx->read_set. The vy_index->read_set
is used on commit to send all transactions that read intervals modified
by the committed statement to read view.

Note, now we don't differentiate 'gaps', i.e. non-existent keys read by
a transaction. Gaps were used to avoid aborting a transaction if a
non-existent key read by it is deleted. We can't track gaps without
bloating the read set on select(*).

Closes #2671
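
The merge step performed for the per-transaction read set can be sketched in Python. A sorted list of closed intervals stands in for the tree; `track` and `conflicts` are hypothetical helpers, and open or half-open bounds are not modelled.

```python
import math


def track(read_set, left, right):
    """Add the closed interval [left, right] to read_set, merging overlaps.

    read_set is a list of non-intersecting (left, right) tuples; a new
    list sorted by left bound is returned.
    """
    kept = []
    for lo, hi in read_set:
        if hi < left or right < lo:
            kept.append((lo, hi))   # no intersection, keep as is
        else:
            left = min(left, lo)    # extend the new interval so that it
            right = max(right, hi)  # spans the interval being removed
    kept.append((left, right))
    kept.sort()
    return kept


def conflicts(read_set, key):
    """True if a write to key falls into an interval of read_set."""
    return any(lo <= key <= hi for lo, hi in read_set)


# select({30}, {iterator='GE'}) reads everything from 30 upwards.
tx1_read_set = track([], 30, math.inf)
```

In this model the inserts of 35, 45 and 55 from the example all fall into the interval tracked for transaction 1, so `conflicts(tx1_read_set, 35)` is true even though 35 did not exist when the select ran.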


vinyl: convert GT/LT to GE/LE for empty key in vy_read_iterator · 1fb352ac

Vladimir Davydov authored 7 years ago

Currently, this is done in each plain iterator (run, mem, txw, cache).
To handle the empty search key the same way as non-empty keys when
setting a gap lock, this needs to be handled in vy_read_iterator.

Needed for #2671
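
The conversion itself is small; a Python sketch follows. `normalize_iterator` is a hypothetical name, and iterator types are written as plain strings here instead of the ITER_* constants.

```python
def normalize_iterator(iterator_type: str, key: tuple) -> str:
    """For an empty key, GT and LT select the same tuples as GE and LE."""
    if len(key) == 0:
        return {"GT": "GE", "LT": "LE"}.get(iterator_type, iterator_type)
    return iterator_type
```

Doing this once in the read iterator means the code that sets a gap lock sees the same iterator type for empty and non-empty keys.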


vinyl: move ITER_REQ handling from vy_cursor to vy_read_iterator · adefc586

Vladimir Davydov authored 7 years ago

To set a gap lock properly, the read iterator needs to discern ITER_REQ
from ITER_LE, which is used by vy_cursor instead of ITER_REQ.

Needed for #2671


Update small to bring augmented rb tree · f48afd5e
Vladimir Davydov authored 7 years ago


Sep 01, 2017

cbus: implement trigger on flush event · 6b860278

Vladislav Shpilevoy authored 7 years ago

The trigger is called when the flush callback sends messages to
the consumer pipe (in cpipe_flush_cb, if the message queue is not empty).

Needed for #946 to send buffers from tx to iproto.


alter: commit old index drop and new index create on index rebuild · c8ac2d90

Vladimir Davydov authored 7 years ago

To rebuild an index when its key def changes, we effectively drop it and
create a new index instead. Skipping Index::commitDrop and commitCreate
stages at this point deprives Vinyl of an opportunity to log the change
in the metadata log and replace the index in the scheduler, which leads
to a crash. This patch adds the commit stage to RebuildIndex which calls
the above-mentioned commitDrop and commitCreate for the old and the new
indexes, respectively.

There is a nuance here. Memtx piggybacks Index::commitDrop to drop space
tuples when the primary index is dropped. This is actually wrong, because
tuples belong to a space, not to an index. Besides, it prevents us from
just calling Index::commitDrop() from RebuildIndex::commit() as is,
because RebuildIndex does not modify space data, it just moves space
tuples to a new index. To circumvent this, let us remove commitDrop()
method from MemtxIndex and drop space tuples directly from MemtxSpace's
commitTruncateSpace() and commitAlterSpace().


Aug 30, 2017

select: do not fetch the key following the last one from the engine · 09cbbce6

Vladimir Davydov authored 7 years ago

space:select(key, {limit = N}) limits the output to N keys, but it still
fetches the (N+1)-th key from the engine. This is pointless. Besides,
this can result in a conflict in Vinyl as Vinyl adds all keys returned
by the iterator to the conflict manager.
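
The extra fetch can be reproduced with a counting iterator in Python. The loop in `select_old` is only an assumed shape of the old behaviour, not the actual box code; all names are hypothetical.

```python
from itertools import islice


class CountingIterator:
    """Wraps an iterable and counts how many keys were fetched from it."""

    def __init__(self, keys):
        self._it = iter(keys)
        self.fetched = 0

    def __iter__(self):
        return self

    def __next__(self):
        key = next(self._it)
        self.fetched += 1
        return key


def select_old(iterator, limit):
    """Checks the limit only after fetching, so the (N+1)-th key is read."""
    result = []
    for key in iterator:
        if len(result) >= limit:
            break
        result.append(key)
    return result


def select_new(iterator, limit):
    """Stops as soon as N keys are collected, without another fetch."""
    return list(islice(iterator, limit))
```

With keys 1 to 4 and a limit of 2, both functions return the first two keys, but the old loop fetches three keys from the iterator while the new one fetches exactly two. In Vinyl that third key would have ended up in the conflict manager.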


Fix typo in tuple.h · 30cd760e
Vladislav Shpilevoy authored 7 years ago


Aug 28, 2017

box: remove unused FIELD_TYPE_MAX · 779d1cf0

Roman Tsisyk authored 7 years ago

Patch aa549401 "Split key_def.h/.cc" accidentally added
FIELD_TYPE_MAP member to `enum field_type`. Currently this
enum is only used to define index parts. We don't support the
'map' indexed field type, at least in 1.7.x.

See #2652


test: fix format errors in tests · 720c17f8
Vladislav Shpilevoy authored 7 years ago
```
Part of #2652
```

Aug 24, 2017

alter: extract opts_parse_key from opts_create_from_field · 88c3e945

Vladislav Shpilevoy authored 7 years ago

In the next patches, field_def will be parsed from space:format.
field_def will contain char *name, which is limited by BOX_NAME_MAX = 65000,
so neither opt_type OPT_STR nor OPT_STRPTR can be used to parse this name from
space:format. Besides, field_def contains enum field_type, which cannot
be parsed using any opt_type.
Also, field_def will contain default_value, which can store values of
many types.

The proposal is to use opts_create_from_field not for the entire field_def, but only
for several fields, using opts_parse_key, and to parse the other options
manually.


Split key_def.h/.cc · aa549401
Vladislav Shpilevoy authored 7 years ago
```
Needed for #2652
```

test: fix two failing tests · aad99e13

Konstantin Osipov authored 7 years ago

replication/cluster.test.py would fail at server exit, because
at_exit() handler tries to destroy a cbus while its mutex is locked.

args.test.py would fail when run with 'make test'


Aug 22, 2017

Introduce and use fiber_clock() instead of fiber_time() for timeouts · d988d7fb

Vladimir Davydov authored 7 years ago

fiber_time() reports real time, which shouldn't be used for calculating
timeouts as it is affected by system time changes. Add fiber_clock()
based on ev_monotonic_now(), export it to Lua, and use it instead.

Needed for #2527
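
The same principle applies outside Tarantool. A minimal Python illustration uses `time.monotonic()` as the analogue of a monotonic fiber clock; the helper names are hypothetical.

```python
import time


def deadline_after(timeout: float) -> float:
    """Absolute deadline measured on the monotonic clock."""
    return time.monotonic() + timeout


def timed_out(deadline: float) -> bool:
    """True once the monotonic clock has reached the deadline."""
    return time.monotonic() >= deadline
```

Because the monotonic clock never goes backwards and is not adjusted when the wall clock is changed, a deadline computed this way neither fires early nor hangs after a system time change. A deadline based on `time.time()` offers no such guarantee.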


Use ev_monotonic_now/time instead of ev_now/time for timeouts · a6c87bf9

Vladimir Davydov authored 7 years ago

We should use ev_monotonic_now()/ev_monotonic_time() instead of
ev_now()/ev_time() for calculating timeouts, because the latter
are affected by system time changes, so using them for timeouts
can lead to unexpected hangs.

Needed for #2527
