Commits · 9532f5beb6ffeb641a444114f5282557ac1d5279 · core / tarantool

Sep 27, 2016

vinyl: patch holes in index on recovery · 9532f5be

Successful compaction may create a range w/o tuples, and currently we
don't store empty range files on disk. As a result, such a range won't
be loaded on recovery, which breaks the index tree invariant (prev->end
equals next->begin). Hence we must silently create a new range for each
gap found on recovery.
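
The gap patching described above can be sketched as a toy model. This is not the vinyl code: ranges are modeled as `(begin, end)` pairs of integers, `None` stands for an unbounded side, and the function name `patch_holes` is hypothetical. Filling gaps at the outer edges of the key space is also an assumption of this sketch.

```python
from typing import List, Optional, Tuple

# Toy model: a range is (begin, end), half-open; None means "unbounded".
Range = Tuple[Optional[int], Optional[int]]


def patch_holes(ranges: List[Range]) -> List[Range]:
    """Return the ranges sorted by begin, with an empty range added per gap.

    Assumes the input ranges do not overlap.
    """
    # The unbounded leftmost range (begin is None) sorts first.
    ordered = sorted(
        ranges,
        key=lambda r: (r[0] is not None, r[0] if r[0] is not None else 0),
    )
    patched: List[Range] = []
    prev_end: Optional[int] = None
    first = True
    for begin, end in ordered:
        if first:
            # Hole before the first recovered range.
            if begin is not None:
                patched.append((None, begin))
        elif prev_end != begin:
            # Hole between two recovered ranges: restore prev->end == next->begin.
            patched.append((prev_end, begin))
        patched.append((begin, end))
        prev_end = end
        first = False
    if first:
        # Nothing was recovered: one range covers the whole key space.
        patched.append((None, None))
    elif prev_end is not None:
        # Hole after the last recovered range.
        patched.append((prev_end, None))
    return patched
```

For example, `patch_holes([(None, 10), (20, None)])` adds the missing `(10, 20)` range between the two recovered ones.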

9532f5be

vinyl: do not remove empty range on split · 1ff57e79

Vladimir Davydov authored 8 years ago

Successful merge+split may result in ranges w/o tuples. Before commit
559102a7 ("vinyl: store range lower and upper bounds on disk") it was OK
to delete such ranges, because there was no range->end. To illustrate
this, suppose compaction splits a range as follows:

  [A, C) => [A, B) + [B, C)

If range [B, C) turns out to be empty, then we could simply drop it as
all inserts would go to range [A, B) and as B was not stored anywhere,
it would effectively become range [A, C).

After the above-mentioned commit, however, removal of range [B, C)
breaks the index invariant that for each two adjacent ranges prev->end
always equals next->begin. This, in turn, can result in data loss in
case [A, B) gets split again, as B will be used to break the run write
loop (see vy_range_compact_execute()) which is premature since there may
be tuples >= B.

That being said, let's remove this small optimization altogether.
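
To illustrate the data loss scenario, here is a toy model of a write loop that stops at the range's upper bound. It is only a sketch: `write_run` and `split` are hypothetical helpers, keys are plain integers, and the real logic lives in vy_range_compact_execute().

```python
from typing import List, Optional, Tuple


def write_run(keys: List[int], end: Optional[int]) -> List[int]:
    """Copy sorted keys into a new run, stopping at the upper bound `end`."""
    run = []
    for key in keys:
        if end is not None and key >= end:
            break  # the loop assumes no key at or past `end` belongs here
        run.append(key)
    return run


def split(keys: List[int], mid: int, end: Optional[int]) -> Tuple[List[int], List[int]]:
    """Split sorted keys of a range with upper bound `end` at key `mid`."""
    left = write_run(keys, mid)
    right = write_run(keys[len(left):], end)
    return left, right


# Range [A, B) = [0, 10) after [B, C) was dropped: key 15 was routed here
# because no other range covers it, yet the stored end is still B = 10.
left, right = split([1, 6, 15], mid=5, end=10)
lost = [k for k in [1, 6, 15] if k not in left + right]
```

In this model `lost` contains key 15: the second write loop breaks at B even though the range holds a tuple >= B. With `end=None`, which corresponds to the behavior before range->end existed, the same split keeps all three keys.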

1ff57e79

vinyl: warn about stale and partial ranges on recovery · ca0445de

Vladimir Davydov authored 8 years ago

A "stale" range is an old range file left after compaction. A "partial"
range is a range file left after failed split. Although we can handle
such files, they should not normally exist in the index directory, so
let's warn about their presence on recovery.

ca0445de

Fix vinyl/compact test · 524789c0

Vladimir Davydov authored 8 years ago

The vinyl/compact test issues two snapshots and expects that there will be
two runs after it, then it keeps waiting until compaction merges them.
There is a race condition intrinsic to this test - compaction might have
finished before the test checks that there are exactly two runs. To
eliminate it, let's set the test index's compact_wm parameter to 3 and
add one more snapshot to trigger compaction.

Since it is currently impossible to modify vinyl options dynamically,
this patch moves compact_wm to index options where page_size and
range_size reside.

Closes #1758

524789c0

vinyl: cleanup, remove unused code, update tests · 3b7e4b46
Konstantin Osipov authored 8 years ago

3b7e4b46
Fix gh-1790: Completion for remote console · 899bc6d8
Nick Zavaritsky authored 8 years ago

899bc6d8
Vinyl: Fixed read iterator restoration, that fixes vinyl/options test · 03b218f7
Alexandr Lyapunov authored 8 years ago

03b218f7

Sep 26, 2016

Fix gh-1772: broken tarantoolctl eval · 33fc47e5
Nick Zavaritsky authored 8 years ago

33fc47e5

vinyl: remove old range file on compaction asap · 3977c18f

Vladimir Davydov authored 8 years ago

Currently, we postpone old range file removal until checkpoint, but we
can do it right after successful compaction - this will save us some
disk space.

3977c18f

test: vinyl: test recovery after incomplete splits · b094b089

Vladimir Davydov authored 8 years ago

The idea behind the test is simple - create several invalid range files,
i.e. those left from previous dumps and incomplete splits, then restart
the server and check that the content of the space was not corrupted.

To make it possible, we need to (1) prevent the garbage collector from
removing unused range files and (2) make the split procedure fail after
successfully writing the first range. We use error injection to achieve
that.

The test runs as follows:

 1. Disable garbage collection with the aid of error injection.

 2. Add a number of tuples to the test space that would make it split.
    Rewrite them several times with different values so that different
    generations of ranges on disk would have different contents.

 3. Inject an error into the split procedure.

 4. Rewrite the tuples another couple of rounds. This should trigger a
    split, which will fail, leaving invalid range files with newer ids
    on disk.

 5. Restart the server and check that the test space content was not
    corrupted.

b094b089

vinyl: zap range index · c344dfff

Vladimir Davydov authored 8 years ago

Currently, we store all range ids in an .index file after each range
tree modification. On recovery, we open the latest .index file, get the
list of all ranges, and load them. This .index file introduces extra
complexity to the compaction task: as we can get a consistent list of
all range ids only in the tx thread, we must either write .index file
from the tx thread (which we do now), or introduce a special task for
it, which would be scheduled on compaction completion. The former way
degrades performance of the tx thread, while the latter complicates the
code.

Actually, we can do range recovery w/o having to maintain .index files:
as newer ranges always have greater ids, we can just recover ranges
starting from the greatest id and disregarding ranges that are already
spanned by the index tree. For instance, suppose range A was split into
ranges B and C. Then we recover ranges B and C first (they do not
intersect, so everything's fine), then we get to A and see that it is
already spanned (by B and C), so we just throw it away. If on split, B
(or C) was not created for some reason, then A will not be fully spanned
by the index, and we replace B (or C) with A, still getting a consistent
index view.
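
The recovery order described above can be sketched as follows. This is a toy model under stated assumptions, not the actual implementation: range files are `(id, begin, end)` tuples, unbounded sides are represented by `-math.inf` and `math.inf`, and the names `recover` and `is_spanned` are hypothetical. It also assumes that any newer range overlapping an older one lies entirely inside it, as is the case for the pieces of a split.

```python
import math
from typing import List, Tuple

# Toy model of a range file: (id, begin, end), covering [begin, end).
RangeFile = Tuple[int, float, float]


def is_spanned(begin: float, end: float, overlapping: List[RangeFile]) -> bool:
    """True if [begin, end) is fully covered by the given ranges."""
    pos = begin
    for _, b, e in sorted(overlapping, key=lambda r: r[1]):
        if b > pos:
            return False  # uncovered hole starting at pos
        pos = max(pos, e)
    return pos >= end


def recover(range_files: List[RangeFile]) -> List[RangeFile]:
    """Build the index view by visiting range files from the greatest id down."""
    tree: List[RangeFile] = []
    for rid, begin, end in sorted(range_files, reverse=True):
        overlapping = [r for r in tree if r[1] < end and begin < r[2]]
        if is_spanned(begin, end, overlapping):
            continue  # older range fully replaced by newer ones: discard it
        # Incomplete split: drop the newer pieces and keep the older range.
        tree = [r for r in tree if r not in overlapping]
        tree.append((rid, begin, end))
    return sorted(tree, key=lambda r: r[1])


A = (1, -math.inf, math.inf)
B = (2, -math.inf, 10)
C = (3, 10, math.inf)
```

With all three files present, `recover([A, B, C])` keeps B and C and throws A away. If C was never written, `recover([A, B])` falls back to A alone, which still gives a consistent view.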

This patch implements the recovery process as per above and removes the
.index file. Note, to avoid loading stale index data after drop-create,
we have to name range files not only by id, but also by index lsn (just
like the .index files). As before, old range file removal is postponed
until checkpoint.

c344dfff

vinyl: store range lower and upper bounds on disk · 559102a7

Vladimir Davydov authored 8 years ago

Rename range->min_key to range->begin, as it actually denotes not the
minimal key across all entries in the range, but the lower bound of the
range, and introduce range->end for the upper bound of the range.
For adjacent ranges left->end == right->begin. If a range is leftmost,
then range->begin == NULL. If a range is rightmost, then range->end ==
NULL. Store range->{begin,end} in range file on checkpoint and load them
on recovery.
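
A minimal sketch of these bounds, assuming integer keys and using `None` in place of NULL. The `Range` class and both helper functions are hypothetical and only model the rules stated above; the half-open [begin, end) interpretation follows the notation used elsewhere in this log.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Range:
    begin: Optional[int]  # lower bound; None for the leftmost range
    end: Optional[int]    # upper bound; None for the rightmost range


def contains(r: Range, key: int) -> bool:
    """True if key falls into the half-open interval [begin, end)."""
    return (r.begin is None or r.begin <= key) and (r.end is None or key < r.end)


def check_invariant(ranges: List[Range]) -> bool:
    """Check the bounds of a list of ranges ordered from left to right."""
    if not ranges:
        return False
    if ranges[0].begin is not None or ranges[-1].end is not None:
        return False
    # For adjacent ranges left->end must equal right->begin.
    return all(
        left.end is not None and left.end == right.begin
        for left, right in zip(ranges, ranges[1:])
    )
```

A check of this kind is what makes it possible to detect intersecting or missing ranges once the bounds are stored on disk.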

This is required by the following patch to check that ranges do not
intersect.

559102a7

vinyl: init range->path in vy_range_new() · ccea6f32

Vladimir Davydov authored 8 years ago

All we need to initialize range->path is range->id and index->path. Both
are known at the time of range allocation and never change. So let's do
range->path initialization right in vy_range_new() instead of postponing
it until range recovery/write.

ccea6f32

vinyl: don't print path when reporting temp file creation failure · 6ab11491
Vladimir Davydov authored 8 years ago

The path is uninitialized in case of error injection.

6ab11491
vinyl: release quota on successful compact · 072a739d
Vladimir Davydov authored 8 years ago

072a739d
Fix review remarks for write_iterator · 6c515231
Vladislav Shpilevoy authored 8 years ago

6c515231
Remove read_iterator_get · 45dc4887
Vladislav Shpilevoy authored 8 years ago

45dc4887
Add comments for write_iterator · 5dc19cdf
Vladislav Shpilevoy authored 8 years ago

5dc19cdf
Replace can_purge with prev_tuple_flags · 616c039a
Vladislav Shpilevoy authored 8 years ago

616c039a
Implement second version of write iter optimization · d13fb0ab
Vladislav Shpilevoy authored 8 years ago

d13fb0ab
Remove can_purge and turn 'for' to 'while' loop · 77797258
Vladislav Shpilevoy authored 8 years ago

77797258

Update vy_write_iterator_next() and remove get() · 4bbb8570

Vladislav Shpilevoy authored 8 years ago

vy_write_iterator_next() is now used to get the next tuple.
vy_write_iterator->curr_tuple was removed.
vy_write_iterator->keeping_tuple holds a tuple that is needed between
two invocations of next() but must be deleted afterwards.
Fixed memory management in vy_write_iterator_next().
Optimized purging in write_iterator.

4bbb8570

Fix zstd decompression. Fixed #1789 · e90498d7
Georgy Kirichenko authored 8 years ago

e90498d7

vinyl: zap scheduler->indexes array · acb35b0c

Vladimir Davydov authored 8 years ago

We have the env->indexes list, so there is no need to also store all
indexes in an array. Note that the rlist_add call that adds a new index
to the env->indexes list is moved from vy_index_new() to vy_index_open(),
so that the index only becomes visible to the scheduler after it has been
successfully loaded. Apart from that, this change makes no difference.

acb35b0c

vinyl: zap index->ref_lock · 471a785f

Vladimir Davydov authored 8 years ago

All manipulations on index->refs, which ref_lock is supposed to protect,
are done from the tx thread, so the lock is not needed.

471a785f

vinyl: fixed a bug in restoration of mem and run · edfde593
Alexandr Lyapunov authored 8 years ago

edfde593

Sep 23, 2016

net.box: harmonize net.box readahead with the default tarantool readahead · e43800c5
Konstantin Osipov authored 8 years ago

e43800c5

net.box: add comments, update style · ef1bd18c

Konstantin Osipov authored 8 years ago

Rename connection option 'legacy_call' to 'call_16'.
If you live long enough you know that even the shiniest and brightest
call's time may come.

ef1bd18c

New net.box · 3b81e959

Nick Zavaritsky authored 8 years ago

Fix gh-799 net.box: use a single watcher
Fix gh-800 net.box: remove reconnect fiber
Fix gh-1138 net.box: an active connection is never garbage collected
Fix gh-1750 net.box: hangs after reconnect

  * Net.box() connection refuses to work with 'Lua console';
  * console method in net.box dropped (TBD: docs;)
  * internals changed, code depending on internals WILL break;
  * different state chart, see comments in source code (TBD: docs;)
  * new option: legacy_call to request call 1.6 semantics;
  * extension: option wait_connected treated as timeout if T == number;
  * wait_connected() returns true/false as docs say;
  * wait_state() moved to public API (TBD: docs.)

3b81e959

Re-wire console to use new protocol machine · 65ecebcc
Nick Zavaritsky authored 8 years ago

65ecebcc
Re-implement net.box protocol machine · 411f2d56
Nick Zavaritsky authored 8 years ago

411f2d56
Fix gh-1777: clock_gettime detected but unavailable in macos · d36ba279
Nick Zavaritsky authored 8 years ago

Tag: 1.6.9

d36ba279
vinyl: fix upsert in vy_write_iterator_next() · 6ba8500d
Roman Tsisyk authored 8 years ago

6ba8500d
fix gh-1436: Corrupt identity in syslog · 594dc4d4
Nick Zavaritsky authored 8 years ago

594dc4d4
Replication for partial written transactions. Issue #1656 · 4d34cd1f
Georgy Kirichenko authored 8 years ago

4d34cd1f

Fix error with vy_apply_upsert · 351893ca

Vladislav Shpilevoy authored 8 years ago

The error occurred when an upsert modified the primary key. In the
following case:
   upsert lsn=10, modify primary key
   upsert lsn=9,
   replace lsn=8
vy_apply_upsert returned replace lsn=9, but the answer must be
   upsert lsn=9 merged with replace lsn=8

351893ca

Sep 22, 2016

Fix error with build on mac · 395d5260
Vladislav Shpilevoy authored 8 years ago

395d5260

vinyl: add back index reference counting to cursor new/delete · 81d65f40

Konstantin Osipov authored 8 years ago

At some point we (I) decided to implement transaction-level DDL-DML
dependency tracking, and removed index reference counting from vinyl.
Unfortunately we failed to implement this quickly. Put the reference
counting back for now. This prevents index destruction while there is an
open cursor using it. Fixes the sporadic crash of options.test.lua.

81d65f40

xlog: print the value of the unsupported file format version in the message · 88746a58
Konstantin Osipov authored 8 years ago

88746a58
Bump xlog version. #Issue 1656 · 238b7121
Georgy Kirichenko authored 8 years ago

238b7121