  1. Jun 16, 2017
    • Vladimir Davydov's avatar
      vinyl: do not call vy_scheduler_complete_dump on index deletion · 824ceb32
      Vladimir Davydov authored
      Currently, vy_scheduler_remove_mem() calls vy_scheduler_complete_dump()
      if vy_scheduler_dump_in_progress() returns false, but the latter doesn't
      necessarily mean that the dump has just been completed. The point is
      that vy_scheduler_remove_mem() is called not only for a memory tree that
      has just been dumped to disk, but also for all memory trees of a dropped
      index, i.e. dropping an index when there's no dump in progress results
      in vy_scheduler_complete_dump() invocation. This does no harm
      now, but looks ugly. Besides, I'm planning to account for dump
      bandwidth in vy_scheduler_complete_dump(), which must only be
      done on actual dump completion.
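
      Below is a minimal sketch of the intended behavior. The struct
      fields and the is_dumped flag are assumptions for illustration,
      not the actual vinyl data structures.

        #include <stdbool.h>

        /* Simplified stand-ins for the real vinyl structures. */
        struct vy_mem { bool is_dumped; };
        struct vy_scheduler { int dump_task_count; };

        static bool
        vy_scheduler_dump_in_progress(struct vy_scheduler *s)
        {
                return s->dump_task_count > 0;
        }

        static void
        vy_scheduler_complete_dump(struct vy_scheduler *s)
        {
                (void)s; /* account dump bandwidth, bump generation, ... */
        }

        /*
         * Complete the dump only for a tree that was actually dumped,
         * not for trees discarded because their index was dropped.
         */
        static void
        vy_scheduler_remove_mem(struct vy_scheduler *s, struct vy_mem *mem)
        {
                if (mem->is_dumped && !vy_scheduler_dump_in_progress(s))
                        vy_scheduler_complete_dump(s);
        }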
      824ceb32
    • Vladimir Davydov's avatar
      vinyl: factor out functions for memory dump start and completion · 9719b28a
      Vladimir Davydov authored
      Following patches will add more logic to them, so it's better to factor
      them out now to keep the code clean. No functional changes.
      9719b28a
    • Vladimir Davydov's avatar
      vinyl: fix crash if snapshot is called while dump is in progress · dbfd515f
      Vladimir Davydov authored
      Currently, to force dumping all in-memory trees, box.snapshot()
      increments scheduler->generation directly. Suppose a dump is in
      progress and there's a space with more than one index whose
      secondary indexes have all been dumped by the time box.snapshot()
      is called, while its primary index is still being dumped. Then
      incrementing the generation will force the scheduler to start
      dumping the secondary indexes of this space again (provided, of
      course, the space has fresh data). Creating a dump task for a
      secondary index will attempt to pin the primary index - see
      vy_task_dump_new() => vy_scheduler_pin_index() - which will
      crash, because the primary index is being dumped and hence can't
      be removed from the scheduler by vy_scheduler_pin_index():
      
        Segmentation fault
        #0  0x40c3a4 in sig_fatal_cb(int)+214
        #1  0x7f6ac7981890 in ?
        #2  0x4610bd in vy_scheduler_remove_index+46
        #3  0x4610fe in vy_scheduler_pin_index+49
        #4  0x45f93e in vy_task_dump_new+1478
        #5  0x46137e in vy_scheduler_peek_dump+282
        #6  0x461467 in vy_schedule+47
        #7  0x461bf8 in vy_scheduler_f+1143
      
      To fix that, let's trigger dump (by bumping the generation) only
      from the scheduler fiber, in vy_scheduler_peek_dump(). The
      checkpoint will force the scheduler to schedule a dump by setting
      the checkpoint_in_progress flag and checkpoint_generation.
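
      A hedged sketch of the idea follows; the field and helper names
      are illustrative, not the actual source.

        #include <stdbool.h>
        #include <stdint.h>

        /* Simplified scheduler state (names are assumptions). */
        struct vy_scheduler {
                int64_t generation;            /* current dump generation */
                int64_t checkpoint_generation; /* requested by box.snapshot() */
                bool checkpoint_in_progress;
                bool dump_in_progress;
        };

        /* box.snapshot() only records its intent... */
        static void
        vy_begin_checkpoint(struct vy_scheduler *s)
        {
                s->checkpoint_in_progress = true;
                s->checkpoint_generation = s->generation + 1;
        }

        /*
         * ...and the scheduler fiber bumps the generation itself, so a
         * new dump round is never started behind the back of a dump
         * that is still in progress.
         */
        static void
        vy_scheduler_peek_dump(struct vy_scheduler *s)
        {
                if (s->checkpoint_in_progress && !s->dump_in_progress &&
                    s->generation < s->checkpoint_generation)
                        s->generation = s->checkpoint_generation;
        }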
      
      Closes #2508
      dbfd515f
    • Konstantin Osipov's avatar
      40d86fe6
    • Vladimir Davydov's avatar
      alter: init space truncate_count after recovering snapshot · 99d6a4f4
      Vladimir Davydov authored
      The replace trigger of the _truncate system space
      (on_replace_dd_truncate) does nothing on insertion into or
      deletion from the space - it only updates space truncate_count
      when a tuple gets updated. As a result, space truncate_count
      isn't initialized properly after recovering a snapshot. This does
      no harm to memtx, because it doesn't use space truncate_count at
      all, but it breaks the assumption made by vinyl that if space
      truncate_count is less than index truncate_count (which is loaded
      from vylog), the space will be truncated during WAL recovery and
      hence there's no point in applying statements to the space (see
      vy_is_committed_one). As a result, all statements inserted into a
      vinyl space after a snapshot that follows truncation of the space
      are ignored on WAL recovery. To fix that, we must initialize
      space truncate_count when a tuple is inserted into the _truncate
      system space.
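
      In other words, the trigger should pick up truncate_count on
      insertion as well (snapshot recovery replays inserts), not only
      on update. A hedged sketch, with the space lookup and tuple field
      helpers as illustrative assumptions:

        #include <stdint.h>
        #include <stddef.h>

        struct tuple;
        struct space { uint64_t truncate_count; };

        /* Assumed helpers: find a space by id, read a uint field. */
        struct space *space_lookup(uint32_t space_id);
        uint64_t tuple_field_uint(struct tuple *tuple, uint32_t fieldno);

        /*
         * on_replace trigger of the _truncate system space: initialize
         * truncate_count both on insertion (e.g. while recovering a
         * snapshot) and on update; only deletion is ignored.
         */
        static void
        on_replace_dd_truncate(struct tuple *old_tuple,
                               struct tuple *new_tuple)
        {
                (void)old_tuple;
                if (new_tuple == NULL)
                        return; /* the space is being dropped */
                uint32_t id = (uint32_t)tuple_field_uint(new_tuple, 0);
                struct space *space = space_lookup(id);
                if (space != NULL)
                        space->truncate_count =
                                tuple_field_uint(new_tuple, 1);
        }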
      
      Closes #2521
      99d6a4f4
    • Roman Tsisyk's avatar
      72be507f
    • Ilya's avatar
      Add HTTP client based on libcurl · 7e62ac79
      Ilya authored
      Inspired by the tarantool/curl module by Vasiliy Soshnikov.
      Reviewed and refactored by Roman Tsisyk.
      
      Closes #2083
      7e62ac79
    • Roman Tsisyk's avatar
      Fix name clash in reflection.h · 414635ed
      Roman Tsisyk authored
      Rename `struct type` to `struct type_info` and `struct method` to
      `struct method_info` to fix a name clash with curl/curl.h.
      414635ed
  2. Jun 15, 2017
  3. Jun 14, 2017
    • Vladislav Shpilevoy's avatar
      vinyl: decrease usage of vy_mem_older_lsn · 9b8062d5
      Vladislav Shpilevoy authored
      Do not call vy_mem_older_lsn on each UPSERT commit. The older-lsn
      statement is used to squash a large number of upserts and to turn
      an UPSERT into a REPLACE if the older statement turns out not to
      be an UPSERT.
      But n_upserts can be calculated almost for free during the
      prepare phase, because the bps tree has the bps_insert_get_iterator
      method, which returns an iterator pointing at the inserted
      statement. We can advance this iterator to the older lsn without
      searching the tree and update n_upserts.

      On the commit phase we can take the n_upserts calculated during
      the prepare phase and call vy_mem_older_lsn only if it makes
      sense to optimize the UPSERT.
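
      A rough sketch of the approach; the tree iterator API and the
      statement fields below are assumptions for illustration only.

        #include <stdint.h>
        #include <stdbool.h>

        enum { VY_REPLACE, VY_UPSERT };

        /* Assumed minimal view of a statement and the tree iterator. */
        struct vy_stmt { int type; uint8_t n_upserts; };
        struct tree_iterator;
        /* iterator positioned at the statement we have just inserted */
        struct tree_iterator *bps_insert_get_iterator(void);
        bool tree_iterator_next(struct tree_iterator *it);
        struct vy_stmt *tree_iterator_get(struct tree_iterator *it);
        bool vy_stmt_same_key(struct vy_stmt *a, struct vy_stmt *b);

        /*
         * Prepare phase: peek at the next (older lsn) statement right
         * from the insert iterator, without an extra tree lookup, and
         * derive n_upserts from it. On commit, vy_mem_older_lsn() is
         * called only if n_upserts says squashing is worthwhile.
         */
        static void
        vy_count_upserts(struct vy_stmt *new_stmt)
        {
                if (new_stmt->type != VY_UPSERT)
                        return;
                struct tree_iterator *it = bps_insert_get_iterator();
                if (!tree_iterator_next(it))
                        return; /* no older statement at all */
                struct vy_stmt *older = tree_iterator_get(it);
                if (vy_stmt_same_key(older, new_stmt) &&
                    older->type == VY_UPSERT)
                        new_stmt->n_upserts = older->n_upserts + 1;
        }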
      
      Closes #1988
      9b8062d5
    • Vladislav Shpilevoy's avatar
      vinyl: rename vy_tx_prepare.replace to vy_tx_prepare.repsert · 95e9a101
      Vladislav Shpilevoy authored
      According to the code, the 'replace' tuple can also have the
      UPSERT type. Let's name it 'repsert' = 'replace' + 'upsert'.
      95e9a101
  4. Jun 13, 2017
  5. Jun 12, 2017
  6. Jun 10, 2017
  7. Jun 09, 2017
    • Vladimir Davydov's avatar
      Improve output of vinyl/gc test · 90901285
      Vladimir Davydov authored
      In case of failure, print files that were not deleted and
      the output of box.internal.gc.info().
      
      Needed for #2486
      90901285
    • Vladimir Davydov's avatar
      Rename box.cfg.vinyl_threads to vinyl_write_threads · c568a658
      Vladimir Davydov authored
      To match box.cfg.vinyl_read_threads introduced by the previous patch.
      c568a658
    • Vladimir Davydov's avatar
      vinyl: use cbus instead of coeio for reading run pages · de885dbf
      Vladimir Davydov authored
      vy_run_iterator_load_page() uses coeio, which is extremely
      inefficient for our use case:
      
       - it locks/unlocks mutexes every time when a task is queued, scheduled,
         or finished
       - it invokes ev_async_send(), which writes to eventfd and wakes up TX
         loop every time on every task completion
       - it blocks tasks until a free worker is available, which leads to
         unpredictable delays
      
      This patch replaces coeio with cbus, similarly to how we handle
      TX <-> WAL interaction. The number of reader threads is set by a
      new configuration option, vinyl_read_threads, which defaults to 1.

      Note that this patch doesn't bother adjusting the cbus queue
      length, i.e. it is left at the default of INT_MAX. While this is
      OK when there are a lot of concurrent read requests, it might be
      suboptimal for low-bandwidth workloads, resulting in higher
      latencies. We should probably update the queue length dynamically
      depending on how many clients are out there.
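
      A rough sketch of handing a page read to a reader thread over
      cbus; the hop/route usage mirrors Tarantool's cbus from memory
      and the message layout is an assumption, so treat it as
      illustrative rather than the actual code.

        #include <stddef.h>
        #include "cbus.h" /* struct cmsg, cmsg_hop, cpipe */

        static struct cpipe tx_pipe; /* pipe back to the TX thread */

        /* A page-read request travelling TX -> reader -> TX. */
        struct vy_page_read_msg {
                struct cmsg base; /* cbus routing header, must be first */
                /* run id, page no, output buffer, return code, ... */
        };

        static void
        vy_page_read_do(struct cmsg *msg) /* runs in a reader thread */
        {
                (void)msg; /* read and decompress the page from disk */
        }

        static void
        vy_page_read_done(struct cmsg *msg) /* runs back in TX */
        {
                (void)msg; /* wake up the fiber waiting for the page */
        }

        /* Execute in the reader thread, then hop back to TX. */
        static const struct cmsg_hop vy_page_read_route[] = {
                { vy_page_read_do, &tx_pipe },
                { vy_page_read_done, NULL },
        };

        static void
        vy_page_read_submit(struct cpipe *reader_pipe,
                            struct vy_page_read_msg *msg)
        {
                cmsg_init(&msg->base, vy_page_read_route);
                cpipe_push(reader_pipe, &msg->base);
        }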
      
      Closes #2493
      de885dbf
  8. Jun 08, 2017
    • bigbes's avatar
      Fix for a couple of build problems · 2ba51ab2
      bigbes authored
      2ba51ab2
    • Vladimir Davydov's avatar
      Add engine/truncate test · e9fc8d48
      Vladimir Davydov authored
      e9fc8d48
    • Vladimir Davydov's avatar
      Rework space truncation · 353bcdc5
      Vladimir Davydov authored
      Space truncation as we have it now is not atomic: we recreate all
      indexes of the truncated space one by one. This can result in
      nasty failures if a tuple insertion races with the space
      truncation and sees some indexes truncated and others not.
      
      This patch redesigns space truncation as follows:
      
       - Truncate is now triggered by bumping a counter in a new system space
         called _truncate. As before, space truncation is implemented by
         recreating all of its indexes, but now this is done internally in one
         go, inside the space alter trigger. This makes the operation atomic.
      
       - New indexes are created with Handler::createIndex method, old indexes
         are deleted with Index::~Index. Neither Index::commitCreate nor
         Index::commitDrop are called in case of truncation, in contrast to
         space alter. Since memtx needs to release tuples referenced by old
         indexes, and vinyl needs to log space truncation in the metadata log,
         new Handler methods are introduced, prepareTruncateSpace and
         commitTruncateSpace, which are passed the old and new spaces.
         They are called before and after the truncate record is
         written to WAL, respectively.
      
       - Since Handler::commitTruncateSpace must not fail while a vylog
         write obviously may, we reuse the technique used by the
         commitCreate and commitDrop methods of VinylIndex, namely
         leave the record we failed to write in the vylog buffer to be
         either flushed along with the next write or replayed on WAL
         recovery. To be able to detect whether truncation was logged
         while recovering WAL, we introduce a new vylog record type,
         VY_LOG_TRUNCATE_INDEX, which takes truncate_count as a key: if
         on WAL recovery index truncate_count happens to be <= space
         truncate_count, it means that truncation was not logged and we
         need to log it again (see the sketch after this list).
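
      A hedged sketch of that recovery-time check; the structures and
      the vylog helper below are illustrative, not the actual source.

        #include <stdint.h>

        struct vy_index { int64_t truncate_count; /* from vylog */ };
        struct space { int64_t truncate_count; /* from _truncate */ };

        /* Assumed helper: append a VY_LOG_TRUNCATE_INDEX record. */
        void vy_log_truncate_index(struct vy_index *index,
                                   int64_t truncate_count);

        /*
         * On WAL recovery: if vylog never recorded this truncation
         * (its counter is behind the space's), log it again; otherwise
         * it was already flushed or sits in the vylog buffer.
         */
        static void
        vy_recover_truncate(struct space *space, struct vy_index *index)
        {
                if (index->truncate_count <= space->truncate_count)
                        vy_log_truncate_index(index,
                                              space->truncate_count);
        }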
      
      Closes #618
      Closes #2060
      353bcdc5
    • Vladimir Davydov's avatar
      vinyl: convert vy_index->tree to pointer · 801f32c7
      Vladimir Davydov authored
      The space truncate rework done by the next patch requires the
      ability to swap data stored on disk between two indexes on
      recovery, so as not to reload all runs every time a space gets
      truncated. Since we can't swap the contents of two rb trees (due
      to rbt_nil), convert vy_index->tree to a pointer.
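
      With the tree behind a pointer, exchanging the on-disk content of
      two indexes reduces to a pointer swap; a trivial illustration
      (the vy_index layout is simplified):

        /* The rb tree that references the index's data on disk. */
        struct vy_index_tree;
        struct vy_index { struct vy_index_tree *tree; };

        static void
        vy_index_swap_trees(struct vy_index *a, struct vy_index *b)
        {
                struct vy_index_tree *tmp = a->tree;
                a->tree = b->tree;
                b->tree = tmp;
        }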
      801f32c7
    • Roman Tsisyk's avatar
      Fix -Wunused on on Clang · 85009195
      Roman Tsisyk authored
      85009195
    • Georgy Kirichenko's avatar
      Lock schema for space and index alteration · 5a200cb3
      Georgy Kirichenko authored
      Lock the schema before any changes to the space and index
      dictionary and unlock it only after commit or rollback. This
      allows many parallel data definition statements. Issue #2075
      5a200cb3
    • Georgy Kirichenko's avatar
      Add before statement trigger for spaces · c60fa224
      Georgy Kirichenko authored
      We need to lock the box schema while editing a ddl space. The
      lock should be taken before any changes are made to the ddl
      space, and a before-statement trigger is a good place to take it.
      See #2075
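
      A hedged sketch of the pattern; the latch declarations and the
      trigger wiring are simplified assumptions.

        /* Fiber-aware latch guarding the schema (simplified). */
        struct latch;
        void latch_lock(struct latch *l);
        void latch_unlock(struct latch *l);

        extern struct latch schema_lock;

        /* before-replace trigger on a ddl space (_space, _index, ...):
         * serialize DDL by taking the schema lock before any change
         * is applied. */
        static void
        before_replace_dd_lock(void)
        {
                latch_lock(&schema_lock);
        }

        /* on_commit / on_rollback trigger: release the lock only when
         * the DDL statement is finished, so other DDL can proceed. */
        static void
        after_dd_unlock(void)
        {
                latch_unlock(&schema_lock);
        }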
      c60fa224
    • Vladimir Davydov's avatar
      box: require box.cfg.checkpoint_count to be >= 1 · 27b86b1d
      Vladimir Davydov authored
      We must store at least one snapshot, otherwise we wouldn't be
      able to recover after restart, so if checkpoint_count is set to
      0, garbage collection is disabled. This contravenes the
      convention followed everywhere else in tarantool: if we want an
      option value (timeout, checkpoint count, etc.) to be infinite, we
      should set it to a very big number, not to 0. Make
      checkpoint_count comply.
      27b86b1d
    • Vladimir Davydov's avatar
      box: rework internal garbage collection API · 2c547c26
      Vladimir Davydov authored
      The current gc implementation has a number of flaws:
      
       - It tracks checkpoints, not consumers, which makes it impossible to
         identify the reason why gc isn't invoked. All we can see is the
         number of users of each particular checkpoint (reference counter),
         while it would be good to know what references it (replica or
         backup).
      
       - While tracking checkpoints suits backup and initial join well,
         it doesn't look good when used for subscribe, because a
         replica is supposed to track a vclock, not a checkpoint.
      
       - Tracking checkpoints from box/gc also violates encapsulation:
         checkpoints are, in fact, memtx snapshots, so they should be tracked
         by memtx engine, not by gc, as they are now. This results in
         atrocities, like having two snap xdirs - one in memtx, another in gc.
      
       - Garbage collection is invoked by a special internal function,
         box.internal.gc.run(), which is passed the signature of the
         oldest checkpoint to save. This function is then used by the
         snapshot daemon to maintain the configured number of
         checkpoints. This brings unjustified complexity to the
         snapshot daemon implementation: instead of just calling
         box.snapshot() periodically, it has to take on the
         responsibility of invoking the garbage collector with the
         right signature. It also means that garbage collection is
         disabled unless the snapshot daemon is configured to be
         running, which is confusing, as the snapshot daemon is
         disabled by default.
      
      So this patch reworks box/gc as follows:
      
       - Checkpoints are now tracked by memtx engine and can be accessed via a
         new module box/src/checkpoint.[hc], which provides simple wrappers
         around corresponding MemtxEngine methods.
      
       - box/gc.[hc] now tracks not checkpoints, but individual
         consumers that can be registered, unregistered, and advanced
         (a rough sketch of this interface follows the list below).
         Each consumer has a human-readable name displayed by
         box.internal.gc.info():
      
         tarantool> box.internal.gc.info()
         ---
         - consumers:
           - name: backup
             signature: 8
           - name: replica 885a81a9-a286-4f06-9cb1-ed665d7f5566
             signature: 12
           - name: replica 5d3e314f-bc03-49bf-a12b-5ce709540c87
             signature: 12
           checkpoints:
           - signature: 8
           - signature: 11
           - signature: 12
         ...
      
       - box.internal.gc.run() is removed. Garbage collection is now invoked
         automatically by box.snapshot() and doesn't require the snapshot
         daemon to be up and running.
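
      The consumer-based interface mentioned above, sketched with
      illustrative names that follow the description rather than the
      exact source:

        #include <stdint.h>

        struct gc_consumer;

        /* Register a consumer that still needs every file newer than
         * or equal to 'signature' (the vclock signature it holds on). */
        struct gc_consumer *
        gc_consumer_register(const char *name, int64_t signature);

        /* The consumer has moved on: files older than 'signature' are
         * no longer needed on its behalf. */
        void
        gc_consumer_advance(struct gc_consumer *consumer, int64_t signature);

        /* The consumer is gone (replica unsubscribed, backup done). */
        void
        gc_consumer_unregister(struct gc_consumer *consumer);

        /* Invoked from box.snapshot(): remove everything that neither
         * a registered consumer nor a kept checkpoint still needs. */
        void
        gc_run(void);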
      2c547c26