- Feb 20, 2018
-
-
Vladislav Shpilevoy authored
Compatibility must be commutative, but this function is not commutative. It checks that one type can store values of another type, but not the other way around.
-
Vladislav Shpilevoy authored
Closes #2973
-
Vladimir Davydov authored
Not all workloads need bloom filters enabled for all indexes. Let's allow disabling them on a per-index basis by setting bloom_fpr to 1. This will save some memory if bloom filters are unused. Closes #3138
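A minimal sketch of the new knob (the space and index names are hypothetical):

```lua
local s = box.schema.space.create('test', {engine = 'vinyl'})
-- bloom_fpr = 1 means "any false positive rate is acceptable", so no
-- bloom filter is built for this index and its memory is saved.
s:create_index('primary', {bloom_fpr = 1})
```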
-
Vladimir Davydov authored
Currently, one can set insane values for most vinyl index options, which will most certainly result in a crash (e.g. bloom_fpr = 100). Add some sanity checks.
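For example, after this patch an out-of-range value should be rejected up front rather than crash later (the exact error text is an assumption):

```lua
local s = box.schema.space.create('test', {engine = 'vinyl'})
-- bloom_fpr must lie in (0, 1]; a value like 100 is now rejected
-- at index creation time.
local ok, err = pcall(s.create_index, s, 'primary', {bloom_fpr = 100})
-- ok is false; err describes the invalid option value
```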
-
Vladimir Davydov authored
While a node of the cluster is re-bootstrapping (joining again), other nodes may try to re-subscribe to it. They will fail, because the rebootstrapped node hasn't tried to subscribe yet, hence hasn't been added to the _cluster table, and so is not present in the hash at the subscriber's side for replica_on_applier_reconnect() to look it up. Fix this by making a subscriber create an id-less (REPLICA_ID_NIL) struct replica in this case and reattach the applier to it. It will be assigned an id when it finally subscribes and is registered in _cluster. Fixes commit 71b33405 ("replication: reconnect applier on master rebootstrap").
-
Vladimir Davydov authored
If box.cfg() successfully connects to a number of replicas sufficient to form a quorum (>= box.cfg.replication_connect_quorum), it won't return until it syncs with all of them (lag <= box.cfg.replication_sync_lag). If one of the replicas forming a quorum disconnects permanently while sync is in progress, box.cfg() will hang forever. Such behavior is rather unreasonable. After all, syncing a quorum is best-effort. It would be much more sensible to return from box.cfg() leaving the instance in the 'orphan' mode in this case. This patch does exactly that: now if we detect that not enough replicas are connected to form a quorum while we are syncing, we stop syncing immediately.
-
Vladimir Davydov authored
Currently, the max time box.cfg() may wait for connections to replicas to be established is hardcoded to box.cfg.replication_timeout times 4. As a result, users can't revert to the pre-replication_connect_quorum behavior, when box.cfg() would block until it connected to all replicas. To fix that, let's introduce a new configuration option, replication_connect_timeout, which determines the replication configuration timeout. By default the option is set to 4 seconds. Closes #3151
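For example (the URIs are placeholders):

```lua
box.cfg{
    replication = {'replica1:3301', 'replica2:3301'},
    -- wait up to 10 seconds for connections to be established
    -- instead of the default 4
    replication_connect_timeout = 10,
}
```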
-
Vladimir Davydov authored
If a unique index includes all parts of another unique index, we can skip the check for duplicates for it on INSERT. Let's mark all such indexes with a special flag on CREATE/ALTER and optimize out the check if the flag is set. If there are two indexes that index the same set of fields, check uniqueness for the one with a lower id, because it is more likely to have a "warmer" cache. Closes #3154
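A sketch of a pair of indexes this optimization applies to (names and field types are hypothetical): the secondary unique index covers all parts of the primary one, so its duplicate check on INSERT is implied and can be skipped.

```lua
local s = box.schema.space.create('test')
s:create_index('primary', {parts = {1, 'unsigned'}})
-- Includes all parts of 'primary', so uniqueness follows from it.
s:create_index('secondary', {unique = true,
                             parts = {1, 'unsigned', 2, 'string'}})
```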
-
Vladimir Davydov authored
We keep run files corresponding to (at least) the last snapshot, because we need them for backups and replication. Deletion of compacted run files is postponed until the next snapshot. As a consequence, we don't delete run files created on a replica during the join stage. However, in contrast to run files created during normal operation, these are pure garbage and should be deleted right away. Not deleting them can result in depletion of disk space, because vinyl has quite high write amplification by design. We can't write a functional test for this, because there's no way to guarantee that compaction started during join will finish before join completion - if it doesn't, compacted runs won't be removed, because they will be assigned to the snapshot created by join. Closes #3162
-
Vladislav Shpilevoy authored
A vinyl index key definition is stored in vylog even if the index is empty, and we do not yet have a method to update it there. So altering a vinyl index key definition is forbidden even on an empty space. Closes #3169
-
- Feb 19, 2018
-
-
Vladimir Davydov authored
Closes #3173
-
- Feb 17, 2018
-
-
Georgy Kirichenko authored
If a DDL operation is in progress, all other DDL operations wait on the schema latch. But once the first DDL is done, any other request may be issued right after it, and commit order will be broken in the case of multi-master replication. To prevent this behavior, any DDL operation should wait until all queued DDL operations are done. Fixes #2951
-
Georgy Kirichenko authored
Prevent the latch lock from being intercepted by another already scheduled or active fiber when there is only one waiter. This is needed for strict latch ordering.
-
- Feb 16, 2018
-
-
Konstantin Belyavskiy authored
An incoming ACK leads to a race condition and prevents heartbeat messages, which ends up in a disconnect on timeout. This fix is based on @locker's proposal to send vclock only in reply to the master (since the master itself sends heartbeat messages). Closes #3160
-
Vladimir Davydov authored
This reverts commit a7871247.
-
- Feb 15, 2018
-
-
Konstantin Osipov authored
This reverts commit 99c7a971.
-
Vladimir Davydov authored
If a vinyl transaction stalls waiting for quota for more than box.cfg.too_long_threshold seconds, emit a warning to the log:

W> waited for 699089 bytes of vinyl memory quota for too long: 0.504 sec

This will help us understand whether our users experience lags due to absence of throttling in vinyl (see #1862). Closes #3096
-
imarkov authored
The name of the universe is optional, so we don't check it. If a user wants to specify extra options in the grant, such as if_not_exists, and confuses the object name argument with the options argument, the options are silently ignored:

box.schema.user.grant('tnt', 'read,write,execute', 'universe', {if_not_exists = true})

Fix this by adding Lua code that ensures that the universe name is a scalar (string or nil). Closes #3146
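For comparison, the correct call passes nil in the object name slot, with the options table last:

```lua
-- Grant on the whole universe: the object name is nil, options come last.
box.schema.user.grant('tnt', 'read,write,execute', 'universe', nil,
                      {if_not_exists = true})
```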
-
- Feb 13, 2018
-
-
Konstantin Belyavskiy authored
In a replica set, if one of the instances is powered off, the others do not detect it and the connection hangs. Alive machines show the 'follow' state. Add a timeout to solve this issue. It's safe since both applier and relay send messages every replication_timeout, so we can assume that if we read nothing, we have a problem with the connection. Use replication_disconnect_timeout, which is replication_timeout * 4, for now. The test fixed and comments improved by @locker. Closes #3025
-
Vladimir Davydov authored
If an instance is 'orphan', it is read-only hence box.ctl.wait_rw() should block until the instance syncs, but currently it doesn't. Fix it.
-
Vladimir Davydov authored
vy_log_rotate() releases the log latch between reading the last vylog file and writing the new vylog file. This works as long as the latch implementation guarantees that latch_lock() called immediately after latch_unlock() on the same lock doesn't yield. Although this is true now, we shouldn't rely on that, because this may change any time.
-
Konstantin Osipov authored
-
Vladislav Shpilevoy authored
Closes #2789
-
imarkov authored
* Create constant SUPER - id of the super role
* Forward the constant to box.schema
* Add checks on dropping the super role

Closes #3084
-
- Feb 11, 2018
-
-
Vladimir Davydov authored
- space.bsize returns the size of user data stored in the space. It is the sum of memory.bytes and disk.bytes as reported by the primary index.
- index.bsize returns the size of memory used for indexing data. It is the sum of memory.index_size, disk.index_size, and disk.bloom_size as reported by index.info. For secondary indexes we also add the size of binary data stored on disk (disk.bytes), because it is only needed to build the index.
- index.len returns the total number of rows stored in the index. It is the sum of memory.rows and disk.rows as reported by index.info. Note, it may be greater than the number of tuples stored in the space, because it includes DELETE and UPDATE statements.

Closes #2863
Closes #3056
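A quick illustration of reading these from the console (the space and index names are hypothetical):

```lua
local s = box.space.test          -- a hypothetical vinyl space
s:bsize()                         -- user data stored in the space, bytes
s.index.primary:bsize()           -- memory used for indexing data, bytes
s.index.primary:len()             -- rows in the index, incl. DELETE/UPDATE
```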
-
Vladimir Davydov authored
This patch adds the following statistics to index.info:
- memory.index_size - size of memory tree extents
- cache.index_size - size of cache tree extents
- disk.index_size - size of page index
- disk.bloom_size - size of bloom filters
-
Konstantin Belyavskiy authored
This patch adds a new connection option to the http client, 'unix_socket'. The option specifies the path to a unix socket to use as the connection endpoint instead of TCP:

httpc = require('http.client')
httpc.request('GET', 'http://localhost/index.html', nil,
              {unix_socket = '/var/run/docker.sock'})

The option is supported only if tarantool was built with libcurl 7.40.0 or newer. For older versions, an attempt to use the option will result in a Lua exception. Suggested and first implemented by @rosik. The test was refactored by @locker. Closes #3040
-
- Feb 10, 2018
-
-
Vladimir Davydov authored
It will help resolve box.once() conflicts in case master is rw and replica is ro. Closes #2537
-
Vladimir Davydov authored
This patch adds two new Lua functions, box.ctl.wait_ro() and box.ctl.wait_rw(), that block the current fiber until the server switches to read-only or read-write mode, respectively. Both functions take a timeout as an optional argument. Needed for #2537
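A minimal usage sketch (the 5-second timeout is an arbitrary example; without an argument the call blocks until the mode changes):

```lua
-- Wait until the instance becomes writable, at most 5 seconds.
box.ctl.wait_rw(5)
-- Symmetrically, wait for the instance to become read-only:
-- box.ctl.wait_ro(5)
```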
-
Vladimir Davydov authored
Fix build errors:

src/lua/init.c: In function ‘tarantool_panic_handler’:
src/lua/init.c:321:2: error: implicit declaration of function ‘print_backtrace’ [-Werror=implicit-function-declaration]
  print_backtrace();
  ^~~~~~~~~~~~~~~
src/lua/fiber.c:244:1: error: ‘lbox_fiber_statof_bt’ defined but not used [-Werror=unused-function]
 lbox_fiber_statof_bt(struct fiber *f, void *cb_ctx)
 ^~~~~~~~~~~~~~~~~~~~
-
- Feb 08, 2018
-
-
Vladimir Davydov authored
There are two issues in the rollback code:
- txn_rollback_stmt() rolls back the current autocommit transaction even if it is called from a sub-statement. As a result, if a sub-statement (i.e. a statement called from a before_replace or on_replace trigger) fails (e.g. due to a conflict), it will trash the current transaction, leading to a bad memory access upon returning from the trigger.
- txn_begin_stmt() calls txn_rollback_stmt() on failure even if it did not instantiate the statement. So if it is called from a trigger and fails (e.g. due to the nesting limit), it may trash the parent statement, again leading to a crash.

Fix them both and add some tests. Closes #3127
-
Vladimir Davydov authored
Obviously, there's no point in rebuilding an index if all we do is relax the uniqueness property. This will also allow us to clear the uniqueness flag for vinyl indexes, which do not support rebuild. Note, a memtx tree index stores a pointer to either cmp_def or key_def depending on whether the index is unique. Hence to clear the uniqueness flag without rebuilding the index, we need to update this pointer. To do that, we add a new index virtual method, update_def. Closes #2449
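For illustration, dropping uniqueness on a hypothetical secondary index is now a metadata-only change:

```lua
-- No index rebuild happens: only the uniqueness flag (and, for memtx
-- tree indexes, the cmp_def/key_def pointer) is updated in place.
box.space.test.index.secondary:alter{unique = false}
```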
-
Vladimir Davydov authored
-
- Feb 06, 2018
-
-
Vladimir Davydov authored
If an instance is read-only, an attempt to join a new replica to it will fail with ER_READONLY, because joining a replica to a cluster implies registration in the _cluster system space. However, if the replica is already registered, which is the case if it is being rebootstrapped with the same uuid (see box.cfg.instance_uuid), the record corresponding to the replica is already present in the _cluster space and hence no write operation is required. Still, rebootstrap fails with the same error. Let's rearrange the access checks to make it possible to rebootstrap a replica from a read-only master provided it has the same uuid. Closes #3111
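For illustration, a rebootstrap that reuses the replica's old UUID might look like this (the UUID and URI are hypothetical placeholders):

```lua
-- After wiping the local directory, restart with the same
-- instance_uuid: the replica is already registered in _cluster,
-- so no write to the read-only master is needed.
box.cfg{
    instance_uuid = 'aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee',
    replication = 'master:3301',
}
```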
-
Vladimir Davydov authored
We can save a lookup in a secondary index on update if indexed fields are not modified. The extra check comes for free as we have a bit mask of all updated fields. Closes #2980
-
Vladimir Davydov authored
When a tarantool instance starts for the first time (the local directory is empty), it chooses the peer with the lowest UUID as the bootstrap master. As a result, one cannot reliably rebootstrap a cluster node (delete all local files and restart): if the node happens to have the lowest UUID in the cluster after restart, it will assume that it is the leader of a new cluster and bootstrap locally, splitting the cluster in two. To fix this problem, let's always give preference to peers with a higher vclock when choosing a bootstrap master and only fall back on selection by UUID if two or more peers have the same vclock. To achieve that, we need to introduce a new iproto request type for fetching the current vclock of a tarantool instance (we cannot squeeze the vclock into the greeting, because the latter is already packed). The new request type is called IPROTO_REQUEST_VOTE so that in future it can be reused for a more sophisticated leader election algorithm. It has no body and does not require authentication. In reply to such a request, a tarantool instance will send IPROTO_OK and its current vclock. If the version of the master is >= 1.7.7, an applier will send IPROTO_REQUEST_VOTE to fetch the master's vclock before trying to authenticate. The vclock will then be used to determine the node to bootstrap from. Closes #3108
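The selection rule can be sketched roughly as follows (a simplification, not the actual C implementation: real vclocks are compared component-wise, while the scalar vclock_signature here stands in for the sum of all components):

```lua
-- Pick the bootstrap master among the connected peers: prefer the
-- peer with the greater vclock; fall back on the lowest UUID on a tie.
local function choose_bootstrap_master(peers)
    local best
    for _, p in ipairs(peers) do
        if best == nil
           or p.vclock_signature > best.vclock_signature
           or (p.vclock_signature == best.vclock_signature
               and p.uuid < best.uuid) then
            best = p
        end
    end
    return best
end
```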
-
Vladimir Davydov authored
No functional changes, just a trivial cleanup:
- Move all C functions inside the extern "C" section.
- Rename xrow_decode_join to xrow_decode_join_xc.
- Make XXX_xc wrappers around XXX functions.
-
- Feb 05, 2018
-
-
Vladimir Davydov authored
Before commit 2788dc1b ("Add APPLIER_READY state") we only printed the 'authenticated' message to the log in case credentials were set in the replication URI. The commit changed that: now we print the message even in case of guest connections, when applier does not send the AUTH command to the master at all. As a result if guest connections are not permitted by the master, the applier will keep printing 'authenticated' after every unsuccessful attempt to subscribe. This is misleading. Let us revert back to the behavior we had before commit 2788dc1b. Closes #3113
-
- Feb 02, 2018
-
-
Konstantin Nazarov authored
As there is now support for Alpine Linux in packpack, there is no longer any need for a custom Dockerfile builder.
-
Konstantin Nazarov authored
This patch is to get in line with the Alpine support in packpack:
- don't rely on git, and use a source package instead
- add subpackages with debug symbols, documentation and headers
- don't build tarantool 3 times in a row
-