Commits · f3ca517dd59ad9ceff8216147507ca2a3f4bc03f · core / tarantool

Feb 11, 2018

vinyl: implement space.bsize, index.bsize, and index.len · f3ca517d

 - space.bsize returns the size of user data stored in the space.
   It is the sum of memory.bytes and disk.bytes as reported by
   the primary index.

 - index.bsize returns the size of memory used for indexing data.
   It is the sum of memory.index_size, disk.index_size, and
   disk.bloom_size as reported by index.info. For secondary indexes
   we also add the size of binary data stored on disk (disk.bytes),
   because it is only needed to build the index.

 - index.len returns the total number of rows stored in the index.
   It is the sum of memory.rows and disk.rows as reported by
   index.info. Note, it may be greater than the number of tuples
   stored in the space, because it includes DELETE and UPDATE
   statements.
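
The sums above can be sketched in a few lines. This is a minimal Python model, not Tarantool code: the `info` dictionary only mimics the index.info fields named above, and the helper names and sample numbers are invented for illustration.

```python
# Illustrative model of the sums described above; not Tarantool code.
# `info` mimics the index.info fields named in the commit message.

def index_len(info):
    # Rows in memory plus rows on disk (may include DELETE/UPDATE statements).
    return info["memory"]["rows"] + info["disk"]["rows"]

def space_bsize(primary_info):
    # Size of user data, as reported by the primary index.
    return primary_info["memory"]["bytes"] + primary_info["disk"]["bytes"]

def index_bsize(info, is_secondary):
    size = (info["memory"]["index_size"]
            + info["disk"]["index_size"]
            + info["disk"]["bloom_size"])
    if is_secondary:
        # On-disk data of a secondary index exists only to build the index.
        size += info["disk"]["bytes"]
    return size

# Hypothetical sample statistics.
sample = {
    "memory": {"rows": 10, "bytes": 1000, "index_size": 64},
    "disk": {"rows": 90, "bytes": 9000, "index_size": 128, "bloom_size": 32},
}
```

With this sample, a secondary index would report a much larger bsize than a primary one, because its disk.bytes is counted as indexing overhead.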

Closes #2863
Closes #3056

f3ca517d

vinyl: report size of memory used for indexing data in index.info · eea5967b

Vladimir Davydov authored 7 years ago

This patch adds the following statistics to index.info:

 - memory.index_size - size of memory tree extents
 - cache.index_size - size of cache tree extents
 - disk.index_size - size of page index
 - disk.bloom_size - size of bloom filters

eea5967b

httpc: allow to use unix socket as connection endpoint · 04e75f2c

Konstantin Belyavskiy authored 7 years ago

This patch adds a new connection option to http client, 'unix_socket'.
The option specifies the path to the unix socket to use as connection
endpoint instead of TCP:

  httpc = require('http.client')
  httpc.request('GET', 'http://localhost/index.html', nil,
                {unix_socket = '/var/run/docker.sock'})

The option is supported only if tarantool was built with libcurl 7.40.0
or newer. For older versions, an attempt to use the option will result
in a Lua exception.

Suggested and first implemented by @rosik.
The test was refactored by @locker.

Closes #3040

04e75f2c

Feb 10, 2018

Make box.once() wait until instance enters rw mode · 33980fc5

Vladimir Davydov authored 7 years ago

It will help resolve box.once() conflicts in case master is rw
and replica is ro.

Closes #2537

33980fc5

Add Lua helpers to wait for server to switch to/from ro mode · 1d45d7b4

Vladimir Davydov authored 7 years ago

This patch adds two new Lua functions, box.ctl.wait_ro() and
box.ctl.wait_rw(), that block the current fiber until the
server switches to read-only or read-write mode, respectively.
Both functions take the timeout as an optional argument.

Needed for #2537

1d45d7b4

Fix compilation with ENABLE_BACKTRACE=OFF · ddb6f0b5

Vladimir Davydov authored 7 years ago

  src/lua/init.c: In function ‘tarantool_panic_handler’:
  src/lua/init.c:321:2: error: implicit declaration of function ‘print_backtrace’ [-Werror=implicit-function-declaration]
    print_backtrace();
    ^~~~~~~~~~~~~~~

  src/lua/fiber.c:244:1: error: ‘lbox_fiber_statof_bt’ defined but not used [-Werror=unused-function]
   lbox_fiber_statof_bt(struct fiber *f, void *cb_ctx)
   ^~~~~~~~~~~~~~~~~~~~

ddb6f0b5

Feb 08, 2018

txn: fix rollback in sub statement · 6b49134d

Vladimir Davydov authored 7 years ago

There are two issues in the rollback code:

 - txn_rollback_stmt() rolls back the current autocommit transaction even
   if it is called from a sub-statement. As a result, if a sub-statement
   (i.e. a statement called from a before_replace or on_replace trigger)
   fails (e.g. due to a conflict), it will trash the current transaction
   leading to a bad memory access upon returning from the trigger.

 - txn_begin_stmt() calls txn_rollback_stmt() on failure even if it did
   not instantiate the statement. So if it is called from a trigger and
   fails (e.g. due to nesting limit), it may trash the parent statement,
   again leading to a crash.

Fix them both and add some tests.

Closes #3127

6b49134d

alter: do not require index rebuild to clear uniqueness · 7528303c

Vladimir Davydov authored 7 years ago

Obviously, there's no point in rebuilding an index if all we do is
relax the uniqueness property. This will also allow us to clear
the uniqueness flag for vinyl indexes, which do not support rebuild.

Note, a memtx tree index stores a pointer to either cmp_def or key_def
depending on whether the index is unique. Hence to clear the uniqueness
flag without rebuilding the index, we need to update this pointer. To do
that, we add a new index virtual method, update_def.

Closes #2449

7528303c

index: remove unused C++ wrappers · dea88836
Vladimir Davydov authored 7 years ago

dea88836

Feb 06, 2018

replication: allow to rebootstrap replica from read-only master · 8b08ec59

Vladimir Davydov authored 7 years ago

If an instance is read-only, an attempt to join a new replica to it will
fail with ER_READONLY, because joining a replica to a cluster implies
registration in the _cluster system space. However, if the replica is
already registered, which is the case if it is being rebootstrapped with
the same uuid (see box.cfg.instance_uuid), the record corresponding to
the replica is already present in the _cluster space and hence no write
operation is required. Still, rebootstrap fails with the same error.

Let's rearrange the access checks to make it possible to rebootstrap a
replica from a read-only master provided it has the same uuid.
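
The rearranged check can be modeled as follows. This is a minimal Python sketch under stated assumptions, not the actual C code: `check_join`, `ReadOnlyError` (standing in for ER_READONLY), and the set of registered UUIDs are all hypothetical names.

```python
# Illustrative model of the rearranged access check; not Tarantool code.

class ReadOnlyError(Exception):
    """Stands in for ER_READONLY in this sketch."""

def check_join(master_is_ro, cluster_uuids, replica_uuid):
    """Return True if joining requires writing a new _cluster record."""
    if replica_uuid in cluster_uuids:
        # Already registered (rebootstrap with the same uuid):
        # no write is needed, so a read-only master is acceptable.
        return False
    if master_is_ro:
        # A new replica must be registered, which a read-only
        # master cannot do.
        raise ReadOnlyError("cannot register a new replica in read-only mode")
    return True
```

The point of the ordering is that the read-only check runs only when a write to _cluster is actually required.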

Closes #3111

8b08ec59

vinyl: don't check key uniqueness if indexed fields are not updated · ab726031

Vladimir Davydov authored 7 years ago

We can save a lookup in a secondary index on update if indexed fields
are not modified. The extra check comes for free as we have a bit mask
of all updated fields.
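
The idea can be illustrated with a small Python sketch. It is an assumption-laden model, not the vinyl code: the function names are hypothetical, and Python integers are unbounded, so the sketch ignores how a fixed-width mask would treat large field numbers.

```python
# Illustrative model of the bit mask check; not Tarantool code.

def field_mask(field_numbers):
    # Set one bit per field number.
    mask = 0
    for fieldno in field_numbers:
        mask |= 1 << fieldno
    return mask

def needs_unique_check(updated_fields, index_key_fields):
    # A lookup in the secondary index is needed only if the update
    # touches at least one field that the index key is built from.
    return (field_mask(updated_fields) & field_mask(index_key_fields)) != 0
```

For example, an update of field 3 would not require a uniqueness check for an index over fields 1 and 2, while an update of field 2 would.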

Closes #2980

ab726031

replication: fix cluster node rebootstrap · 4e62423e

Vladimir Davydov authored 7 years ago

When a tarantool instance starts for the first time (the local directory
is empty), it chooses the peer with the lowest UUID as the bootstrap
master. As a result, one cannot reliably rebootstrap a cluster node
(delete all local files and restart): if the node happens to have the
lowest UUID in the cluster after restart, it will assume that it is the
leader of a new cluster and bootstrap locally, splitting the cluster in
two.

To fix this problem, let's always give preference to peers with a higher
vclock when choosing a bootstrap master and only fall back on selection
by UUID if two or more peers have the same vclock. To achieve that, we
need to introduce a new iproto request type for fetching the current
vclock of a tarantool instance (we cannot squeeze the vclock in the
greeting, because the latter is already packed). The new request type is
called IPROTO_REQUEST_VOTE so that in future it can be reused for a more
sophisticated leader election algorithm. It has no body and does not
require authentication. In reply to such a request, a tarantool instance
will send IPROTO_OK and its current vclock. If the version of the master
is >= 1.7.7, an applier will send IPROTO_REQUEST_VOTE to fetch the
master's vclock before trying to authenticate. The vclock will then be
used to determine the node to bootstrap from.
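
The selection rule can be sketched in Python as below. This is a simplified model, not the actual implementation: vclocks are only partially ordered in general, and ranking them by the sum of their components is an assumption made here for illustration, as are the function names.

```python
# Illustrative model of bootstrap master selection; not Tarantool code.
import uuid

def vclock_signature(vclock):
    # Assumption: rank vclocks by the sum of their components.
    return sum(vclock.values())

def choose_bootstrap_master(peers):
    """peers: list of (uuid.UUID, vclock dict) pairs; returns the chosen UUID."""
    # Prefer the highest vclock; among equal vclocks, the lowest UUID wins.
    best = min(peers, key=lambda p: (-vclock_signature(p[1]), p[0].int))
    return best[0]
```

In this model a node that was wiped and restarted has an empty vclock, so it is never preferred over a peer that already holds data, even if its UUID is the lowest.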

Closes #3108

4e62423e

Cleanup xrow.h · e1d0946b

Vladimir Davydov authored 7 years ago

No functional changes, just a trivial cleanup:

 - Move all C functions inside extern "C" section.
 - Rename xrow_decode_join to xrow_decode_join_xc.
 - Make XXX_xc wrappers around XXX functions.

e1d0946b

Feb 05, 2018

applier: do not print 'authenticated' message if connecting as guest · 674c1058

Vladimir Davydov authored 7 years ago

Before commit 2788dc1b ("Add APPLIER_READY state") we only printed
the 'authenticated' message to the log in case credentials were set in
the replication URI. The commit changed that: now we print the message
even in case of guest connections, when applier does not send the AUTH
command to the master at all. As a result if guest connections are not
permitted by the master, the applier will keep printing 'authenticated'
after every unsuccessful attempt to subscribe. This is misleading. Let
us revert to the behavior we had before commit 2788dc1b.

Closes #3113

674c1058

Feb 02, 2018

Get rid of README and Dockerfile for Alpine Linux · 99ca8d1c

Konstantin Nazarov authored 7 years ago

As there is now support for Alpine Linux in packpack, there is no
longer any need for a custom Dockerfile builder.

99ca8d1c

Add -dev, -doc and -dbg packages for Alpine Linux · 8d5cbe66

Konstantin Nazarov authored 7 years ago

This patch is to get in line with the Alpine support in packpack:

- don't rely on git, and use a source package instead
- add subpackages with debug symbols, documentation and headers
- don't build tarantool 3 times in a row

8d5cbe66

replication: reconnect applier on master rebootstrap · 71b33405

Vladimir Davydov authored 7 years ago

If one node of a cluster is rebootstrapped (i.e. restarted from an
empty directory with the same configuration), other replicas will
never try to reconnect to it - the appliers will simply stop with
the ER_REPLICASET_UUID_MISMATCH error. The only way to fix this is to
reconfigure replication on all other nodes.

Let's fix this problem by reassigning an applier to a new replica
in case its UUID mismatches the UUID of the replica it is currently
assigned to.

Cannot write a test, because rebootstrap is unreliable - see #3108.

Closes #3112

71b33405

applier: stop sending ACKs if master closed socket · dfb48d4d

Vladimir Davydov authored 7 years ago

If the master closes its end of the socket when there are still unread
rows available for the replica to apply, we will get tons of EPIPE error
messages at the replica's side, emitted every time it attempts to send
an ACK back to the master (i.e. one per each row left in the socket):

  main/107/applierw/ sio.cc:303 !> SystemError writev(2), called on fd 12, aka 127.0.0.1:50852: Broken pipe

To avoid that, let's make the applier writer fiber (the one that sends
ACKs) exit immediately if it receives EPIPE error while trying to send
an ACK.
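
The behavior can be sketched as a loop that stops on the first broken pipe. This is a minimal Python model, not the applier code: `ack_writer` and the `send` callback are hypothetical, and Python's built-in BrokenPipeError stands in for EPIPE.

```python
# Illustrative model of the applier writer loop; not Tarantool code.

def ack_writer(pending_acks, send):
    """Send ACKs until all are sent or the peer has closed the socket.

    Returns the number of ACKs sent successfully.
    """
    sent = 0
    for ack in pending_acks:
        try:
            send(ack)
        except BrokenPipeError:
            # The master closed its end: exit immediately instead of
            # failing (and logging an error) once per remaining row.
            break
        sent += 1
    return sent
```

With this structure a closed socket produces at most one failed write, regardless of how many rows are still unread.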

Closes #2945

dfb48d4d

test: fix a sporadically failing net.box.test (long call test). · 51a9108e
Konstantin Osipov authored 7 years ago

51a9108e

Fix force_recovery on empty xlog · be558f20

Konstantin Belyavskiy authored 7 years ago

* Fix force_recovery behaviour on empty xlog files and ones with corrupted
  header.
* Add a test
* Update xlog-py/empty.test.py, since corrupted xlog no longer leads
  to a broken startup.

Closes #3026, #3076

be558f20

access: revert part of Ilya's patch for create,drop ACL · 28bf71bc

Konstantin Osipov authored 7 years ago

For backward compatibility, automatically grant CREATE, DROP
ACL to all users who have READ and WRITE access.

The upgrade script automatically grants CREATE and
ALTER to users with READ/WRITE access on universe, but this is
insufficient, since new users could be created after upgrade.

Follow up on gh-945 and gh-3089.

28bf71bc

security: Add create, drop, alter privilege support · 82123356

IlyaMarkovMipt authored 7 years ago

* Add support for the Create, Drop, and Alter privileges on universe.
* Fix super role behavior, allowing users with
  this role to drop any objects.

Relates #945
Closes #3089

82123356

fio: Read with empty len parameter · df2387b3

IlyaMarkovMipt authored 7 years ago

* Add the ability to call file:read without the len parameter.
  In this case, the whole file is read.

Closes #2925

df2387b3

iproto: change IPROTO_NOP code from 11 to 12 · 48601fa1

Vladimir Davydov authored 7 years ago

11 was initially used for SQL EXECUTE in 1.8, but 1.7 commit
b73030f2 ("iproto: add IPROTO_NOP request type") reassigned
it to NOP so after the merge SQL EXECUTE landed at 12, which
broke connectors. Let's shift NOP to 12 and move EXECUTE back
to 11. This is OK as 1.7.7 which introduced the new iproto type
hasn't been officially released yet.

48601fa1

Feb 01, 2018

Remove is_mount from fio module · 66c60ae1

Kirill Yukhin authored 7 years ago

The fio.is_mount() routine does not work properly on Docker, since Docker
uses a non-transparent incremental filesystem and hence each new file has
a new device id, which in turn means for fio.is_mount() that its parent
is a mount point. But it is not. Remove the routine and corresponding
test entries.

66c60ae1

test: check that connection does not leak if there is long call · 32f1c348

Vladimir Davydov authored 7 years ago

 - Start a long call, which runs forever
 - Close the connection
 - Stop the fiber running the long call
 - Check that the connection does not leak, box.session.on_disconnect
   trigger is called once the fiber has been stopped

Suggested by @kostja

Follow-up #946

32f1c348

Merge branch '1.7-next' of github.com:tarantool/tarantool into 1.7-next · 6caf3911
Konstantin Osipov authored 7 years ago

6caf3911

applier: add missing comments · 2a5a94cc
Konstantin Osipov authored 7 years ago

2a5a94cc

Merge remote-tracking branch 'github/1.7' into 1.7-next · d8240c50
Roman Tsisyk authored 7 years ago

d8240c50

Travis CI: use Debian Stretch for tests · 9f35a129

Roman Tsisyk authored 7 years ago

Try to fix coverage.

9f35a129

Jan 31, 2018

replication: introduce orphan mode · dfd3071f

Vladimir Davydov authored 7 years ago

This patch modifies the replication configuration procedure so as to
fully conform to the specification presented in #2958. In a nutshell,
now box.cfg() tries to synchronize all connected replicas before
returning. If it fails to connect enough replicas to form a quorum, it
leaves the server in a degraded 'orphan' mode, which is basically
read-only. More details below.

First of all, it's worth mentioning that we already have 'orphan' status
in Tarantool (between 'loading' and 'hot_standby'), but it has nothing
to do with replication. Actually, it's unclear why it was introduced in
the first place so we agreed to silently drop it.

We assume that a replica is synchronized if its lag is not greater than
the value of new configuration option box.cfg.replication_sync_lag.
Otherwise a replica is considered to be syncing and has "sync" status.
If replication_sync_lag is unset (nil) or set to TIMEOUT_INFINITY, then
a replica skips the "sync" state and switches to "follow" immediately.
The default value of replication_sync_lag is 10 seconds, but it is
ignored (assumed to be inf) in case the master is running tarantool
older than 1.7.7, which does not send heartbeat messages.
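
The rule for choosing between the "sync" and "follow" states can be modeled as below. This is a minimal Python sketch of the paragraph above, not Tarantool code: the function name and the `master_sends_heartbeats` flag are hypothetical, and `math.inf` stands in for TIMEOUT_INFINITY.

```python
# Illustrative model of the sync/follow decision; not Tarantool code.
import math

TIMEOUT_INFINITY = math.inf  # stand-in for the constant named above

def replica_status(lag, replication_sync_lag=10.0, master_sends_heartbeats=True):
    if not master_sends_heartbeats:
        # Masters older than 1.7.7 send no heartbeats, so the
        # option is ignored (treated as infinite).
        replication_sync_lag = TIMEOUT_INFINITY
    if replication_sync_lag is None or replication_sync_lag == TIMEOUT_INFINITY:
        # The "sync" state is skipped entirely.
        return "follow"
    # Synchronized means the lag is not greater than the threshold.
    return "follow" if lag <= replication_sync_lag else "sync"
```

Note that a lag exactly equal to replication_sync_lag counts as synchronized in this model, following the "not greater than" wording.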

If box.cfg() is called for the very first time (bootstrap) for a given
instance, then

 1. It tries to connect to all configured replicas for as long as it
    takes (replication_timeout isn't taken into account). If it fails to
    connect to at least one replica, bootstrap is aborted.

 2. If this is a cluster bootstrap and the current instance turns out to
    be the new cluster leader, then it performs local bootstrap and
    switches to 'running' state and leaves box.cfg() immediately.

 3. Otherwise (i.e. if this is bootstrap of a slave replica), then it
    bootstraps from a remote master and then stays in 'orphan' state
    until it synchronizes with all replicas before switching to
    'running' state and leaving box.cfg().

If box.cfg() is called after bootstrap, in order to recover from the
local storage, then

 1. It recovers the last snapshot and xlogs stored in the local
    directory.

 2. Then it switches to 'orphan' mode and tries to connect to at least
    as many replicas as specified by box.cfg.replication_connect_quorum
    for a time period which is a multiple of box.cfg.replication_timeout
    (4x). If it fails, it doesn't abort, but leaves box.cfg() in
    'orphan' mode. The state will switch to 'running' asynchronously as
    soon as the instance has synced with 'replication_connect_quorum'
    replicas.

 3. If it managed to connect to enough replicas to form a quorum at step
    2, it synchronizes with them: box.cfg() doesn't return until at
    least 'replication_connect_quorum' replicas have been synchronized.
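
The recovery steps above boil down to a quorum check with a bounded connect phase. The following Python sketch is a simplified model with hypothetical function names, not Tarantool code; it only captures the status an instance would report when box.cfg() returns.

```python
# Simplified model of recovery from local storage; not Tarantool code.

def connect_timeout(replication_timeout):
    # The connect phase lasts a multiple (4x) of replication_timeout.
    return 4 * replication_timeout

def status_on_return(connected_replicas, connect_quorum):
    """Status reported when box.cfg() returns after local recovery."""
    if connected_replicas < connect_quorum:
        # No quorum: box.cfg() does not abort. The instance stays orphan
        # and switches to running later, once it has synced with a quorum.
        return "orphan"
    # Quorum reached: box.cfg() blocks until a quorum is synchronized,
    # so the instance is running by the time it returns.
    return "running"
```

The asynchronous switch from 'orphan' to 'running' is deliberately left out of the sketch.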

If box.cfg() is called after recovery to reconfigure replication, then
it tries to connect to all specified replicas within a time period which
is a multiple of box.cfg.replication_timeout (4x). The value of
box.cfg.replication_connect_quorum isn't taken into account, neither is
the value of box.cfg.replication_sync_lag - box.cfg() returns as soon as
all configured replicas have been connected.

Just like any other status, the new one is reflected by box.info.status.

Suggested by @kostja

Follow-up #2958
Closes #999

dfd3071f

Fix order of function attributes for RB tree in replication · 4e0729cb

Kirill Yukhin authored 7 years ago

rb_gen used an incorrect order of function attributes: static
MAYBE_UNUSED, which caused compilation failures with Clang.
Change the order of the mentioned attributes.

4e0729cb

Merge remote-tracking branch 'origin/1.6' into 1.7 · cce7548c
Roman Tsisyk authored 7 years ago

cce7548c

Travis CI: use Ubuntu Xenial for tests · e67342c3
Roman Tsisyk authored 7 years ago

e67342c3

Travis CI: update distributions · d4a73c92

Roman Tsisyk authored 7 years ago

- Remove old versions of Fedora and Ubuntu
- Add Fedora 26 and Fedora 27

d4a73c92

Jan 30, 2018

security: Change checks on usage access · 9e30f895

IlyaMarkovMipt authored 7 years ago

* Add the following behavior: the owner of an object can't use her own
  objects if she lacks usage access.
* Change access checks of space, sequence, and function objects.
  Similar checks of other objects are performed in alter.cc.

Closes gh-3089

9e30f895

net.box: Fix typo · a0681659

IlyaMarkovMipt authored 7 years ago

* Fix typo in net_box.lua in rare error case

a0681659

fix: Broken compilation with gcc 4.6 · 459d63fc

imarkov authored 7 years ago

* Delete constructor delegation in ClientError
* Move code body from one constructor to another

459d63fc

relay: send heartbeat on subscribe if replica is uptodate · f0892e5e

Vladimir Davydov authored 7 years ago

Currently, a relay sends a heartbeat message to the replica only if
there were no WAL events for 'replication_timeout' seconds. As a result,
a replica that happens to be uptodate on subscribe will not update the
lag until the timeout passes, which may delay configuration. Let's make
relay send a heartbeat message right after subscribe in case the replica
is uptodate.

f0892e5e

replication: add helpers to set and clear replica applier · 85310417

Vladimir Davydov authored 7 years ago

These operations are going to become more complicated than just setting
a pointer so let's introduce helpers for them.

85310417