  1. Dec 07, 2017
    • gh-2142 (lua_atpanic, print trace): review fixes · 86947566
      Konstantin Osipov authored
      * better error messages
      * print trace
    • lua: set lua_atpanic with custom handler · 51d56f84
      Ilya authored
      Set lua_atpanic with a handler that prints the line and file
      where the panic occurred.

      In some cases LuaJIT doesn't call this function even though the
      program fails; investigation shows that this happens inside the
      unwind library.
    • applier: use fiber_cond instead of fiber_channel · b4bf3fa0
      Vladimir Davydov authored
      Apart from being overkill, using a fiber_channel object for notifying
      about applier state changes is actually unsafe: fiber_channel
      methods involve memory allocations and hence may fail due to OOM,
      while the code using them does not expect this. For instance,
      applier_disconnect() should never fail, but it actually may, as it
      calls fiber_channel_put() via applier_set_state().
      
      To avoid unpredictable behavior caused by unhandled exceptions, let's
      switch to plain and simple fiber_cond.
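
      For illustration, a minimal sketch of the fiber.cond primitive at
      the Lua level (the C fiber_cond API it wraps works the same way);
      the names below are hypothetical, not the actual applier code:

        fiber = require('fiber')

        cond = fiber.cond()
        state = 'off'

        -- waiter: blocks until the state changes
        _ = fiber.create(function()
            while state ~= 'connected' do cond:wait() end
            print('applier is connected')
        end)

        state = 'connected'
        cond:broadcast()  -- wakes up all waiters; unlike channel:put(),
                          -- this cannot fail with OOM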
  2. Dec 06, 2017
    • Improve "too long WAL write" message · 43ba81b4
      Vladimir Davydov authored
      Report the LSN and the number of rows that caused the delay so that
      the admin can find the problematic record in the xlog. Example:
      
        too long WAL write: 3 rows at LSN 65: 0.003 sec
      
      Closes #2743
    • schema: revoke role privileges when dropping an object · bad0484d
      Vladimir Davydov authored
      Currently, only user privileges are revoked. If an object has a role
      privilege, an attempt to drop it will fail:
      
        > box.space.test:drop()
        ---
        - error: User '6' is not found
        ...
      
      Fix it and add a test case.
      
      Closes #2710
    • iproto: remove iproto_write_error_blocking() · 4dac37a6
      Konstantin Osipov authored
      Prefer performance over correctness :(
      iproto_write_error_blocking() may block the entire server on a
      malformed request. A malformed request can come only from an
      incorrect client driver or an attacker. Remove the attack vector by
      not attempting to respond to an incorrect request in blocking mode.
      
      Minor code cleanup.
    • vinyl: fix leak of transactions in dead fibers · 51fcffb7
      Vladislav Shpilevoy authored
      If a fiber dies, an active vinyl transaction in it is neither
      rolled back nor committed - it just leaks. Fix it.
      
      Closes #2983
    • Drop IpcChannelGuard · 9487dfb2
      Vladimir Davydov authored
      Although a variable of this type is defined in appliers_connect_all(),
      it doesn't seem to be used anywhere in this function. Looks like we
      forgot to delete it after reworking the applier implementation. Delete
      it now.
    • transactions: store struct request on stack · 22cb77e2
      Vladislav Shpilevoy authored
      There is no reason to allocate it on a region: struct request is not
      stored longer than a single statement. In process_rw struct request
      is used only to read its attributes; the request itself is not
      stored anywhere in the transaction between multiple statements.
    • replication: use xstream_write instead of xstream_write_xc · 145c9c7e
      Vladislav Shpilevoy authored
      Instead of try { xstream_write_xc(...); } catch (Exception *e) {...}
      do if (xstream_write(...) != 0) { /* process error. */ }
      
      This is clearer than wrapping half of the function in a
      try-catch block.
  3. Dec 04, 2017
    • iproto: fix gh-2575, crash on a batch of malformed packets · 9fda6cf7
      Konstantin Osipov authored
      In case of a malformed packet or other error, do not
      append to the output buffer directly, but pump the message
      through the iproto thread.
      
      This is a pre-requisite for a fix for gh-946.
    • box: allow to specify instance and replica set uuid · f0200c1c
      Vladimir Davydov authored
      Add new box configuration options:
      
        box.cfg.instance_uuid
        box.cfg.replicaset_uuid
      
      Both options take a UUID as a string. If set, they force tarantool
      to set the specified UUIDs for the instance and/or replica set on
      bootstrap. On recovery, they are used to check the UUIDs stored in
      the snapshot against the ones specified by the user - on mismatch,
      recovery is aborted.
      
      Closes #2967
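
      For illustration, a minimal sketch (the UUID values below are
      hypothetical):

        box.cfg{
            instance_uuid = 'aaaaaaaa-aaaa-4aaa-8aaa-aaaaaaaaaaaa',
            replicaset_uuid = 'bbbbbbbb-bbbb-4bbb-8bbb-bbbbbbbbbbbb',
        }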
    • test: fix xlog/gh1433 spurious failure · eb7bb884
      Vladimir Davydov authored
      Since commit ccd451eb ("Always touch snapshpot in checkpoint_daemon")
      xlog/checkpoint_daemon leaves checkpoint_interval set to 30 ms,
      which can result in box.snapshot() failure in subsequent tests, such
      as the one shown below. Fix this.
      
        xlog/big_tx.test.lua                                    [ pass ]
        xlog/checkpoint_daemon.test.lua                         [ pass ]
        xlog/errinj.test.lua                                    [ disabled ]
        xlog/gh1433.test.lua                                    [ fail ]
        Test failed! Result content mismatch:
        --- xlog/gh1433.result  Fri Dec  1 17:09:56 2017
        +++ xlog/gh1433.reject  Fri Dec  1 17:15:49 2017
        @@ -17,5 +17,5 @@
         ...
         box.snapshot()
         ---
        -- ok
        +- error: Snapshot is already in progress
         ...
    • netbox: use msgpack.decode_unchecked instead of msgpack.ibuf_decode · 9a590b7d
      Vladimir Davydov authored
      msgpack.ibuf_decode is deprecated.
    • Extend msgpack Lua API · 68c21397
      Vladimir Davydov authored
       - Allow passing a C buffer to msgpack.decode(). Syntax:
      
           buf = buffer.ibuf()
           ...
           obj, rpos = msgpack.decode(buf.rpos, buf:size())
      
       - Introduce a version of msgpack.decode() that doesn't check the
         supplied msgpack - msgpack.decode_unchecked(). It has the same
         signature as msgpack.decode(), except that, when called on a C
         buffer, it doesn't require the buffer size. It is supposed to
         supplant msgpack.ibuf_decode() over time.
      
       - Allow storing encoded objects in a user-supplied ibuf. Syntax:
      
           buf = buffer.ibuf()
           len = msgpack.encode(obj, buf)
      
         ('len' is the number of bytes stored in the buffer)
      
       - Add tests.
      
      Closes #2755
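
      For illustration, a round trip through an ibuf using the calls
      described above (a sketch, assuming a Tarantool console):

        buffer = require('buffer')
        msgpack = require('msgpack')

        buf = buffer.ibuf()
        len = msgpack.encode({1, 2, 3}, buf)  -- bytes appended to buf
        obj, rpos = msgpack.decode(buf.rpos, buf:size())
        -- obj is {1, 2, 3}; rpos points past the decoded object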
  4. Dec 01, 2017
    • vinyl: optimize select in case secondary key is updated frequently · 7f1d1ee4
      Vladimir Davydov authored
      If a space has secondary keys, an update operation generates a REPLACE
      in the primary key and a DELETE + REPLACE in each secondary key. We
      don't need a DELETE in the primary key, because a field indexed by the
      primary key cannot be updated so a REPLACE is enough to update the tuple
      stored in the index. On the contrary, a field indexed by a secondary key
      can be updated so we need a DELETE to remove the old tuple from the
      index.
      
      As a result, if a field indexed by a secondary key gets updated often
      (e.g. the user frequently calls space:update({x}, {{'+', 2, 1}}) on a
      space with a secondary index over field #2), a lot of DELETE statements
      will be generated. The DELETE statements won't be compacted until
      major compaction, so a range select over the secondary index may
      take a long time, because it has to iterate over all those useless
      DELETEs.
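
      For illustration, a minimal sketch of the access pattern described
      above (space and index names are hypothetical):

        s = box.schema.space.create('test', {engine = 'vinyl'})
        s:create_index('pk', {parts = {1, 'unsigned'}})
        s:create_index('sk', {parts = {2, 'unsigned'}})
        s:replace{1, 0}
        for i = 1, 10000 do
            -- each update puts a DELETE + REPLACE into 'sk'
            s:update({1}, {{'+', 2, 1}})
        end
        -- before this patch, this range select had to skip all the
        -- accumulated DELETEs until major compaction
        s.index.sk:select({}, {limit = 10})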
      
      In fact, the REPLACE generated by an update operation can be safely
      substituted with an INSERT in a secondary index. INSERT + DELETE
      pairs are annihilated on dump/compaction, so that would solve the
      problem. Unfortunately, we can't substitute REPLACE with INSERT
      immediately on update, because statements are shared between primary
      and secondary indexes in memory and we can't use an INSERT in the
      primary index in case of an update (see above). However, it is OK to
      turn a REPLACE generated by an update in a secondary key into an
      INSERT on dump/compaction. We just need a way to identify such
      REPLACE statements somehow.
      
      In contrast to normal REPLACEs, a REPLACE statement generated by an
      update operation usually has a column mask. There's only one exception:
      if an update operation updates all secondary keys, the column mask isn't
      stored (vy_stmt_column_mask() returns UINT64_MAX). This is done for the
      sake of memory usage minimization, but it doesn't seem to make much
      sense: first, updates that touch all secondary indexes should be rare;
      second, we save only 8 bytes per statement. Let's remove this
      optimization, store the column mask in REPLACE statements generated
      by update operations unconditionally, and use this information in
      the write iterator to turn REPLACEs into INSERTs.
      
      See #2875
    • vinyl: annihilate INSERT+DELETE pairs on compaction · 9fdd066c
      Vladimir Davydov authored
      The idea is simple: if the oldest statement for a given key among all
      write iterator's sources is an INSERT, then either there are no
      statements for this key in older runs or the most recent statement
      there is a DELETE; in either case we can drop all leading DELETEs
      from this key's final history and turn the first REPLACE, if any,
      into an INSERT.
      
      Note that if the oldest statement is NOT an INSERT but the first
      statement in the output happens to be an INSERT, we must convert it
      to a REPLACE; otherwise we risk mistakenly optimizing away new
      DELETEs for this key on the next compaction.
      
      Closes #2875
    • vinyl: preserve INSERT statements when squashing txv · 3b28cdd9
      Vladimir Davydov authored
      If the first statement for a particular key in a transaction write
      set is an INSERT, then either there are no committed statements for
      this key or the last committed statement is a DELETE, so we can
      
       - drop the final statement if it is a DELETE
       - turn the final statement into INSERT if it is a REPLACE
      
      Needed for #2875
    • vinyl: introduce INSERT statement type · 0e5744e3
      Vladimir Davydov authored
      Use IPROTO_INSERT instead of IPROTO_REPLACE if the previous statement
      for the same key is either absent or has type IPROTO_DELETE. This will
      allow us to annihilate INSERT+DELETE pairs on compaction (currently, we
      have to produce DELETE unless it's major compaction).
      
      For now INSERT statements are produced only in the following cases:
       - space:insert() is called - see vy_insert();
       - space:upsert() is called, the space has secondary indexes, and
         the previous statement for the inserted key is either absent or
         has type DELETE - see vy_upsert();
       - space:replace() is called, the space has secondary indexes, and
         the previous statement for the inserted key is either absent or
         has type DELETE - see vy_replace_impl().
      
      No special interpretation of INSERT statements has been added yet -
      they are handled just like REPLACE statements. INSERT+DELETE
      annihilation will be introduced by the following patches.
      
      Needed for #2875
    • Fix some flaky tests · 6f33c8a6
      Vladimir Davydov authored
       - box/alter_limits.test.lua
      
         The test uses the following code to drop all indexes except primary:
      
           for k, v in pairs (s.index) do if v.id ~= 0 then v:drop() end end
      
         Since 's.index' is reinitialized by 'v:drop()', iterating over the
         table using 'pairs' may result in unpredictable behavior.
         Presumably, this is responsible for the following test failure:
      
           --- box/alter_limits.result     Fri Dec  1 08:30:16 2017
           +++ box/alter_limits.reject     Fri Dec  1 08:32:54 2017
           @@ -399,25 +399,22 @@
            -- unknown index type
            index = s:create_index('test', { type = 'nosuchtype' })
            ---
           -- error: Unsupported index type supplied for index 'test' in space 'test'
           +- error: 'Can''t create or modify index ''test'' in space ''test'': index id too big'
      
         Let's use a simple for-loop instead of 'pairs' here (see the
         sketch after this list).
      
       - box/ddl.test.lua
      
         Right before the spurious failure shown below, the test executes the
         following line of code:
      
           box.begin() box.internal.collation.create('test2', 'ICU', 'ru_RU') box.commit()
      
         If the interpreter doesn't yield before proceeding to the next line,
         the transaction started at this line won't get aborted or committed,
         resulting in the test failure:
      
           --- box/ddl.result	Tue Oct 17 10:53:32 2017
           +++ box/ddl.reject	Fri Oct 20 11:16:17 2017
           @@ -295,10 +295,11 @@
            ...
            box.internal.collation.create('test', 'ICU', 'ru_RU')
            ---
           +- error: Space _collation does not support multi-statement transactions
            ...
      
         Fix this by explicitly aborting the transaction on the next line.
      
       - vinyl/ddl.test.lua
      
         The test checks that a vinyl transaction is aborted by a concurrent
         DDL using the following piece of code:
      
           ch = fiber.channel(1)
           s = box.schema.space.create('test', { engine = 'vinyl' })
           pk = s:create_index('primary', { parts = { 1, 'uint' } })
           sk = s:create_index('sec', { parts = { 2, 'uint' } })
           box.begin()
           s:replace({1, 2, 3})
           s:replace({4, 5, 6})
           s:replace({7, 8, 9})
           s:upsert({10, 11, 12}, {})
           _ = fiber.create(function () s:drop() ch:put(true) end)
           box.commit()
           ch:get()
      
         If the fiber performing the drop happens to yield before calling
         's:drop()', the transaction won't be aborted:
      
           --- vinyl/ddl.result	Wed Nov 29 06:14:09 2017
           +++ vinyl/ddl.reject	Wed Nov 29 06:19:09 2017
           @@ -650,7 +650,6 @@
            ...
            box.commit()
            ---
           -- error: Transaction has been aborted by conflict
            ...
            ch:get()
            ---
      
          Obviously, we should wait for the DDL fiber to complete before
          trying to commit the transaction so just swap 'ch:get()' and
          'box.commit()' here.
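
       As a sketch of the first fix (hypothetical code, not necessarily
       the exact loop used in the test): iterate over index ids with a
       plain numeric for-loop, which is immune to 's.index' being
       rebuilt by 'v:drop()':

         for id = #s.index, 1, -1 do
             local v = s.index[id]
             if v ~= nil then v:drop() end  -- id 0 (primary) is kept
         end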
    • vinyl: improve logging · f69907db
      Vladimir Davydov authored
       - Use diag_log() or error_log() instead of printing error->errmsg
         whenever possible as they refer to the place where the error was
         initially raised, which may be useful.
      
       - Do not include the description of a transaction we failed to flush
         and so left in the buffer to be flushed later - this information is
         generally useless. Just print a warning that we failed to flush the
         vylog buffer to draw attention of the admin to possible IO problems.
         Consequently, remove the error code from vy_log_tx_try_commit().
      
       - Use vy_log_tx_try_commit() for writing VY_RUN_DISCARD and FORGET
         records. Currently, we use vy_log_tx_commit(), because at the time
         these methods were initially implemented, vy_log_tx_try_commit()
         simply didn't exist. Let's use vy_log_tx_try_commit() and remove the
         error messages emitted on vy_log_tx_commit() failure.
      
       - If we failed to load the vylog and consequently failed replication,
         garbage collection, or backup, print what exactly we failed to do.
         The error message is already printed by vy_recovery_new(), so don't
         include it in the output.
      
       - Remove the error message from gc_run() as engine callbacks already
         print error messages - there is no point in duplicating them.
         Actually, we usually print error messages from engine callbacks
         rather than from engine wrappers.
      
       - Improve dump/checkpoint logging. Print when checkpoint started, when
         it ended, how many bytes were dumped, how long it took.
      
       - Print names of files written by worker threads (.run, .index). To
         differentiate messages from different worker threads, give unique
         names to them: vinyl.worker.0, 1, and so on. We already name reader
         threads in the same fashion.
      
       - Print the names of files removed by the vinyl garbage collection
         callback. Also, make the messages look exactly like the ones
         printed by xdir_collect_garbage() (used by memtx, wal).
      
       - Use VERBOSE level for printing the message about scheduling a
         background fiber for squashing UPSERTs. There shouldn't be many of
         them so this should be fine.
      
       - Use INFO level for 'rebuilding index for ...' messages (currently,
         it's WARN for no reason). Also, print the file name of the index/data
         file we failed to load (currently, it may be omitted on certain
         errors, making identification of a missing or corrupted file nearly
         impossible).
      
       - Print the name of the run file we failed to read a page from.
         Also include the page offset and size in the output. The full
         file path is not printed, because it's difficult to obtain in the
         error path, but I guess that's OK as each run file has a unique
         name and so can easily be found by its name. I guess it
         addresses #1972.
    • Merge branch '1.7' into 1.7-next · 8d00e0e3
      Konstantin Osipov authored
    • gh-2966: initialize admin universe access at bootstrap · 2db448b2
      Georgy Kirichenko authored
      Initialize the admin user's access to the universe on instance start.
  5. Nov 27, 2017
    • digest: add pbkdf2 hashing · 93980aef
      Ilya authored
      * Add a PBKDF2 hashing API
      * A wrapper around OpenSSL
      
      Closes #2874
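
      For illustration, a usage sketch (the argument list shown is an
      assumption based on the digest module documentation):

        digest = require('digest')
        -- digest.pbkdf2(password, salt, iterations, key_length)
        key = digest.pbkdf2('secret', 'salt', 10000, 32)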
    • vinyl: fix crash in vy_read_iterator_restore_mem · 44b4d7ec
      Vladimir Davydov authored
      vy_read_iterator_restore_mem() is called after a yield caused by a disk
      read to restore the position of the iterator over the active in-memory
      tree. It assumes that if a statement inserted into the active in-memory
      tree during the yield is equal to the statement at which the read
      iterator is positioned now (curr_stmt) by key but is older in terms of
      LSN, then the iterator must be positioned at a txw statement. However,
      the iterator could be positioned at an uncommitted statement stored in
      the cache before the yield. We don't restore the cache iterator after a
      yield, so if the new statement has been committed, its LSN will be less
      than the LSN of the uncommitted statement stored in the cache although
      it is indeed newer. This results in an assertion failure:
      
        vy_read_iterator.c:421: vy_read_iterator_restore_mem: Assertion `itr->curr_src == itr->txw_src' failed.
      
      To fix this, let's modify the code checking if the iterator should be
      repositioned to the active in-memory tree (mem_src) after a yield:
      instead of comparing statement LSNs, let's reposition the iterator
      unless it is currently positioned at a txw statement as it is the only
      case when curr_stmt can be newer than the newly inserted statement.
      
      Closes #2926
    • make coll_cmp_f arguments const · 4eeab69a
      khatskevich authored
        This is not a very important change, but implementing collations in
      SQL required either deleting the const modifier from some functions
      in SQL or making the coll_cmp_f arguments const. The second option
      was chosen because it makes more sense.
    • rename func coll_cache_find -> coll_by_id · dcda29c9
      khatskevich authored
        This renaming is important because searching collations by name is
      needed for the 1.8 branch. In 1.8 there are two functions for
      searching collations:
        - coll_cache_id
        - coll_cache_name

        The renaming is also important because the new format is closer to
      similar functions in other modules.
    • Extend fio Lua API · a0fcaa88
      Vladimir Davydov authored
      In order to use fio in conjunction with ibuf, we need to extend read(),
      pread(), write(), pwrite() so that they can take a C buffer instead of
      a Lua string. The syntax is as follows:
      
        read(size) -> str
        read(buf, size) -> len
      
        pread(size, offset) -> str
        pread(buf, size, offset) -> len
      
        write(str)
        write(buf, size)
      
        pwrite(str, offset)
        pwrite(buf, size, offset)
      
      See #2755
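
      For illustration, a sketch of reading a file into an ibuf via the
      new C-buffer overloads (the file name is hypothetical):

        fio = require('fio')
        buffer = require('buffer')

        f = fio.open('/tmp/data.bin', {'O_RDONLY'})
        buf = buffer.ibuf()
        len = f:read(buf:reserve(4096), 4096)  -- read into the C buffer
        buf:alloc(len)                         -- mark the bytes as used
        f:close()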
  6. Nov 17, 2017
    • vinyl: fix LSN assignment for indexes received during initial join · 8c48aa47
      Vladimir Davydov authored
      Since commit 7d67ec8a ("box: generate LSNs for rows received during
      initial join"), we assign fake, monotonically growing LSNs to all
      records received during initial join, because Vinyl requires all indexes
      to have a unique LSN for identification in vylog. The problem is there's
      a number of records (about 60) that have LSN 0 on the master - these are
      bootstrap records. So if there are N such records, the index that has
      LSN X on the master will have LSN (N + X) on the replica if sent during
      initial join. But there may be another index with the same LSN on the
      master (i.e. N + X). If this index is sent on final join or subscribe
      stage, the replica will fail to make a checkpoint or recover, spitting
      an error message similar to the one below:
      
        coio vy_log.c:2002 E> failed to process vylog record: create_index{index_lsn=68, space_id=12345, key_def=[0, 'unsigned'], }
        main/101/replica vy_log.c:1446 E> ER_INVALID_VYLOG_FILE: Invalid VYLOG file: Duplicate index id 68
        main/101/replica vy_log.c:2117 E> failed to load `./00000000000000000000.vylog'
        main/101/replica F> failed to create a checkpoint
      
      Presently, the problem is 100% reproducible by
      
        test_run.py --conf vinyl engine/replica_join
      
      To fix it, let's revert the aforementioned commit and instead assign
      fake LSNs only to vinyl records, not to all records received during
      initial join - after all it's vinyl that needs this, not memtx.
      
      Actually, we already use the same technique for DML records in vinyl -
      see vy_join_ctx::lsn - except there we assign fake LSNs on the master's
      side. This was done for historical reasons - we could just as well
      assign them on the replica, but that would require some refactoring,
      which was left for later.
      So let's use the same fake LSN counter for both cases.
    • vinyl: improve logging in vylog · b0f063a0
      Vladimir Davydov authored
       - Log debug messages at VERBOSE log level. Currently, the only way to
         debug a vylog failure in production is enabling DEBUG log level, but
         that basically floods the log with tons of not so important messages.
         There are not that many vylog messages so it should be OK to log them
         with say_verbose().
      
       - Report the filename on failure to load or save a vylog in order to
         simplify identification of the corrupted file.
      
       - Log some critical errors, such as error processing a vylog record or
         failure to flush the vylog for recovery.
      
       - Add sentinel messages for log rotation, saving, and loading,
         logged at the VERBOSE level. This will help match dumped log
         records to the high-level operations that emitted them.
      
       - Remove debug logging from the vy_recovery callback, which is invoked
         by vy_recovery_iterate() and vy_recovery_load_index(), as it is a
         responsibility of the caller to write log messages there, not of
         the vylog internal implementation. Besides, dumping all replayed
         records there is not really necessary as they are dumped while vylog
         is loaded anyway (see vy_recovery_new()), which should be enough for
         debugging.
    • vinyl: skip uniqueness check during recovery · e0c76280
      Vladimir Davydov authored
      During recovery we apply rows that were successfully applied either
      locally before restart or on the master, so conflicts are impossible.
      We already skip the uniqueness check for primary indexes in this case.
      Let's skip it for secondary indexes as well.
      
      Closes #2099
    • vinyl: discard tautological DELETEs on compaction · a6f45d87
      Vladimir Davydov authored
      The write iterator never discards DELETE statements referenced by a read
      view unless it is major compaction. However, a DELETE is useless if
      it is preceded by another DELETE for the same key. Let's skip such
      tautological DELETEs. This is not only a useful optimization on its own -
      it will also help us annihilate INSERT+DELETE pairs on compaction.
      
      Needed for #2875