  1. Jul 09, 2019
    • Oleg Babin's avatar
      fio: introduce utime function · 6e393aca
      Oleg Babin authored
      Closes #4323
      
      @TarantoolBot document
      Title: fio.utime
      
      fio.utime (filepath [, atime [, mtime]])
      Set access and modification times of a file.
      The first argument is the filename, the second argument (atime) is
      the access time, and the third argument (mtime) is
      the modification time. Both times are provided in seconds since the epoch.
      If the modification time is omitted, the access time provided is used;
      if both times are omitted, the current time is used.
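
The defaulting rules above can be sketched in Python. This is only an illustration of the described semantics, not the fio implementation; `resolve_times` and its `now` parameter are hypothetical names, and `os.utime` stands in for the underlying system call.

```python
import os
import time

def resolve_times(atime=None, mtime=None, now=None):
    """Apply the defaulting rules described for fio.utime."""
    if atime is None:
        # Both times omitted: fall back to the current time.
        atime = time.time() if now is None else now
    if mtime is None:
        # Modification time omitted: reuse the access time.
        mtime = atime
    return atime, mtime

def utime(path, atime=None, mtime=None):
    """Python stand-in that sets both timestamps of a file."""
    os.utime(path, times=resolve_times(atime, mtime))
```

Both values are in seconds since the epoch, as in the description above.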
      6e393aca
    • Vladimir Davydov's avatar
      txn: run on_rollback triggers on txn_abort · 6ac597db
      Vladimir Davydov authored
      When a memtx transaction is aborted on yield, it isn't enough to
      roll back individual statements - we must also run on_rollback triggers,
      otherwise changes done to the schema by an aborted DDL transaction will
      be visible to other fibers until an attempt to commit it is made.
      6ac597db
  2. Jul 08, 2019
    • Vladimir Davydov's avatar
      txn: fix execution order of commit triggers · 01343264
      Vladimir Davydov authored
      Both commit and rollback triggers are currently added to the list head.
      As a result, they are both run in the reverse order. This is correct for
      rollback triggers, because this matches the order in which statements
      that added the triggers are rolled back, but this is wrong for commit
      triggers. For example, suppose we create a space and then create an
      index for it in the same transaction. We expect that on success we first
      run the trigger that commits the space and only then the trigger that
      commits the index, not vice versa. That said, reverse the order of
      commit triggers in the scope of preparations for transactional DDL.
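
A minimal Python sketch of the intended ordering, assuming a transaction that simply keeps two trigger lists. The `Txn` class and its method names are hypothetical and do not mirror the C structures.

```python
class Txn:
    def __init__(self):
        self.on_commit = []
        self.on_rollback = []

    def add_triggers(self, commit, rollback):
        # Commit triggers are appended, so they run in statement order.
        self.on_commit.append(commit)
        # Rollback triggers go to the head, so they run in reverse order.
        self.on_rollback.insert(0, rollback)

    def commit(self):
        for trigger in self.on_commit:
            trigger()

    def rollback(self):
        for trigger in self.on_rollback:
            trigger()
```

With a space created first and its index second, `commit()` runs the space trigger before the index trigger, while `rollback()` undoes the index before the space.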
      01343264
    • Vladimir Davydov's avatar
      vinyl: don't sync WAL on space alter if not necessary · 27aba00b
      Vladimir Davydov authored
      Changes done to an altered space while a new index is being built or
      the format is being checked are propagated via an on_replace trigger.
      The problem is there may be transactions that started before the alter
      request. Their working set can't be checked so we simply abort them.
      We can't abort transactions that have reached WAL so we also call
      wal_sync() to flush all pending WAL requests. This is a yielding
      operation and we call it even if there are no transactions that need
      to be flushed. As a result, vinyl space alter yields unconditionally,
      even if the space is empty and there are no pending transactions
      affecting it. This prevents us from implementing transactional DDL.
      Let's call wal_sync() only if there's actually at least one pending
      transaction affecting the altered space and waiting for WAL.
      27aba00b
    • Serge Petrenko's avatar
      decimal: expose decimal type to lua. · 3ab387a4
      Serge Petrenko authored
      Add a decimal library to lua.
      
      Part of #692
      
      @TarantoolBot document
      Title: Document decimal module in lua.
      
      First of all, you have to require the package via
      `decimal = require('decimal')`
      Now you can construct decimals via the `new` method.
      Decimals may be constructed from lua numbers, strings, unsigned and
      signed 64 bit integers.
      Decimal is a fixed-point type with a maximum of 38 digits of
      precision. All the calculations are exact, so be careful when
      constructing decimals from lua numbers: they may hold only 15
      decimal digits of precision.
      You are advised to construct decimals from strings, since strings
      represent decimals exactly, and vice versa.
      
      ```
      a = decimal.new(123e-7)
      b = decimal.new('123.456')
      c = decimal.new('123.456e2')
      d = decimal.new(123ULL)
      e = decimal.new(2)
      ```
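
The advice to prefer strings can be illustrated with Python's `decimal` module, used here only as an analogy; Tarantool's own conversion from doubles may differ in detail. A double cannot hold all 19 digits of the literal below, so the value that went through a float no longer matches the one built from the string.

```python
from decimal import Decimal

text = "0.1234567890123456789"           # 19 significant digits

exact = Decimal(text)                    # keeps every digit
via_float = Decimal(repr(float(text)))   # a double keeps only about 15-17

print(exact)
print(via_float)
print(exact == via_float)  # False
```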
      The allowed operations are addition, subtraction, division,
      multiplication and power. If at least one of the operands is decimal,
      decimal operations are performed. The other operand may be either
      decimal or string, containing a number representation, or a lua number.
      
      Operations only fail on an overflow, i.e. when result exceeds 10^38 - 1.
      This includes division by zero. In these cases an error `Operation
      failed` is raised.
      Underflow is also possible, when precision needed to store the exact
      result exceeds 38 digits. Underflow is not an error. When an underflow
      happens, the result is rounded to 38 digits of precision.
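
A rough Python model of these rules, assuming a 38-digit context from the standard `decimal` module. The `checked` helper and the error text are illustrative only; Python counts significant digits, which may not match Tarantool's notion of precision exactly.

```python
from decimal import (Context, Decimal, DivisionByZero,
                     InvalidOperation, Overflow)

MAX_DIGITS = 38
# Results are rounded to 38 significant digits (the "underflow" case).
CTX = Context(prec=MAX_DIGITS,
              traps=[DivisionByZero, InvalidOperation, Overflow])
LIMIT = Decimal("1e38")  # results must stay below 10^38 in magnitude

def checked(op, x, y):
    """Run a binary context operation; fail on division by zero or overflow."""
    try:
        result = op(Decimal(x), Decimal(y))
    except ArithmeticError:
        raise ArithmeticError("Operation failed") from None
    if not (-LIMIT < result < LIMIT):
        raise ArithmeticError("Operation failed")
    return result

third = checked(CTX.divide, 1, 3)
print(third)  # 0. followed by 38 threes: rounded, not an error
```

Here `checked(CTX.divide, 1, 0)` and `checked(CTX.multiply, "1e37", 10)` both raise, while `1 / 3` is silently rounded.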
      
      ```
      tarantool> a + b
      ---
      - '123.456012300000000'
      ...
      
      tarantool> c - d
      ---
      - '12222.6'
      ...
      
      tarantool> c / b
      ---
      - '100'
      ...
      
      tarantool> d * d
      ---
      - '15129'
      ...
      
      tarantool> d ^ 2
      ---
      - '15129'
      ...
      
      tarantool> 2 ^ d
      ---
      - '10633823966279326983230456482242756608'
      ...
      
      tarantool> e ^ d
      ---
      - '10633823966279326983230456482242756608'
      ...
      ```
      The following math functions are also supported:
      log10, ln, exp, sqrt. When specified as
      `decimal.opname()`, operations may be performed on
      strings and lua numbers.
      ```
      f = decimal.new(100)
      
      tarantool> decimal.log10(f)
      ---
      - '2'
      ...
      
      tarantool> decimal.sqrt(f)
      ---
      - '10'
      ...
      
      tarantool> e2 = decimal.exp(2)
      ---
      ...
      
      tarantool> decimal.ln(e2)
      ---
      - '2.0000000000000000000000000000000000000'
      ...
      ```
      
      There are also `abs` and `tostring` methods, and a unary minus
      operator, which are pretty self-explanatory.
      
      ```
      tarantool> a = decimal.new('-5')
      ---
      ...
      
      tarantool> a
      ---
      - '-5'
      ...
      
      tarantool> decimal.abs(a)
      ---
      - '5'
      ...
      
      tarantool> -a
      ---
      - '5'
      ...
      
      tarantool> tostring(a)
      ---
      - '-5'
      ...
      
      ```
      
      `decimal.precision`, `decimal.scale` and `decimal.round`:
      The first two methods return the precision, i.e. the number of
      decimal digits in the number representation, and the scale, i.e. the
      number of decimal digits after the decimal point.
      `decimal.round` rounds the number to the given scale.
      ```
      tarantool> a = decimal.new('123.456789')
      ---
      ...
      
      tarantool> decimal.precision(a)
      ---
      - 9
      ...
      
      tarantool> decimal.scale(a)
      ---
      - 6
      ...
      
      tarantool> decimal.round(a, 4)
      ---
      - '123.4568'
      ...
      
      ```
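
For comparison, the same three quantities can be computed with Python's `decimal` module. This is an analogy to show what precision and scale mean, not the Tarantool API.

```python
from decimal import Decimal

a = Decimal("123.456789")
sign, digits, exponent = a.as_tuple()

precision = len(digits)   # total number of decimal digits
scale = -exponent         # digits after the decimal point
rounded = a.quantize(Decimal("0.0001"))  # keep 4 digits after the point

print(precision, scale, rounded)  # 9 6 123.4568
```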
      
      Comparisons: `>`, `<`, `>=`, `<=`, `==` are also legal and work as
      expected. You may compare decimals with lua numbers or strings. In that
      case the comparison happens after the values are converted to the
      decimal type.
      3ab387a4
    • Serge Petrenko's avatar
      lua/utils: add a function to register FFI metatypes. · 4ca39537
      Serge Petrenko authored
      An FFI metatype has a CTypeID, which can be used to push cdata of the
      type on the lua stack, and has an associated metatable, automatically
      applied to every created member of the type.
      This allows behavior similar to pushing userdata and assigning a
      metatable to it.
      
      Needed for #692
      4ca39537
    • Serge Petrenko's avatar
      decimal: fix string formatting on construction from double · f64481de
      Serge Petrenko authored
      Use printf "%g" option instead of "%f" to trim trailing zeros in such
      cases:
      decimal_from_double(1) -> '1.000000000000000' -> decimal_from_string()
      Now it should be
      decimal_from_double(1) -> '1' -> decimal_from_string()
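
Python's `%` formatting follows the C printf conventions, so the difference is easy to reproduce. The precision of 15 in the first format is an assumption chosen to match the string shown above; the message does not state the exact format used.

```python
value = 1.0

with_f = "%.15f" % value   # fixed notation pads with trailing zeros
with_g = "%g" % value      # %g drops trailing zeros and the bare point

print(with_f)  # 1.000000000000000
print(with_g)  # 1
print("%g" % 0.5)  # 0.5
```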
      
      Follow-up 6d62c6c1
      f64481de
    • Serge Petrenko's avatar
      decimal: disallow infinity and NaN entirely. · db27d470
      Serge Petrenko authored
      While arithmetic operations do not return infinities or NaNs, it is
      possible to construct an invalid decimal value from strings 'Infinity',
      'NaN' and similar. Some decimal mathematic functions may also result in
      an infinity, say, ln(0) yields '-Infinity'.
      So, add checks that the number is not a NaN or infinity after each
      operation, so that the operation either returns an error, or a valid
      finite decimal number.
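
The idea of checking every result can be sketched with Python's `decimal` module, which also lets such values appear. `ensure_finite` and `checked_ln` are hypothetical helpers, not functions from the patch.

```python
from decimal import Decimal

def ensure_finite(value):
    """Return value unchanged, or raise if it is NaN or an infinity."""
    if not value.is_finite():
        raise ValueError("decimal operation produced %s" % value)
    return value

def checked_ln(value):
    # Python's Decimal.ln() returns -Infinity for zero, so check the result.
    return ensure_finite(Decimal(value).ln())
```

With this wrapper `checked_ln(1)` returns zero, while `checked_ln(0)` and `ensure_finite(Decimal("NaN"))` raise instead of handing back an invalid value.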
      
      Follow-up 6d62c6c1
      db27d470
    • Serge Petrenko's avatar
      decimal: fix ln hang on values between ~ 0.9 and 1.1 · e0d4a5dc
      Serge Petrenko authored
      Turns out decNumberLn hangs when result is subnormal, according to the
      current context settings. To fix this, reset minimal allowed exponent
      to a smaller value during the ln operation and round the result afterwards.
      
      Follow-up 6d62c6c1
      e0d4a5dc
    • Vladimir Davydov's avatar
      vinyl: fix vy_range_update_compaction_priority hang · 75dc3e64
      Vladimir Davydov authored
      Under certain circumstances vy_slice_new() may create an empty slice,
      e.g. on range split:
      
         |------------------ Slice ---------------|
                               |---- Run -----|
                           +
                        split key
         |---- Slice 1 ----||------ Slice 2 ------|
               ^^^^^^^
                Empty
      
      vy_range_update_compaction_priority() uses the size of the last slice in
      a range as a base for LSM tree level sizing. If the slice size happens
      to be 0, it will simply hang in an infinite loop. Fix this potential
      hang by using 1 if the last slice size is 0.
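
The shape of the bug can be shown with a deliberately simplified loop. This is a hypothetical model, not the code of vy_range_update_compaction_priority(); `count_levels` and `level_ratio` are made-up names. Multiplying a zero base never reaches the target, so the base is clamped to 1.

```python
def count_levels(last_slice_size, total_size, level_ratio=2):
    """Count how many levels are needed to cover total_size."""
    # Use 1 when the last slice is empty: 0 * ratio never grows.
    level_size = max(last_slice_size, 1)
    levels = 1
    while level_size < total_size:
        level_size *= level_ratio
        levels += 1
    return levels
```

Without the `max(..., 1)` clamp, `count_levels(0, 8)` would loop forever.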
      75dc3e64
    • Konstantin Osipov's avatar
  3. Jul 05, 2019
    • Vladislav Shpilevoy's avatar
      swim: push-pull dissemination · 3fb2b875
      Vladislav Shpilevoy authored
      The original SWIM paper says that the dissemination time of an
      event is O(log(N)), where N is the size of the cluster. This is
      true when both ping and ack messages carry dissemination and
      anti-entropy. Before this patch it wasn't so - only regular
      pings were carrying something.
      
      After this patch the SWIM module has true exponential
      dissemination speed.
      
      Closes #4253
      3fb2b875
    • Vladislav Shpilevoy's avatar
      swim: speed-up empty payloads cluster bootstrap · f30309de
      Vladislav Shpilevoy authored
      Another place consuming most of the test start-up time is the
      useless dissemination of an empty payload, which can in fact be
      skipped.
      
      Consider a cluster of 300 nodes. All of them are interconnected
      manually, and now a test wants to wait for stabilization, when
      there are no events. On such a cluster it takes ~200 round steps
      until not a single event is left.
      
      This is not about big packets, or log() TTD. There may be a few
      events, may be more, but when a test wants the cluster to be
      clean, it needs to wait for all the events being done.
      
      This patch abuses the fact that empty payloads can be compared
      for free, without a single memcmp. If both the new and the old
      payload are empty, then there is nothing to disseminate.
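
As a tiny illustration, assuming payloads are byte strings, the check only needs the two sizes. `needs_dissemination` is a hypothetical name.

```python
def needs_dissemination(old_payload, new_payload):
    """Skip dissemination only when both payloads are empty."""
    # Sizes alone settle this case, no byte-by-byte comparison is needed.
    both_empty = len(old_payload) == 0 and len(new_payload) == 0
    return not both_empty
```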
      
      It could help in a real cluster too, if initially there are no
      payloads.
      
      Needed for #4253
      f30309de
    • Vladislav Shpilevoy's avatar
      swim: speed-up tests · 7446ed21
      Vladislav Shpilevoy authored
      With the following patches some of the tests will run much slower
      because most packets become significantly larger.
      
      This commit tries to smooth it by
      
          * Turning off verbose logs in unit tests;
          * Using a much lighter version of the UUID comparator.
      
      According to the profiler, these changes make the tests a couple
      of times faster, and at the same time they are simple.
      
      Needed for #4253
      7446ed21
    • Vladislav Shpilevoy's avatar
      test: redo some swim tests using error injections · a0d6ac29
      Vladislav Shpilevoy authored
      There were tests relying on certain content of SWIM messages.
      After the next patches these conditions won't hold without
      explicit intervention via error injections.
      
      The patchset moves these tests to separate release-disabled
      files.
      
      Part of #4253
      a0d6ac29
    • Vladislav Shpilevoy's avatar
      swim: sadly remove cache · 679dea4e
      Vladislav Shpilevoy authored
      SWIM sends basically the same message during a round. There was
      a microoptimization so as not to reassemble the message on each
      step. Now it is getting harder to support that island of
      perfectionism, because
      
          * Soon all the messages will carry all the sections,
            including indirect messages. Their body is smaller, so it
            is not possible to maintain one cached message without
            reducing its maximal size;
      
          * In big clusters even without any changes a cached message
            would need to be rebuilt. This is because the anti-entropy
            section won't help much unless it is changed frequently
            enough;
      
          * In big clusters changes happen often enough to invalidate
            the cached message constantly, unless SWIM tracked which
            members are included in the cached message and which are
            not. Then a change of a member not included in the message
            would not affect the cache. But it would complicate the
            code too much.
      
      Part of #4253
      679dea4e
    • Vladislav Shpilevoy's avatar
      swim: be suspicious when add new member · 506d1878
      Vladislav Shpilevoy authored
      The previous commit solves one important problem with too long
      event dissemination: events could occupy the whole UDP packet for
      too long. Now they live log() time, but 'dead' and 'left' members
      were bound to TTD. Such members were deleted once TTD reached 0.
      
      Now they are deleted too early. Cluster nodes forget about dead
      members too soon, and nodes not aware of their death can
      accidentally resurrect them via anti-entropy. Cluster nodes need
      to be suspicious when someone tells them to add a new, not dead
      member.
      
      This patch makes SWIM add a new member in two cases only: manually
      and if an ACK was received from it. A new member can't be added
      indirectly via events and anti-entropy anymore. Instead, a ping is
      sent to the members who are said to be new and alive. If ACK is
      received directly from them, then they are added.
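
The rule can be sketched as a small member table. The class, its method names and the `send_ping` callback are hypothetical, and the real module is written in C; the sketch only shows which paths are allowed to add a member.

```python
class MemberTable:
    def __init__(self, send_ping):
        self.members = set()
        self.send_ping = send_ping  # callback: send_ping(member_id)

    def add_manually(self, member_id):
        self.members.add(member_id)

    def on_indirect_info(self, member_id):
        # Learned about a member from an event or anti-entropy.
        if member_id not in self.members:
            # Unknown member: do not trust the rumor, ping it instead.
            self.send_ping(member_id)

    def on_ack(self, member_id):
        # A direct ACK proves the member is alive, so it can be added.
        self.members.add(member_id)
```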
      
      The patch does not affect updates. They are still indirect,
      because if something has been updated in an existing member, then
      it is definitely alive.
      
      Part of #4253
      506d1878
    • Vladislav Shpilevoy's avatar
      swim: disseminate event for log(cluster_size) steps · 0ec29b2f
      Vladislav Shpilevoy authored
      Before the patch there was a problem of events and anti-entropy
      starvation, when a cluster generates so many events, that they
      consume the whole UDP packet. A packet fits up to 26 events. If
      during the event storm something important happens, that event is
      likely to be lost, and not disseminated until the storm is over.
      
      Sadly, there is no way to prevent a storm, but it can be made
      much shorter. For that the patch makes the TTD of events
      logarithmic in the cluster size instead of linear.
      
      According to the SWIM paper and to experiments the logarithm is
      really enough. Linear TTD was a redundant overkill.
      
      Shorter-lived events do not solve the problem of event
      starvation - some of them can still be lost in case of a storm.
      But they free some space for anti-entropy, which can finish
      dissemination of lost events.
      
      Experiments in a simulation of a cluster with 100 nodes showed
      that failure dissemination took ~110 steps during a storm.
      Basically, no dissemination at all.
      
      After the patch it is ~20 steps. So it is logarithmic as it
      should be, although with a bigger constant than without a storm.
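
To get a feel for the difference, the two TTD policies can be compared numerically. The base-2 logarithm and the rounding below are assumptions made for illustration; the message does not give the exact formula or constant used by the module.

```python
import math

def linear_ttd(cluster_size):
    return cluster_size

def log_ttd(cluster_size):
    # Assumed formula: base-2 logarithm, rounded up, at least one step.
    return max(1, math.ceil(math.log2(cluster_size)))

for n in (10, 100, 1000):
    print(n, linear_ttd(n), log_ttd(n))
```

Under these assumptions an event in a 100-node cluster is kept for 7 rounds instead of 100.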
      
      Part of #4253
      0ec29b2f
    • Serge Petrenko's avatar
      lua/trigger: cleanup lua stack after trigger run · febacc4b
      Serge Petrenko authored
      This patch adds a stack cleanup after a trigger is run and its return
      values, if any, have been read.
      
      This problem was found in a case when on_schema_init trigger set an
      on_replace trigger on a space, and the trigger ran during recovery.
      This led to Lua stack overflows for the aforementioned reasons.
      
      Closes #4275
      febacc4b
    • Vladimir Davydov's avatar
      Replace schema lock with fine-grained locking · e5c4ce75
      Vladimir Davydov authored
      Now, as we don't need to take the schema lock for checkpointing, it is
      only used to synchronize concurrent space modifications (drop, truncate,
      alter). Actually, a global lock is far too heavy a means to achieve this
      goal, because we only care about forbidding concurrent modifications of
      the same space while concurrent modifications of different spaces should
      work just fine. So this patch replaces the global schema lock with per
      space locking.
      
      A space lock is held while alter_space_do() is in progress so as to make
      sure that while AlterSpaceOp::prepare() is performing a potentially
      yielding operation, such as building a new index, the space struct
      doesn't get freed from under our feet. Note, the lock is released right
      after index build is complete, before the transaction is committed to
      WAL, so if the transaction is non-yielding it can modify the space again
      in the next statement (this is impossible now, but will be done in the
      scope of the transactional DDL feature).
      
      If alter_space_do() sees that the space is already locked it bails out
      and throws an error. This should be fine, because long-lasting operations
      involving schema changes, such as building an index, are rare and only
      performed under the supervision of the user so throwing an error rather
      than waiting seems to be adequate.
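
A minimal sketch of the per-space locking policy, with hypothetical names (`Space`, `SpaceBusyError`, `alter_space`). It only models the "fail instead of wait" behavior and the early release, not the actual alter machinery.

```python
class SpaceBusyError(Exception):
    pass

class Space:
    def __init__(self, name):
        self.name = name
        self.locked = False

def alter_space(space, build):
    """Run a possibly yielding build step under a per-space lock."""
    if space.locked:
        # Bail out instead of waiting for the concurrent alter.
        raise SpaceBusyError("space %s is being altered" % space.name)
    space.locked = True
    try:
        build(space)
    finally:
        # Released as soon as the build step is done.
        space.locked = False
```

A concurrent alter of a different space goes through, while a second alter of the same space fails immediately.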
      
      Removal of the schema lock allows us to remove latch_steal() helper and
      on_begin_stmt txn trigger altogether, as they were introduced solely to
      support locking.
      
      This is a prerequisite for transactional DDL, because it's unclear how
      to preserve the global schema lock while allowing to combine several DDL
      statements in the same transaction.
      e5c4ce75
    • Vladimir Davydov's avatar
      vinyl: don't yield while logging index creation · d9fc5dc1
      Vladimir Davydov authored
      Currently, we always log a vinyl index creation in the vylog file
      synchronously, i.e. wait for the write to complete successfully. This
      makes any index creation a yielding operation, even if the target space
      is empty. To implement transactional DDL for non-yielding statements, we
      need to eliminate yields in this case. We can do that by simply using
      vy_log_try_commit() instead of vy_log_commit() for logging index
      creation, because we can handle a missing VY_LOG_PREPARE_INDEX record
      during recovery - the code was left since before commit dd0827ba
      ("vinyl: log new index before WAL write on DDL") which split index
      creation into PREPARE and COMMIT stages so all we need to do is slightly
      modify the test.
      
      The reason why I'm doing this now, in the series removing the schema
      lock, is that removal of the schema lock without making space truncation
      non-yielding (remember space truncation basically drops and recreates
      all indexes) may result in a failure while executing space.truncate()
      from concurrent fibers, which is rather unexpected. In particular, this
      is checked by engine/truncate.test.lua. So to prevent the test failure
      once the schema lock is removed (see the next patch), let's make empty
      index creation non-yielding right now.
      d9fc5dc1
    • Vladimir Davydov's avatar
      Don't take schema lock for checkpointing · 94de0a08
      Vladimir Davydov authored
      Memtx checkpointing proceeds as follows: first we open iterators over
      primary indexes of all spaces and save them to a list, then we start
      a thread that uses the iterators to dump space contents to a snap file.
      To avoid accessing a freed tuple, we put the small allocator to the
      delayed free mode. However, this doesn't prevent an index from being
      dropped so we also take the schema lock to lock out any DDL operation
      that can potentially destroy a space or an index. Note, vinyl doesn't
      need this lock, because it implements index reference counting under
      the hood.
      
      Actually, we don't really need to take a lock - instead we can simply
      postpone index destruction until checkpointing is complete, similarly
      to how we postpone destruction of individual tuples. We even have all
      the infrastructure for this - it's delayed garbage collection. So this
      patch tweaks it a bit to delay the actual index destruction to be done
      after checkpointing is complete.
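
The postponing idea can be modeled as a small queue. `IndexGC` and its fields are hypothetical names for illustration and do not correspond to the memtx structures.

```python
class IndexGC:
    def __init__(self):
        self.checkpoint_in_progress = False
        self.delayed = []
        self.destroyed = []

    def drop_index(self, index):
        if self.checkpoint_in_progress:
            # The checkpoint thread may still iterate over this index.
            self.delayed.append(index)
        else:
            self.destroyed.append(index)

    def begin_checkpoint(self):
        self.checkpoint_in_progress = True

    def end_checkpoint(self):
        self.checkpoint_in_progress = False
        # Now it is safe to destroy everything that was postponed.
        self.destroyed.extend(self.delayed)
        self.delayed.clear()
```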
      
      This is a step forward towards removal of the schema lock, which stands
      in the way of transactional DDL.
      94de0a08
  4. Jul 04, 2019
  5. Jul 03, 2019
    • Kirill Shcherbatov's avatar
      box: introduce VARBINARY field type · 59de57d2
      Kirill Shcherbatov authored
      A new VARBINARY field type would be useful for SQL type system.
      
      Closes #4201
      Needed for #4206
      
      @TarantoolBot document
      Title: new varbinary field type
      
      Introduced a new field type varbinary to represent mp_bin values.
      The new type varbinary may be used in format or index definition.
      
      Example:
      s = box.schema.space.create('withdata')
      s:format({{"b", "varbinary"}})
      pk = s:create_index('pk', {parts = {1, "varbinary"}})
      59de57d2
  6. Jul 01, 2019
  7. Jun 28, 2019
    • Vladislav Shpilevoy's avatar
      swim: default generation is timestamp · b6b72013
      Vladislav Shpilevoy authored
      Generation is supposed to be a persistent counter to distinguish
      between different installations of the same SWIM instance. By
      default it was set to 0, which was quite unsafe.
      
      Kostja proposed an easy and bright solution - generation could be
      set to a timestamp by default. In that case it will almost
      certainly be different on each restart.
      
      Follow up #4280
      b6b72013
    • Vladislav Shpilevoy's avatar
      swim: fix inability to set generation only · ea1e9192
      Vladislav Shpilevoy authored
      swim.new() is declared as allowed to be called before swim:cfg().
      But in fact swim.new({generation = ...}) didn't work because
      after generation extraction the empty config {} was passed to
      swim:cfg() and led to an error.
      
      The patch allows calling swim.new() with generation only, as well
      as without any parameters.
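
The fix can be pictured as follows. This is a hypothetical Python model of the constructor logic, not the Lua source; the `Swim` class, the `new` helper and the `default_generation` parameter are made up for the sketch.

```python
class Swim:
    def __init__(self, generation):
        self.generation = generation
        self.config = None

    def cfg(self, config):
        if not config:
            raise ValueError("empty config")
        self.config = dict(config)

def new(options=None, default_generation=0):
    options = dict(options or {})
    generation = options.pop("generation", default_generation)
    instance = Swim(generation)
    if options:
        # Configure only when something besides generation was passed.
        instance.cfg(options)
    return instance
```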
      
      Follow up #4280
      ea1e9192
    • Vladislav Shpilevoy's avatar
      swim: fix a dangerous yield in ffi.gc · 98f29645
      Vladislav Shpilevoy authored
      FFI can't survive yields. A yield in ffi.C.func() leads to a
      crash; a yield in ffi.gc is not documented as allowed. A yield in
      any GC function leaves the garbage collector stuck until the yield
      is finished.
      
      This patch makes the SWIM GC callback non-yielding. Now the
      yielding swim_delete() is called in a separate fiber that is
      created in the GC callback but started only at the end of the
      event loop.
      
      Follow up #3234
      98f29645
    • Vladislav Shpilevoy's avatar
      swim: fix a leak when a trigger is installed · eb403598
      Vladislav Shpilevoy authored
      SWIM wraps user triggers to prepare arguments. The wrapper
      function kept a reference to the SWIM object and prevented its
      automatic deletion by GC.
      
      The patch makes this reference weak.
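
The same pattern in Python, as an analogy only: a wrapper that holds a weak reference does not keep the wrapped object alive. `Swim` and `wrap_trigger` are hypothetical names, and the behavior shown relies on CPython's reference counting.

```python
import gc
import weakref

class Swim:
    pass

def wrap_trigger(instance, trigger):
    ref = weakref.ref(instance)  # weak: does not keep the instance alive

    def wrapper(event):
        target = ref()
        if target is None:
            return None  # the instance was already collected
        return trigger(target, event)

    return wrapper

swim = Swim()
on_update = wrap_trigger(swim, lambda s, event: event)
print(on_update("update"))  # update
del swim
gc.collect()
print(on_update("update"))  # None
```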
      
      Follow up #4250
      eb403598