  1. Apr 18, 2019
    • Vladislav Shpilevoy's avatar
      swim: drop incarnation_inc parameter from update() routines · c9eb277b
      Vladislav Shpilevoy authored
      Update_addr and update_payload need to increment member's
      incarnation when it is self. For that they used a special
      parameter incarnation_inc set to 1 for self and to 0 for others.
      
      It was used to encapsulate incarnation update + event scheduling
      on member attribute updates, but on the other hand it broke
      another encapsulation level - there should not be exceptions for
      'self' in these functions.
      
      This patch makes incarnation increment explicit in the places
      where 'self' is updated.
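A minimal sketch of the idea, not the actual SWIM code: the update routine treats every member the same way, and only the caller that updates 'self' bumps the incarnation. The struct and the function names `member_update_addr` and `self_set_addr` are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified member; not the real SWIM member struct. */
struct member {
	uint64_t incarnation;
	char addr[32];
	bool is_event_scheduled;
};

/* Update the address of any member. There is no special case for 'self'. */
static void
member_update_addr(struct member *m, const char *addr)
{
	assert(strlen(addr) < sizeof(m->addr));
	strcpy(m->addr, addr);
	/* Attribute changed, so a dissemination event is scheduled. */
	m->is_event_scheduled = true;
}

/* The place that updates 'self' increments the incarnation explicitly. */
static void
self_set_addr(struct member *self, const char *addr)
{
	self->incarnation++;
	member_update_addr(self, addr);
}
```

With this split the update routine needs no extra parameter, and the incarnation change is visible at the call site.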
      c9eb277b
    • Vladislav Shpilevoy's avatar
      swim: introduce payload · dfc4ce43
      Vladislav Shpilevoy authored
      Payload is arbitrary user data disseminated over the cluster
      along with other member attributes.
      
      Part of #3234
      dfc4ce43
    • Vladislav Shpilevoy's avatar
      swim: rename TTL to TTD · 7df5f63f
      Vladislav Shpilevoy authored
      TTL is time-to-live, which is slightly confusing when applied to
      a member's attribute. Status_ttl reads as if the status is deleted
      or no longer valid once the value reaches 0.
      
      TTD is a more precise name for these counters; it expands to
      time-to-disseminate.
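The difference can be sketched as follows. This is an illustration under assumptions, not the real code: it assumes the counter is decremented each time the attribute is sent, and the names `member_set_status` and `member_disseminate_status` are hypothetical. The point is that the status stays valid after the counter reaches 0; it merely stops being sent.

```c
#include <assert.h>
#include <stdbool.h>

enum { STATUS_ALIVE, STATUS_DEAD };

/* Hypothetical, simplified member with a single disseminated attribute. */
struct member {
	int status;
	/* How many more times the status should be sent out. */
	int status_ttd;
};

/* Change the status and restart its dissemination counter. */
static void
member_set_status(struct member *m, int status, int ttd)
{
	assert(ttd >= 0);
	m->status = status;
	m->status_ttd = ttd;
}

/* Returns true if the status should be included into the next message. */
static bool
member_disseminate_status(struct member *m)
{
	if (m->status_ttd == 0)
		return false;
	m->status_ttd--;
	return true;
}
```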
      7df5f63f
    • Vladislav Shpilevoy's avatar
      swim: do not rebuild packet meta multiple times · d251efe7
      Vladislav Shpilevoy authored
      Before the patch there were 2 cases when an unchanged packet was
      rebuilt partially on each send:
      
        - cached round message's meta section was rebuilt on each
          EV_WRITE event in swim_scheduler_on_output() function;
      
        - broadcast message's meta section was rebuilt too even though
          its content does not depend on a broadcast interface.
      
      A third case appears with the indirect pings patch, which
      complicates meta building with routing and packet forwarding.
      When a packet needs to be forwarded farther, its meta is built in
      a special manner preserving the route before EV_WRITE appears,
      and on_output should not touch that meta.
      
      This patch adds a check preventing unnecessary meta rebuilds.
      Besides, the check and the meta building code are moved out of
      swim_scheduler_on_output() into a dedicated function, which
      completely separates the logic of packing a message from sending
      it. The separation helps a lot when indirect pings are introduced.
      
      Part of #3234
      d251efe7
    • Vladislav Shpilevoy's avatar
      swim: fix a bug with invalidation of round msg in fly · 7a1a8b1c
      Vladislav Shpilevoy authored
      SWIM works in rounds, each split into steps. On each step SWIM
      sends a round message. The message usually does not change between
      steps of one round, and is cached so as not to rebuild it without
      necessity. But when something changes in a member's attributes,
      or in the member table, the message is invalidated to be rebuilt
      on the next step. Invalidation resets the cached packet.
      
      But this leads to a bug when a round message is already scheduled
      for sending but has not actually been sent yet, and is invalidated
      in flight. On the next EV_WRITE event such a message is sent as an
      empty packet, which obviously makes no sense.
      
      On the other hand, it would be harmful to cancel the invalidated
      packet if it is in flight, because during frequent changes the
      instance would not send anything.
      
      There is no test case, because empty packets do not break
      anything, but they are still useless. And, as said above,
      cancelling invalidated packets would prevent sending any round
      messages when there are lots of updates.
      
      Follow up for cf0ddeb8
      (swim: keep encoded round message cached)
      7a1a8b1c
    • Vladislav Shpilevoy's avatar
      swim: extract binary ip/port into a separate struct · b1c77743
      Vladislav Shpilevoy authored
      At this moment there are two binary structures in the SWIM
      protocol carrying an address: swim_member_passport and
      swim_meta_header_bin - one address in each. This code duplication
      was not formidable enough to stimulate creation of a separate
      address structure.
      
      But forthcoming indirect messages protocol extensions will add 2
      new cases of encoding a binary address. It triggered this patch
      to reduce code duplication.
      
      Part of #3234
      b1c77743
    • Vladislav Shpilevoy's avatar
      swim: move sockaddr_in checkers to swim_proto.h · fbccee1d
      Vladislav Shpilevoy authored
      There are several places where it is necessary to check if a
      sockaddr_in is nullified, and to compare a couple of addresses.
      Some of them are in swim_proto.c, and more are coming in the
      indirect SWIM messages patch. The patch moves the checkers into
      swim_proto.h so that they are usable from anywhere in SWIM.
      
      Minor renames are made alongside. 'sockaddr_in' is too long to
      use in each related function's name, and is replaced with
      'inaddr' by analogy with the standard library.
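As an illustration, the two checkers could look roughly like this. The names `inaddr_is_empty` and `inaddr_eq` and the exact set of compared fields are assumptions made for this sketch; the commit message does not spell them out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper: true if both the IP and the port are zero. */
static inline bool
inaddr_is_empty(const struct sockaddr_in *addr)
{
	assert(addr != NULL);
	return addr->sin_port == 0 && addr->sin_addr.s_addr == 0;
}

/* Hypothetical helper: compare two addresses by IP and port. */
static inline bool
inaddr_eq(const struct sockaddr_in *a, const struct sockaddr_in *b)
{
	assert(a != NULL && b != NULL);
	return a->sin_port == b->sin_port &&
	       a->sin_addr.s_addr == b->sin_addr.s_addr;
}
```

Keeping such helpers as static inline functions in a header makes them usable from any file that includes it without a separate translation unit.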
      
      Part of #3234
      fbccee1d
    • Vladimir Davydov's avatar
      memtx: cancel checkpoint thread at exit · d95608e4
      Vladimir Davydov authored
      If a tarantool instance exits while checkpointing is in progress, the
      memtx checkpoint thread, which writes the snap file, can access already
      freed data resulting in a crash. Let's fix this the same way we did for
      relay and vinyl threads - simply cancel the thread forcefully and wait
      for it to terminate.
      
      Closes #4170
      d95608e4
    • Vladimir Davydov's avatar
      Improve box.stat.net output · 14673a71
      Vladimir Davydov authored
       - Add REQUESTS.current to report the number of requests currently in
         flight, because it's useful for understanding whether we need to
         increase box.cfg.net_msg_max.
       - Add REQUESTS.{rps,total}, because knowing the number of requests
         processed per second can come in handy for performance analysis.
       - Add CONNECTIONS.{rps,total} that show the number of connections
         opened per second and total. Those are not really necessary, but
         without them the output looks kinda lopsided.
      
      Closes #4150
      
      @TarantoolBot document
      Title: Document new box.stat.net fields
      
      Here's the list of the new fields:
      
       - `CONNECTIONS.rps` - number of connections opened per second recently
         (for the last 5 seconds).
       - `CONNECTIONS.total` - total number of connections opened so far.
       - `REQUESTS.current` - number of requests in flight (this is what's
         limited by `box.cfg.net_msg_max`).
       - `REQUESTS.rps` - number of requests processed per second recently
         (for the last 5 seconds).
       - `REQUESTS.total` - total number of requests processed so far.
      
      `CONNECTIONS.rps`, `CONNECTIONS.total`, `REQUESTS.rps`, `REQUESTS.total`
      are reset by `box.stat.reset()`.
      
      Example of the new output:
      ```
      ---
      - SENT:
          total: 5344924
          rps: 840212
        CONNECTIONS:
          current: 60
          rps: 148
          total: 949
        REQUESTS:
          current: 17
          rps: 1936
          total: 12139
        RECEIVED:
          total: 240882
          rps: 38428
      ...
      ```
      14673a71
  2. Apr 17, 2019
  3. Apr 16, 2019
    • Vladimir Davydov's avatar
      vinyl: fix crash if space is dropped while space.get is reading from it · 75f03a50
      Vladimir Davydov authored
      In contrast to vinyl_iterator, vinyl_index_get doesn't take a reference
      to the LSM tree while reading from it. As a result, if the LSM tree is
      dropped in the meantime, vinyl_index_get will crash. Fix this issue by
      surrounding vy_get with vy_lsm_ref/unref.
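The pattern can be sketched on a toy object. This is not vinyl code: `struct lsm`, `lsm_get` and `drop_space` are hypothetical stand-ins, and the yield is simulated by a callback that runs in the middle of the read.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an LSM tree with a reference counter. */
struct lsm {
	int refs;
	bool is_destroyed;
	int value;
};

static void
lsm_ref(struct lsm *lsm)
{
	assert(lsm->refs > 0);
	lsm->refs++;
}

static void
lsm_unref(struct lsm *lsm)
{
	assert(lsm->refs > 0);
	if (--lsm->refs == 0)
		lsm->is_destroyed = true;
}

/* Simulates another fiber dropping the space: the owner's ref goes away. */
static void
drop_space(struct lsm *lsm)
{
	lsm_unref(lsm);
}

/*
 * A read that "yields" in the middle. The reference taken at the start
 * keeps the object alive even if it is dropped during the yield.
 */
static int
lsm_get(struct lsm *lsm, void (*on_yield)(struct lsm *), bool *was_alive)
{
	lsm_ref(lsm);
	on_yield(lsm);
	*was_alive = !lsm->is_destroyed;
	int value = lsm->value;
	lsm_unref(lsm);
	return value;
}
```

In the sketch the object is destroyed only by the final unref inside `lsm_get`, after the value has been read.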
      
      Closes #4109
      75f03a50
    • Vladimir Davydov's avatar
      vinyl: fix crash during index build · ccd46a27
      Vladimir Davydov authored
      To propagate changes applied to a space while a new index is being
      built, we install an on_replace trigger. In case the on_replace
      trigger callback fails, we abort the DDL operation.
      
      The problem is the trigger may yield, e.g. to check the unique
      constraint of the new index. This opens a time window for the DDL
      operation to complete and clear the trigger. If this happens, the
      trigger will try to access the outdated build context and crash:
      
       | #0  0x558f29cdfbc7 in print_backtrace+9
       | #1  0x558f29bd37db in _ZL12sig_fatal_cbiP9siginfo_tPv+1e7
       | #2  0x7fe24e4ab0e0 in __restore_rt+0
       | #3  0x558f29bfe036 in error_unref+1a
       | #4  0x558f29bfe0d1 in diag_clear+27
       | #5  0x558f29bfe133 in diag_move+1c
       | #6  0x558f29c0a4e2 in vy_build_on_replace+236
       | #7  0x558f29cf3554 in trigger_run+7a
       | #8  0x558f29c7b494 in txn_commit_stmt+125
       | #9  0x558f29c7e22c in box_process_rw+ec
       | #10 0x558f29c81743 in box_process1+8b
       | #11 0x558f29c81d5c in box_upsert+c4
       | #12 0x558f29caf110 in lbox_upsert+131
       | #13 0x558f29cfed97 in lj_BC_FUNCC+34
       | #14 0x558f29d104a4 in lua_pcall+34
       | #15 0x558f29cc7b09 in luaT_call+29
       | #16 0x558f29cc1de5 in lua_fiber_run_f+74
       | #17 0x558f29bd30d8 in _ZL16fiber_cxx_invokePFiP13__va_list_tagES0_+1e
       | #18 0x558f29cdca33 in fiber_loop+41
       | #19 0x558f29e4e8cd in coro_init+4c
      
      To fix this issue, let's recall that when a DDL operation completes,
      all pending transactions that affect the altered space are aborted by
      the space_invalidate callback. So to avoid the crash, we just need to
      bail out early from the on_replace trigger callback if we detect that
      the current transaction has been aborted.
      
      Closes #4152
      ccd46a27
    • Cyrill Gorcunov's avatar
      fiber: Unify sizeof operator · a6b443b0
      Cyrill Gorcunov authored
      We use sizeof as a function in most of the code; fix the
      places that don't.
      a6b443b0
    • Roman Khabibov's avatar
      tarantoolctl: raise error when box.cfg isn't called · 2b387d1c
      Roman Khabibov authored
      Added a check whether box.cfg() is called within an instance
      file. If box.cfg() is missing, point the user to the reason for
      the failure explicitly.
      
      Before this commit the error looked like this:
      
      /usr/bin/tarantoolctl:541: attempt to index a nil value
      
      Closes #3953
      2b387d1c
  4. Apr 12, 2019
    • Vladislav Shpilevoy's avatar
      test: introduce new SWIM packet filter by component names · a0016971
      Vladislav Shpilevoy authored
      In the next patch on payloads we want to drop only packets
      containing certain sections, such as anti-entropy or
      dissemination. New SWIM test transport filters make this easy to
      implement.
      
      Part of #3234
      a0016971
    • Vladislav Shpilevoy's avatar
      test: generalize SWIM fake descriptor filters · 6ac68cc4
      Vladislav Shpilevoy authored
      At the moment the SWIM test harness implements its own fake file
      descriptor table, which the real SWIM code uses without being
      aware of it. Each fake fd has send and recv queues, and can delay
      and drop packets with a certain probability. But that is not
      going to be enough for new tests.
      
      We want to be able to drop packets with specified content, from
      and to a specified direction. For that the patch implements a
      filtering mechanism. Each fake fd now has a list of filters,
      applied one by one to each packet. If at least one filter wants
      to drop a packet, it is dropped. The filters know the packet
      content and direction: outgoing or incoming.
      
      For now only one filter exists: drop rate. It existed even before
      the patch, but now it is ported to the new API.
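The filtering mechanism described above can be sketched as a linked list of callbacks. All names here (`struct filter`, `fake_fd_add_filter`, `fake_fd_should_drop`, `drop_outgoing`) are hypothetical and only illustrate the "drop if any filter says so" rule; the real test harness API may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum direction { DIR_IN, DIR_OUT };

/* A filter sees the packet content and direction; true means 'drop'. */
typedef bool (*filter_check_f)(const char *data, size_t size,
			       enum direction dir, void *udata);

struct filter {
	filter_check_f check;
	void *udata;
	struct filter *next;
};

struct fake_fd {
	struct filter *filters;
};

static void
fake_fd_add_filter(struct fake_fd *fd, struct filter *f)
{
	assert(f->check != NULL);
	f->next = fd->filters;
	fd->filters = f;
}

/* The packet is dropped if at least one filter wants to drop it. */
static bool
fake_fd_should_drop(const struct fake_fd *fd, const char *data, size_t size,
		    enum direction dir)
{
	for (const struct filter *f = fd->filters; f != NULL; f = f->next) {
		if (f->check(data, size, dir, f->udata))
			return true;
	}
	return false;
}

/* Example filter: drop every outgoing packet regardless of content. */
static bool
drop_outgoing(const char *data, size_t size, enum direction dir, void *udata)
{
	(void)data;
	(void)size;
	(void)udata;
	return dir == DIR_OUT;
}
```

A drop-rate filter would fit the same signature by keeping its probability in `udata`.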
      
      Part of #3234
      6ac68cc4
    • Vladislav Shpilevoy's avatar
      swim: factor out 'update' part of swim_member_upsert() · 1eb82afc
      Vladislav Shpilevoy authored
      Move 'update' logic into a separate function, because in the next
      commits it is going to become more complicated due to payload
      introduction, and it would be undesirable to clog the upsert()
      function with payload-specific code.
      
      Part of #3234
      1eb82afc
    • Vladislav Shpilevoy's avatar
      swim: replace event_bin and member_bin with the passport · 17f895ed
      Vladislav Shpilevoy authored
      Event_bin and member_bin binary packet structures were designed
      separately for different purposes. Initially the event_bin was
      thought to have the same fields as passport + optional old UUID +
      optional payload. On the other hand, member_bin was supposed to
      store the passport + mandatory payload.
      
      But old UUID was cut off in favour of another way of UUID update.
      And payload appeared to be optional in both anti-entropy and
      dissemination. It means, that member_bin and event_bin are not
      needed anymore as separate structures. This commit replaces them
      with the passport completely.
      
      Part of #3234
      17f895ed
    • Vladislav Shpilevoy's avatar
      swim: factor out MP_BIN decoding from swim_decode_uuid · 61b0bd5a
      Vladislav Shpilevoy authored
      The new function is swim_decode_bin(), and is going to be used
      to safely decode payloads - arbitrary binary data disseminated
      alongside all the other SWIM member attributes.
      
      Part of #3234
      61b0bd5a
    • Alexander Turenko's avatar
      net.box: fix 'unique' index flag in net.box schema · f1f6433b
      Alexander Turenko authored
      Before this commit it always returned false.
      
      Fixes #4091.
      f1f6433b
    • Cyrill Gorcunov's avatar
      fiber: Define constants for reserved fids · 4bb7b332
      Cyrill Gorcunov authored
      Open-coded constants are bad for long-term
      support; give them names. Moreover, there was
      a typo in a comment: fid = 0 is reserved as well.
      4bb7b332
    • Cyrill Gorcunov's avatar
      fiber: Drop unused FIBER_CALL_STACK · 9e7b460c
      Cyrill Gorcunov authored
      The constant is leftover from 08585902
      9e7b460c
    • Serge Petrenko's avatar
      test: rework box/on_shutdown test · c0260529
      Serge Petrenko authored
      The test is flaky under high load (e.g. when it is run in parallel with a
      lot of workers). Make it less dependent on arbitrary timeouts to improve
      stability.
      
      Part of #4134
      c0260529
    • Serge Petrenko's avatar
      test: extract on_shutdown tests from box/misc · 70ea9998
      Serge Petrenko authored
      This part of the test is flaky when tests are run in parallel, besides,
      it is quite big on its own, so extract it into a separate file to add
      more flexibility in running tests and to make finding problems easier.
      
      Part of #4134
      70ea9998
    • Nikita Pettik's avatar
      sql: increment rowcount of FK alteration · 3bd13cf5
      Nikita Pettik authored
      Before this patch, an SQL statement that creates or drops an FK
      constraint didn't increment rowcount:
      
      box.execute("ALTER TABLE t ADD CONSTRAINT fk1 FOREIGN KEY (b) REFERENCES parent (a);")
      ---
      - rowcount: 0
      ...
      
      This patch fixes this misbehaviour: VDBE accidentally did not
      enable change counting during ALTER TABLE ADD/DROP constraint.
      
      Closes #4130
      3bd13cf5
    • Cyrill Gorcunov's avatar
      fiber: Drop redundant memset call · 264beb8b
      Cyrill Gorcunov authored
      When we allocate new fiber we are clearing the whole
      structure right after, so no need to call memset again,
      coro context is already full of zeros.
      
      Note that the coro context is close to 1K in size, so a redundant
      memset here is a real penalty.
      264beb8b
    • avtikhon's avatar
      test: disable flaky performance test · 962c2cae
      avtikhon authored
      Disabled the wal_off/iterator_lt_gt.test.lua test, because
      performance tests need to be reorganized into a separate mode on a
      standalone host. Currently this test doesn't reveal any issue, but
      it breaks testing from time to time with errors like:
      
      [010] wal_off/iterator_lt_gt.test.lua                                 [ fail ]
      [010]
      [010] Test failed! Result content mismatch:
      [010] --- wal_off/iterator_lt_gt.result	Fri Apr 12 10:30:43 2019
      [010] +++ wal_off/iterator_lt_gt.reject	Fri Apr 12 10:36:30 2019
      [010] @@ -79,7 +79,9 @@
      [010]  ...
      [010]  too_longs
      [010]  ---
      [010] -- []
      [010] +- - 'Some of the iterators takes too long to position: 0.074278'
      [010] +  - 'Some of the iterators takes too long to position: 0.11786'
      [010] +  - 'Some of the iterators takes too long to position: 0.053848'
      [010]  ...
      [010]  s:drop()
      [010]  ---
      [010]
      [010] Last 15 lines of Tarantool Log file [Instance "wal"][/tarantool/test/var/010_wal_off/wal.log]:
      
      See #2539
      962c2cae
    • Konstantin Osipov's avatar
      iproto: reduce effects of input buffer fragmentation on large cfg.readahead · a58b9bb8
      Konstantin Osipov authored
      When cfg.readahead is large, iproto_reset_input() has a tendency to
      leave all input buffers large enough for a long time. On the other hand,
      the input buffer is not recycled until its maximal size is reached.
      This leads to a case when we keep shifting the read position towards
      the end of the buffer, fragmenting memory and growing it to readahead
      size, even if input packets and batches are actually small.
      
      Suggested by Alexander Turenko.
      a58b9bb8
    • Vladimir Davydov's avatar
      vinyl: improve dump start/stop logging · 363ab8e6
      Vladimir Davydov authored
      When initiating memory dump, print how much memory is going to be
      dumped, expected dump rate, ETA, and recent write rate. Upon dump
      completion, print observed dump rate in addition to dump size and
      duration. This should help debugging stalls on memory quota.
      
      Example:
      
       | 2019-04-12 12:03:25.092 [30948] main/115/lua I> dumping 39659424 bytes, expected rate 6.0 MB/s, ETA 6.3 s, recent write rate 4.2 MB/s
       | 2019-04-12 12:03:25.101 [30948] main/106/vinyl.scheduler I> 512/1: dump started
       | 2019-04-12 12:03:25.102 [30948] vinyl.dump.0/104/task I> writing `./512/1/00000000000000000008.run'
       | 2019-04-12 12:03:26.487 [30948] vinyl.dump.0/104/task I> writing `./512/1/00000000000000000008.index'
       | 2019-04-12 12:03:26.547 [30948] main/106/vinyl.scheduler I> 512/1: dump completed
       | 2019-04-12 12:03:26.551 [30948] main/106/vinyl.scheduler I> 512/0: dump started
       | 2019-04-12 12:03:26.553 [30948] vinyl.dump.0/105/task I> writing `./512/0/00000000000000000010.run'
       | 2019-04-12 12:03:28.026 [30948] vinyl.dump.0/105/task I> writing `./512/0/00000000000000000010.index'
       | 2019-04-12 12:03:28.100 [30948] main/106/vinyl.scheduler I> 512/0: dump completed
       | 2019-04-12 12:03:28.100 [30948] main/106/vinyl.scheduler I> dumped 33554332 bytes in 3.0 s, rate 10.6 MB/s
      363ab8e6
    • Vladimir Davydov's avatar
      vinyl: account statements skipped on read · 779fa706
      Vladimir Davydov authored
      After we retrieve a statement from a secondary index, we always do
      a lookup in the primary index to get the full tuple corresponding to
      the found secondary key. It may turn out that the full tuple doesn't
      match the secondary key, which means the key was overwritten, but
      the DELETE statement hasn't been propagated yet (aka deferred DELETE).
      Currently, there's no way to figure out how often this happens as all
      tuples read from an LSM tree are accounted under 'get' counter.
      
      So this patch splits 'get' in two: 'get', which now accounts only
      tuples actually returned to the user, and 'skip', which accounts
      skipped tuples.
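A toy model of the split accounting, under stated assumptions: tuples are reduced to a (primary key, secondary key) pair, and `resolve_secondary` with `struct read_stat` are hypothetical names, not vinyl functions. A secondary index hit whose full tuple no longer matches goes to 'skip'; everything returned to the user goes to 'get'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters mirroring the 'get' and 'skip' statistics. */
struct read_stat {
	long get;
	long skip;
};

/* Full tuple as stored in the primary index: primary and secondary key. */
struct entry {
	int pk;
	int sk;
};

/* Find the full tuple by primary key, or NULL if there is none. */
static const struct entry *
primary_lookup(const struct entry *primary, size_t count, int pk)
{
	for (size_t i = 0; i < count; i++) {
		if (primary[i].pk == pk)
			return &primary[i];
	}
	return NULL;
}

/* Resolve a secondary index hit (sk, pk) into a full tuple. */
static const struct entry *
resolve_secondary(const struct entry *primary, size_t count, int sk, int pk,
		  struct read_stat *stat)
{
	assert(stat != NULL);
	const struct entry *full = primary_lookup(primary, count, pk);
	if (full == NULL || full->sk != sk) {
		/* Stale secondary key: overwritten, DELETE is deferred. */
		stat->skip++;
		return NULL;
	}
	stat->get++;
	return full;
}
```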
      779fa706
  5. Apr 11, 2019
    • Vladislav Shpilevoy's avatar
      test: add srand(time(NULL)) to swim unit tests · 3de1456f
      Vladislav Shpilevoy authored
      It turned out that it is not called. But it probably should be,
      in order to catch more errors.
      3de1456f
    • Vladimir Davydov's avatar
      vinyl: take into account primary key lookup in latency accounting · b5734069
      Vladimir Davydov authored
      Currently, latency accounting and warning live in vy_point_lookup and
      vy_read_iterator_next. As a result, they don't take into account the
      lookup of a full tuple by a partial one, although it can take quite a
      while, especially if there are lots of deferred DELETE statements we
      have to skip. So this
      patch moves latency accounting to the upper level, namely to vy_get and
      vinyl_iterator_{primary,secondary}_next.
      
      Note, as a side effect, now we always print full tuples to the log on
      "too long" warning. Besides, we strip LSN and statement type as those
      don't make much sense anymore.
      b5734069
    • Vladimir Davydov's avatar
      box: account index.pairs in box.stat.SELECT · 7275ad6b
      Vladimir Davydov authored
      box.stat.SELECT accounts index.get and index.select, but not
      index.pairs, which is confusing since pairs() may be used even
      more often than select() in a Lua application.
      7275ad6b
    • Vladislav Shpilevoy's avatar
      swim: keep encoded round message cached · cf0ddeb8
      Vladislav Shpilevoy authored
      During a SWIM round a message consisting of at most 4 sections is
      handed out. Parts of the message change rarely: only on a member
      attribute update or on removal of a member. So it is possible to
      cache the message and send it during several round steps in a
      row, or even not rebuild it for the whole round.
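The caching idea can be sketched as a lazily rebuilt buffer. This is an illustration only: `struct round_msg` and its functions are hypothetical, and a plain string stands in for the encoded member table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cache of an encoded round message. */
struct round_msg {
	char data[64];
	/* Size 0 means the cache is invalidated. */
	size_t size;
	/* How many times the message was actually encoded. */
	int build_count;
};

/* Called when a member attribute or the member table changes. */
static void
round_msg_invalidate(struct round_msg *msg)
{
	msg->size = 0;
}

/* Return the cached message, rebuilding it only if it was invalidated. */
static const char *
round_msg_get(struct round_msg *msg, const char *member_table)
{
	if (msg->size == 0) {
		size_t len = strlen(member_table);
		assert(len > 0 && len < sizeof(msg->data));
		memcpy(msg->data, member_table, len + 1);
		msg->size = len;
		msg->build_count++;
	}
	return msg->data;
}
```

As long as nothing changes, every round step reuses the same bytes and the encoding cost is paid once.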
      
      Part of #3234
      cf0ddeb8
    • Konstantin Osipov's avatar
      sql: as a temporary hack, coerce typeof() return values with nosql types · cdde8aea
      Konstantin Osipov authored
      SQL is still using a sqlite legacy enum and not enum field_type from
      NoSQL to identify types. This creates a mess with type identification,
      when the original column/literal type is lost during expression
      evaluation.
      Until we have proper type arithmetic and preserve field_type in
      expressions, coerce the string return value of the typeof() function,
      which queries the SQL expression value type, to the closest nosql type name.
      
      Rename:
          real -> number
          text -> string
          blob -> scalar
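The rename table above amounts to a small string mapping. A sketch, with a hypothetical function name `typeof_coerce_name`; passing other names through unchanged is an assumption, since the commit message lists only these three renames.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map a legacy SQL type name to the closest NoSQL type name. */
static const char *
typeof_coerce_name(const char *sql_name)
{
	assert(sql_name != NULL);
	if (strcmp(sql_name, "real") == 0)
		return "number";
	if (strcmp(sql_name, "text") == 0)
		return "string";
	if (strcmp(sql_name, "blob") == 0)
		return "scalar";
	/* Assumption: names not listed in the commit stay as they are. */
	return sql_name;
}
```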
      cdde8aea
    • Vladislav Shpilevoy's avatar
      swim: fix typos in the code · 003e0ff6
      Vladislav Shpilevoy authored
      Turning on a spell checker revealed lots of typos.
      The commit fixes them.
      003e0ff6
    • Vladislav Shpilevoy's avatar
      test: fix SWIM test number · 10b4401c
      Vladislav Shpilevoy authored
      During the merge it was accidentally set to a too low number.
      
      Follow up 8fe05fdd
      (swim: expose ping broadcast API)
      10b4401c
    • Vladislav Shpilevoy's avatar
      swim: expose ping broadcast API · 8fe05fdd
      Vladislav Shpilevoy authored
      The previous commit introduced an API to broadcast SWIM
      packets. This commit harnesses it in order to allow a user to do
      initial discovery in a cluster, when member tables are empty and
      UUIDs aren't at hand.
      
      Part of #3234
      8fe05fdd