  1. Jul 26, 2018
  2. Jul 22, 2018
    • Vladimir Davydov's avatar
      replication: unregister replica with gc if deleted from cluster · ea28a925
      Vladimir Davydov authored
      When a replica is removed from the cluster table, the corresponding
      replica struct isn't destroyed unless both the relay and the applier
      attached to it are stopped, see replica_clear_id(). Since replica struct
      is a holder of the garbage collection state, this means that in case an
      evicted replica has an applier or a relay that fails to exit for some
      reason, garbage collection will hang.
      
      A relay thread stops as soon as the replica it was started for receives
      a row that tries to delete it from the cluster table (because this isn't
      allowed by the cluster space trigger, see on_replace_dd_cluster()).
      If a replica isn't running, the corresponding relay can't run either,
      because writing to a closed socket isn't allowed. Therefore, a relay
      can't block garbage collection.
      
      An applier, however, is deleted only when replication is reconfigured.
      So if a replica that was evicted from the cluster was configured as a
      master, its replica struct will hang around blocking garbage collection
      for as long as the replica remains in box.cfg.replication. This is what
      happens in #3546.
      
      Fix this issue by forcefully unregistering a replica with the garbage
      collector when it is deleted from the cluster table. This is OK as it
      won't be able to resubscribe and so we don't need to keep WALs for it
      any longer. Note, the relay thread may still be running when a replica
      is deleted from the cluster table, in which case we can't unregister it
      with the garbage collector right away, because the relay may need to
      access the garbage collection state. In such a case, leave the job to
      replica_clear_relay, which is called as soon as the relay thread exits.
      
      Closes #3546
      ea28a925
  3. Jul 17, 2018
    • Kirill Shcherbatov's avatar
      net.box: fix invalid index:count() with iterator · 25b9f0f0
      Kirill Shcherbatov authored
      Net.box didn't pass options containing an iterator to the
      server side.
      There were also invalid results for two :count tests in the
      net.box.result file.
      
      Thanks to @ademenev for reporting the problem and helping to
      locate it.
      
      Closes #3262.
      25b9f0f0
  4. Jul 16, 2018
  5. Jul 12, 2018
    • Kirill Shcherbatov's avatar
      third-party: update libyaml submodule · aeabe633
      Kirill Shcherbatov authored
      The tests need to be updated because of the following upstream fix:
      commit baf636a74b4b6d055d93e2d01366d6097eb82d90
      Author: Tina Müller <cpan2@tinita.de>
      Date:   Thu Jun 14 19:27:04 2018 +0200
      
      The closing single quote needs to be indented...
      if it's on its own line.
      
      Closes #3275.
      aeabe633
  6. Jul 10, 2018
    • Kirill Shcherbatov's avatar
      app: fix parsing integers with exponent in json · f9f89acb
      Kirill Shcherbatov authored
      Now it is possible to specify a number in exponential
      form via all formats allowed by json standard.
      json.decode('{"remained_amount":2.0e+3}')
      json.decode('{"remained_amount":2.0E+3}')
      json.decode('{"remained_amount":2e+3}')
      json.decode('{"remained_amount":2E+3}')     <-- fixed
      
      Closes #3514.
      f9f89acb
  7. Jul 09, 2018
    • Serge Petrenko's avatar
      Do not update schema_version on space:truncate(). · 2407e389
      Serge Petrenko authored
      Schema version is used by both clients and internal modules to check
      whether there were any updates in spaces and indices. While clients
      only need to be notified when there is a noticeable change, e.g.
      space is removed, internal components also need to be notified when
      something like space:truncate() happens, because even though this
      operation doesn't change space id or any of its indices, it creates a
      new space object, so all the pointers to the old object have to be updated.
      Currently both clients and internals share the same schema version, which
      leads to unnecessary updates on the client side.
      
      Fix this by implementing 2 separate counters for internal and public use:
      schema_state gets updated on every change, including recreation of the same
      space object, while schema_version is updated only when there are noticeable
      changes for the clients. Introduce a new AlterOp in alter.cc to update the
      public schema_version.
      Now all the internals reference schema_state, while all the clients use
      schema_version. box.internal.schema_version() returns schema_version
      (the public one).
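The split into two counters can be sketched as follows. This is a minimal illustration, not the code from the patch: the struct and function names are hypothetical, and only the counter names schema_state and schema_version come from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for the two counters described in the commit. */
struct schema_counters {
    uint64_t schema_state;   /* internal: bumped on every change */
    uint64_t schema_version; /* public: bumped only on client-visible changes */
};

/*
 * Record a schema change. client_visible is 0 for operations such as
 * space:truncate() that only recreate the space object.
 */
static void
schema_on_change(struct schema_counters *c, int client_visible)
{
    assert(c != NULL);
    c->schema_state++;
    if (client_visible)
        c->schema_version++;
}
```

With this shape, a truncate bumps only schema_state, so clients comparing schema_version see no change.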
      
      Closes: #3414
      2407e389
  8. Jul 05, 2018
    • Kirill Shcherbatov's avatar
      lib/bitset: rename bitset structs · befd4ee1
      Kirill Shcherbatov authored
      Fixed FreeBSD build: there were conflicting types bitset
      declared in lib/bitset and _cpuset.h that is the part of
      pthread_np.h used on FreeBSD.
      
      Resolves #3046.
      befd4ee1
    • Kirill Yukhin's avatar
      test: fix box-tap/cfg.test · 965ada65
      Kirill Yukhin authored
      After the read-only flag is dropped, a test space
      is created successfully, and on the next launch creation
      will fail since the space is not dropped.
      Drop the space.
      
      Closes #3507
      965ada65
  9. Jul 04, 2018
    • Serge Petrenko's avatar
      Fix nested calls to box.session.su() · 566e066c
      Serge Petrenko authored
      box.session.su() set the effective user back to the session
      user after its execution, which made nested calls
      to it not work. Fixed this by saving the current
      effective user and restoring it
      after sudo execution. This exposed a bug in
      box.schema.user.drop(): it has an unnecessary
      check for the privilege PRIV_REVOKE, which never
      gets granted to anyone but admin. Also fixed
      this by adding one extra box.session.su() call.
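The save-and-restore approach can be sketched as follows. This is an illustration under assumptions, not the actual implementation: toy_session, toy_session_su and the probe helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*su_callback)(void *arg);

/* Hypothetical session with a login user and an effective user. */
struct toy_session {
    int user_id;
    int effective_user_id;
};

/* Run cb as as_user, then restore whatever was effective before the call. */
static void
toy_session_su(struct toy_session *s, int as_user, su_callback cb, void *arg)
{
    assert(s != NULL);
    int saved = s->effective_user_id; /* saved value, not s->user_id */
    s->effective_user_id = as_user;
    if (cb != NULL)
        cb(arg);
    s->effective_user_id = saved;
}

/* Helpers used to observe the effective user after a nested call. */
struct su_probe {
    struct toy_session *s;
    int seen_after_inner;
};

static void
su_noop(void *arg)
{
    (void)arg;
}

static void
su_outer_body(void *arg)
{
    struct su_probe *p = arg;
    toy_session_su(p->s, 2, su_noop, NULL); /* nested call */
    p->seen_after_inner = p->s->effective_user_id;
}
```

Restoring s->user_id instead of the saved value would leave the outer callback running as the login user once the nested call returns, which is the failure described above.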
      
      Closes #3090, #3492
      566e066c
  10. Jul 03, 2018
    • Konstantin Osipov's avatar
      memtx: vocally abort a transaction in case of implicit yield · 131121c9
      Konstantin Osipov authored
      Before this patch, memtx would silently roll back a multi-statement
      transaction on yield, switching the session to autocommit mode.
      
      It would do nothing in case yield happened in a sub-statement
      in auto-commit mode.
      
      This could lead to nasty, hard-to-debug side effects in
      malformed Lua programs.
      
      Fix by adding a special transaction state - aborted, and enter
      this state in case of implicit yield.
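The idea of a dedicated aborted state can be sketched as a small state machine. All names here are hypothetical, and returning -1 from commit is an assumption standing in for whatever error memtx actually reports.

```c
#include <assert.h>
#include <stddef.h>

enum toy_txn_state { TOY_TXN_NONE, TOY_TXN_ACTIVE, TOY_TXN_ABORTED };

struct toy_txn {
    enum toy_txn_state state;
};

static void
toy_txn_begin(struct toy_txn *txn)
{
    assert(txn != NULL);
    txn->state = TOY_TXN_ACTIVE;
}

/* Yield hook: instead of rolling back silently, remember the abort. */
static void
toy_txn_on_yield(struct toy_txn *txn)
{
    if (txn->state == TOY_TXN_ACTIVE)
        txn->state = TOY_TXN_ABORTED;
}

/* Returns 0 on success, -1 if the transaction was aborted or never begun. */
static int
toy_txn_commit(struct toy_txn *txn)
{
    int rc = txn->state == TOY_TXN_ACTIVE ? 0 : -1;
    txn->state = TOY_TXN_NONE;
    return rc;
}
```

Because the aborted state persists until commit or rollback, the caller gets an explicit failure instead of silently continuing in autocommit mode.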
      
      Check for what happens when a sub-statement yields.
      Check that yield trigger is removed by a rollback.
      
      Fixes gh-2631
      Fixes gh-2528
      131121c9
  11. Jun 29, 2018
  12. Jun 28, 2018
    • Ilya Markov's avatar
      http: Fix parse long headers names · 3d121dd4
      Ilya Markov authored
      Bug: While parsing http headers, long header names are truncated
      to zero length, but their values are not ignored.
      
      Fix this by adding the max_header_name_length parameter to http request.
      If a header name is longer than this value, it is truncated to
      this length. The default value of max_header_name_length is 32.
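The truncation rule can be sketched as follows. The function name and buffer contract are hypothetical, and only the parameter name max_header_name_length and its default of 32 come from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a header name into dst, keeping at most max_header_name_length
 * bytes. dst must have room for max_header_name_length + 1 bytes.
 * Returns the stored length.
 */
static size_t
header_name_store(char *dst, size_t max_header_name_length,
                  const char *name, size_t name_len)
{
    assert(dst != NULL && name != NULL);
    size_t n = name_len < max_header_name_length ?
               name_len : max_header_name_length;
    memcpy(dst, name, n);
    dst[n] = '\0';
    return n;
}
```

A name longer than the limit keeps its first max_header_name_length bytes instead of collapsing to an empty string.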
      
      Do some refactoring by renaming long names in http_parser.
      
      Closes #3451
      3d121dd4
    • Ilya Markov's avatar
      http: Remove parsed status line from headers · 139aa814
      Ilya Markov authored
      Bug: The header parser validates the http status line and, besides saving
      the http status, saves its valid characters as a header name, which is wrong.
      
      Fix this by skipping the status line after validation without saving it as
      a header.
      
      In scope of #3451
      139aa814
    • Vladimir Davydov's avatar
      xdir: remove inprogress files after restart · f41aac61
      Vladimir Davydov authored
      If tarantool is stopped while writing a snapshot or a vinyl run file,
      inprogress files will never be removed. Fix this by collecting those
      files on recovery completion.
      
      Original patch by @IlyaMarkovMipt. Reworked by @locker.
      
      Closes #3406
      f41aac61
    • Konstantin Osipov's avatar
      test: update test results · aaa9bdbe
      Konstantin Osipov authored
      A minor follow up on the fix for gh-3452 (http.client timeout bug)
      aaa9bdbe
    • Ilya Markov's avatar
      http.client: Fix waiting after received result · 7dcc8b42
      Ilya Markov authored
      The current implementation of http.client relies on a fiber_cond which is
      set after the request was registered and doesn't consider the fact that
      the response may be handled before the fiber_cond is set.
      
      So we may have the following situation:
      1. Register the request in libcurl (curl_multi_add_handle in curl_execute).
      2. Receive and process the response, fiber_cond_signal on a cond_var which
      no one waits on.
      3. fiber_cond_wait on the cond which is already signaled. Wait until the
      timeout fires.
      
      In this case the user has to wait for the timeout, though the data was
      received earlier.
      
      Fix this by adding an extra flag, in_progress, to the curl_request struct.
      Set this flag to true before registering the request in libcurl and set it
      to false when the request is finished, before fiber_cond_signal.
      When the in_progress flag is false, don't wait on the cond variable.
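The lost wakeup and the flag that guards against it can be sketched with a single-threaded model. This is not the fiber or libcurl API: toy_cond stands in for a condition whose signal is not remembered when nobody waits, and all names except in_progress are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy condition: a signal wakes current waiters only and is not stored. */
struct toy_cond {
    int waiters;
    int wakeups;
};

struct toy_request {
    bool in_progress;
    struct toy_cond cond;
};

static void
toy_cond_signal(struct toy_cond *c)
{
    c->wakeups += c->waiters;
    c->waiters = 0;
}

/* Called once the response has been received and processed. */
static void
toy_request_finish(struct toy_request *r)
{
    assert(r != NULL);
    r->in_progress = false; /* cleared before signalling */
    toy_cond_signal(&r->cond);
}

/* True if the caller still has to block on the condition. */
static bool
toy_request_must_wait(const struct toy_request *r)
{
    assert(r != NULL);
    return r->in_progress;
}
```

If the response is processed before anyone waits, the signal wakes nobody, but the cleared flag tells the caller to skip the wait rather than sit until the timeout.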
      
      Add 1 error injection.
      
      Closes #3452
      7dcc8b42
  13. Jun 27, 2018
  14. Jun 25, 2018
    • Vladimir Davydov's avatar
      socket: fix race between unix tcp server stop and start · 80d379ee
      Vladimir Davydov authored
      If called on a unix socket, bind(2) creates a new file, see unix(7).
      When we stop a unix tcp server, we should remove that file. Currently,
      we do it from the tcp server fiber, after the server loop is broken,
      which happens when the socket is closed, see tcp_server_loop(). This
      opens a time window for another tcp server to reuse the same path:
      
          main fiber                  tcp server loop
          ----------                  ---------------
      
          -- Start a tcp server.
          s = socket.tcp_server('unix/', sock_path, ...)
          -- Stop the server.
          s:close()
      
                                      socket_readable? => no, break loop
      
          -- Start a new tcp server. Use the same path as before.
          -- This function succeeds, because the socket is closed
          -- so tcp_server_bind_addr() will clean up by itself.
          s = socket.tcp_server('unix/', sock_path, ...)
      
           tcp_server_bind
            tcp_server_bind_addr
             socket_bind => EADDRINUSE
             tcp_connect => ECONNREFUSED
             -- Remove dead unix socket.
             fio.unlink(addr.port)
             socket_bind => success
      
                                      -- Deletes unix socket used
                                      -- by the new server.
                                      fio.unlink(addr.port)
      
      In particular, the race results in sporadic failures of app-tap/console
      test, which restarts a tcp server using the same file path.
      
      To fix this issue, let's close the socket after removing the socket
      file. This is absolutely legit on any UNIX system, and this eliminates
      the race shown above, because a new server that tries to bind on the
      same path as the one already used by a dying server will not receive
      ECONNREFUSED until the socket fd is closed and hence the file is
      removed.
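The unlink-before-close ordering can be sketched with plain POSIX calls. The function names are hypothetical, and this is a blocking illustration rather than Tarantool's fiber-based socket code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Bind and listen on a unix socket path. Returns the fd or -1. */
static int
unix_server_start(const char *path)
{
    assert(path != NULL);
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(fd, 16) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Stop the server: remove the socket file first, close the fd second.
 * While the fd is open, the path still belongs to a live listener.
 */
static int
unix_server_stop(int fd, const char *path)
{
    int rc = unlink(path);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

With this order, by the time the old listener is gone its file is gone too, so the old server never unlinks a path that a new server has already bound.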
      
      A note about the app-tap/console test. After this patch is applied,
      socket.close() takes a little longer for unix tcp server, because it
      yields twice, once for removing the socket file and once for closing the
      socket file descriptor. As a result, on_disconnect() trigger left from
      the previous test case has time to run after session.type() check.
      Actually, those triggers have already been tested and we should have
      cleared them before proceeding to the next test case. So instead of
      adding two new on_disconnect checks to the test plan, let's clear the
      triggers before session.type() test case and remove 3 on_connect and 5
      on_auth checks from the test plan.
      
      Closes #3168
      80d379ee
    • Vladislav Shpilevoy's avatar
      iproto: protect from false-correct size in msg header · c6951c92
      Vladislav Shpilevoy authored
      Consider this packet:
      
          msgpack = require('msgpack')
          data = msgpack.encode(18400000000000000000)..'aaaaaaa'
      
      Tarantool interprets 18400000000000000000 as the size of an incoming
      iproto request and tries to allocate a buffer of that size without
      any checks. It calculates the needed capacity like this:
      
          capacity = start_value;
          while (capacity < size)
              capacity *= 2;
      
      Here it is possible that on i-th iteration 'capacity' < 'size',
      but 'capacity * 2' overflows 64 bits and becomes < 'size' again,
      so this loop never ends and occupies 100% CPU.
      
      Strictly speaking overflow has undefined behavior. On the
      original system it led to nullifying 'capacity'.
      
      Such a size is improbable for a real packet, but it can appear
      as a result of parsing an invalid packet whose first bytes
      happen to form a valid MessagePack uint. This is
      how the bug emerged on a real system.
      
      Let's restrict the maximal packet size to 2GB.
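A size check placed before the doubling loop can be sketched as follows. The constant and function names are hypothetical, and treating exactly 2GB as allowed is an assumption. Only the loop itself comes from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit matching the 2GB mentioned in the commit. */
#define TOY_PACKET_SIZE_MAX ((uint64_t)2 * 1024 * 1024 * 1024)

/*
 * Compute the buffer capacity for a packet of the given size.
 * Returns 0 and stores the capacity, or -1 if the size is over the limit.
 */
static int
toy_buf_capacity(uint64_t start_value, uint64_t size, uint64_t *capacity_out)
{
    assert(start_value > 0 && capacity_out != NULL);
    if (size > TOY_PACKET_SIZE_MAX)
        return -1; /* reject before the loop can run away */
    uint64_t capacity = start_value;
    /*
     * capacity < size <= 2^31 whenever the body runs, so doubling
     * stays far below 2^64 and the loop always terminates.
     */
    while (capacity < size)
        capacity *= 2;
    *capacity_out = capacity;
    return 0;
}
```

The packet from the example, with a claimed size of 18400000000000000000, is rejected by the first check and never reaches the loop.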
      
      Closes #3464
      c6951c92
  15. Jun 14, 2018
  16. Jun 01, 2018
    • Vladimir Davydov's avatar
      vinyl: fix compaction vs checkpoint race resulting in invalid gc · b25e3168
      Vladimir Davydov authored
      The callback invoked upon compaction completion uses checkpoint_last()
      to determine whether compacted runs may be deleted: if the max LSN
      stored in a compacted run (run->dump_lsn) is greater than the LSN of the
      last checkpoint (gc_lsn) then the run doesn't belong to the last
      checkpoint and hence is safe to delete, see commit 35db70fa ("vinyl:
      remove runs not referenced by any checkpoint immediately").
      
      The problem is checkpoint_last() isn't synced with vylog rotation - it
      returns the signature of the last successfully created memtx snapshot
      and is updated in memtx_engine_commit_checkpoint() after vylog is
      rotated. If a compaction task completes after vylog is rotated but
      before snap file is renamed, it will assume that compacted runs do not
      belong to the last checkpoint, although they do (as they have been
      appended to the rotated vylog), and delete them.
      
      To eliminate this race, let's use vylog signature instead of snap
      signature in vy_task_compact_complete().
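The effect of choosing one signature or the other can be sketched with the comparison described above. The function name and the LSN values used in the note below are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * A compacted run may be deleted right away only if its max LSN is
 * greater than the LSN of the last checkpoint.
 */
static bool
toy_run_is_safe_to_delete(int64_t run_dump_lsn, int64_t gc_lsn)
{
    assert(run_dump_lsn >= 0 && gc_lsn >= 0);
    return run_dump_lsn > gc_lsn;
}
```

Suppose the vylog has been rotated at signature 200 while the last renamed snap file still has signature 100, and a compacted run has dump_lsn 150. Compared with the snap signature, the run looks deletable (150 > 100) although the rotated vylog references it. Compared with the vylog signature, it is kept (150 > 200 is false).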
      
      Closes #3437
      b25e3168
  17. May 31, 2018
    • Vladimir Davydov's avatar
      vinyl: fix false-positive assertion at exit · ff02157f
      Vladimir Davydov authored
      latch_destroy() and fiber_cond_destroy() are basically no-op. All they
      do is check that latch/cond is not used. When a global latch or cond
      object is destroyed at exit, it may still have users and this is OK as
      we don't stop fibers at exit. In vinyl this results in the following
      false-positive assertion failures:
      
        src/latch.h:81: latch_destroy: Assertion `l->owner == NULL' failed.
      
        src/fiber_cond.c:49: fiber_cond_destroy: Assertion `rlist_empty(&c->waiters)' failed.
      
      Remove "destruction" of vy_log::latch to suppress the first one. Wake up
      all fibers waiting on vy_quota::cond before destruction to suppress the
      second one. Add some test cases.
      
      Closes #3412
      ff02157f
  18. May 29, 2018
  19. May 25, 2018
  20. May 24, 2018
    • Georgy Kirichenko's avatar
      replication: add strict ordering for appliers operating in a full mesh · edd76a2a
      Georgy Kirichenko authored
      In some cases, when an applier yielded while processing, another applier
      might start a conflicting operation and break replication and database
      consistency.
      Now an applier locks a per-server-id latch before processing a transaction.
      This guarantees that there is only one applier request for each server
      in progress at any given moment.
      
      The problem was very rare until full mesh topologies in vinyl
      became commonplace.
      
      Fixes gh-3339
      edd76a2a
  21. May 22, 2018
    • Konstantin Belyavskiy's avatar
      replication: fix bug with read-only replica as a bootstrap leader · 77098294
      Konstantin Belyavskiy authored
      Another broken case is adding a new replica to the cluster:
      +		if (replica->applier->remote_is_ro &&
      +		    replica->applier->vclock.signature == 0)
      In this case we may get ER_READONLY, since the signature is not 0.
      So leader election now has two phases:
       1. Select among read-write replicas.
       2. If none is found, try the old algorithm for backward compatibility
          (the case when all replicas exist in the cluster table).
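The two phases can be sketched as follows. The struct and function are hypothetical. The fallback simply returns the first candidate as a placeholder, since the message does not spell out the rule of the old algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of a candidate; remote_is_ro mirrors the applier field. */
struct toy_replica {
    int id;
    bool remote_is_ro;
};

/* Returns the index of the chosen bootstrap leader, or -1 if there is none. */
static int
toy_pick_bootstrap_leader(const struct toy_replica *replicas, size_t n)
{
    assert(replicas != NULL || n == 0);
    /* Phase 1: prefer a read-write replica. */
    for (size_t i = 0; i < n; i++) {
        if (!replicas[i].remote_is_ro)
            return (int)i;
    }
    /* Phase 2: placeholder for the old algorithm (assumption). */
    return n > 0 ? 0 : -1;
}
```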
      
      Closes #3257
      77098294
  22. May 17, 2018
  23. May 15, 2018
  24. May 08, 2018
    • Ilya Markov's avatar
      socket: Fix socket test · 2b973c05
      Ilya Markov authored
      When app-tap/console.test was launched sequentially, tests failed with
      "User exists" and binding errors.
      
      Make socket paths relative.
      Add user cleanup.
      
      Relates #3168
      2b973c05
  25. May 07, 2018
    • Georgy Kirichenko's avatar
      Don't try to lock a ddl latch in a multistatement tx · c7012534
      Georgy Kirichenko authored
      Any ddl is prohibited in a multistatement transaction, so there is no
      reason to try to lock the ddl latch in this case. Trying to lock an already
      locked latch causes a yield and a silent transaction rollback, and
      this will crash or assert tarantool server.
      
      Fixes #2783
      c7012534
  26. May 03, 2018
    • Vladislav Shpilevoy's avatar
      digest: fix error in base64 encode options · 6e1ac12e
      Vladislav Shpilevoy authored
      Passing any base64 option results in urlsafe encoding. This is wrong and
      is caused by incorrect flag checking. Fix it.
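One way such a flag-checking bug can arise is sketched below. This is an assumed illustration, not the actual digest code: the option names and values are hypothetical, and the "buggy" check is only a guess at the kind of mistake involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option bits; the real values may differ. */
enum toy_b64_opts {
    TOY_B64_NOPAD = 1,
    TOY_B64_NOWRAP = 2,
    TOY_B64_URLSAFE = 4
};

/* Assumed buggy check: any non-zero option set selects urlsafe. */
static bool
toy_wants_urlsafe_buggy(int opts)
{
    return opts != 0;
}

/* Intended check: test the urlsafe bit itself. */
static bool
toy_wants_urlsafe(int opts)
{
    assert(opts >= 0);
    return (opts & TOY_B64_URLSAFE) != 0;
}
```

With the buggy check, asking only for no padding would also switch the alphabet to urlsafe. The masked check keeps the options independent.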
      
      Closes #3358
      6e1ac12e
    • Konstantin Osipov's avatar
      iproto: follow up patch for the fix for blocked connection · 1dcdc98e
      Konstantin Osipov authored
      * rename request_limit.test.lua to net_msg_max.test.lua
      * make net_msg_max.test.lua stable (courtesy of @Gerold103)
      * exclude disconnect messages from iproto_msg_max limit
      * add a separate warning for throttling based on readahead buffer overflow
      1dcdc98e
    • Vladislav Shpilevoy's avatar
      iproto: connection could block forever after a CALL request · f4d66dae
      Vladislav Shpilevoy authored
      Starting with 1.9, a CALL request which yields releases
      the input buffer in the net thread before the CALL is complete.
      A release trigger is fired when the CALL fiber yields.
      
      The problem is that by default the input socket is not
      included into poll() list of the event loop: thanks to an
      optimization by @kostja for strict request/response scenario,
      the socket is included into poll() list only after the response
      is sent to the client. Thus, the following could happen:
      
      * a client sends a long-polling request
      * the request yields and maybe never finishes
      * the socket is not being read until the long-polling request
        is finished
      
      The patch is to explicitly feed EV_READ event to the event
      loop on the client socket whenever we release the input buffer
      for a long-polling request.
      
      We may remove iproto_resume() from net_discard_input() along
      with this patch since iproto_resume() will be called by
      iproto_connection_on_input().
      f4d66dae
  27. Apr 18, 2018
    • Ilya Markov's avatar
      wal: Update request header after sequence update · 41589229
      Ilya Markov authored
      When a tuple in an insert/replace request has a NULL value
      in the field incremented by a sequence,
      the request body is changed: NULL is replaced by the value taken from
      the sequence.
      But the request header is not updated.
      So the redo log, which takes the body from the header if the header exists,
      writes the old version of the request to the wal.
      
      Fixed this by updating the header value after handling the sequence.
      
      Closes #3247
      41589229