  1. May 20, 2020
    • Sergey Bronnikov's avatar
      test: mute broken tests on OpenBSD · c244c0b2
      Sergey Bronnikov authored
      Part of #4967
      c244c0b2
    • Sergey Bronnikov's avatar
      say: fix compilation on OpenBSD · 64308484
      Sergey Bronnikov authored
      - define the LOG_MAKEPRI() macro on OpenBSD, where it is absent
      - replace sigtimedwait() with sigwait(), as the former is unsupported
        on OpenBSD
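
      A minimal sketch of such a fallback. The macro body here is an
      assumption: the definition actually used by the commit is not shown in
      this log. make_priority() is a hypothetical helper added only to
      exercise the macro.

      ```c
      #include <assert.h>
      #include <syslog.h>

      /* Fallback for platforms whose <syslog.h> lacks LOG_MAKEPRI(). */
      #ifndef LOG_MAKEPRI
      #define LOG_MAKEPRI(fac, pri) ((fac) | (pri))
      #endif

      /* Combine a facility and a severity into one syslog priority value. */
      static int
      make_priority(int facility, int severity)
      {
      	int prio = LOG_MAKEPRI(facility, severity);
      	/* The severity occupies the low bits selected by LOG_PRIMASK. */
      	assert((prio & LOG_PRIMASK) == severity);
      	return prio;
      }
      ```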
      
      Part of #4967
      64308484
    • Sergey Bronnikov's avatar
      Include libgen.h when building on OpenBSD · fd653200
      Sergey Bronnikov authored
      Part of #4967
      fd653200
    • Sergey Bronnikov's avatar
      sql: use mremap() on OpenBSD · 43ffbba6
      Sergey Bronnikov authored
      Part of #4967
      43ffbba6
    • Sergey Bronnikov's avatar
      Fix building of tt_pthread_attr_getstack() on OpenBSD · 15601e09
      Sergey Bronnikov authored
      Part of #4967
      15601e09
    • Sergey Bronnikov's avatar
      build: skip linking with -ldl on OpenBSD · 291c392a
      Sergey Bronnikov authored
      OpenBSD includes the DL library in the base system
      
      Part of #4967
      291c392a
    • Sergey Bronnikov's avatar
      build: test pthread_stackseg_np() on OpenBSD · 378e48aa
      Sergey Bronnikov authored
      Part of #4967
      378e48aa
    • Sergey Bronnikov's avatar
      build: introduce openbsd build target · 0b516c10
      Sergey Bronnikov authored
      Part of #4967
      0b516c10
    • Alexander Turenko's avatar
      test: popen: fix popen test 'hang' under test-run · 0afba959
      Alexander Turenko authored
      killpg() on Mac OS may fail to deliver a signal to a process: it seems
      that there is a race when a process has just been forked. It means that
      popen_handle:close() may leave a process alive when `opts.setsid` and
      `opts.group_signal` are set.
      
      Here is a simple reproducer, which does not leave `sleep 120` processes
      alive on Linux, but does on Mac OS (within three or four runs in a
      row):
      
       | #include <signal.h>
       | #include <unistd.h>
       | #include <fcntl.h>
       |
       | int
       | main()
       | {
       | 	char *child_argv[] = {
       | 		"/bin/sh",
       | 		"-c",
       | 		"sleep 120",
       | 		NULL,
       | 	};
       | 	pid_t pid;
       | 	int fd[2];
       | 	pipe(fd);
       | 	fcntl(fd[0], F_SETFD, FD_CLOEXEC);
       | 	fcntl(fd[1], F_SETFD, FD_CLOEXEC);
       |
       | 	if ((pid = fork()) == 0) {
       | 		/* Child. */
       | 		close(fd[0]);
       | 		setpgrp();
       | 		for (int i = 0; i < 10; ++i) {
       | 			/* Proceed with killpg. */
       | 			if (i == 5)
       | 				close(fd[1]);
       | 			if (fork() == 0) {
       | 				/* Child. */
       | 				execve("/bin/sh", child_argv, NULL);
       | 			}
       | 		}
       | 	} else {
       | 		/* Parent. */
       | 		close(fd[1]);
       | 		char c;
       | 		read(fd[0], &c, 1);
       | 		killpg(pid, SIGKILL);
       | 	}
       | 	return 0;
       | }
      
      Compile it (`cc test.c -o test`) and run several times:
      
      $ for i in $(seq 1 1000); do                     \
          echo $i;                                     \
          ./test                                       \
          && ps -o pid,pgid,command -ax | grep [s]leep \
          && break;                                    \
      done
      
      This is why a `sleep 120` process may stay alive even when the
      whole test passes.
      
      test-run captures stdout and stderr of a 'core = app' test and waits for
      EOF on them. If a child process inherits one or both of them, the fd is
      still open for writing, so EOF will not arrive until `sleep 120` exits.
      
      This commit doesn't try to fix the root of the problem, but closes
      stdout and stderr for the child process that may not be killed or exit
      in time.
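
      The EOF behaviour described above can be shown without fork(). This is
      a sketch under a stated assumption: dup() stands in for a child that
      inherited the write end, and the function name is hypothetical.

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <fcntl.h>
      #include <unistd.h>

      /*
       * Return 1 if a pipe reader sees EOF only after the last copy of
       * the write end is closed, 0 otherwise, -1 on a setup error.
       */
      static int
      eof_needs_all_writers_closed(void)
      {
      	int fd[2];
      	char c;
      	if (pipe(fd) != 0)
      		return -1;
      	/* dup() stands in for a child that inherited the write end. */
      	int inherited = dup(fd[1]);
      	if (inherited < 0)
      		return -1;
      	/* Non-blocking reads let us observe "no EOF yet" without hanging. */
      	if (fcntl(fd[0], F_SETFL, O_NONBLOCK) != 0)
      		return -1;

      	close(fd[1]);
      	/* One writer is still open: no EOF, read() fails with EAGAIN. */
      	int pending = read(fd[0], &c, 1) == -1 && errno == EAGAIN;

      	close(inherited);
      	/* All writers are closed: read() returns 0, which means EOF. */
      	int eof = read(fd[0], &c, 1) == 0;

      	close(fd[0]);
      	return pending && eof;
      }
      ```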
      
      Aside from this, the Mac OS peculiarities found are now described in the
      API comments of the C and Lua popen modules.
      
      Fixes #4938
      
      @TarantoolBot document
      Title: popen: add note re group signaling on Mac OS
      
      Copied from the updated popen_handle:signal() description:
      
      > Note: Mac OS may don't deliver a signal to a process in a group when
      > opts.setsid and opts.group_signal are set. It seems there is a race
      > here: when a process is just forked it may be not signaled.
      
      Copied from the updated popen_handle:close() description:
      
      > Details about signaling:
      >
      > <...>
      > - There are peculiars in group signaling on Mac OS,
      >   @see popen_handle:signal() for details.
      
      Follows up https://github.com/tarantool/doc/issues/1248
      0afba959
    • Alexander Turenko's avatar
      popen: fix access to freed memory after :close() · 01211e8e
      Alexander Turenko authored
      popen_delete() always frees the handle's memory, even when it reports a
      failure to send SIGKILL, see [1]. We should reflect this contract in
      popen_handle:close() and mark the handle as closed regardless of the
      popen_delete() return value.
      
      There are cases when killpg() fails with EPERM on Mac OS, so
      popen_delete() reports a failure. See [1] for more information.
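
      The contract can be sketched as follows. All names are hypothetical
      stand-ins, not the real popen structures or functions; the point is only
      that the wrapper forgets the pointer whatever the delete call returns.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdlib.h>

      /* Hypothetical stand-ins; not the real popen structures. */
      struct handle { int pid; };
      struct wrapper { struct handle *h; bool kill_fails; };

      /* Always frees the handle, even when it reports a failure. */
      static int
      handle_delete(struct handle *h, bool kill_fails)
      {
      	free(h);
      	return kill_fails ? -1 : 0;
      }

      /* Mark the wrapper closed regardless of the handle_delete() result. */
      static int
      wrapper_close(struct wrapper *w)
      {
      	if (w->h == NULL)
      		return -1; /* already closed */
      	int rc = handle_delete(w->h, w->kill_fails);
      	w->h = NULL; /* the memory is gone in both cases */
      	return rc;
      }

      /* Return 1 if a failed close still leaves the wrapper closed. */
      static int
      failed_close_marks_closed(void)
      {
      	struct wrapper w = { malloc(sizeof(struct handle)), true };
      	if (w.h == NULL)
      		return -1;
      	int first = wrapper_close(&w);
      	bool closed = w.h == NULL;
      	int second = wrapper_close(&w); /* must not touch freed memory */
      	return first == -1 && closed && second == -1;
      }
      ```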
      
      [1]: 01657bfb ('popen: always free resources in popen_delete()')
      
      Fixes #4995
      01211e8e
  2. May 19, 2020
    • Vladislav Shpilevoy's avatar
      session: remove box.session.push() 'sync' · a38c861e
      Vladislav Shpilevoy authored
      Closes #4689
      
      @TarantoolBot document
      Title: box.session.push() 'sync' is deprecated
      
      box.session.push() had two parameters: the data to push and 'sync'.
      The sync was the ID of the request with which the out-of-band data
      should be pushed into a socket.
      
      This was introduced as a workaround for #3450, and has been useless
      since that issue was resolved.
      
      A user can't push to different sessions anyway, which is where that
      parameter could be useful. Pushing into requests of the same
      session, on the contrary, is not really needed anywhere, is not
      portable to non-binary session types (console, background), and is
      dangerous, since it is easy to introduce a bug this way.
      
      The patch removes the parameter. An attempt to use the 'sync'
      parameter now throws a 'Usage' error. In version 2.4 the parameter is
      deprecated: it prints warnings into logs, but still works. In 2.5 it
      is removed completely.
      a38c861e
    • Kirill Yukhin's avatar
      small: revert previous bump · 61d2cb64
      Kirill Yukhin authored
      Revert 03790ac5 commit in small library part.
      61d2cb64
  3. May 18, 2020
  4. May 15, 2020
    • Vladislav Shpilevoy's avatar
      test: make app-tap/init_script produce less diff · 7f20272e
      Vladislav Shpilevoy authored
      When a new option is added, app-tap/init_script outputs a big diff,
      because all options are printed with ordinal indexes: adding a new
      option changes the indexes of all options after it.
      
      The patch removes the indexes from the output, making the diff
      smaller when a new option is added.
      7f20272e
    • Alexander V. Tikhonov's avatar
      gitlab-ci: disable perf tesing in scheduled runs · 413b563e
      Alexander V. Tikhonov authored
      Release branches should be run regularly using gitlab-ci pipeline
      schedules:
        https://gitlab.com/tarantool/tarantool/pipeline_schedules
      It will help to detect flaky issues. But there is no need to rerun
      the long-running performance testing there. To block it in
      scheduled runs, 'schedules' was added to the 'except:' field.
      
      Part of #4974
      413b563e
    • Alexander V. Tikhonov's avatar
      gitlab-ci: set OSX to full testing · 95364f02
      Alexander V. Tikhonov authored
      Run all test suites in OSX testing.
      
      Close #4818
      95364f02
    • Serge Petrenko's avatar
      replication: remove unnecessary errors on replicating from an anonymous instance · caf73913
      Serge Petrenko authored
      Since anonymous replicas were implemented, it has been forbidden to
      replicate (join/subscribe/register) from anonymous instances.
      Actually, only join and register should be banned, since an anonymous
      replica isn't able to register its peer in _cluster anyway.
      
      Let's allow other anonymous replicas, but not the normal ones, to subscribe to
      an anonymous replica.
      Also remove unnecessary ER_UNSUPPORTED errors from box_process_join()
      and box_process_register() for anonymous replicas. These cases are
      covered with ER_READONLY checks later on, since anonymous replicas must
      be read-only.
      
      Note that this patch doesn't allow normal instances to subscribe to
      anonymous ones. Even though it is technically possible, it may bring
      more problems than benefits. Let's allow it later on if there's an
      explicit demand.
      
      Closes #4696
      caf73913
    • Serge Petrenko's avatar
      replication: add box.info.replication_anon · 4e5d60d8
      Serge Petrenko authored
      Closes #4900
      
      @TarantoolBot document
      Title: add new field to box.info: replication_anon
      
      It is now possible to list all the anonymous replicas following the
      instance with a call to `box.info.replication_anon()`.
      The output is similar to the one produced by `box.info.replication`,
      except that anonymous replicas are indexed by their uuid strings
      rather than server ids, since server ids have no meaning for anonymous
      replicas.
      
      Note that when you issue a plain `box.info.replication_anon`, the only
      info returned is the number of anonymous replicas following the current
      instance. In order to see the full stats, you have to call
      `box.info.replication_anon()`. This is done to not overload the `box.info`
      output with excess info, since there may be lots of anonymous replicas.
      Example:
      
      ```
      tarantool> box.info.replication_anon
      ---
      - count: 2
      ...
      
      tarantool> box.info.replication_anon()
      ---
      - 3a6a2cfb-7e47-42f6-8309-7a25c37feea1:
          id: 0
          uuid: 3a6a2cfb-7e47-42f6-8309-7a25c37feea1
          lsn: 0
          downstream:
            status: follow
            idle: 0.76203499999974
            vclock: {1: 1}
        f58e4cb0-e0a8-42a1-b439-591dd36c8e5e:
          id: 0
          uuid: f58e4cb0-e0a8-42a1-b439-591dd36c8e5e
          lsn: 0
          downstream:
            status: follow
            idle: 0.0041349999992235
            vclock: {1: 1}
      ...
      
      ```
      
      Note that anonymous replicas hide their lsn from the others, so an
      anonymous replica's lsn will always be reported as zero, even if it
      performs some local space operations.
      To find out an anonymous replica's lsn, you have to issue `box.info.lsn`
      on it.
      4e5d60d8
  5. May 12, 2020
    • Nikita Pettik's avatar
      vinyl: drop wasted runs in case range recovery fails · 32f59756
      Nikita Pettik authored
      
      If the recovery process fails during range restoration, the range itself
      is deleted and recovery finishes as failed (in case of casual, i.e. not
      forced, recovery). During recovery of a particular range, the runs to be
      restored are refed twice: once when they are created in vy_run_new() and
      once when they are attached to a slice. This is taken into account:
      after all ranges are recovered, all runs of the lsm tree are unrefed so
      that slices own the run resources (as a result, when a slice is deleted,
      its runs are unrefed and deleted as well). However, if range recovery
      fails, the range is dropped along with the already recovered slices.
      This unrefs their runs, which is not accounted for. To sum up, below is
      a brief schema of the recovery process:
      
      foreach range in lsm.ranges {
        vy_lsm_recover_range(range) {
          foreach slice in range.slices {
            // inside recover_slice() each run is refed twice
            if vy_lsm_recover_slice() != 0 {
              // here all already restored slices are deleted and
              // corresponding runs are unrefed, so now they have 1 ref.
              range_delete()
            }
          }
        }
      }
      foreach run in lsm.runs {
        assert(run->refs > 1)
        vy_run_unref(run)
      }
      
      In this case, unrefing such runs one more time would lead to their
      destruction. Since a run may be destroyed while the list is being
      traversed, plain iteration over runs is unsafe, so we should use
      rlist_foreach_entry_safe(). Moreover, we should explicitly clean up
      these runs by calling vy_lsm_remove_run().
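
      The idea of a removal-safe traversal can be sketched on a plain singly
      linked list. This is not the rlist API: struct run, run_push() and
      unref_all() are hypothetical simplifications used only to show why the
      next pointer has to be saved before a node may be freed.

      ```c
      #include <assert.h>
      #include <stdlib.h>

      /* Hypothetical simplified run: a refcounted node in a linked list. */
      struct run {
      	int refs;
      	struct run *next;
      };

      /* Push a new run with the given reference count; returns the new head. */
      static struct run *
      run_push(struct run *head, int refs)
      {
      	struct run *r = malloc(sizeof(*r));
      	assert(r != NULL);
      	r->refs = refs;
      	r->next = head;
      	return r;
      }

      /*
       * Drop one reference from every run. Runs that reach zero are unlinked
       * and freed, so the next pointer is saved before the current node is
       * touched. Returns the number of surviving runs.
       */
      static int
      unref_all(struct run **head)
      {
      	int alive = 0;
      	struct run **link = head;
      	struct run *r = *head;
      	while (r != NULL) {
      		struct run *next = r->next; /* saved before a possible free */
      		if (--r->refs == 0) {
      			*link = next;
      			free(r);
      		} else {
      			link = &r->next;
      			alive++;
      		}
      		r = next;
      	}
      	return alive;
      }
      ```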
      
      Reviewed-by: Vladislav Shpilevoy <vshpilevoi@mail.ru>
      
      Closes #4805
      32f59756
    • Nikita Pettik's avatar
      errinj: introduce delayed injection · 9d4ac029
      Nikita Pettik authored
      
      With the new macro ERROR_INJECT_COUNTDOWN it is possible to delay error
      injection by the iparam value: the injection fires only after the path
      has been executed iparam times. For instance:
      
      void
      foo(int i)
      {
      	/* 2 is delay counter. */
      	ERROR_INJECT_COUNTDOWN(ERRINJ_FOO, {
      		 printf("Error injection on %d cycle!\n", i);
      		});
      }
      
      void
      boo(void)
      {
      	for (int i = 0; i < 10; ++i)
      		foo(i);
      }
      
      box.error.injection.set('ERRINJ_FOO', 2)
      
      The result is "Error injection on 2 cycle!". This type of error
      injection can be useful to set an injection in the middle of
      query processing. Imagine the following scenario:
      
      void
      foo(void)
      {
      	int *fds[10];
      	for (int i = 0; i < 10; ++i) {
      		fds[i] = malloc(sizeof(int));
      		if (fds[i] == NULL)
      			goto cleanup;
      	}
      cleanup:
      	free(fds[0]);
      }
      
      "cleanup" section obviously contains error and leads to memory leak.
      But such a situation can't be detected with casual error injection
      without delay: OOM can be set only for the first loop iteration, and in
      this particular case no leak takes place.
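
      A self-contained sketch of how a countdown makes such a leak visible.
      It does not use the real ERROR_INJECT_COUNTDOWN macro: inject_countdown,
      checked_alloc(), checked_free() and alloc_all() are hypothetical names,
      and the cleanup here is the corrected variant that frees every pointer
      allocated so far.

      ```c
      #include <assert.h>
      #include <stdlib.h>

      /* Hypothetical countdown: a negative value means "injection is off". */
      static int inject_countdown = -1;
      /* Number of allocations that are currently not freed. */
      static int live_allocs = 0;

      /* malloc() wrapper that fails once, after `inject_countdown` successes. */
      static void *
      checked_alloc(size_t size)
      {
      	if (inject_countdown >= 0 && inject_countdown-- == 0)
      		return NULL; /* simulated OOM */
      	void *p = malloc(size);
      	if (p != NULL)
      		live_allocs++;
      	return p;
      }

      static void
      checked_free(void *p)
      {
      	if (p != NULL)
      		live_allocs--;
      	free(p);
      }

      /* Allocate `count` ints; on failure free everything allocated so far. */
      static int
      alloc_all(int **ptrs, int count)
      {
      	int i;
      	for (i = 0; i < count; ++i) {
      		ptrs[i] = checked_alloc(sizeof(int));
      		if (ptrs[i] == NULL)
      			goto cleanup;
      	}
      	return 0;
      cleanup:
      	while (--i >= 0)
      		checked_free(ptrs[i]);
      	return -1;
      }
      ```

      With inject_countdown set to 3, the fourth allocation fails and
      live_allocs returns to 0. A cleanup that freed only ptrs[0] would leave
      it at 2, which is exactly the leak an injection without delay cannot
      reach.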
      
      Reviewed-by: Vladislav Shpilevoy <vshpilevoi@mail.ru>
      9d4ac029
    • Nikita Pettik's avatar
      vinyl: add test on failed write iterator during compaction · cb062017
      Nikita Pettik authored
      vy_task_write_run() is executed in an auxiliary thread (dump or
      compaction). The write iterator is created and used inside this function.
      Meanwhile, creating/destroying tuples in these threads does not change
      the reference counter of the corresponding tuple formats (see
      vy_tuple_delete() and vy_stmt_alloc()). Without cleaning up the write
      iterator right in write_iterator_start() after a failure, this procedure
      takes place in vy_task_compaction_abort() or vy_task_dump_abort(). These
      *_abort() functions in turn are executed in the main thread. Taking this
      into consideration, a tuple might be allocated in an aux. thread and
      deleted in the main thread. As a result, the format reference counter
      might decrease, whereas it shouldn't change (otherwise the tuple format
      will be destroyed before all tuples of this format are gone).
      
      Fortunately, clean-up of the write iterator in another thread was found
      only on the 1.10 branch; the master branch already contains the fix but
      lacks a test (2f17c929). So let's introduce a test with the following
      scenario:
      
      1. run compaction process;
      2. add one or more slice sources in vy_write_iterator_start():
      corresponding slice_stream structures obtain newly created tuples
      in vy_slice_stream_next();
      3. the next call of vy_write_iterator_add_src() fails due to OOM,
      invalid run file or whatever;
      4. if write_iterator_start() didn't provide clean-up of sources, it
      would take place in vy_task_dump_abort() which would be executed in
      the main thread;
      5. now format reference counter would be less than it was before
      compaction.
      
      Closes #4864
      cb062017
    • Nikita Pettik's avatar
      vinyl: clean-up unprocessed read views in *_build_read_views() · 521a6fbd
      Nikita Pettik authored
      vy_write_iterator->read_views[i].history objects are allocated on the
      region (see vy_write_iterator_push_rv()) while building the history of
      the given key. However, if vy_write_iterator_build_history() fails, the
      region is truncated but the pointers to vy_write_history objects are not
      nullified. As a result, they may be accessed (for instance while
      finalizing the write_iterator object in vy_write_iterator_stop()), which
      in turn may lead to a crash, segfault or disk formatting. The same may
      happen if vy_read_view_merge() fails during processing of the read view
      array. Let's clean up those objects if an error occurs.
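
      The pattern can be sketched with a toy bump allocator. Nothing here is
      the vinyl code: struct region, struct read_view, region_alloc() and
      build_read_views() are hypothetical, and the sketch only shows pointers
      being reset together with the region truncation.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Hypothetical bump allocator standing in for a region. */
      struct region { char buf[256]; size_t used; };
      struct read_view { void *history; };

      static void *
      region_alloc(struct region *r, size_t size)
      {
      	if (r->used + size > sizeof(r->buf))
      		return NULL;
      	void *p = r->buf + r->used;
      	r->used += size;
      	return p;
      }

      /*
       * Allocate a history object for each read view. If any allocation
       * fails, truncate the region and reset every pointer that referred to
       * it, so later finalization code cannot follow a stale pointer.
       */
      static int
      build_read_views(struct region *r, struct read_view *rv, int count,
      		 size_t history_size)
      {
      	size_t saved = r->used;
      	for (int i = 0; i < count; i++) {
      		rv[i].history = region_alloc(r, history_size);
      		if (rv[i].history == NULL) {
      			r->used = saved; /* region truncation */
      			for (int j = 0; j < count; j++)
      				rv[j].history = NULL;
      			return -1;
      		}
      	}
      	return 0;
      }
      ```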
      
      Part of #4864
      521a6fbd
  6. May 08, 2020
    • HustonMmmavr's avatar
      static build: dockerfile entrypoint set to exec form · d3a7dd17
      HustonMmmavr authored
      According to the Dockerfile reference, there are two forms of specifying
      an entrypoint: exec and shell. The exec form is preferable and allows
      using this image in scripts.
      
      Close #4960
      d3a7dd17
    • Alexander V. Tikhonov's avatar
      gitlab-ci: add Catalina OSX 10.15 · 76157ef6
      Alexander V. Tikhonov authored
      Added Catalina OSX 10.15 to gitlab-ci testing and removed OSX 10.13,
      since it was decided to keep only the 2 latest major releases; for now
      these are OSX versions 10.14 and 10.15. Also changed the commit job for
      branches from OSX version 10.14 to 10.15.
      
      Additional cleanup for 'box_return_mp' and 'box_session_push',
      added API_EXPORT which defines nothrow, compiler warns or errors
      depending on the build options.
      
      Part of #4885
      Close #4873
      76157ef6
    • Alexander V. Tikhonov's avatar
      test: mark tests as fragile in a test's configs · faf7e482
      Alexander V. Tikhonov authored
      Marked flaky tests from parallel runs as fragile to avoid
      flaky failures in regular testing:
      
        box-py/snapshot.test.py                ; gh-4514
        replication/misc.test.lua              ; gh-4940
        replication/skip_conflict_row.test.lua ; gh-4958
        replication-py/init_storage.test.py    ; gh-4949
        vinyl/stat.test.lua                    ; gh-4951
        xlog/checkpoint_daemon.test.lua        ; gh-4952
      
      Part of #4953
      faf7e482
    • Oleg Piskunov's avatar
      gitlab-ci: keep perf results as gitlab-ci artifacts · eeb501ec
      Oleg Piskunov authored
      The gitlab-ci pipeline was modified to keep
      performance results as gitlab-ci artifacts.
      
      Closes #4920
      eeb501ec
    • Georgy Kirichenko's avatar
      wal: simplify rollback · a4f4adeb
      Georgy Kirichenko authored
      Here is a summary on how and when rollback works in WAL.
      
      Disk write failure can cause rollback. In that case the failed and
      all next transactions, sent to WAL, should be rolled back.
      Together. Following transactions should be rolled back too,
      because they could make their statements based on what they saw in
      the failed transaction. Also rollback of the failed transaction
      without rollback of the next ones can actually rewrite what they
      committed.
      
      So when rollback is started, *all* pending transactions should be
      rolled back. However, if they kept coming, the rollback would
      be infinite. This means that to complete a rollback it is necessary to
      stop sending new transactions to WAL, then roll back all those already
      sent. In the end, allow new transactions again.
      
      Step-by-step:
      
      1) stop accepting all new transactions in the WAL thread, where
      rollback is started. All new transactions don't even try to go to
      disk. They are added to the rollback queue immediately after arriving
      at the WAL thread.
      
      2) tell the TX thread to stop sending new transactions to WAL, so
      that the rollback queue stops growing.
      
      3) roll back all transactions in reverse order.
      
      4) allow transactions again in the WAL thread and the TX thread.
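
      The steps above can be modeled in a single thread. This is a sketch,
      not the WAL code: struct wal, wal_submit() and wal_finish_rollback() are
      hypothetical, there are no threads or cbus messages, and step 2 is
      represented only by the caller no longer submitting transactions before
      the rollback is finished.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      enum { MAX_TXN = 16 };

      /* Hypothetical single-threaded model of the WAL rollback state. */
      struct wal {
      	bool in_rollback;         /* step 1: reject new transactions */
      	int queue[MAX_TXN];       /* transactions awaiting rollback */
      	int queue_len;
      	int rolled_back[MAX_TXN]; /* order in which rollback happened */
      	int rolled_back_len;
      };

      /* Submit a transaction; `disk_ok` models the result of the disk write. */
      static bool
      wal_submit(struct wal *w, int txn, bool disk_ok)
      {
      	if (!w->in_rollback && disk_ok)
      		return true; /* written and committed */
      	/* A failed write starts rollback; later arrivals skip the disk. */
      	w->in_rollback = true;
      	assert(w->queue_len < MAX_TXN);
      	w->queue[w->queue_len++] = txn;
      	return false;
      }

      /* Steps 3 and 4: roll back in reverse order, then accept writes again. */
      static void
      wal_finish_rollback(struct wal *w)
      {
      	while (w->queue_len > 0) {
      		int txn = w->queue[--w->queue_len];
      		assert(w->rolled_back_len < MAX_TXN);
      		w->rolled_back[w->rolled_back_len++] = txn;
      	}
      	w->in_rollback = false;
      }
      ```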
      
      The algorithm is long, but simple and understandable. However, the
      implementation wasn't so easy. It was done using a 4-hop cbus
      route, 2 hops of which were supposed to clear the cbus channel from
      all other cbus messages. The next two hops implemented steps 3 and 4.
      Rollback state of the WAL was signaled by checking internals of a
      preallocated cbus message.
      
      The patch makes it simpler and more straightforward. Rollback
      state is now signaled by a simple flag, and there is no hack
      for clearing the cbus channel and no touching of attributes of a cbus
      message. The moment when all transactions are stopped and the last
      one has returned from WAL is visible explicitly, because the last
      journal entry sent to WAL is saved.
      
      Also there is a single route for commit and rollback cbus
      messages now, called tx_complete_batch(). This change will come
      in handy in the scope of synchronous replication, when a WAL write
      won't be enough for commit. Therefore 'commit' as a concept should
      be gradually washed away from WAL's code and migrated solely to the
      txn module.
      a4f4adeb
    • Roman Khabibov's avatar
      console: check on_shutdown() before exit · c7341a3d
      Roman Khabibov authored
      Add check that on_shutdown() triggers were called before exit,
      because in case of EOF or Ctrl+D (no signals) they were ignored.
      
      Closes #4703
      c7341a3d
  7. May 07, 2020
    • Nikita Pettik's avatar
      vinyl: init all vars before cleanup in vy_lsm_split_range() · 4dcba1b5
      Nikita Pettik authored
      If vy_key_from_msgpack() fails in vy_lsm_split_range(), the clean-up
      procedure is called. However, at this moment struct vy_range *parts[2]
      is not initialized and so contains garbage, and access to this
      structure may result in a crash, segfault or disk formatting. Let's
      move initialization of the mentioned variables before the call of
      vy_lsm_split_range().
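
      The general idiom, sketched outside of vinyl: initialize everything the
      cleanup path reads before the first jump to it. split(), part_delete()
      and the fail_early flag are hypothetical and only model an early
      failure such as the one described above.

      ```c
      #include <assert.h>
      #include <stdlib.h>

      /* Number of free() calls made on non-NULL pointers, for checking. */
      static int freed_parts = 0;

      static void
      part_delete(int *part)
      {
      	if (part != NULL) {
      		free(part);
      		freed_parts++;
      	}
      }

      /*
       * Hypothetical split routine. `fail_early` models a failure that
       * happens before any part is created.
       */
      static int
      split(int fail_early)
      {
      	/* Initialized up front, so the cleanup path never sees garbage. */
      	int *parts[2] = { NULL, NULL };
      	if (fail_early)
      		goto fail;
      	for (int i = 0; i < 2; i++) {
      		parts[i] = malloc(sizeof(int));
      		if (parts[i] == NULL)
      			goto fail;
      	}
      	/* A real caller would take ownership; freed here for brevity. */
      	part_delete(parts[0]);
      	part_delete(parts[1]);
      	return 0;
      fail:
      	part_delete(parts[0]);
      	part_delete(parts[1]);
      	return -1;
      }
      ```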
      
      Part of #4864
      4dcba1b5
  8. May 01, 2020
  9. Apr 30, 2020