From 44401529a2253cbf00583a3680ae01f0cb74693c Mon Sep 17 00:00:00 2001
From: Magomed Kostoev <m.kostoev@tarantool.org>
Date: Thu, 26 Oct 2023 16:35:22 +0300
Subject: [PATCH] fiber: make the concurrent fiber_join safer

Prior to this patch a bunch of illegal conditions was possible:
1. The joinability of a fiber could be changed while the fiber is
   being joined by someone. This could lead to double recycling:
   the first one happened on the fiber finish, and the second one
   in the fiber join.
2. The joinability of a dead joinable fiber could be altered, this
   led to inability jo join the dead fiber and free its resources.
3. A running fiber could be joined concurrently by two or more
   fibers, so the fiber could be recycled more than once (once
   per each concurrent join).
4. A dead recycled fiber could be made joinable and joined leading
   to the double recycle.

Fixed these issues by adding a new FIBER_JOIN_BEEN_INVOKED flag: now
the `fiber_set_joinable` and `fiber_join_timeout` functions detect
the double join. Because of the API limitations both of them panic
when an invalid condition is met:
- The `fiber_set_joinable` was not designed to report errors.
- The `fiber_join_timeout` can't raise any error unless a timeout
  is met, because the `fiber_join` users don't expect to receive
  any error from this function at all (except the one generated
  by the joined fiber).

It's still possible that a fiber join is performed on a struct which
has been recycled and, if the new fiber is joinable too, this can't
be detected. The current fiber API does not allow to fix this, so
this is to be the user's responsibility, they should be warned about
the fact the double join to the same fiber is illegal.

Closes #7562

@TarantoolBot document
Title: `fiber_join`, `fiber_join_timeout` and `fiber_set_joinable`
behave differently now.

`fiber_join` and `fiber_join_timeout` now panic in case if double
join of the given fiber is detected.

`fiber_set_joinable` now panics if the given fiber is dead or is
joined already. This prevents some amount of error conditions that
could happen when using the API in an unexpected way, including:
- Making a dead joinable fiber non-joinable could lead to a memory
  leak: one can't join the fiber anymore.
- Making a dead joinable fiber joinable again is a sign of attempt
  to join the fiber later. That means the fiber struct may be joined
  later, when it's been recycled and reused. This could lead to a
  very hard to debug double join.
- Making an alive joined fiber non-joinable would lead to the double
  free: once on the fiber function finish, and secondly in the active
  fiber join finish. Risks of making it joinable are described above.
- Making a dead and recycled fiber joinable allowed to join the fiber
  once again leading to a double free.

Any given by the API `struct fiber` should only be joined once. If a
fiber is joined after the first join on it has finished the behavior
is undefined: it can either be a panic or an incidental join to a
totally foreign fiber.
---
 changelogs/unreleased/gh-7562-fiber_join.md |  6 ++
 src/lib/core/fiber.c                        | 61 +++++++++++++++++++++
 src/lib/core/fiber.h                        | 16 +++++-
 test/unit/fiber.cc                          | 14 +++++
 test/unit/fiber.result                      |  1 +
 5 files changed, 97 insertions(+), 1 deletion(-)
 create mode 100644 changelogs/unreleased/gh-7562-fiber_join.md

diff --git a/changelogs/unreleased/gh-7562-fiber_join.md b/changelogs/unreleased/gh-7562-fiber_join.md
new file mode 100644
index 0000000000..f951843e1c
--- /dev/null
+++ b/changelogs/unreleased/gh-7562-fiber_join.md
@@ -0,0 +1,6 @@
+## bugfix/core
+
+* Fixed a possible inconsistent state entering if fibers are joined incorrectly.
+  Now the `fiber_set_joinable` function panics if the fiber is dead or joined
+  already. The `fiber_join` and `fiber_join_timeout` functions now panic on a
+  double join if it is possible to detect it (gh-7562).
diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c
index 362afc06e8..5fa2be2126 100644
--- a/src/lib/core/fiber.c
+++ b/src/lib/core/fiber.c
@@ -664,6 +664,56 @@ fiber_set_joinable(struct fiber *fiber, bool yesno)
 {
 	if (fiber == NULL)
 		fiber = fiber();
+
+	/*
+	 * The C fiber API does not allow to report any error to the caller. At
+	 * the same time using the function in some conditions is unsafe. So in
+	 * case if such a condition is met the function panics.
+	 *
+	 * Here're the conditions the function may be called in:
+	 * 1. The fiber is not joinable and hasn't finished yet. We're good to
+	 *    change the fiber's joinability.
+	 * 2. The fiber is not joinable and is finished. That means the fiber
+	 *    is recycled already, so this is a use after free. This case is
+	 *    impossible from the Lua world, but can be done with C API, so it
+	 *    is to be prohibited.
+	 * 3. The fiber is joinable, not finished and not joined. We're good to
+	 *    change its joinability.
+	 * 4. The fiber is joinable, not finished, but joined already. Making
+	 *    the fiber non-joinable will lead to a double free: one in the
+	 *    fiber_loop, the second one in the active fiber_join_timeout.
+	 *    Making it joinable is a sign of attempt to join it later, so we
+	 *    can expect the second join in the future - detect it now, while
+	 *    we can do it, because in the future the fiber struct may be
+	 *    reused and it will be much more difficult to find the root cause
+	 *    of the double join (or we won't be able to detect it at all).
+	 * 5. The fiber is joinable, finished and not joined. The fiber should
+	 *    be joined in order to be recycled, so making it non-joinable can
+	 *    lead to a fiber leak and to be prohibited. The point about making
+	 *    it joinable from the statement #4 is applied here too.
+	 * 6. The fiber is joinable, finished and joined. In order to avoid
+	 *    introducing a new fiber flag, a joined fiber becomes non-joinable
+	 *    on death (see the fiber_join_timeout implementation), so this
+	 *    condition transforms into #2. And it's a use after free too.
+	 *
+	 * Simplifying:
+	 *   OK:
+	 *     1: !is_joinable && !is_dead
+	 *     3: is_joinable && !is_dead && !is_joined
+	 *   NOT OK:
+	 *     2: !is_joinable && is_dead
+	 *     4: is_joinable && !is_dead && is_joined
+	 *     5: is_joinable && is_dead && !is_joined
+	 *     6. is_joinable && is_dead && is_joined
+	 *
+	 *  Othervice, OK: !is_dead && (!is_joinable || !is_joined)
+	 *  So NOT OK: is_dead || (is_joinable && is_joined)
+	 *  So NOT OK: is_dead || is_joined, because joined implies joinable.
+	 */
+	if ((fiber->flags & FIBER_IS_DEAD) != 0 ||
+	    (fiber->flags & FIBER_JOIN_BEEN_INVOKED) != 0)
+		panic("%s on a dead or joined fiber detected", __func__);
+
 	if (yesno == true)
 		fiber->flags |= FIBER_IS_JOINABLE;
 	else
@@ -732,6 +782,12 @@ fiber_join_timeout(struct fiber *fiber, double timeout)
 	if ((fiber->flags & FIBER_IS_JOINABLE) == 0)
 		panic("the fiber is not joinable");
 
+	if ((fiber->flags & FIBER_JOIN_BEEN_INVOKED) != 0)
+		panic("join of a joined fiber detected");
+
+	/* Prohibit joining the fiber and changing its joinability. */
+	fiber->flags |= FIBER_JOIN_BEEN_INVOKED;
+
 	if (!fiber_is_dead(fiber)) {
 		double deadline = fiber_clock() + timeout;
 		while (!fiber_wait_on_deadline(fiber, deadline) &&
@@ -752,6 +808,11 @@ fiber_join_timeout(struct fiber *fiber, double timeout)
 	}
 	assert((fiber->flags & FIBER_IS_RUNNING) == 0);
 	assert((fiber->flags & FIBER_IS_JOINABLE) != 0);
+	/*
+	 * This line is here just to make the following statement true: if
+	 * a fiber is dead and is not joinable, that means it's recycled.
+	 * So we don't have to introduce a new FIBER_IS_RECYCLED flag.
+	 */
 	fiber->flags &= ~FIBER_IS_JOINABLE;
 
 	/* Move exception to the caller */
diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h
index 1059af3483..104c21b1c2 100644
--- a/src/lib/core/fiber.h
+++ b/src/lib/core/fiber.h
@@ -170,6 +170,11 @@ enum {
 	 * This flag idicates, if the fiber can be killed from the Lua world.
 	 */
 	FIBER_IS_SYSTEM         = 1 << 8,
+	/**
+	 * Someone has joined the fiber already. So this fiber can't be joined
+	 * once again nor can its joinability be changed.
+	 */
+	FIBER_JOIN_BEEN_INVOKED = 1 << 9,
 	FIBER_DEFAULT_FLAGS	= 0
 };
 
@@ -346,6 +351,11 @@ fiber_set_cancellable(bool yesno);
 
 /**
  * Set fiber to be joinable (false by default).
+ * The fiber must not be joined already nor dead.
+ *
+ * @pre the fiber is not dead (panic if not).
+ * @pre the fiber is not joined yet (panic if not).
+ *
  * \param fiber to (un)set the joinable property.
  *              If set to NULL, the current fiber is used.
  * \param yesno status to set
@@ -357,7 +367,9 @@ fiber_set_joinable(struct fiber *fiber, bool yesno);
  * Wait until the fiber is dead and then move its execution
  * status to the caller.
  * The fiber must not be detached (@sa fiber_set_joinable()).
- * @pre FIBER_IS_JOINABLE flag is set.
+ *
+ * @pre FIBER_IS_JOINABLE flag is set (panic if not).
+ * @pre the fiber is not joined yet (panic if not).
  *
  * \param f fiber to be woken up
  * \return fiber function ret code
@@ -372,7 +384,9 @@ fiber_join(struct fiber *f);
  * Return fiber execution status to the caller or -1
  * if timeout exceeded and set diag.
  * The fiber must not be detached @sa fiber_set_joinable()
+ *
  * @pre FIBER_IS_JOINABLE flag is set.
+ * @pre the fiber is not joined yet (panic if not).
  *
  * \param f fiber to be woken up
  * \param timeout time during which we wait for the fiber completion
diff --git a/test/unit/fiber.cc b/test/unit/fiber.cc
index 2e938d6584..103dad797c 100644
--- a/test/unit/fiber.cc
+++ b/test/unit/fiber.cc
@@ -191,6 +191,20 @@ fiber_join_test()
 	fiber_cancel(fiber);
 	fiber_join(fiber);
 
+	note("Can change the joinability in safe cases.");
+	fiber = fiber_new_xc("alive_not_joinable", noop_f);
+	/* Non-joinable not dead fiber.  */
+	fiber_set_joinable(fiber, true);
+	fail_unless((fiber->flags & FIBER_IS_JOINABLE) != 0);
+	/* Joinable not dead and not joined fiber. */
+	fiber_set_joinable(fiber, false);
+	fail_unless((fiber->flags & FIBER_IS_JOINABLE) == 0);
+	/* The same as the first case , just to be sure. */
+	fiber_set_joinable(fiber, true);
+	fail_unless((fiber->flags & FIBER_IS_JOINABLE) != 0);
+	fiber_wakeup(fiber);
+	fiber_join(fiber);
+
 	footer();
 }
 
diff --git a/test/unit/fiber.result b/test/unit/fiber.result
index 24b7883798..9cd48c333d 100644
--- a/test/unit/fiber.result
+++ b/test/unit/fiber.result
@@ -12,6 +12,7 @@ OutOfMemory: Failed to allocate 42 bytes in allocator for exception
 # exception propagated
 # cancel dead has started
 # by this time the fiber should be dead already
+# Can change the joinability in safe cases.
 	*** fiber_join_test: done ***
 	*** fiber_stack_test ***
 # normal-stack fiber not crashed
-- 
GitLab