From 44401529a2253cbf00583a3680ae01f0cb74693c Mon Sep 17 00:00:00 2001 From: Magomed Kostoev <m.kostoev@tarantool.org> Date: Thu, 26 Oct 2023 16:35:22 +0300 Subject: [PATCH] fiber: make the concurrent fiber_join safer Prior to this patch a bunch of illegal conditions was possible: 1. The joinability of a fiber could be changed while the fiber is being joined by someone. This could lead to double recycling: the first one happened on the fiber finish, and the second one in the fiber join. 2. The joinability of a dead joinable fiber could be altered, this led to inability jo join the dead fiber and free its resources. 3. A running fiber could be joined concurrently by two or more fibers, so the fiber could be recycled more than once (once per each concurrent join). 4. A dead recycled fiber could be made joinable and joined leading to the double recycle. Fixed these issues by adding a new FIBER_JOIN_BEEN_INVOKED flag: now the `fiber_set_joinable` and `fiber_join_timeout` functions detect the double join. Because of the API limitations both of them panic when an invalid condition is met: - The `fiber_set_joinable` was not designed to report errors. - The `fiber_join_timeout` can't raise any error unless a timeout is met, because the `fiber_join` users don't expect to receive any error from this function at all (except the one generated by the joined fiber). It's still possible that a fiber join is performed on a struct which has been recycled and, if the new fiber is joinable too, this can't be detected. The current fiber API does not allow to fix this, so this is to be the user's responsibility, they should be warned about the fact the double join to the same fiber is illegal. Closes #7562 @TarantoolBot document Title: `fiber_join`, `fiber_join_timeout` and `fiber_set_joinable` behave differently now. `fiber_join` and `fiber_join_timeout` now panic in case if double join of the given fiber is detected. `fiber_set_joinable` now panics if the given fiber is dead or is joined already. This prevents some amount of error conditions that could happen when using the API in an unexpected way, including: - Making a dead joinable fiber non-joinable could lead to a memory leak: one can't join the fiber anymore. - Making a dead joinable fiber joinable again is a sign of attempt to join the fiber later. That means the fiber struct may be joined later, when it's been recycled and reused. This could lead to a very hard to debug double join. - Making an alive joined fiber non-joinable would lead to the double free: once on the fiber function finish, and secondly in the active fiber join finish. Risks of making it joinable are described above. - Making a dead and recycled fiber joinable allowed to join the fiber once again leading to a double free. Any given by the API `struct fiber` should only be joined once. If a fiber is joined after the first join on it has finished the behavior is undefined: it can either be a panic or an incidental join to a totally foreign fiber. --- changelogs/unreleased/gh-7562-fiber_join.md | 6 ++ src/lib/core/fiber.c | 61 +++++++++++++++++++++ src/lib/core/fiber.h | 16 +++++- test/unit/fiber.cc | 14 +++++ test/unit/fiber.result | 1 + 5 files changed, 97 insertions(+), 1 deletion(-) create mode 100644 changelogs/unreleased/gh-7562-fiber_join.md diff --git a/changelogs/unreleased/gh-7562-fiber_join.md b/changelogs/unreleased/gh-7562-fiber_join.md new file mode 100644 index 0000000000..f951843e1c --- /dev/null +++ b/changelogs/unreleased/gh-7562-fiber_join.md @@ -0,0 +1,6 @@ +## bugfix/core + +* Fixed a possible inconsistent state entering if fibers are joined incorrectly. + Now the `fiber_set_joinable` function panics if the fiber is dead or joined + already. The `fiber_join` and `fiber_join_timeout` functions now panic on a + double join if it is possible to detect it (gh-7562). diff --git a/src/lib/core/fiber.c b/src/lib/core/fiber.c index 362afc06e8..5fa2be2126 100644 --- a/src/lib/core/fiber.c +++ b/src/lib/core/fiber.c @@ -664,6 +664,56 @@ fiber_set_joinable(struct fiber *fiber, bool yesno) { if (fiber == NULL) fiber = fiber(); + + /* + * The C fiber API does not allow to report any error to the caller. At + * the same time using the function in some conditions is unsafe. So in + * case if such a condition is met the function panics. + * + * Here're the conditions the function may be called in: + * 1. The fiber is not joinable and hasn't finished yet. We're good to + * change the fiber's joinability. + * 2. The fiber is not joinable and is finished. That means the fiber + * is recycled already, so this is a use after free. This case is + * impossible from the Lua world, but can be done with C API, so it + * is to be prohibited. + * 3. The fiber is joinable, not finished and not joined. We're good to + * change its joinability. + * 4. The fiber is joinable, not finished, but joined already. Making + * the fiber non-joinable will lead to a double free: one in the + * fiber_loop, the second one in the active fiber_join_timeout. + * Making it joinable is a sign of attempt to join it later, so we + * can expect the second join in the future - detect it now, while + * we can do it, because in the future the fiber struct may be + * reused and it will be much more difficult to find the root cause + * of the double join (or we won't be able to detect it at all). + * 5. The fiber is joinable, finished and not joined. The fiber should + * be joined in order to be recycled, so making it non-joinable can + * lead to a fiber leak and to be prohibited. The point about making + * it joinable from the statement #4 is applied here too. + * 6. The fiber is joinable, finished and joined. In order to avoid + * introducing a new fiber flag, a joined fiber becomes non-joinable + * on death (see the fiber_join_timeout implementation), so this + * condition transforms into #2. And it's a use after free too. + * + * Simplifying: + * OK: + * 1: !is_joinable && !is_dead + * 3: is_joinable && !is_dead && !is_joined + * NOT OK: + * 2: !is_joinable && is_dead + * 4: is_joinable && !is_dead && is_joined + * 5: is_joinable && is_dead && !is_joined + * 6. is_joinable && is_dead && is_joined + * + * Othervice, OK: !is_dead && (!is_joinable || !is_joined) + * So NOT OK: is_dead || (is_joinable && is_joined) + * So NOT OK: is_dead || is_joined, because joined implies joinable. + */ + if ((fiber->flags & FIBER_IS_DEAD) != 0 || + (fiber->flags & FIBER_JOIN_BEEN_INVOKED) != 0) + panic("%s on a dead or joined fiber detected", __func__); + if (yesno == true) fiber->flags |= FIBER_IS_JOINABLE; else @@ -732,6 +782,12 @@ fiber_join_timeout(struct fiber *fiber, double timeout) if ((fiber->flags & FIBER_IS_JOINABLE) == 0) panic("the fiber is not joinable"); + if ((fiber->flags & FIBER_JOIN_BEEN_INVOKED) != 0) + panic("join of a joined fiber detected"); + + /* Prohibit joining the fiber and changing its joinability. */ + fiber->flags |= FIBER_JOIN_BEEN_INVOKED; + if (!fiber_is_dead(fiber)) { double deadline = fiber_clock() + timeout; while (!fiber_wait_on_deadline(fiber, deadline) && @@ -752,6 +808,11 @@ fiber_join_timeout(struct fiber *fiber, double timeout) } assert((fiber->flags & FIBER_IS_RUNNING) == 0); assert((fiber->flags & FIBER_IS_JOINABLE) != 0); + /* + * This line is here just to make the following statement true: if + * a fiber is dead and is not joinable, that means it's recycled. + * So we don't have to introduce a new FIBER_IS_RECYCLED flag. + */ fiber->flags &= ~FIBER_IS_JOINABLE; /* Move exception to the caller */ diff --git a/src/lib/core/fiber.h b/src/lib/core/fiber.h index 1059af3483..104c21b1c2 100644 --- a/src/lib/core/fiber.h +++ b/src/lib/core/fiber.h @@ -170,6 +170,11 @@ enum { * This flag idicates, if the fiber can be killed from the Lua world. */ FIBER_IS_SYSTEM = 1 << 8, + /** + * Someone has joined the fiber already. So this fiber can't be joined + * once again nor can its joinability be changed. + */ + FIBER_JOIN_BEEN_INVOKED = 1 << 9, FIBER_DEFAULT_FLAGS = 0 }; @@ -346,6 +351,11 @@ fiber_set_cancellable(bool yesno); /** * Set fiber to be joinable (false by default). + * The fiber must not be joined already nor dead. + * + * @pre the fiber is not dead (panic if not). + * @pre the fiber is not joined yet (panic if not). + * * \param fiber to (un)set the joinable property. * If set to NULL, the current fiber is used. * \param yesno status to set @@ -357,7 +367,9 @@ fiber_set_joinable(struct fiber *fiber, bool yesno); * Wait until the fiber is dead and then move its execution * status to the caller. * The fiber must not be detached (@sa fiber_set_joinable()). - * @pre FIBER_IS_JOINABLE flag is set. + * + * @pre FIBER_IS_JOINABLE flag is set (panic if not). + * @pre the fiber is not joined yet (panic if not). * * \param f fiber to be woken up * \return fiber function ret code @@ -372,7 +384,9 @@ fiber_join(struct fiber *f); * Return fiber execution status to the caller or -1 * if timeout exceeded and set diag. * The fiber must not be detached @sa fiber_set_joinable() + * * @pre FIBER_IS_JOINABLE flag is set. + * @pre the fiber is not joined yet (panic if not). * * \param f fiber to be woken up * \param timeout time during which we wait for the fiber completion diff --git a/test/unit/fiber.cc b/test/unit/fiber.cc index 2e938d6584..103dad797c 100644 --- a/test/unit/fiber.cc +++ b/test/unit/fiber.cc @@ -191,6 +191,20 @@ fiber_join_test() fiber_cancel(fiber); fiber_join(fiber); + note("Can change the joinability in safe cases."); + fiber = fiber_new_xc("alive_not_joinable", noop_f); + /* Non-joinable not dead fiber. */ + fiber_set_joinable(fiber, true); + fail_unless((fiber->flags & FIBER_IS_JOINABLE) != 0); + /* Joinable not dead and not joined fiber. */ + fiber_set_joinable(fiber, false); + fail_unless((fiber->flags & FIBER_IS_JOINABLE) == 0); + /* The same as the first case , just to be sure. */ + fiber_set_joinable(fiber, true); + fail_unless((fiber->flags & FIBER_IS_JOINABLE) != 0); + fiber_wakeup(fiber); + fiber_join(fiber); + footer(); } diff --git a/test/unit/fiber.result b/test/unit/fiber.result index 24b7883798..9cd48c333d 100644 --- a/test/unit/fiber.result +++ b/test/unit/fiber.result @@ -12,6 +12,7 @@ OutOfMemory: Failed to allocate 42 bytes in allocator for exception # exception propagated # cancel dead has started # by this time the fiber should be dead already +# Can change the joinability in safe cases. *** fiber_join_test: done *** *** fiber_stack_test *** # normal-stack fiber not crashed -- GitLab