limbo: assign LSN to entries right after WAL write
An LSN used to be assigned to a limbo entry either before a WAL write, when the txn came from an applier, or one event loop iteration after the WAL write for the instance's own transactions. The latter meant that the WAL write callback only did fiber_wakeup() on a fiber which was supposed to assign the LSN later.

That made possible a bug when a remote PROMOTE was received during a local txn WAL write. They were written to WAL in that order, but the PROMOTE WAL write was handled *before* the txn's WAL write. That led to a crash, because PROMOTE processing was not prepared for the limbo having entries without an LSN.

The reason it happened is that finished WAL batches, even when sent to WAL far apart in time, can still be processed in the TX thread all at once, without yields. They keep stacking up in an inter-thread queue until the TX thread picks them up. If the TX thread is slow enough, the WAL batches form bigger "batches of batches" in this WAL->TX queue. When that happened for a txn + PROMOTE, the txn's WAL write trigger only called fiber_wakeup() without assigning the LSN. The PROMOTE WAL write trigger was then applied right away without yields, did not meet its assumptions, and crashed.

The patch makes the LSN be assigned to a limbo entry right in the WAL write trigger. The cost of this is storing the limbo entry as a member of struct txn.

The patch only fixes the issue for a PROMOTE covering the older transaction. The non-covering case still fails and is the subject of another commit.

A side effect which allowed to make the patch a bit simpler: an LSN is now assigned to all limbo entries, even the non-synchro ones.

The alternative solution was to create an on-WAL-write trigger for each synchro transaction and store the limbo entry in its data field. But that is more complicated. I decided it is time to add the entry to the txn.
For non-synchro transactions it shouldn't add any cost, because the limbo entry is only touched for newly allocated txns (which eventually stop being allocated and are only reused from txn_cache), for synchro transactions on their normal path, and for all transactions on the failure path.

Another alternative solution was to make the limbo's latch a read-write latch. "Readers" would be new transactions until they finish their WAL write; "writers" would be PROMOTE, DEMOTE, and probably ROLLBACK. That way a PROMOTE wouldn't start until all limbo entries have LSNs. But that looks like overkill, at least for this issue.

Part of #6842

NO_DOC=Bugfix
NO_CHANGELOG=To be added later
- src/box/box.cc: 11 additions, 0 deletions
- src/box/txn.c: 33 additions, 69 deletions
- src/box/txn.h: 4 additions, 0 deletions
- src/box/txn_limbo.c: 11 additions, 4 deletions
- src/lib/core/errinj.h: 1 addition, 0 deletions
- test/box/errinj.result: 1 addition, 0 deletions
- test/replication-luatest/gh_6842_qsync_applier_order_test.lua: 112 additions, 0 deletions