Commit b5811f15 authored by Serge Petrenko's avatar Serge Petrenko Committed by Kirill Yukhin

replication: relax split-brain checks after DEMOTE

Our txn_limbo_is_replica_outdated check works correctly only when there
is a stream of PROMOTE requests. Only the author of the latest PROMOTE
is writable and may issue transactions, whether synchronous or
asynchronous.

So txn_limbo_is_replica_outdated assumes that everyone but the node with
the greatest PROMOTE/DEMOTE term is outdated.

This isn't true for DEMOTE requests. There is only one server which
issues the DEMOTE request, but once it's written, it's fine to accept
asynchronous transactions from everyone.

Now the check is too strict. Every time there is an asynchronous
transaction from someone who isn't the author of the latest PROMOTE or
DEMOTE, replication is broken with ER_SPLIT_BRAIN.

Let's relax it: when the limbo owner is 0, it's fine to accept
asynchronous transactions from everyone, no matter the term of their
latest PROMOTE or DEMOTE.
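The relaxed rule can be sketched as follows. This is a simplified,
hypothetical model for illustration only: the struct name `limbo`, the
fields `owner_id` and `promote_term`, and both function names are
assumptions, not the actual Tarantool definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified limbo state; the real code keeps this in
 * struct txn_limbo with many more fields. */
struct limbo {
	uint32_t owner_id;     /* 0 after a DEMOTE: no synchronous tx owner */
	uint64_t promote_term; /* greatest known PROMOTE/DEMOTE term */
};

/* Before the patch: a replica whose latest known PROMOTE/DEMOTE term
 * is older than the greatest one is always considered outdated. */
static bool
is_replica_outdated_old(const struct limbo *l, uint64_t replica_term)
{
	return replica_term < l->promote_term;
}

/* After the patch: once the limbo owner is 0 (a DEMOTE was written),
 * asynchronous transactions are accepted from everyone, regardless of
 * the term of their latest PROMOTE/DEMOTE. */
static bool
is_replica_outdated_new(const struct limbo *l, uint64_t replica_term)
{
	if (l->owner_id == 0)
		return false;
	return replica_term < l->promote_term;
}
```

With `owner_id == 0` and `promote_term == 5`, the old check rejects an
async transaction from a replica whose last seen term is 3, while the
relaxed check accepts it; with a nonzero owner both checks behave the
same.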

This means that after a DEMOTE we will now miss one case of true
split-brain: when the old leader continues writing data in an obsolete
term, and the new leader first issues PROMOTE and then DEMOTE.

This is a tradeoff for making async master-master work after DEMOTE.

The completely correct fix would be to write the term the transaction
was written in with each transaction and replace
txn_limbo_is_replica_outdated with txn_limbo_is_request_outdated, so
that we decide whether to filter the request judging by the term it was
applied in, not by the term we saw in some past PROMOTE from the node.
This fix seems too costly though, given that we only miss one case of
split-brain at the moment when the user enables master-master
replication (by writing a DEMOTE). And in master-master there is no
such thing as a split-brain.
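The per-request check described above could look roughly like this.
It is a sketch of the idea only: the `request` struct, its
`origin_term` field, and the function name are hypothetical, since the
source does not define them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical replicated request that carries the term it was
 * written in (a field that does not exist today and would have to be
 * added to the replication protocol). */
struct request {
	uint64_t origin_term; /* term the transaction was applied in */
};

/* Per-request filtering: a request written in an obsolete term is
 * split-brain evidence no matter which replica relayed it, so the
 * decision no longer depends on the sender's last seen PROMOTE. */
static bool
is_request_outdated(uint64_t greatest_promote_term,
		    const struct request *r)
{
	return r->origin_term < greatest_promote_term;
}
```

This would catch the old leader writing in an obsolete term even after
a DEMOTE, at the cost of shipping one extra term with every
transaction.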

Follow-up #5295
Closes #7286

NO_DOC=internal change
parent 58f0e23d