Commit cfdc1d8c authored by Serge Petrenko, committed by Vladimir Davydov

replication: fix bootstrap failing with ER_READONLY

When the master is just starting up, a replica's JOIN request may arrive
just in time to pass the ER_LOADING check (after the master has fully
recovered) but still fail with ER_READONLY, because box.cfg.read_only
is only read and applied after box_cfg() (its C part) returns.

In this case the joining replica simply exits with an error and doesn't
retry JOIN.

Let's fix that. Make ER_READONLY a recoverable error and let the replica
retry joining after receiving ER_READONLY.
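
A minimal sketch of the retry idea, written in Python purely for
illustration (the actual change lives in the server's C code). All names
here (JoinError, join_with_retry, make_master, max_attempts, delay) are
hypothetical and do not come from the source tree; only the error names
ER_LOADING and ER_READONLY are taken from this message.

```python
import time

# Error names taken from the commit message; represented as plain strings.
ER_LOADING = "ER_LOADING"
ER_READONLY = "ER_READONLY"

# Assumption: errors in this set are treated as transient, so JOIN is retried.
RECOVERABLE = {ER_LOADING, ER_READONLY}


class JoinError(Exception):
    """Hypothetical error raised when the master rejects a JOIN request."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def join_with_retry(send_join, max_attempts=5, delay=0.0):
    """Call send_join() until it succeeds or a non-recoverable error occurs."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send_join()
        except JoinError as err:
            # Give up on permanent errors or when attempts are exhausted.
            if err.code not in RECOVERABLE or attempt == max_attempts:
                raise
            time.sleep(delay)


def make_master(responses):
    """Build a fake master that answers JOIN with the given responses in order."""
    it = iter(responses)

    def send_join():
        reply = next(it)
        if reply != "OK":
            raise JoinError(reply)
        return reply

    return send_join
```

With a fake master that answers ER_LOADING, then ER_READONLY, then "OK",
join_with_retry() keeps retrying and eventually returns the successful
reply instead of exiting on the first ER_READONLY.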

Anonymous nodes relied on ER_READONLY to forbid replication from
anonymous to normal replicas. That check doesn't work anymore.
So introduce explicit checks banning replication from anonymous nodes.
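
Why an explicit check is needed can be sketched as follows (Python, for
illustration only). Once ER_READONLY is retried, a read-only anonymous
node would no longer reject a joining replica for good, so the rejection
has to happen before the read-only check and with a non-recoverable
error. The names JoinRejected, handle_join and the error code
"ER_ANON_SOURCE" are hypothetical placeholders, not the identifiers used
in the actual patch.

```python
class JoinRejected(Exception):
    """Hypothetical rejection carrying an error code and a retry hint."""

    def __init__(self, code, recoverable):
        super().__init__(code)
        self.code = code
        self.recoverable = recoverable


def handle_join(is_anon, is_read_only):
    """Decide how a node answers an incoming JOIN request."""
    # Explicit check first: an anonymous node must never serve as a
    # replication source, so the rejection is permanent.
    if is_anon:
        raise JoinRejected("ER_ANON_SOURCE", recoverable=False)
    # A read-only node may become writable later, so the replica may retry.
    if is_read_only:
        raise JoinRejected("ER_READONLY", recoverable=True)
    return "OK"
```

The ordering matters in this sketch: an anonymous node is also read-only,
so checking read-only first would hand the replica a retryable error and
it would keep retrying indefinitely.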

Note, there were some alternatives to this fix.

First of all, theoretically, we could stop firing ER_LOADING later,
after box_cfg() is complete. This solution wouldn't work because it
would lead to deadlocks: the nodes would be stuck in replicaset_sync(),
because each of them rejects replication with ER_LOADING.

Another solution would be to read the real box.cfg.read_only value
earlier, in order to allow replication right after the node finishes
recovery.
This would also be bad, because we should never let a node become
writeable before box.cfg is finished. Even after local_recovery is
complete, the node should stay read-only until it synchronizes with
other replicas.

That said, neither of the two alternatives fits, so the solution with
retrying JOIN on ER_READONLY was chosen.

Since the bug is fixed, re-enable the test in which it was discovered:
replication-py/init_storage.test.py

Also, remove replication/ddl.test.lua from the fragile list, since this
bug was the only reason for its fragility.

Closes #5337
Closes #6966

NO_DOC=minor bugfix

(cherry picked from commit c1c77782)
parent bdc027a4