replication: fix bootstrap failing with ER_READONLY
When the master is just starting up, it's possible for a replica's JOIN request to arrive right in time to bypass the ER_LOADING check (after the master is fully recovered) but still fail due to ER_READONLY: box.cfg.read_only is only read and set after box_cfg() (its C part) returns.

In this case the joining replica simply exits with an error and doesn't retry JOIN.

Let's fix that. Make ER_READONLY a recoverable error and let the replica retry joining after receiving ER_READONLY.

Anonymous nodes relied on ER_READONLY to forbid replication from anonymous to normal replicas. That check doesn't work anymore, so introduce explicit checks banning replication from anonymous nodes.

Note, there were some alternatives to this fix.

First of all, theoretically, we could stop firing ER_LOADING later, only after box_cfg() is complete. This solution wouldn't work because it would lead to deadlocks: the nodes would be stuck in replicaset_sync(), because each of them rejects replication with ER_LOADING.

Another solution would be to read the real box.cfg.read_only value earlier, in order to allow replication right after the node finishes recovery. This would also be bad, because we should never let a node become writeable before box.cfg is finished. Even after local_recovery is complete, the node should stay read-only until it synchronizes with other replicas.

That said, neither of the two alternatives fits, so the solution with retrying JOIN on ER_READONLY was chosen.

Since the bug is fixed, re-enable the test in which it was discovered: replication-py/init_storage.test.py.

Also, remove replication/ddl.test.lua from the fragile list, since this bug was the only reason for its fragility.

Closes #5337
Closes #6966

NO_DOC=minor bugfix

(cherry picked from commit c1c77782)
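The retry behaviour described in the message can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the code from src/box/applier.cc or src/box/box.cc: `ToyMaster`, `join_with_retry`, `JoinError`, and the `TOY_ANON_REJECTED` code are hypothetical names invented for illustration, and the pause a real replica would make between attempts is omitted.

```python
# Toy model only: every name here is hypothetical, not Tarantool's API.
from dataclasses import dataclass

# Errors after which the joining replica retries instead of exiting.
# The commit adds ER_READONLY to this category.
RECOVERABLE = {"ER_LOADING", "ER_READONLY"}


class JoinError(Exception):
    """Raised by the toy master when it rejects a JOIN request."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


@dataclass
class ToyMaster:
    # Number of JOIN requests answered with ER_LOADING (still recovering).
    polls_until_loaded: int = 1
    # Requests up to this count, once recovery is over, get ER_READONLY
    # (box.cfg.read_only has not been applied yet).
    polls_until_writable: int = 2
    is_anon: bool = False
    polls: int = 0

    def handle_join(self):
        self.polls += 1
        # Explicit check: an anonymous node never serves JOIN.
        # The error code is a placeholder, not the real one.
        if self.is_anon:
            raise JoinError("TOY_ANON_REJECTED")
        if self.polls <= self.polls_until_loaded:
            raise JoinError("ER_LOADING")
        if self.polls <= self.polls_until_writable:
            raise JoinError("ER_READONLY")
        return "joined"


def join_with_retry(master, max_attempts=10):
    """Retry JOIN on recoverable errors; re-raise anything else."""
    seen = []
    for _ in range(max_attempts):
        try:
            return master.handle_join(), seen
        except JoinError as err:
            if err.code not in RECOVERABLE:
                raise
            seen.append(err.code)
    raise TimeoutError("JOIN did not succeed within max_attempts")


if __name__ == "__main__":
    print(join_with_retry(ToyMaster()))
```

In this model the replica sees ER_LOADING, then ER_READONLY, and joins on the third attempt. A master flagged as anonymous rejects the request with a non-recoverable error, so the explicit check does not depend on ER_READONLY being fatal.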
Changed files:
- changelogs/unreleased/gh-6966-readonly-bootstrap.md (3 additions, 0 deletions)
- src/box/applier.cc (3 additions, 1 deletion)
- src/box/box.cc (11 additions, 0 deletions)
- test/replication-luatest/gh_6966_readonly_bootstrap_test.lua (60 additions, 0 deletions)
- test/replication-py/cluster.test.py (8 additions, 7 deletions)
- test/replication-py/init_storage.skipcond (0 additions, 2 deletions)
- test/replication/gh-5613-bootstrap-prefer-booted.result (7 additions, 4 deletions)
- test/replication/gh-5613-bootstrap-prefer-booted.test.lua (3 additions, 3 deletions)
- test/replication/suite.ini (0 additions, 3 deletions)