An error occurred while fetching folder content.
Serge Petrenko
authored
A successfully fetched remote instance ballot isn't updated during bootstrap procedure. This leads to a case when different instances choose different masters as their bootstrap leaders. Imagine such a situation. You start instance A without replication set up. Instance A successfully bootstraps. You also have instances B and C both with replication set up to {A, B, C} and replication_connect_quorum set to 3 You first start instance B. It doesn't proceed to choosing a leader until one of the events happens: either the replication_connect_timeout runs out, or instance C is up and starts listening on its port. B has established connection to A and fetched its ballot, with some vclock, say, {1: 1}. B retries connection to C every replication_timeout seconds. Then you start instance C. Instance C succeeds in connecting to A and B right away and bootstraps from instance A. Instance A registers C in its _cluster table. This registration is replicated to instance C. Meanwhile, instance C is trying to sync with quorum instances (which is 3), and stays in orphan mode. Now replication_timeout on instance B finally runs out. It retries a previously unsuccessful connection to C and succeeds. C sends its ballot to B with vclock = {1: 2, 2:0} (in our example), since it has already incremented it after _cluster registration. B sees that C has a greater vclock than A, and chooses to bootstrap from C instead of A. C is orphan and rejects B's attempt to join. B dies. To fix such ungentlemanlike behaviour of C, we should at least include loading status in ballot and prefer fully bootstrapped instances to the ones still syncing with other replicas. We also need to use a separate flag instead of ballot's already existent is_ro, since we still want to prefer loading instances over the ones explicitly configured to be read-only. Closes #4527
Name | Last commit | Last update |
---|