
Feature: Failover: automatic master election in replicasets of 2 replicas

Picodata should provide high availability with replication_factor=2, which is not currently possible with Tarantool's built-in leader election. This can be achieved with asynchronous replication, used either for the whole lifetime of the cluster or by downgrading a replicaset's replication from synchronous to asynchronous when one of its replicas becomes unavailable.
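One way to picture the downgrade path is a small watchdog fiber that relaxes the synchronous quorum when the peer disappears. The sketch below assumes Tarantool's standard `box.cfg`/`box.info` Lua API; the helper name `peer_is_alive` and the 1-second poll interval are illustrative, not Picodata's actual implementation.

```lua
-- A minimal sketch of the sync -> async downgrade idea, assuming
-- Tarantool's standard box.cfg / box.info Lua API. Helper names and
-- the poll interval are hypothetical, not Picodata's actual code.

local fiber = require('fiber')

-- Check whether the other replica of this 2-instance replicaset is
-- still connected (in either direction of the full-mesh replication).
local function peer_is_alive()
    for _, r in pairs(box.info.replication) do
        if r.id ~= box.info.id then
            local up, down = r.upstream, r.downstream
            if (up and up.status == 'follow')
                or (down and down.status == 'follow') then
                return true
            end
        end
    end
    return false
end

fiber.create(function()
    while true do
        if peer_is_alive() then
            -- Both replicas are up: require both to confirm sync writes.
            box.cfg{ replication_synchro_quorum = 2 }
        else
            -- Peer is gone: a quorum of 1 is confirmed by the master
            -- alone, which effectively downgrades replication to async.
            box.cfg{ replication_synchro_quorum = 1 }
        end
        fiber.sleep(1)
    end
end)
```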

DoD

  • Given a replicaset of 2 replicas, exactly one of the instances should be writable at any moment. Which instance is writable is determined and set automatically.
  • If a writable instance becomes unavailable, then another instance of the replicaset should become writable in less than 15 seconds.
  • Fencing. When an instance becomes unable to receive the latest state of the global config from the Raft cluster, it should become read-only before the other instance becomes writable (a self-fencing sketch follows this list). This is required to minimize lost writes and async replication conflicts.
  • When one of the two instances of a replicaset goes down and then comes back up, the replicaset's write availability should not be interrupted for more than 15 seconds.
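The fencing requirement can be illustrated with a self-fencing loop: an instance that has not seen a fresh global config for too long turns itself read-only. This is a minimal sketch assuming the standard `box.cfg` API; `raft_config_is_fresh`, `last_config_update`, and the timeout value are hypothetical placeholders for whatever mechanism Picodata uses to track Raft config delivery. Note that the self-fencing timeout must be shorter than the time after which the other instance is promoted, so the old master is read-only before the new one accepts writes.

```lua
-- Hypothetical self-fencing loop, assuming the standard box.cfg API.
-- `raft_config_is_fresh` stands in for whatever check Picodata uses
-- to know it still receives the global config from the Raft cluster.

local fiber = require('fiber')

local FENCING_TIMEOUT = 10  -- seconds; must be below the 15s failover bound

-- Assumed to be refreshed elsewhere each time a config update arrives.
local last_config_update = fiber.clock()

local function raft_config_is_fresh()
    return fiber.clock() - last_config_update < FENCING_TIMEOUT
end

fiber.create(function()
    while true do
        if not raft_config_is_fresh() and not box.cfg.read_only then
            -- Self-fence: become read-only *before* the other instance
            -- can be promoted, minimizing lost writes and async
            -- replication conflicts.
            box.cfg{ read_only = true }
        end
        fiber.sleep(0.5)
    end
end)
```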

Notes

  • It is a client's responsibility to auto-reconnect to the active instance.
  • It is OK for the implementation to require a cluster of more than one replicaset to make this feature work.

Motivation

Most clients don't want, or don't have the resources for, 3 replicas per replicaset. The preferred option in most cases is the classic topology of 2 replicas per replicaset: active + standby.
