Commit 421a0968 authored by Serge Petrenko, committed by Kirill Yukhin

raft: prevent disruptive elections

Raft cluster members treat the leader as alive only as long as they
keep a direct connection to it. This means that as soon as one of
the followers loses its connection to the leader, it starts new
elections, bumping the term and forcing the existing leader to step down.

Moreover, when a server is partitioned from the rest of the cluster, it
keeps starting new elections, bumping the term and voting for itself
each time. It never gets any votes, since it is partitioned, but its
term grows to a large number. Once such a server reconnects to the rest
of the cluster, it makes everyone read-only, because the working
majority will most likely have a smaller term than this server.
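
The term inflation described above can be illustrated with a toy model.
This is only a sketch, not Tarantool code: the `Node` class and its
methods are hypothetical, and the only rule modelled is the generic Raft
one that a node seeing a higher term adopts it and stops being leader.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical, minimal Raft node: only the term and the role."""
    term: int
    is_leader: bool = False

    def election_timeout(self) -> None:
        # The node sees no leader, so it bumps the term and votes for itself.
        self.term += 1

    def on_message(self, sender_term: int) -> None:
        # Generic Raft rule: a higher term makes the receiver adopt it
        # and give up leadership.
        if sender_term > self.term:
            self.term = sender_term
            self.is_leader = False


leader = Node(term=5, is_leader=True)
partitioned = Node(term=5)

# While partitioned, the node times out again and again without winning.
for _ in range(50):
    partitioned.election_timeout()

# After the partition heals, its first message disrupts the working leader.
leader.on_message(partitioned.term)
```

In this model the healthy leader loses its role purely because of the
inflated term, even though the partitioned node never won a single vote.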

Such elections, performed while there is a working leader, like in the
first example, or performed when there's no chance of winning, like in
the second example, are considered disruptive.

Diego Ongaro's thesis (https://github.com/ongardie/dissertation)
proposes the following solution to the problem of disruptive servers.

It's an additional election phase, called Pre-Vote.
The basic idea is as follows: only bump the term and start elections
once the quorum of peers reply OK to a Pre-Vote message.
The peers would reply OK when:
  - the candidate's log is sufficiently up to date
  - the peer doesn't see the leader

The downside of this approach is an additional request travelling to and
from the followers.

Tarantool replication architecture allows us to achieve the same goals
without issuing Pre-Vote requests. Here's what's done in this patch to
do so:
    1) Track the number of live peer connections, and only start
       elections when there is a quorum of connected peers.
    2) Make every node broadcast whether it sees the leader of the
       current term or not. Every candidate collects info about who
       sees the leader in the current term. When at least one node sees
       the leader, the candidate doesn't start new elections.
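
The two conditions above can be sketched as a single check. This is an
illustration only, not the patch's code: the function and parameter
names are hypothetical, and the quorum is passed in explicitly because
the message does not say how it is counted.

```python
def can_start_election(live_connections, quorum, leader_seen_flags):
    """Decide whether a candidate may bump the term and start elections.

    live_connections  -- number of peers currently connected (condition 1)
    quorum            -- election quorum, counted the same way
    leader_seen_flags -- what each node broadcast about seeing the leader
                         of the current term (condition 2)
    """
    has_quorum = live_connections >= quorum
    # One node that still sees the leader is enough to hold elections back.
    leader_is_seen = any(leader_seen_flags)
    return has_quorum and not leader_is_seen


# A partitioned node has no quorum of connections: no term bump.
partitioned = can_start_election(0, 2, [])
# A follower lost the leader, but another node still sees it: no elections.
leader_alive = can_start_election(2, 2, [True, False])
# Nobody sees the leader and a quorum is connected: elections may start.
leader_gone = can_start_election(2, 2, [False, False])
```

Both disruptive cases from the beginning of the message are rejected
here without any extra request round trip.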

Closes #6654

@TarantoolBot document
Title: elections: new `box.info` field and binary protocol changes

* `box.info.election` got a new field: `leader_idle`.
  When elections are enabled, it shows time in seconds since the last
  interaction with the known leader.

* `IPROTO_RAFT` request got 2 new fields:
  `IPROTO_RAFT_LEADER_ID = 0x04` - uint - the id of the current leader
  (as seen by the node which issued the request).
  `IPROTO_RAFT_IS_LEADER_SEEN = 0x05` - bool - whether the node has a
  direct connection to the leader.
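
For illustration, the two new keys can be pictured as entries of the
request body map. Only the key values 0x04 and 0x05 and their types
come from the text above; the helper names are hypothetical, the other
keys of the request are omitted, and the real encoding on the wire is
not shown.

```python
IPROTO_RAFT_LEADER_ID = 0x04       # uint: leader id as seen by the sender
IPROTO_RAFT_IS_LEADER_SEEN = 0x05  # bool: sender is connected to the leader


def new_raft_fields(leader_id, is_leader_seen):
    """Build only the two new entries of a request body (hypothetical)."""
    return {
        IPROTO_RAFT_LEADER_ID: leader_id,
        IPROTO_RAFT_IS_LEADER_SEEN: is_leader_seen,
    }


def sender_sees_leader(body):
    """Read the flag back; a missing key is treated as 'not seen' here."""
    return bool(body.get(IPROTO_RAFT_IS_LEADER_SEEN, False))


body = new_raft_fields(leader_id=1, is_leader_seen=True)
```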
parent ba19ceaa
Showing
with 333 additions and 15 deletions