Commit 3daeb759 authored by Yaroslav Dynnikov, committed by Georgy Moshkin

fix: no join requests retrying

Join requests should be made without timeout restrictions. Otherwise
it's impossible to tell a retried request from an instance_id collision.
Retried requests never succeed and return the "already joined" error.
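
To make the reasoning above concrete, here is a minimal simulation sketch. It does not use the project's code: `Leader`, `JoinError`, and `join_with_timeout_and_retry` are hypothetical names invented for illustration, and the "timeout" is modelled simply by discarding the first response. It shows that a retry after a lost response produces the same error as a genuine `instance_id` collision.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the leader's view of the topology.
struct Leader {
    joined: HashSet<String>,
}

#[derive(Debug, PartialEq)]
enum JoinError {
    AlreadyJoined,
}

impl Leader {
    fn new() -> Self {
        Leader { joined: HashSet::new() }
    }

    /// Registers `instance_id`; a second request with the same id is rejected.
    fn join(&mut self, instance_id: &str) -> Result<(), JoinError> {
        if self.joined.insert(instance_id.to_string()) {
            Ok(())
        } else {
            Err(JoinError::AlreadyJoined)
        }
    }
}

/// Models a client whose first request is processed by the leader,
/// but whose response is lost because the client gave up waiting.
fn join_with_timeout_and_retry(leader: &mut Leader, instance_id: &str) -> Result<(), JoinError> {
    // First attempt: the leader applies the join, the client never sees the result.
    let _lost_response = leader.join(instance_id);
    // Retry: the leader already knows this id.
    leader.join(instance_id)
}

fn main() {
    // Case 1: one instance, timed-out request followed by a retry.
    let mut leader = Leader::new();
    let retried = join_with_timeout_and_retry(&mut leader, "i1");

    // Case 2: two different instances that were given the same id.
    let mut other = Leader::new();
    other.join("i2").unwrap();
    let collision = other.join("i2");

    // The client cannot tell the two situations apart.
    assert_eq!(retried, collision);
    println!("retried: {:?}, collision: {:?}", retried, collision);
}
```

Under these assumptions both cases end in `Err(AlreadyJoined)`, which is why the commit waits for the first response instead of retrying on a timeout.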
parent cb488fe6
1 merge request: !179 test: deploy a cluster of 60 instances
......@@ -529,8 +529,24 @@ fn start_join(args: &args::Run, leader_address: String) {
         failure_domains: args.failure_domains(),
     };
+    // Arch memo.
+    // - There must be no timeouts. Retrying may lead to flooding the
+    //   topology with phantom instances. No worry, specifying a
+    //   particular `instance_id` for every instance protects from that
+    //   flood.
+    // - It's fine to retry "connection refused" errors.
+    // - TODO renew leader_address if the current one says it's not a
+    //   leader.
     let fn_name = stringify_cfunc!(traft::node::raft_join);
-    let resp: traft::JoinResponse = tarantool::net_box_call_retry(&leader_address, fn_name, &req);
+    let resp: traft::JoinResponse = loop {
+        match tarantool::net_box_call_or_log(&leader_address, fn_name, &req, Duration::MAX) {
+            Some(resp) => break resp,
+            None => {
+                fiber::sleep(Duration::from_secs(1));
+                continue;
+            }
+        }
+    };
     picolib_setup(args);
     assert!(tarantool::cfg().is_none());
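
The memo's distinction between retryable and non-retryable failures can be sketched independently of Tarantool. The example below is an assumption-laden illustration, not the project's implementation: `retry_while_refused` is a hypothetical helper, errors are modelled with `std::io::Error`, and `std::thread::sleep` stands in for `fiber::sleep`. It retries only on "connection refused" and hands every other outcome back to the caller.

```rust
use std::io::{Error, ErrorKind};
use std::thread;
use std::time::Duration;

/// Calls `attempt` until it returns anything other than "connection refused".
/// Successful results and all other errors are returned unchanged.
fn retry_while_refused<T>(
    mut attempt: impl FnMut() -> Result<T, Error>,
    pause: Duration,
) -> Result<T, Error> {
    loop {
        match attempt() {
            // The request never reached the peer, so repeating it is safe.
            Err(e) if e.kind() == ErrorKind::ConnectionRefused => thread::sleep(pause),
            // Success, or an error where the request may have been applied.
            other => return other,
        }
    }
}

fn main() {
    let mut calls = 0;
    let result = retry_while_refused(
        || {
            calls += 1;
            if calls < 3 {
                // Simulate a leader that is not listening yet.
                Err(Error::from(ErrorKind::ConnectionRefused))
            } else {
                Ok("joined")
            }
        },
        Duration::from_millis(1),
    );
    assert_eq!(result.unwrap(), "joined");
    assert_eq!(calls, 3);
}
```

A refused connection means the leader never saw the request, so a repeat cannot register the instance twice; a timeout gives no such guarantee.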
......
......@@ -197,6 +197,7 @@ where
     tuple.into_struct::<((Res,),)>().map(|res| res.0 .0)
 }
+#[allow(dead_code)]
 pub fn net_box_call_retry<Args, Res, Addr>(address: Addr, fn_name: &str, args: &Args) -> Res
 where
     Args: AsTuple,
......