Commit f50f0b29 authored by Vladimir Davydov

box: use replicaset.vclock in replica join/subscribe

Again, this is something that was introduced by commit f2bccc18
("Use WAL vclock instead of TX vclock in most places") without any
justification.

TX has its own copy of the current vclock, so there is no need to
request it from the WAL thread. In fact, box_process_vote() already
uses the TX-local vclock, and there is no reason to treat
join/subscribe differently. Moreover, asking WAL is harmful: if there
is a gap at the end of a WAL file, the WAL vclock is slightly ahead of
the TX vclock, so a replica that tries to subscribe never finishes
syncing, see #3830.

Closes #3830
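
The failure mode described in the message can be sketched with a toy model. This is not Tarantool code: every `toy_` name is hypothetical, the LSN values are made up, and the rule "sync is complete once the replica's vclock has reached the vclock the master advertised" is a simplifying assumption about the replica side. It is also assumed that the failed write advanced only the WAL-side vclock.

```c
#include <stdbool.h>
#include <stdint.h>

enum { TOY_VCLOCK_MAX = 4 };

/* Toy vector clock: one LSN per replica id (not Tarantool's struct vclock). */
struct toy_vclock {
	int64_t lsn[TOY_VCLOCK_MAX];
};

/* True if every component of a is >= the matching component of b. */
bool
toy_vclock_ge(const struct toy_vclock *a, const struct toy_vclock *b)
{
	for (int i = 0; i < TOY_VCLOCK_MAX; i++) {
		if (a->lsn[i] < b->lsn[i])
			return false;
	}
	return true;
}

/*
 * Assumed replica-side rule: sync is complete once the replica has
 * applied everything the master advertised when it subscribed.
 */
bool
toy_replica_synced(const struct toy_vclock *replica,
		   const struct toy_vclock *advertised)
{
	return toy_vclock_ge(replica, advertised);
}

/*
 * Replay the scenario. The master (id 1) committed rows up to LSN 10.
 * A failed disk write is then assumed to have left the WAL-side
 * vclock at LSN 11, with no row behind it. The replica can receive
 * at most the rows that really exist, i.e. up to the TX vclock.
 */
bool
toy_sync_finishes(bool advertise_wal_vclock)
{
	struct toy_vclock tx = { .lsn = { 0, 10 } };
	struct toy_vclock wal = { .lsn = { 0, 11 } };
	struct toy_vclock replica = tx; /* every existing row applied */
	const struct toy_vclock *advertised =
		advertise_wal_vclock ? &wal : &tx;
	return toy_replica_synced(&replica, advertised);
}
```

In this model `toy_sync_finishes(true)` is false no matter how long the replica waits, because no row with LSN 11 exists to be relayed, while `toy_sync_finishes(false)` is true. That is the motivation for advertising the TX-local vclock in join/subscribe.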
parent 7439529d
@@ -1513,7 +1513,7 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
 	/* Remember master's vclock after the last request */
 	struct vclock stop_vclock;
-	wal_checkpoint(&stop_vclock, false);
+	vclock_copy(&stop_vclock, &replicaset.vclock);
 	/*
 	 * Register the replica as a WAL consumer so that
@@ -1540,9 +1540,7 @@ box_process_join(struct ev_io *io, struct xrow_header *header)
 	say_info("final data sent.");
 	/* Send end of WAL stream marker */
-	struct vclock current_vclock;
-	wal_checkpoint(&current_vclock, false);
-	xrow_encode_vclock_xc(&row, &current_vclock);
+	xrow_encode_vclock_xc(&row, &replicaset.vclock);
 	row.sync = header->sync;
 	coio_write_xrow(io, &row);
 }
@@ -1608,9 +1606,7 @@ box_process_subscribe(struct ev_io *io, struct xrow_header *header)
 	 * and identify ourselves with our own replica id.
 	 */
 	struct xrow_header row;
-	struct vclock current_vclock;
-	wal_checkpoint(&current_vclock, false);
-	xrow_encode_vclock_xc(&row, &current_vclock);
+	xrow_encode_vclock_xc(&row, &replicaset.vclock);
 	/*
 	 * Identify the message with the replica id of this
 	 * instance, this is the only way for a replica to find
......
@@ -311,6 +311,43 @@ test_run:cmd("stop server replica")
 ---
 - true
 ...
+-- gh-3830: Sync fails if there's a gap at the end of the master's WAL.
+box.error.injection.set('ERRINJ_WAL_WRITE_DISK', true)
+---
+- ok
+...
+box.space.test:replace{123456789}
+---
+- error: Failed to write to disk
+...
+box.error.injection.set('ERRINJ_WAL_WRITE_DISK', false)
+---
+- ok
+...
+test_run:cmd("start server replica")
+---
+- true
+...
+test_run:cmd("switch replica")
+---
+- true
+...
+box.info.status -- running
+---
+- running
+...
+box.info.ro -- false
+---
+- false
+...
+test_run:cmd("switch default")
+---
+- true
+...
+test_run:cmd("stop server replica")
+---
+- true
+...
 test_run:cmd("cleanup server replica")
 ---
 - true
......
@@ -156,6 +156,18 @@ box.info.ro -- false
 box.info.replication[1].upstream.status -- follow
 test_run:grep_log('replica', 'ER_CFG.*')
+test_run:cmd("switch default")
+test_run:cmd("stop server replica")
+-- gh-3830: Sync fails if there's a gap at the end of the master's WAL.
+box.error.injection.set('ERRINJ_WAL_WRITE_DISK', true)
+box.space.test:replace{123456789}
+box.error.injection.set('ERRINJ_WAL_WRITE_DISK', false)
+test_run:cmd("start server replica")
+test_run:cmd("switch replica")
+box.info.status -- running
+box.info.ro -- false
 test_run:cmd("switch default")
 test_run:cmd("stop server replica")
 test_run:cmd("cleanup server replica")
......