src/fiber.h · 039332da32b49fc29db3b7a25fa2c5503554b4f8 · core / tarantool

10 years ago

recovery: remove dead code which was chasing ghosts (attempt #2) · 039332da

Don't try to rename an .inprogress file in recovery_finalize(): if
recovery_finalize() can't open it, then xdir_scan() must have
failed to open it as well, so the file can't have ended up in the
index in the first place.

Don't try to re-read a file 3 times (good number, eh), since this
looks like chasing a ghost. If the original cause for these
lines of code (lost) is still relevant, then a different fix will
be necessary.

The original code was apparently trying to cope with a potential
race condition between local hot standby/replication relay and WAL
writer, where the WAL writer thread first creates a new WAL file, and
only after that closes the previous one, while local hot standby
scans the directory, and then reads all files which it finds in
the directory.

Apparently, the author of the code believed that some data in the
file may not be visible to the reader process after it was written
by the writer process, and the reader could skip this data and
switch to the next file. The switch cannot be postponed, since,
when reading the file, it's impossible to know whether it's an old
corrupted file or a new one.

Remove the code, since, according to the Linux man page:

POSIX requires that a read(2) which can be proved to occur after a
write() has returned returns the new data. Note that not all
filesystems are POSIX conforming.

POSIX says:

Writes can be serialized with respect to other reads and writes.
If a read of file data can be proven (by any means) to occur after
a write() of the data, it must reflect that write(), even if the
calls are made by different processes. A similar requirement
applies to multiple write operations to the same file position. This
is needed to guarantee the propagation of data from write() calls to
subsequent read() calls. This requirement is particularly
significant for networked file systems, where some caching schemes
violate these semantics.

Note that this is specified in terms of read() and write(). The
XSI extensions readv() and writev() also obey these semantics. A
new "high-performance" write analog that did not follow these
serialization requirements would also be permitted by this
wording. This volume of IEEE 1003.1-2001 is also silent about any
effects of application-level caching (such as that done by stdio).
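
The guarantee quoted above can be checked with a small experiment. The
following is a minimal sketch, not Tarantool code: the function name
check_read_after_write() and the pipe-based handshake are assumptions made
for illustration. The parent writes to a file, and only after write() has
returned does it signal the child through a pipe, so the child's read() is
provably ordered after the write.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Tarantool): the parent writes `msg`
 * to `path`, then wakes a child process through a pipe. The child reads
 * the file only after the wake-up, i.e. after write() has returned.
 * Returns 0 if the child saw exactly the written bytes, -1 otherwise.
 */
int
check_read_after_write(const char *path, const char *msg)
{
	assert(path != NULL && msg != NULL);
	char buf[256];
	size_t len = strlen(msg);
	int sync_pipe[2];
	if (len == 0 || len > sizeof(buf))
		return -1;
	int wfd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (wfd < 0)
		return -1;
	if (pipe(sync_pipe) != 0) {
		close(wfd);
		return -1;
	}
	pid_t pid = fork();
	if (pid < 0) {
		close(sync_pipe[0]);
		close(sync_pipe[1]);
		close(wfd);
		return -1;
	}
	if (pid == 0) {
		/* Reader process: block until the writer says it is done. */
		char token;
		close(sync_pipe[1]);
		close(wfd);
		if (read(sync_pipe[0], &token, 1) != 1)
			_exit(2);
		int rfd = open(path, O_RDONLY);
		if (rfd < 0)
			_exit(3);
		ssize_t n = read(rfd, buf, sizeof(buf));
		int same = n == (ssize_t)len && memcmp(buf, msg, len) == 0;
		_exit(same ? 0 : 1);
	}
	/* Writer process: signal the reader only after write() returned. */
	close(sync_pipe[0]);
	int ok = write(wfd, msg, len) == (ssize_t)len;
	if (ok)
		ok = write(sync_pipe[1], "x", 1) == 1;
	close(sync_pipe[1]); /* on failure the reader sees EOF and exits */
	close(wfd);
	int status = 0;
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return ok && WIFEXITED(status) && WEXITSTATUS(status) == 0 ? 0 : -1;
}
```

On a POSIX-conforming local filesystem the call is expected to return 0.
As the man page warns, this says nothing about filesystems that do not
conform, and nothing about data still sitting in a stdio buffer.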

Reorder the events in wal_opt_rotate(): first close the old file,
and only then open a new one.
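
The new ordering can be sketched as follows. This is an illustration
only: struct wal_sketch and wal_sketch_rotate() are hypothetical names,
not the actual wal_opt_rotate() code, which this page does not show. The
point is that the old descriptor is closed before the next file is
created, so a directory scanner never finds the new file while the
previous one is still open for writing.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for the WAL writer state, not Tarantool's struct. */
struct wal_sketch {
	int fd; /* descriptor of the current WAL file, or -1 if none */
};

/*
 * Rotate to `next_path`: close the old file first, and only then
 * create and open the new one.
 * Returns 0 on success, -1 on failure (the writer then has no open file).
 */
int
wal_sketch_rotate(struct wal_sketch *w, const char *next_path)
{
	assert(w != NULL && next_path != NULL);
	if (w->fd >= 0) {
		/* Step 1: the previous file is complete once close() returns. */
		int rc = close(w->fd);
		w->fd = -1;
		if (rc != 0)
			return -1;
	}
	/* Step 2: only now does the next file appear in the directory. */
	w->fd = open(next_path, O_WRONLY | O_CREAT | O_APPEND, 0644);
	return w->fd >= 0 ? 0 : -1;
}
```

A caller would start with fd set to -1 and call wal_sketch_rotate() each
time a new file is needed. The sketch ignores fsync() and the .inprogress
naming scheme, both of which a real writer would have to handle.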

Fix a bug in replication that didn't take into account wal_dir_rescan_delay.
This fix revealed a number of serious races in recovery_follow_f,
which will be addressed separately.


Konstantin Osipov authored 10 years ago


fiber.h 9.88 KiB