    recovery: remove dead code which was chasing ghosts (attempt #2) · 039332da
    Konstantin Osipov authored
    Don't try to rename an .inprogress file in recovery_finalize(): if
    recovery_finalize() can't open it, then xdir_scan() must have
    failed to open it as well, then the file can't end up in the index
    in the first place.
    
    Don't try to re-read a file 3 times (good number, eh), since this
    looks like chasing a ghost. If the original cause for these
    lines of code (lost) is still relevant, then a different fix will
    be necessary.
    
    The original code was apparently trying to cope with a potential
    race condition between local hot standby/replication relay and WAL
    writer, where WAL writer thread first creates a new WAL file, and
    only after that closes the previous one, while local hot standby
    scans the directory, and then reads all files which it finds in
    the directory.
    
    Apparently, the author of the code believed that some data in the
    file may not be visible to the reader process after it was written
    by the writer process, and the reader could skip this data and
    switch to the next file. The switch cannot be postponed, since,
    when reading the file, it's impossible to know whether it's an old
    corrupted file or a new one.
    
    Remove the code, since, according to the Linux man page:
    
    POSIX requires that a read(2) which can be proved to occur after a
    write() has returned returns the new data. Note that not all
    filesystems are POSIX conforming.
    
    POSIX says:
    
    Writes can be serialized with respect to other reads and writes.
    If a read of file data can be proven (by any means) to occur after
    a write() of the data, it must reflect that write(), even if the
    calls are made by different processes. A similar requirement
    applies to multiple write operations to the same file position. This
    is needed to guarantee the propagation of data from write() calls to
    subsequent read() calls. This requirement is particularly
    subsequent read() calls. This requirement is particularly
    significant for networked file systems, where some caching schemes
    violate these semantics.
    
    Note that this is specified in terms of read() and write(). The
    XSI extensions readv() and writev() also obey these semantics. A
    new "high-performance" write analog that did not follow these
    serialization requirements would also be permitted by this
    wording. This volume of IEEE 1003.1-2001 is also silent about any
    effects of application-level caching (such as that done by stdio).
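
The guarantee quoted above can be checked with a small sketch. It is not code from this patch: the helper name read_after_write_visible() and the temporary path are hypothetical, and it assumes a POSIX-conforming local filesystem. It writes through one descriptor and reads through a second, independently opened one, with no fsync() in between.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Write `msg` through one descriptor, then read it back through a
 * second, independently opened descriptor on the same file.
 * Returns 0 if the reader sees exactly what the writer wrote,
 * -1 on any error or mismatch.
 */
static int
read_after_write_visible(const char *path, const char *msg)
{
	assert(path != NULL && msg != NULL);
	char buf[256];
	size_t len = strlen(msg);
	if (len > sizeof(buf))
		return -1;

	int wfd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (wfd < 0)
		return -1;
	int rfd = open(path, O_RDONLY);
	if (rfd < 0) {
		close(wfd);
		return -1;
	}
	int rc = -1;
	/*
	 * write() has returned before read() is issued, so the read
	 * provably occurs after the write and must reflect it.
	 */
	if (write(wfd, msg, len) == (ssize_t)len &&
	    read(rfd, buf, sizeof(buf)) == (ssize_t)len &&
	    memcmp(buf, msg, len) == 0)
		rc = 0;
	close(rfd);
	close(wfd);
	unlink(path);
	return rc;
}
```

The same reasoning applies when the two descriptors belong to different processes, which is the case for the WAL writer and a hot standby reader. The sketch says nothing about filesystems that are not POSIX conforming.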
    
    Reorder the events in wal_opt_rotate(): first close the old file,
    and only then open a new one.
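
A minimal sketch of that ordering, assuming a writer that tracks a single file descriptor. The function rotate_log() and its parameters are hypothetical and are not the actual body of wal_opt_rotate(); the sketch only shows the old file being closed before the next one is created, so a directory scanner never sees a new file while the previous one is still open for writing.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical rotation helper, for illustration only.
 * Closes the current log descriptor first and only then creates
 * the next file. On success *cur_fd refers to the new file and
 * 0 is returned; on failure -1 is returned and *cur_fd is -1.
 */
static int
rotate_log(int *cur_fd, const char *next_path)
{
	assert(cur_fd != NULL && next_path != NULL);

	/* Step 1: finish with the old file. */
	if (*cur_fd >= 0) {
		int rc = close(*cur_fd);
		*cur_fd = -1;
		if (rc != 0)
			return -1;
	}

	/* Step 2: only now create the next file; fail if it already exists. */
	int fd = open(next_path, O_WRONLY | O_CREAT | O_EXCL | O_APPEND, 0644);
	if (fd < 0)
		return -1;
	*cur_fd = fd;
	return 0;
}
```

O_EXCL is a choice made for the sketch so that rotation never silently reuses an existing file; whether the real writer does the same is not stated here.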
    
    Fix a bug in replication that didn't take into account wal_dir_rescan_delay.
    This fix revealed a number of serious races in recovery_follow_f,
    which will be addressed separately.