    xlog: fix fallocate vs read race · b9db91e1
    Vladimir Davydov authored
    posix_fallocate(), which is used to preallocate disk space for WAL
    files, increases the file size and fills the allocated space with
    zeros. The problem is that a WAL file may be read by a relay thread
    while it is being written to. We try to handle the zeroed space in
    xlog_cursor (see xlog_cursor_next_tx()), but this turns out to be
    insufficient: transactions are not written atomically, so a reader may
    read a transaction when the writer has written only half of it.
    Without fallocate, the reader would stop at EOF and wait until the rest
    of the transaction is written, but with fallocate it reads zeros
    instead and concludes that the xlog file is corrupted even though it
    is not.
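    A minimal sketch of the race. It assumes a hypothetical record layout
    (magic, payload length, checksum, payload) and a toy checksum in place
    of a real CRC; it is not the actual xlog format or xlog_cursor code.
    The reader can only see bytes up to the file size: if the visible data
    ends mid-record it can wait for the writer, but if the tail is
    zero-filled preallocated space it sees a "complete" record whose
    checksum does not match.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, NOT Tarantool's xlog format:
 * [magic:4][payload length:4][checksum:4][payload] */
#define REC_MAGIC 0x52454331u /* arbitrary non-zero marker */
#define REC_HDR_SIZE 12u

enum rec_status {
	REC_OK,        /* complete record with a matching checksum */
	REC_NEED_MORE, /* data ends mid-record: wait for the writer */
	REC_END_ZEROS, /* zero magic: zero-filled space after the last record */
	REC_CORRUPTED, /* bad magic, or complete record with a bad checksum */
};

/* Toy checksum standing in for a real CRC. */
static uint32_t toy_checksum(const unsigned char *p, uint32_t len)
{
	uint32_t sum = 0;
	for (uint32_t i = 0; i < len; i++)
		sum = sum * 31 + p[i];
	return sum;
}

/* Parses one record from buf; avail is how many bytes the reader can
 * see, i.e. everything up to the current file size. */
enum rec_status parse_record(const unsigned char *buf, size_t avail)
{
	uint32_t magic, len, sum;
	if (avail < REC_HDR_SIZE)
		return REC_NEED_MORE;
	memcpy(&magic, buf, sizeof(magic));
	if (magic == 0)
		return REC_END_ZEROS;
	if (magic != REC_MAGIC)
		return REC_CORRUPTED;
	memcpy(&len, buf + 4, sizeof(len));
	memcpy(&sum, buf + 8, sizeof(sum));
	if (avail - REC_HDR_SIZE < len)
		return REC_NEED_MORE; /* torn write, but no zeros visible yet */
	/* With preallocation, zeros fill the unwritten payload bytes here. */
	return toy_checksum(buf + REC_HDR_SIZE, len) == sum ? REC_OK
							     : REC_CORRUPTED;
}
```

    With a header and 2 of 5 payload bytes written, parse_record() returns
    REC_NEED_MORE when it sees exactly those 14 bytes (the file ends
    there), but REC_CORRUPTED when the same 14 bytes are followed by
    zero-filled preallocated space.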
    
    Fix this issue by using fallocate() with the FALLOC_FL_KEEP_SIZE flag
    instead of posix_fallocate(). With this flag, fallocate() does not
    change the file size; it only allocates disk space beyond EOF.
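    A minimal Linux-only sketch comparing the two calls, assuming glibc
    (which declares fallocate() and FALLOC_FL_KEEP_SIZE under _GNU_SOURCE)
    and a filesystem that supports the flag. The helper names
    prealloc_keep_size() and read_after_prealloc() are hypothetical and
    not taken from Tarantool. A "writer" writes a few bytes, preallocates
    4096 bytes, and a "reader" then calls pread() just past the written
    data.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#define PREALLOC_LEN 4096

/* Hypothetical helper: reserve disk blocks for [offset, offset + len)
 * without changing st_size. Returns 0 or an errno value, mirroring the
 * return convention of posix_fallocate(). */
int prealloc_keep_size(int fd, off_t offset, off_t len)
{
	return fallocate(fd, FALLOC_FL_KEEP_SIZE, offset, len) == 0 ? 0 : errno;
}

/* Writes part of a "transaction", preallocates space (with KEEP_SIZE if
 * keep_size is non-zero, else with posix_fallocate()), stores st_size in
 * *size_out and returns what pread() yields just past the written bytes:
 * 0 means EOF, > 0 means preallocated zeros were read. Returns -2 if the
 * filesystem reports the operation as unsupported, -1 on other errors. */
ssize_t read_after_prealloc(int keep_size, off_t *size_out)
{
	static const char half_tx[] = "half-a-tx"; /* 10 bytes incl. NUL */
	char path[] = "/tmp/prealloc-demo-XXXXXX";
	char buf[64];
	struct stat st;
	ssize_t result = -1;
	int fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path); /* the file is removed once fd is closed */
	if (write(fd, half_tx, sizeof(half_tx)) == (ssize_t)sizeof(half_tx)) {
		int rc = keep_size ? prealloc_keep_size(fd, 0, PREALLOC_LEN)
				   : posix_fallocate(fd, 0, PREALLOC_LEN);
		if (rc == EOPNOTSUPP || rc == ENOSYS) {
			result = -2;
		} else if (rc == 0 && fstat(fd, &st) == 0) {
			*size_out = st.st_size;
			result = pread(fd, buf, sizeof(buf), sizeof(half_tx));
		}
	}
	close(fd);
	return result;
}
```

    With posix_fallocate() the file size grows to 4096 and the reader gets
    64 bytes of zeros right after the partial transaction; with
    FALLOC_FL_KEEP_SIZE the size stays at 10 and the reader hits EOF, so
    it can wait for the writer instead of reporting corruption.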
    
    Closes #3883