    limbo: don't clear txn sync flag in case of fail · f9299f24
    Vladislav Shpilevoy authored
    The limbo cleared TXN_WAIT_SYNC and TXN_WAIT_ACK flags for all
    removed entries, both succeeded and failed. For succeeded ones
    that is fine. For failed ones it was not.
    
    The reason is that a transaction could be rolled back after a
    successful WAL write but before its waiting fiber wakes up. Then
    on wakeup the fiber wouldn't see the TXN_WAIT_SYNC flag and would
    assert that the transaction signature is >= 0. That wasn't true
    for txns rolled back due to synchro reasons, such as a foreign
    PROMOTE not including this transaction.
    
    The patch makes a failed transaction keep its TXN_WAIT_SYNC
    flag, so that its owner fiber on wakeup reaches
    txn_limbo_wait_complete(), notices the bad signature, and follows
    the rollback path.
    
    TXN_WAIT_ACK is still dropped, because otherwise the transaction
    owner would try to call txn_limbo_ack() for the transaction even
    if the limbo doesn't belong to the instance anymore.
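
The flag handling described above can be illustrated with a small
standalone model. This is only a sketch, not the actual patch: the
flag names come from this message, but the flag values, struct
toy_txn, toy_limbo_remove() and toy_owner_wakeup() are hypothetical
stand-ins invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag names are from the commit message; the values are made up. */
enum {
	TXN_WAIT_SYNC = 1 << 0,
	TXN_WAIT_ACK = 1 << 1,
};

/* Hypothetical, heavily reduced stand-in for a real transaction. */
struct toy_txn {
	unsigned flags;
	/* A negative signature means the txn was rolled back. */
	int64_t signature;
};

/*
 * Toy model of removing an entry from the limbo after the patch:
 * TXN_WAIT_ACK is always dropped, TXN_WAIT_SYNC only on success.
 */
static void
toy_limbo_remove(struct toy_txn *txn, bool is_success)
{
	txn->flags &= ~(unsigned)TXN_WAIT_ACK;
	if (is_success)
		txn->flags &= ~(unsigned)TXN_WAIT_SYNC;
}

/*
 * Toy model of the owner fiber waking up after the WAL write.
 * Returns 0 for commit and -1 for rollback.
 */
static int
toy_owner_wakeup(const struct toy_txn *txn)
{
	if ((txn->flags & TXN_WAIT_SYNC) != 0) {
		/* Stands in for the txn_limbo_wait_complete() path. */
		return txn->signature < 0 ? -1 : 0;
	}
	/* Fast path: a cleared flag implies a good signature. */
	assert(txn->signature >= 0);
	return 0;
}
```

In this model a failed txn with a negative signature keeps
TXN_WAIT_SYNC after toy_limbo_remove(&txn, false), so
toy_owner_wakeup() returns -1 instead of hitting the assertion on
the fast path.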
    
    An alternative solution would be to check the signature value for
    all transactions even when journal_entry->res is >= 0. But that
    would slow down the common path even for non-synchro transactions.
    
    Closes #6842
    
    NO_DOC=Bugfix