Author: Vladislav Shpilevoy
When a replica subscribes, its reader cursor may first need to be positioned at the end of a large xlog file. Positioning inside such a file can take significant time. During that time the WAL reader yielded and tried to send heartbeats, but couldn't, because the relay thread wasn't communicating with the TX thread. When there are no messages from TX for too long, heartbeats to the replica are not sent (commit 56571d83 ("raft: make followers notice leader hang")).

The relay must communicate with the TX thread even when a subscribe is just starting and opens a large xlog file.

This isn't the first time that missing heartbeats have resulted in timeouts. See more here:
- commit 30ad4a55 ("relay: yield explicitly every N sent rows").
- commit 17289440 ("recovery: make it yield when positioning in a WAL").
- commit ee6de025 ("relay: send heartbeats while reading a WAL").

Given that this is the fourth fix of this kind, the relay architecture may have some drawbacks. See more in #9968.

Closes #9094

NO_DOC=bugfix