Commit de93b448 authored 4 years ago by Serge Petrenko Committed by Kirill Yukhin 4 years ago

wal: introduce limits on simultaneous writes

Since the introduction of asynchronous commit, which doesn't wait for a
WAL write to succeed, it's quite easy to clog WAL with huge amounts of
write requests. For now, it's only possible from an applier, since it's
the only user of async commit at the moment.

This happens when a replica is syncing with the master and reads new
transactions at a pace higher than it can write them to WAL (see the
docbot request for a detailed explanation).

To ameliorate such behavior, this commit introduces a limit on
not-yet-finished WAL write requests.

A new counter is added to wal writer: queue_size (in bytes), together
with a corresponding configuration setting: `wal_queue_max_size`.
The counter is increased on every newly submitted request, and decreased
once the tx thread receives a confirmation that a specific request was
written.

Actually, the limit is added to an abstract journal queue, but
currently works only for wal writer, since it's the only possible journal
when applier is working.

Once size reaches its maximum value, applier is blocked until
some of the write requests are finished.

The size limit isn't strict, i.e. if there's at least one free byte, the
whole write request fits and no blocking is involved.
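
The accounting described above can be sketched roughly as follows. This
is a minimal model, not the code from this commit: the struct and the
function names (`wal_queue_model`, `wal_queue_model_submit` and so on)
are hypothetical, and treating 0 as "unlimited" follows the docbot
request below. Blocking is modelled by a `false` return value instead of
an actual wait.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the byte counter, not Tarantool's own struct. */
struct wal_queue_model {
	/* Limit in bytes; 0 is assumed to mean "unlimited". */
	int64_t max_size;
	/* Bytes submitted to WAL but not yet confirmed as written. */
	int64_t size;
};

/*
 * The limit isn't strict: the queue counts as full only once the
 * counter has reached the limit, so a request of any size is admitted
 * while at least one byte is free.
 */
static bool
wal_queue_model_is_full(const struct wal_queue_model *q)
{
	return q->max_size > 0 && q->size >= q->max_size;
}

/*
 * Account for a newly submitted request. Returns false when the
 * submitter (the applier, in this commit) would have to wait.
 */
static bool
wal_queue_model_submit(struct wal_queue_model *q, int64_t request_size)
{
	if (wal_queue_model_is_full(q))
		return false;
	q->size += request_size;
	return true;
}

/* Called once the write of a specific request has been confirmed. */
static void
wal_queue_model_complete(struct wal_queue_model *q, int64_t request_size)
{
	assert(q->size >= request_size);
	q->size -= request_size;
}
```

With a limit of 100 bytes and 99 bytes pending, a 500-byte request is
still admitted in this model, and further requests are refused until
enough confirmations bring the counter back below 100.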

The feature is ready for `box.commit{is_async=true}`. Once it's
implemented, it should check whether the queue is full and let the user
decide what to do next: either wait or roll the tx back.

Closes #5536

@TarantoolBot document
Title: new configuration option: 'wal_queue_max_size'

`wal_queue_max_size` puts a limit on the total size of concurrent write
requests submitted to WAL.
`wal_queue_max_size` is measured in number of bytes to be written (0
means unlimited, which was the default behaviour before).
The option only affects replica behaviour at the moment, and defaults
to 16 megabytes. The option limits the pace at which a replica reads new
transactions from the master.

Here's when the option comes in handy:

Before this option was introduced, the following situation was possible.
There are 2 servers, a master and a replica, and the replica is down for
some period of time. While the replica is down, the master serves
requests at a reasonable pace, possibly close to its WAL throughput
limit. Once the replica reconnects, it has to receive all the data the
master has piled up. There is no limit on the speed at which the master
sends the data to the replica, and, without the option, there was no
limit on the speed at which the replica submitted the corresponding
write requests to WAL.

This led to a situation where the replica's WAL could never keep up with
the requests and the number of pending requests kept growing.
There was no limit on the memory taken by WAL write requests, and this
clogging of the WAL write queue could even lead to the replica using up
all the available memory.

Now, when `wal_queue_max_size` is set, appliers will stop reading new
transactions once the limit is reached. This will let WAL process all the
requests that have piled up and free all the excess memory.
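
To illustrate the difference, here is a toy model of the backlog,
written under simplifying assumptions that are not part of this commit:
time advances in ticks, the applier tries to submit a fixed number of
bytes per tick, and WAL confirms a fixed number of bytes per tick. The
function name and all numbers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model: on every tick the applier tries to submit `arrival` bytes
 * and WAL confirms up to `drain` bytes. max_size == 0 means no limit.
 * Returns the largest number of pending bytes seen over `ticks` ticks.
 */
static int64_t
peak_pending_bytes(int64_t arrival, int64_t drain, int64_t max_size,
		   int ticks)
{
	int64_t pending = 0;
	int64_t peak = 0;
	for (int i = 0; i < ticks; i++) {
		/* Non-strict limit: submit while at least one byte is free. */
		bool full = max_size > 0 && pending >= max_size;
		if (!full)
			pending += arrival;
		if (pending > peak)
			peak = pending;
		/* WAL finishes at most `drain` bytes of pending writes. */
		pending -= pending < drain ? pending : drain;
	}
	return peak;
}
```

In this model, with 3 bytes arriving and 2 bytes drained per tick, the
unlimited backlog grows by one byte every tick for as long as the
replica keeps catching up. With a limit of 10 bytes the backlog never
exceeds the limit by as much as one request, because a request is only
submitted while the backlog is still below the limit.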

parent 3010f024

Showing with 398 additions and 10 deletions
