Skip to content
Snippets Groups Projects
Commit 80d379ee authored by Vladimir Davydov's avatar Vladimir Davydov Committed by Konstantin Osipov
Browse files

socket: fix race between unix tcp server stop and start

If called on a unix socket, bind(2) creates a new file, see unix(7).
When we stop a unix tcp server, we should remove that file. Currently,
we do it from the tcp server fiber, after the server loop is broken,
which happens when the socket is closed, see tcp_server_loop(). This
opens a time window for another tcp server to reuse the same path:

    main fiber                  tcp server loop
    ----------                  ---------------

    -- Start a tcp server.
    s = socket.tcp_server('unix/', sock_path, ...)
    -- Stop the server.
    s:close()

                                socket_readable? => no, break loop

    -- Start a new tcp server. Use the same path as before.
    -- This function succeeds, because the socket is closed
    -- so tcp_server_bind_addr() will clean up by itself.
    s = socket.tcp_server('unix/', sock_path, ...)

     tcp_server_bind
      tcp_server_bind_addr
       socket_bind => EADDRINUSE
       tcp_connect => ECONNREFUSED
       -- Remove dead unix socket.
       fio.unlink(addr.port)
       socket_bind => success

                                -- Deletes unix socket used
                                -- by the new server.
                                fio.unlink(addr.port)

In particular, the race results in sporadic failures of app-tap/console
test, which restarts a tcp server using the same file path.

To fix this issue, let's close the socket after removing the socket
file. This is absolutely legit on any UNIX system, and this eliminates
the race shown above, because a new server that tries to bind on the
same path as the one already used by a dying server will not receive
ECONNREFUSED until the socket fd is closed and hence the file is
removed.

A note about the app-tap/console test. After this patch is applied,
socket.close() takes a little longer for unix tcp server, because it
yields twice, once for removing the socket file and once for closing the
socket file descriptor. As a result, on_disconnect() trigger left from
the previous test case has time to run after session.type() check.
Actually, those triggers have already been tested and we should have
cleared them before proceeding to the next test case. So instead of
adding two new on_disconnect checks to the test plan, let's clear the
triggers before session.type() test case and remove 3 on_connect and 5
on_auth checks from the test plan.

Closes #3168
parent c6951c92
No related branches found
No related tags found
No related merge requests found
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment