#ZN8 no content received on the first PING

I’m stuck on Stage #ZN8.

The test fails on the first PING command. Locally, however, everything works: I can connect a replica to the master, complete the handshake, connect a client to the master, and run SET x y (which replicates to the replica). When I then connect another client to the replica and run GET x, I get the value y back.

When I compare these logs with the test logs from the previous stage, CF8, I don’t see any meaningful differences.

Here are my logs from the CF8 test (pass):

[tester::#CF8] Running tests for Stage #CF8 (Replication - Empty RDB transfer)
[tester::#CF8] $ ./your_program.sh --port 6379
[your_program] Settings file: /root/.local/share/MyRedis/settings.json
[your_program] Logs from your program will appear here!
[your_program] Listening on port 6379
[tester::#CF8] [client] $ redis-cli PING
[tester::#CF8] [client] Sent bytes: "*1\r\n$4\r\nPING\r\n"
[your_program] Connected new client!
[your_program] Waiting for request...
[your_program] Thread 4. Received request: *1
[your_program] $4
[your_program] PING
[your_program] 
[your_program] Going to send response: +PONG
...

And here are my logs from the ZN8 test (failed):

[tester::#ZN8] Running tests for Stage #ZN8 (Replication - Single-replica propagation)
[tester::#ZN8] $ ./your_program.sh --port 6379
[your_program] Settings file: /root/.local/share/MyRedis/settings.json
[your_program] Logs from your program will appear here!
[your_program] Listening on port 6379
[tester::#ZN8] [handshake] [replica] $ redis-cli PING
[tester::#ZN8] [handshake] [replica] Sent bytes: "*1\r\n$4\r\nPING\r\n"
[your_program] Connected new client!
[your_program] Waiting for request...
[tester::#ZN8] Received: "" (no content received)
[tester::#ZN8]            ^ error
[tester::#ZN8] Error: Expected start of a new RESP2 value (either +, -, :, $ or *)
[tester::#ZN8] Test failed

From the server’s perspective, I expect the same behavior, but the [handshake] [replica] client is closed immediately, and the tester reports that no content was received.

Is there any difference between [tester::#CF8] [client] and [tester::#ZN8] [handshake] [replica] clients? How can I reproduce the issue locally?

Hey @AntonMinko, looks like the issue is caused by calling server.Stop() inside the finally block.

In this stage, both a client and a replica will connect to your server:

However, the client connection is closed (because the server is stopped) before it can send anything to your server:

Let me know if you’d like any further clarification!

Ouch, that’s a great catch, @andy1li
Indeed, I had a race condition that didn’t reveal itself until this stage. Since the very first stages, I have had code like this:

while (true)
{
    try
    {
        server.Start(10);
        var socket = await server.AcceptSocketAsync(); // wait for client
        WriteLine("Connected new client!");
        
        var worker = serviceProvider.GetRequiredService<IWorker>();

        _ = Task.Run(async () => await HandleConnectionAsync(socket, worker));
    }
    finally
    {
        server.Stop();
    }
}

It restarts the server after accepting each client. This code is inefficient, but it worked because stopping the server does not close client sockets that have already been accepted.

The problem: note the 10 in server.Start(10). It means that up to 10 incoming connections can wait in a queue until they are accepted. As you correctly pointed out, the test connects both the replica and the client (apparently in parallel). Suppose the listener gets the client connection request first and the replica connection request second. server.AcceptSocketAsync() returns the socket for the client connection, and then the finally block stops the server, which also drops the replica connection still waiting in the queue!
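To try reproducing this locally, one option is to imitate what the tester appears to do: open two connections at nearly the same time and only then send PING on each. The sketch below is a rough probe, not the tester's actual logic. The class and method names are made up for illustration, and it assumes the server listens on the loopback address. Because this is a race, it may take several runs against the buggy loop before one connection is dropped. There is also no read timeout, so it can hang if a server accepts a connection but never replies.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper, not part of the tester or of the original program.
public static class ParallelPingProbe
{
    // Connects two clients at (nearly) the same time, then sends PING on each.
    // Returns how many of the two connections produced a reply.
    public static async Task<int> CountRepliesAsync(int port)
    {
        using var first = new TcpClient();
        using var second = new TcpClient();

        // Both connections are established before anything is sent,
        // so one of them may still be waiting in the listen queue.
        await Task.WhenAll(
            first.ConnectAsync(IPAddress.Loopback, port),
            second.ConnectAsync(IPAddress.Loopback, port));

        int replies = 0;
        foreach (var client in new[] { first, second })
        {
            if (await TryPingAsync(client)) replies++;
        }
        return replies;
    }

    private static async Task<bool> TryPingAsync(TcpClient client)
    {
        try
        {
            NetworkStream stream = client.GetStream();
            byte[] ping = Encoding.ASCII.GetBytes("*1\r\n$4\r\nPING\r\n");
            await stream.WriteAsync(ping, 0, ping.Length);

            var buffer = new byte[64];
            int read = await stream.ReadAsync(buffer, 0, buffer.Length);
            return read > 0; // 0 bytes means the peer closed without replying
        }
        catch (Exception e) when (e is IOException || e is SocketException)
        {
            return false; // connection was reset, e.g. dropped from the queue
        }
    }
}
```

Calling `await ParallelPingProbe.CountRepliesAsync(6379)` a few times in a row should return 2 every time for a healthy server. A result below 2 suggests that one of the two connections was closed before it got a reply.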

Solution: move server.Start and server.Stop outside the while loop.
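For anyone hitting the same problem, here is a minimal sketch of what the fixed loop could look like. It assumes that server is a System.Net.Sockets.TcpListener and that the target is .NET 6 or later (for the AcceptSocketAsync overload that takes a CancellationToken). The class name, the cancellation token, and the PONG-only handler are illustrative stand-ins for my real IWorker-based handler, not my actual code.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Illustrative sketch; names here are assumptions, not the original program.
public static class AcceptLoopSketch
{
    public static async Task RunAsync(TcpListener server, CancellationToken ct)
    {
        // Start listening once; up to 10 connections may wait in the queue.
        server.Start(10);
        try
        {
            while (!ct.IsCancellationRequested)
            {
                // Wait for the next client; the listener stays open afterwards,
                // so other queued connections are not dropped.
                Socket socket = await server.AcceptSocketAsync(ct);
                Console.WriteLine("Connected new client!");

                _ = Task.Run(() => HandleConnectionAsync(socket));
            }
        }
        catch (OperationCanceledException)
        {
            // Shutdown was requested while waiting for a client.
        }
        finally
        {
            // Stop the listener only once, when the whole loop ends.
            server.Stop();
        }
    }

    // Stand-in handler: answers every request with +PONG.
    private static async Task HandleConnectionAsync(Socket socket)
    {
        using var stream = new NetworkStream(socket, ownsSocket: true);
        var buffer = new byte[1024];
        byte[] pong = Encoding.ASCII.GetBytes("+PONG\r\n");

        while (await stream.ReadAsync(buffer, 0, buffer.Length) > 0)
        {
            await stream.WriteAsync(pong, 0, pong.Length);
        }
    }
}
```

With this shape, a connection that arrives while another one is being accepted simply stays in the queue until the next AcceptSocketAsync call picks it up.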

Thanks a lot, @andy1li, for looking into this issue (my bug) and pointing it out!
