Intermitting failures w/ replication extension

Hi, it’s weird that when I run “codecrafters test” – it passes all tests successfully (sometimes not on the first run, but like the 2nd or 3rd), but when I run “codecrafters submit” it looks like it always fails. How come is that?

There’s no difference between how tests are run for these two commands - in most cases this happens when there’s code that has some kind of timing/concurrency bug that only manifests some of the time (which explains why codecrafters test doesn’t pass either sometimes). If you paste the logs you’re seeing + a snippet of your code, we might be able to help figure out what’s wrong!

In general, trying to submit a stage when your tests are flaky is a bad idea - it’ll just become a bigger problem as you move to further stages.

Thanks for the prompt response!

I assume I wouldn’t have been able to submit my code in the early stages if it was an errored code, although, I agree it might be an issue with concurrency somewhere. Maybe when the server starts or something. I have attached the error below!

Btw, I’m now middleway through on the RDB persistence stage, but the error comes from the replication stages. Sometimes it’s a similar error from even earlier stages. I don’t have a screenshot now.

The error suggests that the tester didn’t get a response to REPLCONF GETACK *. Could you share what your code looks like, and maybe try to add some debug logs around the area where you receive the GETACK command and respond accordingly?

I think I’ve fixed the issue. It was related to how I was handling an incoming empty RDB file straight after the handshake process.

1 Like

@inner nice! What was the bug?

previously my code looked like this:

    private void SendPsync(NetworkStream stream)
    {
        const string replconfPsync = "*3\r\n$5\r\nPSYNC\r\n$1\r\n?\r\n$2\r\n-1\r\n";
        StreamWrite(stream, replconfPsync);

        if (!StreamRead(stream).Contains("FULLRESYNC"))
        {
            ThrowHandshakeFailed(nameof(SendPsync));
        }
    }
    
    private static void ThrowHandshakeFailed(string failedMethodName)
    {
        throw new Exception($"Handshake failed on step: {failedMethodName}");
    }

    private static string StreamRead(NetworkStream stream)
    {
        var buffer = new byte[1024];
        var bytesRead = stream.Read(buffer, 0, buffer.Length);
        var response = Encoding.UTF8.GetString(buffer, 0, bytesRead);

        return response;
    }

my guess is that it was causing program to read the stream as much as it could take in one go and then move forward. Usually it could read the first part, which was the “+FULLRESYNC” part. But after that, the master node also sends the empty RDB file and the program wasn’t explicitly expecting it so it was crashing when running codecrafter’s test (interestingly it was never crashing when I was starting master and replica servers locally).

So I’ve updated my code to read the line with the “+FULLRESYNC” and then read the next one with empty RDB file and this seems to be now working fine (perhaps I should also add some sort of timeout as well):

    private async Task SendPsync(NetworkStream stream)
    {
        const string replconfPsync = "*3\r\n$5\r\nPSYNC\r\n$1\r\n?\r\n$2\r\n-1\r\n";
        StreamWrite(stream, replconfPsync);

        var fullResyncResponse = await ReadLineAsync(stream);
        Console.WriteLine("Received: " + fullResyncResponse);

        if (fullResyncResponse.StartsWith("+FULLRESYNC"))
        {
            // Read the RDB length
            var rdbLengthStr = await ReadLineAsync(stream);
            if (rdbLengthStr.StartsWith("$"))
            {
                var rdbLength = int.Parse(rdbLengthStr.Substring(1));
                var rdbFile = new byte[rdbLength];

                // Read the RDB file
                var bytesRead = 0;
                while (bytesRead < rdbLength)
                {
                    var read = await stream.ReadAsync(rdbFile, bytesRead, rdbLength - bytesRead);
                    if (read == 0)
                    {
                        throw new Exception("Unexpected end of stream while reading RDB file.");
                    }

                    bytesRead += read;
                }

                Console.WriteLine("Received RDB file of length: " + rdbLength);
            }
        }
    }

    static async Task<string> ReadLineAsync(NetworkStream stream)
    {
        var sb = new StringBuilder();
        var buffer = new byte[1];
        while (true)
        {
            var bytesRead = await stream.ReadAsync(buffer, 0, 1);
            if (bytesRead == 0)
            {
                throw new Exception("Unexpected end of stream.");
            }

            var ch = (char)buffer[0];
            if (ch == '\r')
            {
                // Expecting \n after \r
                bytesRead = await stream.ReadAsync(buffer, 0, 1);
                if (bytesRead == 0 || buffer[0] != '\n')
                {
                    throw new Exception("Malformed response.");
                }

                break;
            }

            sb.Append(ch);
        }

        return sb.ToString();
    }
1 Like

Even i am facing the same issue in test stage 13 ($YG4), sometimes i am reciving data , and it is passing the tests and sometimes i am not able to pass the test cases even though the code is same.

hey @rohitpaulk , in this stage,can you tell that whether the master send the replconf, ping and replconf commands for second time immediately or will it send after a certain acknowledgement or timeouts.
because when i am reading over a connection
Sent RDB file.
[tester::#YD3] [test] master: $ redis-cli REPLCONF GETACK *
[tester::#YD3] [test] master: Sent bytes: “3\r\n$8\r\nREPLCONF\r\n$6\r\nGETACK\r\n$1\r\n\r\n”
[your_program]
[your_program] 15:20:27.714 [debug] Received data chunk: <<43, 70, 85, 76, 76, 82, 69, 83, 89, 78, 67, 32, 55, 53, 99, 100, 55, 98, 99, 49, 48, 99, 52, 57, 48, 52, 55, 101, 48, 100, 49, 54, 51, 54, 54, 48, 102, 51, 98, 57, 48, 54, 50, 53, 98, 49, 97, 102, 51, 49, …>>, bytes: 186
[your_program]
[your_program] 15:20:27.714 [debug] Received PSYNC response: “+FULLRESYNC 75cd7bc10c49047e0d163660f3b90625b1af31dc 0”, bytes read: 56
[your_program]
[your_program] 15:20:27.714 [info] PSYNC successful. Replication ID: 75cd7bc10c49047e0d163660f3b90625b1af31dc, Offset: 0
[your_program]
[your_program] 15:20:27.714 [debug] Received data chunk: <<36, 56, 56, 13, 10, 82, 69, 68, 73, 83, 48, 48, 49, 49, 250, 9, 114, 101, 100, 105, 115, 45, 118, 101, 114, 5, 55, 46, 50, 46, 48, 250, 10, 114, 101, 100, 105, 115, 45, 98, 105, 116, 115, 192, 64, 250, 5, 99, 116, 105, …>>, bytes: 130
[your_program]
[your_program] 15:20:27.714 [debug] Reading exact 88 bytes
[your_program]
[your_program] 15:20:27.714 [debug] Read exact data: <<82, 69, 68, 73, 83, 48, 48, 49, 49, 250, 9, 114, 101, 100, 105, 115, 45, 118, 101, 114, 5, 55, 46, 50, 46, 48, 250, 10, 114, 101, 100, 105, 115, 45, 98, 105, 116, 115, 192, 64, 250, 5, 99, 116, 105, 109, 101, 194, 109, 8, …>>
[your_program]
[your_program] 15:20:27.714 [info] Received complete RDB file of size 88 bytes
[your_program]
[your_program] 15:20:27.714 [debug] Starting to read command
[your_program]
[your_program] 15:20:27.714 [debug] Received data chunk: “3\r\n$8\r\nREPLCONF\r\n$6\r\nGETACK\r\n$1\r\n\r\n”, bytes: 37
[your_program]
[your_program] 15:20:27.714 [debug] Received data chunk: “$8\r\nREPLCONF\r\n$6\r\nGETACK\r\n$1\r\n*\r\n”, bytes: 33
[your_program]
[your_program] 15:20:27.714 [debug] Reading exact 8 bytes
[your_program]
[your_program] 15:20:27.714 [debug] Read exact data: “REPLCONF”
[your_program]
[your_program] 15:20:27.714 [debug] Received data chunk: “\r\n$6\r\nGETACK\r\n$1\r\n*\r\n”, bytes: 21
[your_program]
[your_program] 15:20:27.714 [debug] Argument total bytes: 14
[your_program]
[your_program] 15:20:27.714 [debug] Received data chunk: “$6\r\nGETACK\r\n$1\r\n*\r\n”, bytes: 19
[your_program]
[your_program] 15:20:27.714 [debug] Reading exact 6 bytes
[your_program]
[your_program] 15:20:27.714 [debug] Read exact data: “GETACK”
[your_program]
[your_program] 15:20:27.714 [debug] Received data chunk: “\r\n$1\r\n*\r\n”, bytes: 9
[your_program]
[your_program] 15:20:27.714 [debug] Argument total bytes: 12
[your_program]
[your_program] 15:20:27.714 [debug] Received data chunk: “$1\r\n*\r\n”, bytes: 7
[your_program]
[your_program] 15:20:27.714 [debug] Reading exact 1 bytes
[your_program]
[your_program] 15:20:27.714 [debug] Read exact data: “"
[your_program]
[your_program] 15:20:27.714 [debug] Received data chunk: “\r\n”, bytes: 2
[your_program]
[your_program] 15:20:27.714 [debug] Argument total bytes: 7
[your_program]
[your_program] 15:20:27.714 [debug] Finished reading all arguments. Total bytes: 37
[your_program]
[your_program] 15:20:27.718 [info] Received command: [“REPLCONF”, “GETACK”, "
”], total bytes: 37
[your_program]
[your_program] 15:20:27.718 [info] Executing REPLCONF GETACK. Current offset: 0
[your_program]
[your_program] 15:20:27.718 [info] Sent REPLCONF ACK response: “*3\r\n$8\r\nREPLCONF\r\n$3\r\nACK\r\n$1\r\n0\r\n”
[your_program]
[your_program] 15:20:27.718 [debug] Starting to read command
[your_program]
[your_program] 15:20:27.718 [error] Error receiving data: :emsgsize
[your_program]
[your_program] 15:20:27.718 [error] Error reading command header: :emsgsize
[your_program]
[your_program] 15:20:27.718 [error] Error reading command: :emsgsize
[your_program]
[your_program] 15:20:27.718 [info] Handshake completed successfully
[tester::#YD3] [test] master: Received bytes: “*3\r\n$8\r\nREPLCONF\r\n$3\r\nACK\r\n$1\r\n0\r\n”
[tester::#YD3] [test] master: Received RESP array: [“REPLCONF”, “ACK”, “0”]
[tester::#YD3] [test] Received [“REPLCONF”, “ACK”, “0”]
[tester::#YD3] [propagation] master: > PING
[tester::#YD3] [propagation] master: Sent bytes: “*1\r\n$4\r\nPING\r\n”
[tester::#YD3] [test] master: > REPLCONF GETACK *
[tester::#YD3] [test] master: Sent bytes: “3\r\n$8\r\nREPLCONF\r\n$6\r\nGETACK\r\n$1\r\n\r\n”
[tester::#YD3] write tcp 127.0.0.1:6379->127.0.0.1:56026: write: broken pipe
[tester::#YD3] Test failed
[tester::#YD3] Terminating program
[your_program]
[your_program] 15:20:27.726 [notice] SIGTERM received - shutting down
[your_program]
[tester::#YD3] Program terminated successfully

I am able to parse a first command successfully , but after processing that command, i am reading again , but i am getting :emsgsize error and i am not getting why it is the case , and the connection is disconnecting between the master and replica which has been established for communication,

@ProgMastermind YG4 is “Command Processing”, and ACKs aren’t involved in that stage. Could you create a separate post for this please? I think you’re referring to #YD3?

Also, please try to phrase your questions clearly and use triple backticks (```) to format your code. It’s hard to read through your message and figure out what the actual question is.

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.