#YG4 - Expected simple string or bulk string, got NIL error

I’m stuck on Stage #YG4.

I’ve checked whether the connections are maintained (they seem to be) and whether the message parsing is the problem, but I couldn’t find any issues there.

I’ve kind of run out of hypotheses and would really appreciate some direction here.

Here are my logs:


[tester::#YG4] [handshake] Sent RDB file.
[tester::#YG4] [propagation] master: > SET foo 123
[tester::#YG4] [propagation] master: Sent bytes: "*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\n123\r\n"
[tester::#YG4] [propagation] master: > SET bar 456
[tester::#YG4] [propagation] master: Sent bytes: "*3\r\n$3\r\nSET\r\n$3\r\nbar\r\n$3\r\n456\r\n"
[tester::#YG4] [propagation] master: > SET baz 789
[tester::#YG4] [propagation] master: Sent bytes: "*3\r\n$3\r\nSET\r\n$3\r\nbaz\r\n$3\r\n789\r\n"
[tester::#YG4] [test] Getting key foo
[tester::#YG4] [test] client: $ redis-cli GET foo
[tester::#YG4] [test] client: Sent bytes: "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"
[tester::#YG4] [test] client: Received bytes: "$-1\r\n"
[tester::#YG4] [test] client: Received RESP null bulk string: "$-1\r\n"
[your_program] [DEBUG] Handling command from client with role:  GET replica
[your_program] [DEBUG] Received command:  GET From client: replica id:  conn_1736109889038147876
[your_program] [DEBUG] Starting with role replica: Key not found: foo
[tester::#YG4] [test] Retrying... (1/5 attempts)
[tester::#YG4] [test] client: > GET foo
[tester::#YG4] [test] client: Sent bytes: "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"
[tester::#YG4] [test] client: Received bytes: "$-1\r\n"
[tester::#YG4] [test] client: Received RESP null bulk string: "$-1\r\n"
[your_program] [DEBUG] Handling command from client with role:  GET replica
[your_program] [DEBUG] Received command:  GET From client: replica id:  conn_1736109889038147876
[your_program] Key not found: foo
[tester::#YG4] [test] Retrying... (2/5 attempts)
[tester::#YG4] [test] client: > GET foo
[tester::#YG4] [test] client: Sent bytes: "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"
[your_program] [DEBUG] Handling command from client with role:  GET replica
[your_program] [DEBUG] Received command:  GET From client: replica id:  conn_1736109889038147876
[tester::#YG4] [test] client: Received bytes: "$-1\r\n"
[tester::#YG4] [test] client: Received RESP null bulk string: "$-1\r\n"
[your_program] Key not found: foo
[tester::#YG4] [test] Retrying... (3/5 attempts)
[tester::#YG4] [test] client: > GET foo
[tester::#YG4] [test] client: Sent bytes: "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"
[tester::#YG4] [test] client: Received bytes: "$-1\r\n"
[tester::#YG4] [test] client: Received RESP null bulk string: "$-1\r\n"
[your_program] [DEBUG] Handling command from client with role:  GET replica
[your_program] [DEBUG] Received command:  GET From client: replica id:  conn_1736109889038147876
[your_program] Key not found: foo
[tester::#YG4] [test] Retrying... (4/5 attempts)
[tester::#YG4] [test] client: > GET foo
[tester::#YG4] [test] client: Sent bytes: "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"
[tester::#YG4] [test] client: Received bytes: "$-1\r\n"
[tester::#YG4] [test] client: Received RESP null bulk string: "$-1\r\n"
[your_program] [DEBUG] Handling command from client with role:  GET replica
[your_program] [DEBUG] Received command:  GET From client: replica id:  conn_1736109889038147876
[your_program] Key not found: foo
[tester::#YG4] [test] Retrying... (5/5 attempts)
[tester::#YG4] [test] client: > GET foo
[tester::#YG4] [test] client: Sent bytes: "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"
[tester::#YG4] [test] client: Received bytes: "$-1\r\n"
[tester::#YG4] [test] client: Received RESP null bulk string: "$-1\r\n"
[your_program] [DEBUG] Handling command from client with role:  GET replica
[your_program] [DEBUG] Received command:  GET From client: replica id:  conn_1736109889038147876
[your_program] Key not found: foo
[tester::#YG4] Expected simple string or bulk string, got NIL
[tester::#YG4] Test failed
[tester::#YG4] Terminating program
[tester::#YG4] Program terminated successfully

I’ve submitted the code through the CLI so I assume you guys can see it.

I’ll take a look shortly.

Hi @jorgerodrigues, there might be multiple issues at play here. Let’s tackle the first two that I noticed:

  1. The replica’s handshake should be blocking; that is, the replica shouldn’t do anything else until the connection to the master is established. So s.sendHandshake should not be run as a goroutine.

  2. The connection to the master established in the handshake should not be closed, because the master will propagate commands (which your code needs to handle as well) to replicas through that connection.

Good catches, thanks Andy!
I made the changes, but the error is still the same.
Any other ideas about what could be causing the issue?

Sorry if I missed it, but it seems that the conn to the master in the handshake is “abandoned”, without handling further commands from the master?

Another good point.
I have now added a connection handler at the end of the handshake, as below (and submitted it so you can also see/run it):

	s.handleConnection(masterConn)

However, after running the tests, the error is exactly the same.

Hey @andy1li, let me know if you discover anything here, and thanks for the help so far!

Great progress so far! Let’s address three more issues:

  1. The code is not passing an earlier stage, #ZU2 (Handle concurrent clients).

This can be fixed by calling handleConnection as a goroutine.

  2. Similarly, handleConnection should be called as a goroutine in sendHandshake as well.

  3. It appears that the empty RDB file from the master isn’t being handled at the moment.

Awesome stuff, @andy1li. It seems it was indeed the combination of:

  1. Not starting the reading of commands after the handshake, and
  2. Not properly handling the reading of the RDB file from the commands

Thanks for the help!


@andy1li
Feeling a bit dumb here…

The code passes the test… but only sometimes; other times it fails.
That tells me one of three things might be happening:

  1. A concurrency issue.
  2. My code doesn’t handle all the possible ways the commands can be sent to the replica.
  3. There’s an issue with the test runner that makes it pass sometimes when it should fail.

I’ve been trying to figure out which of these it is, but without success. Would you be able to help?

@jorgerodrigues can you share the code? I think you might be getting the same issue I had when I was doing the Redis challenge.

The issue I was running into was that my replica was being called right after it was initialized, so it didn’t have enough time to finish the handshake with the master.

This was my approach:

#include <netinet/in.h>
#include <sys/socket.h>
#include <string>
#include <thread>

void listener(RomulusConn::BaseConnection* conn, int serverFD) {
    // A replica must finish its handshake with the master before serving clients.
    if (conn->getRole() == RomulusConn::SLAVE) {
        waitForHandShake(static_cast<RomulusConn::Slave*>(conn));
    }

    PRINT_SUCCESS("Listener Started On ServerFD: " + std::to_string(serverFD));

    while (true) {
        struct sockaddr_in clientAddr {};
        socklen_t clientAddrLen = sizeof(clientAddr);
        int clientFD = accept(serverFD, (struct sockaddr *) &clientAddr, &clientAddrLen);
        if (clientFD < 0) {
            continue;  // accept() failed; keep listening
        }

        // One detached thread per client, so clients are served concurrently.
        std::thread(handleConnection, conn, clientFD).detach();
    }
}

Do you know the best way to put the code up on GitHub without messing up the CodeCrafters workflow? I’m assuming that changing the remote would break it, but I’m not sure.

I’m not sure that this is what I’m experiencing. I can reproduce the issue with the following steps:

  1. Launch the master.
  2. Launch the replica.
  3. From redis-cli, send set foo 123.
  4. From redis-cli, send get foo (this returns the right result).
  5. Run redis-cli -h localhost -p 6380 get foo; this one returns nil.

@jorgerodrigues I’ll take another look at your code shortly.

In the meantime, you can publish your code to GitHub like this:


Hi @jorgerodrigues, I couldn’t locate the code you added to handle the empty RDB file from the master.

To debug, I added a print and ran your code several times:

When the tests were passing, the empty RDB file happened to be read together with the PSYNC response (so it got handled by accident):

When only the PSYNC response was read, the tests would fail because the RDB file was not handled:

You guys are awesome. Thanks for the help.

Okay. I started by adding a check after the PSYNC command succeeds. The check goes as follows:

	if string(buff[:11]) != "+FULLRESYNC" {
		return fmt.Errorf("invalid PSYNC response from master: %s", string(buff))
	}

Then I ran the tests locally to make sure I was still getting the same errors and, to my surprise, everything passed, but again only intermittently.

I must admit that I don’t fully understand why simply adding this check makes it pass. Would you be able to shed some light on this?

Thanks a lot for all the help so far.


Another thing: it isn’t clear to me how the RDB file should be handled at this stage. I went through the descriptions of the previous stages and couldn’t find anything about needing to read or load the contents of the file yet; I was simply assuming that would come in later stages.
Am I missing anything here?

if string(buff[:11]) != "+FULLRESYNC"

It’s highly unlikely that adding this check will make a significant difference. You may have just experienced a few lucky runs.


Since the RDB file is empty, you can read and safely ignore it. However, it’s crucial to verify it has been fully read rather than leaving things to chance.

Thanks again for looking into it!

So, I went ahead and created a new method for reading the file (even though we don’t load it into memory just yet).

It does go through, but we get an error about a missing CRLF at the end, which I still need to debug.
However, the tests are not consistently passing yet.

There is clearly a gap in my understanding of which steps should happen here, and of what the preconditions are for the handshake to be complete before propagation can really start.

What exactly is expected at this point of the handshake for it to complete successfully and allow for propagation?
I think the answer to this will help me understand what should be fixed and why it “sometimes works”.

I am very eager to understand this more than I am to simply pass the stage.

Thanks for all the help so far. I really appreciate it.
The latest version of my changes is submitted.

Finally managed to get it working consistently now.
The issue was that after the handshake process was completed, I was passing the master connection down for handling the commands, instead of the replica connection.

I also noticed that the handshake process sometimes fails depending on the order and timing in which the contents of the RDB file are sent. It isn’t clear to me yet what the issue is, and I will continue trying to debug that part now.



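The thing about TCP/socket programming is that Read operates at a lower level: TCP delivers a byte stream with no message boundaries, and Read simply returns whatever bytes have arrived so far. It doesn’t interpret or understand the specific protocol being used.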

This means there’s no guarantee that a single Read will return exactly one Redis command.

For example, after the master sends the empty RDB file and the GETACK command, the replica might Read both in one go.