I’m stuck on Stage #YG4.
I’ve checked whether the connections are maintained, which they seem to be, and whether message parsing is the issue, but I couldn’t find any problems there.
I’ve run out of hypotheses and would really appreciate some direction here.
Here are my logs:
[tester::#YG4] [handshake] Sent RDB file.
[tester::#YG4] [propagation] master: > SET foo 123
[tester::#YG4] [propagation] master: Sent bytes: "*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\n123\r\n"
[tester::#YG4] [propagation] master: > SET bar 456
[tester::#YG4] [propagation] master: Sent bytes: "*3\r\n$3\r\nSET\r\n$3\r\nbar\r\n$3\r\n456\r\n"
[tester::#YG4] [propagation] master: > SET baz 789
[tester::#YG4] [propagation] master: Sent bytes: "*3\r\n$3\r\nSET\r\n$3\r\nbaz\r\n$3\r\n789\r\n"
[tester::#YG4] [test] Getting key foo
[tester::#YG4] [test] client: $ redis-cli GET foo
[tester::#YG4] [test] client: Sent bytes: "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"
[tester::#YG4] [test] client: Received bytes: "$-1\r\n"
[tester::#YG4] [test] client: Received RESP null bulk string: "$-1\r\n"
[your_program] [DEBUG] Handling command from client with role: GET replica
[your_program] [DEBUG] Received command: GET From client: replica id: conn_1736109889038147876
[your_program] [DEBUG] Starting with role replica: Key not found: foo
[tester::#YG4] [test] Retrying... (1/5 attempts)
[tester::#YG4] [test] client: > GET foo
[tester::#YG4] [test] client: Sent bytes: "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"
[tester::#YG4] [test] client: Received bytes: "$-1\r\n"
[tester::#YG4] [test] client: Received RESP null bulk string: "$-1\r\n"
[your_program] [DEBUG] Handling command from client with role: GET replica
[your_program] [DEBUG] Received command: GET From client: replica id: conn_1736109889038147876
[your_program] Key not found: foo
[tester::#YG4] [test] Retrying... (2/5 attempts)
[tester::#YG4] [test] client: > GET foo
[tester::#YG4] [test] client: Sent bytes: "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"
[your_program] [DEBUG] Handling command from client with role: GET replica
[your_program] [DEBUG] Received command: GET From client: replica id: conn_1736109889038147876
[tester::#YG4] [test] client: Received bytes: "$-1\r\n"
[tester::#YG4] [test] client: Received RESP null bulk string: "$-1\r\n"
[your_program] Key not found: foo
[tester::#YG4] [test] Retrying... (3/5 attempts)
[tester::#YG4] [test] client: > GET foo
[tester::#YG4] [test] client: Sent bytes: "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"
[tester::#YG4] [test] client: Received bytes: "$-1\r\n"
[tester::#YG4] [test] client: Received RESP null bulk string: "$-1\r\n"
[your_program] [DEBUG] Handling command from client with role: GET replica
[your_program] [DEBUG] Received command: GET From client: replica id: conn_1736109889038147876
[your_program] Key not found: foo
[tester::#YG4] [test] Retrying... (4/5 attempts)
[tester::#YG4] [test] client: > GET foo
[tester::#YG4] [test] client: Sent bytes: "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"
[tester::#YG4] [test] client: Received bytes: "$-1\r\n"
[tester::#YG4] [test] client: Received RESP null bulk string: "$-1\r\n"
[your_program] [DEBUG] Handling command from client with role: GET replica
[your_program] [DEBUG] Received command: GET From client: replica id: conn_1736109889038147876
[your_program] Key not found: foo
[tester::#YG4] [test] Retrying... (5/5 attempts)
[tester::#YG4] [test] client: > GET foo
[tester::#YG4] [test] client: Sent bytes: "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"
[tester::#YG4] [test] client: Received bytes: "$-1\r\n"
[tester::#YG4] [test] client: Received RESP null bulk string: "$-1\r\n"
[your_program] [DEBUG] Handling command from client with role: GET replica
[your_program] [DEBUG] Received command: GET From client: replica id: conn_1736109889038147876
[your_program] Key not found: foo
[tester::#YG4] Expected simple string or bulk string, got NIL
[tester::#YG4] Test failed
[tester::#YG4] Terminating program
[tester::#YG4] Program terminated successfully
I’ve submitted the code through the CLI so I assume you guys can see it.
Hi @jorgerodrigues, there might be multiple issues at play here. Let’s tackle the first two that I noticed:
The replica’s handshake should be blocking; that is, the replica shouldn’t do anything else until the connection to the master is established. So s.sendHandshake should not be run as a goroutine.
The connection to the master established in the handshake should not be closed, because the master will propagate commands (which your code needs to handle as well) to replicas through those connections:
@jorgerodrigues can you share the code? I think you might be getting the same issue I had when I was doing the Redis challenge.
The issue I ran into was that my replica was being called right after it was initialized, so it didn’t have enough time to finish the handshake with the master.
This was my approach:
void listener(RomulusConn::BaseConnection* conn, int serverFD) {
    // A replica finishes its handshake with the master before accepting clients.
    if (conn->getRole() == RomulusConn::SLAVE) {
        waitForHandShake(static_cast<RomulusConn::Slave*>(conn));
    }
    PRINT_SUCCESS("Listener Started On ServerFD: " + std::to_string(serverFD));
    while (true) {
        struct sockaddr_in clientAddr {};
        socklen_t clientAddrLen = sizeof(clientAddr);
        int clientFD = accept(serverFD, (struct sockaddr *) &clientAddr, &clientAddrLen);
        if (clientFD < 0) {
            break; // accept failed: stop listening so the socket gets closed below
        }
        // One detached thread per client connection.
        std::thread(handleConnection, conn, clientFD).detach();
    }
    close(serverFD);
}
Do you know the best way to put the code up on GitHub without messing up the CodeCrafters workflow? I am assuming that changing the remote would break it, but I’m not sure.
I am not sure that this is the case I am experiencing. I can recreate the issue with the following steps:
Launch the master
Launch the replica
From the redis client, send set foo 123
From the redis client, send get foo (receive the right result)
From the redis client, send redis-cli -h localhost -p 6380 get foo; on this one I am getting nil
Okay. I started now by adding a check after the PSYNC command succeeds. The check goes as follows:
if string(buff[:11]) != "+FULLRESYNC" {
    return fmt.Errorf("invalid PSYNC response from master: %s", string(buff))
}
Then I decided to test locally and run the tests to make sure I am still getting the same errors and, to my surprise, it all worked/passed, but again, intermittently.
I must admit that I don’t fully understand at this point why simply adding this check makes it pass. Would you be able to shed some light on this?
Another thing is that it is not clear to me what the handling of the RDB file should be up to this stage - I went through the description of the previous steps and couldn’t see anything about needing to read/load the contents of the file yet - was simply assuming that it will come in later stages.
Am I missing anything here?
It’s highly unlikely that adding this check will make a significant difference. You may have just experienced a few lucky runs.
Since the RDB file is empty, you can read and safely ignore it. However, it’s crucial to verify it has been fully read rather than leaving things to chance.
So, I went ahead and created a new method for reading the file (even though we don’t load it in memory just yet.)
The read does go through, but we get an error about a missing CRLF at the end, which I still need to debug.
However, the tests are not consistently passing yet.
It is obvious that there is a gap in my understanding of which steps should happen here, and of what the preconditions are for the handshake to be completed before propagation can start for real.
What exactly is expected at this point of the handshake for it to complete successfully and allow for propagation?
I think that the answer to this will lead to an understanding of what should be fixed and why it “sometimes works”.
I am very eager to understand this more than I am to simply pass the stage.
Thanks for all the help so far. I really appreciate it.
The latest version of my changes is submitted.
Finally managed to get it working consistently now.
The issue was that, after the handshake process completed, I was passing the master connection down for handling the commands instead of the replica connection.
I also noticed that the handshake process sometimes fails depending on the order and timing in which the contents of the RDB file are sent. It is not clear to me yet what the issue is, and I will continue trying to debug that part now.
The thing about TCP/socket programming is that Read operates at a lower level. It doesn’t interpret or understand the specific protocol being used.
This means there’s no guarantee that a single Read will return exactly one Redis command.
For example, after the master sends the empty RDB file and the GETACK command, the replica might Read both in one go: