Sending output but not being recognised by tester?

I’m stuck on Stage #YG4.

It seems like my solution is writing the bytes to the output stream of the connection, but the same is not being read by the tester.

Here are my logs:

remote: [tester::#YG4] Running tests for Stage #YG4 (Replication - Command Processing)
remote: [tester::#YG4] Master is running on port 6379
remote: [tester::#YG4] $ ./your_program.sh --port 6380 --replicaof "localhost 6379"
remote: [tester::#YG4] [handshake] master: Waiting for replica to initiate handshake with "PING" command
remote: [your_program] Server started at port: 6380
remote: [your_program] [M :6379] Sending [42, 49, 13, 10, 36, 52, 13, 10, 80, 73]
remote: [tester::#YG4] [handshake] master: Received bytes: "*1\r\n$4\r\nPING\r\n"
remote: [tester::#YG4] [handshake] master: Received RESP array: ["PING"]
remote: [tester::#YG4] [handshake] Received ["PING"]
remote: [tester::#YG4] [handshake] master: Sent "PONG"
remote: [tester::#YG4] [handshake] master: Sent bytes: "+PONG\r\n"
remote: [tester::#YG4] [handshake] master: Waiting for replica to send "REPLCONF listening-port 6380" command
remote: [your_program] [M :6379] Sending [42, 51, 13, 10, 36, 56, 13, 10, 82, 69]
remote: [tester::#YG4] [handshake] master: Received bytes: "*3\r\n$8\r\nREPLCONF\r\n$14\r\nlistening-port\r\n$4\r\n6380\r\n"
remote: [tester::#YG4] [handshake] master: Received RESP array: ["REPLCONF", "listening-port", "6380"]
remote: [tester::#YG4] [handshake] Received ["REPLCONF", "listening-port", "6380"]
remote: [tester::#YG4] [handshake] master: Sent "OK"
remote: [tester::#YG4] [handshake] master: Sent bytes: "+OK\r\n"
remote: [tester::#YG4] [handshake] master: Waiting for replica to send "REPLCONF capa" command
remote: [tester::#YG4] [handshake] master: Received bytes: "*3\r\n$8\r\nREPLCONF\r\n$4\r\ncapa\r\n$6\r\npsync2\r\n"
remote: [tester::#YG4] [handshake] master: Received RESP array: ["REPLCONF", "capa", "psync2"]
remote: [tester::#YG4] [handshake] Received ["REPLCONF", "capa", "psync2"]
remote: [your_program] [M :6379] Sending [42, 51, 13, 10, 36, 56, 13, 10, 82, 69]
remote: [tester::#YG4] [handshake] master: Sent "OK"
remote: [tester::#YG4] [handshake] master: Sent bytes: "+OK\r\n"
remote: [tester::#YG4] [handshake] master: Waiting for replica to send "PSYNC" command
remote: [your_program] [M :6379] Sending [42, 51, 13, 10, 36, 53, 13, 10, 80, 83]
remote: [tester::#YG4] [handshake] master: Received bytes: "*3\r\n$5\r\nPSYNC\r\n$1\r\n?\r\n$2\r\n-1\r\n"
remote: [tester::#YG4] [handshake] master: Received RESP array: ["PSYNC", "?", "-1"]
remote: [tester::#YG4] [handshake] Received ["PSYNC", "?", "-1"]
remote: [tester::#YG4] [handshake] master: Sent "FULLRESYNC 75cd7bc10c49047e0d163660f3b90625b1af31dc 0"
remote: [tester::#YG4] [handshake] master: Sent bytes: "+FULLRESYNC 75cd7bc10c49047e0d163660f3b90625b1af31dc 0\r\n"
remote: [tester::#YG4] [handshake] Sending RDB file...
remote: [tester::#YG4] [handshake] master: Sent bytes: "$88\r\nREDIS0011\xfa\tredis-ver\x057.2.0\xfa\nredis-bits\xc0@\xfa\x05ctime\xc2m\b\xbce\xfa\bused-mem°\xc4\x10\x00\xfa\baof-base\xc0\x00\xff\xf0n;\xfe\xc0\xffZ\xa2"
remote: [tester::#YG4] [handshake] Sent RDB file.
remote: [tester::#YG4] [propagation] master: > SET foo 123
remote: [tester::#YG4] [propagation] master: Sent bytes: "*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\n123\r\n"
remote: [tester::#YG4] [propagation] master: > SET bar 456
remote: [tester::#YG4] [propagation] master: Sent bytes: "*3\r\n$3\r\nSET\r\n$3\r\nbar\r\n$3\r\n456\r\n"
remote: [tester::#YG4] [propagation] master: > SET baz 789
remote: [tester::#YG4] [propagation] master: Sent bytes: "*3\r\n$3\r\nSET\r\n$3\r\nbaz\r\n$3\r\n789\r\n"
remote: [tester::#YG4] [test] Getting key foo
remote: [your_program] [Handshake] Slave role with master replID: 75cd7bc10c49047e0d163660f3b90625b1af31dc and replOffset: 0
remote: [tester::#YG4] [test] client: $ redis-cli GET foo
remote: [tester::#YG4] [test] client: Sent bytes: "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"
remote: [your_program] [Handshake] [RDB File] Recieved 88 bytes -> [82,69,68,73,83,48,48,49,49,-6]...
remote: [your_program] [Handshake] [RDB File] Load failed: java.nio.BufferUnderflowException
remote: [your_program] [Handshake] Completed successfully
remote: [your_program] [:49424] Accepted connection
remote: [your_program] [C :49424] [143029 ms] (GET, foo)
remote: [your_program] [M :6379] [143031 ms] (SET, foo, 123)
remote: [your_program] [M :6379] [143032 ms] (SET, bar, 456)
remote: [your_program] [M :6379] [143033 ms] (SET, baz, 789)
remote: [your_program] [C :49424] Sending [36, 45, 49, 13, 10]
remote: [tester::#YG4] [test] client: Received bytes: "$-1\r\n"
remote: [tester::#YG4] [test] client: Received RESP null bulk string: "$-1\r\n"
remote: [tester::#YG4] [test] Retrying... (1/5 attempts)
remote: [tester::#YG4] [test] client: > GET foo
remote: [tester::#YG4] [test] client: Sent bytes: "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"
remote: [your_program] [C :49424] [143540 ms] (GET, foo)
remote: [your_program] [C :49424] Sending [36, 51, 13, 10, 49, 50, 51, 13, 10]
remote: [tester::#YG4] [test] timed out, test exceeded 10 seconds
remote: [tester::#YG4] [test] Test failed
remote: [tester::#YG4] [test] Terminating program
remote: [tester::#YG4] [test] Error terminating program: 'execution timed out'
remote: 
remote: View our article on debugging test failures: https://codecrafters.io/debug
remote: 

You can see from the logs that:

  • The server performs the handshake appropriately, exchanges the RDB file with the master
  • The master’s propagated commands are received by the replica server, but no response is sent back to the master.
remote: [your_program] [M :6379] [143031 ms] (SET, foo, 123)
remote: [your_program] [M :6379] [143032 ms] (SET, bar, 456)
remote: [your_program] [M :6379] [143033 ms] (SET, baz, 789)
  • For both of the client queries, the server writes some bytes to the output stream:
remote: [your_program] [C :49424] [143029 ms] (GET, foo)
...
remote: [your_program] [C :49424] Sending [36, 45, 49, 13, 10]
remote: [tester::#YG4] [test] client: Received bytes: "$-1\r\n"
remote: [tester::#YG4] [test] client: Received RESP null bulk string: "$-1\r\n"
remote: [tester::#YG4] [test] Retrying... (1/5 attempts)
remote: [tester::#YG4] [test] client: > GET foo
remote: [tester::#YG4] [test] client: Sent bytes: "*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"
remote: [your_program] [C :49424] [143540 ms] (GET, foo)
remote: [your_program] [C :49424] Sending [36, 51, 13, 10, 49, 50, 51, 13, 10]

But the last command is not read by the tester, and the whole test times out.

I tried replicating the tester’s scenarios locally with two instances of my server (one as master, and the other as client), and all seems to work well. I can’t seem to figure out the issue.

You can checkout the code here.

I am super unsure, but I think it’s a bug in the tester?

From what I could gather, in Golang, defer only returns when the enclosing function exits, and not the enclosing scope. For example:

package main

import (
	"fmt"
	"sync"
)

func main() {
	mtx := sync.Mutex{}
	
	for i := range 5 {
		fmt.Printf("Iteration: %d\n", i)
		mtx.Lock();
		defer mtx.Unlock();
	}
	
	fmt.Println("ended")
}

If you are to run this code, you would notice that the execution gets stuck after the second iteration as we call lock on the mutex mtx, but the deferred unlock is never called:

I think the same is happening in the tester’s Run function for the SendCommandTestCase testcase:

The outline of this function is also analogous to the example that I have shown above:

  • We acquire the lock on readMutex and call defer in the retry loop to unlock it
  • We read the value from the connection
  • If the shouldRetryFunc returns true, then we busy run the same loop again. This is problematic as the defer for the current instance of the code is never called and leads to a deadlock!

I tried to add detailed logs to my current implementation and ran it multiple times, and this deadlock seems to be the reason this test is failing. I might be making some silly error in understanding here (given that the test has been functioning well for quite a while now), so would appreciate a sanity check here.

Seems like the relevant commit is only a month old, so it might be possible that it may not have been caught after all.

Hey @EshaanAgg, great catch! I’ve pushed a fix and upgraded the tester version for you. Let me know if everything works as expected!

Thanks for the quick fix! Even my tests pass for the stage now. Glad to know that after all those hours of turmoil, my implementation wasn’t completely off :slight_smile:

Might be a bit out of scope, but this gave me an idea for a devops pipeline that might be relevant for the team in future: whenever we are refactoring existing stages, we can take some of the completed solutions from the users and also run the updated tests on them (along with the official solutions). This might help to catch such regressions, which may be caught only on “almost” correct implementations!

Thanks again!

2 Likes

We’re working on this as part of our recent work on code examples :slightly_smiling_face: Before every tester is deployed we’ll test 100 or so code examples to observe where we get different results.

1 Like

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.