Replication (#YG4) GET commands received before SET?

NexFlare · April 11, 2024, 11:34pm

I’m stuck on Stage 13.

I’ve tried creating a handshake with the master node. The tests pass some of the time. What I can see is that, before a SET command reaches the replica node, a GET command is called and returns nil. I made sure to add a lock on the getter and setter of my cache. I cannot find a way to run the GET command after the SET has succeeded.

Here are my logs:
This image shows my tests passing. If you notice, it is because the replica got the SET command before a GET.

Here is the screenshot that shows GET being called before SET.

And here are links to the relevant files:

github.com/NexFlare/codecrafters-redis-go/blob/master/internal/redis/index.go#L169

          	for i, s := range(handShakeArray) {
          		s = response.GetArrayString(response.GetBulkString(s))
          		go r.handleHandShakeRequest(conn, s, ch)
          		resp := <- ch
          		fmt.Println("Response complete ", i, s)
          		if resp == 0 {
          			fmt.Println("Error in handshake. Closing connection")
          			conn.Close()
          			return
          		}
          	}
          
          	for {
          		buffer := make([]byte, 128)
          		_, err = conn.Read(buffer)
          		if err == nil {
          			strBuffer := strings.Trim(string(buffer), "\x00")
          			if len(strBuffer) > 0 {
          				go r.handleCommand(conn, strBuffer, false)
          			}
          		}

github.com/NexFlare/codecrafters-redis-go/blob/master/internal/redis/handleCommand.go#L37

          	if r.Replication.Role == "master" {
          		f(response.GetSimpleString(fmt.Sprintf("FULLRESYNC %s 0", r.Replication.MasterReplid)))
          		fileContent := "UkVESVMwMDEx+glyZWRpcy12ZXIFNy4yLjD6CnJlZGlzLWJpdHPAQPoFY3RpbWXCbQi8ZfoIdXNlZC1tZW3CsMQQAPoIYW9mLWJhc2XAAP/wbjv+wP9aog=="
          		bin, err := base64.StdEncoding.DecodeString(fileContent)
          		if err == nil {
          			f(response.GetFileString(string(bin)))
          		}
          	}
          }
          
          func(r *Redis) handleSetCommand(cmd *command.Command, f func(string)) {
          	var responseString string
          	var hasError bool
          	var err error
          	if len(cmd.Arguments) == 2 {
          		r.Store.Set(cmd.Arguments[0], cmd.Arguments[1])
          		responseString = response.GetSimpleString("OK")
          	} else if len(cmd.Arguments) == 4 {
          		duration, err := strconv.Atoi(cmd.Arguments[3])
          		if err != nil {
          			responseString = response.GetSimpleString("ERROR")

Any hint or solution to tackle this problem is highly appreciated.

Thank you

naqet · April 15, 2024, 9:56am

I’m stuck on the same issue. I have the same problem with GET commands possibly being called before SET commands.

Stage 12 went through without any issues, and locally it seems to be working fine.

Here is my repo:
https://github.com/naqet/naqet-codecrafters-redis-go.

rohitpaulk · April 16, 2024, 12:34pm

@NexFlare / @naqet there’s a 500ms sleep between each retry there, and we’re retrying 5 times, so that’s 2.5 seconds in total - the tester assumes that should be sufficient time to have received the SET command.

I wonder if there’s an issue with a lock not being released? Can you try adding logs around when (a) a lock is acquired (b) a process is going to block waiting for a lock and (c) a lock is released? This might help pinpoint the issue.
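A minimal sketch of that kind of lock logging, assuming a mutex-guarded map as the cache. The `loggedMutex` and `store` types are hypothetical and are not taken from any repository linked in this thread.

```go
package main

import (
	"fmt"
	"sync"
)

// loggedMutex is a hypothetical wrapper that prints a line when a lock is
// requested, when it is acquired, and when it is released.
type loggedMutex struct {
	mu   sync.Mutex
	name string
}

func (l *loggedMutex) Lock(who string) {
	// Printed before the call that may block, so a missing "acquired"
	// line afterwards points at a lock that was never released.
	fmt.Printf("[%s] %s waiting for lock\n", l.name, who)
	l.mu.Lock()
	fmt.Printf("[%s] %s acquired lock\n", l.name, who)
}

func (l *loggedMutex) Unlock(who string) {
	fmt.Printf("[%s] %s releasing lock\n", l.name, who)
	l.mu.Unlock()
}

// store is a stand-in for the cache described in the first post.
type store struct {
	lock loggedMutex
	data map[string]string
}

func newStore() *store {
	return &store{lock: loggedMutex{name: "store"}, data: map[string]string{}}
}

func (s *store) Set(k, v string) {
	s.lock.Lock("SET " + k)
	defer s.lock.Unlock("SET " + k)
	s.data[k] = v
}

func (s *store) Get(k string) (string, bool) {
	s.lock.Lock("GET " + k)
	defer s.lock.Unlock("GET " + k)
	v, ok := s.data[k]
	return v, ok
}

func main() {
	s := newStore()
	s.Set("foo", "123")
	v, ok := s.Get("foo")
	fmt.Println(v, ok)
}
```

If every "waiting" line is followed by "acquired" and "releasing" lines for the same caller, the lock is probably not the cause.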

remuspoienar · April 17, 2024, 9:31am

Same issue here. I haven’t used channels or any intentional blocking so far. Just like @naqet, everything went well until now, and locally I can’t reproduce the scenario in the test. After running the tests multiple times I noticed that it is definitely a race condition; in most cases the replica manages to respond to 2 of the 3 SET commands.
@rohitpaulk could I be using locks without knowing, i.e. is net.Conn read/write based on a lock/channel behind the scenes?

naqet · April 17, 2024, 9:46am

I used sync.Map as database for simplicity, so there shouldn’t be any issues with locks as far as I know.

nishojib · April 17, 2024, 9:20pm

The issue could be related to the reply you get after PSYNC… sometimes it comes out like this:

+FULLRESYNC 75cd7bc10c49047e0d163660f3b90625b1af31dc 0\r\n$88\r\nREDIS0011\xfa\tredis-ver\x057.2.0\xfa\nredis-bits\xc0@\xfa\x05ctime\xc2m\b\xbce\xfa\bused-mem°\xc4\x10\x00\xfa\baof-base\xc0\x00\xff\xf0n;\xfe\xc0\xffZ\xa2*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\n123\r\n*3\r\n$3\r\nSET\r\n$3\r\nbar\r\n$3\r\n456\r\n*3\r\n$3\r\nSET\r\n$3\r\nbaz\r\n$3\r\n789\r\n

everything gets packed together so you need to figure out a way to parse queries like this…

you can put a print statement right after the PSYNC command is sent and check the reply you get… use fmt.Printf("%#v", string(buf[:n])) to also print the new line characters
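A small sketch of that debugging print. The `dumpRead` helper is hypothetical, and a `strings.Reader` stands in for the replication connection so the example runs on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// dumpRead is a hypothetical helper: it formats only the n bytes that a
// Read call actually filled, with \r and \n shown as escape sequences.
func dumpRead(buf []byte, n int) string {
	return fmt.Sprintf("%#v", string(buf[:n]))
}

func main() {
	// Stand-in for the bytes arriving right after PSYNC is sent.
	src := strings.NewReader("+FULLRESYNC abc 0\r\n*1\r\n$4\r\nPING\r\n")

	buf := make([]byte, 128)
	n, err := src.Read(buf)
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	fmt.Println(n, "bytes:", dumpRead(buf, n))
}
```

Slicing with `buf[:n]` keeps the unused zero bytes at the end of the buffer out of the output, so there is no need to trim `\x00` afterwards.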

rohitpaulk · April 18, 2024, 7:58am

Ah, yep this can be one cause here. The response to PSYNC, the RDB file and the propagated commands can be received all at once. This is because there’s no “wait” step (like waiting for a response) in between - so even though they’re sent “one by one”, we can’t know if your program received them and thus have to continue sending the others. So your program might receive all at once, or one by one.

We’ll incorporate this into the instructions!

remuspoienar · April 18, 2024, 7:45pm

@rohitpaulk I see that in stage 13 the tests do not boot the master the same way as in the stage 12 tests. I added some extra logging and the logs just don’t show, although they do appear in the stage 12 tests (see screenshots). Are we supposed to block only the replica commands (like GET foo) while the handshake is still in process? Really lost here. I’ve tried to solve this all week, initially without channels, now with channels for state, and I’m at the same point. What can I do to get more understanding of what is wrong with my code? The test doesn’t seem to be using my code for the master instance, otherwise logs would be the answer…

remuspoienar · April 18, 2024, 7:45pm

This is test 12 output with extra logs at the start

rohitpaulk · April 19, 2024, 12:04am

Hey @remuspoienar,

I understand this can be hard to debug, we’re actively thinking of ways to make our logs + instructions better. Hopefully once we’ve figured this out we’ll be able to make the experience smoother for others

The test doesn’t seem to be using my code for the master instance, otherwise logs would be the answer…

In stage 13, the tester acts as the master and only executes user’s code as a replica. The reason we do this is so that we can test the replica’s behaviour and ensure it is correct & matches the official Redis specification. If we didn’t do this, you might run into false positives, which would likely cause problems in later stages.

Are we supposed to block only the replica commands(like GET foo) while the handshake is still in process?

Hmm, not really.

When a Redis server is booted as the replica, it must do two things at once:

- Respond to Redis clients like usual on --port
- Initiate the handshake with a master, and once complete, continuously listen for propagated commands on the replication connection (the same one used for the handshake)

So there’s no blocking needed per se; however, the replica does need to complete the handshake before it can start receiving propagated commands. It can start responding to regular commands from clients right away; that isn’t blocked on anything.
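A rough sketch of those two activities running side by side against one shared store. Everything here is a simplification for illustration: the `kv` type and `replicate` function are hypothetical, and a channel stands in for the replication connection.

```go
package main

import (
	"fmt"
	"sync"
)

// kv is a stand-in for the replica's key-value store.
type kv struct {
	mu   sync.RWMutex
	data map[string]string
}

func (s *kv) set(k, v string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[k] = v
}

func (s *kv) get(k string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[k]
	return v, ok
}

// replicate stands in for "handshake, then replication loop". In a real
// replica it would read from the connection to the master; here the
// propagated writes arrive on a channel as [key, value] pairs.
func replicate(s *kv, propagated <-chan [2]string, done chan<- struct{}) {
	// The handshake would run here, before the loop starts.
	for w := range propagated {
		s.set(w[0], w[1])
	}
	close(done)
}

func main() {
	s := &kv{data: map[string]string{}}
	propagated := make(chan [2]string)
	done := make(chan struct{})

	// The replication side runs in its own goroutine...
	go replicate(s, propagated, done)

	// ...so a client GET is answered right away, even though no SET has
	// been propagated yet.
	_, ok := s.get("foo")
	fmt.Println("found before propagation:", ok)

	propagated <- [2]string{"foo", "123"}
	close(propagated)
	<-done

	v, ok := s.get("foo")
	fmt.Println("after propagation:", v, ok)
}
```

An early GET returning nil is expected under this design; a later retry sees the value once the propagated SET has been applied.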

remuspoienar · April 19, 2024, 10:22am

Thanks @rohitpaulk
Unfortunately I’m out of options, so I’ll just switch to another track, because locally all this works: I managed to connect 3 replicas, and they all get the propagated commands and have the data ready for Redis clients. For me this was an issue with the test, because I already pass the replica requirements locally; all handshakes complete smoothly, with proper logging and attention. I wouldn’t be here if I hadn’t done that first, I assure you, since it wouldn’t be fair to complain in that case.

As for the blocking part, I meant the replicas blocking until they finish the handshake and start the replication loop, reusing the connection, as you said, which is exactly what I’ve done as well. I hoped that adding some sleep on both master and replica would stop multiple commands from arriving in the same read buffer, but of course that is not the case since the test boots another master. Got that bit now, thanks.
But the replica still has to block if it receives a GET foo command while it’s doing the handshake. I tried the handshake in a goroutine and also in a blocking way; the result is the same. The GET commands are sent while the handshake is in progress.
I managed to solve that, and now I have to parse multiple commands from the same read operation in the replication loop. Every test run yields a different result there, which is why I tried delaying things.

NexFlare · April 19, 2024, 7:20pm

That is a great observation. I looked into this and it seems to have solved the problem.

ValentinJub · April 20, 2024, 9:23am

I’m facing the same issue as others: in stage 13 my replica receives all 3 requests at the same time, and now I have to find a way to parse that.

I feel frustrated to have to tweak my code to pass this particular stage’s tests. It’s the only one where I’m receiving many requests in one TCP stream, and like others I noted that it is not my master sending them, because however much delay I put between requests, it is ignored.

nishojib · April 21, 2024, 7:05am

I am glad it helped with your issue. That took me way too long to figure out… what worked was drawing it out on a piece of paper and tracking each and every request and response in both master and replicas with logging… another issue was that it was passing “sometimes”, which made things even more confusing… but that’s how software is, I suppose haha… it was fun nonetheless.

kamilogorek · April 22, 2024, 3:46pm

Can confirm that the issue was indeed the RDB file being squashed together with the SET commands. I needed to parse the remaining handshake buffer and apply the replicated commands to make it work. Thanks @nishojib!

system · April 27, 2024, 3:46pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.

system · June 4, 2024, 5:41am

Note: I’ve updated the title of this post to include the stage ID (#YG4). You can learn about the stages rename here: Upcoming change: Stages overhaul
