Replication #ZN8: Feedback with more detail on problem?

I’m stuck on stage 11 of replication in the redis challenge.

As far as I can tell things should be operating as expected but I can’t tell what about the behavior isn’t meeting this step’s expectations.

Here are my logs:

[your_program] command: Set("baz", "789", None)            
[replication-11] client: Received bytes: "+OK\r\n"         
[replication-11] client: Received RESP value: "OK"         
[replication-11] Received "OK"                             
[your_program] replicating set command to slave (127.0.0.1, 37342): "*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\n123\r\n"   
[your_program] replicating set command to slave (127.0.0.1, 37342): "*3\r\n$3\r\nSET\r\n$3\r\nbar\r\n$3\r\n456\r\n"   
[your_program] replicating set command to slave (127.0.0.1, 37342): "*3\r\n$3\r\nSET\r\n$3\r\nbaz\r\n$3\r\n789\r\n"   
[replication-11] replica: Received bytes: ""               
[replication-11] Received: ""                              
[replication-11]            ^ error                        
[replication-11] Error: Expected start of a new RESP value (either +, -, :, $ or *)                                   
[replication-11] Test failed                               
[replication-11] Terminating program                       
[your_program] thread 'tokio-runtime-worker' panicked at src/main.rs:470:77:                                          
[your_program] failed to write set command to replication client: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }

And my code is on github

Thank you for reporting this!

From the logs point of view, improvements I can think of:

  • We should make it clear that after receiving the “OK”, we’re now checking for commands to be propagated on the replication connection. The next “Received bytes: xxx” log doesn’t make sense without this.
  • The replica: Received bytes: "" can be confusing when we actually didn’t read anything, we should probably omit that log entirely if we read 0 bytes.
  • The second Received: "" log is a tad confusing too, maybe we can say something more like Received: "" (no content received after 2 seconds).

Re: the issue itself, I think the last line is most relevant:

failed to write set command to replication client: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }

I’d expect this to happen if the connection was prematurely closed - looks like the handshake succeeded, so this must’ve happened after the handshake, and before the first command was propagated.

I haven’t taken a look at the code yet, thought I’d write down what I could infer from the logs first. Will take a look at the code sometime today!

@brittlewis12 I think I might have the wrong commit - can you try pushing the latest to GitHub? The one I pulled was dc18a209b5ce5cc7c6e1e28344cb7eceb65bc8d8: Add support for DEL command....... · brittlewis12/redis-codecrafters-rs@dc18a20 · GitHub.

On this commit, I don’t see the Received "OK" logs, it seems to be failing on the first SET command:

thanks so much for taking a look! very much appreciate your reflections regarding the logs.

that’s really strange, i just ran the same thing checking out that commit (i’ve been noodling around a bunch locally since) and i still got the same output (attached).

Aha, okay - will give this another go tomorrow! Will try running multiple times and see if the results are consistent + dig deeper.

Can confirm that I see the same logs now - my bad, I was running this against an old test runner that has a very old tester version.

Didn’t get far today, will revisit tomo!

@brittlewis12 Okay, think I’ve got an answer here!

Was about to go to bed and realized that there could be a conflict between the write & read parts since they use the same lock, and that turned out to be it:

Ensuring that the loop doesn’t loop again (i.e. lock for a read) after a stream is converted to a replication client fixes the issue here.

Logs:

This obviously isn’t a “proper” solution, since you’ll need to read from the client in later stages. Maybe using separate read / write locks will help?

This explains the behaviours we were seeing:

  • The “replicating to …” log happens before the lock is acquired, and at that point the program is waiting on the lock, which is held by the “read” call at the start of the loop.
  • The tester waiting for 2 seconds to receive a value, but eventually gave up. That caused the “Received: ‘’” error.
  • When the tester gave up, it closed the replica’s connection, causing the “broken pipe” error.
2 Likes

Ahh thank you so much Paul! I was already mulling over moving replication to use channels, but I’d have been a bit dissatisfied without understanding why that will resolve it. Thanks again!

1 Like

Note: I’ve updated the title of this post to include the stage ID (#ZN8). You can learn about the stages rename here: Upcoming change: Stages overhaul