Replication Stage (#XV6) Slave receive nothing on occasion

I’m stuck on Replication stage 14.

In stage 13, I have already support that the rdb file and commands may come from the TcpStream together and I can always pass the test. However, in stage 14 it seems that the slave sometimes just get nothing when the master send GetAck command. I add some logs, and everything seems fine (my read buf has length of 1024, and the stream is connecting to master).

Here are my logs in the situation that I passed the test:

[replication-14] Running tests for Replication > Stage #14: ACKs with no commands
[replication-14] Master is running on port 6379
[replication-14] $ ./spawn_redis_server.sh --port 6380 --replicaof localhost 6379
[replication-14] master: Received bytes: "*1\r\n$4\r\nPING\r\n"
[replication-14] master: Received RESP value: ["PING"]
[replication-14] Received ["PING"]
[your_program] slave connect with master: peer: 127.0.0.1:6379 local: 127.0.0.1:50914
[replication-14] master: Sent "PONG"
[replication-14] master: Sent bytes: "+PONG\r\n"
[replication-14] master: Received bytes: "*3\r\n$8\r\nREPLCONF\r\n$14\r\nlistening-port\r\n$4\r\n6380\r\n"
[replication-14] master: Received RESP value: ["REPLCONF", "listening-port", "6380"]
[replication-14] Received ["REPLCONF", "listening-port", "6380"]
[replication-14] master: Sent "OK"
[replication-14] master: Sent bytes: "+OK\r\n"
[replication-14] master: Received bytes: "*3\r\n$8\r\nREPLCONF\r\n$4\r\ncapa\r\n$6\r\npsync2\r\n"
[replication-14] master: Received RESP value: ["REPLCONF", "capa", "psync2"]
[replication-14] Received ["REPLCONF", "capa", "psync2"]
[replication-14] master: Sent "OK"
[replication-14] master: Sent bytes: "+OK\r\n"
[replication-14] master: Received bytes: "*3\r\n$5\r\nPSYNC\r\n$1\r\n?\r\n$2\r\n-1\r\n"
[replication-14] master: Received RESP value: ["PSYNC", "?", "-1"]
[replication-14] Received ["PSYNC", "?", "-1"]
[replication-14] master: Sent "FULLRESYNC 75cd7bc10c49047e0d163660f3b90625b1af31dc 0"
[replication-14] master: Sent bytes: "+FULLRESYNC 75cd7bc10c49047e0d163660f3b90625b1af31dc 0\r\n"
[replication-14] Sending RDB file...
[replication-14] master: Sent bytes: "$88\r\nREDIS0011\xfa\tredis-ver\x057.2.0\xfa\nredis-bits\xc0@\xfa\x05ctime\xc2m\b\xbce\xfa\bused-mem°\xc4\x10\x00\xfa\baof-base\xc0\x00\xff\xf0n;\xfe\xc0\xffZ\xa2"
[replication-14] Sent RDB file.
[replication-14] master: $ redis-cli REPLCONF GETACK *
[replication-14] master: Sent bytes: "*3\r\n$8\r\nREPLCONF\r\n$6\r\nGETACK\r\n$1\r\n*\r\n"
[your_program] connection to master: peer: 127.0.0.1:6379, local: 127.0.0.1:50914
[your_program] slave buf len: 1024, cursor: 0
[your_program] slave read 130 bytes from master
[your_program] get rdb file, buf remains: 37
[your_program] get command array, buf remains: 0
[your_program] get command in array: ["replconf", "getack", "*"]
[your_program] slave get command: Replconf(GetAck)
[your_program] offset is Some(0), length is 31
[your_program] slave need response: [Array([SimpleString("REPLCONF"), SimpleString("ACK"), SimpleString("0")])]
[your_program] connection to master: peer: 127.0.0.1:6379, local: 127.0.0.1:50914
[your_program] slave buf len: 1024, cursor: 0
[replication-14] master: Received bytes: "*3\r\n+REPLCONF\r\n+ACK\r\n+0\r\n"
[replication-14] master: Received RESP value: ["REPLCONF", "ACK", "0"]
[replication-14] Received ["REPLCONF", "ACK", "0"]
[replication-14] Test passed.
[replication-14] Terminating program
[your_program] slave read 0 bytes from master
[replication-14] Program terminated successfully

Here is the log in situation that I did not pass the test

remote: [replication-14] Running tests for Replication > Stage #14: ACKs with no commands
remote: [replication-14] Master is running on port 6379
remote: [replication-14] $ ./spawn_redis_server.sh --port 6380 --replicaof localhost 6379
remote: [your_program] slave connect with master: peer: 127.0.0.1:6379 local: 127.0.0.1:41210
remote: [replication-14] master: Received bytes: "*1\r\n$4\r\nPING\r\n"
remote: [replication-14] master: Received RESP value: ["PING"]
remote: [replication-14] Received ["PING"]
remote: [replication-14] master: Sent "PONG"
remote: [replication-14] master: Sent bytes: "+PONG\r\n"
remote: [replication-14] master: Received bytes: "*3\r\n$8\r\nREPLCONF\r\n$14\r\nlistening-port\r\n$4\r\n6380\r\n"
remote: [replication-14] master: Received RESP value: ["REPLCONF", "listening-port", "6380"]
remote: [replication-14] Received ["REPLCONF", "listening-port", "6380"]
remote: [replication-14] master: Sent "OK"
remote: [replication-14] master: Sent bytes: "+OK\r\n"
remote: [replication-14] master: Received bytes: "*3\r\n$8\r\nREPLCONF\r\n$4\r\ncapa\r\n$6\r\npsync2\r\n"
remote: [replication-14] master: Received RESP value: ["REPLCONF", "capa", "psync2"]
remote: [replication-14] Received ["REPLCONF", "capa", "psync2"]
remote: [replication-14] master: Sent "OK"
remote: [replication-14] master: Sent bytes: "+OK\r\n"
remote: [replication-14] master: Received bytes: "*3\r\n$5\r\nPSYNC\r\n$1\r\n?\r\n$2\r\n-1\r\n"
remote: [replication-14] master: Received RESP value: ["PSYNC", "?", "-1"]
remote: [replication-14] Received ["PSYNC", "?", "-1"]
remote: [replication-14] master: Sent "FULLRESYNC 75cd7bc10c49047e0d163660f3b90625b1af31dc 0"
remote: [replication-14] master: Sent bytes: "+FULLRESYNC 75cd7bc10c49047e0d163660f3b90625b1af31dc 0\r\n"
remote: [replication-14] Sending RDB file...
remote: [replication-14] master: Sent bytes: "$88\r\nREDIS0011\xfa\tredis-ver\x057.2.0\xfa\nredis-bits\xc0@\xfa\x05ctime\xc2m\b\xbce\xfa\bused-mem°\xc4\x10\x00\xfa\baof-base\xc0\x00\xff\xf0n;\xfe\xc0\xffZ\xa2"
remote: [replication-14] Sent RDB file.
remote: [replication-14] master: $ redis-cli REPLCONF GETACK *
remote: [replication-14] master: Sent bytes: "*3\r\n$8\r\nREPLCONF\r\n$6\r\nGETACK\r\n$1\r\n*\r\n"
remote: [your_program] connection to master: peer: 127.0.0.1:6379, local: 127.0.0.1:41210
remote: [your_program] slave buf len: 1024, cursor: 0
remote: [replication-14] Received: "" (no content received)
remote: [replication-14]            ^ error
remote: [replication-14] Error: Expected start of a new RESP value (either +, -, :, $ or *)
remote: [replication-14] Test failed
remote: [replication-14] Terminating program
remote: [your_program] slave read 0 bytes from master
remote: [replication-14] Program terminated successfully

And here’s a snippet of my code that the slave listens to master

i                s.spawn(move || {
                    let mut buf = BytesMut::zeroed(1024);
                    let mut cursor = 0;
                    let mut offset = None;
                    loop {
                        println!(
                            "connection to master: peer: {}, local: {}",
                            connect.peer_addr().unwrap(),
                            connect.local_addr().unwrap()
                        );
                        println!("slave buf len: {}, cursor: {}", buf.len(), cursor);
                        let n = connect
                            .read(&mut buf[cursor..])
                            .context("read command from master")
                            .unwrap();
                        println!("slave read {n} bytes from master");
                        if n == 0 {
                            break;
                        }
   
                        let _ = buf.split_off(n);
// new_cursor is used, when the buf do not include the whole command, so next read will start will new_cursor
                        let (protocols, new_cursor) = Decoder
                            .decode_protocol(&mut buf)
                            .context("decode input bytes to protocol")
                            .unwrap();
                        protocols
                            .into_iter()
                            .for_each(|DecodedProtocol { protocol, length }| {
                                match protocol {
                                    RedisProtocol::Rdb(_) => {
                                        // TODO: deal with rdb file
                                    }
                                    RedisProtocol::Array(a) => {
                                        let command = RedisCommand::parse_from(a)
                                            .context(
                                                "parse redis command from array of string protocol",
                                            )
                                            .unwrap();
                                        println!("slave get command: {command:?}");
                                        let response = ref_self
                                            .execute(&command, &map, None, &mut offset, length)
                                            .context("execute command from master")
                                            .unwrap();
                                        println!("slave need response: {response:?}");
                                        if matches!(
                                            command,
                                            RedisCommand::Replconf(ReplconfCommand::GetAck)
                                        ) {
                                            response.iter().for_each(|r| {
                                                connect
                                                    .write_all(&r.as_bytes())
                                                    .context("write GetAck response to master")
                                                    .unwrap();
                                            });
                                        }
                                    }
                                    _ => {
                                        eprintln!(
                                        "other redis protocols are not support as input command"
                                        )
                                    }
                                }
                            });
                        match new_cursor {
                            Some(c) => cursor = c,
                            None => {
                                buf.resize(1024, 0);
                                cursor = 0
                            }
                        }
                    }
                });

I know the code may be too complicated… I just wanna show where I print the log and basic logic. Looking forward to you help.

Sorry for the delay here, I’ll take a look at this today!

@jjakj1 took a look at this, the flakiness has to do with the fact that after the handshake the program reads a value from the master and discards it (It must be expecting FULLRESYNC + the RDB file).

I added logs to the bytes that were read:

In a failure case, here’s what the value turns out to be:

(See how the GETACK is present in that data that was discarded?)

This is a case that needs to be handled - a TCP “read” isn’t guaranteed to match a single RESP value sent by a Redis server. You might receive a partial value, or you might receive multiple values at once (you’ll need to parse the data to figure out which case it is)

Hope this helps!

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.

Note: I’ve updated the title of this post to include the stage ID (#XV6). You can learn about the stages rename here: Upcoming change: Stages overhaul.