Redis Replication #NA2: Timeout Issue

I’m stuck on this stage.

So, I received the command as:

*3\r\n$4\r\nWAIT\r\n$1\r\n3\r\n$3\r\n500

which specifies a timeout of 500 ms. However, my handler hits that timeout after 500 ms with 0 responses received, which causes the test to fail. I tried adding a readTimeout for that duration, but it still failed, so I removed it.

Please let me know what I am doing wrong.

Here are my logs:

[replication-17] client: $ redis-cli WAIT 3 500
[replication-17] client: Sent bytes: "*3\r\n$4\r\nWAIT\r\n$1\r\n3\r\n$3\r\n500\r\n"
[your_program] 2024-04-28T07:22:34.579Z INFO    master:6379: "Connection received from \"[::1]:36720\""
[your_program] 2024-04-28T07:22:34.579Z INFO    master:6379: "Received command: \"*3\\r\\n$4\\r\\nWAIT\\r\\n$1\\r\\n3\\r\\n$3\\r\\n500\""
[your_program] 2024-04-28T07:22:34.579Z INFO    master:6379: "Splitted Commands: [*3\r\n$4\r\nWAIT\r\n$1\r\n3\r\n$3\r\n500\r\n]"
[your_program] 2024-04-28T07:22:34.579Z INFO    master:6379: "Waiting for 500 milliseconds"
[your_program] 2024-04-28T07:22:34.579Z INFO    master:6379: "ACK request sent to replica server \"[::1]:36658\""
[your_program] 2024-04-28T07:22:34.579Z INFO    master:6379: "ACK request sent to replica server \"[::1]:36670\""
[your_program] 2024-04-28T07:22:34.579Z INFO    master:6379: "ACK request sent to replica server \"[::1]:36698\""
[your_program] 2024-04-28T07:22:34.579Z INFO    master:6379: "ACK request sent to replica server \"[::1]:36712\""
[your_program] 2024-04-28T07:22:34.579Z INFO    master:6379: "ACK request sent to replica server \"[::1]:36650\""
[your_program] 2024-04-28T07:22:34.579Z INFO    master:6379: "ACK request sent to replica server \"[::1]:36686\""
[your_program] 2024-04-28T07:22:35.180Z INFO    master:6379: "Timeout reached as per the WAIT command"
[your_program] 2024-04-28T07:22:35.180Z INFO    master:6379: "Command is \"wait\" --> Response is \":0\\r\\n\""
[replication-17] client: Received bytes: ":0\r\n"
[replication-17] client: Received RESP value: 0
[replication-17] Expected 6, got 0
[replication-17] Test failed
[replication-17] Terminating program
[your_program] 2024-04-28T07:22:35.188Z INFO    master:6379: "Connection closed by \"[::1]:36720\""
[replication-17] Program terminated successfully

And here’s a snippet of my code:


	for {
		select {
		case ack := <-respChan:
			log.LogInfo(fmt.Sprintf("ACK received: %t", ack))
			totalAcks++
			if ack {
				ackReceived++
			}
			// Stop early once every replica has answered.
			if totalAcks == replicaServersCount {
				return ackReceived, nil
			}
		case <-ctx.Done():
			// The WAIT timeout expired: report the ACKs collected so far.
			log.LogInfo("Timeout reached as per the WAIT command")
			return ackReceived, nil
		case <-defaultTimeout.C:
			log.LogInfo("No response received from replica servers in default timeout")
			return ackReceived, nil
		}
	}

It seems that in stage 17, when there is no previous write command, you don’t actually need to send the ACK request; just return the number of connected replicas. I don’t know whether that is the intention of the challenge or not, but it seems to work in my code.

Yes, that was the case. I spent at least a week on this!
Thank you 🙂

Yep! We’ll make this clearer when we work on the instructions for this stage. Sorry it’s taken so long.

Does this not make stage 17 incompatible with stage 18? My code passes stage 18 when I default to returning the number of acknowledged replicas on timeout, but it only passes stage 17 if I default to returning the total connected replicas. Can you update the stage 17 test so that the replicas will respond to all “REPLCONF GETACK *” requests?

Relevant code that’s working for stage 18:

scope.launch {
            val numReplicas = args[0].toInt()
            val timeout = args[1].toLong()
            val connectedReplicaCount = replicaManager.getReplicaCount()
            if (connectedReplicaCount == 0) {
                println("No replicas connected")
                socket.write(Buffer.buffer(RespSerializer().createInteger(0)))
                return@launch
            }

            val result =
                withTimeoutOrNull(timeout) {
                    while (replicaManager.getAcknowledgmentCount() < connectedReplicaCount && replicaManager.getAcknowledgmentCount() < numReplicas) {
                        delay(10) // Check every 10 ms
                        println(
                            "Waiting for $numReplicas replicas to acknowledge. Current count: ${replicaManager.getAcknowledgmentCount()}",
                        )
                    }
                    replicaManager.getAcknowledgmentCount()
                } ?: replicaManager.getAcknowledgmentCount().also {
                    println("Timeout waiting for replicas to acknowledge")
                }

            socket.write(Buffer.buffer(RespSerializer().createInteger(result)))
        }

Ah, I can see how this’d be confusing.

Here’s what an actual Redis server does (we test against official Redis binaries hundreds of times on each release to make sure our tester isn’t faulty):

  • A master keeps track of the current replication offset, and the replication offset of every connected replica.
    • The current replication offset is incremented every time a new write command is sent to the replication stream.
    • The master’s view of each replica’s replication offset is updated whenever a response to GETACK is received.
  • When a WAIT command is received
    • A master first checks its replication offset against the offset it has for connected replicas - if all values are equal, it can safely return the number of connected replicas without having to issue GETACK requests.
    • If the current replication offset is ahead of what the master has recorded for connected replicas, the master issues GETACK requests to all replicas and waits for responses before responding.

That was very helpful, thanks! I definitely overlooked that when reading the docs. I got it working now with a small change.

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.

Note: I’ve updated the title of this post to include the stage ID (#NA2). You can learn about the stages rename here: Upcoming change: Stages overhaul.