which expects a timeout of 500 ms. However, my server reaches that timeout with 0 responses from the replicas, so the test fails. I tried adding a readTimeout for the same duration, but it still failed, so I removed it.
Please let me know what I am doing wrong.
Here are my logs:
[replication-17] client: $ redis-cli WAIT 3 500
[replication-17] client: Sent bytes: "*3\r\n$4\r\nWAIT\r\n$1\r\n3\r\n$3\r\n500\r\n"
[your_program] 2024-04-28T07:22:34.579Z INFO master:6379: "Connection received from \"[::1]:36720\""
[your_program] 2024-04-28T07:22:34.579Z INFO master:6379: "Received command: \"*3\\r\\n$4\\r\\nWAIT\\r\\n$1\\r\\n3\\r\\n$3\\r\\n500\""
[your_program] 2024-04-28T07:22:34.579Z INFO master:6379: "Splitted Commands: [*3\r\n$4\r\nWAIT\r\n$1\r\n3\r\n$3\r\n500\r\n]"
[your_program] 2024-04-28T07:22:34.579Z INFO master:6379: "Waiting for 500 milliseconds"
[your_program] 2024-04-28T07:22:34.579Z INFO master:6379: "ACK request sent to replica server \"[::1]:36658\""
[your_program] 2024-04-28T07:22:34.579Z INFO master:6379: "ACK request sent to replica server \"[::1]:36670\""
[your_program] 2024-04-28T07:22:34.579Z INFO master:6379: "ACK request sent to replica server \"[::1]:36698\""
[your_program] 2024-04-28T07:22:34.579Z INFO master:6379: "ACK request sent to replica server \"[::1]:36712\""
[your_program] 2024-04-28T07:22:34.579Z INFO master:6379: "ACK request sent to replica server \"[::1]:36650\""
[your_program] 2024-04-28T07:22:34.579Z INFO master:6379: "ACK request sent to replica server \"[::1]:36686\""
[your_program] 2024-04-28T07:22:35.180Z INFO master:6379: "Timeout reached as per the WAIT command"
[your_program] 2024-04-28T07:22:35.180Z INFO master:6379: "Command is \"wait\" --> Response is \":0\\r\\n\""
[replication-17] client: Received bytes: ":0\r\n"
[replication-17] client: Received RESP value: 0
[replication-17] Expected 6, got 0
[replication-17] Test failed
[replication-17] Terminating program
[your_program] 2024-04-28T07:22:35.188Z INFO master:6379: "Connection closed by \"[::1]:36720\""
[replication-17] Program terminated successfully
And here’s a snippet of my code:
for {
    select {
    case ack := <-respChan:
        log.LogInfo(fmt.Sprintf("ACK received: %t", ack))
        totalAcks++
        if ack {
            ackReceived++
        }
        if totalAcks == replicaServersCount {
            return ackReceived, nil
        }
    case <-ctx.Done():
        log.LogInfo("Timeout reached as per the WAIT command")
        return ackReceived, nil
    case <-defaultTimeout.C:
        log.LogInfo("No response received from replica servers in default timeout")
        return ackReceived, nil
    }
}
It seems that in stage 17, when there is no previous write command, you don’t actually need to send the GETACK command; you can just return the number of connected replicas. I don’t know if that is the intention of the challenge or not, but it seems to work in my code.
Does this not make stage 17 incompatible with stage 18? My code passes stage 18 when I default to returning the number of acknowledged replicas on timeout, but it only passes stage 17 if I default to returning the total connected replicas. Can you update the stage 17 test so that the replicas will respond to all “REPLCONF GETACK *” requests?
Relevant code that’s working for stage 18:
scope.launch {
    val numReplicas = args[0].toInt()
    val timeout = args[1].toLong()
    val connectedReplicaCount = replicaManager.getReplicaCount()
    if (connectedReplicaCount == 0) {
        println("No replicas connected")
        socket.write(Buffer.buffer(RespSerializer().createInteger(0)))
        return@launch
    }
    val result =
        withTimeoutOrNull(timeout) {
            while (replicaManager.getAcknowledgmentCount() < connectedReplicaCount && replicaManager.getAcknowledgmentCount() < numReplicas) {
                delay(10) // Check every 10 ms
                println(
                    "Waiting for $numReplicas replicas to acknowledge. Current count: ${replicaManager.getAcknowledgmentCount()}",
                )
            }
            replicaManager.getAcknowledgmentCount()
        } ?: replicaManager.getAcknowledgmentCount().also {
            println("Timeout waiting for replicas to acknowledge")
        }
    socket.write(Buffer.buffer(RespSerializer().createInteger(result)))
}
Here’s what an actual Redis server does (we test against official Redis binaries 100s of times on each release to make sure our tester isn’t faulty):
A master keeps track of the current replication offset, and the replication offset of every connected replica.
The current replication offset is incremented every time a new write command is sent to the replication stream.
The master’s view of each replica’s replication offset is updated whenever a response to GETACK is received.
When a WAIT command is received:
A master first checks its replication offset against the offset it has for connected replicas - if all values are equal, it can safely return the number of connected replicas without having to issue GETACK requests.
If the current replication offset is ahead of what the master has recorded for connected replicas, the master issues GETACK requests to all replicas and waits for responses before responding.
Note: I’ve updated the title of this post to include the stage ID (#NA2). You can learn about the stages rename here: Upcoming change: Stages overhaul.