Redis Replication #NA2: Timeout Issue

JayeshBaldawa · April 28, 2024, 7:33am

I’m stuck on this stage.

So, I received the command as:

*3\r\n$4\r\nWAIT\r\n$1\r\n3\r\n$3\r\n500

which specifies a timeout of 500 ms. However, my server hits that timeout after receiving 0 responses from the replicas, so the test fails. I tried adding a readTimeout for that duration, but it still failed, so I removed it.

Please let me know what I am doing wrong.

Here are my logs:

[replication-17] client: $ redis-cli WAIT 3 500
[replication-17] client: Sent bytes: "*3\r\n$4\r\nWAIT\r\n$1\r\n3\r\n$3\r\n500\r\n"
[your_program] 2024-04-28T07:22:34.579Z INFO    master:6379: "Connection received from \"[::1]:36720\""
[your_program] 2024-04-28T07:22:34.579Z INFO    master:6379: "Received command: \"*3\\r\\n$4\\r\\nWAIT\\r\\n$1\\r\\n3\\r\\n$3\\r\\n500\""
[your_program] 2024-04-28T07:22:34.579Z INFO    master:6379: "Splitted Commands: [*3\r\n$4\r\nWAIT\r\n$1\r\n3\r\n$3\r\n500\r\n]"
[your_program] 2024-04-28T07:22:34.579Z INFO    master:6379: "Waiting for 500 milliseconds"
[your_program] 2024-04-28T07:22:34.579Z INFO    master:6379: "ACK request sent to replica server \"[::1]:36658\""
[your_program] 2024-04-28T07:22:34.579Z INFO    master:6379: "ACK request sent to replica server \"[::1]:36670\""
[your_program] 2024-04-28T07:22:34.579Z INFO    master:6379: "ACK request sent to replica server \"[::1]:36698\""
[your_program] 2024-04-28T07:22:34.579Z INFO    master:6379: "ACK request sent to replica server \"[::1]:36712\""
[your_program] 2024-04-28T07:22:34.579Z INFO    master:6379: "ACK request sent to replica server \"[::1]:36650\""
[your_program] 2024-04-28T07:22:34.579Z INFO    master:6379: "ACK request sent to replica server \"[::1]:36686\""
[your_program] 2024-04-28T07:22:35.180Z INFO    master:6379: "Timeout reached as per the WAIT command"
[your_program] 2024-04-28T07:22:35.180Z INFO    master:6379: "Command is \"wait\" --> Response is \":0\\r\\n\""
[replication-17] client: Received bytes: ":0\r\n"
[replication-17] client: Received RESP value: 0
[replication-17] Expected 6, got 0
[replication-17] Test failed
[replication-17] Terminating program
[your_program] 2024-04-28T07:22:35.188Z INFO    master:6379: "Connection closed by \"[::1]:36720\""
[replication-17] Program terminated successfully

And here’s a snippet of my code:


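	for {
		select {
		case ack := <-respChan:
			// Count every response, and count positive ACKs separately.
			log.LogInfo(fmt.Sprintf("ACK received: %t", ack))
			totalAcks++
			if ack {
				ackReceived++
			}
			// Stop early once every replica has responded.
			if totalAcks == replicaServersCount {
				return ackReceived, nil
			}
		case <-ctx.Done():
			// The timeout given in the WAIT command expired.
			log.LogInfo("Timeout reached as per the WAIT command")
			return ackReceived, nil
		case <-defaultTimeout.C:
			// Fallback timer fired before all replicas responded.
			log.LogInfo("No response received from replica servers in default timeout")
			return ackReceived, nil
		}
	}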

mtfcd · April 28, 2024, 11:13am

It seems that in stage 17, when there is no previous write command, you don’t actually need to send the ACK command; just return the number of connected replicas. I don’t know whether that is the intention of the challenge or not, but it seems to work in my code.

JayeshBaldawa · May 1, 2024, 4:38pm

Yes, that was the case. Spent at least a week on this!
Thank you

rohitpaulk · May 4, 2024, 1:25pm

Yep! We’ll make this clearer when we work on the instructions for this stage. Sorry it’s taken so long.

malshoff · May 8, 2024, 5:16am

Does this not make stage 17 incompatible with stage 18? My code passes stage 18 when I default to returning the number of acknowledged replicas on timeout, but it only passes stage 17 if I default to returning the total connected replicas. Can you update the stage 17 test so that the replicas will respond to all “REPLCONF GETACK *” requests?

Relevant code that’s working for stage 18:

scope.launch {
            val numReplicas = args[0].toInt()
            val timeout = args[1].toLong()
            val connectedReplicaCount = replicaManager.getReplicaCount()
            if (connectedReplicaCount == 0) {
                println("No replicas connected")
                socket.write(Buffer.buffer(RespSerializer().createInteger(0)))
                return@launch
            }

            val result =
                withTimeoutOrNull(timeout) {
                    while (replicaManager.getAcknowledgmentCount() < connectedReplicaCount && replicaManager.getAcknowledgmentCount() < numReplicas) {
                        delay(10) // Check every 10 ms
                        println(
                            "Waiting for $numReplicas replicas to acknowledge. Current count: ${replicaManager.getAcknowledgmentCount()}",
                        )
                    }
                    replicaManager.getAcknowledgmentCount()
                } ?: replicaManager.getAcknowledgmentCount().also {
                    println("Timeout waiting for replicas to acknowledge")
                }

            socket.write(Buffer.buffer(RespSerializer().createInteger(result)))
        }

rohitpaulk · May 8, 2024, 9:58am

Ah, I can see how this’d be confusing.

Here’s what an actual Redis server does (we test against official Redis binaries 100s of times on each release to make sure our tester isn’t faulty):

A master keeps track of the current replication offset, and the replication offset of every connected replica:
- The current replication offset is incremented every time a new write command is sent to the replication stream.
- The master’s view of each replica’s replication offset is updated whenever a response to GETACK is received.

When a WAIT command is received:
- The master first checks its replication offset against the offsets it has recorded for connected replicas. If all values are equal, it can safely return the number of connected replicas without having to issue GETACK requests.
- If the current replication offset is ahead of what the master has recorded for connected replicas, the master issues GETACK requests to all replicas and waits for responses before responding.
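
The check described above could be sketched roughly as follows in Go. This is only an illustration under assumptions: the `master` and `replica` types, their fields, and the `propagate`, `inSync`, and `needsGetAck` helpers are hypothetical names, not part of any challenge starter code or of Redis itself, and the sketch leaves out networking and the timeout entirely.

```go
package main

import "fmt"

// replica is a hypothetical record of what the master knows about one replica.
type replica struct {
	ackedOffset int64 // last offset the replica reported in response to GETACK
}

// master is a hypothetical, simplified view of the master's replication state.
type master struct {
	offset   int64 // bytes of write commands sent to the replication stream
	replicas []*replica
}

// propagate records that n bytes of a write command were sent to the replicas.
func (m *master) propagate(n int64) { m.offset += n }

// inSync counts the replicas whose recorded offset has caught up with the master's.
func (m *master) inSync() int {
	count := 0
	for _, r := range m.replicas {
		if r.ackedOffset >= m.offset {
			count++
		}
	}
	return count
}

// needsGetAck reports whether WAIT has to issue GETACK requests at all.
func (m *master) needsGetAck() bool {
	return m.inSync() < len(m.replicas)
}

func main() {
	m := &master{replicas: []*replica{{}, {}, {}}}

	// No writes yet: every replica is in sync, so WAIT can answer immediately.
	fmt.Println(m.needsGetAck(), m.inSync()) // false 3

	// After a write, the master is ahead and has to ask the replicas.
	m.propagate(31)
	fmt.Println(m.needsGetAck(), m.inSync()) // true 0

	// One replica reports that it has caught up.
	m.replicas[0].ackedOffset = 31
	fmt.Println(m.needsGetAck(), m.inSync()) // true 1
}
```

With a structure like this, a WAIT handler would return `len(m.replicas)` straight away when `needsGetAck()` is false, and otherwise send the GETACK requests and count responses until the requested number of replicas or the timeout is reached.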

malshoff · May 8, 2024, 2:34pm

That was very helpful, thanks! I definitely overlooked that when reading the docs. I got it working now with a small change.

system · May 13, 2024, 2:34pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.

system · June 4, 2024, 2:33am

Note: I’ve updated the title of this post to include the stage ID (#NA2). You can learn about the stages rename here: Upcoming change: Stages overhaul.
