#TU8 replicas not responding to REPLCONF GETACK *

I believe there’s a bug in the replicas in TU8: they’re not responding to REPLCONF GETACK *, which I’m using to check connectivity when I get a WAIT command.

Note that I also pass the next stage, NA2 (where WAIT relies on acks), but that logic is at odds with TU8 (where WAIT does not rely on acks).

I do have a horrid hack to conditionally have WAIT return max connected clients when I get 0 acks… this makes me pass TU8, but I’d rather not do that.

[tester::#TU8] [test] client: $ redis-cli WAIT 3 500
[tester::#TU8] [test] client: Sent bytes: "*3\r\n$4\r\nWAIT\r\n$1\r\n3\r\n$3\r\n500\r\n"
[your_program] INFO Main - Server accepted request socket localPort=6379 port=54514
[your_program] INFO Main - Server waiting for input on port=6379
[your_program] INFO MemDB - REQ=[WAIT, 3, 500]
[your_program] DEBUG MemDB - Waiting 500ms...
[your_program] DEBUG MemDB - Waiting...
[your_program] DEBUG MemDB - Sending *3\r\n$8\r\nREPLCONF\r\n$6\r\nGETACK\r\n$1\r\n*\r\n thread=28
[your_program] DEBUG MemDB - Sending *3\r\n$8\r\nREPLCONF\r\n$6\r\nGETACK\r\n$1\r\n*\r\n thread=31
[your_program] DEBUG MemDB - Sending *3\r\n$8\r\nREPLCONF\r\n$6\r\nGETACK\r\n$1\r\n*\r\n thread=27
[your_program] DEBUG MemDB - Sending *3\r\n$8\r\nREPLCONF\r\n$6\r\nGETACK\r\n$1\r\n*\r\n thread=29
[your_program] DEBUG MemDB - Sending *3\r\n$8\r\nREPLCONF\r\n$6\r\nGETACK\r\n$1\r\n*\r\n thread=26
[your_program] DEBUG MemDB - Sending *3\r\n$8\r\nREPLCONF\r\n$6\r\nGETACK\r\n$1\r\n*\r\n thread=30
[your_program] DEBUG MemDB - Waiting...
[your_program] DEBUG MemDB - Waiting...
[your_program] DEBUG MemDB - Waiting...
[your_program] DEBUG MemDB - Waiting...
[your_program] DEBUG MemDB - Waiting...
[your_program] DEBUG MemDB - Waiting...
[your_program] DEBUG MemDB - Waiting...
[your_program] DEBUG MemDB - Waiting...
[your_program] DEBUG MemDB - Waiting...
[your_program] DEBUG MemDB - Responded acked=0 connected=6
[your_program] DEBUG MemDB - Bytes read=828
[tester::#TU8] [test] client: Received bytes: ":0\r\n"
[tester::#TU8] [test] client: Received RESP integer: 0
[tester::#TU8] Expected 6, got 0
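For reference, the bytes in the log above are a RESP array of bulk strings, and a replica that handles GETACK replies in the same shape with REPLCONF ACK <offset>. A minimal encoder sketch (the helper name is made up for illustration):

```python
def encode_resp_array(*parts: str) -> bytes:
    """Encode strings as a RESP array of bulk strings."""
    out = f"*{len(parts)}\r\n"
    for p in parts:
        out += f"${len(p)}\r\n{p}\r\n"
    return out.encode()

# What the master sends when WAIT needs fresh acks:
getack = encode_resp_array("REPLCONF", "GETACK", "*")
# → b"*3\r\n$8\r\nREPLCONF\r\n$6\r\nGETACK\r\n$1\r\n*\r\n"

# What a replica should send back (offset = bytes of the
# replication stream it has processed so far):
ack = encode_resp_array("REPLCONF", "ACK", "0")
```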

Hey @HowieTKL, this is more of an edge case than an actual bug. Realistically, it’s possible that not all replicas respond within the specified timeout.

That said, I totally get why you’d rather avoid the hack.

You might want to try implementing the behavior described in the official Redis doc:

In the specific case of the implementation of WAIT, Redis remembers, for each client, the replication offset of the produced replication stream when a given write command was executed in the context of a given client. When WAIT is called Redis checks if the specified number of replicas already acknowledged this offset or a greater one.
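The bookkeeping the docs describe could be sketched roughly like this (all names are hypothetical; this is a sketch of the idea, not the tester's required implementation):

```python
class WaitTracker:
    """Track the per-client write offset and per-replica acked offset,
    as described in the Redis docs for WAIT."""

    def __init__(self):
        self.master_offset = 0          # bytes of replication stream produced
        self.client_write_offset = {}   # client_id -> offset after last write
        self.replica_acked = {}         # replica_id -> last acknowledged offset

    def on_write(self, client_id: int, command_bytes: int):
        # A write command was propagated; remember where this client's
        # write landed in the replication stream.
        self.master_offset += command_bytes
        self.client_write_offset[client_id] = self.master_offset

    def on_ack(self, replica_id: int, offset: int):
        # A replica reported REPLCONF ACK <offset>.
        self.replica_acked[replica_id] = offset

    def acked_count(self, client_id: int) -> int:
        # WAIT: count replicas that acknowledged this offset or a greater one.
        target = self.client_write_offset.get(client_id, 0)
        return sum(1 for off in self.replica_acked.values() if off >= target)
```

With this, a client that has not written anything has target offset 0, so every connected replica satisfies the check immediately.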

Let me know if you’d like further clarification!

Thanks for the info! With the current challenge, we only rely on master’s GETACK, which means if replicas never respond, the master cannot update its acks for WAIT. Problematic.

Enhancement: allow replicas to asynchronously ping acks to the master, so the master does not have to rely on GETACK alone. More context for your quote above, from the official docs:

Since the introduction of partial resynchronization with replicas (PSYNC feature) Redis replicas asynchronously ping their master with the offset they already processed in the replication stream. This is used in multiple ways:

  1. Detect timed out replicas.
  2. Perform a partial resynchronization after a disconnection.
  3. Implement WAIT.

Does seem like a nice enhancement to have. Either way, for WAIT to work optimally, replicas need to either reply to GETACK or asynchronously send their own acks.
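The replica-side async ping described in the docs could look roughly like this sketch (real Redis replicas ping about once per second; the function and parameter names here are made up):

```python
import socket
import threading
import time

def ack_message(offset: int) -> bytes:
    """Build the REPLCONF ACK <offset> message a replica sends unprompted."""
    off = str(offset)
    return (f"*3\r\n$8\r\nREPLCONF\r\n$3\r\nACK\r\n"
            f"${len(off)}\r\n{off}\r\n").encode()

def start_ack_pinger(sock: socket.socket, get_offset, interval: float = 1.0):
    """Periodically report the replica's processed offset to the master,
    so the master can serve WAIT without sending GETACK first."""
    def loop():
        while True:
            sock.sendall(ack_message(get_offset()))
            time.sleep(interval)
    threading.Thread(target=loop, daemon=True).start()
```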


@HowieTKL Thanks for the insights! You’re absolutely right. It would definitely be a nice enhancement to have.

With the current challenge, we only rely on master’s GETACK, which means if replicas never respond, the master cannot update its acks for WAIT.

Since it’s an edge case where the master just started up, all offsets are zero and all replicas are trivially up-to-date. :wink:

When WAIT is called Redis checks if the specified number of replicas already acknowledged this offset or a greater one.

Following the official behavior is the easiest approach to eliminate the hack, as it naturally accounts for the case 0 >= 0.
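Concretely, the 0 >= 0 case can fall out of a single comparison, with no GETACK round trip needed (a hypothetical sketch, names invented):

```python
def wait_result(target_offset: int, replica_acked: dict) -> int:
    """Count replicas whose acked offset satisfies the client's target.
    When the client never wrote, target_offset is 0 and every connected
    replica passes 0 >= 0, so no GETACK is required."""
    if target_offset == 0:
        # Equivalent to the general check below, made explicit:
        # every replica trivially satisfies off >= 0.
        return len(replica_acked)
    return sum(1 for off in replica_acked.values() if off >= target_offset)
```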

