Stuck on #ZU2 in the Redis challenge with Zig and io_uring

I’m stuck on Stage #ZU2 of the Redis challenge in Zig.

My problem looks similar to the one in "Redis Stage #ZU2 in Zig", but my setup is a bit different: I implemented a simple event loop on top of io_uring, calling the io_uring system calls directly. I can't figure out why the test fails.

I’ve tried sending concurrent requests to my program from multiple terminals with while loops, running requests in parallel with GNU parallel, and even running the tester code locally (you can find it at github.com/codecrafters-io/redis-tester). No matter what, everything works fine on my local machine, but the remote test runner consistently fails.

Here are my logs:

remote: Running tests on your code. Logs should appear shortly...
remote:
remote: [compile] Moved ./.codecrafters/run.sh → ./your_program.sh
remote: [compile] Compilation successful.
remote:
remote: [tester::#QQ0] Running tests for Stage #QQ0 (Implement the ECHO command)
remote: [tester::#QQ0] $ ./your_program.sh
remote: [your_program] Logs from your program will appear here!
remote: [your_program] Waiting for something to come through
remote: [tester::#QQ0] $ redis-cli ECHO grape
remote: [your_program] Event fd/type/ptr: 3 event_queue.EVENT_TYPE.CONNECTION event_queue.Event@7fff0f2cc8f0
remote: [your_program] accepted new connection
remote: [your_program] Waiting for something to come through
remote: [your_program] Event fd/type/ptr: 5 event_queue.EVENT_TYPE.RECEIVE_COMMAND event_queue.Event@7ff119fa3000
remote: [your_program] Waiting for something to come through
remote: [your_program] Event fd/type/ptr: 5 event_queue.EVENT_TYPE.SENT_RESPONSE event_queue.Event@7ff119f9f000
remote: [your_program] Waiting for something to come through
remote: [tester::#QQ0] Received "grape"
remote: [tester::#QQ0] Test passed.
remote: [your_program] Event fd/type/ptr: 5 event_queue.EVENT_TYPE.RECEIVE_COMMAND event_queue.Event@7ff119f9e000
remote: [your_program] Waiting for something to come through
remote:
remote: [tester::#ZU2] Running tests for Stage #ZU2 (Handle concurrent clients)
remote: [tester::#ZU2] $ ./your_program.sh
remote: [tester::#ZU2] client-1: $ redis-cli PING
remote: [your_program] Logs from your program will appear here!
remote: [your_program] Waiting for something to come through
remote: [tester::#ZU2] Received: "" (no content received)
remote: [tester::#ZU2]            ^ error
remote: [tester::#ZU2] Error: Expected start of a new RESP2 value (either +, -, :, $ or *)
remote: [tester::#ZU2] Test failed (try setting 'debug: true' in your codecrafters.yml to see more details)
remote:
remote: View our article on debugging test failures: https://codecrafters.io/debug
remote:
To https://git.codecrafters.io/a263c44a0801f92f
   507ef2f..d079f28  master -> master

I’ve been in touch with @andy1li by email. He mentioned that the same test code has no issues with the official Redis implementation or with other users’ implementations in other languages. Interestingly, he also noticed that on the failing runs the client’s request seems to reach the server too early (the client logs to the terminal before the server does), and that adding a delay between starting my server and sending the request makes the tests pass consistently.

You can find my code on GitHub: andrea-berling/rediz ("A Redis implementation in Zig, for learning and fun"). I’m quite puzzled about what’s going on, and frustrated at being blocked on this stage with a program that seems to work (running the CodeCrafters Redis tester locally, I was even able to pass later stages; I now support GET and SET with expirations). Does anybody have any suggestions? I could really use your help :folded_hands:


UPDATE: I found the issue and was able to put a fix in place. If you’re curious about what was going on, check out this ChatGPT conversation: "Wait syscall signal handling". TL;DR: the tester’s wait returned successfully after it killed my server, but that didn’t guarantee the kernel had finished cleaning up the internal data structures it uses to track listening sockets. So the listening socket of the Redis instance from the previous test was probably still lingering in the kernel, and the next test connected to that one instead of the new one, getting back a TCP RST when it sent its first command. That also explains why waiting between tests helped mitigate the problem. I added code to catch the SIGTERM the tester sends to my program, plus a "cancel all" operation on my io_uring ring during cleanup, and that seemed to do the trick :slightly_smiling_face:
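For reference, here is a minimal C sketch of what a "cancel all" request looks like at the raw io_uring level. It assumes Linux 5.19 or newer at runtime (older kernels reject these cancel flags) and the kernel UAPI header <linux/io_uring.h>; `prep_cancel_all` is a hypothetical helper name, and this is not the Zig code from the repo. It only fills in one submission queue entry; placing it in the SQ ring, calling io_uring_enter, and reaping completions is assumed to be handled by the existing event loop.

```c
#include <linux/io_uring.h>
#include <string.h>

/* These cancel flags appeared in the UAPI header in Linux 5.19; define them
 * for older headers (values match the kernel's definitions). */
#ifndef IORING_ASYNC_CANCEL_ALL
#define IORING_ASYNC_CANCEL_ALL (1U << 0)
#endif
#ifndef IORING_ASYNC_CANCEL_ANY
#define IORING_ASYNC_CANCEL_ANY (1U << 2)
#endif

/* Hypothetical helper: fill one SQE asking the kernel to cancel every
 * in-flight request on the ring (pending accepts, recvs, sends, ...). */
static void prep_cancel_all(struct io_uring_sqe *sqe, __u64 user_data)
{
    memset(sqe, 0, sizeof *sqe);
    sqe->opcode = IORING_OP_ASYNC_CANCEL;
    sqe->fd = -1; /* not matching on a file descriptor */
    /* ANY: match requests regardless of their user_data; ALL: cancel every
     * match instead of stopping at the first one. */
    sqe->cancel_flags = IORING_ASYNC_CANCEL_ALL | IORING_ASYNC_CANCEL_ANY;
    sqe->user_data = user_data; /* tags the cancel request's own completion */
}
```

Cancelled requests still post their own completions (typically with `-ECANCELED`), so the cleanup path should expect to reap them alongside the completion for the cancel request itself.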

Moral of the story: when using io_uring, explicitly cancel everything in your cleanup code, and make sure that cleanup code runs even if your program is terminated by a signal. Oh, and never give up, of course :grin:
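The "make sure your cleanup code runs" part can be sketched in C roughly as follows, assuming a single-threaded event loop. The SIGTERM handler only sets a `sig_atomic_t` flag, and the loop checks that flag after each wait returns and then runs cleanup. `wait_fn`, `cleanup_fn` and `run_loop` are hypothetical names standing in for the real loop; in an io_uring program the cleanup hook is where a cancel-all request would be submitted and the ring and listening socket closed.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Set by the signal handler; writes to a volatile sig_atomic_t are async-signal-safe. */
static volatile sig_atomic_t shutdown_requested = 0;

static void on_sigterm(int signo)
{
    (void)signo;
    shutdown_requested = 1; /* defer all real work to the event loop */
}

/* Install the SIGTERM handler. SA_RESTART is left unset so that interrupted
 * blocking system calls are not transparently restarted. */
int install_sigterm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGTERM, &sa, NULL);
}

/* Hypothetical hooks standing in for the real event loop. */
typedef int (*wait_fn)(void *ctx);     /* waits for events: 0 or a negative errno */
typedef void (*cleanup_fn)(void *ctx); /* e.g. cancel all, close ring and sockets */

int run_loop(wait_fn wait_once, cleanup_fn cleanup, void *ctx)
{
    int rc = 0;
    while (!shutdown_requested) {
        int r = wait_once(ctx);
        if (r == -EINTR)
            continue; /* interrupted by a signal: re-check the flag */
        if (r < 0) {
            rc = r;   /* real error: stop, but still clean up below */
            break;
        }
        /* ... dispatch completed events here ... */
    }
    /* Runs on a normal exit, on an error, and after SIGTERM. Caveat: a SIGTERM
     * arriving just before wait_once blocks is only noticed after the next
     * event; blocking SIGTERM and unblocking it atomically during the wait
     * (pselect-style, as io_uring_enter's sigmask argument allows) avoids that. */
    cleanup(ctx);
    return rc;
}
```

Keeping the handler down to a single flag write avoids calling non-async-signal-safe functions (such as the ring teardown itself) from signal context.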

