Stage #KU4: stuck on a weird bug, help needed

Hello,

So I’ve been trying to figure out what was wrong in my code when reading the cluster metadata file (I elected to completely rewrite that code, since it quickly became evident my previous strategy would not work for finding multiple partitions), and I completely fail to understand how the record count in a record batch ends up so big. I would really appreciate any help on this.

From what I can see, the record count somehow takes a huge value, and it seems to vary based on the metadata log file’s content. Sometimes it shows a reasonable value (in which case the test still fails, but for other reasons; I want to investigate this problem first, as it’s mission critical), and other times I get this (I have no idea why):

remote: [tester::#KU4] Connection to broker at localhost:9092 successful
remote: [tester::#KU4] Sending "DescribeTopicPartitions" (version: 0) request (Correlation id: 1769793653)
remote: [tester::#KU4] Hexdump of sent "DescribeTopicPartitions" request: 
remote: [tester::#KU4] Idx  | Hex                                             | ASCII
remote: [tester::#KU4] -----+-------------------------------------------------+-----------------
remote: [tester::#KU4] 0000 | 00 00 00 23 00 4b 00 00 69 7c e8 75 00 0c 6b 61 | ...#.K..i|.u..ka
remote: [tester::#KU4] 0010 | 66 6b 61 2d 74 65 73 74 65 72 00 02 04 70 61 7a | fka-tester...paz
remote: [tester::#KU4] 0020 | 00 00 00 00 02 ff 00                            | .......
remote: [tester::#KU4] 
remote: [your_program] Client connected at file descriptor: 4        
remote: [your_program] Request API Key: 75        
remote: [your_program] Request API Version: 0        
remote: [your_program] Request Correlation ID: 1769793653        
remote: [your_program] Read 663 bytes from cluster metadata file        
remote: [your_program] Created a new record batch        
remote: [your_program] Read base offset: 1        
remote: [your_program] Read batch length: 79        
remote: [your_program] Read partition leader epoch: 1        
remote: [your_program] Read magic byte        
remote: [your_program] Read CRC checksum: b069457c        
remote: [your_program] Read batch attributes: 0        
remote: [your_program] Read last offset delta: 0        
remote: [your_program] Read base timestamp        
remote: [your_program] Read max timestamp        
remote: [your_program] Read producer ID: -1        
remote: [your_program] Read producer epoch: -1        
remote: [your_program] Read base sequence: -1        
remote: [your_program] Read record count: 1        
remote: [your_program] Resized record vector        
remote: [your_program] Read record length        
remote: [your_program] Read record attributes        
remote: [your_program] Read timestamp delta        
remote: [your_program] Read offset delta        
remote: [your_program] Read key length        
remote: [your_program] Resized key string        
remote: [your_program] Created a new record batch        
remote: [your_program] Read base offset: 7310575178888798510        
remote: [your_program] Read batch length: 1986359923        
remote: [your_program] Read partition leader epoch: 1768910336        
remote: [your_program] Read magic byte        
remote: [your_program] Read CRC checksum: 0        
remote: [your_program] Read batch attributes: 0        
remote: [your_program] Read last offset delta: 2        
remote: [your_program] Read base timestamp        
remote: [your_program] Read max timestamp        
remote: [your_program] Read producer ID: 1099511730656        
remote: [your_program] Read producer epoch: 23341        
remote: [your_program] Read base sequence: 352321537        
remote: [your_program] Read record count: 91e05b2d        
remote: [your_program] terminate called after throwing an instance of 'std::bad_alloc'        
remote: [your_program]   what():  std::bad_alloc        
remote: [tester::#KU4] EOF
remote: [tester::#KU4] Test failed
remote: [tester::#KU4] Terminating program
remote: [tester::#KU4] Program terminated successfully
remote: 
remote: Try our CLI to run tests faster without Git: https://codecrafters.io/cli        
remote: 
remote: View our article on debugging test failures: https://codecrafters.io/debug        
remote: 
To https://git.codecrafters.io/ab320d5cff2dda65
 	refs/heads/master:refs/heads/master	96cffdf..71f317e
Done

The code itself should be available on GitHub, if you need a closer look.

After some thought, I think it might be a tester bug: the metadata log clearly isn’t that big (the tester reports a reasonable number of bytes), so I can only conclude the record length value in certain batches is botched somehow…
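For what it’s worth, the std::bad_alloc at the end of the log is the giveaway: a misaligned parse yields a garbage record count, and resizing a vector to billions of elements throws exactly that. Here is a minimal sketch (helper names are mine, not from the actual code) of reading a big-endian header field plus a plausibility guard before resizing:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Read a big-endian int32 from `buf` at `pos`, advancing `pos`.
// Kafka writes record-batch header fields in network byte order, so a
// wrong-endian or misaligned read yields garbage like 0x91e05b2d above.
int32_t read_int32_be(const std::vector<uint8_t>& buf, size_t& pos) {
    if (pos + 4 > buf.size())
        throw std::runtime_error("truncated buffer");
    uint32_t v = (uint32_t(buf[pos]) << 24) | (uint32_t(buf[pos + 1]) << 16) |
                 (uint32_t(buf[pos + 2]) << 8) | uint32_t(buf[pos + 3]);
    pos += 4;
    return int32_t(v);
}

// Guard the record count before resizing the record vector: each record
// occupies at least one byte, so a count that is negative or larger than
// the bytes left in the file means the parse has gone off the rails.
size_t checked_record_count(int32_t record_count, size_t bytes_remaining) {
    if (record_count < 0 || size_t(record_count) > bytes_remaining)
        throw std::runtime_error("implausible record count: parse is misaligned");
    return size_t(record_count);
}
```

Failing fast with a clear message beats dying in std::bad_alloc a few fields later, and it points you at the spot where the stream went out of sync.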

Hey @fortwoone, could you try logging these fields as well? I tried tweaking your code but wasn’t able to print anything useful.

After some thought, I think it might be a tester bug

For context, our tester has been validated against the official Kafka, so a bug on our end is pretty unlikely. :sweat_smile:

Oh, I see. Let me log the fields in question and see what happens

@fortwoone Let me know if you’d like me to take another look once those fields are logged.

Okay, so after reading the entire log again, I also suspect I don’t read varints correctly. The first batch processes fine; it’s the second one that starts mucking up the tests, so it’s probably a varint-related problem.

So, what I found is that the varints were indeed not being read correctly (I was still expecting them to take only one byte at that point), which is what mucked up the data reading after the first record batch. The problem was solved once I created a dedicated varint type for this.
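For anyone landing here with the same symptom: the record-level fields (record length, timestamp delta, offset delta, key length, value length) are variable-length integers, zigzag-encoded like protobuf’s signed varints, so reading them as single bytes desynchronizes everything that follows. A sketch of a decoder, with my own naming:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Decode an unsigned varint: 7 payload bits per byte, least-significant
// group first; a set high bit means "more bytes follow".
uint64_t read_unsigned_varint(const std::vector<uint8_t>& buf, size_t& pos) {
    uint64_t value = 0;
    int shift = 0;
    while (true) {
        if (pos >= buf.size())
            throw std::runtime_error("truncated varint");
        uint8_t byte = buf[pos++];
        value |= uint64_t(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0)
            return value;
        shift += 7;
        if (shift > 63)
            throw std::runtime_error("varint too long");
    }
}

// Signed record fields are zigzag-encoded on top of that:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
int64_t read_signed_varint(const std::vector<uint8_t>& buf, size_t& pos) {
    uint64_t raw = read_unsigned_varint(buf, pos);
    return int64_t(raw >> 1) ^ -int64_t(raw & 1);
}
```

Note the advancing `pos`: a varint consumes one to several bytes, so the caller can’t assume a fixed field width when stepping through a record.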


This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.