(docs) Inconsistency between challenge description and referenced docs for RDB parsing

The reference linked for the challenge at RDB File Format describes the encoding of resizedb section as,

This op code was introduced in RDB version 7.

It encodes two values to speed up RDB loading by avoiding additional resizes and rehashing.
The op code is **followed by two length-encoded integers** indicating:

Database hash table size
Expiry hash table size

where it states the opcode 0xFB is followed by two length-encoded integers. However, the challenge description defines it as size-encoded as,

FB                       // Indicates that hash table size information follows.
03                       /* The size of the hash table that stores the keys and values (size encoded).
                            Here, the total key-value hash table size is 3. */
02                       /* The size of the hash table that stores the expires of the keys (size encoded).
                            Here, the number of keys with an expiry is 2. */

In practice, the challenge description seems to be correct (in that the sizes are directly encoded as bytes). However, is this because we have limited keys for the test (that can be encoded directly as a single byte for hash table size), or am I understanding the hash table sizes differently?

It would be helpful if there was a standardized RDB file format specification we could regard as the source of truth, but I couldn’t find one.

Hey @pranjalpokharel7, great question!

In this context, length-encoded and size-encoded essentially refer to the same thing.

Would you mind elaborating a bit on what inconsistency you’re seeing beyond the naming difference?

The definitive reference for how this works is the Redis source code itself:

Ahh got it, I was confused by the naming difference itself. My problem was that I assumed length encoded integer meant an integer encoded in string encoding (i.e. the case of 0b11xxxxxx prefix) and tried to parse the initial byte information as length of bytes to read next. It seems I intermixed the two terms during my initial read.

This implementation below would fail if the sizes are larger than 63 (i.e. require more than two bytes).

    def _parse_resize_db(self, reader: RDBReader) -> tuple[int, int]:
        # these only passed the tests because they were 6-bit unsigned encoded
        db_ht_size = int.from_bytes(reader.read(1), "little")
        exp_ht_size = int.from_bytes(reader.read(1), "little")
        return db_ht_size, exp_ht_size

The tests didn’t catch it at the time because (I assume) the RDB file to be parsed doesn’t contain large enough keyspace (say, keyspace size to be represented by 14-bit unsigned) for this case to fail.

I guess in this case, the initial length we read as a part of encoding itself is the size of the tables?

1 Like

the initial length we read as a part of encoding itself is the size of the tables?

Would you mind sharing a screenshot to clarify what you’re referring to?

For eg, in the challenge description, the byte sequence for resizedb information is,

FB 03 02

What I’ve understood (as of now), is that the size of the keyspace table itself is length-encoded as 03 - the prefix is 0b00xxxxxx which means the remaining 6-bits actually represents the size of the table itself (which in this case is 3 or 0bxx000011.

Previously I thought they were string encoded integers - that 3 was actually the length of the number of bytes to read next which meant the size of the keyspace table would be 02 <next_byte> <next_byte> in little endian.

If I’ve come to the right conclusion now, it means previously I was confused about the size vs length encoding (due to the subtle naming difference), and I instead came to the conclusion that it was a question of length vs string encoding. I feel like the issue is resolved now and we can close this thread. Thanks!

1 Like

Thanks for sharing the details! Your understanding is now spot on. :tada:

1 Like

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.