(docs) Inconsistency between challenge description and referenced docs for RDB parsing

pranjalpokharel7 · July 17, 2025, 2:38pm

The RDB File Format reference linked from the challenge describes the encoding of the resizedb section as follows:

This op code was introduced in RDB version 7.

It encodes two values to speed up RDB loading by avoiding additional resizes and rehashing.
The op code is **followed by two length-encoded integers** indicating:

Database hash table size
Expiry hash table size

It states that the opcode 0xFB is followed by two length-encoded integers. However, the challenge description defines the values as size-encoded:

FB                       // Indicates that hash table size information follows.
03                       /* The size of the hash table that stores the keys and values (size encoded).
                            Here, the total key-value hash table size is 3. */
02                       /* The size of the hash table that stores the expires of the keys (size encoded).
                            Here, the number of keys with an expiry is 2. */

In practice, the challenge description seems to be correct (in that the sizes are directly encoded as bytes). However, is this because we have limited keys for the test (that can be encoded directly as a single byte for hash table size), or am I understanding the hash table sizes differently?

It would be helpful if there was a standardized RDB file format specification we could regard as the source of truth, but I couldn’t find one.

andy1li · July 20, 2025, 2:24pm

Hey @pranjalpokharel7, great question!

In this context, length-encoded and size-encoded essentially refer to the same thing.

Would you mind elaborating a bit on what inconsistency you’re seeing beyond the naming difference?

The definitive reference for how this works is the Redis source code itself:

github.com/redis/redis

src/rdb.c

1e388d8b9


      
          * valid stored value, the caller should use rioGetReadError() to check for
          * errors after calling this function. */
          long long rdbLoadMillisecondTime(rio *rdb, int rdbver) {
             int64_t t64;
             if (rioRead(rdb,&t64,8) == 0) return LLONG_MAX;
             if (rdbver >= 9) /* Check the top comment of this function. */
                 memrev64ifbe(&t64); /* Convert in big endian if the system is BE. */
             return (long long)t64;
          }
          
          /* Saves an encoded length. The first two bits in the first byte are used to
          * hold the encoding type. See the RDB_* definitions for more information
          * on the types of encoding. */
          int rdbSaveLen(rio *rdb, uint64_t len) {
             unsigned char buf[2];
             size_t nwritten;
          
             if (len < (1<<6)) {
                 /* Save a 6 bit len */
                 buf[0] = (len&0xFF)|(RDB_6BITLEN<<6);
                 if (rdbWriteRaw(rdb,buf,1) == -1) return -1;

github.com/redis/redis

src/rdb.h

1e388d8b9


      
          /* Defines related to the dump file format. To store 32 bits lengths for short
           * keys requires a lot of space, so we check the most significant 2 bits of
           * the first byte to interpreter the length:
           *
           * 00|XXXXXX => if the two MSB are 00 the len is the 6 bits of this byte
           * 01|XXXXXX XXXXXXXX =>  01, the len is 14 bits, 6 bits + 8 bits of next byte
           * 10|000000 [32 bit integer] => A full 32 bit len in net byte order will follow
           * 10|000001 [64 bit integer] => A full 64 bit len in net byte order will follow
           * 11|OBKIND this means: specially encoded object will follow. The six bits
           *           number specify the kind of object that follows.
           *           See the RDB_ENC_* defines.
           *
           * Lengths up to 63 are stored using a single byte, most DB keys, and may
           * values, will fit inside. */
          #define RDB_6BITLEN 0
          #define RDB_14BITLEN 1
          #define RDB_32BITLEN 0x80
          #define RDB_64BITLEN 0x81
          #define RDB_ENCVAL 3
          #define RDB_LENERR UINT64_MAX
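As a rough illustration of the rdb.h comment above, here is a minimal Python sketch of the decoding side. The function name `read_length` and its `(value, is_special)` return shape are made up for this example and do not come from the Redis source; only the bit layout is taken from the comment.

```python
import io
from typing import BinaryIO


def read_length(stream: BinaryIO) -> tuple[int, bool]:
    """Decode one length/size-encoded value (hypothetical helper).

    Returns (value, is_special). is_special is True for the 0b11 prefix,
    where the low 6 bits name a special object encoding, not a length.
    """
    first = stream.read(1)
    if len(first) != 1:
        raise EOFError("unexpected end of data")
    byte = first[0]
    prefix = byte >> 6  # the two most significant bits select the form

    if prefix == 0b00:
        # 00|XXXXXX: the value is the low 6 bits of this byte
        return byte & 0x3F, False
    if prefix == 0b01:
        # 01|XXXXXX XXXXXXXX: 6 bits here plus 8 bits of the next byte
        nxt = stream.read(1)
        if len(nxt) != 1:
            raise EOFError("unexpected end of data")
        return ((byte & 0x3F) << 8) | nxt[0], False
    if prefix == 0b11:
        # 11|OBKIND: specially encoded object follows
        return byte & 0x3F, True
    if byte == 0x80:
        # 10|000000: 32-bit length in network (big-endian) byte order
        return int.from_bytes(stream.read(4), "big"), False
    if byte == 0x81:
        # 10|000001: 64-bit length in network (big-endian) byte order
        return int.from_bytes(stream.read(8), "big"), False
    raise ValueError(f"unknown length encoding byte: {byte:#04x}")


single_byte = read_length(io.BytesIO(bytes([0x03])))
two_bytes = read_length(io.BytesIO(bytes([0x41, 0x00])))
```

With this sketch, the byte 03 decodes to 3 on its own, while 41 00 decodes to 256 using the 14-bit form.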

pranjalpokharel7 · July 21, 2025, 8:08am

Ahh got it, I was confused by the naming difference itself. My problem was that I assumed length encoded integer meant an integer encoded in string encoding (i.e. the case of 0b11xxxxxx prefix) and tried to parse the initial byte information as length of bytes to read next. It seems I intermixed the two terms during my initial read.

The implementation below would fail if the sizes are larger than 63 (i.e. require more than one byte).

    def _parse_resize_db(self, reader: RDBReader) -> tuple[int, int]:
        # these only passed the tests because they were 6-bit unsigned encoded
        db_ht_size = int.from_bytes(reader.read(1), "little")
        exp_ht_size = int.from_bytes(reader.read(1), "little")
        return db_ht_size, exp_ht_size

The tests didn’t catch it at the time because (I assume) the RDB file being parsed doesn’t contain a large enough keyspace (say, one whose size needs the 14-bit encoding) for this case to fail.
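For comparison, a version that honours the prefix bits might look roughly like the sketch below. `RDBReader` here is a hypothetical stand-in that only assumes a `read(n)` method like the one used above, and `read_size` is a name invented for this example. It assumes the reader is positioned just after the FB opcode.

```python
import io


class RDBReader:
    """Hypothetical stand-in for the real reader; only read(n) is assumed."""

    def __init__(self, data: bytes) -> None:
        self._buf = io.BytesIO(data)

    def read(self, n: int) -> bytes:
        chunk = self._buf.read(n)
        if len(chunk) != n:
            raise EOFError("unexpected end of RDB data")
        return chunk


def read_size(reader: RDBReader) -> int:
    """Decode one size-encoded integer based on its two-bit prefix."""
    first = reader.read(1)[0]
    prefix = first >> 6
    if prefix == 0b00:
        # 6-bit form: the size is in this byte
        return first & 0x3F
    if prefix == 0b01:
        # 14-bit form: 6 bits here, 8 bits in the next byte
        return ((first & 0x3F) << 8) | reader.read(1)[0]
    if first == 0x80:
        # 32-bit form, big-endian
        return int.from_bytes(reader.read(4), "big")
    if first == 0x81:
        # 64-bit form, big-endian
        return int.from_bytes(reader.read(8), "big")
    # The 0b11 prefix marks a special string encoding, not a plain size
    raise ValueError(f"not a plain size encoding: {first:#04x}")


def parse_resize_db(reader: RDBReader) -> tuple[int, int]:
    db_ht_size = read_size(reader)
    exp_ht_size = read_size(reader)
    return db_ht_size, exp_ht_size


# The challenge example (after FB): 03 02
small = parse_resize_db(RDBReader(bytes([0x03, 0x02])))
# A 14-bit size followed by a 6-bit size: 41 2C 02
large = parse_resize_db(RDBReader(bytes([0x41, 0x2C, 0x02])))
```

The second input is where the single-byte version would go wrong: it would return 65 and 44 instead of 300 and 2.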

I guess in this case, the initial length we read as a part of encoding itself is the size of the tables?

andy1li · July 21, 2025, 8:21am

the initial length we read as a part of encoding itself is the size of the tables?

Would you mind sharing a screenshot to clarify what you’re referring to?

pranjalpokharel7 · July 21, 2025, 8:52am

For example, in the challenge description, the byte sequence for the resizedb information is:

FB 03 02

What I’ve understood (as of now) is that the size of the keyspace table itself is length-encoded as 03: the prefix is 0b00xxxxxx, which means the remaining 6 bits actually represent the size of the table itself (which in this case is 3, or 0bxx000011).

Previously I thought they were string-encoded integers: that 3 was actually the number of bytes to read next, which meant the size of the keyspace table would be 02 <next_byte> <next_byte> in little endian.

If I’ve come to the right conclusion now, it means previously I was confused about the size vs length encoding (due to the subtle naming difference), and I instead came to the conclusion that it was a question of length vs string encoding. I feel like the issue is resolved now and we can close this thread. Thanks!

andy1li · July 21, 2025, 9:04am

Thanks for sharing the details! Your understanding is now spot on.

system · July 26, 2025, 9:05am

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.
