Weird Decoding Behaviour of bytes in python

shreyasganesh0 · November 4, 2024, 1:49am

Does anyone here know Python codecs? I'm seeing some weird behaviour and would love to know the reason.

Here are my logs:

[your_program] File is empty 'utf-8' codec can't decode byte 0xfa in position 9: invalid start byte

And here’s a snippet of my code:

self.file_byte_data_as_string = str(self.file_byte_data, 'utf-8')

It doesn't let me decode to UTF-8 when I try to convert the whole file byte array directly, but if I do this:

header = str(self.file_byte_data[:5],'utf-8')

it works perfectly fine and I get a string.

andy1li · November 4, 2024, 2:30am

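@shreyasganesh0 The first 9 bytes in an RDB file are valid ASCII characters (REDIS0011 in this case), so they also decode cleanly as UTF-8. That is why decoding only the first 5 bytes works.

However, the 10th byte, at position 9, is 0xfa, which can never start a UTF-8 sequence. In the RDB format it is a binary opcode, not text. Attempting to decode the whole file as UTF-8 therefore fails at that byte with the error you saw: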

'utf-8' codec can't decode byte 0xfa in position 9: invalid start byte
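
Here is a minimal sketch that reproduces this. The `data` value is a made-up sample (the 9-byte header, the 0xfa byte, then a few more bytes), not your actual file, and the variable names are only for illustration:

```python
# Hypothetical sample bytes: the 9-byte ASCII header, then 0xfa, then more binary data.
data = b"REDIS0011" + bytes([0xFA]) + b"\x09redis-ver"

# The header is plain ASCII, so slicing it out and decoding works.
magic = data[:5].decode("ascii")     # magic string
version = data[5:9].decode("ascii")  # 4-digit version

# Decoding the whole buffer fails at the first byte that is not valid UTF-8.
error_position = None
bad_byte = None
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    error_position = exc.start       # index of the offending byte
    bad_byte = data[exc.start]       # the offending byte as an int

print(magic, version, error_position, hex(bad_byte))
```

So one option is to keep the file contents as `bytes`, decode only the slices that are known to be text (such as the header), and read everything else byte by byte.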

shreyasganesh0 · November 4, 2024, 2:31am

Oh that makes so much sense. Thanks!

