Weird Decoding Behaviour of bytes in Python

Does anyone here know Python codecs? I'm seeing some weird behaviour and would love to know the reason.

Here are my logs:

[your_program] File is empty 'utf-8' codec can't decode byte 0xfa in position 9: invalid start byte

And here’s a snippet of my code:

self.file_byte_data_as_string = str(self.file_byte_data, 'utf-8')

It doesn't let me decode to UTF-8 when I try to convert the whole file byte array directly, but if I do this:

header = str(self.file_byte_data[:5],'utf-8')

it works perfectly fine and I get a string.

@shreyasganesh0 The first 9 bytes in an RDB file are valid ASCII/Unicode characters (REDIS0011 in this case).

However, the 10th byte (position 9, counting from zero) is 0xfa, which can never start a character in UTF-8. The rest of an RDB file is binary data rather than text, so decoding the whole file as UTF-8 fails with the error you saw:

'utf-8' codec can't decode byte 0xfa in position 9: invalid start byte
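
Here is a minimal sketch that reproduces this. The bytes after the REDIS0011 header are made up for illustration, and file_byte_data here is a local stand-in for your self.file_byte_data, so treat it as an approximation of your file rather than its real contents:

```python
# Hypothetical sample data: the 9-byte RDB header (magic string plus version),
# followed by 0xfa and some made-up binary content.
file_byte_data = b"REDIS0011\xfa\x09redis-ver"

# Decoding only the ASCII header works, because every byte in it is valid UTF-8.
header = file_byte_data[:9].decode("utf-8")
print(header)

# Decoding the whole buffer fails at the first byte that is not valid UTF-8.
try:
    file_byte_data.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc
    # exc.start is the offset of the offending byte, exc.reason explains why.
    print(exc.start, hex(exc.object[exc.start]), exc.reason)
```

One way to deal with this is to keep the data as bytes and decode only the slices you know contain text, such as the header.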

Oh that makes so much sense. Thanks!
