Could you please help me to read a sample.db file?
Question
I’m trying to count all cells with a type of table in the SQLite sample.db file. To do this, I need to interpret the cell header, but I’m having trouble understanding the byte sequences involved.
For example, the record for row_id = 1 of the apples table, starting from offset 3983, is as follows:
The next 1B … serial type of type column → 23 → value length is (23 - 13) / 2 B
The next 1B … serial type of name column → 26 → value length is (26 - 13) / 2 B
The next 1B … serial type of tbl_name column → 26 → value length is (26 - 13) / 2 B
The next 1B … serial type of rootpage column → 1
The next 1B … serial type of sql column → 129 → value length is (129 - 13) / 2 B
The next 1B … Unknown → 57 → unknown value length is (57 - 13) / 2 B?
The next 5B … value of type column → ‘table’
The next 6B … value of name column → ‘apples’
The next 6B … value of tbl_name column → ‘apples’
The next 6B … value of tbl_name column → ‘2’
The next 58B … value of sql column → ‘CREATE TABLE apples\n(\n\tid integer primary key autoincremen’
The next 22B … value of sql column → 't,\n\tname text,\n\tcolor ’
There are two points I don’t understand:
What does the 1B between row_id and serial types represent? According to the SQLite documentation, the row_id should be immediately followed by the payload, i.e., the array of serial types.
Table B-Tree Leaf Cell (header 0x0d): A varint which is the total number of bytes of payload, including any overflow A varint which is the integer key, a.k.a. “rowid” The initial portion of the payload that does not spill to overflow pages. A 4-byte big-endian integer page number for the first page of the overflow page list - omitted if all payload fits on the b-tree page.
What does the 1B representing the serial type of the sql column (with a value of 57) indicate? Additionally, I’m concerned that the actual content length of the sql column exceeds the size that can be calculated from the serial type…
I solved this issue.
Initially, I didn’t understand how varints work. They should be decoded using the following logic
def decode_varint(byte_stream):
result = 0
shift = 0
for byte in byte_stream:
result = (result << 7) | (byte & 0b01111111)
if not (byte & 0b10000000):
return result
raise ValueError("Unexpected end of byte stream")
This is a complete byte analysis result.
================ page header ================
page_type : 13
first_freeblock : 0
number_of_cells : 3
cell_content_area : 3779
number_of_fragmented_free_bytes : 0
right_most_pointer : None
=============== cell header =================
No 0
pointer : 3779
payload_size : 120
row id : 3
header_size : 7
number of serial types : 5
number of contents : 5
each content info
- serial type : 23
serial type pointer : 3782
content size : 5
content : b'table'
content pointer : 3788
- serial type : 27
serial type pointer : 3783
content size : 7
content : b'oranges'
content pointer : 3793
- serial type : 27
serial type pointer : 3784
content size : 7
content : b'oranges'
content pointer : 3800
- serial type : 1
serial type pointer : 3785
content size : 1
content : b'\x04'
content pointer : 3807
- serial type : 199
serial type pointer : 3787
content size : 93
content : b'CREATE TABLE oranges\n(\n\tid integer primary key autoincrement,\n\tname text,\n\tdescription text\n)'
content pointer : 3808
=============== cell header =================
No 1
pointer : 3901
payload_size : 80
row id : 2
header_size : 6
number of serial types : 5
number of contents : 5
each content info
- serial type : 23
serial type pointer : 3904
content size : 5
content : b'table'
content pointer : 3909
- serial type : 43
serial type pointer : 3905
content size : 15
content : b'sqlite_sequence'
content pointer : 3914
- serial type : 43
serial type pointer : 3906
content size : 15
content : b'sqlite_sequence'
content pointer : 3929
- serial type : 1
serial type pointer : 3907
content size : 1
content : b'\x03'
content pointer : 3944
- serial type : 89
serial type pointer : 3908
content size : 38
content : b'CREATE TABLE sqlite_sequence(name,seq)'
content pointer : 3945
=============== cell header =================
No 2
pointer : 3983
payload_size : 111
row id : 1
header_size : 7
number of serial types : 5
number of contents : 5
each content info
- serial type : 23
serial type pointer : 3986
content size : 5
content : b'table'
content pointer : 3992
- serial type : 25
serial type pointer : 3987
content size : 6
content : b'apples'
content pointer : 3997
- serial type : 25
serial type pointer : 3988
content size : 6
content : b'apples'
content pointer : 4003
- serial type : 1
serial type pointer : 3989
content size : 1
content : b'\x02'
content pointer : 4009
- serial type : 185
serial type pointer : 3991
content size : 86
content : b'CREATE TABLE apples\n(\n\tid integer primary key autoincrement,\n\tname text,\n\tcolor text\n)'
content pointer : 4010