Could you please help me to read a sample.db file?
Question
I’m trying to count all cells with a type of table in the SQLite sample.db file. To do this, I need to interpret the cell header, but I’m having trouble understanding the byte sequences involved.
For example, the record for row_id = 1 of the apples table, starting from offset 3983, is as follows:
% xxd -s 3983 -l 128 -C sample.db
00000f8f: 6f01 0717 1919 0181 3974 6162 6c65 6170 o.......9tableap
00000f9f: 706c 6573 6170 706c 6573 0243 5245 4154 plesapples.CREAT
00000faf: 4520 5441 424c 4520 6170 706c 6573 0a28 E TABLE apples.(
00000fbf: 0a09 6964 2069 6e74 6567 6572 2070 7269 ..id integer pri
00000fcf: 6d61 7279 206b 6579 2061 7574 6f69 6e63 mary key autoinc
00000fdf: 7265 6d65 6e74 2c0a 096e 616d 6520 7465 rement,..name te
00000fef: 7874 2c0a 0963 6f6c 6f72 2074 6578 740a xt,..color text.
00000fff: 290d 0000 0004 0fa1 000f e30f d60f bd0f )...............
I’ve interpreted this as follows:
Starting from offset 3983:
- The first 1B … payload size → 111
- The next 1B … row_id → 1
- The next 1B … Unknown → 7
- The next 1B … serial type of
type column → 23 → value length is (23 - 13) / 2 B
- The next 1B … serial type of
name column → 26 → value length is (26 - 13) / 2 B
- The next 1B … serial type of
tbl_name column → 26 → value length is (26 - 13) / 2 B
- The next 1B … serial type of
rootpage column → 1
- The next 1B … serial type of
sql column → 129 → value length is (129 - 13) / 2 B
- The next 1B … Unknown → 57 → unknown value length is (57 - 13) / 2 B?
- The next 5B … value of
type column → ‘table’
- The next 6B … value of
name column → ‘apples’
- The next 6B … value of
tbl_name column → ‘apples’
- The next 6B … value of
tbl_name column → ‘2’
- The next 58B … value of
sql column → ‘CREATE TABLE apples\n(\n\tid integer primary key autoincremen’
- The next 22B … value of
sql column → 't,\n\tname text,\n\tcolor ’
There are two points I don’t understand:
- What does the 1B between
row_id and serial types represent? According to the SQLite documentation, the row_id should be immediately followed by the payload, i.e., the array of serial types.
Table B-Tree Leaf Cell (header 0x0d): A varint which is the total number of bytes of payload, including any overflow A varint which is the integer key, a.k.a. “rowid” The initial portion of the payload that does not spill to overflow pages. A 4-byte big-endian integer page number for the first page of the overflow page list - omitted if all payload fits on the b-tree page.
- What does the 1B representing the
serial type of the sql column (with a value of 57) indicate? Additionally, I’m concerned that the actual content length of the sql column exceeds the size that can be calculated from the serial type…
I found the following description.
A record contains a header and a body, in that order. The header begins with a single varint which determines the total number of bytes in the header.
I solved this issue.
Initially, I didn’t understand how varints work. They should be decoded using the following logic
def decode_varint(byte_stream):
result = 0
shift = 0
for byte in byte_stream:
result = (result << 7) | (byte & 0b01111111)
if not (byte & 0b10000000):
return result
raise ValueError("Unexpected end of byte stream")
This is a complete byte analysis result.
================ page header ================
page_type : 13
first_freeblock : 0
number_of_cells : 3
cell_content_area : 3779
number_of_fragmented_free_bytes : 0
right_most_pointer : None
=============== cell header =================
No 0
pointer : 3779
payload_size : 120
row id : 3
header_size : 7
number of serial types : 5
number of contents : 5
each content info
- serial type : 23
serial type pointer : 3782
content size : 5
content : b'table'
content pointer : 3788
- serial type : 27
serial type pointer : 3783
content size : 7
content : b'oranges'
content pointer : 3793
- serial type : 27
serial type pointer : 3784
content size : 7
content : b'oranges'
content pointer : 3800
- serial type : 1
serial type pointer : 3785
content size : 1
content : b'\x04'
content pointer : 3807
- serial type : 199
serial type pointer : 3787
content size : 93
content : b'CREATE TABLE oranges\n(\n\tid integer primary key autoincrement,\n\tname text,\n\tdescription text\n)'
content pointer : 3808
=============== cell header =================
No 1
pointer : 3901
payload_size : 80
row id : 2
header_size : 6
number of serial types : 5
number of contents : 5
each content info
- serial type : 23
serial type pointer : 3904
content size : 5
content : b'table'
content pointer : 3909
- serial type : 43
serial type pointer : 3905
content size : 15
content : b'sqlite_sequence'
content pointer : 3914
- serial type : 43
serial type pointer : 3906
content size : 15
content : b'sqlite_sequence'
content pointer : 3929
- serial type : 1
serial type pointer : 3907
content size : 1
content : b'\x03'
content pointer : 3944
- serial type : 89
serial type pointer : 3908
content size : 38
content : b'CREATE TABLE sqlite_sequence(name,seq)'
content pointer : 3945
=============== cell header =================
No 2
pointer : 3983
payload_size : 111
row id : 1
header_size : 7
number of serial types : 5
number of contents : 5
each content info
- serial type : 23
serial type pointer : 3986
content size : 5
content : b'table'
content pointer : 3992
- serial type : 25
serial type pointer : 3987
content size : 6
content : b'apples'
content pointer : 3997
- serial type : 25
serial type pointer : 3988
content size : 6
content : b'apples'
content pointer : 4003
- serial type : 1
serial type pointer : 3989
content size : 1
content : b'\x02'
content pointer : 4009
- serial type : 185
serial type pointer : 3991
content size : 86
content : b'CREATE TABLE apples\n(\n\tid integer primary key autoincrement,\n\tname text,\n\tcolor text\n)'
content pointer : 4010