I need help with debugging duplicates when reading SQLite leaf interior pages

I’m stuck on Stage #WS9. It looks like I’m reading the same inforamtion from different leaf pages twice.

My understanding is that the root page of the superheroes table is an interior table page. So I read its children and I add them to the results of the query (if a child is a leaf page) or call my read_interior_page() recursively (if a child is an interior page).

I’ve tried to print the results that I’m getting from every leaf cell from a child of the root page and one layer below:

        for cell in &interior_page.cells {
            let pointer = (cell.left_child_page_num * u32::from(self.header.page_size)).into();
            let child = Self::get_page(self, pointer, None)?;
            match child {
                Page::LeafTable(leaf) => {
                    let mut r = Self::read_leaf_page(&leaf, query, table_info)?;
                    if !r.is_empty() {
                        println!("got from leaf table {}: {:?}", &leaf.offset / 4096, r);
                    }
                    res.append(&mut r);
                }
                Page::InteriorTable(interior) => {
                    for cell in &interior.cells {
                        let pointer =
                            (cell.left_child_page_num * u32::from(self.header.page_size)).into();
                        let child = Self::get_page(self, pointer, None)?;
                        match child {
                            Page::LeafTable(leaf) => {
                                let mut r = Self::read_leaf_page(&leaf, query, table_info)?;
                                if !r.is_empty() {
                                    println!(
                                        "got from INNER leaf table {}: {:?}",
                                        &leaf.offset / 4096,
                                        r
                                    );
                                }
                                res.append(&mut r);
                            }
                            Page::InteriorTable(_) => todo!("next layer"),
                            Page::LeafIndex => (),
                            Page::InteriorIndex => (),
                        }
                    }
                }
                _ => {
                    //dbg!("other type");
                }
            }
        }

Here are my logs:

got from leaf table 49: [["350", "Congorilla (New Earth)"]]
got from leaf table 36: [["1131", "Lambien (New Earth)"]]
got from INNER leaf table 75: [["350", "Congorilla (New Earth)"]]
got from INNER leaf table 94: [["1131", "Lambien (New Earth)"]]
got from INNER leaf table 151: [["4533", "Kal-El (DC One Million)"]]
got from INNER leaf table 123: [["5496", "Midas (New Earth)"]]
got from leaf table 271: [["4533", "Kal-El (DC One Million)"]]
got from leaf table 267: [["4765", "Ahura-Mazda (New Earth)"]]
got from leaf table 254: [["5496", "Midas (New Earth)"]]

So it looks like the numbers of the pages are different, but the data stored inside is the same.
This does not seem correct to me, so I would appreciate any tips on how to fix it. Thank you very much!

@rodio Looks like you’ve successfully passed #WS9. :tada:

Just checking – do you still need any assistance with this? If not, would you mind sharing how you got unstuck?

@andy1li thanks for your reply! It did pass, but only because I’ve added deduplication at the end. This way I know that all rows are present, but some of them are duplicated. Still don’t know why data is duplicated in the first place. I’d appreciate if you could help!

@rodio Got it. I’ll take a look as soon as possible.

EDIT: Sorry, I’m a bit tied up this week. I’ll get back to you early next week.

1 Like

I’ve noticed that codecrafters test runs different set of tests, is that correct? Anyway, the version of the code that I have now works on a set of tests that does not include ./your_program.sh test.db "SELECT id, name FROM superheroes WHERE hair_color = 'Reddish Brown Hair'" and $ ./your_program.sh test.db "SELECT id, name FROM superheroes WHERE eye_color = 'Amber Eyes'"

@rodio Yep, our tests use randomized queries to prevent hardcoded responses.

In the latest version of you code, I noticed some commented-out code for nested InteriorTable, so I added a panic! to check if it’s reachable—and it turns out it is.

Fleshing out that commented-out arm should bring you closer to passing this stage. Let me know if you’d like to discuss how to approach the implementation!

Thanks for your help! Yes, I’d like to discuss it further if possible.

I tried to read the nested interior tables through calling that “read_interior_page” recursively. This is when I see duplicates. As you can see in the logs in my first message, it looks like the root page has both child leaf pages and also child interior pages (log messages “got from INNER leaf table”). On the other hand, the documentation states that “In a well-formed database, all children of an interior b-tree have the same depth” so I think this should not be possible.

Is calling read_interior_page recursively a wrong approach?

@rodio I believe calling read_interior_page recursively is the right approach.

I’ve confirmed that superheroes.db contains duplicate pages, but I’ll need to check with the team to determine whether this is intentional or something that needs fixing.

I’ll keep you posted with any updates.

1 Like

Final findings:

  1. Ran sqlite3_analyzer superheroes.db

Table SUPERHEROES

Number of entries… 6895
B-tree depth… 2
Primary pages used… 109

A B-tree of depth 2 indicates that there are no nested interior table pages—just a root page with 108 leaf pages below it.

  1. @rodio Turns out there’s an off-by-one error in your code here:
fn read_interior_page(...) ->... {
    let pointer = (cell.left_child_page_num * ...).into();

Fixing this error should help your code pass stage #WS9.

  1. The previously encountered “nested interior table pages” were likely free pages with residual duplicate data.
1 Like

@andy1li Thank you for the detailed explanation! It was indeed that off-by-one error.

1 Like

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.