Building SQLite from Scratch: 740 Lines of C++23 to Understand Every Byte of a .db File

The article explains that SQLite is an embedded database engine stored in a single `.db` file, unlike client-server databases like MySQL. It introduces TinySqlite, a 740-line C++23 project that reads the raw binary structure of a `.db` file without using the official SQLite library, dissecting components like the file header, B-tree pages, and varint encoding. The piece emphasizes SQLite's ubiquity in everyday devices and aims to help readers understand the exact byte-level format that makes SQL queries work.

You fire up a MySQL client, connect to port 3306, send off your SQL, and the server parses, optimizes, hits an index, fetches rows, and packs the result back to you. You can picture that entire pipeline. SQLite has none of that. No server process, no port, no wire protocol. Just a single file: my.db . So the real question is — what exactly is stuffed inside that file that makes SELECT FROM apples WHERE color='Yellow' return the right answer? TinySqlite https://github.com/Tenaryo/TinySqlite takes this apart across 740 lines of C++23. It doesn't link against the official SQLite library. It opens a .db file's raw binary and pries the data directly out of the disk bytes. We'll follow its code path, peeling back SQLite's file format layer by layer. This article covers: file header → B-tree pages → varint encoding → the schema table → full table scans → index scans. What SQLite Actually Is Let's get the definition straight first. SQLite is an embedded relational database engine — in plain English: it's a C library you compile into your program, and once you open a file, you can run SQL against it. No server, no install, no root password. If you're familiar with MySQL, here's the mental model. MySQL is a restaurant — a dedicated kitchen server process , waitstaff connection handling , a complex ordering system query optimizer . You sit down, say "SELECT," and the back of house scrambles to bring you the dish. SQLite is your fridge. Open it, grab what you need, nobody serves you. The entire database is a single data.db file. Copy it, carry it, done. Traditional client-server database: SQLite's embedded model: ┌─────────┐ TCP/network ┌───────────┐ ┌──────────────────────────────┐ │ Your app │ ←───────────→ │ DB server │ │ Your app │ └─────────┘ └───────────┘ │ ├── libsqlite.so engine │ │ ├── data.db the only file │ │ └── all ops are local calls │ └──────────────────────────────┘ Why should you care? You might not encounter MySQL every day, but you're almost certainly already using SQLite. Your phone's contacts, WeChat messages, Chrome bookmarks and browsing history — all stored in SQLite. Every iPhone, every Android device, every browser runs a SQLite instance. It's probably the most deployed database engine on the planet, bar none. Using it is trivial. Create a database, make a table, insert data, query: bash $ sqlite3 test.db sqlite CREATE TABLE fruits name TEXT, price INT ; sqlite INSERT INTO fruits VALUES 'apple', 5 ; sqlite SELECT FROM fruits WHERE price < 10; apple|5 If you're writing C/C++, include sqlite3.h and a handful of lines embed a full database in your program. Great. You're using it comfortably. But do you actually know — what do the bytes inside test.db look like? Now flip roles. Stop being the user, become the reverse engineer. TinySqlite is a set of reverse-engineering notes that dissects the .db file's binary structure piece by piece. Let's begin. Opening the File — How a .db File Is Organized The Entire File Is a Chain of Pages At the macro level, a .db file is astoundingly simple: it's a sequence of fixed-size pages laid end to end. Every page is the same size typically 4096 bytes , numbered starting from page 1. Picture a bookshelf where every shelf slot is the same width. To find the 3rd book, you start from the shelf edge and count to position 3 × slot width . SQLite pages work the same way — the data for page N starts at file offset N-1 × page size . my.db file: ┌─────── page 1 ───────┐┌─────── page 2 ───────┐┌─────── page 3 ───────┐┌── ... │ file header 1st 100B ││ page header ││ page header │ │ page size = 4096 ││ type = 0x0D leaf ││ type = 0x05 interior │ num tables = 3 ││ cells = row1,row2 ││ child page ptrs │ │ ... ││ ... ││ ... │ └───────────────────────┘└──────────────────────┘└──────────────────────┘ Page 1 is special — its first 100 bytes form the file header , storing global metadata. Every page after that has only a page header followed by actual data. What the File Header Carries The first 100 bytes of page 1 in every SQLite file follow a fixed format. The first 16 bytes are the magic string "SQLite format 3\000" — the file's "ID card." It tells any program that tries to read the file: hey, I'm a SQLite 3 format database. Bytes 16–17 store the page size . Note that this is stored in big-endian — high byte first. If these two bytes read 0x10 0x00 , that's 4096. If the page size is 512, they'd be 0x02 0x00 . Here's how TinySqlite reads the page size: static constexpr size t kPageSizeOffset = 16; auto read u16 be size t offset const noexcept - uint16 t { return static cast<uint16 t data offset << 8 | static cast<uint16 t data offset + 1 ; } // In the constructor: page size read u16 be kPageSizeOffset Two bytes assembled into a uint16 t . No magic. Another critical number hides at byte offset 103 corresponding to SQLite's byte 56 : the table count. TinySqlite reads this to know how many tables the database holds — both system and user tables. static constexpr size t kSchemaCountOffset = 103; num tables read u16 be kSchemaCountOffset What a B-tree Page Looks Like Now that you know a file is a chain of pages, the next question is — what's inside a page? How is data actually organized? SQLite uses a B-tree to organize data. Each table is a B-tree, and each node in that tree is a page. Pages have two roles: - Interior pages type : These don't store actual data rows. They store "signposts" — child page numbers and key ranges. Their job is navigation: which subtree contains the data you're looking for. 0x05 - Leaf pages type : These hold the real row data. Every 0x0D INSERT ultimately lands in a cell on some leaf page. Visually: Page 2 interior, 0x05 / \ Page ? leaf, 0x0D Page ? leaf, 0x0D Granny Smith Fuji Golden Delicious Honeycrisp A page's internal structure the page header starts at page offset 0: | Offset | Size | Meaning | |---|---|---| | 0 | 1 byte | page type 0x05 =interior table, 0x0D =leaf table, 0x02 =interior index, 0x0A =leaf index | | 1 | 2 bytes | offset of first freeblock | | 3 | 2 bytes | number of cells | | 5 | 2 bytes | start of cell content area | | 8 | 4 bytes | number of fragmented free bytes | For interior pages, after the page header comes a rightmost child pointer 4 bytes , pointing to the rightmost child page. Then the cell pointer array — 2 bytes per cell, each pointing to that cell's actual location within the page. This "rightmost child + cell pointer array" structure is what enables the B-tree to hop between pages. We'll expand on this when we cover full table scans. At this point, you know three key facts: - A .db file is a sequence of fixed-size pages - The file header tells you the page size and how many tables exist - A B-tree organizes the data — interior pages navigate, leaf pages store rows The next natural question — how does SQLite know which tables exist and where each table's B-tree root lives? The answer is tucked inside a special table. sqlite master — The Database's "Table of Contents" The Table of Tables SQLite has a hidden system table called sqlite master . Think of it as the table of contents at the front of a book — it doesn't store your business data, it describes the structure of the entire database. sqlite master system table, exists in every .db file ┌──────────┬───────────┬──────────┬─────────────────────────────┐ │ type │ name │ rootpage │ sql │ ├──────────┼───────────┼──────────┼─────────────────────────────┤ │ "table" │ "apples" │ 2 │ "CREATE TABLE apples ... " │ │ "table" │ "oranges" │ 4 │ "CREATE TABLE oranges ... " │ │ "index" │ "idx ..." │ 6 │ "CREATE INDEX..." │ └──────────┴───────────┴──────────┴─────────────────────────────┘ Each row represents a table, index, view, or trigger. The two most important columns are name the table name and rootpage the root page number of that table's B-tree . When you later run SELECT FROM apples , that rootpage = 2 is how the engine finds the entry point to the apples table's data. So how do you find sqlite master itself? Its data lives at a fixed location — page 1. The file header's kSchemaCountOffset tells you how many rows there are, and right after it, starting at kCellPtrArrayStart offset 108 , is the cell pointer array — each 2-byte pointer references a cell within page 1 that belongs to sqlite master. But before we can actually parse those cells, we need two encoding tools — varint and serial type. They're how SQLite "writes numbers" and "describes types" on disk. Varint: Waste Not, Want Not SQLite's on-disk format leans heavily on a variable-length integer encoding called varint . The core idea is simple: small numbers take less space, big numbers take more. The rule: each byte contributes its lower 7 bits as data, and the highest bit bit 7 is a "continue" flag. If bit 7 is 1, there are more bytes coming. If it's 0, this is the last byte. Up to 9 bytes, with the 9th byte using all 8 bits. | Value | Encoding hex | Bytes | |---|---|---| | 5 | 05 | 1 | | 300 | 82 2C | 2 | | 1000000 | 3D 09 40 | 3 | 300 = 0b 0000 0010 0010 1100 . Split into 7-bit groups: 0000010 and 0101100 . Add the continue bit — high group gets 1 10000010 = 0x82 , low group gets 0 00101100 = 0x2C . read varint does the reverse, pulling the value back out of the byte stream: php auto read varint size t offset const noexcept - VarintResult { uint64 t value = 0; for int i = 0; i < 9; ++i { auto byte = static cast<uint8 t data offset + i ; if i == 8 { value = value << 8 | byte; return {value, 9}; } value = value << 7 | static cast<uint64 t byte & 0x7F ; if byte & 0x80 == 0 return {value, static cast<size t i + 1}; } std::unreachable ; } Each iteration grabs 7 bits and shifts them into the result. The loop stops when it hits a byte whose high bit is 0. If it reaches the 9th byte without stopping, it uses the full 8 bits — that's varint's maximum width. Serial Type: What Exactly Is in This Column Every column of a record carries a serial type code on disk. It tells the parser: is this column NULL, an integer, text, and how many bytes does it occupy? | Type code | Meaning | Size in bytes | |---|---|---| | 0 | NULL | 0 | | 1 ~ 4 | 1/2/3/4-byte integer | equals the type code | | 5 | 6-byte integer | 6 | | 6 | 8-byte integer | 8 | | 7 | IEEE float | 8 | | 8 | literal 0 | 0 | | 9 | literal 1 | 0 | | ≥12 and even | BLOB | N - 12 / 2 | | ≥13 and odd | text string | N - 13 / 2 | Notice the two special values 8 and 9 — the integer values 0 and 1 take up zero bytes on disk; the serial type alone encodes the value. SQLite's disk format is this miserly. The corresponding code in TinySqlite: php static constexpr auto serial type size uint64 t serial type noexcept - size t { if serial type <= 4 return std::array{0, 1, 2, 3, 4} serial type ; if serial type == 5 return 6; if serial type == 6 || serial type == 7 return 8; if serial type = 12 && serial type % 2 == 0 return static cast<size t serial type - 12 / 2 ; if serial type = 13 && serial type % 2 == 1 return static cast<size t serial type - 13 / 2 ; return 0; } Decoding a Schema Row in Real Time With varint and serial type in hand, we can now decode a single row from sqlite master. Say this row describes the apples table: type="table", name="apples", rootpage=2, sql="CREATE TABLE apples ... " . A schema table cell roughly follows this layout: payload size varint rowid varint header size varint 5 serial types varint body: actual data for 5 columns TinySqlite's parse schema entry does exactly this: skip payload size and rowid → read 5 serial types → compute per-column byte sizes from each serial type → sequentially read type, name, tbl name, rootpage, and sql from the body region. You don't need to memorize every step — just know that this is how you get a table name and its rootpage out of raw binary. Once you have the rootpage, you start traversing that table's B-tree from the corresponding page. SELECT + WHERE — How Data Actually Gets Found The first three sections laid all the groundwork: file format, B-tree, schema table, encoding primitives. Now let's answer the question that's been hanging since the beginning — You type SELECT name FROM apples WHERE color='Yellow' . What does SQLite actually do? SQL parsing? TinySqlite handles it with string search — find FROM , split column names, find WHERE , extract conditions. Let's skip past that in one sentence. The interesting part is what comes next: finding the data. Full Table Scan: Recursive Descent, Zero Missed Rows Without an index, SQLite's only option is a full table scan — read every row of the target table top to bottom, then filter with the WHERE clause. Sounds simple enough, but when a table spans multiple pages, how do you guarantee nothing is skipped? The answer is in the B-tree traversal algorithm. The entry point is the rootpage — the page number recorded in sqlite master. TinySqlite starts here: auto rp = db.rootpage sel.table ; From the rootpage, read columns values recursively walks the entire B-tree. The core logic: - If it's an interior page : each cell has a left child pointer pointing to a subtree. Iterate all cells, recursively process each subtree. There's also a rightmost child pointer pointing to the rightmost subtree — can't forget that one. 0x05 - If it's a leaf page : each cell is a data row. Read them one by one and match against the WHERE condition. 0x0D The rightmost child is the most overlooked piece of design — that 4-byte pointer sitting before the cell pointer array on interior pages. It points to the subpage covering the range "to the right" of all cells. Without it, the rightmost chunk of data simply gets dropped. // Interior page processing logic auto num cells = read u16 be page offset + 3 ; auto right child = read u32 be page offset + 8 ; // ← don't forget this one for uint32 t i = 0; i < num cells; ++i { auto cell ptr = page offset + read u16 be page offset + 12 + i 2 ; auto child = read u32 be cell ptr ; // ← left child auto child rows = read columns values child, column indices, filter ; // ...collect rows from the child page... } auto right rows = read columns values right child, column indices, filter ; // ...collect rows from the rightmost child... The recursion keeps diving, stops at leaf pages, reads data, and bubbles it back up. The entire B-tree gets fully traversed — not a single row is missed. Column value extraction for each row relies on the serial type machinery from the previous section: read varints for serial types and byte widths, then pull data out of the payload region. If a column happens to be the one in the WHERE clause, compare on the spot. Index Scan: Taking the B-tree Shortcut The problem with full table scans is obvious: you're only looking for color='Yellow' , yet you're reading every apple of every color. When a table has hundreds of thousands of rows, this hurts. SQLite's solution is an index . An index is itself a B-tree, but with a few twists: - Page types are 0x02 interior index page and 0x0A leaf index page - Cells don't carry full rows. They carry index column values + rowid - The B-tree is sorted by the indexed column Back to WHERE color='Yellow' . If the apples table has an index on the color column, TinySqlite's path becomes: Step 1: Search the index B-tree, collect matching rowids. The index search function traverses the index B-tree to find every entry where color='Yellow' . Because the index pages are sorted by the color column, the search is binary — compare the index value on the current page, go left if the target is lower, collect the rowid on match, and stop or continue searching the left subtree if the target is higher. Vastly more efficient than a full table scan. Step 2: Use rowids to locate individual rows in the table B-tree. With a list of rowids in hand, read row by rowid does point lookups on the table B-tree. Each point lookup follows a path similar to the recursive scan — compare rowids on interior pages to decide which child page to descend into — but it hunts for a single row rather than traversing every cell. php if sel.where { auto idx rp = db.index rootpage sel.table, sel.where- column ; if idx rp { std::vector<uint64 t rowids; db.index search idx rp, sel.where- value, rowids ; for auto rid : rowids { auto row = db.read row by rowid rp, rid, col indices ; // ...collect results... } return result; } } // No index, fall back to full table scan return QueryResult{.rows = db.read columns values rp, col indices, filter }; The code above is the core decision logic in TinySqlite's execute query : if there's an index, use it; otherwise, do a full scan. SQL parsing, schema rootpage lookup, B-tree traversal, WHERE filtering, index search — everything covered so far converges into these dozen lines. Of course, production-grade SQLite is far more complex. It has a query optimizer to pick among indexes, WAL journaling for crash recovery, multi-version concurrency control, B-tree page splits and rebalancing. But the core skeleton — file header → page → B-tree → schema → full scan / index scan — is exactly this. Databases Aren't That Mysterious Starting from the client-server architecture contrast with MySQL, we've peeled all the way down to the .db file's bedrock: fixed-size pages strung together into a tree, a handful of header bytes telling you page size and table count, a schema table whose varint-encoded rows describe where every table lives, interior B-tree pages serving as signposts and leaf pages holding the actual data, and indexes as separate B-trees already sorted for you. 740 lines of C++23, zero external dependencies, spanning the full path from a binary file header to a SELECT query result. It won't run TPC-C, and it's not going to replace libsqlite3 in your project. But if you want to see what every byte in a .db file is doing, it's exactly enough. Code at https://github.com/Tenaryo/TinySqlite https://github.com/Tenaryo/TinySqlite — this is a teaching-grade implementation, not an industrial one. Issues and feedback are still welcome.