Building SQLite from Scratch: 740 Lines of C++23 to Understand Every Byte of a .db File The article explains that SQLite is an embedded database engine stored in a single `.db` file, unlike client-server databases like MySQL. It introduces TinySqlite, a 740-line C++23 project that reads the raw binary structure of a `.db` file without using the official SQLite library, dissecting components like the file header, B-tree pages, and varint encoding. The piece emphasizes SQLite's ubiquity in everyday devices and aims to help readers understand the exact byte-level format that makes SQL queries work. You fire up a MySQL client, connect to port 3306, send off your SQL, and the server parses, optimizes, hits an index, fetches rows, and packs the result back to you. You can picture that entire pipeline. SQLite has none of that. No server process, no port, no wire protocol. Just a single file: my.db . So the real question is — what exactly is stuffed inside that file that makes SELECT FROM apples WHERE color='Yellow' return the right answer? TinySqlite https://github.com/Tenaryo/TinySqlite takes this apart across 740 lines of C++23. It doesn't link against the official SQLite library. It opens a .db file's raw binary and pries the data directly out of the disk bytes. We'll follow its code path, peeling back SQLite's file format layer by layer. This article covers: file header → B-tree pages → varint encoding → the schema table → full table scans → index scans. What SQLite Actually Is Let's get the definition straight first. SQLite is an embedded relational database engine — in plain English: it's a C library you compile into your program, and once you open a file, you can run SQL against it. No server, no install, no root password. If you're familiar with MySQL, here's the mental model. MySQL is a restaurant — a dedicated kitchen server process , waitstaff connection handling , a complex ordering system query optimizer . You sit down, say "SELECT," and the back of house scrambles to bring you the dish. SQLite is your fridge. Open it, grab what you need, nobody serves you. The entire database is a single data.db file. Copy it, carry it, done. Traditional client-server database: SQLite's embedded model: ┌─────────┐ TCP/network ┌───────────┐ ┌──────────────────────────────┐ │ Your app │ ←───────────→ │ DB server │ │ Your app │ └─────────┘ └───────────┘ │ ├── libsqlite.so engine │ │ ├── data.db the only file │ │ └── all ops are local calls │ └──────────────────────────────┘ Why should you care? You might not encounter MySQL every day, but you're almost certainly already using SQLite. Your phone's contacts, WeChat messages, Chrome bookmarks and browsing history — all stored in SQLite. Every iPhone, every Android device, every browser runs a SQLite instance. It's probably the most deployed database engine on the planet, bar none. Using it is trivial. Create a database, make a table, insert data, query: bash $ sqlite3 test.db sqlite CREATE TABLE fruits name TEXT, price INT ; sqlite INSERT INTO fruits VALUES 'apple', 5 ; sqlite SELECT FROM fruits WHERE price < 10; apple|5 If you're writing C/C++, include sqlite3.h and a handful of lines embed a full database in your program. Great. You're using it comfortably. But do you actually know — what do the bytes inside test.db look like? Now flip roles. Stop being the user, become the reverse engineer. TinySqlite is a set of reverse-engineering notes that dissects the .db file's binary structure piece by piece. Let's begin. Opening the File — How a .db File Is Organized The Entire File Is a Chain of Pages At the macro level, a .db file is astoundingly simple: it's a sequence of fixed-size pages laid end to end. Every page is the same size typically 4096 bytes , numbered starting from page 1. Picture a bookshelf where every shelf slot is the same width. To find the 3rd book, you start from the shelf edge and count to position 3 × slot width . SQLite pages work the same way — the data for page N starts at file offset N-1 × page size . my.db file: ┌─────── page 1 ───────┐┌─────── page 2 ───────┐┌─────── page 3 ───────┐┌── ... │ file header 1st 100B ││ page header ││ page header │ │ page size = 4096 ││ type = 0x0D leaf ││ type = 0x05 interior │ num tables = 3 ││ cells = row1,row2 ││ child page ptrs │ │ ... ││ ... ││ ... │ └───────────────────────┘└──────────────────────┘└──────────────────────┘ Page 1 is special — its first 100 bytes form the file header , storing global metadata. Every page after that has only a page header followed by actual data. What the File Header Carries The first 100 bytes of page 1 in every SQLite file follow a fixed format. The first 16 bytes are the magic string "SQLite format 3\000" — the file's "ID card." It tells any program that tries to read the file: hey, I'm a SQLite 3 format database. Bytes 16–17 store the page size . Note that this is stored in big-endian — high byte first. If these two bytes read 0x10 0x00 , that's 4096. If the page size is 512, they'd be 0x02 0x00 . Here's how TinySqlite reads the page size: static constexpr size t kPageSizeOffset = 16; auto read u16 be size t offset const noexcept - uint16 t { return static cast