{"slug": "one-sqlite-file-and-one-harness-is-enough-for-french-social-housing", "title": "One SQLite File and One Harness Is Enough for French Social Housing", "summary": "A French social housing AI agent called PIERRE eliminated its entire RAG stack—including vector databases, embedding models, rerankers, and multiple cloud GPU services—replacing them with a single SQLite file and a harness, resulting in a faster, simpler, and more maintainable system. The open-source project, which serves both tenants and employees of HLM social housing, now operates without chunking strategies, tokenizer bugs, hybrid search tuning, or provider-dependent infrastructure that previously required multiple monthly subscriptions and GPU setups.", "body_md": "I deleted a lot of glue code. 10+ dependencies. A chunking strategy. A vector database. French stemmers. An embedding model. A reranker. A costly and lengthy build pipeline. A €200/month Hetzner GPU. A €15/month Hugging Face inference endpoint. A few euros per month for LLM-as-reranker calls on Groq or Cerebras.\n\nAnd the project got better. `oxfmt` — the fastest formatter — is the only thing that made it slower. More on that later.\n\nThis is the story.\n\nPIERRE is an [open-source AI agent](https://github.com/charnould/pierre) for French social housing — HLM — and a learning project on the side.\n\nAt its core is a chatbot for both tenants and employees: tenants asking about their solidarity rent surcharge (SLS), on-call agents looking up procedures, collection officers cross-checking patrimony data scattered across spreadsheets and internal documents.\nSome tasks need retrieval. Others need `SQL` joins, reasoning, or calculations.\n\nBut PIERRE is also something less fashionable: a traditional interface for HLM employees who are not comfortable with chat or AI systems.\n\nDrop a scanned paper letter. Add context in a form. Give it a `claim_id`. Hit submit. 
The agent checks the relevant sources, runs the required calculations, drafts a reply, and shows its reasoning — before anything is sent. Or drop the required documents and generate a repayment plan in seconds.\n\nLess \"chat with AI\". More old-school software powered by an agent. And new use cases discovered directly from the field: **forward-deployed engineering, HLM edition**.\n\nFrench bureaucracy, with people's homes attached.\n\nThat is the open-source product [1].\n\nPIERRE used to look like a proper classic RAG system. Every box was a future runbook:\n\n```\n                        user question\n                              │\n              query augmentation/expansion (LLM call)\n                              │\n              ┌───────────────┴───────────────┐\n              ▼                               ▼\n       search-by-vectors.ts             search-by-bm25.ts\n        (bge-m3 via Ollama)              (French stemmer)\n              │                               │\n              └──────────────┬────────────────┘\n                             ▼\n                  SQLite + sqlite-vec\n                             │\n                             ▼\n              rank-chunks.ts → LLM-as-reranker\n               (Groq or Cerebras, or wait)\n                             │\n                             ▼\n                 LLM answer (fingers crossed)\n```\n\nIt worked. Mostly. That is how complexity wins. One reasonable step at a time.\n\nThe bad part was not the code. Though, sometimes, yes, the code too. 
The bad part was the contract.\n\nTo run the thing properly, you now had at least:\n\n- a chunking strategy nobody agreed on;\n- a tokenizer bug you discovered only in production;\n- a BM25 path, with French stemmers because naturally;\n- hybrid search tuning between lexical and semantic retrieval;\n- metadata filters users expected to \"just work\";\n- a reranking layer with its own latency budget;\n\nThen came the infrastructure:\n\n- a GPU setup just to generate embeddings fast enough on a Linux VPS;\n- a dedicated inference endpoint for full rebuilds;\n- a vectorization pipeline to monitor, retry, and occasionally pray over;\n- invisible retries around every provider call;\n- timeout handling everywhere;\n- logs to explain which layer failed;\n- traces to understand why the \"same\" question gave different answers twice in a row;\n\nAnd of course, the providers:\n\n- a fast LLM for invisible calls, like Groq or Cerebras;\n- a main provider like Mistral AI, Anthropic, OpenAI, or open models (DeepSeek, GLM, Qwen...);\n- fallback providers because eventually one of them would return a beautiful `HTTP 500`;\n\nThen came the meetings:\n\n- \"Why didn't it find my document?\"\n- \"Why DID it find this document?\"\n\n…And eventually, a growing suspicion that half the stack existed mostly to compensate for the other half.\n\nFor a big search product, fine. For a self-hosted agent in a *bailleur social*, that's a lot to ask.\n\n**A dependency is not just a package. It is a person who has to understand it later.**\n\nRetrieval can answer \"what does SLS mean?\". A useful agent must answer \"how many residences in Dijon have Iserba as the maintenance contractor?\". That second question is not in plain text anywhere in the corpus. It is a `JOIN`.\n\nSo I left chatbots for harnesses: Pi (OpenClaw harness), Claude Code, Codex... 
And the question became simpler and harder:\n\nHow do I give the agent access to the knowledge base and minimize its reasoning turns when speed matters?\n\n```\nbuild pipeline                                      runtime\n─────────────                                       ───────\n.docx / .xlsx                              one microVM per conversation\n      |                                                 |\n      v                                                 |\n  db.sqlite                                          question\n      ├─ documents (FTS5)                               |\n      ├─ staff_contact_information                      |\n      ├─ on_call_agent_procedures                       |\n      ├─ rental_units                                   v\n      └─ _readme (JSON schema) ── templated into ──> AGENTS.md  <- Pi reads at session start\n                                                     db.sqlite  <- Pi queries via sqlite3\n                                                        |\n                                                        v\n                                                      answer\n```\n\nEach PIERRE profile (tenant, on-call agent, collection officer…) gets one database. The harness sees only that file. Not a vector store plus a document store plus a metadata store plus a sidecar.\n\nMarkdown goes into an FTS5 table called `documents`. Spreadsheets become typed SQLite tables (`INTEGER` where every value is numeric, `TEXT` otherwise). At build time, each column is analyzed and a minified JSON schema is stored in `_readme`, then templated into `AGENTS.md` at session start.\n\nKnowledge rebuild from scratch takes seconds. No GPU. Server cost moved from a €200/month GPU box to a €46/month Hetzner `AX41-NVMe` — old, cheap, and capable of nested virtualization for per-conversation microVMs (fully isolated, destroyed at the end; cold start is near-instant because the image is prebuilt). 
The model behind Pi is Claude Sonnet, billed per token: a few cents per conversation.\n\nPer-conversation microVMs (via [Smol Machines](https://smolmachines.com/)) are not architectural fashion. Data never leaks between sessions, state is destroyed with the microVM, and the GDPR conversation with a *bailleur* is much shorter.\n\nThat changes who gets to run AI. Not AI for organizations with platform teams. AI for the housing coordinator with a tiny IT department and a budget meeting in three weeks.\n\nChunking is hard. Fixed-size, overlap, semantic — entire libraries dedicated to it. I do not need any of it anymore.\n\nThe unstructured documents here (`.md`, `.docx`) are never 100 pages long. Each one covers one precise topic — a \"one document, one topic\" policy I follow and ask HLM organizations to follow too.\n\nSo the ingest does the simplest thing: one document, one row in the `documents` FTS5 table, full content. Period.\n\nThis is not always right. For a 500-page legal code, you would chunk. For short procedural notes written by humans for humans, you would not. **Match the strategy to the corpus, not to the meme.**\n\nFor prose, PIERRE uses SQLite FTS5:\n\n```\ntokenize = \"unicode61 remove_diacritics 2 tokenchars '-'\"\n```\n\nThree things in one line. `unicode61` for French. `remove_diacritics 2` because users forget accents and documents have them. `tokenchars '-'` because `loca-pass` is one word, not two.\n\nFrench social housing has words you do not blur: `SLS`, `APL`, `CAF`, `loca-pass`, `charges récupérables`, `bailleur`, `conventionnement`, `commission d'attribution`. Lose the hyphen and the agent does a second turn to recover. The hyphen is not aesthetic. It is latency.\n\nBM25 is old. So what. FTS5 because BM25 ranks; `LIKE` does not. 
The agent runs:\n\n```\nSELECT rowid, content, snippet(documents, 0, '**', '**', '…', 200) AS excerpt\nFROM documents\nWHERE documents MATCH '\"loca-pass\" OR \"avance\" OR \"caution\"'\nORDER BY rank LIMIT 5;\n```\n\nQuery expansion is handled in the prompt, not in code. A capable LLM expands French synonyms well enough on a small corpus — tell it to, quote each term, combine with `OR`. No separate expansion service. `content` is always in the SELECT. The full document text comes back with the first query. The agent never fires a second query to re-fetch content it already has.\n\nNot glamorous. Works.\n\nA lot of organizational knowledge is not prose. Agency contacts. Rent grids. Routing rules. Patrimony data.\n\nDo not chunk that and hope cosine similarity finds the right contractor for building \"Rosa Parks\".\n\nPIERRE ingests Excel (`.xlsx`) by unmerging cells, normalizing headers and sheet names, and turning sheets into JSON rows. Names go to lowercase ASCII `snake_case` — the biggest single win:\n\n```\nCaractéristiques techniques du patrimoine immobilier\n                       │\n                       ▼\ncaracteristiques_techniques_du_patrimoine_immobilier\n```\n\nColumn names get the same treatment:\n\n```\nHeating c.   ─►   heating_contractor\nDate MES     ─►   date_mise_en_service\n```\n\nThis is not cosmetic. It is interface design for the agent (human or not). An LLM writes better SQL against `procedure_pour_les_agents_d_astreinte` than against `Procédure pour les agents d'astreinte`. No quoting headaches. No invisible apostrophes. No accents that look identical but are not the same codepoint. `heating_contractor` is self-documenting. `Heating c.` is a riddle.\n\nNaming is step one. **Typing** is step two.\n\nFrench Excel is not CSV. Cells arrive as `1 234,56`, `25 %`, `1.234,56 €`, or `14/05/2024`. 
The ingest normalizes them before SQLite ever sees them:\n\n- European decimals and thousands separators → JavaScript `number`\n- Percentages → fractions (`25 %` → `0.25`)\n- Currency symbols stripped (`€`, `$`, `£`, …)\n- Dates → `YYYY-MM-DD` in Europe/Paris (`14/05/2024` → `2024-05-14`)\n\nA column where every non-null value is numeric becomes `INTEGER`. Values are stored as numbers, not strings.\n\nThat matters the moment someone asks a counting question:\n\n```\nSELECT COUNT(*) FROM caracteristiques_du_patrimoine\nWHERE surface_habitable > 50 AND annee_construction < 1990;\n```\n\nWith everything as `TEXT`, the agent would need a cast, a guess, or a Python detour. With `INTEGER`, it just runs SQL.\n\nThe best retrieval improvement was not a better embedding model. It was making the data boring enough that a dumb agent could query it — **and count on it**.\n\nThis is the part that matters.\n\nMinimizing turns is not a side-optimization. It is the difference between 10 seconds and 40 seconds. Same question. Same model. Only the turn budget changes. You can wait 40 seconds for Claude Code while making coffee. You cannot wait 40 seconds for a chatbot answer.\n\nSo the agent gets the map of the database for free. At build time, the pipeline analyzes every table and stores a minified JSON schema in `_readme`. At runtime, a template per profile contains a placeholder:\n\n`<!-- KNOWLEDGE_SCHEMA_HERE -->`\n\nThat placeholder is replaced with the current `_readme` content and written into `AGENTS.md`. Pi reads it at session start.\n\nWithout that, the first turn is always discovery: `SELECT name FROM sqlite_master`, `PRAGMA table_info(...)`, `SELECT DISTINCT …`. Waste. The agent should spend its turns answering the tenant, not learning that a table exists.\n\nThe old schema was a Markdown table of names. Useful, but thin. The new one is JSON — one line per table, column metadata included. 
The build classifies each column:\n\n- **discrete** (≤ 20 distinct values): lists the values, so the agent knows `type_logement` is `collectif` or `individuel` without probing\n- **continuous_numeric** (> 20 values, `INTEGER`): `min` and `max` only\n- **date** (all non-null `TEXT` values match `YYYY-MM-DD`): `min` and `max` date range\n- **continuous_text**: free text, no enumeration\n\nLong discrete values (> 40 characters) omit the list and expose `discrete_count` only — token budget again.\n\nAbbreviated and prettified example:\n\n```\n{\n  \"access\": \"read-only\",\n  \"tables\": [\n    {\n      \"name\": \"communes\",\n      \"rows\": 2456,\n      \"columns\": [\n        {\n          \"name\": \"nom\",\n          \"type\": \"TEXT\",\n          \"not_null\": true,\n          \"nature\": \"discrete\",\n          \"discrete_count\": 2,\n          \"values\": [\"lyon\", \"paris\"]\n        },\n        {\n          \"name\": \"population\",\n          \"type\": \"INTEGER\",\n          \"not_null\": true,\n          \"nature\": \"continuous_numeric\",\n          \"min\": 500000,\n          \"max\": 2200000\n        }\n      ]\n    },\n    {\n      \"name\": \"documents\",\n      \"description\": \"One complete document per row. 
Not chunked.\",\n      \"engine\": \"fts5\",\n      \"tokenizer\": \"unicode61 remove_diacritics 2 tokenchars '-'\",\n      \"rows\": 97,\n      \"columns\": [\n        { \"name\": \"rowid\", \"indexed\": false },\n        { \"name\": \"content\", \"indexed\": true }\n      ],\n      \"query_examples\": [\n        \"SELECT rowid, content FROM documents WHERE documents MATCH 'chauffage';\",\n        \"SELECT rowid, content, snippet(documents, 0, '[', ']', '...', 20) FROM documents WHERE documents MATCH 'ascenseur';\",\n        \"SELECT rowid, content FROM documents WHERE documents MATCH '\\\"dégât des eaux\\\" AND urgence';\"\n      ]\n    }\n  ]\n}\n```\n\nSame turn-budget logic as injecting table names — but the agent also sees value ranges and legal enums before its first query.\n\nThe same logic applies to the date. Many questions are time-sensitive (\"Am I allowed to intervene tonight in this sensitive building?\"). Without help, the agent calls a date tool — and doesn't even know the 2026 French public holidays. So `today_is()` is written into `AGENTS.md` at every session start:\n\n```\n<session>Current date and time (Europe/Paris): Sunday, May 24, 2026 (Week 21) at 14:56 (french public holiday: Whit Sunday).</session>\n```\n\nFree. No tool call.\n\nThe prompt extends the same logic. Query through `sqlite3`, not `python3`. Do not call `.tables` or `PRAGMA` — the schema is already in context. Put independent queries in one turn. Prefer joins over chains of small queries. Always include `content` in the `SELECT` for `documents`.\n\nThis is not prompt-engineering theater. It is latency control.\n\nI once formatted the injected schema with `oxfmt`. The output looked great — a proper Markdown table, columns aligned, headings in the right place. Some headings and column names were very long. 
The markup that aligned them was longer: stretches of whitespace between every `|` to keep the visual grid.\n\nI shipped it.\n\nThe next day I noticed sessions started slow. Every session.\n\nA schema that fit in a few hundred tokens of compact text exploded into thousands of tokens of beautiful prose. Multiplied by every session start. Multiplied by every conversation.\n\nSeconds of latency, injected before the agent had even read the question. Pretty is not always kind. I reverted. Many seconds saved.\n\nThe schema is now JSON, not Markdown tables. `oxfmt` still formats DOCX→Markdown elsewhere in the pipeline, but the session-start trap can take a new shape: `JSON.stringify(schema, null, 2)` or an over-long `values` list on a discrete column would blow the budget just as fast. Hence minified JSON (`JSON.stringify(schema)` with no indentation) and `discrete_count` without values when any entry exceeds 40 characters. Same lesson, different formatter.\n\nThis is not a religion.\n\nPIERRE today: a few hundred documents per profile, around ten spreadsheets — some with 200,000+ rows. Boring scale. Ten million documents? Use a real search system. Fuzzy semantic discovery over unknown material? Embeddings are useful.\n\nFor a domain-specific product like this, retrieval can be one well-known file, a capable harness, a strong model, and a few boring rules that save turns:\n\n- Quote FTS terms.\n- Inject the schema once — don't make the agent rediscover tables.\n- Normalize names. Type numbers and dates at ingest.\n- Let the build describe columns: discrete values, ranges, dates.\n- Keep the schema compact: minified JSON, no pretty-print.\n- One topic per doc.\n- Name Excel sheets and column headers like someone else will have to query them — because someone will.\n\nI re-ran the same questions through both stacks. The new one was clearly better, and easier to evolve. But the prompt evolved in parallel. Honest enough.\n\nThat is not an AI problem. 
That is a knowledge-management problem.\n\nMay 27, 2026\n\n[Charles-Henri Arnould](https://www.linkedin.com/in/charnould) · [charnould@pierre-ia.org](mailto:charnould@pierre-ia.org)\n\nDrafted in English with LLM help; opinions are mine.\n\n## Footnotes\n\n1. **Why is PIERRE open source?** French social housing landlords do the same public-interest job across territories. Paris and Avignon are not in a market battle — same procedures, same laws, different spreadsheets. So the shared HLM knowledge base is open data, the code is open, and any landlord can fork, host, and improve it. Some already do. HLM must not miss the AI wave. \"AI strategy\" too often means proprietary tooling, expensive integrations, and fresh dependency. PIERRE is built the **other way**: open, modular, pragmatic. Modern enough to be useful. Simple enough for a small IT team. And able to plug into the prehistoric apps the sector cannot drop overnight. Own your IT stack — every euro spent on complexity or vendor lock-in is a euro not spent on social housing.", "url": "https://wpnews.pro/news/one-sqlite-file-and-one-harness-is-enough-for-french-social-housing", "canonical_source": "https://github.com/charnould/pierre/blob/master/docs/documentation/2026-05-from-rag-to-sqlite-and-harness.md", "published_at": "2026-05-27 13:48:38+00:00", "updated_at": "2026-05-27 14:16:46.064236+00:00", "lang": "en", "topics": ["ai-agents", "ai-products", "ai-tools", "natural-language-processing", "ai-infrastructure"], "entities": ["PIERRE", "Hetzner", "Hugging Face", "Groq", "Cerebras", "oxfmt", "HLM", "SLS"], "alternates": {"html": "https://wpnews.pro/news/one-sqlite-file-and-one-harness-is-enough-for-french-social-housing", "markdown": "https://wpnews.pro/news/one-sqlite-file-and-one-harness-is-enough-for-french-social-housing.md", "text": "https://wpnews.pro/news/one-sqlite-file-and-one-harness-is-enough-for-french-social-housing.txt", "jsonld": 
"https://wpnews.pro/news/one-sqlite-file-and-one-harness-is-enough-for-french-social-housing.jsonld"}}