# Tlamatini – Local-first AI dev assistant with 68 agents and hybrid RAG

*"One who knows": a locally-deployed AI developer assistant*

Tlamatini (Nahuatl for "one who knows") is a locally-deployed AI developer assistant that pairs a [hybrid RAG pipeline](#82-rag) (FAISS + BM25, metadata extraction, context budgeting) with a [Multi-Turn](#35-tutorial-the-multi-turn-toggle) tool-orchestration layer, [ACPX](#5-acpx--external-coding-agent-clis-as-tools) delegation to external coding-agent CLIs ([Claude Code](https://docs.anthropic.com/en/docs/claude-code/overview), [Cursor](https://cursor.com), [Codex](https://github.com/openai/codex), [Gemini](https://github.com/google-gemini/gemini-cli), [Qwen](https://github.com/QwenLM/qwen-code), …), and a [visual workflow designer](#4-visual-workflow-designer-agentic_control_panel) with 68 drag-and-drop agents.

**Local-first by default:** the full RAG pipeline, the Multi-Turn execution loop, and every workflow agent run on your machine; embeddings and chat are driven by your local Ollama install. Cloud LLMs (Claude API, Ollama Pro/Max) and ACPX delegation to cloud CLIs are opt-in per request, never the default. Sensitive code never leaves the box unless you explicitly route it out.

**Latest: v1.9.0 (2026-05-26). STM32er, zero-config firmware bridge.** A new **STM32er** agent (canvas node + Multi-Turn tool `chat_agent_stm32er`) brings the catalog to **68 agents**.
It bridges the STM32 Template Project MCP to scaffold, build, flash, observe (serial / SWD), and reset STM32 firmware. **Zero-config auto-bootstrap** means the user only installs STM32CubeIDE + Tlamatini; STM32er downloads, installs, and validates the MCP server on first use. A **critical-mission safety preflight** validates the toolchain and a positively confirmed, connected ST-LINK probe before flashing, and **refuses** rather than producing or flashing mis-targeted firmware. Three new catalog demos ship in migration 0103 (STM32 GENESIS / BLINKY / HIL OBSERVATORY). See [§3.15](#315-tutorial-build-and-flash-stm32-firmware-from-chat-chat_agent_stm32er).

[🌐 Website](https://xaiht.org) · [▶️ One-minute teaser](https://www.youtube.com/watch?v=4MyRXBahHuU&t=41s) · [📖 Long-form docs](/XAIHT/Tlamatini/blob/main/BookOfTlamatini.md) · [🏷️ Versioning](/XAIHT/Tlamatini/blob/main/VERSIONING.md)

1. [Overview](#1-overview)
2. [Quickstart (source mode)](#2-quickstart-source-mode)
3. [Using the Chat (`/agent/`)](#3-using-the-chat-agent)
   - 3.1. [Chat layout in 30 seconds](#31-chat-layout-in-30-seconds)
   - 3.2. [Setting code as context](#32-setting-code-as-context)
   - 3.3. [Tutorial: a one-shot question (no toggles)](#33-tutorial-a-one-shot-question-no-toggles)
   - 3.4. [Tutorial: the internet toggle](#34-tutorial-the-internet-toggle)
   - 3.5. [Tutorial: the Multi-Turn toggle](#35-tutorial-the-multi-turn-toggle)
   - 3.6. [Tutorial: the Exec Report toggle](#36-tutorial-the-exec-report-toggle)
   - 3.7. [Tutorial: the ACPX toggle](#37-tutorial-the-acpx-toggle)
   - 3.8. [From chat to flow: the Create Flow button](#38-from-chat-to-flow-the-create-flow-button)
   - 3.9. [Why Chat-created flows are safer now](#39-why-chat-created-flows-are-safer-now)
   - 3.10. [The DB menu: Backup, Set DB, and the start-up swap-in](#310-the-db-menu--backup-set-db-and-the-start-up-swap-in)
   - 3.11. [The ACPX-Skills menu: Browse, Configure, Diagnostics, Reload](#311-the-acpx-skills-menu--browse-configure-diagnostics-reload)
   - 3.12. [Tutorial: command a window from chat (`chat_agent_windower`)](#312-tutorial-command-a-window-from-chat-chat_agent_windower)
   - 3.13. [Tutorial: drive a browser from chat (`chat_agent_playwrighter`)](#313-tutorial-drive-a-browser-from-chat-chat_agent_playwrighter)
   - 3.14. [Tutorial: run Kali Linux tools from chat (`chat_agent_kalier`)](#314-tutorial-run-kali-linux-tools-from-chat-chat_agent_kalier)
   - 3.15. [Tutorial: build and flash STM32 firmware from chat (`chat_agent_stm32er`)](#315-tutorial-build-and-flash-stm32-firmware-from-chat-chat_agent_stm32er)
4. [Visual Workflow Designer (`/agentic_control_panel/`)](#4-visual-workflow-designer-agentic_control_panel)
   - 4.1. [Canvas anatomy](#41-canvas-anatomy)
   - 4.2. [Tutorial: your first flow (3 agents)](#42-tutorial-your-first-flow-3-agents)
   - 4.3. [Saving and loading `.flw` files](#43-saving-and-loading-flw-files)
   - 4.4. [Validate and Start now compile the live canvas](#44-validate-and-start-now-compile-the-live-canvas)
   - 4.5. [Pause, Resume, Stop](#45-pause-resume-stop)
   - 4.6. [FlowHypervisor (watchdog)](#46-flowhypervisor-watchdog)
   - 4.7. [FlowCreator (let an LLM design the flow)](#47-flowcreator-let-an-llm-design-the-flow)
   - 4.8. [Parametrizer (chain outputs into the next agent's config)](#48-parametrizer-chain-outputs-into-the-next-agents-config)
   - 4.9. [Gatewayer (external triggers)](#49-gatewayer-external-triggers)
5. [ACPX: External Coding-Agent CLIs as Tools](#5-acpx--external-coding-agent-clis-as-tools)
6. [Unreal MCP: Driving Unreal Engine 5 from Tlamatini](#6-unreal-mcp--driving-unreal-engine-5-from-tlamatini)
   - 6.1. [What Unreal MCP is](#61-what-unreal-mcp-is)
   - 6.2. [The MCP plugin source (the MCP git location)](#62-the-mcp-plugin-source-the-mcp-git-location)
   - 6.3. [Installing and enabling the plugin inside your UE5 project](#63-installing-and-enabling-the-plugin-inside-your-ue5-project)
   - 6.4. [The command catalog (up to 53 commands across 9 categories)](#64-the-command-catalog-up-to-53-commands-across-9-categories)
   - 6.5. [Using Unreal MCP from the chat (`chat_agent_unrealer`)](#65-using-unreal-mcp-from-the-chat-chat_agent_unrealer)
   - 6.6. [Using Unreal MCP on the canvas (the visual Unrealer node)](#66-using-unreal-mcp-on-the-canvas-the-visual-unrealer-node)
   - 6.7. [What the agent actually does, end-to-end](#67-what-the-agent-actually-does-end-to-end)
   - 6.8. [Exec Report integration](#68-exec-report-integration)
   - 6.9. [Bullet-proof checklist for Unreal Engine users](#69-bullet-proof-checklist-for-unreal-engine-users)
   - 6.10. [Troubleshooting Unreal MCP](#610-troubleshooting-unreal-mcp)
7. [Building a Frozen Distribution](#7-building-a-frozen-distribution)
8. [Configuration (`Tlamatini/agent/config.json`)](#8-configuration-tlamatiniagentconfigjson)
9. [Architecture at a Glance](#9-architecture-at-a-glance)
10. [Embedding-Memory Pre-Flight Guard (GPU hosts)](#10-embedding-memory-pre-flight-guard-gpu-hosts)
11. [Orphan-Process Cleanup (`conhost.exe` reaper)](#11-orphan-process-cleanup-conhostexe-reaper)
12. [Troubleshooting](#12-troubleshooting)
13. [Versioning](#13-versioning)
14. [Contributing & License](#14-contributing--license)

## 1. Overview

Tlamatini (Nahuatl for "one who knows") is a Django/Channels app you run on your own machine. It packages a hybrid RAG pipeline, a Multi-Turn tool-calling LLM loop, an ACPX runtime that spawns external coding-agent CLIs as child processes, an Unreal MCP client that drives Unreal Engine 5 from chat or canvas, and a drag-and-drop workflow designer with 68 agent types, all in one local install.

Backends: Ollama (local), Anthropic Claude (cloud), Qwen vision (Ollama). License: GPL-3.0 · Repo: https://github.com/XAIHT/Tlamatini.git · Platform tested: Windows 11 (cross-platform for source mode).

- Real RAG over your code: FAISS + BM25 hybrid retrieval, code-aware metadata extraction, Reciprocal Rank Fusion, context budgeting, OOM fallback.
- Multi-Turn mode: the LLM becomes an *operator*. Shell, Python, APIs, SQL, file ops, screenshots, keyboard/mouse automation, email, Telegram, WhatsApp, and STM32 firmware build/flash are chained in one conversation.
- ACPX: delegate sub-tasks to external CLIs (`claude`, `cursor-agent`, `codex`, `gemini`, `qwen-code`, plus 8 more) and relay output between them.
- Visual workflow designer: design `.flw` flows once, run them unattended, schedule with Croner, watch them with FlowHypervisor.
- Self-aware: a first-person self-knowledge map (`Tlamatini.md`) is injected into the LLM's prompt on every chain, so Tlamatini can answer accurately about her own architecture, runtime modes, ports, and pages. Builds packaged with `--self-modify` ship her own source tree (`TlamatiniSourceCode/`) so she can read, inspect, and modify herself.
- Everything runs locally. The whole app packages into a one-click Windows `.exe` distribution ([§7](#7-building-a-frozen-distribution)).

🎬 More demos:

- [First system-usage walkthrough](https://www.youtube.com/watch?v=CkvDPSd_c-g)
- [Loading a complete project and summarizing its source code](https://www.youtube.com/watch?v=Lrpbt_dPIXw)
- [Installing OpenCV end-to-end in Multi-Turn](https://www.youtube.com/watch?v=bBlqbZVK-Wk)
- [Uninstalling Poco: Exec Report and matching flow](https://www.youtube.com/watch?v=E5vi0q5FxXQ)
- [Implementing a FlowCreator-aided agentic flow](https://www.youtube.com/watch?v=3Pno6s4xVsE)
- [A complete Cybersec enhancement with Tlamatini](https://www.youtube.com/watch?v=4MyRXBahHuU&t=41s)

## 2. Quickstart (source mode)

This is the fastest way to be productive: clone, install, run. No installer, no admin, no frozen build. Five minutes.

| Requirement | Recommended | Notes |
|---|---|---|
| Python | 3.12.10 | The only version Tlamatini is fully tested on. |
| OS | Windows 11 | Linux/macOS work for chat + designer; Mouser/Keyboarder are Windows-leaning. |
| RAM | 16 GB+ | 32 GB comfortable for bigger embedding models. |
| Disk | ~10 GB | Most is local LLM models. |
| LLM server | Ollama | Default. Cloud Claude/Gemini also supported. |

You do not need administrator rights for any of the steps below.
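As a rough pre-check of the requirements table above, the sketch below compares the running interpreter against the recommended Python 3.12 line and reports free disk space next to the ~10 GB estimate. The function names and constants are illustrative assumptions and are not part of Tlamatini.

```python
import shutil
import sys

# Values taken from the requirements table; the names are illustrative.
RECOMMENDED_PYTHON = (3, 12)
RECOMMENDED_FREE_GB = 10


def python_matches(version_info=sys.version_info) -> bool:
    """Return True when major.minor equals the recommended 3.12 line."""
    return tuple(version_info[:2]) == RECOMMENDED_PYTHON


def free_disk_gb(path: str = ".") -> float:
    """Return free space on the volume holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 1024**3


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} on the 3.12 line: {python_matches()}")
    print(f"Free disk: {free_disk_gb():.1f} GiB (about {RECOMMENDED_FREE_GB} GB suggested)")
```

RAM is left out because the standard library has no portable way to read it.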
Open PowerShell normally (do not Run as Administrator), then:

    $env:OLLAMA_INSTALL_DIR = "$env:LOCALAPPDATA\Programs\Ollama"
    irm https://ollama.com/install.ps1 | iex

Close the window, open a fresh PowerShell, and verify:

    ollama --version
    ollama serve    # leave running in its own window if it's not already up
    Invoke-WebRequest http://127.0.0.1:11434/api/tags -UseBasicParsing

Tlamatini expects Ollama at `http://127.0.0.1:11434`.

Pull the default models:

    ollama pull Nomic-Embed-Text:latest
    ollama pull glm-5:cloud
    ollama pull qwen3.5:cloud
    ollama pull gpt-oss:120b-cloud
    ollama pull qwen3.5:397b-cloud
    ollama pull llama3.2-vision:11b

| Tag | Used for |
|---|---|
| `Nomic-Embed-Text:latest` | RAG embeddings (default): small VRAM footprint, ~600 MB resident |
| `glm-5:cloud` | Default chat + Multi-Turn unified-agent + MCP file-search |
| `qwen3.5:cloud` | Default vision (Image-Interpreter) |
| `gpt-oss:120b-cloud` | Several workflow-agent templates (Monitor-Log, Notifier, Prompter, Summarizer, …) |
| `qwen3.5:397b-cloud` | Default FlowCreator |
| `llama3.2-vision:11b` | Local vision fallback |

You can substitute any tag: just edit `Tlamatini/agent/config.json` (see [§8.1](#81-llm-and-unified-agent)) or the relevant agent's `config.yaml`.

**Optional: swap to a higher-detail embedding model.** If your retrieval quality on dense, technical corpora is not good enough with the default, you can switch to `qwen3-embedding:8b` from the **Config → Models** menu inside the app (or by editing `embeding-model` in `config.json` and reconnecting). **Use with caution:** `qwen3-embedding:8b` is roughly **10× heavier in VRAM** than `Nomic-Embed-Text:latest` (~6.24 GB resident vs ~600 MB on a Q4_K_M quant) and will trip the embedding-memory pre-flight guard (see §10) on 8 GB consumer GPUs. Pull it first with `ollama pull qwen3-embedding:8b`.

Four of the six default model tags in [§2.3](#23-pull-the-default-models) carry a cloud suffix (`:cloud` or `-cloud`): `glm-5:cloud`, `qwen3.5:cloud`, `gpt-oss:120b-cloud`, and `qwen3.5:397b-cloud`.
Those are Ollama Cloud models: they live on Ollama's servers, not on your machine, and `ollama pull` only registers a stub that proxies inference to the cloud. Reaching that cloud requires a logged-in Ollama account and a subscription tier that allows the workload you intend to run. The plan structure is below (prices are deliberately omitted from this README because they change; check https://ollama.com/pricing for the current numbers):

| Plan | Cloud-model access | Why it matters for Tlamatini |
|---|---|---|
| Free | 1 cloud model concurrently, light usage. Local open-weights models are unlimited. | Enough to try a single cloud model for a one-shot chat. Not enough for Tlamatini's default config, which pins different cloud models for chat (`glm-5:cloud`), FlowCreator (`qwen3.5:397b-cloud`), several workflow agents (`gpt-oss:120b-cloud`), and vision (`qwen3.5:cloud`), so a real Multi-Turn run typically needs 2–3 cloud models loaded at once. |
| Pro | 3 concurrent cloud models, ~50× the Free monthly quota, access to the larger cloud-only models, ability to upload / share private models. | The realistic minimum for running Tlamatini out-of-the-box with its shipped cloud-model defaults: Multi-Turn + Exec Report + occasional Image-Interpreter calls. |
| Max | 10 concurrent cloud models, ~5× the Pro quota, designed for sustained heavy agentic workloads. | Recommended for long-running ACPX relays, FlowHypervisor-supervised flows, and Croner-driven unattended runs that chain many cloud calls per hour. |

**If you do not want to subscribe**, you can run Tlamatini entirely on local open-weights models. Edit `Tlamatini/agent/config.json` (`chained-model`, `unified_agent_model`, `mcp_file_search_model`, `flow_creator_model`, `image_interpreter_model`) and every agent `config.yaml` that names a `:cloud` tag, and swap them for a model you have pulled locally (for example, `llama3.1:8b`, `qwen2.5-coder:14b`, `mistral-nemo:12b`).
Performance and quality will scale with your GPU/CPU; Multi-Turn and ACPX both work fine on a sufficiently large local model.

**API keys are separate.** This subscription only governs `:cloud` Ollama models. The ACPX runtime can additionally spawn external coding-agent CLIs that bring their own credentials (Anthropic API key for `claude`, OpenAI key for `codex`, Google key for `gemini`, etc.); those are configured in `Tlamatini/agent/config.json` under `acpx.agents`.
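The local-only switch described above amounts to finding every model tag with a cloud suffix and replacing it. Here is a minimal sketch, assuming the config is plain JSON and that cloud tags end in `:cloud` or `-cloud` as they do in the default model list. The helper name `find_cloud_tags` and the sample keys are hypothetical, and the real `config.json` layout may differ.

```python
import json


def find_cloud_tags(node, path=""):
    """Recursively collect (json_path, value) pairs for string values
    that end in ':cloud' or '-cloud'."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits += find_cloud_tags(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits += find_cloud_tags(value, f"{path}[{index}]")
    elif isinstance(node, str) and node.endswith((":cloud", "-cloud")):
        hits.append((path, node))
    return hits


# Hypothetical excerpt; the key names here are for illustration only.
sample = json.loads("""
{
  "chained-model": "glm-5:cloud",
  "flow_creator_model": "qwen3.5:397b-cloud",
  "embeding-model": "Nomic-Embed-Text:latest"
}
""")

for json_path, tag in find_cloud_tags(sample):
    print(f"{json_path}: {tag}")
```

Pointing the helper at the parsed contents of `Tlamatini/agent/config.json` would list the entries to edit; the agents' `config.yaml` files would need a separate pass with a YAML parser.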