{"slug": "ai-engineer-in-vancouver-bc-production-ai-built-in-the-open", "title": "AI Engineer in Vancouver, BC — Production AI, Built in the Open", "summary": "Rafael Lopes, a production AI engineer in Vancouver, BC, builds and ships hybrid-RAG pipelines, distributed LLM inference across four compute architectures, and a sovereign research copilot on a self-hosted homelab, documenting everything in the open. His platform serves live traffic from a K3s cluster with no cloud compute, using GitOps and Cloudflare Tunnel for edge security.", "body_md": "## What I Build\n\nI'm Rafael Lopes — \"Rafa\" — a production AI engineer based in Vancouver, British Columbia. I don't write *about* AI from the sidelines; I ship it. The systems below all serve live traffic from a self-hosted cluster in one room:\n\n- A **hybrid-RAG pipeline** over 69,000+ curated technical chunks (BM25 + TF-IDF + weighted RRF + cross-encoder rerank), with an automated quality gate that strips fabricated quotes before anything publishes.\n- **Distributed LLM inference** across four compute architectures — ARM, AMD ROCm, NVIDIA CUDA, and Apple Silicon — pooling memory over the llama.cpp RPC protocol for models too large for one GPU.\n- [exaflop.ca](http://exaflop.ca), a sovereign research copilot for Canadian HPC — every byte of the inference path stays local, with a live ledger proving zero foreign hops per query.\n\n## The Stack\n\nThe whole platform is documented, not described:\n\n- **How the briefs are made** → the retrieval → synthesis → quality-gate → publish pipeline, with the real numbers.\n- **The infrastructure** → a four-architecture K3s homelab, GitOps via Argo CD, Cloudflare Tunnel + Zero Trust at the edge — no cloud compute.\n- **A from-scratch RAG build** → the actual BM25/TF-IDF/RRF code and measured retrieval quality.\n\n## The Daily Brief\n\nEvery weekday I publish a cross-domain engineering brief — AI, web performance, system design, security, and the career arc — synthesized from the corpus, cited to source, and shipped through the same quality gate. The archive is the proof of consistency: nobody fakes a dated, cited, cross-domain brief every working day.\n\n## The Infrastructure\n\nNo managed Kubernetes, no hosted CI, no hyperscaler in the data path. A Raspberry Pi runs the K3s control plane; an AMD-ROCm workstation does the GPU heavy lifting; an x86 box self-hosts GitLab and the registry; a Mac M3 Max joins as an RPC peer. Every change goes git → CI → Argo CD → live. The platform that runs this blog is the same one that runs the research copilot.\n\n## Available For\n\nVancouver-based and remote-friendly. Open to:\n\n- **Consulting** on production RAG, LLM inference, and AI platform/SRE work.\n- **Speaking** on sovereign/local-first AI, web performance for AI consumers, and homelab-scale inference.\n- **Collaboration** with teams shipping real AI infrastructure who want the receipts, not the hype.\n\nTeaching by doing — production AI, not commentary. 
The system is the proof.\n\n## FAQ\n\n**Who is the AI engineer in Vancouver behind this site?**\nRafael Lopes (\"Rafa\") — a production AI engineer based in Vancouver, British Columbia. He builds and ships RAG pipelines, distributed LLM inference, and a sovereign research copilot on a self-hosted homelab, and documents the results in the open.\n\n**What does a production AI engineer do?**\nBuilds AI systems that serve real traffic — retrieval pipelines, LLM inference, quality gates, and the platform/SRE work to run them — rather than writing about AI from the sidelines. Here, every claim links to a live system or a measured number.\n\n**What AI does Rafael Lopes build?**\nHybrid retrieval (BM25 + TF-IDF + weighted RRF + cross-encoder rerank), distributed LLM inference across four compute architectures over the llama.cpp RPC protocol, and [exaflop.ca](http://exaflop.ca) — a sovereign, local-first research copilot for Canadian HPC.\n\n**Where can I read more?**\nThe daily cross-domain engineering brief, the how-it-works pipeline, and the infrastructure write-up — all linked below and at [blog.r-lopes.com](https://blog.r-lopes.com).", "url": "https://wpnews.pro/news/ai-engineer-in-vancouver-bc-production-ai-built-in-the-open", "canonical_source": "https://blog.r-lopes.com/posts/ai-engineer-vancouver", "published_at": "2026-06-05 14:00:00+00:00", "updated_at": "2026-06-14 02:06:29.465232+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-infrastructure", "ai-research", "ai-tools"], "entities": ["Rafael Lopes", "exaflop.ca", "K3s", "Argo CD", "Cloudflare Tunnel", "llama.cpp", "AMD ROCm", "NVIDIA CUDA"], "alternates": {"html": "https://wpnews.pro/news/ai-engineer-in-vancouver-bc-production-ai-built-in-the-open", "markdown": "https://wpnews.pro/news/ai-engineer-in-vancouver-bc-production-ai-built-in-the-open.md", "text": "https://wpnews.pro/news/ai-engineer-in-vancouver-bc-production-ai-built-in-the-open.txt", "jsonld": 
"https://wpnews.pro/news/ai-engineer-in-vancouver-bc-production-ai-built-in-the-open.jsonld"}}