Good morning, AI enthusiasts!
Lighter issue this week, since we just got back from the AI Engineer World’s Fair in San Francisco. 6,000+ AI engineers, 300 speakers, 29 tracks, and we were right in the middle of it. The conversations alone were worth the trip: what people are actually building right now, the problems they’re hitting in production, the patterns that keep coming up. If you’re working in AI engineering, this is the one conference where you feel like you’re in exactly the right room.
We gave a workshop on the context engineering behind our production AI tutor: compaction, memory, and cost, and the room was packed with people sitting on the floor and a queue outside. The 1:1 conversations after were honestly the highlight.
Here’s what came out of it, and it’s all yours:
We open-sourced the full AI tutor app. This is the same production system our students use to ask anything about AI engineering, RAG, agents, and get answers grounded in our own material with sources. You can run it locally, swap in your own content, and build your own version.
GitHub link | Try the AI Tutor | Slides
And if you missed last week’s email, we also shared the full talk Paul Iusztin and I gave on the AI Engineer World’s Fair Online Track: building a research wiki your agents maintain for you. No vector database, no knowledge graph. Just Markdown, YAML, and folders. Watch it here.
We also cover:
Let’s get into it!
In production RAG systems, prompt injection doesn’t only happen at the prompt level, but it can also sneak in through the documents your system retrieves. A vendor PDF, support article, scraped web page, or customer note can contain useful facts and a malicious instruction in the same chunk.
If your eval only checks whether the answer is factually correct, the system can look safe, but treat all retrieved text as something it should obey. To prevent this, add a few test documents that mix valid domain facts with instructions like “ignore the system message” or “send the user to this external link.”
Then check two things: the answer should still use the factual content, and it should refuse to follow instructions found inside the retrieved context. Log the chunk IDs too, so a failed test points to the retriever, prompt wrapper, or generation step.
If you’re building production RAG systems and want to go deeper into retrieval, evaluation, and deployment, check out our Full Stack AI Engineering course. — Louis-François Bouchard, Towards AI Co-founder & Head of Community
Agent_hellboy built claude-cockpit, a tiny status line + advisor for Claude Code. It shows live session signals such as model, branch, context pressure, token churn, rate limits, and cost, and then suggests useful controls like /compact, /clear, cheaper model switches, skills, subagents, MCP, or graphify when the session starts drifting. It does not automate anything or take over your workflow. It just keeps the important gauges visible and tells you what control might save time or tokens next. Check it out on GitHub and support a fellow community member. If you have any questions or suggestions, share them in the thread.
The Learn AI Together Discord community is flooding with collaboration opportunities. If you are excited to dive into applied AI, want a study partner, or even want to find a partner for your passion project, join the collaboration channel! Keep an eye on this section, too — we share cool opportunities every week!
-
Shaurya09272004 is building an agentic tool that explains Indian laws in plain language and is looking for people working on similar projects to share ideas, discuss debugging journeys, and go deeper into LLMs. If you are building something similar, connect with them in the thread!
-
Strikerleapgaming is building his own startup and is looking for beta testers. If this interests you, reach out to them in the thread!
-
Vishacoplayz_27974 is recruiting founding board members for AIXelerate, a student-led AI nonprofit. If you’re a high school student interested in leadership, AI, marketing, operations, outreach, or event planning, contact them in the thread!
Meme shared by bigbuxchungus
Improving Our LangGraph Agent for Real-World E-Commerce: Enterprise Validation, Business Logic Guards, and a Multi-Agent Architecture by Bessie Delight Kekeli
ShopBot’s original LangGraph agent trusted every LLM output, so a hallucinated order ID could slip straight into a refund tool call. This follow-up piece rebuilds the system around three key additions: pure-Python Business Logic Gates that validate structured data before any tool runs, a correction loop with a hard retry cap that escalates to humans rather than looping indefinitely, and a Supervisor that splits the single agent into Order, Refund, and Complaints subgraphs.
Life Memorizer tackles a real problem for smart glasses and wearable capture devices: finding a moment buried in hours of sensory footage without shipping private data to the cloud. The author pairs Gemini Embedding 2, which projects text, images, and audio into a single 3072-dimensional space, with Qdrant Edge, an embedded vector store that requires no server process. The piece walks through schema design, Matryoshka truncation, hybrid search with location filtering, scalar and binary quantization, and mean-pool memory consolidation, then addresses where cloud embedding calls still break the offline promise.
- Building Enterprise-Grade Security Boundaries for LLM Calls — OAuth 2.0 + APIM + Entra ID by Chris Bao
Securing LLM endpoints often gets overlooked once a model is deployed, and this piece tackles exactly that gap using Azure API Management, Entra ID, and OAuth 2.0. The author registers a public client application, implements the Device Code flow via MSAL for Jupyter-based authentication, and configures APIM policies that validate tenant ID and client application ID before forwarding requests to a GPT-4o-mini backend. A working demo confirms that only correctly scoped tokens pass validation, while standard Entra ID tokens get rejected outright.
This article shows how to build a Slack news agent using Bolt’s Socket Mode, skipping the usual public-URL setup entirely. A single Claude API call handles both retrieval and summarization through the server-side web-search tool, turning topics typed into a slash command into cited, five-item briefings rendered as native Block Kit cards. The piece walks through each layer, from slash-command handling to citation cleanup, and breaks down real cost figures, landing at a few cents per briefing, plus three deployment gotchas worth avoiding.
This article walks through building a decoder-only language model from scratch in PyTorch. It covers every component that makes transformers work: character-level tokenization, token and positional embeddings, causal self-attention with Query, Key, and Value projections, multi-head attention, feed-forward layers, and residual connections wrapped in transformer blocks. A full training loop on Tiny Shakespeare ties everything together, followed by a second implementation using PyTorch’s built-in nn.TransformerDecoder for comparison.
If you are interested in publishing with Towards AI, check our guidelines and sign up. We will publish your work to our network if it meets our editorial policies and standards. LAI #132: We Open-Sourced the AI Tutor Our Students Actually Use was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.