Add Guardrails So Your AI App Doesn't Lie — A Two-Layer Approach with NVIDIA NIM

The article describes a two-layer approach to prevent AI assistants from generating false information, using NVIDIA NIM. The first layer tightens the system prompt to define what the assistant is allowed to discuss, while the second layer adds a separate LLM call that checks whether the generated answer is supported by the retrieved context before delivering it. The author, B Torkian from USC, explains that this method hardens the assistant's refusal to answer out-of-scope questions, transforming a demo into a more reliable system suitable for student use.

In Part 1 (https://dev.to/torkian/build-your-first-ai-app-with-nvidia-nim-in-30-minutes-1i43) we got a USC campus assistant talking. In Part 2 we taught it to retrieve only the relevant context. Both posts ended with the same observation: when someone asked for the wifi password, the assistant refused. That refusal worked because we told it to. It would have just as happily made something up if we'd phrased the prompt differently.

This post is about hardening that refusal so it's not luck. Two guardrail layers, both small enough to read in one sitting, neither requiring a framework. First, tighten the prompt so the assistant knows what it's allowed to talk about. Second, add a second LLM call that re-reads the answer and the context and decides whether to ship the answer or refuse.

I'm B Torkian, NVIDIA Developer Champion at USC. This is the layer where a demo becomes something I'd actually let students use.

What you're adding

    User question
      → retrieve top-k context (from Part 2)
      → scoped prompt: model answers OR returns the exact fallback line
      → grounding check: a second NIM call asks "is the answer supported by the context?"
      → ship the answer, or replace it with the fallback line

The chat call and the embedding setup carry over from Parts 1 and 2. Everything new in this post is fewer than 40 lines.

Why guardrails are not optional

The retrieval step from Part 2 narrowed what the model sees. It does nothing to stop the model from being clever with the data it has, or from drifting into topics outside the assistant's job. Two real failure modes I've seen in student demos:

- Out-of-scope creep. Someone asks "can you write my breakup text?" The model is happy to oblige. The retriever pulled three USC chunks (cosine just returns something), the prompt didn't forbid relationship advice, so the model wrote the text.
- Confident-sounding hallucinations. The retrieved chunk says "Monday to Friday, 10 AM to 6 PM." The user asks about Saturday hours.
  The model decides the friendly answer is "Saturday hours are 11 AM to 4 PM", a fabrication that sounds like a reasonable inference.

The first failure is solved by prompt scope. The second is what the grounding check is for.

Step 1 - Setup (self-contained)

If you already have Workshops 1 + 2 running in the same Colab session, skip this cell. If you're starting fresh, paste this in: it bundles the client, the embedding model, the USC knowledge base, and the retriever from Parts 1 and 2 so the rest of this post stands on its own.

```python
%pip install -q openai numpy

import os, getpass
from openai import OpenAI
import numpy as np

# Prompt for the key once per session instead of hard-coding it.
if not os.getenv("NVIDIA_API_KEY"):
    os.environ["NVIDIA_API_KEY"] = getpass.getpass("Paste your NVIDIA API key (starts with nvapi-): ")

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],
)

MODEL = "meta/llama-3.1-8b-instruct"
EMBED_MODEL = "nvidia/nv-embedqa-e5-v5"

def ask(system_prompt, user_message):
    response = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        temperature=0.3,
        max_tokens=400,
    )
    return response.choices[0].message.content

knowledge_base = [
    {"title": "USC AI Club meeting", "text": "The USC AI Club meets every Thursday at 5 PM in the engineering building, room 204."},
    {"title": "USC GPU lab hours", "text": "The USC GPU computing lab is open Monday to Friday from 10 AM to 6 PM."},
    {"title": "NVIDIA Developer Program", "text": "USC students can join the NVIDIA Developer Program for free."},
    {"title": "Next USC workshop", "text": "The next USC AI Club workshop will cover Retrieval Augmented Generation (RAG)."},
    {"title": "USC AI/ML office hours", "text": "Office hours for the USC AI/ML faculty are Tuesdays 2-4 PM."},
    {"title": "USC robotics lab", "text": "The USC robotics lab requires safety training before students can use the soldering station."},
    {"title": "USC tutoring", "text": "Peer tutoring for introductory Python at USC is available Wednesdays from 1 PM to 3 PM."},
]

def embed_texts(texts, input_type="passage"):
    response = client.embeddings.create(
        model=EMBED_MODEL,
        input=texts,
        extra_body={"input_type": input_type},
    )
    return [np.array(item.embedding, dtype=np.float32) for item in response.data]

def cosine_similarity(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0
    return float(np.dot(a, b) / denom)

def retrieve_context(question, k=3):
    q_emb = embed_texts([question], input_type="query")[0]
    scored = [(cosine_similarity(q_emb, item["embedding"]), item) for item in knowledge_base]
    scored.sort(key=lambda p: p[0], reverse=True)
    return "\n".join(f"- {item['text']}" for _, item in scored[:k])

# Embed every chunk once, up front, and store the vector next to its text.
for item, emb in zip(knowledge_base, embed_texts([i["text"] for i in knowledge_base], "passage")):
    item["embedding"] = emb

print(f"Ready. Embedded {len(knowledge_base)} chunks.")
```

That cell defines everything Workshops 1 and 2 produced. The Part 3 code below builds on `ask`, `retrieve_context`, and the embedded `knowledge_base`.

Step 2 - Layer 1: prompt scope with a fixed fallback line

```python
FALLBACK = "I don't have that information — check with the USC AI Club."

SCOPED_SYSTEM_PROMPT_TEMPLATE = """You are a USC campus assistant for AI Club, GPU lab, NVIDIA program, workshop, office hour, robotics lab, and tutoring questions only.

Rules:
- Answer ONLY using the CONTEXT below.
- If the user asks about anything outside this scope (e.g. weather, jokes, personal advice, code generation, general world knowledge), reply with exactly: "{fallback}"
- If the answer is not present in the context, reply with exactly: "{fallback}"
- Do not invent names, dates, room numbers, links, passwords, schedules, policies, or instructions that are not in the context.

CONTEXT:
{context}
"""
```

Three things are doing work in this prompt:

- A finite topic list. The assistant has a job description. "Anything outside this scope" gives the model a clear opt-out, so it doesn't have to guess what's in-bounds.
- One exact fallback string. Same wording, every time. This matters in Step 3: the grounding check returns the same string when it overrides, so downstream code only has to recognize one shape.
- An explicit don't-invent list. Models are pliable. Spelling out the dangerous categories (room numbers, passwords, policies) lowers hallucination noticeably with no extra calls.

This layer alone catches most off-topic and most "the context didn't mention it" cases.

Step 3 - Layer 2: a grounding check on every answer

The scoped prompt is a request; the model can still ignore it. Layer 2 is a separate, narrower NIM call whose only job is to look at the context and the answer and decide whether the answer is supported.

```python
def answer_is_grounded(question: str, context: str, answer: str) -> bool:
    verdict = ask(
        system_prompt=(
            "You are a strict grounding verifier. Read the CONTEXT and the "
            "ANSWER. Respond with only 'yes' or 'no'. Say 'yes' if every "
            "factual claim in the ANSWER is directly supported by the CONTEXT. "
            "Say 'no' otherwise — including if the ANSWER adds information not "
            "in the CONTEXT, even if that information sounds plausible."
        ),
        user_message=(
            f"CONTEXT:\n{context}\n\n"
            f"QUESTION:\n{question}\n\n"
            f"ANSWER:\n{answer}\n\n"
            "Is every factual claim in the ANSWER supported by the CONTEXT?"
        ),
    )
    # Anything that does not start with "yes" counts as a failed check.
    return verdict.strip().lower().startswith("yes")
```

Three things to notice:

- It's just another `ask` call: same client, same hosted NIM model, no new infrastructure. Layer 2 costs one extra call per question.
- Yes/no only. Constraining the response shape makes the parsing reliable. We only check how the reply starts, so anything that doesn't open with "yes" (a "no", a hedge, an empty string) is treated as a fail. Note that a waffling "yes, but..." still passes this check, because it starts with "yes"; compare against the exact string "yes" if you want that to fail too.
- It can be wrong too. The verifier is itself an LLM. For workshop-grade safety this is fine; for production you'd add deterministic checks (regex for room numbers, exact string match for the fallback) on top.
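The deterministic checks mentioned in the last bullet could look something like the following. This is a minimal sketch under stated assumptions, not code from the workshop repo: `passes_deterministic_checks`, `is_strict_yes`, `numbers_in`, and `FALLBACK_LINE` are hypothetical names, and treating every run of digits as a stand-in for room numbers and times is a simplification.

```python
import re

# Hypothetical stand-in for the post's FALLBACK constant.
FALLBACK_LINE = "I don't have that information."

# Assumption: runs of digits are a good-enough proxy for rooms, times, and dates.
NUMBER_PATTERN = re.compile(r"\d+")

def numbers_in(text: str) -> set[str]:
    """Return every run of digits found in the text, e.g. {'204', '5'}."""
    return set(NUMBER_PATTERN.findall(text))

def passes_deterministic_checks(context: str, answer: str, fallback: str = FALLBACK_LINE) -> bool:
    """Cheap, non-LLM gate intended to run alongside the LLM verifier."""
    # The exact fallback line is always safe to ship.
    if answer.strip() == fallback:
        return True
    # Reject any answer containing a number that never appears in the context.
    unsupported = numbers_in(answer) - numbers_in(context)
    return not unsupported

def is_strict_yes(verdict: str) -> bool:
    """Accept only a bare 'yes' (any case, optional trailing '.' or '!')."""
    return verdict.strip().lower().rstrip(".!") == "yes"

context = "The USC AI Club meets every Thursday at 5 PM in the engineering building, room 204."
print(passes_deterministic_checks(context, "It meets Thursday at 5 PM in room 204."))
print(passes_deterministic_checks(context, "Saturday hours are 11 AM to 4 PM."))
print(is_strict_yes("yes, but..."))
```

The number check only catches invented digits. An answer such as "closed on Saturday" contains no numbers and would pass, so a gate like this complements the LLM verifier rather than replacing it.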
Step 4 - Wire both layers into `ask_guarded`

```python
def ask_guarded(question: str) -> str:
    context = retrieve_context(question)  # from Part 2
    system_prompt = SCOPED_SYSTEM_PROMPT_TEMPLATE.format(
        fallback=FALLBACK,
        context=context,
    )
    answer = ask(system_prompt, question)  # Layer 1
    if not answer_is_grounded(question, context, answer):
        return FALLBACK  # Layer 2 override
    return answer

for question in [
    "When does the USC AI Club meet?",           # in scope, in context
    "Can you write my breakup text?",            # OUT of scope
    "What is the wifi password?",                # in scope, NOT in context
    "What are the USC GPU lab Saturday hours?",  # invites a hallucination
]:
    print(f"Q: {question}")
    print(f"A: {ask_guarded(question)}\n")
```

Read the output carefully.

- The AI Club question returns a real answer from the context. Both layers pass.
- The breakup-text question hits Layer 1: the scope rule catches it.
- The wifi question also hits Layer 1: nothing in the context mentions passwords, and the scoped prompt forbids inventing them.
- The Saturday-hours question is the one that earns its keep. The context says "Monday to Friday." A friendlier model would guess "closed on Saturday." Layer 2 reads that answer, sees "Saturday" is not in the context, and returns the fallback instead.

Step 5 - What you actually built

You took the retriever from Part 2 and put it inside two cheap, inspectable guardrails. The whole thing is still one Python file, still one hosted NIM endpoint, still no vector database.

The mental model is:

- Retrieval decides what the model sees.
- Scoped prompt decides what the model is allowed to write.
- Grounding check decides whether what the model wrote ships.

Real production systems extend each of these: deterministic rule checks, structured output, confidence thresholds, dedicated safety models, human review queues. The shape stays the same. Every additional layer is a yes/no gate between the user's question and the final response.
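One way to picture "every additional layer is a yes/no gate" is as a list of gate functions applied in order. The sketch below is illustrative only: `Gate`, `ship_or_fallback`, `answer_is_not_empty`, and `answer_is_short` are hypothetical names, the 600-character limit is an arbitrary assumption, and the two example gates are stand-ins for real checks such as the grounding verifier.

```python
from typing import Callable, Sequence

# A gate receives (question, context, answer) and returns True to let the answer ship.
Gate = Callable[[str, str, str], bool]

def answer_is_not_empty(question: str, context: str, answer: str) -> bool:
    """Example gate: block blank answers."""
    return bool(answer.strip())

def answer_is_short(question: str, context: str, answer: str) -> bool:
    """Example gate: block rambling answers (600 characters is an arbitrary limit)."""
    return len(answer) <= 600

def ship_or_fallback(
    question: str,
    context: str,
    answer: str,
    gates: Sequence[Gate],
    fallback: str,
) -> str:
    """Return the answer only if every gate says yes; otherwise return the fallback."""
    for gate in gates:
        if not gate(question, context, answer):
            return fallback
    return answer

gates = [answer_is_not_empty, answer_is_short]
print(ship_or_fallback("When is the meeting?", "Thursday at 5 PM.", "Thursday at 5 PM.", gates, "FALLBACK"))
print(ship_or_fallback("When is the meeting?", "Thursday at 5 PM.", "   ", gates, "FALLBACK"))
```

Under this shape, adding a new check means appending one function to the list, and the order of the list decides which cheap checks run before the expensive ones.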
Get the code

Repo: github.com/torkian/nvidia-nim-workshop (https://github.com/torkian/nvidia-nim-workshop)

- One-click Colab for Part 3: Open part3_guardrails.ipynb (https://colab.research.google.com/github/torkian/nvidia-nim-workshop/blob/main/part3_guardrails.ipynb)
- Local Python: `part3_guardrails.py` in the repo (`python3 part3_guardrails.py` after `pip install -r requirements.txt`).

MIT licensed. I run this at USC. Fork it, swap the knowledge base for your school, your club, your project, and run it wherever you are.

Previously / next in this series

- Part 1: Build Your First AI App with NVIDIA NIM in 30 Minutes (https://dev.to/torkian/build-your-first-ai-app-with-nvidia-nim-in-30-minutes-1i43)
- Part 2: From Manual RAG to Real Retrieval: Embedding-Based RAG with NVIDIA NIM
- Part 4 (next): Run NIM on Your Own GPU. Same OpenAI-compatible API, different endpoint. Useful when you want data locality, predictable latency, or a self-hosted dev loop.