A developer built CVChatly, an automated career coach, using RAG and Agentic Workflows to reduce LLM hallucinations. The architecture uses a vector database for factual retrieval and a LangGraph-based cyclic workflow that verifies and refines outputs, ensuring advice is grounded in the user's actual resume data.

In my ten years working at the intersection of Human Resources and IT, I've seen a recurring pattern: the gap between a candidate's actual skill set and how an AI interprets it. When I began developing CVChatly, my goal was to create an automated career coach that didn't just "chat," but actually provided strategic, data-driven advice based on a user's specific professional history.

However, I hit a wall immediately: hallucinations.

When you ask a standard LLM, "Based on my resume, what roles should I apply for?", the model often tries to be "too helpful." It begins inventing certifications the user doesn't have or suggesting roles that require a PhD when the user only has a Bachelor's. In HR, this isn't just a technical glitch; it's a failure of trust. If a career coach lies to a candidate, the entire value proposition vanishes.

To solve this, I had to move beyond simple prompting. I needed an architecture that forced the LLM to ground its answers in factual data and verify its own logic. This led me to the implementation of RAG (Retrieval-Augmented Generation) and Agentic Workflows.

Most developers start with a "System Prompt" like: "You are an expert career coach. Analyze the following resume and provide advice." While this works for general summaries, it fails in the "last mile" of accuracy for three reasons:

To fix this, I architected CVChatly using a modular approach where the LLM acts as a "reasoner" rather than a "database."
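
The "reasoner, not database" split can be illustrated with a small sketch: facts live outside the model, a retriever selects the relevant ones, and the prompt restricts the model to them. This is an illustration under assumptions, not CVChatly's code. The resume chunks, the function names, and the bag-of-words similarity (a toy stand-in for a real embedding model and vector database) are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical resume chunks; a real system would store these in a vector database.
RESUME_CHUNKS = [
    "Senior Python developer, 5 years building data pipelines on AWS.",
    "Bachelor's degree in Computer Science, graduated 2016.",
    "Led a team of four engineers migrating services to Kubernetes.",
]


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list, k: int = 2) -> list:
    """Return the k chunks most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)
    return ranked[:k]


def build_grounded_prompt(query: str, chunks: list, k: int = 2) -> str:
    """Build a prompt that limits the model to the retrieved facts."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return (
        "Answer using ONLY the resume facts below. "
        "If they do not contain the answer, say you don't have enough information.\n"
        f"Resume facts:\n{context}\n"
        f"Question: {query}"
    )


prompt = build_grounded_prompt("Which degree does the candidate have?", RESUME_CHUNKS, k=1)
print(prompt)
```

The model never sees the whole resume here, only the chunk the retriever judged relevant, so its job shrinks to reasoning over supplied facts rather than recalling them.
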
The core of CVChatly relies on two pillars: a Vector Database for factual retrieval and a Graph-based Agentic Workflow for execution.

Instead of feeding the entire resume into every prompt, I implemented a RAG pipeline. Resume content is embedded with `text-embedding-3-small`, and only the sections relevant to a query are retrieved.

RAG alone isn't enough. If the retriever pulls the wrong chunk, the LLM will still hallucinate based on that wrong data. This is where Agentic Workflows come in. Instead of a linear sequence, I used LangGraph to create a cyclic graph where the AI can loop back and correct itself.

The CVChatly workflow follows this cycle: Plan → Retrieve → Synthesize → Verify → Refine.

Below is a simplified implementation of the verification loop. The key is the `verify_facts` node, which acts as a "critic" to ensure the output is supported by the retrieved documents.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, END


# Define the state of our agent
class AgentState(TypedDict):
    query: str
    context: str
    response: str
    is_accurate: bool
    iterations: int


def retrieve_context(state: AgentState):
    # Logic to query Pinecone for relevant resume sections
    query = state["query"]
    # simulated_retrieval = vector_db.similarity_search(query)
    return {"context": "User has 5 years of experience in Python and AWS, but no Java experience."}


def generate_advice(state: AgentState):
    # Generate the coaching response based on retrieved context
    context = state["context"]
    query = state["query"]
    # response = llm.invoke(f"Based on {context}, answer: {query}")
    response = "You should apply for Java Developer roles."  # Simulated hallucination
    # Count each attempt so the retry loop below can terminate
    return {"response": response, "iterations": state.get("iterations", 0) + 1}


def verify_facts(state: AgentState):
    # The 'Critic' node: does the response contradict the context?
    context = state["context"]
    response = state["response"]
    # In a real scenario, another LLM call checks for contradictions
    if "Java" in response and "no Java experience" in context:
        return {"is_accurate": False}
    return {"is_accurate": True}


def route_after_verify(state: AgentState):
    # If not accurate, go back to generate, for at most 3 attempts
    if not state["is_accurate"] and state.get("iterations", 0) < 3:
        return "generate"
    return END


# Construct the graph
workflow = StateGraph(AgentState)
workflow.add_node("retrieve", retrieve_context)
workflow.add_node("generate", generate_advice)
workflow.add_node("verify", verify_facts)

workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "generate")
workflow.add_edge("generate", "verify")
workflow.add_conditional_edges("verify", route_after_verify)

app = workflow.compile()
```

In this architecture, the `verify_facts` node acts as a quality gate. If the AI suggests a skill the user doesn't possess, the loop triggers a re-generation (capped here at three attempts). This significantly reduces the hallucination rate by forcing the model to confront its own errors.

Building this taught me that "technical correctness" isn't the only challenge. In HR, nuance is everything. I implemented three specific strategies to handle complex coaching scenarios:

First, I explicitly instructed the agent: "If the retrieved context does not contain the answer, state that you don't have enough information. Do not guess." This prevents the model from filling in gaps with "likely" but false information.

Second, to make the coaching actionable, the agent doesn't just look at the resume; it performs a Gap Analysis. It retrieves the job description (JD), retrieves the resume, and identifies the delta.

Third, career coaching is a conversation. I used Checkpointers in LangGraph to maintain the state across multiple turns, ensuring that if a user says "What about the first role I mentioned?", the agent remembers the context without needing to re-process the entire document.

How do we know it's working? I implemented a "Ground Truth" evaluation dataset.
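
Scoring against a ground-truth dataset can be sketched in a few lines. The example below is an illustration under assumptions, not CVChatly's actual harness: the `EvalCase` structure, the `SKILL_VOCAB` list, the sample cases, and both metric definitions are hypothetical and chosen only to show the mechanics. Here a response counts as a hallucination if it names a skill the resume does not contain, and as accurate if it names every expected skill and no unsupported one.

```python
import re
from dataclasses import dataclass


@dataclass
class EvalCase:
    true_skills: set      # skills actually present in the resume (ground truth)
    expected_skills: set  # skills a correct answer should mention
    response: str         # the system's answer under test


# Hypothetical vocabulary of skills to scan responses for.
SKILL_VOCAB = {"python", "aws", "java", "kubernetes", "sql"}


def mentioned_skills(response: str) -> set:
    """Whole-word match, so 'javascript' does not count as 'java'."""
    words = set(re.findall(r"[a-z0-9+#]+", response.lower()))
    return SKILL_VOCAB & words


def score(cases: list) -> dict:
    """Compute the two illustrative metrics over a list of EvalCase items."""
    hallucinated = 0
    accurate = 0
    for case in cases:
        mentioned = mentioned_skills(case.response)
        unsupported = mentioned - case.true_skills
        if unsupported:
            hallucinated += 1
        elif case.expected_skills <= mentioned:
            accurate += 1
    total = len(cases)
    return {
        "hallucination_rate": hallucinated / total,
        "fact_accuracy": accurate / total,
    }


# Made-up sample cases for demonstration only.
CASES = [
    EvalCase({"python", "aws"}, {"python", "aws"}, "You have strong Python and AWS experience."),
    EvalCase({"python", "aws"}, {"python"}, "You should apply for Java Developer roles."),
    EvalCase({"python", "aws"}, {"aws"}, "Your AWS background fits cloud roles."),
    EvalCase({"python", "sql"}, {"python"}, "Consider highlighting your SQL work."),
]

results = score(CASES)
print(results)
```

A keyword check like this is deliberately crude; a production harness would more likely use human review or an LLM judge, but the bookkeeping stays the same.
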
I took 50 resumes and 50 specific questions with known correct answers.

| Metric | Single Prompt (Baseline) | RAG (Naive) | Agentic RAG (CVChatly) |
|---|---|---|---|
| Hallucination Rate | 35% | 12% | 2% |
| Fact Accuracy | 60% | 82% | 96% |
| Relevance | 70% | 85% | 92% |

The jump from 12% to 2% hallucination is what makes the difference between a "toy" and a "tool."

As I continue to evolve CVChatly, the next step is Multi-Agent Orchestration. Imagine one agent acting as the "Recruiter" (critiquing the resume), another as the "Career Coach" (suggesting improvements), and a third as the "Fact-Checker" (ensuring everything is grounded in the resume).

The shift from a "Chatbot" to an "Agentic System" is the most important transition any AI engineer can make right now. We are moving away from hoping the LLM gets it right and toward building systems that ensure the LLM gets it right.

If you are building LLM-powered applications where accuracy is critical (Legal, Medical, HR), follow these rules:

While the backend logic is the engine, the user experience is the chassis. For CVChatly, ensuring that the frontend delivers these complex AI responses without lagging was key. When building high-traffic AI tools, I always recommend monitoring your site's performance and SEO health to ensure users can actually find and use your tool.

If you're unsure how your current site is performing, I highly recommend using [inspect-my-site.com](https://inspect-my-site.com) to get a comprehensive audit of your technical SEO and performance metrics. A great AI backend is useless if your site's loading speed or SEO prevents users from accessing it.

What are you using to handle hallucinations in your LLM projects? Are you sticking to RAG, or have you moved toward agentic loops? Let's discuss in the comments.

About the Author: Maria Jose Gonzalez Antelo is a professional content writer and AI solutions expert with nearly a decade of experience in IT Human Resources.
She specializes in bridging the gap between technical infrastructure and human-centric organizational growth.