Define the state of our agent

wpnews.pro

Meta: Learn how to eliminate LLM hallucinations in career coaching apps using Agentic Workflows and RAG, as seen in the architecture of CVChatly.

In my ten years working at the intersection of Human Resources and IT, I've seen a recurring pattern: the gap between a candidate's actual skill set and how an AI interprets it. When I began developing CVChatly, my goal was to create an automated career coach that didn't just "chat," but actually provided strategic, data-driven advice based on a user's specific professional history.

However, I hit a wall immediately: Hallucinations.

When you ask a standard LLM, "Based on my resume, what roles should I apply for?", the model often tries to be "too helpful." It begins inventing certifications the user doesn't have or suggesting roles that require a PhD when the user only has a Bachelor's. In HR, this isn't just a technical glitch; it's a failure of trust. If a career coach lies to a candidate, the entire value proposition vanishes.

To solve this, I had to move beyond simple prompting. I needed an architecture that forced the LLM to ground its answers in factual data and verify its own logic. This led me to the implementation of RAG (Retrieval-Augmented Generation) and Agentic Workflows.

Most developers start with a "System Prompt" like: “You are an expert career coach. Analyze the following resume and provide advice.”

While this works for general summaries, it fails in the "last mile" of accuracy for three reasons:

To fix this, I architected CVChatly using a modular approach where the LLM acts as a "reasoner" rather than a "database."

The core of CVChatly relies on two pillars: a Vector Database for factual retrieval and a Graph-based Agentic Workflow for execution.

Instead of feeding the entire resume into every prompt, I implemented a RAG pipeline. Here is the flow:

The chunks are embedded with text-embedding-3-small.
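As a rough illustration of the retrieval side, here is a minimal sketch of a chunk, embed, retrieve flow. The `embed` function is a toy bag-of-words stand-in so the example runs offline; it is an assumption, not the embedding model named above. `chunk_resume`, `retrieve`, and the sample resume are likewise hypothetical and not taken from CVChatly.

```python
import math
import re
from collections import Counter
from typing import List

def chunk_resume(text: str) -> List[str]:
    """Split a resume into one chunk per non-empty line (hypothetical strategy)."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9+#]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

resume = """
Senior Python developer, 5 years building REST APIs
Deployed services on AWS using Lambda and ECS
Bachelor's degree in Computer Science
"""
chunks = chunk_resume(resume)
top = retrieve("What AWS experience does the candidate have?", chunks, k=1)
print(top)
```

Only the retrieved chunks, not the whole resume, would then be passed to the model as context.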

RAG alone isn't enough. If the retriever pulls the wrong chunk, the LLM will still hallucinate based on that wrong data. This is where Agentic Workflows come in. Instead of a linear sequence, I used LangGraph to create a cyclic graph where the AI can loop back and correct itself.

The CVChatly workflow follows this cycle:

Plan → Retrieve → Synthesize → Verify → Refine.

Below is a simplified implementation of the verification loop. The key is the verify_facts node, which acts as a "critic" to ensure the output is supported by the retrieved documents.

from typing import TypedDict

from langgraph.graph import StateGraph, END

# Define the state of our agent
class AgentState(TypedDict):
    query: str
    context: str
    response: str
    is_accurate: bool
    iterations: int  # number of generation attempts so far
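def retrieve_context(state: AgentState):
    query = state['query']  # a real retriever would search the vector store with this
    # Hard-coded context standing in for the retrieved resume chunks.
    return {"context": "User has 5 years of experience in Python and AWS, but no Java experience."}

def generate_advice(state: AgentState):
    context = state['context']
    query = state['query']
    # Count this attempt so the conditional edge can stop retrying after 3 tries.
    attempts = state.get("iterations", 0) + 1
    response = "You should apply for Java Developer roles." # Simulated hallucination
    return {"response": response, "iterations": attempts}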

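def verify_facts(state: AgentState):
    # Critic node: a toy string check standing in for a real verifier.
    context = state['context']
    response = state['response']

    # Reject the response if it recommends a skill the context rules out.
    if "Java" in response and "no Java experience" in context:
        return {"is_accurate": False}
    return {"is_accurate": True}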

workflow = StateGraph(AgentState)

workflow.add_node("retrieve", retrieve_context)
workflow.add_node("generate", generate_advice)
workflow.add_node("verify", verify_facts)

workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "generate")
workflow.add_edge("generate", "verify")

workflow.add_conditional_edges(
    "verify",
    lambda x: "generate" if not x["is_accurate"] and x.get("iterations", 0) < 3 else END
)

app = workflow.compile()

In this architecture, the verify_facts node acts as a quality gate. If the AI suggests a skill the user doesn't possess, the loop triggers a re-generation. This significantly reduces the hallucination rate by forcing the model to confront its own errors.
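The string match in `verify_facts` covers only the one simulated case. A slightly more general critic could compare the skills named in a response against the skills confirmed by the resume. The sketch below assumes skills can be matched against a fixed vocabulary; `KNOWN_SKILLS`, `mentioned_skills`, and `unsupported_skills` are hypothetical names, not part of CVChatly or LangGraph.

```python
import re
from typing import Set

# Hypothetical skill vocabulary; a real system would need a far larger list.
KNOWN_SKILLS = {"python", "aws", "java", "kubernetes", "sql"}

def mentioned_skills(text: str) -> Set[str]:
    """Return the known skills that appear as whole words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return KNOWN_SKILLS & words

def unsupported_skills(response: str, confirmed: Set[str]) -> Set[str]:
    """Skills named in the response that the resume does not confirm."""
    return mentioned_skills(response) - confirmed

# Assumed to have been extracted from the retrieved resume chunks.
confirmed = {"python", "aws"}

bad = unsupported_skills("You should apply for Java Developer roles.", confirmed)
ok = unsupported_skills("Target Python roles on AWS teams.", confirmed)
print(bad, ok)
```

A verify node could then set `is_accurate` to False whenever the returned set is non-empty.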

Building this taught me that "technical correctness" isn't the only challenge. In HR, nuance is everything. I implemented three specific strategies to handle complex coaching scenarios:

I explicitly instructed the agent: "If the retrieved context does not contain the answer, state that you don't have enough information. Do not guess." This prevents the model from filling in gaps with "likely" but false information.
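One way this instruction could be wired in is sketched below: the guardrail text is prepended to every prompt, and the model call is skipped entirely when retrieval returns nothing. `build_prompt`, `answer`, `FALLBACK`, and the stubbed `fake_llm` are hypothetical; the actual CVChatly prompt layout is not shown in this article.

```python
from typing import Callable, List, Optional

GUARDRAIL = (
    "If the retrieved context does not contain the answer, state that you "
    "don't have enough information. Do not guess."
)
FALLBACK = "I don't have enough information in your resume to answer that."

def build_prompt(query: str, chunks: List[str]) -> Optional[str]:
    """Return a grounded prompt, or None when there is no context to ground on."""
    if not chunks:
        return None
    context = "\n".join(f"- {c}" for c in chunks)
    return f"{GUARDRAIL}\n\nContext:\n{context}\n\nQuestion: {query}"

def answer(query: str, chunks: List[str], llm: Callable[[str], str]) -> str:
    """Answer from context, falling back without calling the model if none exists."""
    prompt = build_prompt(query, chunks)
    if prompt is None:
        return FALLBACK
    return llm(prompt)

def fake_llm(prompt: str) -> str:
    """Stub model so the sketch runs without an API key."""
    return "stubbed model answer"

print(answer("Do I have a PMP certification?", [], fake_llm))
print(answer("What cloud skills do I have?", ["Deployed services on AWS"], fake_llm))
```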

To make the coaching actionable, the agent doesn't just look at the resume; it performs a Gap Analysis. It retrieves the job description (JD), retrieves the resume, and identifies the delta.
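At its simplest, the delta is a set difference. This sketch assumes the skills have already been extracted from both documents into sets; `gap_analysis` and the sample skills are hypothetical.

```python
from typing import Dict, List, Set

def gap_analysis(jd_skills: Set[str], resume_skills: Set[str]) -> Dict[str, List[str]]:
    """Compare job-description skills with resume skills, ignoring case."""
    jd = {s.lower() for s in jd_skills}
    cv = {s.lower() for s in resume_skills}
    return {
        "matched": sorted(jd & cv),  # required and present
        "missing": sorted(jd - cv),  # the delta to coach on
        "extra": sorted(cv - jd),    # present but not requested
    }

report = gap_analysis({"Python", "AWS", "Kubernetes"}, {"Python", "AWS", "Django"})
print(report)
```

The "missing" list is what the coaching advice would focus on, and because it comes from the two retrieved documents, each item can be traced back to the JD.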

Career coaching is a conversation. I used Checkpointers in LangGraph to maintain the state across multiple turns, ensuring that if a user says "What about the first role I mentioned?", the agent remembers the context without needing to re-process the entire document.

How do we know it's working? I implemented a "Ground Truth" evaluation dataset. I took 50 resumes and 50 specific questions with known correct answers.

Metric	Single Prompt (Baseline)	RAG (Naive)	RAG + Agentic Workflow
Hallucination Rate	35%	12%	2%
Fact Accuracy	60%	82%	96%
Relevance	70%	85%	92%

The jump from 12% to 2% hallucination is what makes the difference between a "toy" and a "tool."
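The article does not spell out how each metric was scored, so the following is only a sketch of how such rates could be computed from a labeled dataset. It assumes each case carries a reviewer flag for unsupported claims and that a substring match against the known answer counts as correct. `EvalCase` and the four sample cases are hypothetical and are not the 50-resume dataset.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EvalCase:
    question: str
    expected: str      # known correct answer
    answer: str        # system output
    unsupported: bool  # True if a reviewer found a claim not backed by the resume

def hallucination_rate(cases: List[EvalCase]) -> float:
    """Share of answers containing at least one unsupported claim."""
    return sum(c.unsupported for c in cases) / len(cases)

def fact_accuracy(cases: List[EvalCase]) -> float:
    """Share of answers that contain the expected fact (case-insensitive)."""
    hits = sum(c.expected.lower() in c.answer.lower() for c in cases)
    return hits / len(cases)

cases = [
    EvalCase("Years of Python?", "5 years", "You have 5 years of Python.", False),
    EvalCase("Highest degree?", "Bachelor", "You hold a PhD.", True),
    EvalCase("Cloud platform?", "AWS", "Your cloud experience is on AWS.", False),
    EvalCase("Java experience?", "no", "You have no Java experience.", False),
]
print(f"hallucination rate: {hallucination_rate(cases):.0%}")
print(f"fact accuracy: {fact_accuracy(cases):.0%}")
```

Running the same cases through each architecture would give one row per metric and one column per system, as in the table above.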

As I continue to evolve CVChatly, the next step is Multi-Agent Orchestration. Imagine one agent acting as the "Recruiter" (critiquing the resume), another as the "Career Coach" (suggesting improvements), and a third as the "Fact-Checker" (ensuring everything is grounded in the resume).

The shift from a "Chatbot" to an "Agentic System" is the most important transition any AI engineer can make right now. We are moving away from hoping the LLM gets it right and moving toward building systems that ensure the LLM gets it right.

If you are building LLM-powered applications where accuracy is critical (Legal, Medical, HR), follow these rules:

While the backend logic is the engine, the user experience is the chassis. For CVChatly, ensuring that the frontend delivers these complex AI responses without lagging was key. When building high-traffic AI tools, I always recommend monitoring your site's performance and SEO health to ensure users can actually find and use your tool. If you're unsure how your current site is performing, I highly recommend using inspect-my-site.com to get a comprehensive audit of your technical SEO and performance metrics. A great AI backend is useless if your site's speed or SEO prevents users from accessing it.

What are you using to handle hallucinations in your LLM projects? Are you sticking to RAG, or have you moved toward agentic loops? Let's discuss in the comments!

About the Author:

Maria Jose Gonzalez Antelo is a professional content writer and AI solutions expert with nearly a decade of experience in IT Human Resources. She specializes in bridging the gap between technical infrastructure and human-centric organizational growth.

source & further reading

dev.to (original article)
