Building a Conversational Flight Booking Assistant from Scratch with LangGraph, OpenAI API and…

A developer built a conversational flight booking assistant from scratch using LangGraph and OpenAI API, designed to handle task-oriented dialogues for IndiGo Airlines workflows including flight booking, web check-in, and flight status enquiries. The assistant actively collects missing information, validates inputs, and maintains state across multiple turns, demonstrating patterns applicable to other goal-oriented AI systems like insurance claims and customer onboarding.

Non-members can read here for free: https://medium.com/@alphaiterations/building-a-conversational-flight-booking-assistant-from-scratch-with-langgraph-openai-api-and-6fef2b4e8cc3?source=friends_link&sk=de8f0bb16190286d51d4a2f8226d391e

Not all AI applications are solving the same kind of problem.

Some tasks are fundamentally transformational. A summarization system converts a document into a summary. A translation system converts text from one language to another. An extraction pipeline converts unstructured content into structured data. Given the required input, the system can usually produce an output immediately.

Others are information-seeking systems. Search, RAG, and knowledge assistants fall into this category. The user asks a question, the system retrieves or synthesizes information, and returns an answer. Even when conversation history is maintained, the interaction is largely user-driven: the user asks, the system responds.

Flight booking is different. A booking assistant cannot complete its task with the information available in the first message. The user might say: "Book me a flight to Mumbai." But a successful booking requires much more: departure city, travel dates, passenger count, trip type, flight selection, passenger details, and contact information. The assistant must actively collect missing information, validate inputs, recover from mistakes, maintain state across multiple turns, and guide the conversation toward a successful outcome.

This class of problems is commonly known as Task-Oriented Dialogue (TOD) or Goal-Oriented AI Systems. Unlike information-retrieval agents, these systems are responsible for driving a business process to completion. The challenge is no longer generating a correct response; it is managing a workflow.

In this article, we will build a production-grade IndiGo Airlines booking assistant using LangGraph and the OpenAI API.
The assistant supports flight booking, web check-in, and flight status enquiries across Streamlit and Telegram while maintaining conversational state. More importantly, the patterns discussed here extend far beyond airline reservations. The same architectural principles apply to insurance claims, customer onboarding, appointment scheduling, loan applications, technical troubleshooting, and any workflow where an AI agent must progressively gather information and drive a task to completion. By the end of this article, you'll understand how to design stateful, goal-oriented AI systems that move beyond answering questions and start completing real-world business processes.

Airlines handle millions of customer interactions related to flight bookings, web check-ins, and flight status enquiries. Traditionally, these interactions are completed through websites, mobile apps, or customer support agents, requiring users to navigate multiple screens and forms before completing a task.

The goal is to build an AI-powered assistant that allows customers to complete these workflows through a natural conversation. Instead of filling forms manually, users should be able to express their intent in plain language, while the assistant guides them through the process and completes the required actions.

This project is inspired by IndiGo's 6Eskai virtual assistant and focuses on implementing a simplified version of its flight booking, web check-in, and flight status workflows.

Because this is a conversational experience, several challenges arise: the user might not provide all the information in one go, or might answer a question about the travel date with a city instead. The major challenges are listed below.

    User: Book me a flight to Mumbai
    Bot: Sure. What is your departure city?
    User: Jaipur
    Bot: What date would you like to travel?

The assistant must identify missing information and dynamically collect it through follow-up questions.
    User: Book a flight from Bombay to Bangalore

The assistant must resolve the city aliases before performing the flight search:

    Bombay → Mumbai (BOM)
    Bangalore → Bengaluru (BLR)

    Bot: What date would you like to travel?
    User: to Jaipur on 12th July

Here the assistant should understand that the user has provided both a city and the travel date, even though only the date was asked for.

    User: Book a flight from Jaipur to Mumbai yesterday

The assistant should detect that the travel date is invalid and request a valid future date.

    User: Book a flight to Springfield

If no supported airport can be resolved, the assistant should gracefully ask for clarification rather than failing the conversation.

The assistant supports three workflows: flight booking, web check-in, and flight status enquiry. It runs on two channels simultaneously: a Streamlit web UI and a Telegram bot. Both channels share a single compiled LangGraph state machine.
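The first and fourth challenges above (spotting what is still missing, and rejecting a past date) can be illustrated without any LLM at all. The following is a minimal sketch, not the project's code: the slot names, their order, and the rule that "today" counts as a valid travel date are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical slot names, in the order the assistant would ask for them.
REQUIRED_SLOTS = ["departure_city", "destination_city", "travel_date", "adults"]


def missing_slots(slots: dict) -> list[str]:
    """Return the required slots that are still unset (None or absent)."""
    return [name for name in REQUIRED_SLOTS if slots.get(name) is None]


def is_valid_travel_date(travel_date: date, today: date) -> bool:
    """Assumed rule: a travel date is acceptable if it is today or later."""
    return travel_date >= today


# "Book me a flight to Mumbai" fills only the destination.
slots = {"destination_city": "Mumbai"}
print(missing_slots(slots))

# "yesterday" relative to a fixed reference date, so the example is repeatable.
today = date(2025, 7, 12)
print(is_valid_travel_date(date(2025, 7, 11), today))
```

Checking `is None` rather than truthiness matters here: a value such as `0` is a real answer and should not be reported as missing.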
Below is the tech stack of the project:

- LLM: OpenAI GPT-4o-mini
- Agent framework: LangGraph
- Web UI: Streamlit
- Bot channel: python-telegram-bot
- Database: SQLite3

The complete code is available here: https://github.com/alphaiterations/agentic-ai-usecases/tree/main/advanced/flight-booking-assistant

Prerequisites:

- Python 3.11+
- An OpenAI API key
- Basic familiarity with Python async and dictionary-based state

    git clone https://github.com/vijendrajain/agentic-ai-usecases
    cd agentic-ai-usecases/advanced/flight-booking-assistant
    pip install -r requirements.txt

Create a .env file at the project root:

    OPENAI_API_KEY=sk-…
    TELEGRAM_BOT_TOKEN=8
    TELEGRAM_BOT_USERNAME=FlightBookingAIBot

Note: refer to the link below to learn how to create TELEGRAM_BOT_TOKEN and TELEGRAM_BOT_USERNAME on Telegram: From BotFather to 'Hello World' https://core.telegram.org/bots/tutorial#obtain-your-bot-token

To keep the system maintainable and extensible, the codebase is divided into independent layers for conversational agents, workflow nodes, business services, shared utilities, and user-facing channels. Each layer has a well-defined responsibility within the booking workflow.
Below is the repo structure:

    flight-booking-assistant/
    ├── app.py                        Streamlit web UI entry point
    ├── telegram_bot.py               Telegram bot adapter
    ├── graph/
    │   ├── __init__.py               Top-level StateGraph + dispatch route
    │   ├── booking_subgraph.py       Compiled booking flow subgraph
    │   └── pnr_subgraph.py           Compiled PNR / check-in / status subgraph
    ├── state.py                      BookingState TypedDict (5 sub-types + Passenger)
    ├── constants.py                  Step, Intent, Process enums; CITY_TO_CODE map
    ├── config.py                     Settings loaded from .env
    ├── nodes/                        All node functions (LLM-calling and pure-Python)
    │   ├── router.py                 Intent classification (booking / check-in / status)
    │   ├── information_extractor.py  Slot, PNR, and passenger extraction
    │   ├── slot_validator.py         Per-field validation + retry counter updates
    │   ├── city_lookup.py            City name → IATA code resolution
    │   ├── conversation_driver.py    Slot sequencing, Phase 1 & 2 flow orchestration
    │   ├── flight_selection.py       Parses user's flight choice from numbered list
    │   ├── booking_guardrail.py      Guards against mid-flow process switching
    │   ├── confirmation.py           Pre-search booking confirmation prompt
    │   ├── payment.py                Payment step (stub, ready for Stripe/Razorpay)
    │   └── done.py                   Session teardown and final response
    ├── services/
    │   ├── flight_search.py          SQLite query + dynamic pricing calculation
    │   ├── pnr_lookup.py             PNR / check-in / flight status database lookup
    │   ├── booking_save.py           Persists confirmed bookings to the database
    │   └── session_store.py          Session persistence across page refreshes
    ├── utils/
    │   ├── llm.py                    call_llm_json wrapper + observability logging
    │   ├── db.py                     SQLite connection and query helpers
    │   ├── formatting.py             Flight list and message formatters
    │   ├── user_messages.py          User-facing string constants
    │   └── prompts/                  LLM prompt templates (split by concern)
    │       ├── extraction.py         Slot, PNR, passenger extraction prompts
    │       ├── conversation.py       Routing, retry, persona prompts
    │       └── classification.py     Intent classification prompts
    ├── indigo_airline.db             Pre-loaded SQLite database (17 tables)
    ├── sessions.db                   Session state persistence across page refreshes
    ├── create_airline_db.py          Script used to seed the database
    └── requirements.txt

Each directory has a single responsibility: graph/ holds the compiled state machine and its subgraphs, nodes/ holds the node functions (both LLM-calling and pure-Python), services/ handles database queries, and utils/ provides shared infrastructure. The two channel adapters, app.py for Streamlit and telegram_bot.py for Telegram, contain no business logic. They call booking_graph.invoke() and display the result.

To support realistic airline workflows, we first create a synthetic airline database instead of relying on mocked responses. The dataset is generated from publicly available IndiGo route information and populated with synthetic customers, bookings, passengers, payments, baggage records, flight instances, and delay data. The synthetic records are generated with the Python Faker library.

The dataset is fully configurable, allowing the size and coverage of the airline network to be adjusted through a few parameters:

    # create_airline_db.py
    import os
    from datetime import datetime

    class DBConfig:
        # Reproducibility
        random_seed = 42

        # Date window, anchored to today so data is always futuristic
        schedule_start = datetime.now()
        schedule_years = 2
        flight_instance_years = 2
        flight_instance_sample_weeks = 1

        # Airport filter
        #   "all"      -> every IndiGo airport (airport_list is ignored)
        #   "selected" -> only airports in airport_list
        airport_list_type = "selected"
        airport_list = [
            "DEL", "BOM", "BLR", "MAA", "HYD", "CCU",  # major metros
            "AMD", "PNQ", "COK", "GOI", "JAI", "LKO",  # tier-2
            "NAG", "IXC", "PAT", "BBI", "SXR",
        ]

        # Volume
        num_customers = 100
        num_bookings = 500

        # How many flights from the schedule to generate instances for.
        # Lower this to reduce FlightInstances rows and DB size.
        # None = use all flights in the schedule.
        max_flights_for_instances = 20

        # Paths
        db_path = os.path.join(os.path.dirname(__file__), "indigo_airline.db")
        routes_url = (
            "https://raw.githubusercontent.com/alphaiterations/data-for-agents"
            "/main/airlines-data/airline_routes.json"
        )

This makes it easy to generate anything from a lightweight demo dataset to a much larger airline reservation system by simply changing a few configuration values.

Instead of manually maintaining flight routes, the generator extracts only IndiGo-operated routes (carrier code 6E) from publicly available airline route data before creating the flight schedule.

    # create_airline_db.py
    for carrier in route["carriers"]:
        if carrier.get("iata") == "6E":
            indigo_routes.append({
                "origin": airport_code,
                "destination": dest,
                "distance_km": route.get("km"),
                "duration_mins": route.get("min"),
            })

Multiple daily departures are generated automatically for every supported route, creating a realistic flight schedule.

    # create_airline_db.py
    departure_times = ["06:00", "09:30", "12:00", "15:30", "18:00", "21:00"]
    for route in unique_routes:
        for departure_time in departure_times:
            create_flight(...)

Flight schedules describe recurring flights, while flight instances represent a specific flight on a particular date. A small percentage of instances (about 5%) are randomly assigned delays, enabling realistic flight status demonstrations.

    for days_ahead in range(0, instance_days, step_days):
        current_date = start + timedelta(days=days_ahead)
        create_flight_instance(...)
        if random.random() < 0.05:
            create_delay(...)

In the terminal, navigate to the flight-booking-assistant folder and run the command below to create the synthetic database.
    cd flight-booking-assistant
    python create_airline_db.py

Output:

    Config: seed=42, start=2026-06-20, schedule_years=2, customers=100, bookings=500
    Removed existing database: /Users/current_user/agentic-ai-usecases/advanced/flight-booking-assistant/indigo_airline.db
    Fetching airline_routes.json from: https://raw.githubusercontent.com/alphaiterations/data-for-agents/main/airlines-data/airline_routes.json
    Routes data fetched successfully.
    IndiGo routes extracted (17 selected airports): 210
    Database schema created successfully.
    Flight schedule created: 1260 flights.
    Days of operation inserted for all flights.
    Generating 100 synthetic customers...
    100 customers inserted.
    Generating 500 bookings...
    500 bookings inserted.
    Generating flight instances and delays (2 years, sampled every 7d)...
    Generating instances for 20 flights...
    Flight instances and 110 delays inserted.
    ============================================================
    DATABASE SUMMARY - INDIGO AIRLINE BOOKING SYSTEM
    ============================================================
      Customers               100
      FlightSchedule        1,260
      DaysOfOperation       8,820
      PNRs                    500
      Bookings                500
      Passengers            1,220
      Itineraries             500
      ItineraryLegs           500
      PassengerBaggage      1,220
      FlightInstances       2,100
      FlightDelays            110
      Payments                500
    ------------------------------------------------------------
      TOTAL                17,330
    Database file : /Users/current_user/agentic-ai-usecases/advanced/flight-booking-assistant/indigo_airline.db
    Database size : 1.59 MB
    ============================================================

This creates indigo_airline.db in the root folder.

The three tables the agent queries most are FlightSchedule, FlightInstances, and PNRs. FlightSchedule is the source of truth for what routes exist and when flights depart. FlightInstances links a schedule entry to a specific date and carries the live status field (On Time, Delayed, Cancelled).
PNRs is what the web check-in and flight status flows query: the user gives a PNR code and their last name, and the agent joins PNRs to Bookings to ItineraryLegs to return their itinerary.

Here is the summary of all the tables:

The flight search query is straightforward:

    -- utils/db.py (simplified)
    SELECT flight_id, origin_airport_code, destination_airport_code,
           departure_time, arrival_time, flight_duration_minutes
    FROM FlightSchedule
    WHERE origin_airport_code = ?
      AND destination_airport_code = ?
    ORDER BY departure_time
    LIMIT 12

One deliberate simplification: pricing is not stored in the database. It is computed dynamically in Python from the flight's index in the result list and its duration at query time. This keeps the schema clean and avoids needing a fare table that would require constant updates for a demo. The pricing formula lives in services/flight_search.py:

    # services/flight_search.py
    price = 2500 + (index % 4) * 400 + (duration_mins // 10) * 50

For example, the second flight in the list (index 1) with a 95-minute duration is priced at 2500 + 1 * 400 + 9 * 50 = 3350. This creates natural price variation across flights without needing a fare table, which keeps the demo realistic without requiring a live pricing API.

In LangGraph, state acts as the shared memory of the workflow. Every node reads from the current state, performs its logic, and returns updates that are merged back into the state.

For a flight booking assistant, state needs to capture much more than conversation history. The system must track user inputs, booking details, workflow progress, validation errors, selected flights, and session metadata across multiple turns.
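The read, compute, return-an-update cycle described above can be sketched in plain Python. This is an illustration only and does not use LangGraph: `MiniState`, `ask_departure_node`, and `apply_update` are hypothetical names, and the shallow merge roughly approximates what happens to state keys that have no custom reducer.

```python
from typing import Optional, TypedDict


class MiniState(TypedDict, total=False):
    """A tiny stand-in for the real BookingState (illustrative fields only)."""
    step: str
    departure_city: Optional[str]
    destination_city: Optional[str]


def ask_departure_node(state: MiniState) -> dict:
    """Hypothetical node: reads the state and returns only the keys it changes."""
    if state.get("departure_city") is None:
        return {"step": "COLLECT_SLOTS"}
    return {"step": "CONFIRM_BOOKING"}


def apply_update(state: MiniState, update: dict) -> MiniState:
    """Shallow merge: updated keys overwrite, untouched keys carry over."""
    return {**state, **update}


state: MiniState = {
    "step": "GREETING",
    "departure_city": None,
    "destination_city": "Mumbai",
}
new_state = apply_update(state, ask_departure_node(state))
print(new_state)
```

The node never mutates the state it receives; it only returns the keys it wants changed, which keeps each node easy to test in isolation.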
To keep the state manageable, I grouped related fields into logical categories:

    # state.py
    class Passenger(TypedDict):
        title: str          # "Mr" | "Mrs" | "Miss" | "Master"
        first_name: str
        last_name: str
        age_category: str   # "adult" | "child"

    class BookingState(
        SessionState,       # session_id, user_id, channel, last_active_at
        ConversationState,  # messages, step, process, intent, terminated, awaiting_confirmation
        BookingEntities,    # departure_city, destination_city, travel_date, passengers, email
        FlightState,        # flights, selected_flight, booking_leg, confirmation_step
        ValidationState,    # slot_attempts, cities_changed, slot_error, city_error
    ):
        pass

This approach keeps ownership clear while allowing every node in the workflow to access a unified view of the conversation.

The step field is the single source of truth for where you are in the conversation. It takes values like GREETING, COLLECT_SLOTS, CONFIRM_BOOKING, SHOW_FLIGHTS, PAYMENT, DONE, and sub-steps like collect_names and collect_email. Every routing decision in the graph reads step first.

The slot_attempts field is a dict that tracks how many times the user has failed on each specific field. This is more surgical than a global counter: a user might nail the city name in one try but keep giving dates in the past, and you want the three-failure limit to count against the date field alone rather than against a counter shared by every field.

    # ValidationState in state.py
    slot_attempts: Dict[str, int]   # e.g. {"travel_date": 2, "departure_city": 1}
    name_attempts: int              # separate counter for passenger name collection
    flight_select_attempts: int     # retry count for flight selection
    terminated: bool                # True when any field exceeds MAX_SLOT_ATTEMPTS = 3

Round-trip bookings require storing both legs. The booking_leg field tracks whether you are currently booking the outbound or return flight, and selected_outbound_flight holds the confirmed first leg while the user picks the second:

    # FlightState in state.py
    booking_leg: str                 # "outbound" | "return"
    selected_outbound_flight: Dict   # stored after user confirms outbound leg
    selected_return_flight: Dict     # stored after user confirms return leg

Gotcha: never store derived values in state if you can compute them from other fields. Early in the project I stored total_passengers alongside adults and children. They drifted. Now only adults and children live in state; totals are computed on the fly.

Every LLM call in this project is driven by a prompt from utils/prompts/, which is split into three modules by concern:

    USER MESSAGE
      |
      +-- EXTRACTION PROMPTS: "What did the user actually say?"
      |     Strict JSON, fixed schema, null = not said
      |     EXTRACTION_CONTEXT, EXTRACTION_PROMPT,
      |     PASSENGER_EXTRACTION_PROMPT, PNR_EXTRACTION_PROMPT
      |
      +-- CLASSIFICATION PROMPTS: "Which option did the user pick?"
      |     Closed vocabulary, one of N, no elaboration
      |     CONFIRM_INTENT_PROMPT, FLIGHT_SELECTION_PROMPT,
      |     MID_FLOW_INTENT_PROMPT, CITY_LOOKUP_PROMPT
      |
      +-- CONVERSATION PROMPTS: "What should the bot say next?"
            Free text, persona-bounded, empathetic
            SYSTEM_PERSONA, ROUTING_PROMPT,
            OUT_OF_SCOPE_PROMPT, RETRY_MESSAGE_PROMPT

Every LLM call fits into one of these three boxes. If you ever find yourself writing a prompt that wants to both extract a slot and decide what to say next, split it into two.

Extraction prompts enforce strict "do not infer" rules and always return a fixed schema. Classification prompts enforce a closed vocabulary (affirm / deny / modify, or a 0-based index). Conversation prompts are the only place the model is allowed to produce free-form text. Keeping these three categories separate makes it easier to audit what each prompt is allowed to do.

Before writing a single prompt, let's understand what we are dealing with.
An LLM does not "think" the way you do. When you ask it to extract flight information from a user message, it tries to produce the most plausible-looking response based on its training. That sounds helpful, and it is, but it creates three specific problems in a booking system:

Problem 1: Output format is unpredictable. Ask the LLM to return JSON and it might wrap the response in a markdown code fence tagged json. Ask for a number and it might say "The answer is 2 adults." For a booking system that parses LLM output programmatically, this breaks things silently.

Problem 2: The model infers what users "probably meant." A user says "Jaipur to Mumbai on the 15th, 2 adults." The model might helpfully set trip_type = "one-way" because most single-date bookings are one-way. But you never want the model to make that call; you want the user to confirm it explicitly. If the model guesses right, the confirmation step gets silently skipped.

Problem 3: Null and zero mean different things, but the model may treat them the same. A user says "2 adults, no kids." The model might return children: null because it treats "no kids" as absence of information. But null means "we haven't asked yet" and 0 means "user confirmed no children." If children stays null, the bot will ask the passenger count question all over again.

These three problems drive every single prompt in this project.

The fundamental insight is this: not all LLM outputs are equal. Some need to be strict and parseable. Some need to choose from a fixed set. Only some need to be expressive. If you mix these requirements in one prompt, you get a confused model.

So we organise all eleven prompts into three separate families based on what kind of output they produce. Every user message the bot receives leads to one of three questions:

    "What did the user actually say?"   → Extraction Prompts
        Strict JSON, fixed schema, null for anything not stated.
        The model is a parser, not a reasoner.

    "Which option did the user pick?"   → Classification Prompts
        Closed vocabulary. Pick exactly one bucket. No elaboration.

    "What should the bot say next?"     → Conversation Prompts
        Free text allowed, but always bounded by a shared persona.

This separation is the most important decision in the entire prompt layer. It determines how you write each prompt, what you constrain, and how you call the model. Let us go through each family.

Before any prompt runs, two settings address the first problem (unpredictable format) at the API call level:

    # utils/llm.py
    def call_llm_json(prompt: str) -> dict:
        response = get_client().chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"},  # forces valid JSON: no markdown, no prose
            temperature=0,                            # deterministic, not creative
            timeout=30,
        )
        return json.loads(response.choices[0].message.content.strip())

response_format={"type": "json_object"} tells the model at the API level: your entire output must be valid JSON. No markdown fences, no "Sure! Here's the JSON:" preamble. The constraint is applied by the API rather than by wording in the prompt, although it governs syntax only: it does not check field names, and a response cut off by a token limit can still be incomplete.

temperature=0 removes creativity. You do not want the model to "interpret" what "July 15th" means. You want it to return "2025-07-15" as consistently as the API allows.

The mental model for extraction prompts: think of them like a strict forms clerk. They extract exactly what is on the form, leave everything else blank, and never fill in fields based on assumption.
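The null-versus-zero distinction from Problem 3 is easy to get wrong on the Python side as well, because `0` and `None` are both falsy. The sketch below is illustrative only; the two helper functions are hypothetical and are not taken from the project's slot validator.

```python
from typing import Optional


def needs_children_question(children: Optional[int]) -> bool:
    """Ask only when the slot was never filled; 0 is a real answer."""
    return children is None


def needs_children_question_buggy(children: Optional[int]) -> bool:
    """Truthiness check: wrongly treats an explicit 0 as 'not answered'."""
    return not children


# "2 adults, no kids" should produce children == 0 after extraction.
extracted = {"adults": 2, "children": 0}
print(needs_children_question(extracted["children"]))
print(needs_children_question_buggy(extracted["children"]))
```

With the identity check, a confirmed zero closes the question; with the truthiness check, the bot would ask about children again even though the user already answered.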
All three extraction prompts share one foundational rule as a preamble:

EXTRACTION_CONTEXT: The "No Inference" Contract

    # utils/prompts/extraction.py
    EXTRACTION_CONTEXT = """You are a data extraction engine for the IndiGo Airlines booking system.
    Extract ONLY what is explicitly stated by the user. Do not infer, assume, or embellish.
    Return valid JSON only — no extra text, no markdown."""

This is the most important declaration in the project. Without it, GPT-4o-mini will helpfully fill in what it "probably" knows. When the model guesses a field, the slot validator sees a value and considers it "collected." The retry logic never triggers. The user never confirms. The booking moves forward with wrong data, silently.

Every extraction prompt starts with EXTRACTION_CONTEXT.

EXTRACTION_PROMPT: Phase 1, Collecting Flight Slots

This prompt handles the first phase of booking: origin, destination, dates, trip type, and passenger count. Here it is in full:

    # utils/prompts/extraction.py
    EXTRACTION_PROMPT = f"""{EXTRACTION_CONTEXT}

    Today's date: {{today_date}}

    Conversation so far:
    {{conversation_history}}

    Latest user message: "{{user_input}}"

    Extract any flight booking information mentioned. Return ONLY valid JSON with these fields (use null for anything not mentioned):
    {{{{
      "departure_city": "city name or null",
      "destination_city": "city name or null",
      "travel_date": "YYYY-MM-DD or null",
      "return_date": "YYYY-MM-DD or null (only for round-trip)",
      "trip_type": "one-way or round-trip or null",
      "adults": "integer or null",
      "children": "integer or null"
    }}}}

    children extraction rules:
    - Extract the number of children from phrases like "2 adults 1 child", "1 kid", "no children", "0 kids".
    - If the user explicitly says no children / 0 children, set children to 0.
    - Children age range: 2–16 years (EU) or 2–12 years (others).

    Date inference rules (today is {{today_date}}):
    - If the user gives a date without a year, infer the year as follows:
      - Use the current year if the resulting date is today or in the future.
      - If the date has already passed this year, use next year.
      - If today is in November or December and the mentioned month is January or February, always use next year.

    trip_type rules:
    - Only extract trip_type if the user EXPLICITLY uses words like "one-way", "one way", "round trip", "round-trip", "return", "two-way".
    - NEVER infer trip_type from a date, city, or any other context. If not explicitly stated, return null.

    Common Indian cities to recognise: Mumbai, Delhi, Bangalore, Chennai, Hyderabad, Pune, Jaipur, Goa, Kolkata, Kochi, Lucknow, Ahmedabad, Surat, Indore, etc.
    If a city is mentioned with a direction keyword like "from" it is departure_city; with "to" or "visit" it is destination_city."""

Three rules in this prompt deserve extra attention:

The children null vs. 0 rule. null means "we haven't asked yet." 0 means "user confirmed no children." If the user says "2 adults, no kids" and the model returns children: null, the slot validator thinks the passenger question is still open and asks again, even though the user already answered it. The rule forces the model to distinguish between "not mentioned" and "explicitly zero."

The trip_type non-inference rule.
Without it, the model sees two dates and decides it must be a round-trip, because that is the most common reason people give two dates. That guess silently skips the explicit trip type confirmation step. The rule is blunt: if the user did not say the words, return null.

The date inference rules. Users almost never say the year. "July 15th" could be this July or next July. The rules resolve this deterministically: use the current year if the date is in the future, bump to next year if it has already passed, and always use next year for January or February mentions in November or December.

PASSENGER_EXTRACTION_PROMPT: Phase 2, Step-Aware Extraction

After the user selects a flight, the conversation enters Phase 2: a four-step sequence collecting flight confirmation, WhatsApp consent, passenger names, and email. Each step needs different extraction logic.

The naive approach: write four separate prompts. The problem: you copy the EXTRACTION_CONTEXT preamble four times and they drift out of sync.

The better approach: one prompt that handles all four steps by embedding step-specific rules, with {current_step} injected at call time:

    # utils/prompts/extraction.py
    PASSENGER_EXTRACTION_PROMPT = f"""{EXTRACTION_CONTEXT}

    The user is booking a flight and is currently at step: "{{current_step}}"
    Assistant's question: "{{assistant_message}}"
    User message: "{{user_input}}"

    Extract ONLY the information relevant to this step. Return ONLY valid JSON — no extra text, no markdown.

    Step rules:

    "flight_confirm" — user is asked to confirm the selected flight (Yes/No).
      {{{{\"flight_confirmed\": true}}}}  if user confirms (yes, okay, confirm, sure, proceed, looks good, etc.)
      {{{{\"flight_confirmed\": false}}}} if user declines (no, cancel, change, different, back, etc.)
      {{{{\"flight_confirmed\": null}}}}  if unclear

    "whatsapp_consent" — user is asked for WhatsApp communication consent (Yes/No).
      {{{{\"whatsapp_consent\": true}}}}  if user agrees (yes, okay, sure, fine, agree, etc.)
      {{{{\"whatsapp_consent\": false}}}} if user declines (no, don't, decline, not now, etc.)
      {{{{\"whatsapp_consent\": null}}}}  if unclear

    "collect_names" — user is providing the names of all passengers.
      Parse every name into a structured object and return:
      {{{{
        "passengers": [
          {{{{\"title\": \"Mr/Mrs/Miss/Master\", \"first_name\": \"...\", \"last_name\": \"...\", \"age_category\": \"adult/child/infant\"}}}},
          ...
        ]
      }}}}
      Inference rules:
      - title: use the prefix if explicitly given (Mr / Mrs / Miss / Master); for unlabelled names, default to "Mr" for typical male names and "Miss" for female names; use "Master" for male children.
      - age_category: use "child" if the user says child, kid, jr, junior, or similar; use "infant" if the user says infant, baby, or similar; default to "adult" otherwise.
      - If no names could be identified at all, return {{{{\"passengers\": null}}}}.

    "collect_email" — user is providing their email address.
      {{{{\"email\": \"<email address extracted from the message>\"}}}}
      {{{{\"email\": null}}}} if no valid email address is found"""

The current step gets injected at call time:

    prompt = PASSENGER_EXTRACTION_PROMPT.format(
        current_step=state["step"],
        assistant_message=state["assistant_message"],
        user_input=user_input,
    )

One prompt, one call_llm_json call, four different behaviours, selected entirely by the step value in state. No branching in Python code, no four separate prompt strings to maintain.

PNR_EXTRACTION_PROMPT: PNR Code Extraction

The web check-in and flight status flows need just one thing from the user: their PNR code. This prompt is deliberately minimal:

    # utils/prompts/extraction.py
    PNR_EXTRACTION_PROMPT = f"""{EXTRACTION_CONTEXT}

    The user was asked to provide their PNR number.
    Latest user message: "{{user_input}}"

    PNR codes are alphanumeric strings, typically 6–7 characters long (e.g. I000004, S000030, ABC123).
    Extract the FULL code exactly as given — do not truncate or modify it.

    Return ONLY valid JSON:
    {{{{\"pnr\": \"<PNR code in uppercase>\"}}}}

    If no valid PNR is found in the message return:
    {{{{\"pnr\": null}}}}"""

Short prompt, one field, one rule. The model does not need conversation history, step logic, or persona here, just the raw user message and a pattern description.

The mental model for classification prompts: think of them like a multiple-choice answer sheet. The model gets a list of options and must pick exactly one. No elaboration, no hedging, no "it depends."

    User message arrives mid-conversation
      |
      v
    Classification Prompt
      "Here are your only valid options:
       affirm / deny / modify
       Pick exactly one. Return JSON."
      |
      v
    {"intent": "affirm"}    (no prose, no explanation)

Classification prompts differ from extraction prompts in one key way: the model is not parsing what the user said; it is deciding which pre-defined bucket the user's message falls into.

CONFIRM_INTENT_PROMPT: Affirm, Deny, or Modify?

Used after the bot shows the user a travel summary and asks them to confirm. Three outcomes are possible:

    # utils/prompts/classification.py
    CONFIRM_INTENT_PROMPT = """The user has been shown their travel summary and asked to confirm or make changes.

    Assistant message: "{assistant_message}"
    User message: "{user_input}"

    Classify the user's intent into exactly one of three options and return ONLY valid JSON:
    {{"intent": "affirm"}} — user wants to proceed (e.g. yes, ok, sure, proceed, go ahead, don't stop, sounds good)
    {{"intent": "deny"}}   — user wants to cancel or start over (e.g. no, cancel, don't proceed, start over)
    {{"intent": "modify"}} — user wants to change something (e.g. change destination, update date, different city)
    """

The examples inside each bucket do a lot of the work. Without them, "sounds good" might confuse the model.
"Don't stop", which means proceed, could easily be mistaken for "deny." Explicit examples anchor the model to the intended interpretation.

FLIGHT_SELECTION_PROMPT: Which Flight Did the User Choose?

After the bot shows a numbered list of flights, the user might say "flight 1", "the cheapest one", "the 9 AM flight", or even the flight number directly ("6E0863"). This prompt maps all of those to a 0-based index:

    # utils/prompts/classification.py
    FLIGHT_SELECTION_PROMPT = """Available flights (sequence number shown as Flight N, index is N-1):
    {flights}

    Assistant's last message: "{assistant_message}"
    User said: "{user_input}"

    Identify which flight the user wants. They may say "flight 1", "flight 5", "the cheapest", a flight number like "6E0863", or a departure time.
    When the user says "flight N", the selected index is N-1 (e.g. "flight 5" → selected_index 4).

    Return ONLY valid JSON:
    {{"selected_index": <0-based integer index into the flights list>}}

    If you cannot determine a valid selection return:
    {{"selected_index": null}}"""

The N-1 rule is explicit in the prompt. Without it, the model may return selected_index: 1 when the user says "flight 1", which is off by one from the 0-based Python list. A subtle bug, caught by one sentence in a prompt.

MID_FLOW_INTENT_PROMPT: Is the User Continuing or Changing Their Mind?

When a user has already seen available flights, they might suddenly say "actually, I want to go to Goa instead." This prompt distinguishes that from a normal flight selection:

    # utils/prompts/classification.py
    MID_FLOW_INTENT_PROMPT = """The user is in the middle of a flight booking. They have already seen available flights.

    Current step: {step}
    Last assistant message: "{assistant_message}"
    User said: "{user_input}"

    Classify the user's intent:
    - "modify"   : user wants to change any booking detail (destination, departure city, date, passengers)
                   Examples: "change destination to goa", "different date", "I want to go to Delhi instead", "update departure"
    - "continue" : user is selecting a flight or answering the current question
                   Examples: "flight 1", "the cheapest", "yes", "no", "9:00 AM"

    Return ONLY valid JSON — no extra text:
    {{"intent": "modify"}}
    {{"intent": "continue"}}"""

Two buckets, explicit examples. Without the examples, "I want to go to Delhi instead" is ambiguous: it contains a city name, exactly like a normal slot answer. The examples steer the model toward the intent behind the words, not just the words themselves.

CITY_LOOKUP_PROMPT: Resolving Messy City Names

Users say "Bombay" instead of "Mumbai", "Bengaluru" or "Bangalore", "New Delhi" or "Delhi." An exact string match cannot handle all of these variants. This prompt resolves the user's input against a pre-computed list of fuzzy candidates:

    # utils/prompts/classification.py
    CITY_LOOKUP_PROMPT = """A user mentioned the city "{input_city}" while booking a flight.

    Here are the airport cities we serve that most closely match that input:
    {candidates}

    Your task:
    - If one of the candidates clearly matches what the user intended (same city, common alias, or obvious typo), return that candidate's exact name.
    - If the list is empty or none of the candidates is a plausible match, return null

    Return ONLY valid JSON — no extra text:
    {{"resolved_city": "<exact candidate name or null>"}}"""

The candidates list is generated in Python using substring and fuzzy matching before the prompt runs. The model's job is only to pick from a short, already-relevant list, not to search 50 cities from scratch.
This two-step approach (Python narrows candidates → LLM picks the best match) is faster and more reliable than asking the LLM to search the full city list directly.

The mental model: conversation prompts are the only family where the model produces natural language. But “free text” does not mean “no constraints.” Every conversation prompt is bounded by a shared persona constant.

    Extraction prompts      → strict JSON output
    Classification prompts  → pick from closed options
                │
                ▼
    Conversation prompts    → free text, always shaped by SYSTEM_PERSONA

SYSTEM_PERSONA: The Shared Personality

This constant is imported by every conversation prompt. It defines the bot’s personality once, so every message the bot produces sounds like the same voice:

    # utils/prompts/conversation.py
    SYSTEM_PERSONA = """You are 6ESkai, the official virtual booking assistant of IndiGo Airlines.

    Personality:
    - Friendly, professional, calm, concise
    - Speak exactly like an airline customer support chat
    - Never use emojis
    - Never use slang

    Rules:
    - Ask only one question at a time
    - Never assume information which is not provided by the user
    - Never invent flight details
    - Do not repeat confirmed information"""

If you want to change the bot’s tone (say, make it more formal, or add “always address users by their first name”), you edit this one constant and every conversation prompt inherits the change automatically. No hunting through eight prompt strings.

ROUTING_PROMPT: Classifying Intent for New Sessions

When a user sends their very first message, the bot does not know what they want.
This prompt classifies the message into one of five intents:

    # utils/prompts/conversation.py
    ROUTING_PROMPT = """{system}

    Conversation so far:
    {conversation_history}

    Latest user message:
    "{user_input}"

    Classify the LATEST user message into exactly one of these intents and return ONLY valid JSON - no extra text, no markdown.

    Intent rules:
    - "greeting"      : user says hello, hi, asks what you can do, or sends a generic opener
    - "book_flight"   : user wants to book a ticket OR is continuing an active booking conversation
                        (e.g. replying with a date, city, passenger count, yes/no to a booking question)
    - "web_checkin"   : user wants to check in online, do web check-in, or mentions web check-in
    - "flight_status" : user wants to know flight status, arrival/departure info, or terminal details
    - "out_of_scope"  : message is completely unrelated to IndiGo services AND there is no active booking in progress
                        (weather, jokes, coding, politics, general knowledge, etc.)

    IMPORTANT: If the conversation history shows an active booking in progress (assistant asked for date, city,
    passengers, confirmation, etc.), treat the user's reply as "book_flight" even if it looks like a bare value
    (e.g. "23rd May", "2 adults", "one-way", "yes").

    For "greeting" include a "reply" field with the standard welcome message.
    For "book_flight", "web_checkin", "flight_status", and "out_of_scope" omit the "reply" field.

    Output format (choose one):
    {{"intent": "greeting", "reply": "Hello, I am 6ESkai, your IndiGo virtual booking assistant.\\nHow can I help you today?\\n- Book a flight ticket\\n- Flight Status\\n- Web Check-in"}}
    {{"intent": "book_flight"}}
    {{"intent": "web_checkin"}}
    {{"intent": "flight_status"}}
    {{"intent": "out_of_scope"}}"""

The IMPORTANT note is the key clause. Without conversation history in the prompt, "23rd May" looks like an out-of-scope message because it is not a clear booking request on its own. With history included, the model can see that the bot previously asked "What is your travel date?", so "23rd May" is unambiguously a booking reply.

OUT_OF_SCOPE_PROMPT: Politely Redirecting Off-Topic Messages

    # utils/prompts/conversation.py
    OUT_OF_SCOPE_PROMPT = """{system}

    The user sent a message that is unrelated to IndiGo's services:
    "{user_input}"

    Politely let them know you can only assist with IndiGo-related topics, then remind them of the available options.
    Keep the response to 2–3 lines. Do not offer any help outside IndiGo services.

    Remind them:
    - Book a flight ticket
    - Flight Status
    - Web Check-in"""

Short and deliberate. The {system} injection ensures the persona is consistent even in refusals. The "2–3 lines" constraint prevents a lengthy apology where a brief redirect is all that is needed.

RETRY_MESSAGE_PROMPT: The Only Free-Text Non-Schema Prompt

When slot extraction fails (the user gives a date in the past, a city with no airport, or a passenger count that does not add up), the bot needs to re-ask. It cannot do this robotically (“Invalid input. Please try again.”). It needs empathy. This is the only prompt in the project that asks for pure natural language output with no JSON schema:

    # utils/prompts/conversation.py
    RETRY_MESSAGE_PROMPT = """{system}

    The user is booking a flight. The assistant previously asked for: {slot_label}
    The user's response could not be understood. Error: {error}
    User said: "{user_input}"

    Write a short, natural, empathetic follow-up (1-3 sentences) that:
    - Acknowledges the issue without being repetitive or robotic
    - Re-asks for the same information clearly
    - Stays in the tone of an airline customer support chat
    - Does NOT use emojis

    End with the original question phrased naturally. Do not include options like "Option - Yes"."""

The {slot_label} and {error} values come from the slot validator, not from the model. This separation is important: the validator decides what went wrong and why; the model decides how to say it kindly. Mixing these responsibilities into one prompt would make either the error detection or the empathy unreliable.
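To make the validator/model split concrete, here is a minimal sketch of how a retry prompt could be assembled from values the validator has already written to state. The template is an abbreviated stand-in for RETRY_MESSAGE_PROMPT, and the `SLOT_LABELS` mapping, the `build_retry_prompt` helper, and the `user_input` state key are assumptions made for illustration, not names confirmed by the project.

```python
# Abbreviated stand-ins for the persona and retry prompt shown above.
SYSTEM_PERSONA = "You are 6ESkai, the official virtual booking assistant of IndiGo Airlines."

RETRY_TEMPLATE = (
    "{system}\n"
    "The assistant previously asked for: {slot_label}\n"
    "The user's response could not be understood. Error: {error}\n"
    'User said: "{user_input}"\n'
    "Write a short, empathetic follow-up that re-asks for the same information."
)

# Hypothetical mapping from slot keys to human-readable labels.
SLOT_LABELS = {"travel_date": "travel date", "departure_city": "departure city"}


def build_retry_prompt(state: dict, slot: str) -> str:
    """Fill the template using only facts the validator already decided.

    The model never sees the validation rules; it only receives the
    finished error text and is asked to phrase the re-ask.
    """
    return RETRY_TEMPLATE.format(
        system=SYSTEM_PERSONA,
        slot_label=SLOT_LABELS.get(slot, slot.replace("_", " ")),
        error=state.get("slot_error", ""),
        user_input=state.get("user_input", ""),  # assumed state key
    )


state = {
    "slot_error": "Travel date 2020-01-01 is in the past.",
    "user_input": "1 Jan 2020",
}
prompt = build_retry_prompt(state, "travel_date")
```

Because the error text is produced deterministically, the same bad input always yields the same diagnosis; only the wording of the re-ask varies with the model.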
Here is what happens in a naive multi-turn agent without careful graph design. Every time the user sends a message, whether it is their first or their twentieth, the system starts at the beginning and runs an LLM call to classify intent. That LLM call costs tokens, takes time, and can go wrong. More importantly, mid-session the intent is already known. The user is mid-booking. They just said “From Jaipur.” Running an intent classifier at this point is not just wasteful; it is a point of failure.

There is a second problem: with ten nodes in a booking flow, how do you resume exactly at the right node mid-conversation? If the user is at “collect passenger names” and sends a new message, you need to land at the collect-names step directly, not replay the entire booking flow from scratch.

These two problems drive the entire graph architecture. The central idea is: the step field in state is the single source of truth for where you are. Before any LLM runs, the graph reads step and routes accordingly. Because routing is a lookup on a state field rather than a model call, it is deterministic and costs no tokens.

The graph is organised into two levels:

    Level 1: Top-Level Graph (3 nodes)
      Reads state["step"] and routes to the right subgraph.
      Fires the LLM-based router ONLY for brand-new sessions.

    Level 2: Subgraphs (booking + pnr)
      Each is a complete, self-contained state machine.
      Has its own internal dispatch table.
      All routing functions inside are pure Python.

Let us trace how a user message flows through this structure. Every user message enters the graph at dispatch_route().
This function reads the current step and decides where to go, before any LLM runs:

    # graph/__init__.py
    BOOKING_STEPS = {
        Step.COLLECT_SLOTS, Step.EXTRACTED, Step.CITY_VALIDATED,
        Step.CONFIRM_BOOKING, Step.SEARCH_FLIGHTS, Step.SEARCH_RETURN_FLIGHTS,
        Step.SHOW_FLIGHTS, Step.FLIGHT_CONFIRM, Step.WHATSAPP_CONSENT,
        Step.COLLECT_NAMES, Step.COLLECT_EMAIL, Step.PAYMENT, Step.DONE,
    }
    PNR_STEPS = {Step.COLLECT_PNR}

    def dispatch_route(state: BookingState) -> str:
        step = state.get("step", Step.GREETING)
        if step in BOOKING_STEPS:
            return "booking"
        if step in PNR_STEPS:
            return "pnr"
        return "router"

Three outcomes: a step in BOOKING_STEPS goes straight to the booking subgraph, a step in PNR_STEPS goes straight to the PNR subgraph, and anything else goes to the LLM router. The router LLM therefore fires only while no flow is active, which in practice means the opening of a session. Once step sits inside a subgraph, every turn bypasses the router entirely.

After the router runs, route_after_router reads the classified intent and sends the session to the right subgraph:

    # graph/__init__.py
    def route_after_router(state: BookingState) -> str:
        intent = state.get("intent", "")
        if intent == Intent.BOOK_FLIGHT:
            return "booking"
        if intent in (Intent.WEB_CHECKIN, Intent.FLIGHT_STATUS):
            return "pnr"
        return END

Here is the complete top-level graph:

    # graph/__init__.py
    def create_graph():
        g = StateGraph(BookingState)
        g.add_node("router", route)
        g.add_node("booking", booking_subgraph)
        g.add_node("pnr", pnr_subgraph)

        g.set_conditional_entry_point(dispatch_route, {
            "router": "router",
            "booking": "booking",
            "pnr": "pnr",
        })
        g.add_conditional_edges("router", route_after_router, {
            "booking": "booking",
            "pnr": "pnr",
            END: END,
        })
        g.add_edge("booking", END)
        g.add_edge("pnr", END)
        return g.compile()

    booking_graph = create_graph()

Three nodes. Four edges.
The top-level graph contains zero business logic. It is a traffic cop:

    Every user message
          │
          ▼
    dispatch_route              ← pure Python, reads state["step"], no LLM
          │
          ├─ step is GREET               → "router"   (LLM runs, classifies intent)
          ├─ step in BOOKING_STEPS       → "booking"  (mid-booking, skip router entirely)
          └─ step in PNR_STEPS           → "pnr"      (mid-PNR lookup, skip router entirely)

    "router"
          │
          ▼
    route_after_router          ← pure Python, reads state["intent"]
          │
          ├─ book_flight    → "booking"
          ├─ web_checkin    → "pnr"
          ├─ flight_status  → "pnr"
          └─ greeting/out   → END

The booking subgraph handles the complete flight booking flow across ten nodes: slot collection, validation, city resolution, flight search, flight selection, passenger details, and payment. Every turn into this subgraph passes through one entry point: booking_guardrail.

Why a guardrail at the entry point? Without it, mid-flow changes break things. A user who has just seen available flights might say “actually, change destination to Goa.” Without the guardrail, that message arrives at select_flight, which tries to parse it as a flight selection and fails. The guardrail intercepts, detects the modification intent using a quick LLM call, resets the relevant state fields, and sends the user back to slot collection cleanly, without a full restart.

After the guardrail, a lookup table dispatches to the correct node based on step:

    # graph/booking_subgraph.py
    STEP_TO_NODE = {
        Step.SHOW_FLIGHTS: "select",
        Step.CONFIRM_BOOKING: "confirm",
        Step.PAYMENT: "payment",
        Step.DONE: "done",
        Step.FLIGHT_CONFIRM: "info_extractor",
        Step.WHATSAPP_CONSENT: "info_extractor",
        Step.COLLECT_NAMES: "info_extractor",
        Step.COLLECT_EMAIL: "info_extractor",
    }

    def dispatch(state: BookingState) -> str:
        step = state.get("step", "")
        if step == Step.PAYMENT_MODIFY_CONFIRM:
            return END
        return STEP_TO_NODE.get(step, "info_extractor")

This is better than a long if/elif chain for two reasons.
First, it is a data structure: adding a new step means adding one line to the dict, not editing a chain of conditions. Second, it has a clean default: any slot-collection step that is not listed falls through to "info_extractor", which is the correct starting node for new slot turns.

The routing functions between nodes are all pure Python: no LLM, no I/O, no side effects.

    # graph/booking_subgraph.py
    def after_info_extractor(state: BookingState) -> str:
        if state.get("step") == Step.EXTRACTED:
            return "validate_slots"
        return "conversation_driver"

    def after_validate_slots(state: BookingState) -> str:
        if state.get("slot_error"):
            return "conversation_driver"
        if state.get("cities_updated"):
            return "city_lookup"
        return "conversation_driver"

    def after_conversation_driver(state: BookingState) -> str:
        if state.get("terminated"):
            return END
        if state.get("step") == Step.PAYMENT:
            return "payment"
        if state.get("step") in (Step.SEARCH_FLIGHTS, Step.SEARCH_RETURN_FLIGHTS):
            return "search"
        return END

    def after_confirmation(state: BookingState) -> str:
        step = state.get("step", "")
        if step in (Step.SEARCH_FLIGHTS, Step.SEARCH_RETURN_FLIGHTS):
            return "search"
        if step == Step.COLLECT_SLOTS:
            return "info_extractor"
        return END

Each function answers exactly one question: given what just happened (what is now in state), where do we go? No LLM, no database queries, nothing beyond a few checks on state. If the bot ends up in the wrong node, you read the routing function and trace the state backward. Debugging is just reading Python.
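One practical consequence of pure routing functions is that they can be checked with plain dictionaries, without compiling a graph or calling a model. The sketch below copies after_conversation_driver from above and runs it against a small table of states. The string values on `Step` and the local `END` constant are stand-ins chosen so the example runs without LangGraph or the project's own state module; they are assumptions, not the project's definitions.

```python
END = "__end__"  # local stand-in for langgraph's END sentinel


class Step:
    # Hypothetical string values; the project's real Step definition is not shown.
    PAYMENT = "PAYMENT"
    SEARCH_FLIGHTS = "SEARCH_FLIGHTS"
    SEARCH_RETURN_FLIGHTS = "SEARCH_RETURN_FLIGHTS"
    COLLECT_SLOTS = "COLLECT_SLOTS"


def after_conversation_driver(state: dict) -> str:
    """Same decision order as the routing function shown above."""
    if state.get("terminated"):
        return END
    if state.get("step") == Step.PAYMENT:
        return "payment"
    if state.get("step") in (Step.SEARCH_FLIGHTS, Step.SEARCH_RETURN_FLIGHTS):
        return "search"
    return END


# Each case pairs an input state with the node we expect to run next.
cases = [
    ({"terminated": True, "step": Step.PAYMENT}, END),   # termination wins over step
    ({"step": Step.PAYMENT}, "payment"),
    ({"step": Step.SEARCH_RETURN_FLIGHTS}, "search"),
    ({"step": Step.COLLECT_SLOTS}, END),                  # wait for the next user turn
]

results = [after_conversation_driver(state) for state, _ in cases]
expected = [want for _, want in cases]
```

The first case is the one worth pinning down: the terminated check runs before the step check, so a session that has hit its retry limit never proceeds to payment even if step says otherwise.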
Here is the complete booking subgraph:

    # graph/booking_subgraph.py
    def create_booking_graph():
        g = StateGraph(BookingState)
        g.add_node("booking_guardrail", booking_guardrail)
        g.add_node("info_extractor", extract_information)
        g.add_node("validate_slots", validate_slots)
        g.add_node("city_lookup", lookup_cities)
        g.add_node("conversation_driver", drive_conversation)
        g.add_node("confirm", confirm_intent)
        g.add_node("search", flight_search_agent)
        g.add_node("select", select_flight)
        g.add_node("payment", build_payment_summary)
        g.add_node("done", done)

        g.set_entry_point("booking_guardrail")
        g.add_conditional_edges("booking_guardrail", dispatch, {
            "info_extractor": "info_extractor",
            "select": "select",
            "confirm": "confirm",
            "payment": "payment",
            "done": "done",
            END: END,
        })
        g.add_conditional_edges("info_extractor", after_info_extractor, {
            "validate_slots": "validate_slots",
            "conversation_driver": "conversation_driver",
        })
        g.add_conditional_edges("validate_slots", after_validate_slots, {
            "city_lookup": "city_lookup",
            "conversation_driver": "conversation_driver",
        })
        g.add_edge("city_lookup", "conversation_driver")
        g.add_conditional_edges("conversation_driver", after_conversation_driver, {
            "payment": "payment",
            "search": "search",
            END: END,
        })
        g.add_conditional_edges("confirm", after_confirmation, {
            "search": "search",
            "info_extractor": "info_extractor",
            END: END,
        })
        g.add_edge("search", END)
        g.add_edge("select", END)
        g.add_edge("payment", END)
        g.add_edge("done", END)
        return g.compile()

    booking_subgraph = create_booking_graph()

The booking flow visualised end-to-end:

    Incoming turn
          │
          ▼
    booking_guardrail           ← checks step, intercepts mid-flow changes
          │
       dispatch                 ← reads STEP_TO_NODE lookup table
          │
          ├─ COLLECT_SLOTS / new   → info_extractor
          ├─ SHOW_FLIGHTS          → select → END
          ├─ CONFIRM_BOOKING       → confirm
          ├─ PAYMENT               → payment → END
          ├─ DONE                  → done → END
          └─ FLIGHT_CONFIRM / WHATSAPP_CONSENT / COLLECT_NAMES / COLLECT_EMAIL → info_extractor

    info_extractor
          │
    after_info_extractor
          ├─ step == EXTRACTED     → validate_slots
          └─ everything else       → conversation_driver

    validate_slots
          │
    after_validate_slots
          ├─ error?                → conversation_driver
          ├─ cities changed?       → city_lookup → conversation_driver
          └─ otherwise             → conversation_driver

    conversation_driver
          │
    after_conversation_driver
          ├─ terminated?           → END
          ├─ PAYMENT?              → payment → END
          ├─ SEARCH?               → search → END
          └─ otherwise             → END

The PNR subgraph handles web check-in and flight status queries. It is simpler (three nodes, two routing decisions) but follows the exact same philosophy: state-driven dispatch, pure Python routing.

    # graph/pnr_subgraph.py
    PNR_PROCESSES = {Process.WEB_CHECKIN, Process.FLIGHT_STATUS}

    def dispatch(state: BookingState) -> str:
        if state.get("step") == Step.COLLECT_PNR:
            return "info_extractor"
        return "conversation_driver"

    def after_info_extractor(state: BookingState) -> str:
        if state.get("process") in PNR_PROCESSES and state.get("pnr"):
            return "pnr_lookup"
        return "conversation_driver"

    def create_pnr_graph():
        g = StateGraph(BookingState)
        g.add_node("conversation_driver", drive_conversation)
        g.add_node("info_extractor", extract_information)
        g.add_node("pnr_lookup", pnr_lookup_agent)

        g.set_conditional_entry_point(dispatch, {
            "conversation_driver": "conversation_driver",
            "info_extractor": "info_extractor",
        })
        g.add_edge("conversation_driver", END)
        g.add_conditional_edges("info_extractor", after_info_extractor, {
            "pnr_lookup": "pnr_lookup",
            "conversation_driver": "conversation_driver",
        })
        g.add_edge("pnr_lookup", END)
        return g.compile()

    pnr_subgraph = create_pnr_graph()

    Incoming turn
          │
          ▼
       dispatch
          ├─ step is not COLLECT_PNR → conversation_driver (asks for PNR) → END
          └─ step is COLLECT_PNR     → info_extractor
                                            │
                                   after_info_extractor
                                            ├─ PNR extracted → pnr_lookup → END
                                            └─ otherwise     → conversation_driver (re-ask if PNR not found) → END

Put it all together and every user turn follows this path:

    User sends a message (Streamlit or Telegram, same code path)
          │
          ▼
    booking_graph.invoke(state)     ← single compiled entry point
          │
          ▼
    dispatch_route(state)           ← pure Python, reads state["step"]
          │
          ├─ GREET                  → router (LLM) → route_after_router
          │                              ├─ book_flight   → booking subgraph
          │                              ├─ web_checkin   → pnr subgraph
          │                              └─ flight_status → pnr subgraph
          ├─ step in BOOKING_STEPS  → booking subgraph
          │                              └─ booking_guardrail → dispatch → 10 nodes
          └─ step in PNR_STEPS      → pnr subgraph
                                         └─ dispatch → 3 nodes

The LLM for intent classification fires only while no flow is active, typically once per session on the opening message. Once a flow has started, top-level routing is a dictionary lookup or a boolean check on state fields. Add a new booking step? Add one line to BOOKING_STEPS and one entry to STEP_TO_NODE. The rest of the routing stays untouched.

The easiest mistake in agentic AI is making one agent do too many things. A node that extracts slots, validates them, and decides what question to ask next is hard to test, hard to debug, and impossible to trust. When it fails, you don’t know which of the three jobs failed.

The design principle here is simple: every node has exactly one job. Some nodes call the LLM. Most do not. The LLM is only involved when the task genuinely requires language understanding. Everything else (validation, routing, city code lookup, price calculation) runs in plain Python.

    Nodes in this project:

    LLM nodes (language understanding required)
    ├── router               → classifies intent
    ├── info_extractor       → extracts slots / names / PNR from user message
    ├── city_lookup          → resolves messy city names to canonical names
    ├── conversation_driver  → generates retry messages and slot questions
    └── flight_selection     → maps natural language to a flight index

    Pure Python nodes (no LLM)
    ├── booking_guardrail    → reads step, dispatches or intercepts
    ├── slot_validator       → checks dates, passenger counts
    ├── confirm              → routes affirm/deny/modify
    ├── flight_search        → runs SQL, formats results
    ├── payment              → assembles booking summary
    └── done                 → saves booking, generates PNR

Here is a subtle failure mode. The user is mid-booking.
Destination is already “Mumbai.” They say “From Jaipur.” The LLM extraction prompt sees one city in isolation and might return it as destination_city. Now you have destination_city = "Jaipur", overwriting the user's earlier answer.

The fix is pure Python, no extra LLM call:

    # nodes/information_extractor.py
    has_departure = bool(extracted.get("departure_city"))
    has_destination = bool(extracted.get("destination_city"))

    # Exactly one of the two city fields was extracted this turn.
    if (has_departure or has_destination) and not (has_departure and has_destination):
        extracted_city = extracted.get("departure_city") or extracted.get("destination_city")
        if (extracted_city and state.get("destination_city")
                and extracted_city != state.get("destination_city")):
            # Destination is already known, so the new city must be the departure.
            extracted["departure_city"] = extracted_city
            extracted["destination_city"] = None
        elif (extracted_city and state.get("departure_city")
                and extracted_city != state.get("departure_city")):
            # Departure is already known, so the new city must be the destination.
            extracted["destination_city"] = extracted_city
            extracted["departure_city"] = None

The logic: if exactly one city was extracted, and one city is already confirmed in state, the new city must be the other field. No additional LLM reasoning needed, just state and a conditional.

Validation is a rule-based job, not a language job. The slot validator runs checks in Python and writes errors back to state. No LLM call:

    # nodes/slot_validator.py
    from datetime import date

    def validate_slots(state: dict) -> dict:
        errors = []

        travel_date_str = state.get("travel_date")
        if travel_date_str:
            try:
                td = date.fromisoformat(travel_date_str)
                if td < date.today():
                    errors.append(f"Travel date {travel_date_str} is in the past. Please provide a future date.")
                    state["travel_date"] = None
            except ValueError:
                errors.append(f"'{travel_date_str}' is not a valid date. Please provide a date like 25 June.")
                state["travel_date"] = None

        return_date_str = state.get("return_date")
        if state.get("trip_type") == "round-trip" and return_date_str:
            try:
                rd = date.fromisoformat(return_date_str)
                travel_date_ok = state.get("travel_date")
                if travel_date_ok and rd < date.fromisoformat(travel_date_ok):
                    errors.append(f"Return date {return_date_str} cannot be before travel date {travel_date_ok}.")
                    state["return_date"] = None
                elif rd < date.today():
                    errors.append(f"Return date {return_date_str} is in the past.")
                    state["return_date"] = None
            except ValueError:
                errors.append(f"'{return_date_str}' is not a valid return date.")
                state["return_date"] = None

        adults = state.get("adults")
        if adults is not None and adults < 1:
            errors.append("At least 1 adult passenger is required.")
            state["adults"] = None

        children = state.get("children")
        if children is not None and children < 0:
            errors.append("Number of children cannot be negative.")
            state["children"] = None

        state["slot_error"] = "\n".join(errors) if errors else ""
        return state

When slot_error is set, the routing function after this node sends execution to conversation_driver, which calls RETRY_MESSAGE_PROMPT to generate a natural re-ask. The validator produces the error message; the LLM produces the empathy.

City names from users are messy: “Bombay”, “New Delhi”, “Bengaluru”, “Banglore” (typo). A pure string match fails on these variants. But asking the LLM to search 50 cities from scratch is overkill and error-prone.
The solution is a two-step approach:

    User types: "Bombay"
          │
          ▼
    get_candidate_cities("Bombay")        ← Python substring + fuzzy match
          → ["Mumbai"]                    ← short candidate list
          │
          ▼
    CITY_LOOKUP_PROMPT with candidates
          → {"resolved_city": "Mumbai"}   ← LLM picks from short list
          │
          ▼
    city_to_code("Mumbai") → "BOM"        ← dict lookup, no LLM

    # nodes/city_lookup.py
    def lookup_cities(state: dict) -> dict:
        for field in state.get("cities_changed", []):
            raw = state.get(field)
            if not raw:
                continue

            candidates = get_candidate_cities(raw)  # Python fuzzy match
            if not candidates:
                state["city_error"] = f"Sorry, '{raw}' does not appear to have a serviced airport."
                state[field] = None
                continue

            prompt = CITY_LOOKUP_PROMPT.format(input_city=raw, candidates=", ".join(candidates))
            result = call_llm_json(prompt)
            resolved = result.get("resolved_city")

            if resolved:
                state[field] = resolved.title()
                code_field = ("departure_airport_code" if field == "departure_city"
                              else "destination_airport_code")
                state[code_field] = city_to_code(resolved)
            else:
                label = "departure" if field == "departure_city" else "destination"
                state["city_error"] = (
                    f"Sorry, '{raw}' does not appear to have a serviced airport. "
                    f"Please provide a valid {label} city."
                )
                state[field] = None

        state["step"] = "CITY_VALIDATED"
        return state

If resolution fails, city_error is set. The conversation driver, which runs immediately after city_lookup, reads it, calls RETRY_MESSAGE_PROMPT, and asks for the city again, incrementing slot_attempts["departure_city"]. After three failures, terminated = True.

At any point in the booking flow, the bot needs to decide: what do I ask next? This sounds like a job for an LLM. It is not.

If you let the LLM decide what question to ask, you get inconsistent ordering. Sometimes it asks for the date before the destination. Sometimes it skips passengers. Sometimes it asks two things at once. Users get confused. Your slot completion rate drops.

The design principle is: question ordering is data, not model output.
A fixed list of required fields, checked in a fixed order, produces a predictable question sequence every time. The LLM is only called to generate the wording of the question, not to decide which question to ask.

The get_missing_flight_slots function returns the list of unfilled slots in a deterministic order:

    # nodes/conversation_driver.py
    def get_missing_flight_slots(state: dict) -> list:
        missing = []
        for s in ("destination_city", "departure_city", "travel_date", "trip_type"):
            if not state.get(s):
                missing.append(s)
        if state.get("trip_type") == "round-trip" and not state.get("return_date"):
            missing.append("return_date")
        if not state.get("adults") or state.get("children") is None:
            missing.append("passengers")
        return missing

The conversation driver takes the first item from this list and asks for it. If the user provides three slots in one message, the extractor fills them all, and the next call to get_missing_flight_slots returns only what is still missing. The driver never asks for something it already has.

After the user selects a flight, the flow enters Phase 2. The confirmation_step field drives a fixed four-stage sequence, with no LLM needed to decide the order:

    flight_confirm → whatsapp_consent → collect_names → collect_email

Each stage calls PASSENGER_EXTRACTION_PROMPT with the current step, extracts the relevant piece of information, stores it in state, and advances confirmation_step to the next stage.

A real usability problem: a user booking for 3 passengers might send names in separate messages (“Mr John Smith” then “Mrs Jane Smith” then “Miss Amy Smith”). The naive approach, rejecting anything that doesn’t give all names at once, frustrates users. The extractor accumulates names across turns:

    # nodes/information_extractor.py
    existing = state.get("passengers") or []
    combined = existing + [p for p in val if p not in existing]

    if expected > 0 and len(combined) < expected:
        state["passenger_error"] = f"Need {expected - len(combined)} more name(s)."
        state["passengers"] = combined  # save partial progress
    elif expected > 0 and len(combined) > expected:
        state["passenger_error"] = (
            f"I counted {len(combined)} name(s) but expected {expected}. "
            f"Please clarify which {expected} passengers are travelling."
        )
        state["passengers"] = []  # reset on ambiguity - too many is worse than too few
    else:
        state["passengers"] = combined
        state["passenger_error"] = ""  # clear error on success

The conversation driver reads the partial list and tells the user what it already has before asking for the rest:

    # nodes/conversation_driver.py
    if passengers:
        collected = ", ".join(
            f"{p['title']} {p['first_name']} {p['last_name']}" for p in passengers
        )
        remaining = expected - len(passengers)
        state["assistant_message"] = (
            f"Got it. I have recorded: {collected}.\n\n"
            f"Please provide the name(s) for the remaining {remaining} passenger(s).\n"
            "eg: Mr./Mrs./Miss First Name Last Name"
        )

The name_attempts counter only increments when the total count is still wrong after extraction. Sending names incrementally is not a failure; it is normal input. Three failures means three genuinely bad attempts, not three partial messages.

Every field that can fail has its own retry counter in slot_attempts:

    # ValidationState in state.py
    slot_attempts: Dict[str, int]    # {"travel_date": 2, "departure_city": 0}
    name_attempts: int
    flight_select_attempts: int
    terminated: bool

This is more surgical than a single session-level counter. A user might get the departure city right on the first try but keep giving dates in the past. With per-field tracking, failures on different fields do not add up: only three failures on the same field end the session. With a global counter, a few mistakes spread across different fields would kill a session that otherwise had good data.

The PNR lookup serves two different flows (web check-in and flight status) from the same database query.
The process field in state determines how the result is formatted:

    # services/pnr_lookup.py
    def pnr_lookup_agent(state: dict) -> dict:
        pnr = state.get("pnr", "").strip().upper()
        process = state.get("process", "")

        info = fetch_pnr_info(pnr)
        if not info:
            state["assistant_message"] = (
                f"We could not find any booking with PNR {pnr}. "
                "Please check the PNR and try again."
            )
            state["step"] = "COLLECT_PNR"
            state["pnr"] = ""
            return state

        if process == "flight_status":
            state["assistant_message"] = format_flight_status(info)
        else:
            state["assistant_message"] = format_web_checkin(info)
        state["step"] = "PNR_DONE"
        return state

One fetch, two formatters, zero LLM calls. The database join handles the complexity; Python handles the formatting. The LLM is not involved in lookup or display, only in the PNR extraction step upstream.

The payment node assembles the booking summary from state, with no database writes yet and no LLM calls:

    # nodes/payment.py
    def build_payment_summary(state: dict) -> dict:
        adults = state.get("adults") or 1
        outbound = state.get("selected_outbound_flight") or {}
        return_flight = state.get("selected_return_flight") or {}
        is_round_trip = bool(outbound and return_flight)

        if is_round_trip:
            total = (outbound.get("price", 0) + return_flight.get("price", 0)) * adults
            ...  # format round-trip summary
        else:
            flight = state.get("selected_flight", {})
            total = flight.get("price", 0) * adults
            ...  # format one-way summary

        state["assistant_message"] = response  # response is built by the elided formatting code
        state["step"] = "DONE"
        return state

The done node then writes the booking to the database and generates the PNR:

    # nodes/done.py
    def done(state: dict) -> dict:
        result = save_booking(state)
        pnr_code = result.get("pnr_code", "")
        transaction_id = result.get("transaction_id", "")

        if pnr_code:
            state["assistant_message"] = (
                "Booking Confirmed\n"
                "-----------------------------------\n"
                f"Your PNR: {pnr_code}\n"
                f"Transaction ID: {transaction_id}\n"
                "-----------------------------------\n\n"
                "A confirmation has been sent to your email and WhatsApp.\n"
                "Use your PNR for web check-in or to check flight status."
            )
        state["step"] = "SHOW_MENU"
        return state

Separating the payment summary (payment node) from booking persistence (done node) means you can show the user a full price breakdown before writing anything to the database, which is how real booking systems work.

You build an agent. It works in your tests. Then a real user says “the bot got stuck after I said yes.” You have no idea what happened.

The failure mode is familiar: agents fail quietly. An LLM returns a slightly different JSON shape, the parser silently returns None, the routing function sends execution to the wrong node, and the user sees a confusing message. Without a trace, you are guessing.

The design principle: instrument at the wrapper layer, not at the node layer. Every LLM call already passes through call_llm_json, and every non-LLM node makes a single log_node call. No other instrumentation is needed to get a complete per-turn trace.

The hardest part of distributed tracing is knowing who called what. Here, the LLM wrapper walks the Python call stack to find the calling node’s filename, so no node name parameter is needed anywhere:

    # utils/llm.py
    import inspect
    import os

    def caller_node() -> str:
        # Skip this function and call_llm_json, then take the first frame outside llm.py.
        for frame_info in inspect.stack()[2:]:
            fname = os.path.basename(frame_info.filename)
            if fname != "llm.py":
                node = fname.replace(".py", "")
                return f"{node}.{frame_info.function}"
        return "unknown"

When information_extractor.py calls call_llm_json, the stack walk returns "information_extractor._extract_flight_slots".
The log entry is automatically attributed to the correct node and function, even if the same prompt is called from multiple places.

Every `call_llm_json` call appends a structured entry to `run_logs`:

```python
# utils/llm.py
import json
import time


def call_llm_json(prompt: str) -> dict:
    node = caller_node()
    t0 = time.time()
    response = get_client().chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
        temperature=0,
        timeout=30,
    )
    latency_ms = round((time.time() - t0) * 1000)
    raw = response.choices[0].message.content.strip()
    usage = response.usage

    # One structured entry per LLM call: who called, what went in, what came out.
    run_logs.append({
        "node": node,
        "model": "gpt-4o-mini",
        "prompt": prompt,
        "output": raw,
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
        "latency_ms": latency_ms,
        "call_type": "json",
    })
    return json.loads(raw)
```

Pure Python nodes use `log_node` for the same structured format:

```python
# nodes/slot_validator.py
log_node(
    "slot_validator",
    {
        "fields_checked": {"travel_date": travel_date_str, "adults": adults},
        "errors": errors,
        "outcome": "invalid" if errors else "valid",
    },
    latency_ms=round((time.time() - t0) * 1000),
)
```

At the end of each turn, `llm_module.get_logs()` returns the full ordered list of everything that ran, LLM calls and Python nodes alike.

The Streamlit UI renders every assistant message with a second “Logs” tab.
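The article does not show how `log_node`, `reset_logs`, and `get_logs` are implemented, so the following is only a sketch of one plausible shape: a module-level list that is cleared at the start of a turn, appended to during it, and copied out at the end. The `"call_type": "node"` label and the `details` key are assumptions made for this example.

```python
import time
from typing import Any

# Module-level buffer holding every entry recorded during the current turn.
_run_logs: list[dict[str, Any]] = []


def reset_logs() -> None:
    """Clear the buffer; intended to be called once at the start of each turn."""
    _run_logs.clear()


def log_node(node: str, details: dict[str, Any], latency_ms: int = 0) -> None:
    """Record a pure-Python node alongside the entries an LLM wrapper would add."""
    _run_logs.append({
        "node": node,
        "call_type": "node",  # assumed label; the article only shows "json" for LLM calls
        "details": details,   # assumed key name for the node-specific payload
        "latency_ms": latency_ms,
    })


def get_logs() -> list[dict[str, Any]]:
    """Return a copy so a stored trace is not emptied by the next reset_logs()."""
    return list(_run_logs)


# One simulated turn: reset, run a validator node, read the trace.
reset_logs()
t0 = time.time()
errors: list[str] = []
log_node(
    "slot_validator",
    {"fields_checked": {"adults": 2}, "errors": errors,
     "outcome": "invalid" if errors else "valid"},
    latency_ms=round((time.time() - t0) * 1000),
)
trace = get_logs()
print(len(trace), trace[0]["node"], trace[0]["details"]["outcome"])  # 1 slot_validator valid
```

Returning a copy from `get_logs` matters once traces are attached to chat messages: clearing the buffer for the next turn then leaves earlier messages' logs intact.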
When the assistant’s reply for a turn is stored in `st.session_state.chat`, it carries the log list alongside the reply text:

```python
# app.py
llm_module.reset_logs()
result = booking_graph.invoke(st.session_state.booking_state)
run_logs = llm_module.get_logs()

st.session_state.chat.append({
    "role": "assistant",
    "content": reply,
    "logs": run_logs,                           # full trace attached to this message
    "state_snapshot": state_snapshot(result),   # state at end of turn
    "total_latency_ms": total_latency_ms,
})
```

When the chat history is rendered, messages with logs get two tabs instead of one:

```python
# app.py
tab_answer, tab_logs = st.tabs(["Assistant", "Logs"])
with tab_answer:
    st.write(msg["content"])
with tab_logs:
    render_logs(msg["logs"], msg.get("state_snapshot"), msg.get("total_latency_ms", 0))
```

The Logs tab shows turn latency, LLM call count, node call count, and expandable entries for every individual call: prompt in, output out, token counts, latency. When a user says “the bot got confused,” you open the Logs tab for that message and the answer is right there.

Most chatbot tutorials end with a Streamlit app. Then you want to add Telegram. Now you have two codebases to maintain, or you copy-paste the business logic and everything drifts.

The design principle: channels are thin adapters. The graph knows nothing about channels. All business logic (routing, extraction, validation, search, payment) lives in the LangGraph state machine. The channels only do three things: get the user’s message, call `booking_graph.invoke`, and display the result.

```
Streamlit (app.py)              Telegram (telegram_bot.py)
       │                                   │
       └──────────────┬────────────────────┘
                      ▼
         booking_graph.invoke(state)
                      │
                      ▼
         result["assistant_message"]
                      │
          ┌───────────┴───────────┐
          ▼                       ▼
   st.write(reply)      await message.reply_text(reply)
```

Adding a WhatsApp channel means writing one new adapter file. The booking logic, prompts, validation, and database queries stay exactly as they are.

The two channels store state differently, but the state shape is identical.
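The thin-adapter idea can be sketched without Streamlit or Telegram at all. In the example below, `EchoGraph` is a hypothetical stand-in for the compiled booking graph, `handle_turn` is an assumed helper name, and the `user_message` state key is an assumption; only `invoke` and `assistant_message` come from the article. Each "channel" is reduced to a `send` callable, which is the only part that would differ between adapters.

```python
from typing import Any, Callable, Protocol

State = dict[str, Any]


class Graph(Protocol):
    """Anything with an invoke(state) -> state method, like the booking graph."""

    def invoke(self, state: State) -> State: ...


class EchoGraph:
    """Hypothetical stand-in for the compiled booking graph."""

    def invoke(self, state: State) -> State:
        new_state = dict(state)
        new_state["assistant_message"] = f"You said: {state['user_message']}"
        return new_state


def handle_turn(graph: Graph, state: State, user_message: str,
                send: Callable[[str], None]) -> State:
    """Shared adapter core: take the message, invoke the graph, display the reply."""
    result = graph.invoke({**state, "user_message": user_message})
    send(result["assistant_message"])
    return result


# Two pretend channels that differ only in how a reply is delivered.
web_outbox: list[str] = []
chat_outbox: list[str] = []
graph = EchoGraph()

web_state = handle_turn(graph, {"channel": "web"}, "Book a flight", web_outbox.append)
chat_state = handle_turn(graph, {"channel": "chat"}, "Book a flight", chat_outbox.append)
print(web_outbox == chat_outbox)  # True
```

In a real adapter, `send` would wrap `st.write` or `message.reply_text`; everything upstream of it stays shared.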
Streamlit stores state in `st.session_state`, keyed by a UUID session ID and backed by persistence to `sessions.db`. On page refresh, the session is restored from the database:

```python
# app.py
def init_session():
    if "booking_state" not in st.session_state:
        existing_sid = st.query_params.get("sid")
        restored = load_state(existing_sid) if existing_sid else None
        if restored and not is_session_expired(restored):
            st.session_state.booking_state = restored
        else:
            # No usable session: start fresh with a new session ID.
            state = copy.deepcopy(INITIAL_STATE)
            state["session_id"] = str(uuid.uuid4())
            state["last_active_at"] = now_iso()
            st.session_state.booking_state = state
    st.query_params["sid"] = st.session_state.booking_state["session_id"]
```

The session ID is stored in the URL as `?sid=...`. After a page refresh, the URL carries the session ID, the app restores state from `sessions.db`, and the user continues exactly where they left off.

Telegram stores state in an in-memory dict keyed by `chat_id`. There is no persistence across bot restarts, which is appropriate for a demo.

Both adapters enforce a 30-minute inactivity timeout. If `last_active_at` is more than 30 minutes old, the state resets and the user starts over:

```python
# app.py
def is_session_expired(state: dict) -> bool:
    last = state.get("last_active_at")
    if not last:
        return False
    try:
        last_dt = datetime.fromisoformat(last)
        delta = datetime.now(timezone.utc) - last_dt
        return delta.total_seconds() > SESSION_TTL_SECONDS
    except Exception:
        # Unparseable or timezone-naive timestamps end up here
        # and are treated as not expired.
        return False
```

When a session expires, the adapter resets to `INITIAL_STATE` but preserves `user_id` and `channel`, so the user gets a fresh booking context without losing their identity. The expired session is deleted from `sessions.db`.

Run the app with `streamlit run app.py` and open http://localhost:8501.

Query 1 (typical booking): “I want to travel from Jaipur to Mumbai”
The assistant extracts the slots from the first message, then asks for travel dates.
Query 2 (edge case: round trip): “Book a round trip from Jaipur to Hyderabad, 15th July to 20th July, 2 adults 1 child”
The two-phase flow handles outbound selection, stores it, then immediately searches and presents return flights.

Query 3 (failure case: unknown city): “Book a flight to Springfield”
The city lookup finds no candidates, sets a city error, re-asks for the city, and tracks the attempt. After three failures it terminates with a customer care number.

Query 4 (check flight status): “I want to know the status of my flight”
The assistant asks for the PNR number and, once it is provided, shares the status of the flight.

Query 5 (web check-in): “I want to do Web-checkin”
The assistant asks for the PNR number and, once it is provided, shares the steps for web check-in.

Observability: on Streamlit, each answer comes with a Logs tab where users can see the path of the entire query, as shown in the screenshot below.

Run the app with `python run_telegram_bot.py`, then open Telegram and search for your bot. The GIF below shows a complete end-to-end flight booking session, which is largely similar to IndiGo’s 6ESKAI on WhatsApp.

In this article, we built a production-style flight booking assistant using LangGraph and the OpenAI API that supports flight booking, web check-in, and flight status enquiries through natural conversations.

More importantly, we explored a class of AI applications where the primary challenge is not answering a question, but managing a business workflow. Unlike transformation or information-seeking systems, task-oriented dialogue systems must progressively gather information, maintain conversational state, validate user inputs, recover from errors, and guide users toward a successful outcome.

While the implementation focused on an airline booking assistant, the architectural patterns remain the same across many real-world applications.
Customer onboarding, insurance claims, appointment scheduling, loan applications, technical troubleshooting, and countless enterprise workflows all require the same combination of workflow orchestration, structured state management, and LLM-powered language understanding.

I hope this article provides a practical blueprint for building production-ready agentic AI systems that go beyond simple chatbots and are capable of completing real business processes.

The complete source code for this project is available on GitHub. Feel free to explore it, experiment with new workflows, and extend it to solve your own domain-specific problems.

Thank you for reading the article. Agentic AI is complex and chaotic, but getting started doesn’t have to be. I focus on making that first step simpler for you. Follow along at https://medium.com/@alphaiterations for regular updates and more such articles. Feel free to connect on LinkedIn (https://www.linkedin.com/in/jainvijendra/) if you’re on a similar path.

Building a Conversational Flight Booking Assistant from Scratch with LangGraph, OpenAI API and… (https://pub.towardsai.net/building-a-conversational-flight-booking-assistant-from-scratch-with-langgraph-openai-api-and-6fef2b4e8cc3) was originally published in Towards AI (https://pub.towardsai.net) on Medium.