Building a Conversational Flight Booking Assistant from Scratch with LangGraph, OpenAI API and…

A developer built a conversational flight booking assistant from scratch using LangGraph and the OpenAI API, designed to handle task-oriented dialogues for IndiGo Airlines workflows including flight booking, web check-in, and flight status enquiries. The assistant actively collects missing information, validates inputs, and maintains state across multiple turns, demonstrating patterns applicable to other goal-oriented AI systems like insurance claims and customer onboarding.

Non-members can read here for free: https://medium.com/@alphaiterations/building-a-conversational-flight-booking-assistant-from-scratch-with-langgraph-openai-api-and-6fef2b4e8cc3?source=friends_link&sk=de8f0bb16190286d51d4a2f8226d391e

Not all AI applications are solving the same kind of problem.

Some tasks are fundamentally transformational. A summarization system converts a document into a summary. A translation system converts text from one language to another. An extraction pipeline converts unstructured content into structured data. Given the required input, the system can usually produce an output immediately.

Others are information-seeking systems. Search, RAG, and knowledge assistants fall into this category. The user asks a question, the system retrieves or synthesizes information, and returns an answer. Even when conversation history is maintained, the interaction is largely user-driven: the user asks, the system responds.

Flight booking is different. A booking assistant cannot complete its task with the information available in the first message. The user might say: "Book me a flight to Mumbai." But a successful booking requires much more: departure city, travel dates, passenger count, trip type, flight selection, passenger details, and contact information.
The assistant must actively collect missing information, validate inputs, recover from mistakes, maintain state across multiple turns, and guide the conversation toward a successful outcome.

This class of problems is commonly known as Task-Oriented Dialogue (TOD) or Goal-Oriented AI Systems. Unlike information-retrieval agents, these systems are responsible for driving a business process to completion. The challenge is no longer generating a correct response; it is managing a workflow.

In this article, we will build a production-grade IndiGo Airlines booking assistant using LangGraph and the OpenAI API. The assistant supports flight booking, web check-in, and flight status enquiries across Streamlit and Telegram while maintaining conversational state.

More importantly, the patterns discussed here extend far beyond airline reservations. The same architectural principles apply to insurance claims, customer onboarding, appointment scheduling, loan applications, technical troubleshooting, and any workflow where an AI agent must progressively gather information and drive a task to completion.

By the end of this article, you'll understand how to design stateful, goal-oriented AI systems that move beyond answering questions and start completing real-world business processes.

Airlines handle millions of customer interactions related to flight bookings, web check-ins, and flight status enquiries. Traditionally, these interactions are completed through websites, mobile apps, or customer support agents, requiring users to navigate multiple screens and forms before completing a task.

The goal is to build an AI-powered assistant that allows customers to complete these workflows through a natural conversation. Instead of filling in forms manually, users should be able to express their intent in plain language, while the assistant guides them through the process and completes the required actions.
This project is inspired by IndiGo's 6Eskai virtual assistant and focuses on implementing a simplified version of its flight booking, web check-in, and flight status workflows.

Because this is a conversational experience, it brings its own challenges: the user might not give all the information in one go, or might answer a question about the travel date with city information instead. The major challenges are listed below.

User: Book me a flight to Mumbai
Bot: Sure. What is your departure city?
User: Jaipur
Bot: What date would you like to travel?
-----------------
The assistant must identify missing information and dynamically collect it through follow-up questions.

User: Book a flight from Bombay to Bangalore
-----------------
The assistant must resolve:
Bombay → Mumbai (BOM)
Bangalore → Bengaluru (BLR)
before performing the flight search.

Bot: What date would you like to travel?
User: to Jaipur on 12th July
-------------------
Here the assistant should understand that the user has provided both a city and a date, even though only the date was asked for.

User: Book a flight from Jaipur to Mumbai yesterday
-----------------
The assistant should detect that the travel date is invalid and request a valid future date.

User: Book a flight to Springfield
------------------
If no supported airport can be resolved, the assistant should gracefully ask for clarification rather than failing the conversation.

The assistant supports three workflows: flight booking, web check-in, and flight status enquiries. It runs on two channels simultaneously: a Streamlit web UI and a Telegram bot. Both channels share a single compiled LangGraph state machine.
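Two of the challenges above, city aliases and past travel dates, can be handled in plain Python before any flight search runs. The following is only a minimal sketch: the `CITY_ALIASES` table and both function names are hypothetical stand-ins for illustration (the project keeps its own `CITY_TO_CODE` map in `constants.py`), and the error message wording is an assumption.

```python
from datetime import date
from typing import Optional

# Hypothetical alias table for illustration; not the project's CITY_TO_CODE map.
CITY_ALIASES = {
    "bombay": ("Mumbai", "BOM"),
    "mumbai": ("Mumbai", "BOM"),
    "bangalore": ("Bengaluru", "BLR"),
    "bengaluru": ("Bengaluru", "BLR"),
    "jaipur": ("Jaipur", "JAI"),
}


def resolve_city(raw: str) -> Optional[tuple[str, str]]:
    """Return (canonical_name, iata_code), or None if the city is not supported."""
    return CITY_ALIASES.get(raw.strip().lower())


def validate_travel_date(travel_date: date, today: date) -> Optional[str]:
    """Return an error message for a past date, or None when the date is acceptable."""
    if travel_date < today:
        return "That date is in the past. Please give a date from today onwards."
    return None
```

A `None` from `resolve_city` (for example for "Springfield") is the signal to ask the user for clarification instead of running the search.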
Below is the tech stack of the project:

LLM: OpenAI GPT-4o-mini
Agent framework: LangGraph
Web UI: Streamlit
Bot channel: python-telegram-bot
Database: SQLite3

The complete code is available here: https://github.com/alphaiterations/agentic-ai-usecases/tree/main/advanced/flight-booking-assistant

Prerequisites:
- Python 3.11+
- An OpenAI API key
- Basic familiarity with Python async and dictionary-based state

git clone https://github.com/vijendrajain/agentic-ai-usecases
cd agentic-ai-usecases/advanced/flight-booking-assistant
pip install -r requirements.txt

Create a .env file at the project root:

OPENAI_API_KEY=sk-…
TELEGRAM_BOT_TOKEN=8
TELEGRAM_BOT_USERNAME=FlightBookingAIBot

Note: To create TELEGRAM_BOT_TOKEN and TELEGRAM_BOT_USERNAME on Telegram, follow the tutorial "From BotFather to 'Hello World'": https://core.telegram.org/bots/tutorial#obtain-your-bot-token

To keep the system maintainable and extensible, the codebase is divided into independent layers for conversational agents, workflow nodes, business services, shared utilities, and user-facing channels. Each layer has a well-defined responsibility within the booking workflow.
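As a rough illustration of how the three `.env` keys above might be consumed, here is a minimal sketch of a settings loader. It assumes the `.env` file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv`), and that all three keys are required. The `Settings` class and `load_settings` function are hypothetical names, not the project's `config.py`.

```python
import os
from dataclasses import dataclass

REQUIRED_KEYS = ("OPENAI_API_KEY", "TELEGRAM_BOT_TOKEN", "TELEGRAM_BOT_USERNAME")


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; field names mirror the .env keys."""
    openai_api_key: str
    telegram_bot_token: str
    telegram_bot_username: str


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ) and fail fast on gaps."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing settings: {', '.join(missing)}")
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        telegram_bot_token=env["TELEGRAM_BOT_TOKEN"],
        telegram_bot_username=env["TELEGRAM_BOT_USERNAME"],
    )
```

Failing at startup with a list of the missing keys is usually easier to debug than an authentication error in the middle of a conversation.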
Below is the repo structure:

flight-booking-assistant/
├── app.py                        # Streamlit web UI entry point
├── telegram_bot.py               # Telegram bot adapter
├── graph/
│   ├── __init__.py               # Top-level StateGraph + dispatch route
│   ├── booking_subgraph.py       # Compiled booking flow subgraph
│   └── pnr_subgraph.py           # Compiled PNR / check-in / status subgraph
├── state.py                      # BookingState TypedDict (5 sub-types + Passenger)
├── constants.py                  # Step, Intent, Process enums; CITY_TO_CODE map
├── config.py                     # Settings loaded from .env
├── nodes/                        # All node functions (LLM-calling and pure-Python)
│   ├── router.py                 # Intent classification (booking / check-in / status)
│   ├── information_extractor.py  # Slot, PNR, and passenger extraction
│   ├── slot_validator.py         # Per-field validation + retry counter updates
│   ├── city_lookup.py            # City name → IATA code resolution
│   ├── conversation_driver.py    # Slot sequencing, Phase 1 & 2 flow orchestration
│   ├── flight_selection.py       # Parses user's flight choice from numbered list
│   ├── booking_guardrail.py      # Guards against mid-flow process switching
│   ├── confirmation.py           # Pre-search booking confirmation prompt
│   ├── payment.py                # Payment step (stub, ready for Stripe/Razorpay)
│   └── done.py                   # Session teardown and final response
├── services/
│   ├── flight_search.py          # SQLite query + dynamic pricing calculation
│   ├── pnr_lookup.py             # PNR / check-in / flight status database lookup
│   ├── booking_save.py           # Persists confirmed bookings to the database
│   └── session_store.py          # Session persistence across page refreshes
├── utils/
│   ├── llm.py                    # call_llm_json wrapper + observability logging
│   ├── db.py                     # SQLite connection and query helpers
│   ├── formatting.py             # Flight list and message formatters
│   ├── user_messages.py          # User-facing string constants
│   └── prompts/                  # LLM prompt templates (split by concern)
│       ├── extraction.py         # Slot, PNR, passenger extraction prompts
│       ├── conversation.py       # Routing, retry, persona prompts
│       └── classification.py     # Intent classification prompts
├── indigo_airline.db             # Pre-loaded SQLite database (17 tables)
├── sessions.db                   # Session state persistence across page refreshes
├── create_airline_db.py          # Script used to seed the database
└── requirements.txt

Each directory has a single responsibility: graph/ wires the workflow together, nodes/ holds the node functions (both LLM-calling and pure-Python), services/ handles database queries, and utils/ provides shared infrastructure. The two channel adapters, app.py for Streamlit and telegram_bot.py for Telegram, contain no business logic. They call booking_graph.invoke and display the result.

To support realistic airline workflows, we first create a synthetic airline database instead of relying on mocked responses. The dataset is generated from publicly available IndiGo route information and populated with synthetic customers, bookings, passengers, payments, baggage records, flight instances, and delay data. The synthetic records are generated with the Python Faker library.

The dataset is fully configurable, allowing the size and coverage of the airline network to be adjusted through a few parameters:

    # create_airline_db.py
    import os
    from datetime import datetime

    class DBConfig:
        # Reproducibility
        random_seed = 42

        # Date window, anchored to today so data is always in the future
        schedule_start = datetime.now()
        schedule_years = 2
        flight_instance_years = 2
        flight_instance_sample_weeks = 1

        # Airport filter
        #   "all"      → every IndiGo airport (airport_list is ignored)
        #   "selected" → only airports in airport_list
        airport_list_type = "selected"
        airport_list = [
            "DEL", "BOM", "BLR", "MAA", "HYD", "CCU",  # major metros
            "AMD", "PNQ", "COK", "GOI", "JAI", "LKO",  # tier-2
            "NAG", "IXC", "PAT", "BBI", "SXR",
        ]

        # Volume
        num_customers = 100
        num_bookings = 500

        # How many flights from the schedule to generate instances for.
        # Lower this to reduce FlightInstances rows and DB size.
        # None = use all flights in the schedule.
        max_flights_for_instances = 20

        # Paths
        db_path = os.path.join(os.path.dirname(__file__), "indigo_airline.db")
        routes_url = (
            "https://raw.githubusercontent.com/alphaiterations/data-for-agents"
            "/main/airlines-data/airline_routes.json"
        )

This makes it easy to generate anything from a lightweight demo dataset to a much larger airline reservation system by simply changing a few configuration values.

Instead of manually maintaining flight routes, the generator extracts only IndiGo-operated routes (IATA carrier code 6E) from publicly available airline route data before creating the flight schedule.

    # create_airline_db.py
    for carrier in route["carriers"]:
        if carrier.get("iata") == "6E":
            indigo_routes.append({
                "origin": airport_code,
                "destination": dest,
                "distance_km": route.get("km"),
                "duration_mins": route.get("min"),
            })

Six daily departures are generated automatically for every supported route, creating a realistic flight schedule.

    # create_airline_db.py
    departure_times = ["06:00", "09:30", "12:00", "15:30", "18:00", "21:00"]
    for route in unique_routes:
        for departure_time in departure_times:
            create_flight(...)

Flight schedules describe recurring flights, while flight instances represent a specific flight on a particular date. Roughly 5% of instances are randomly assigned delays, enabling realistic flight status demonstrations.

    for days_ahead in range(0, instance_days, step_days):
        current_date = start + timedelta(days=days_ahead)
        create_flight_instance(...)
        if random.random() < 0.05:
            create_delay(...)

In the terminal, navigate to the flight-booking-assistant folder and run the command below to create the synthetic database.
cd flight-booking-assistant
python create_airline_db.py

Output

Config: seed=42, start=2026-06-20, schedule_years=2, customers=100, bookings=500
Removed existing database: /Users/current_user/agentic-ai-usecases/advanced/flight-booking-assistant/indigo_airline.db
Fetching airline_routes.json from: https://raw.githubusercontent.com/alphaiterations/data-for-agents/main/airlines-data/airline_routes.json
Routes data fetched successfully.
IndiGo routes extracted (17 selected airports): 210
Database schema created successfully.
Flight schedule created: 1260 flights.
Days of operation inserted for all flights.
Generating 100 synthetic customers...
100 customers inserted.
Generating 500 bookings...
500 bookings inserted.
Generating flight instances and delays (2 years, sampled every 7d)...
Generating instances for 20 flights...
Flight instances and 110 delays inserted.
============================================================
DATABASE SUMMARY - INDIGO AIRLINE BOOKING SYSTEM
============================================================
  Customers               100
  FlightSchedule        1,260
  DaysOfOperation       8,820
  PNRs                    500
  Bookings                500
  Passengers            1,220
  Itineraries             500
  ItineraryLegs           500
  PassengerBaggage      1,220
  FlightInstances       2,100
  FlightDelays            110
  Payments                500
------------------------------------------------------------
  TOTAL                17,330
Database file : /Users/current_user/agentic-ai-usecases/advanced/flight-booking-assistant/indigo_airline.db
Database size : 1.59 MB
============================================================

This creates indigo_airline.db in the root folder.

The three tables the agent queries most are FlightSchedule, FlightInstances, and PNRs. FlightSchedule is the source of truth for what routes exist and when flights depart. FlightInstances links a schedule entry to a specific date and carries the live status field (On Time, Delayed, Cancelled).
PNRs is what the web check-in and flight status flows query: the user gives a PNR code and their last name, and the agent joins PNRs to Bookings to ItineraryLegs to return their itinerary.

Here is the summary of all the tables:

The flight search query is straightforward:

    -- utils/db.py (simplified)
    SELECT flight_id, origin_airport_code, destination_airport_code,
           departure_time, arrival_time, flight_duration_minutes
    FROM FlightSchedule
    WHERE origin_airport_code = ? AND destination_airport_code = ?
    ORDER BY departure_time
    LIMIT 12

One deliberate simplification: pricing is not stored in the database. It is computed dynamically from the flight index and duration at query time. This keeps the schema clean and avoids needing a fare table that would require constant updates for a demo.

Pricing is dynamic and computed in Python

The pricing formula lives in services/flight_search.py rather than in a stored fare:

    # services/flight_search.py
    price = 2500 + (index % 4) * 400 + (duration_mins // 10) * 50

Starting from a base of 2,500, the fare rises with the flight's position in the result list and with its duration. This gives natural price variation across flights and keeps the demo realistic without requiring a live pricing API.

In LangGraph, state acts as the shared memory of the workflow. Every node reads from the current state, performs its logic, and returns updates that are merged back into the state.

For a flight booking assistant, state needs to capture much more than conversation history. The system must track user inputs, booking details, workflow progress, validation errors, selected flights, and session metadata across multiple turns.
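The read, compute, merge cycle described above can be sketched without LangGraph at all. The snippet below is an illustrative stand-in, not the project's graph code: `MiniState`, `ask_departure_node`, and `run_node` are hypothetical names, and the merge shown (last write wins per key) mimics how LangGraph treats state keys that have no reducer attached.

```python
from typing import Callable, Optional, TypedDict


class MiniState(TypedDict, total=False):
    """Hypothetical, heavily reduced stand-in for BookingState."""
    step: str
    departure_city: Optional[str]
    destination_city: Optional[str]


def ask_departure_node(state: MiniState) -> dict:
    """A node reads the state and returns only the keys it wants to change."""
    if state.get("departure_city") is None:
        return {"step": "COLLECT_SLOTS"}
    return {"step": "CONFIRM_BOOKING"}


def run_node(state: MiniState, node: Callable[[MiniState], dict]) -> dict:
    """Merge the node's partial update into a copy of the state (last write wins)."""
    merged = dict(state)
    merged.update(node(state))
    return merged


state = {"step": "GREETING", "destination_city": "Mumbai"}
state = run_node(state, ask_departure_node)
print(state)
```

The node never touches `destination_city`, so that key survives the merge unchanged. This is what lets each node stay small and own only the fields it updates.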
To keep the state manageable, I grouped related fields into logical categories:

    # state.py
    class Passenger(TypedDict):
        title: str          # "Mr" | "Mrs" | "Miss" | "Master"
        first_name: str
        last_name: str
        age_category: str   # "adult" | "child"

    class BookingState(
        SessionState,       # session_id, user_id, channel, last_active_at
        ConversationState,  # messages, step, process, intent, terminated, awaiting_confirmation
        BookingEntities,    # departure_city, destination_city, travel_date, passengers, email
        FlightState,        # flights, selected_flight, booking_leg, confirmation_step
        ValidationState,    # slot_attempts, cities_changed, slot_error, city_error
    ):
        pass

This approach keeps ownership clear while allowing every node in the workflow to access a unified view of the conversation.

The step field is the single source of truth for where you are in the conversation. It takes values like GREETING, COLLECT_SLOTS, CONFIRM_BOOKING, SHOW_FLIGHTS, PAYMENT, DONE, and sub-steps like collect_names and collect_email. Every routing decision in the graph reads step first.

The slot_attempts field is a dict that tracks how many times the user has failed on each specific field. This is more surgical than a global counter: a user might nail the city name in one try but keep giving dates in the past, and you want the three-attempt limit to be counted against the date field alone rather than against a counter shared by every field.

    # ValidationState in state.py
    slot_attempts: Dict[str, int]   # e.g. {"travel_date": 2, "departure_city": 1}
    name_attempts: int              # separate counter for passenger name collection
    flight_select_attempts: int     # retry count for flight selection
    terminated: bool                # True when any field exceeds MAX_SLOT_ATTEMPTS = 3

Round-trip bookings require storing both legs.
The booking_leg field tracks whether you are currently booking the outbound or return flight, and selected_outbound_flight holds the confirmed first leg while the user picks the second:

    # FlightState in state.py
    booking_leg: str                 # "outbound" | "return"
    selected_outbound_flight: Dict   # stored after user confirms outbound leg
    selected_return_flight: Dict     # stored after user confirms return leg

Gotcha: Never store derived values in state if you can compute them from other fields. Early in the project I stored total_passengers alongside adults and children. They drifted. Now only adults and children live in state; totals are computed on the fly.

Every LLM call in this project is driven by a prompt from utils/prompts/, which is split into three modules by concern:

                             USER MESSAGE
                                  │
             ┌────────────────────┼────────────────────┐
             │                    │                    │
             ▼                    ▼                    ▼
        EXTRACTION          CLASSIFICATION        CONVERSATION
         PROMPTS               PROMPTS               PROMPTS
             │                    │                    │
     "What did the user      "Which option        "What should
      actually say?"          did the user         the bot say
                              pick?"               next?"
             │                    │                    │
       Strict JSON           Closed vocab          Free text
       Fixed schema          One of N              Persona-bounded
       null = not said       No elaboration        Empathetic
             │                    │                    │
     EXTRACTION_CONTEXT      CONFIRM_INTENT_       SYSTEM_PERSONA
     EXTRACTION_PROMPT         PROMPT              ROUTING_PROMPT
     PASSENGER_              FLIGHT_               OUT_OF_SCOPE_
       EXTRACTION_PROMPT       SELECTION_PROMPT      PROMPT
     PNR_EXTRACTION_         MID_FLOW_INTENT_      RETRY_MESSAGE_
       PROMPT                  PROMPT                PROMPT
     CITY_LOOKUP_PROMPT

Every LLM call fits into one of these three boxes. If you ever find yourself writing a prompt that wants to both extract a slot and decide what to say next, split it into two.

Extraction prompts enforce strict "do not infer" rules and always return a fixed schema. Classification prompts enforce a closed vocabulary (affirm / deny / modify, or a 0-based index). Conversation prompts are the only place the model is allowed to produce free-form text. Keeping these three categories separate makes it easier to audit what each prompt is allowed to do.

Before writing a single prompt, let's understand what we are dealing with.
An LLM does not "think" the way you do. When you ask it to extract flight information from a user message, it tries to produce the most plausible-looking response based on its training. That sounds helpful, and it is, but it creates three specific problems in a booking system:

Problem 1: Output format is unpredictable. Ask the LLM to return JSON and it might wrap the response in a markdown code fence tagged json. Ask for a number and it might say "The answer is 2 adults." For a booking system that parses LLM output programmatically, this breaks things silently.

Problem 2: The model infers what users "probably meant." A user says "Jaipur to Mumbai on the 15th, 2 adults." The model might helpfully set trip_type = "one-way" because most single-date bookings are one-way. But you never want the model to make that call; you want the user to confirm it explicitly. If the model guesses, even correctly, the confirmation step gets silently skipped.

Problem 3: Null and zero mean different things, but the model may treat them the same. A user says "2 adults, no kids." The model might return children: null because it treats "no kids" as absence of information. But null means "we haven't asked yet" and 0 means "user confirmed no children." If children stays null, the bot will ask the passenger count question all over again.

These three problems drive every single prompt in this project.

The fundamental insight is this: not all LLM outputs are equal. Some need to be strict and parseable. Some need to choose from a fixed set. Only some need to be expressive. If you mix these requirements in one prompt, you get a confused model. So we organise all eleven prompts into three separate families based on what kind of output they produce.

Every user message the bot receives leads to one of three questions:

┌──────────────────────────────────────────────────────────────────┐
│ "What did the user actually say?"   → Extraction Prompts         │
│ Strict JSON, fixed schema, null for anything not stated.         │
│ The model is a parser, not a reasoner.                           │
├──────────────────────────────────────────────────────────────────┤
│ "Which option did the user pick?"   → Classification Prompts     │
│ Closed vocabulary. Pick exactly one bucket. No elaboration.      │
├──────────────────────────────────────────────────────────────────┤
│ "What should the bot say next?"     → Conversation Prompts       │
│ Free text allowed, but always bounded by a shared persona.       │
└──────────────────────────────────────────────────────────────────┘

This separation is the most important decision in the entire prompt layer. It determines how you write each prompt, what you constrain, and how you call the model. Let us go through each family.

Before any prompt runs, two settings address the first problem (unpredictable format) at the API call level:

    # utils/llm.py
    import json

    def call_llm_json(prompt: str) -> dict:
        response = get_client().chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"},  # forces valid JSON: no markdown, no prose
            temperature=0,                            # deterministic, not creative
            timeout=30,
        )
        return json.loads(response.choices[0].message.content.strip())

response_format={"type": "json_object"} tells the model at the API level that its entire output must be valid JSON: no markdown fences, no "Sure! Here's the JSON:" preamble. The constraint is applied by the API rather than by the wording of the prompt. A response that is cut off at the token limit can still be incomplete, so json.loads remains the final check.

temperature=0 removes creativity. You do not want the model to "interpret" what "July 15th" means. You want it to return "2025-07-15" the same way every single time, or as close to that as the API allows.

The mental model: think of these prompts as a strict forms clerk. They extract exactly what is on the form, leave everything else blank, and never fill in fields based on assumption.
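The clerk's distinction between a blank field and an explicit zero has to survive on the Python side as well, or Problem 3 comes back after the model has done its job correctly. The sketch below shows one way to merge extracted slots into existing ones; `merge_slots` and `missing_slots` are hypothetical helper names used for illustration, not functions from the project.

```python
from typing import Any, Optional


def merge_slots(
    current: dict[str, Optional[Any]], extracted: dict[str, Optional[Any]]
) -> dict[str, Optional[Any]]:
    """Copy only explicitly extracted values; None means 'not said', so keep the old value."""
    merged = dict(current)
    for key, value in extracted.items():
        if value is not None:  # a truthiness check (`if value:`) would wrongly drop children == 0
            merged[key] = value
    return merged


def missing_slots(slots: dict[str, Optional[Any]], required: list[str]) -> list[str]:
    """A slot is missing only when it is None; 0 counts as an answer."""
    return [name for name in required if slots.get(name) is None]


current = {"adults": None, "children": None, "trip_type": None}
extracted = {"adults": 2, "children": 0, "trip_type": None}  # "2 adults, no kids"
merged = merge_slots(current, extracted)
print(missing_slots(merged, ["adults", "children", "trip_type"]))
```

With the `is not None` comparison, `children` is stored as 0 and only `trip_type` is still reported as missing, so the bot moves on to the trip type question instead of asking about passengers again.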
All three extraction prompts share one foundational rule as a preamble:

EXTRACTION_CONTEXT: The "No Inference" Contract

    # utils/prompts/extraction.py
    EXTRACTION_CONTEXT = """You are a data extraction engine for the IndiGo Airlines booking system.
    Extract ONLY what is explicitly stated by the user. Do not infer, assume, or embellish.
    Return valid JSON only - no extra text, no markdown."""

This is the most important declaration in the project. Without it, GPT-4o-mini will helpfully fill in what it "probably" knows. When the model guesses a field, the slot validator sees a value and considers it "collected." The retry logic never triggers. The user never confirms. The booking moves forward with wrong data, silently.

Every extraction prompt starts with EXTRACTION_CONTEXT.

EXTRACTION_PROMPT: Phase 1, Collecting Flight Slots

This prompt handles the first phase of booking: origin, destination, dates, trip type, and passenger count. Here it is in full:

    # utils/prompts/extraction.py
    EXTRACTION_PROMPT = f"""{EXTRACTION_CONTEXT}

    Today's date: {{today_date}}

    Conversation so far:
    {{conversation_history}}

    Latest user message: "{{user_input}}"

    Extract any flight booking information mentioned. Return ONLY valid JSON with these fields (use null for anything not mentioned):
    {{{{
      "departure_city": "city name or null",
      "destination_city": "city name or null",
      "travel_date": "YYYY-MM-DD or null",
      "return_date": "YYYY-MM-DD or null (only for round-trip)",
      "trip_type": "one-way or round-trip or null",
      "adults": "integer or null",
      "children": "integer or null"
    }}}}

    children extraction rules:
    - Extract the number of children from phrases like "2 adults 1 child", "1 kid", "no children", "0 kids".
    - If the user explicitly says no children / 0 children, set children to 0.
    - Children age range: 2–16 years (EU) or 2–12 years (others).

    Date inference rules (today is {{today_date}}):
    - If the user gives a date without a year, infer the year as follows:
      - Use the current year if the resulting date is today or in the future.
      - If the date has already passed this year, use next year.
      - If today is in November or December and the mentioned month is January or February, always use next year.

    trip_type rules:
    - Only extract trip_type if the user EXPLICITLY uses words like "one-way", "one way", "round trip", "round-trip", "return", "two-way".
    - NEVER infer trip_type from a date, city, or any other context. If not explicitly stated, return null.

    Common Indian cities to recognise: Mumbai, Delhi, Bangalore, Chennai, Hyderabad, Pune, Jaipur, Goa, Kolkata, Kochi, Lucknow, Ahmedabad, Surat, Indore, etc.
    If a city is mentioned with a direction keyword like "from" it is departure_city; with "to" or "visit" it is destination_city.
    """

Three rules in this prompt deserve extra attention:

The children null vs. 0 rule. null means "we haven't asked yet." 0 means "user confirmed no children." If the user says "2 adults, no kids" and the model returns children: null, the slot validator thinks the passenger question is still open and asks again, even though the user already answered it. The rule forces the model to distinguish between "not mentioned" and "explicitly zero."

The trip_type non-inference rule. Without it, the model sees two dates and decides it must be a round-trip, because that is the most common reason people give two dates. That guess silently skips the explicit trip type confirmation step. The rule is blunt: if the user did not say the words, return null.

The date inference rules. Users almost never say the year. "July 15th" could be this July or next July. The rules resolve this deterministically: use the current year if the date is today or in the future, bump to next year if it has already passed, and always use next year for January or February mentions in November or December.

PASSENGER_EXTRACTION_PROMPT: Phase 2, Step-Aware Extraction

After the user selects a flight, the conversation enters Phase 2: a four-step sequence collecting flight confirmation, WhatsApp consent, passenger names, and email. Each step needs different extraction logic.

The naive approach: write four separate prompts. The problem: you copy the EXTRACTION_CONTEXT preamble four times and they drift out of sync.

The better approach: one prompt that handles all four steps by embedding step-specific rules, with {current_step} injected at call time:

    # utils/prompts/extraction.py
    PASSENGER_EXTRACTION_PROMPT = f"""{EXTRACTION_CONTEXT}

    The user is booking a flight and is currently at step: "{{current_step}}"

    Assistant's question: "{{assistant_message}}"
    User message: "{{user_input}}"

    Extract ONLY the information relevant to this step. Return ONLY valid JSON - no extra text, no markdown.

    Step rules:

    "flight_confirm" - user is asked to confirm the selected flight (Yes/No).
      {{{{\"flight_confirmed\": true}}}}  if user confirms (yes, okay, confirm, sure, proceed, looks good, etc.)
      {{{{\"flight_confirmed\": false}}}} if user declines (no, cancel, change, different, back, etc.)
      {{{{\"flight_confirmed\": null}}}}  if unclear

    "whatsapp_consent" - user is asked for WhatsApp communication consent (Yes/No).
      {{{{\"whatsapp_consent\": true}}}}  if user agrees (yes, okay, sure, fine, agree, etc.)
      {{{{\"whatsapp_consent\": false}}}} if user declines (no, don't, decline, not now, etc.)
      {{{{\"whatsapp_consent\": null}}}}  if unclear

    "collect_names" - user is providing the names of all passengers.
      Parse every name into a structured object and return:
      {{{{
        "passengers": [
          {{{{\"title\": \"Mr/Mrs/Miss/Master\", \"first_name\": \"...\", \"last_name\": \"...\", \"age_category\": \"adult/child/infant\"}}}},
          ...
        ]
      }}}}
      Inference rules:
      - title: use the prefix if explicitly given (Mr / Mrs / Miss / Master); for unlabelled names, default to "Mr" for typical male names and "Miss" for female names; use "Master" for male children.
      - age_category: use "child" if the user says child, kid, jr, junior, or similar; use "infant" if the user says infant, baby, or similar; default to "adult" otherwise.
      - If no names could be identified at all, return {{{{\"passengers\": null}}}}.

    "collect_email" - user is providing their email address.
      {{{{\"email\": \"