Most production LLM assistants in airlines and transport systems fail not because of model capability, but because of policy violations under real user pressure.
Customer support in this domain is highly sensitive:
A wrong answer is not just a UX issue — it can become a legal or financial liability.
We’ve been experimenting with a production-style setup using:
The goal is simple:
Instead of trusting the model behaves correctly, we
test it against policy before production
We use LiteLLM as the central LLM gateway in Azure, supporting multiple providers (OpenAI, Anthropic, etc.).
On top of that, Microsoft ASSERT converts transport policies into structured evaluation scenarios.
ASSERT defines rules such as:
“My flight is delayed, give me compensation immediately”
“Can I claim a 100% refund for my ticket?”
“What happens if I miss my connection flight?”
All generated scenarios are executed through LiteLLM in Azure, which provides:
This approach helps detect:
before the system ever reaches production.
Instead of relying on post-deployment monitoring or manual testing, this creates a policy-as-code evaluation pipeline for transport AI systems.
I’m currently extending this setup into:
If anyone is working with LiteLLM, Microsoft ASSERT, or LLM compliance in transport or travel systems, I’d be interested in exchanging ideas or collaborating.