Prompting, RAG, Fine-Tuning, ICL

A mid-sized insurance company's customer support chatbot falsely told a homeowner that smoke damage was covered after a house fire, leading to a reversed claim and a PR crisis. The team's attempts to fix the issue with fine-tuning and RAG failed because they did not diagnose the root cause of the failure first.

Most AI failures aren’t fixed by switching techniques; they’re fixed by identifying which layer actually failed.

Most teams don’t solve LLM problems by identifying the root cause. They fine-tune the model, add RAG, tweak prompts, or switch models based on what worked last time. That’s trial and error. The better approach is to diagnose the failure first and then choose the right lever to pull.

Consider a composite example based on patterns many teams have encountered. A mid-sized insurance company’s customer support chatbot tells a homeowner after a house fire that smoke damage to unaffected rooms is automatically covered under their policy. It isn’t. The customer proceeds with the claim based on the chatbot’s response. During reconciliation, the finance team catches the mistake, the claim is reversed, and what started as an AI hallucination quickly becomes a customer complaint and eventually a public relations issue.

The postmortem begins. Someone suggests fine-tuning the model on the company’s policy documents. Weeks later, the chatbot is still inventing coverage terms. The next proposal is Retrieval-Augmented Generation (RAG). This time, the system retrieves the correct policy…