{"slug": "building-a-multi-modal-evidence-review-agent-for-damage-claims", "title": "Building a Multi-Modal Evidence Review Agent for Damage Claims", "summary": "A developer built a multi-modal evidence review agent for damage claims using Python, OpenAI GPT-4o, and structured prompting. The system processes text, images, and historical context to produce consistent, explainable decisions for car, laptop, and package claims. The multi-stage pipeline outperformed a single-shot approach on sample data, particularly for conflicting evidence and prompt injection attempts.", "body_md": "GitHub: `Arul1998/hackerrank-orchestrate-solution`\n\nInsurance and warranty claims appear straightforward: customers describe the issue and upload photos. In reality, evidence is often incomplete, contradictory, or even intentionally misleading. Building an AI system that produces consistent, explainable decisions requires reasoning across text, images, and historical context, not simply running a vision model.\n\nI built this for the **HackerRank Orchestrate** June 2026 challenge, a 24-hour hackathon to design a system that verifies damage claims across **cars**, **laptops**, and **packages**.\n\nThe complete source code, prompts, evaluation scripts, and report are available on GitHub:\n\n🔗 [https://github.com/Arul1998/hackerrank-orchestrate-solution](https://github.com/Arul1998/hackerrank-orchestrate-solution)\n\nBuilt with **Python, OpenAI GPT-4o, GPT-4o-mini, structured prompting, and CSV-based orchestration**.\n\nIn practice, automated claim review is messy.\n\nStructured outputs are easier to validate, audit, integrate into downstream systems, and compare against human review. 
That is why the challenge requires a fixed CSV schema with fields like `claim_status`, `risk_flags`, `severity`, and image-grounded justifications.\n\nThe system reads `claims.csv`, inspects local images, and produces `output.csv`: one structured decision per claim.\n\nFor every claim row, the agent outputs:\n\n| Field | Meaning |\n|---|---|\n| `evidence_standard_met` | Are the images sufficient to evaluate the claim? |\n| `claim_status` | `supported`, `contradicted`, or `not_enough_information` |\n| `issue_type` / `object_part` | What damage is visible, and where? |\n| `risk_flags` | Quality, mismatch, manipulation, or history risks |\n| `supporting_image_ids` | Which images actually back the decision |\n| `severity` | `none` → `high` |\n\nImages are treated as the **primary evidence** because they directly represent the reported damage. Chat transcripts provide context, while historical claims influence risk assessment without overriding visual evidence.\n\nThese principles guided every architectural and prompt decision:\n\n… (`not_enough_information`) rather than guessing.\n\nI compared two strategies: a single-shot approach and a multi-stage pipeline.\n\nThe multi-stage pipeline won on the sample set, especially for wrong-object photos, conflicting multi-image evidence, and prompt-injection attempts.\n\n```text\n┌─────────────┐     ┌──────────────────┐     ┌──────────────────────┐\n│ User claim  │────▶│ Claim extraction │────▶│ Structured intent    │\n│ (chat text) │     │ (GPT-4o mini)    │     │ issue, part, summary │\n└─────────────┘     └──────────────────┘     └──────────┬───────────┘\n                                                        │\n┌─────────────┐     ┌──────────────────┐                │\n│ Images 1..N │────▶│ Per-image VLM    │◀───────────────┘\n│             │     │ (GPT-4o)         │\n└─────────────┘     └────────┬─────────┘\n                             │\n                    ┌────────▼──────────┐\n                    │ Decision synthesis│\n                    │ (GPT-4o mini)     │\n                    └────────┬──────────┘\n                             │\n                    ┌────────▼──────────┐\n                    │ Structured output │\n                    │ output.csv        │\n                    └───────────────────┘\n```\n\n", "url": "https://wpnews.pro/news/building-a-multi-modal-evidence-review-agent-for-damage-claims", "canonical_source": "https://dev.to/arul_cornelious/building-a-multi-modal-evidence-review-agent-for-damage-claims-2nc6", "published_at": "2026-06-30 16:20:48+00:00", "updated_at": "2026-06-30 16:49:04.505860+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "computer-vision", "ai-agents"], "entities": ["OpenAI", "GPT-4o", "GPT-4o-mini", "HackerRank", "GitHub", "Arul1998"], "alternates": {"html": "https://wpnews.pro/news/building-a-multi-modal-evidence-review-agent-for-damage-claims", "markdown": "https://wpnews.pro/news/building-a-multi-modal-evidence-review-agent-for-damage-claims.md", "text": "https://wpnews.pro/news/building-a-multi-modal-evidence-review-agent-for-damage-claims.txt", "jsonld": "https://wpnews.pro/news/building-a-multi-modal-evidence-review-agent-for-damage-claims.jsonld"}}