[ARTICLE · art-45152] src=dev.to ↗ pub= topic=artificial-intelligence verified=true sentiment=↑ positive

Building a Multi-Modal Evidence Review Agent for Damage Claims

A developer built a multi-modal evidence review agent for damage claims using Python, OpenAI GPT-4o, and structured prompting. The system processes text, images, and historical context to produce consistent, explainable decisions for car, laptop, and package claims. The multi-stage pipeline outperformed a single-shot approach on sample data, particularly for conflicting evidence and prompt injection attempts.

2 min read · Published Jun 30, 2026

GitHub: Arul1998/hackerrank-orchestrate-solution

Insurance and warranty claims appear straightforward: customers describe the issue and upload photos. In reality, evidence is often incomplete, contradictory, or even intentionally misleading. Building an AI system that produces consistent, explainable decisions requires reasoning across text, images, and historical context, not simply running a vision model.

I built this for the HackerRank Orchestrate June 2026 challenge, a 24-hour hackathon to design a system that verifies damage claims across cars, laptops, and packages.

The complete source code, prompts, evaluation scripts, and report are available on GitHub:

🔗 https://github.com/Arul1998/hackerrank-orchestrate-solution

Built with Python, OpenAI GPT-4o, GPT-4o-mini, structured prompting, and CSV-based orchestration.

In practice, automated claim review is messy:

Structured outputs are easier to validate, audit, integrate into downstream systems, and compare against human review. That is why the challenge requires a fixed CSV schema with fields like claim_status, risk_flags, severity, and image-grounded justifications.

The system reads claims.csv, inspects local images, and produces output.csv: one structured decision per claim.
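The CSV-in, CSV-out loop can be sketched with the standard library alone. This is a minimal illustration, not the repository's code: the `claim_id` key column, the column order, and the `review_claim` callable are assumptions, and the output field names are the ones listed in this article.

```python
import csv
from typing import Callable, Dict, List

# Field names come from the article; "claim_id" and the ordering are assumed.
OUTPUT_FIELDS = [
    "claim_id", "evidence_standard_met", "claim_status", "issue_type",
    "object_part", "risk_flags", "supporting_image_ids", "severity",
]

ReviewFn = Callable[[Dict[str, str]], Dict[str, str]]


def run_batch(rows: List[Dict[str, str]], review_claim: ReviewFn) -> List[Dict[str, str]]:
    """Apply review_claim to every input row and return one decision per claim."""
    decisions = []
    for row in rows:
        merged = {**review_claim(row), "claim_id": row.get("claim_id", "")}
        # Keep only the known columns; fields the reviewer omitted become "".
        decisions.append({name: merged.get(name, "") for name in OUTPUT_FIELDS})
    return decisions


def process_file(in_path: str, out_path: str, review_claim: ReviewFn) -> int:
    """Read claims from in_path, write decisions to out_path, return the row count."""
    with open(in_path, newline="", encoding="utf-8") as src:
        rows = list(csv.DictReader(src))
    decisions = run_batch(rows, review_claim)
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=OUTPUT_FIELDS)
        writer.writeheader()
        writer.writerows(decisions)
    return len(decisions)
```

Passing the reviewer in as a callable keeps the file handling testable without any model calls; a call such as `process_file("claims.csv", "output.csv", my_reviewer)` would then produce the output file.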

For every claim row, the agent outputs:

| Field | Meaning |
|---|---|
| evidence_standard_met | Are the images sufficient to evaluate the claim? |
| claim_status | supported, contradicted, or not_enough_information |
| issue_type / object_part | What damage is visible, and where? |
| risk_flags | Quality, mismatch, manipulation, or history risks |
| supporting_image_ids | Which images actually back the decision |
| severity | none → high |
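Because the schema is fixed, each output row can be checked mechanically before it is written. The sketch below is an illustration under stated assumptions: the three `claim_status` values come from the table, but the intermediate severity levels, the `"true"`/`"false"` encoding of `evidence_standard_met`, and the rule that a supported decision must cite an image are guesses, not the challenge's specification.

```python
from typing import Dict, List

CLAIM_STATUSES = {"supported", "contradicted", "not_enough_information"}
# The article only names the endpoints "none" and "high"; the middle levels are assumed.
SEVERITIES = {"none", "low", "medium", "high"}


def validate_decision(row: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the row passes these checks."""
    problems = []
    status = row.get("claim_status")
    if status not in CLAIM_STATUSES:
        problems.append(f"invalid claim_status: {status!r}")
    if row.get("severity") not in SEVERITIES:
        problems.append(f"invalid severity: {row.get('severity')!r}")
    # Assumed encoding of the boolean field as lowercase strings.
    if row.get("evidence_standard_met") not in {"true", "false"}:
        problems.append("evidence_standard_met must be 'true' or 'false'")
    # Assumed rule: a supported decision should be grounded in at least one image.
    if status == "supported" and not row.get("supporting_image_ids"):
        problems.append("supported decision cites no images")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to log every issue for a row when auditing a batch.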

Images are treated as the primary evidence because they directly represent the reported damage. Chat transcripts provide context, while historical claims influence risk assessment without overriding visual evidence.
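One way to encode that precedence is to let image findings alone determine `claim_status`, while claim history can only add a risk flag. The sketch below is hypothetical: the `ImageFinding` fields, the flag names, the history threshold, and the choice to abstain on conflicting images are all assumptions made for illustration, not the author's rules.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ImageFinding:
    image_id: str
    usable: bool                      # clear enough to assess
    shows_claimed_damage: bool
    contradicts_claim: bool = False   # e.g. wrong object or an undamaged part


@dataclass
class Decision:
    claim_status: str
    supporting_image_ids: List[str] = field(default_factory=list)
    risk_flags: List[str] = field(default_factory=list)


def decide(findings: List[ImageFinding], prior_claims: int) -> Decision:
    usable = [f for f in findings if f.usable]
    supporting = [f.image_id for f in usable if f.shows_claimed_damage]
    conflicting = bool(supporting) and any(f.contradicts_claim for f in usable)

    # Status depends on the images only; history is never consulted here.
    if supporting and not conflicting:
        status = "supported"
    elif not supporting and any(f.contradicts_claim for f in usable):
        status = "contradicted"
    else:
        status = "not_enough_information"  # no usable or conflicting evidence

    flags = []
    if len(usable) < len(findings):
        flags.append("image_quality")
    if conflicting:
        flags.append("conflicting_evidence")
    if prior_claims >= 3:  # hypothetical threshold
        flags.append("claim_history")  # raises risk without changing the status
    return Decision(status, supporting if status == "supported" else [], flags)
```

With this shape, a customer with many prior claims and one clear photo of the damage still gets `supported`, but the row carries a `claim_history` flag for a human reviewer.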

These principles guided every architectural and prompt decision:

[...] (not_enough_information) rather than guessing.

I compared two strategies: a single-shot approach and a multi-stage pipeline.

The multi-stage pipeline won on the sample set, especially for wrong-object photos, conflicting multi-image evidence, and prompt-injection attempts.

┌─────────────┐     ┌──────────────────┐     ┌──────────────────────┐
│ User claim  │────▶│ Claim extraction │────▶│ Structured intent    │
│ (chat text) │     │ (GPT-4o mini)    │     │ issue, part, summary │
└─────────────┘     └──────────────────┘     └──────────┬───────────┘
                                                        │
┌─────────────┐     ┌──────────────────┐                │
│ Images 1..N │────▶│ Per-image VLM    │◀───────────────┘
│             │     │ (GPT-4o)         │
└─────────────┘     └────────┬─────────┘
                             │
                    ┌────────▼──────────┐
                    │ Decision synthesis│
                    │ (GPT-4o mini)     │
                    └────────┬──────────┘
                             │
                    ┌────────▼──────────┐
                    │ Structured output │
                    │ output.csv        │
                    └───────────────────┘
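The three stages in the diagram can be wired together as plain function composition. The sketch below is not the repository's code: the stage functions are passed in as callables so that no model API is assumed, and the `stub_*` functions are hypothetical stand-ins for the GPT-4o mini and GPT-4o calls.

```python
from typing import Callable, Dict, List

Intent = Dict[str, str]
Finding = Dict[str, object]


def review_claim(
    chat_text: str,
    image_paths: List[str],
    extract_intent: Callable[[str], Intent],
    inspect_image: Callable[[str, Intent], Finding],
    synthesize: Callable[[List[Finding]], Dict[str, object]],
) -> Dict[str, object]:
    """Run the three stages in order: extraction, per-image review, synthesis."""
    intent = extract_intent(chat_text)                                 # stage 1: chat text only
    findings = [inspect_image(path, intent) for path in image_paths]   # stage 2: one call per image
    return synthesize(findings)                                        # stage 3: combine findings


# Hypothetical stubs standing in for the model calls.
def stub_extract(chat_text: str) -> Intent:
    return {"issue": "unknown", "part": "unknown", "summary": chat_text.strip()}


def stub_inspect(path: str, intent: Intent) -> Finding:
    # Pretends any file whose name contains "damage" shows the claimed issue.
    return {"image_id": path, "matches_claim": "damage" in path}


def stub_synthesize(findings: List[Finding]) -> Dict[str, object]:
    ids = [f["image_id"] for f in findings if f["matches_claim"]]
    status = "supported" if ids else "not_enough_information"
    return {"claim_status": status, "supporting_image_ids": ids}
```

Because each image is inspected in its own call against the already-extracted intent, text embedded in a photo or in the chat has less opportunity to steer the final decision than in a single prompt that sees everything at once; that is one plausible reading of why the staged design held up better, not a measured result.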