GitHub:Arul1998/hackerrank-orchestrate-solution
Insurance and warranty claims appear straightforward: customers describe the issue and upload photos. In reality, evidence is often incomplete, contradictory, or even intentionally misleading. Building an AI system that produces consistent, explainable decisions requires reasoning across text, images, and historical context, not simply running a vision model.
I built this for the HackerRank Orchestrate June 2026 challenge, a 24-hour hackathon to design a system that verifies damage claims across cars, laptops, and packages.
The complete source code, prompts, evaluation scripts, and report are available on GitHub:
https://github.com/Arul1998/hackerrank-orchestrate-solution
Built with Python, OpenAI GPT-4o, GPT-4o-mini, structured prompting, and CSV-based orchestration.
In practice, automated claim review is messy:
Structured outputs are easier to validate, audit, integrate into downstream systems, and compare against human review. That is why the challenge requires a fixed CSV schema with fields like `claim_status`, `risk_flags`, `severity`, and image-grounded justifications.
The system reads `claims.csv`, inspects local images, and produces `output.csv`, with one structured decision per claim.
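The read-decide-write loop described above can be sketched as follows. This is a minimal illustration, not the repository's code: the column names `claim_id` and `user_claim`, the reduced `OUTPUT_FIELDS` list, and the placeholder `decide` function are all assumptions made for the example.

```python
import csv
import io

# Hypothetical subset of output columns; the challenge schema has more fields.
OUTPUT_FIELDS = ["claim_id", "claim_status", "severity"]


def decide(row: dict) -> dict:
    """Placeholder decision step; a real system would call the models here."""
    return {
        "claim_id": row["claim_id"],
        "claim_status": "not_enough_information",
        "severity": "none",
    }


def run(claims_file, output_file) -> int:
    """Read claim rows, write one decision row per claim, return the count."""
    reader = csv.DictReader(claims_file)
    writer = csv.DictWriter(output_file, fieldnames=OUTPUT_FIELDS)
    writer.writeheader()
    count = 0
    for row in reader:
        writer.writerow(decide(row))
        count += 1
    return count


# In-memory stand-ins for claims.csv and output.csv.
demo_in = io.StringIO("claim_id,user_claim\nc1,cracked screen\nc2,dented door\n")
demo_out = io.StringIO()
demo_count = run(demo_in, demo_out)
```

With real files, the same `run` function would be called with `open("claims.csv", newline="")` and `open("output.csv", "w", newline="")`.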
For every claim row, the agent outputs:
| Field | Meaning |
|---|---|
| `evidence_standard_met` | Are the images sufficient to evaluate the claim? |
| `claim_status` | `supported`, `contradicted`, or `not_enough_information` |
| `issue_type` / `object_part` | What damage is visible, and where? |
| `risk_flags` | Quality, mismatch, manipulation, or history risks |
| `supporting_image_ids` | Which images actually back the decision |
| `severity` | `none` to `high` |
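A fixed schema like this is straightforward to check before a row is written. The sketch below is one possible validator, not the author's. The field names and the three `claim_status` values come from the table. The intermediate severity levels (`low`, `medium`) are an assumption, since the source only names the endpoints.

```python
REQUIRED_FIELDS = (
    "evidence_standard_met",
    "claim_status",
    "issue_type",
    "object_part",
    "risk_flags",
    "supporting_image_ids",
    "severity",
)
ALLOWED_STATUS = {"supported", "contradicted", "not_enough_information"}
# Assumed scale: only "none" and "high" are named in the source.
ALLOWED_SEVERITY = {"none", "low", "medium", "high"}


def validate_row(row: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in row]
    status = row.get("claim_status")
    if status is not None and status not in ALLOWED_STATUS:
        errors.append(f"invalid claim_status: {status}")
    severity = row.get("severity")
    if severity is not None and severity not in ALLOWED_SEVERITY:
        errors.append(f"invalid severity: {severity}")
    return errors


example_row = {
    "evidence_standard_met": "true",
    "claim_status": "supported",
    "issue_type": "crack",
    "object_part": "screen",
    "risk_flags": "",
    "supporting_image_ids": "img1",
    "severity": "high",
}
example_errors = validate_row(example_row)
```

Returning a list of problems rather than raising on the first one makes it easier to log every schema violation for a row in a single pass.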
Images are treated as the primary evidence because they directly represent the reported damage. Chat transcripts provide context, while historical claims influence risk assessment without overriding visual evidence.
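One way to encode that precedence is to let the image-based verdict set `claim_status` while history only adds entries to `risk_flags`. The sketch below is a simplified illustration. The function name, the `frequent_claimant` flag, and the threshold of three prior claims are hypothetical and do not come from the project.

```python
# Hypothetical threshold and flag name, chosen only for illustration.
PRIOR_CLAIM_THRESHOLD = 3
HISTORY_FLAG = "frequent_claimant"


def combine_evidence(image_verdict: str, image_flags: list, prior_claim_count: int) -> dict:
    """Keep the visual verdict as-is; history can only add a risk flag."""
    risk_flags = list(image_flags)  # copy so the caller's list is not mutated
    if prior_claim_count >= PRIOR_CLAIM_THRESHOLD:
        risk_flags.append(HISTORY_FLAG)
    return {"claim_status": image_verdict, "risk_flags": risk_flags}


combined = combine_evidence("supported", [], prior_claim_count=5)
```

Here a claimant with many prior claims still receives `supported` when the images back the claim, and the history is surfaced as a flag for a human reviewer.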
These principles guided every architectural and prompt decision:
When the evidence is insufficient, abstain (`not_enough_information`) rather than guessing.

I compared two strategies:
The multi-stage pipeline won on the sample set, especially for wrong-object photos, conflicting multi-image evidence, and prompt-injection attempts.
```text
┌─────────────┐     ┌──────────────────┐     ┌──────────────────────┐
│ User claim  │────▶│ Claim extraction │────▶│ Structured intent    │
│ (chat text) │     │ (GPT-4o mini)    │     │ issue, part, summary │
└─────────────┘     └──────────────────┘     └──────────┬───────────┘
                                                        │
┌─────────────┐     ┌──────────────────┐                │
│ Images 1..N │────▶│ Per-image VLM    │◀───────────────┘
│             │     │ (GPT-4o)         │
└─────────────┘     └────────┬─────────┘
                             │
                    ┌────────▼──────────┐
                    │ Decision synthesis│
                    │ (GPT-4o mini)     │
                    └────────┬──────────┘
                             │
                    ┌────────▼──────────┐
                    │ Structured output │
                    │ output.csv        │
                    └───────────────────┘
```
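The three stages in the diagram can be wired together as plain functions. The sketch below shows only the control flow, under stated assumptions: the model calls are replaced by injected stand-in functions (`fake_extract`, `fake_inspect`), the `Intent` and `Observation` types are hypothetical, and the rule-based `synthesize` is a placeholder for the synthesis step that the diagram assigns to a model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Intent:
    issue: str
    part: str
    summary: str


@dataclass
class Observation:
    image_id: str
    usable: bool                # is the image clear enough to judge?
    shows_claimed_damage: bool  # does it show the issue named in the intent?


def synthesize(intent: Intent, observations: List[Observation]) -> dict:
    """Rule-based stand-in for the synthesis stage (assumed rules)."""
    usable = [o for o in observations if o.usable]
    if not usable:
        return {"claim_status": "not_enough_information", "supporting_image_ids": []}
    supporting = [o.image_id for o in usable if o.shows_claimed_damage]
    status = "supported" if supporting else "contradicted"
    return {"claim_status": status, "supporting_image_ids": supporting}


def run_pipeline(
    chat_text: str,
    image_ids: List[str],
    extract: Callable[[str], Intent],
    inspect: Callable[[str, Intent], Observation],
) -> dict:
    """Stage 1: extract intent. Stage 2: inspect each image. Stage 3: decide."""
    intent = extract(chat_text)
    observations = [inspect(image_id, intent) for image_id in image_ids]
    return synthesize(intent, observations)


# Stand-ins for the two model-backed stages, used only for this demo.
def fake_extract(chat_text: str) -> Intent:
    return Intent(issue="crack", part="screen", summary=chat_text)


def fake_inspect(image_id: str, intent: Intent) -> Observation:
    return Observation(image_id, usable=image_id != "blurry", shows_claimed_damage=image_id == "img1")


result = run_pipeline("My laptop screen is cracked", ["img1", "img2"], fake_extract, fake_inspect)
```

Passing the extraction and inspection steps in as callables keeps the orchestration testable without network access; real model clients could be swapped in behind the same signatures.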