Multi-Field RAG Enhances Maritime Accident Root Cause Analysis

Seongjin Kim and a co-author proposed a multi-field hybrid retrieval-augmented generation (RAG) framework to automate maritime accident root cause analysis, according to an arXiv submission. The system, built on 13,329 Korea Maritime Safety Tribunal reports from 1971 to 2025, uses structured "incident cards" and field-aware hybrid retrieval to improve NormRecall@100 from 0.18 to 0.55 and raise an LLM-as-a-judge quality score from 3.34 to 3.72 over an LLM-only baseline. The framework aims to speed precedent search and improve consistency in root cause analysis drafting for regulated, document-heavy industries.

According to the arXiv submission (arXiv:2606.13249), Seongjin Kim and one other author present a multi-field hybrid retrieval-augmented generation (RAG) framework for automated maritime root cause analysis (RCA). The paper builds a structured knowledge base of 13,329 Korea Maritime Safety Tribunal (KMST) adjudication reports spanning 1971-2025, creating indexed "incident cards" with three fields: Summary, Causes, and Disposition. The authors report a field-aware hybrid retrieval method that fuses sparse and dense rankings via Reciprocal Rank Fusion (RRF), improving NormRecall@100 from 0.18 to 0.55 and raising an LLM-as-a-judge quality score from 3.34 to 3.72 over an LLM-only baseline, per the arXiv abstract. The paper suggests that field-aware RAG can speed precedent search and improve consistency in RCA drafting, according to the submission.

Editorial analysis: For practitioners, the results indicate that domain-structured indexing plus hybrid retrieval can materially raise retrieval recall and downstream generation quality in regulated, document-heavy verticals such as maritime safety.

What happened

According to the arXiv submission (arXiv:2606.13249), Seongjin Kim and one other author propose a multi-field hybrid retrieval-augmented generation (RAG) pipeline aimed at automating maritime accident root cause analysis (RCA). The paper constructs a structured knowledge base from 13,329 Korea Maritime Safety Tribunal (KMST) reports covering 1971-2025, converting adjudications into indexed "incident cards" with three explicit fields (Summary, Causes, and Disposition) and pairing entries with a hierarchical L1/L2 cause taxonomy, per the submission.
The authors evaluate a field-aware hybrid retrieval strategy that fuses sparse and dense rankings using Reciprocal Rank Fusion (RRF) and report improvements in retrieval and generation metrics: NormRecall@100 increases from 0.18 to 0.55, and an LLM-as-a-judge score rises from 3.34 to 3.72 versus an LLM-only baseline, according to the abstract.

Technical details

Editorial analysis - technical context: The approach combines three practical elements commonly used in applied RAG systems: (1) structured, multi-field indexing to preserve document semantics across distinct report components; (2) hybrid retrieval that merges sparse (e.g., BM25) and dense (embedding) ranks; and (3) fusion via RRF to produce consolidated candidate lists. The paper measures retrieval using ceiling-normalized recall and nDCG based on a metadata-derived proxy relevance score, a pragmatic choice given the absence of large-scale expert relevance annotations reported in the submission.

Context and significance

Editorial analysis: For practitioners working on vertical RAG, this paper provides an empirical case that domain-specific document structuring plus hybrid ranking can substantially lift recall and improve downstream LLM outputs. The magnitude of the reported retrieval improvement (0.18 to 0.55 NormRecall@100) is notable for workflows where precedent discovery is the bottleneck. The use of a multi-field index mirrors common legal and regulatory IR patterns where different document segments carry distinct evidentiary weight.

What to watch

Editorial analysis: Observers should look for follow-up artifacts from the authors (released code, index schemas, embedding model choices, and evaluation scripts) that would enable reproducibility and transfer to other regulated domains.
Additional signals of practical impact would include human-in-the-loop evaluations with investigators, error analyses showing failure modes across cause taxonomy levels, and comparisons using expert relevance labels rather than metadata proxies.

Scoring Rationale

The paper reports substantive, domain-specific retrieval and generation gains using a large, real-world KMST dataset, which is notable for practitioners building vertical RAG systems, but it is not a frontier-model or broadly generalizable release.
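
Illustrative sketch: For readers unfamiliar with the fusion step named under Technical details, the Python snippet below shows how Reciprocal Rank Fusion can merge several ranked lists, such as sparse and dense rankings computed separately over the Summary and Causes fields. It is a minimal sketch and not the authors' implementation. The function name rrf_fuse, the case IDs, the choice of fields, and the constant k=60 (a commonly used default for RRF) are all assumptions made for illustration; the submission's abstract does not specify them.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. k=60 is an assumed default, not a value
    taken from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical per-field rankings for one query (IDs are invented).
sparse_summary = ["case_A", "case_B", "case_C"]  # e.g. BM25 over the Summary field
dense_summary = ["case_B", "case_D", "case_A"]   # embedding search over Summary
sparse_causes = ["case_B", "case_A", "case_E"]   # BM25 over the Causes field
dense_causes = ["case_D", "case_B", "case_A"]    # embedding search over Causes

fused = rrf_fuse([sparse_summary, dense_summary, sparse_causes, dense_causes])
print(fused)
```

Because RRF uses only rank positions, it needs no calibration between BM25 scores and embedding similarities, which is one reason it is a common choice when combining heterogeneous retrievers.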