Fine-Tuning Vision-Language Models for Understanding Current Damage and Scoring Priority with Quality Guard Agent

Researchers in Japan fine-tuned the LLaVA-1.5-7B vision-language model with QLoRA on up to 4,000 paired bridge damage images and inspection text records to automate damage assessment and repair priority scoring, addressing the significant inter-rater variability in the country's mandatory five-year visual inspections. In a progressive training study, 2,000 training samples reached near-optimal validation loss in 2.9 hours, and test-set semantic similarity peaked at 3,000 samples before declining at 4,000, which the authors take as evidence that quality-curated mid-scale data outperforms larger, noisier corpora. Combining torch.compile() with batched inference (batch size 8) cut inference time by 70.2% to 10.06 seconds per image. Separately, a two-stage Quality Guard built on a fine-tuned Swallow-8B small language model rejects low-quality VLM outputs before priority scoring.
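The reported figures can be cross-checked with a little arithmetic. The sketch below derives the unoptimized baseline implied by a 70.2% reduction to 10.06 seconds, and the size of the semantic similarity drop from 3,000 to 4,000 samples (0.6909 to 0.6739, as given in the abstract). The derived values are not stated in the source; they follow from the rounded published numbers and are therefore approximate.

```python
# Figures reported in the abstract.
optimized_s = 10.06   # seconds per image after optimization
reduction = 0.702     # fractional reduction versus the unoptimized baseline

# optimized = baseline * (1 - reduction)  =>  baseline = optimized / (1 - reduction)
implied_baseline_s = optimized_s / (1 - reduction)
implied_speedup = implied_baseline_s / optimized_s

# Semantic similarity on the held-out test set at 3k and 4k training samples.
sim_3k, sim_4k = 0.6909, 0.6739
abs_drop = sim_3k - sim_4k
rel_drop = abs_drop / sim_3k

print(f"implied baseline: {implied_baseline_s:.2f} s/image")
print(f"implied speedup:  {implied_speedup:.2f}x")
print(f"similarity drop:  {abs_drop:.4f} absolute, {rel_drop:.1%} relative")
```

Under these numbers the implied baseline is roughly 33.76 seconds per image (about a 3.36x speedup), and the similarity drop at 4,000 samples is 0.017 absolute, or about 2.5% relative.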

Published May 28, 2026 · 1 min read

arXiv:2605.27452v1 (announce type: new)

Abstract: Bridge inspection in Japan requires mandatory visual assessments every five years, yet qualitative damage ratings (levels a-e) assigned by different engineers exhibit significant inter-rater variability, a critical barrier to consistent infrastructure management. The aging of skilled engineers further threatens inspection capacity. This paper presents a methodology for automating bridge damage understanding and repair priority scoring using fine-tuned Vision-Language Models (VLMs).

We fine-tune LLaVA-1.5-7B with QLoRA on up to 4,000 paired bridge damage images and inspection text records, then evaluate on a fixed test set of 800 images. The model outputs natural language descriptions identifying structural members and damage patterns, from which a rule-based scoring engine calculates a five-level repair priority index.

A progressive training study (1k/2k/3k/4k samples) reveals that 2k training samples achieve near-optimal validation loss in only 2.9 hours of training; beyond 2k, validation loss improves by no more than 0.2% per doubling of training samples, exhibiting clear diminishing returns. Furthermore, semantic similarity on the held-out test set peaks at 3k (0.6909) and degrades at 4k (0.6739), indicating that quality-curated mid-scale data outperforms larger but noisier corpora. Inference optimization combining torch.compile() and batch processing (batch_size=8) achieves 10.06 seconds per image, a 70.2% reduction over the unoptimized baseline.

Our approach contributes to data governance in bridge inspection, reduces inter-rater variability, and provides AI-assisted triage to augment expert engineers in inspection workflows. Furthermore, we introduce a two-stage Quality Guard using a fine-tuned Swallow-8B SLM to reject low-quality VLM outputs before priority scoring, preventing spurious scores from damaged or unrecognised images.
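To make the guard-then-score flow concrete, here is a minimal Python sketch of how a quality gate might sit in front of a rule-based scorer. It is not the authors' implementation: the abstract does not publish the scoring rules or say what the two guard stages are. The keyword tables, the weights, the choice of 5 as the highest priority, the cheap format check as stage one, and the `judge` callable standing in for the Swallow-8B model are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical keyword tables: the abstract does not publish the actual rules.
MEMBER_WEIGHT = {"main girder": 3, "bearing": 3, "deck slab": 2, "pier": 2, "railing": 1}
DAMAGE_WEIGHT = {"exposed rebar": 3, "corrosion": 2, "crack": 2, "spalling": 2, "leakage": 1}


@dataclass
class Assessment:
    accepted: bool
    priority: Optional[int]  # assumed scale: 1 (lowest) to 5 (highest); None if rejected
    reason: str


def passes_format_check(description: str) -> bool:
    """Stage 1 (assumed): require a known member and a known damage type."""
    text = description.lower()
    return any(m in text for m in MEMBER_WEIGHT) and any(d in text for d in DAMAGE_WEIGHT)


def score_priority(description: str) -> int:
    """Map a description to a 1-5 index from the heaviest keywords it contains."""
    text = description.lower()
    member = max((w for k, w in MEMBER_WEIGHT.items() if k in text), default=0)
    damage = max((w for k, w in DAMAGE_WEIGHT.items() if k in text), default=0)
    return max(1, min(5, member + damage - 1))


def assess(description: str, judge: Callable[[str], bool]) -> Assessment:
    """Run both guard stages and score only the descriptions that pass."""
    if not passes_format_check(description):
        return Assessment(False, None, "rejected: no recognisable member or damage")
    if not judge(description):  # Stage 2 (assumed): stand-in for the SLM judge
        return Assessment(False, None, "rejected: judged low quality")
    return Assessment(True, score_priority(description), "scored")


def toy_judge(description: str) -> bool:
    """Placeholder judge: accepts descriptions of at least five words."""
    return len(description.split()) >= 5


good = assess("Exposed rebar and corrosion on the main girder near the support.", toy_judge)
bad = assess("The image is too blurry to describe.", toy_judge)
print(good)
print(bad)
```

With these toy rules the first description is accepted and scored, while the second is rejected at stage one and never receives a priority, which is the behaviour the abstract describes as preventing spurious scores. In a real system the `judge` argument would wrap a call to the fine-tuned guard model.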

source & further reading

Read original on arxiv.org → arxiv.org/abs/2605.27452
