Building a Self-Verifying FTIR Agent with Qwen Function Calling

[ARTICLE · art-40365] src=dev.to ↗ pub=2026-06-26T04:40Z topic=artificial-intelligence verified=true sentiment=↑ positive

A developer built ChemSpectra Agent, an FTIR spectral analysis system using Qwen-3.7-Max function calling, for the Qwen Cloud Hackathon. The agent autonomously selects from five analysis tools, cross-validates evidence, and triggers self-verification when confidence is low or conflicts are detected. The system demonstrates an agent that checks its own work, catching errors that single-pass analysis misses.

3 min read · published Jun 26, 2026

Built for Track 4: Autopilot Agent — #QwenCloudHackathon

Most AI "agents" are API wrappers with a system prompt. Upload data, call one endpoint, return the result. No verification, no reasoning about what went wrong, no ability to self-correct.

For the Qwen Cloud Hackathon, I built ChemSpectra Agent — an FTIR spectral analysis system where Qwen-3.7-Max autonomously selects tools, cross-validates evidence across multiple results, and triggers self-verification when confidence is low. The key insight: an agent that checks its own work catches errors that single-pass analysis misses.

The agent has access to 5 analysis tools, each hitting a different endpoint of the FTIR.fun spectral library (130,000+ reference spectra):

Tool	Purpose
`identify_material`	Match spectrum against reference library, return ranked candidates
`explain_peaks`	Explain what chemical bond vibration each peak represents
`assign_functional_groups`	Map peaks to functional groups (C=O, O-H, N-H, etc.)
`match_library_topk`	Rapid top-K screening without deep analysis
`search_public_results`	Search publicly shared analysis cases (via MCP)

Instead of hardcoding which tools to call, I define these as Qwen Function Calling schemas and let the model decide:

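import os

from dashscope import Generation  # Alibaba Cloud DashScope SDK

# Read the key from the environment rather than hardcoding it.
DASHSCOPE_API_KEY = os.environ["DASHSCOPE_API_KEY"]

# Tool schemas in Qwen function-calling format (one of the five tools shown).
AGENT_TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "identify_material",
            "description": "Match spectrum against 130,000+ reference spectra...",
            "parameters": {
                "type": "object",
                "properties": {
                    "top_k": {"type": "integer", "default": 10},
                    "sample_type": {"type": "string"},
                },
            },
        },
    },
]

# Example user turn; replace with the real conversation history.
messages = [{"role": "user", "content": "What is this material?"}]

response = Generation.call(
    api_key=DASHSCOPE_API_KEY,
    model="qwen3.7-max",
    messages=messages,
    tools=AGENT_TOOLS,       # Qwen decides which to call
    result_format="message",
)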

The result: different questions trigger different tool combinations. "What is this material?" → `identify_material` and `explain_peaks`. "Deformulate this sample" → all three analytical tools. "Quick screening" → just `match_library_topk`. The LLM decides, not the developer.
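Once the model has picked a tool, the application still has to route that choice to real code. A minimal sketch of that dispatch step follows. It assumes the tool call arrives as a dict with a `function` entry holding a `name` and a JSON-encoded `arguments` string, and it uses hypothetical stub handlers in place of the FTIR.fun endpoints; `TOOL_HANDLERS` and `dispatch_tool_call` are illustrative names, not taken from the project.

```python
import json


# Hypothetical stand-ins for the real FTIR.fun-backed tools.
def identify_material(top_k=10, sample_type=None):
    return {"tool": "identify_material", "top_k": top_k, "sample_type": sample_type}


def match_library_topk(top_k=5):
    return {"tool": "match_library_topk", "top_k": top_k}


# Map tool names from the schema to the Python callables that implement them.
TOOL_HANDLERS = {
    "identify_material": identify_material,
    "match_library_topk": match_library_topk,
}


def dispatch_tool_call(tool_call):
    """Route one tool call to its handler and return the result or an error dict."""
    fn = tool_call["function"]
    name = fn["name"]
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    try:
        # Assumed: arguments are a JSON string; an empty value means "no arguments".
        args = json.loads(fn.get("arguments") or "{}")
        return handler(**args)
    except (json.JSONDecodeError, TypeError) as exc:
        # Report malformed or unexpected arguments instead of crashing the loop.
        return {"error": f"bad arguments for {name}: {exc}"}


result = dispatch_tool_call(
    {"function": {"name": "identify_material", "arguments": '{"top_k": 3}'}}
)
```

Returning an error dict rather than raising lets the failure be fed back to the model as an observation, so it can retry with corrected arguments.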

The agent runs a Think → Act → Observe loop, up to 6 iterations:

`tool_calls`: which tools to invoke and with what parameters

In practice, most analyses complete in 2-3 iterations. Qwen's `enable_thinking=True` mode shows the full chain-of-thought reasoning, so you can see why it chose each tool.
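The bounded Think → Act → Observe loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `call_model` and `run_tool` are hypothetical callables, the message shape is simplified (each tool call carries `name` and an already-parsed `arguments` dict), and `scripted_model` is a fake model used only to make the example runnable offline.

```python
MAX_ITERATIONS = 6  # iteration cap mentioned in the article


def run_agent_loop(call_model, run_tool, messages, max_iterations=MAX_ITERATIONS):
    """Loop until the model stops requesting tools or the cap is reached."""
    tool_results = {}
    for iteration in range(1, max_iterations + 1):
        reply = call_model(messages)                      # Think
        messages.append(reply)
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:
            # No more tools requested: treat the reply as the final answer.
            return reply.get("content", ""), tool_results, iteration
        for call in tool_calls:                           # Act
            name = call["name"]
            result = run_tool(name, call.get("arguments", {}))
            tool_results[name] = result
            # Observe: feed the tool output back for the next iteration.
            messages.append({"role": "tool", "name": name, "content": str(result)})
    return None, tool_results, max_iterations            # cap hit, no final answer


def scripted_model(replies):
    """Fake model that returns canned replies in order (for demonstration only)."""
    remaining = iter(replies)
    return lambda messages: next(remaining)


model = scripted_model([
    {"role": "assistant", "content": "",
     "tool_calls": [{"name": "identify_material", "arguments": {"top_k": 3}}]},
    {"role": "assistant", "content": "Likely PET."},
])
answer, results, iterations = run_agent_loop(
    model,
    lambda name, args: {"called": name, **args},
    [{"role": "user", "content": "What is this material?"}],
)
```

Returning `None` when the cap is hit makes the "ran out of iterations" case explicit, so the caller can decide whether to report a partial result.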

After the ReAct loop, the agent doesn't just return results. It runs two automated checks:

Confidence estimation — calculated from match scores, candidate score gaps, and functional group coverage:

def _estimate_confidence(self, session):
    scores = []
    id_result = session.tool_results.get("identify_material", {})
    if id_result.get("matches"):
        top_sim = id_result["matches"][0].get("similarity", 0)
        scores.append(top_sim)
        if len(id_result["matches"]) >= 2:
            gap = top_sim - id_result["matches"][1].get("similarity", 0)
            scores.append(min(1.0, gap * 5))  # larger gap = more confident
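The excerpt above stops before the signals are combined, and the source does not show how functional group coverage enters the score. One way to complete it, as a sketch only: average the available signals, with coverage taken as the fraction of expected groups that were found. The averaging rule, the `groups` field name, and the standalone `estimate_confidence` signature are all assumptions made for illustration.

```python
def estimate_confidence(tool_results, expected_groups=None):
    """Combine match score, score gap, and group coverage into a 0..1 value."""
    signals = []

    matches = tool_results.get("identify_material", {}).get("matches") or []
    if matches:
        top_sim = matches[0].get("similarity", 0.0)
        signals.append(top_sim)
        if len(matches) >= 2:
            gap = top_sim - matches[1].get("similarity", 0.0)
            # Same scaling as the excerpt, clamped to [0, 1].
            signals.append(min(1.0, max(0.0, gap * 5)))

    # Assumed field: "groups" is a list of functional group names.
    found = tool_results.get("assign_functional_groups", {}).get("groups")
    if expected_groups and found is not None:
        found_set = {g.lower() for g in found}
        covered = sum(1 for g in expected_groups if g in found_set)
        signals.append(covered / len(expected_groups))

    if not signals:
        return 0.0  # nothing to base a score on
    # Assumed aggregation: plain average of whatever signals are available.
    return round(sum(signals) / len(signals), 3)


confidence = estimate_confidence(
    {
        "identify_material": {"matches": [{"similarity": 0.9}, {"similarity": 0.8}]},
        "assign_functional_groups": {"groups": ["C=O"]},
    },
    expected_groups=["ester", "c=o", "aromatic"],
)
```

With these inputs the three signals are the top similarity (0.9), the scaled gap (about 0.5), and one of three expected groups covered, so the average falls below the 0.75 threshold the article uses.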

Evidence conflict detection: compares outputs across tools. If `identify_material` says "PET" but `assign_functional_groups` found no ester groups, that's a contradiction:

expected_groups = {
    "pet": ["ester", "c=o", "aromatic"],
    "nylon": ["amide", "n-h", "c=o"],
    "polyethylene": ["c-h", "ch2", "methylene"],
    "silicone": ["si-o", "si-c", "siloxane"],
}
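A lookup table like this can drive a simple cross-check. The sketch below assumes a conflict is raised whenever any expected group is absent, and that group names are compared case-insensitively by exact match; the source does not state the real matching rule, and `find_group_conflicts` is a hypothetical helper name.

```python
EXPECTED_GROUPS = {
    "pet": ["ester", "c=o", "aromatic"],
    "nylon": ["amide", "n-h", "c=o"],
    "polyethylene": ["c-h", "ch2", "methylene"],
    "silicone": ["si-o", "si-c", "siloxane"],
}


def find_group_conflicts(material, found_groups, expected_groups=EXPECTED_GROUPS):
    """Return a list of mismatch issues between the identified material and its groups."""
    key = material.strip().lower()
    expected = expected_groups.get(key)
    if expected is None:
        return []  # no rule for this material, so nothing to contradict
    found = {g.strip().lower() for g in found_groups}
    missing = [g for g in expected if g not in found]
    if not missing:
        return []
    # Assumed rule: any missing expected group counts as a conflict.
    return [{"type": "functional_group_mismatch", "material": key, "missing": missing}]


conflicts = find_group_conflicts("PET", ["C=O", "C-H"])
```

For entries that list synonyms (for example `ch2` and `methylene`), a stricter "all must be present" rule would over-report, so a real implementation would likely need alias handling.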

When confidence < 0.75 or conflicts are detected, the agent automatically triggers a verification round. Qwen is told exactly what went wrong:

ISSUES DETECTED:
- functional_group_mismatch: material="pet", missing=["ester", "aromatic"]
- low_confidence: 0.62 (threshold: 0.75)

Qwen then autonomously calls additional tools to investigate. After verification, confidence is recalculated. In testing, I've seen confidence traces like [0.62, 0.84], a relative improvement of about 35% from one verification round.
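The trigger condition and the issue report sent to Qwen can be sketched as below. The 0.75 threshold and the report layout come from the article; the helper names (`collect_issues`, `format_issues`) and the issue dict shape are assumptions for illustration.

```python
import json

CONFIDENCE_THRESHOLD = 0.75  # threshold stated in the article


def collect_issues(confidence, conflicts, threshold=CONFIDENCE_THRESHOLD):
    """Gather everything that should trigger a verification round."""
    issues = list(conflicts)
    if confidence < threshold:
        issues.append(
            {"type": "low_confidence", "value": confidence, "threshold": threshold}
        )
    return issues


def format_issues(issues):
    """Render issues in the 'ISSUES DETECTED' layout shown above."""
    lines = ["ISSUES DETECTED:"]
    for issue in issues:
        if issue["type"] == "low_confidence":
            lines.append(
                f"- low_confidence: {issue['value']:.2f} (threshold: {issue['threshold']})"
            )
        else:
            # json.dumps gives double-quoted strings and lists, matching the report style.
            detail = ", ".join(
                f"{key}={json.dumps(value)}"
                for key, value in issue.items()
                if key != "type"
            )
            lines.append(f"- {issue['type']}: {detail}")
    return "\n".join(lines)


issues = collect_issues(
    0.62,
    [{"type": "functional_group_mismatch", "material": "pet",
      "missing": ["ester", "aromatic"]}],
)
report = format_issues(issues)
needs_verification = bool(issues)  # an empty list means no verification round
```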

When Qwen's structured JSON output fails to parse (it happens — LLMs sometimes wrap JSON in markdown code blocks), the error and original output are sent back to Qwen with context:

repair_messages = messages + [
    {"role": "assistant", "content": raw},
    {"role": "user", "content": f"Parse error: {raw[:200]!r}\nReturn ONLY valid JSON."},
]
raw_retry = self._call_qwen(repair_messages)
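A fuller version of this repair step might first strip a markdown code fence locally and only fall back to the model when parsing still fails. The sketch below makes several assumptions: `call_model` is a hypothetical callable standing in for `self._call_qwen`, a single retry is allowed by default, and the actual parser error message is sent back rather than a slice of the raw output.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal fence
FENCE_RE = re.compile(
    r"^\s*" + FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE + r"\s*$",
    re.DOTALL | re.IGNORECASE,
)


def strip_code_fence(raw):
    """Remove a surrounding markdown code fence if the whole reply is wrapped in one."""
    match = FENCE_RE.match(raw)
    return match.group(1) if match else raw


def parse_with_repair(raw, messages, call_model, max_retries=1):
    """Parse JSON, asking the model to repair its own output on failure."""
    for attempt in range(max_retries + 1):
        try:
            return json.loads(strip_code_fence(raw))
        except json.JSONDecodeError as exc:
            if attempt == max_retries:
                raise  # surface the failure instead of returning a bad result
            # Send the original output and the parser error back for repair.
            messages = messages + [
                {"role": "assistant", "content": raw},
                {"role": "user",
                 "content": f"Parse error: {exc}\nReturn ONLY valid JSON."},
            ]
            raw = call_model(messages)


fenced = FENCE + 'json\n{"material": "PET"}\n' + FENCE
parsed = parse_with_repair(fenced, [], call_model=lambda msgs: "")
repaired = parse_with_repair("not json", [], call_model=lambda msgs: '{"ok": true}')
```

Re-raising after the last retry keeps the "no silent failures" property: the caller either gets parsed JSON or an exception.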

Near-100% recovery rate. No silent failures.

In regulated industries — pharmaceutical QC under FDA 21 CFR Part 11, forensic substance identification, environmental contaminant detection — an AI that returns wrong results without flagging uncertainty is dangerous. ChemSpectra Agent's self-verification turns "AI that gives answers" into "AI that checks its work." The confidence trace provides an audit trail that fits existing compliance frameworks.

All LLM reasoning (tool selection, synthesis, verification, self-repair, follow-up chat, report generation) runs through Alibaba Cloud's `dashscope` SDK with `qwen3.7-max`. Six distinct call sites, one provider.
