{"slug": "building-a-self-verifying-ftir-agent-with-qwen-function-calling", "title": "Building a Self-Verifying FTIR Agent with Qwen Function Calling", "summary": "A developer built ChemSpectra Agent, an FTIR spectral analysis system using Qwen-3.7-Max function calling, for the Qwen Cloud Hackathon. The agent autonomously selects from five analysis tools, cross-validates evidence, and triggers self-verification when confidence is low or conflicts are detected. The system demonstrates an agent that checks its own work, catching errors that single-pass analysis misses.", "body_md": "*Built for Track 4: Autopilot Agent (#QwenCloudHackathon)*\n\nMost AI \"agents\" are API wrappers with a system prompt. Upload data, call one endpoint, return the result. No verification, no reasoning about what went wrong, no ability to self-correct.\n\nFor the Qwen Cloud Hackathon, I built **ChemSpectra Agent**, an FTIR spectral analysis system where Qwen-3.7-Max autonomously selects tools, cross-validates evidence across multiple results, and triggers self-verification when confidence is low or the evidence conflicts. The key insight: an agent that checks its own work catches errors that single-pass analysis misses.\n\nThe agent has access to 5 analysis tools, each hitting a different endpoint of the FTIR.fun spectral library (130,000+ reference spectra):\n\n| Tool | Purpose |\n|---|---|\n| `identify_material` | Match spectrum against reference library, return ranked candidates |\n| `explain_peaks` | Explain what chemical bond vibration each peak represents |\n| `assign_functional_groups` | Map peaks to functional groups (C=O, O-H, N-H, etc.) 
|\n| `match_library_topk` | Rapid top-K screening without deep analysis |\n| `search_public_results` | Search publicly shared analysis cases (via MCP) |\n\nInstead of hardcoding which tools to call, I define these as Qwen Function Calling schemas and let the model decide:\n\n```python\nfrom dashscope import Generation\n\nAGENT_TOOLS = [\n    {\n        \"type\": \"function\",\n        \"function\": {\n            \"name\": \"identify_material\",\n            \"description\": \"Match spectrum against 130,000+ reference spectra...\",\n            \"parameters\": {\n                \"type\": \"object\",\n                \"properties\": {\n                    \"top_k\": {\"type\": \"integer\", \"default\": 10},\n                    \"sample_type\": {\"type\": \"string\"},\n                },\n            },\n        },\n    },\n    # ... 4 more tools\n]\n\nresponse = Generation.call(\n    api_key=DASHSCOPE_API_KEY,\n    model=\"qwen3.7-max\",\n    messages=messages,\n    tools=AGENT_TOOLS,       # Qwen decides which to call\n    result_format=\"message\",\n)\n```\n\nThe result: different questions trigger different tool combinations. \"What is this material?\" → `identify_material` + `explain_peaks`. \"Deformulate this sample\" → all three analytical tools. \"Quick screening\" → just `match_library_topk`. The LLM decides, not the developer.\n\nThe agent runs a Think → Act → Observe loop, up to 6 iterations. In each iteration, Qwen's response carries `tool_calls`: which tools to invoke and with what parameters.\n\nIn practice, most analyses complete in 2-3 iterations. Qwen's `enable_thinking=True` mode shows the full chain-of-thought reasoning, so you can see *why* it chose each tool.\n\nAfter the ReAct loop, the agent doesn't just return results. 
It runs two automated checks:\n\n**Confidence estimation**: calculated from match scores, candidate score gaps, and functional group coverage:\n\n```python\ndef _estimate_confidence(self, session):\n    scores = []\n    id_result = session.tool_results.get(\"identify_material\", {})\n    if id_result.get(\"matches\"):\n        top_sim = id_result[\"matches\"][0].get(\"similarity\", 0)\n        scores.append(top_sim)\n        if len(id_result[\"matches\"]) >= 2:\n            gap = top_sim - id_result[\"matches\"][1].get(\"similarity\", 0)\n            scores.append(min(1.0, gap * 5))  # larger gap = more confident\n    # ... more signals from other tools\n```\n\n**Evidence conflict detection**: compares outputs across tools. If `identify_material` says \"PET\" but `assign_functional_groups` found no ester groups, that's a contradiction:\n\n```python\nexpected_groups = {\n    \"pet\": [\"ester\", \"c=o\", \"aromatic\"],\n    \"nylon\": [\"amide\", \"n-h\", \"c=o\"],\n    \"polyethylene\": [\"c-h\", \"ch2\", \"methylene\"],\n    \"silicone\": [\"si-o\", \"si-c\", \"siloxane\"],\n}\n# If 2+ expected groups are missing → conflict\n```\n\nWhen confidence < 0.75 or conflicts are detected, the agent automatically triggers a verification round. Qwen is told exactly what went wrong:\n\n```\nISSUES DETECTED:\n- functional_group_mismatch: material=\"pet\", missing=[\"ester\", \"aromatic\"]\n- low_confidence: 0.62 (threshold: 0.75)\n```\n\nQwen then autonomously calls additional tools to investigate. After verification, confidence is recalculated. 
In testing, I've seen confidence traces like `[0.62, 0.84]`, a 35% relative improvement (0.62 to 0.84) from one verification round.\n\nWhen Qwen's structured JSON output fails to parse (it happens: LLMs sometimes wrap JSON in markdown code blocks), the error and original output are sent back to Qwen with context:\n\n```python\nrepair_messages = messages + [\n    {\"role\": \"assistant\", \"content\": raw},\n    {\"role\": \"user\", \"content\": f\"Parse error: {raw[:200]!r}\\nReturn ONLY valid JSON.\"},\n]\nraw_retry = self._call_qwen(repair_messages)\n```\n\nThe recovery rate is near 100%, with no silent failures.\n\nIn regulated industries (pharmaceutical QC under FDA 21 CFR Part 11, forensic substance identification, environmental contaminant detection), an AI that returns wrong results without flagging uncertainty is dangerous. ChemSpectra Agent's self-verification turns \"AI that gives answers\" into \"AI that checks its work.\" The confidence trace provides an audit trail that fits existing compliance frameworks.\n\nAll LLM reasoning (tool selection, synthesis, verification, self-repair, follow-up chat, report generation) runs through Alibaba Cloud's `dashscope` SDK with `qwen3.7-max`. 
Six distinct call sites, one provider.", "url": "https://wpnews.pro/news/building-a-self-verifying-ftir-agent-with-qwen-function-calling", "canonical_source": "https://dev.to/bob_lee_2d29b22cee96eb65f/building-a-self-verifying-ftir-agent-with-qwen-function-calling-5he9", "published_at": "2026-06-26 04:40:16+00:00", "updated_at": "2026-06-26 05:33:49.664899+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-agents", "ai-tools", "developer-tools"], "entities": ["Qwen", "Qwen-3.7-Max", "ChemSpectra Agent", "FTIR.fun", "Qwen Cloud Hackathon"], "alternates": {"html": "https://wpnews.pro/news/building-a-self-verifying-ftir-agent-with-qwen-function-calling", "markdown": "https://wpnews.pro/news/building-a-self-verifying-ftir-agent-with-qwen-function-calling.md", "text": "https://wpnews.pro/news/building-a-self-verifying-ftir-agent-with-qwen-function-calling.txt", "jsonld": "https://wpnews.pro/news/building-a-self-verifying-ftir-agent-with-qwen-function-calling.jsonld"}}