Building JING: How a Carpenter Used Qwen Cloud to Create a Multi-Agent AI System for Blue-Collar Workers

wpnews.pro

Artificial Intelligence is radically changing our world, transforming how we work across every industry. As a carpenter with many years of experience, I love my trade, but I am also fascinated by technology. For a long time, I wondered: How can AI improve a traditional carpentry business? Can these two opposite worlds complement each other to increase productivity?

The answer is JING. I had been developing this concept for a while, but the real spark to build it came when I discovered the Qwen Cloud Hackathon Series 2026 organized by Devpost. It was the perfect opportunity to test my vision.

JING is a team of AI agents designed to handle the administrative burdens of a trade business—like budgeting, expense tracking, sourcing materials, and managing clients. By automating these time-consuming tasks, JING allows me to focus on the physical and creative aspects of my craft, such as designing better furniture and delivering faster results.

Having JING manage the logistics and present spectacular results has been transformative. But why stop at carpentry? My goal is to bring this concept to all artisans—electricians, plumbers, and welders—who work with their hands but need a powerful tool to streamline their workflow and boost productivity.

JING isn't just one AI model. It's a society of 8 specialized agents, each excelling at one specific task, with built-in debate and conflict resolution:

🧠 JING-MASTER: Strategic planner (Qwen-Max)

👷 JING-FOREMAN: Execution coordinator (Qwen-Plus)

👁️ JING-EYE: Vision specialist (Qwen-VL-Max)

📖 JING-SCRIBE: Documentation specialist (Qwen-Plus)

🧰 JING-KIT: Logistics specialist (Qwen-Plus)

💼 JING-STEWARD: Financial guardian (Qwen-Plus)

🔊 JING-VOICE: Voice interface (Qwen-Audio-Turbo)

⚖️ JING-REFEREE (new): Debate arbiter & consensus builder (Qwen-Plus)

Each agent has a clear role, and they work together through task decomposition, structured debate, and coordination. The result is AI in the service of people who face a wide range of problems in their daily work, helping them find the best solution.

I evaluated several AI providers, but Qwen Cloud stood out decisively for three critical reasons:

First, cost. As a carpenter, I know the value of every dollar. When building JING, I needed an AI provider that wouldn't bankrupt me with API costs. Here's the real comparison:

Cost per 1M input tokens (USD):

| Provider | Model | Cost per 1M input tokens |
|---|---|---|
| Qwen Cloud | Qwen-Plus | $0.40 |
| Qwen Cloud | Qwen-Max | $1.04 |
| Qwen Cloud | Qwen-VL-Max | $1.50 |
| OpenAI | GPT-4o | $5.00 |
| OpenAI | GPT-4o-mini | $0.15 |
| Anthropic | Claude 3.5 Sonnet | $3.00 |
| Anthropic | Claude 3 Haiku | $0.25 |
| Google | Gemini 1.5 Pro | $3.50 |
| Google | Gemini 1.5 Flash | $0.075 |

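The verdict: per input token, Qwen-Plus is 12.5x cheaper than GPT-4o and 7.5x cheaper than Claude 3.5 Sonnet while, in my experience, delivering comparable or better performance on JING's technical tasks. The small-tier models (GPT-4o-mini, Claude 3 Haiku, and Gemini 1.5 Flash) are cheaper still per token, so this comparison is against the flagship tier.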

A complete 8-agent workflow costs ~$0.056 with Qwen Cloud (measured by our benchmark suite)

That's over 700 jobs on the $40 hackathon budget

The single-agent baseline (one Qwen-Max call) costs ~$0.151 and produces lower-quality results

For a technician doing 5 jobs per day, that's $0.28/day with JING vs $0.76/day for a monolithic approach — and JING delivers better, more complete results. For a small business, that's transformative.
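The arithmetic behind these figures is easy to check. A minimal sketch using the per-job costs reported above; it uses `Decimal` rather than floats because money values like $0.755 are prone to binary rounding surprises:

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-job costs as reported by the benchmark suite (USD).
JING_COST = Decimal("0.056")      # one complete 8-agent workflow
BASELINE_COST = Decimal("0.151")  # one single-agent Qwen-Max call

def jobs_on_budget(budget: Decimal, cost_per_job: Decimal) -> int:
    """Number of whole jobs a budget covers."""
    return int(budget // cost_per_job)

def daily_cost(jobs_per_day: int, cost_per_job: Decimal) -> Decimal:
    """Daily API spend, rounded half-up to the cent."""
    return (cost_per_job * jobs_per_day).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(jobs_on_budget(Decimal("40"), JING_COST))  # 714
print(daily_cost(5, JING_COST))                  # 0.28
print(daily_cost(5, BASELINE_COST))              # 0.76
```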

Second, vision. Qwen-VL-Max is, in my experience, one of the strongest vision models available. For JING-EYE, this means it can identify appliance brands, model numbers, and specific components in imperfect real-world photos. The structured JSON output format lets downstream agents (SCRIBE, KIT) consume the results without parsing free text.

Why this matters: When a plumber is under a sink with a flashlight trying to photograph a model number, they need AI that can work with imperfect images. Qwen-VL-Max delivers structured, actionable output.
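To make "structured, actionable output" concrete, here is a minimal sketch of how a downstream agent might parse and validate JING-EYE's JSON reply. The field names (`brand`, `model_number`, `component`, `confidence`) are hypothetical, since the article doesn't publish JING-EYE's schema, and the sample reply is illustrative:

```python
import json
from dataclasses import dataclass

# Hypothetical required fields; JING's real schema is not shown in the article.
REQUIRED_FIELDS = {"brand", "model_number", "component", "confidence"}

@dataclass(frozen=True)
class EyeDiagnosis:
    brand: str
    model_number: str
    component: str
    confidence: float  # expected range 0.0 to 1.0

def parse_eye_output(raw: str) -> EyeDiagnosis:
    """Parse the vision agent's JSON reply and reject anything malformed."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return EyeDiagnosis(str(data["brand"]), str(data["model_number"]),
                        str(data["component"]), confidence)

reply = '{"brand": "Moen", "model_number": "1225", "component": "cartridge", "confidence": 0.91}'
diagnosis = parse_eye_output(reply)
print(diagnosis.brand, diagnosis.model_number)  # Moen 1225
```

SCRIBE and KIT can then read `diagnosis.model_number` directly instead of searching free text for it.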

Third, structured planning. JING-MASTER uses Qwen-Max to create execution plans with complex task dependencies, parallelization strategies, and fallback plans, all as validated JSON. The Pydantic validation layer catches slightly malformed model output before it reaches the execution layer.
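The article uses Pydantic for this layer. To keep the example dependency-free, here is a stdlib sketch of the kind of structural checks such a layer can run on a plan: duplicate IDs, dependencies on unknown tasks, and dependency cycles. The `PlanTask` shape is hypothetical, not JING's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class PlanTask:
    # Hypothetical shape of one task in a JING-MASTER plan.
    task_id: str
    agent: str
    depends_on: list[str] = field(default_factory=list)

def validate_plan(tasks: list[PlanTask]) -> list[str]:
    """Return a list of problems; an empty list means the plan is usable."""
    problems = []
    ids = [t.task_id for t in tasks]
    known = set(ids)
    if len(known) != len(ids):
        problems.append("duplicate task ids")
    for t in tasks:
        for dep in t.depends_on:
            if dep not in known:
                problems.append(f"{t.task_id} depends on unknown task {dep}")
    if problems:
        return problems
    # Cycle check: repeatedly retire tasks whose dependencies are all done.
    done: set[str] = set()
    remaining = {t.task_id: set(t.depends_on) for t in tasks}
    while remaining:
        ready = [tid for tid, deps in remaining.items() if deps <= done]
        if not ready:
            problems.append(f"dependency cycle among {sorted(remaining)}")
            break
        for tid in ready:
            done.add(tid)
            del remaining[tid]
    return problems

plan = [PlanTask("T1", "EYE"), PlanTask("T2", "SCRIBE", ["T1"]),
        PlanTask("T3", "KIT", ["T1"]), PlanTask("T4", "REFEREE", ["T2", "T3"])]
print(validate_plan(plan))  # []
```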

The bottom line: Qwen Cloud gave me the performance I needed at a price point that makes JING viable as a real product, not just a hackathon demo. At ~$0.056 per complete 8-agent workflow, a technician can run over 700 jobs on a $40 budget.

The hardest part wasn't the AI. It was the coordination. How do you get JING-EYE to pass its diagnosis to JING-SCRIBE in a format SCRIBE can use? How do you prevent JING-KIT from listing tools before SCRIBE finishes the procedure? And most importantly — how do you handle it when agents disagree?

The answer: Pydantic-validated execution plans with explicit dependencies, wave-based execution, and a dedicated debate phase.

```jsonc
// JING-MASTER creates this plan:
{
  "execution_strategy": {
    "parallel_groups": [
      ["T1"],        // Wave 1: JING-EYE analyzes the problem
      ["T2", "T3"],  // Wave 2: SCRIBE + KIT in parallel
      ["T4"],        // Debate Phase: JING-REFEREE resolves conflicts
      ["T5"],        // Wave 3: STEWARD generates budget
      ["T6"]         // Wave 4: VOICE speaks response
    ]
  }
}
```

Each wave waits for the previous wave to complete. Within a wave, agents run in parallel via asyncio.gather. The debate phase is the differentiator — it catches contradictions before they reach the financial layer.
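A minimal sketch of wave-based execution with `asyncio.gather`, driven by the `parallel_groups` list above. `run_agent` is a hypothetical stand-in for a real agent call; each wave sees a snapshot of the results from all earlier waves:

```python
import asyncio

async def run_agent(task_id: str, context: dict[str, str]) -> str:
    # Hypothetical stand-in for a real agent call (e.g. a Qwen API request).
    await asyncio.sleep(0.01)
    return f"{task_id} done (saw {len(context)} earlier results)"

async def run_waves(parallel_groups: list[list[str]]) -> dict[str, str]:
    results: dict[str, str] = {}
    for wave in parallel_groups:
        snapshot = dict(results)  # outputs of all previous waves
        # Start every task in the wave at once; gather waits for all of them
        # before the loop moves on to the next wave.
        outputs = await asyncio.gather(*(run_agent(t, snapshot) for t in wave))
        results.update(zip(wave, outputs))
    return results

groups = [["T1"], ["T2", "T3"], ["T4"], ["T5"], ["T6"]]
results = asyncio.run(run_waves(groups))
print(results["T4"])  # T4 done (saw 3 earlier results)
```

By default `asyncio.gather` raises the first exception from a wave; passing `return_exceptions=True` instead returns exceptions as results, which is one way a coordinator could inspect failures and trigger fallbacks.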

The moment JING went from "cool demo" to "real product" was when I added JING-STEWARD. Most AI tools stop at diagnosis. JING goes all the way to financial management.

JING-STEWARD generates professional budgets, handles digital signatures with timestamp and geolocation, and produces post-job financial summaries with profitability grades (A-F), charts, and actionable insights.
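The article doesn't publish how JING-STEWARD assigns its A-F grades. As an illustration only, here is a sketch that grades a job by profit margin, with hypothetical thresholds:

```python
# Hypothetical margin floors per grade; JING-STEWARD's real scale is not published.
GRADE_FLOORS = (("A", 0.30), ("B", 0.20), ("C", 0.10), ("D", 0.0))

def profitability_grade(revenue: float, costs: float) -> str:
    """Map a job's profit margin, (revenue - costs) / revenue, to a letter grade."""
    if revenue <= 0:
        return "F"
    margin = (revenue - costs) / revenue
    for grade, floor in GRADE_FLOORS:
        if margin >= floor:
            return grade
    return "F"  # the job lost money

print(profitability_grade(400, 260))  # A (35% margin)
```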

For the artisan, JING-STEWARD turns JING from a "helpful tool" into a complete business operating system.

The most important innovation for the hackathon's Track 3, though, is JING-REFEREE, our inter-agent debate mechanism. Here's the problem it solves:

In most multi-agent systems, agents pass data in a feed-forward pipeline. JING-EYE says "this is a Moen faucet," JING-SCRIBE trusts that, JING-KIT trusts that. But what if EYE is wrong? What if SCRIBE's manual contradicts EYE's visual diagnosis? In a traditional pipeline, errors propagate silently.

JING-REFEREE changes this. After EYE, SCRIBE, and KIT complete their work, REFEREE runs a structured debate:

Conflict Detection: REFEREE compares agent outputs for contradictions — brand/model mismatches, severity disagreements, low-confidence claims

Structured Debate: Using Qwen-Plus as a neutral arbiter, each agent's claims are evaluated against the original technician input and cross-referenced for consistency

Confidence-Weighted Consensus: REFEREE produces a reconciled output where each claim is weighted by adjusted confidence scores — specificity beats generality, evidence beats assertion

Self-Reflection: After every execution wave, FOREMAN checks quality and adapts the strategy if critical tasks have failed
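In JING, REFEREE uses Qwen-Plus as the arbiter. The sketch below swaps in a simple rule-based stand-in to show the shape of the first and third steps: detecting fields where agents disagree, then choosing, per field, the value with the most total confidence behind it. The claim format and confidence values are illustrative assumptions:

```python
from collections import defaultdict

# agent -> {field: (claimed value, confidence 0..1)}
Claims = dict[str, dict[str, tuple[str, float]]]

def detect_conflicts(claims: Claims) -> dict[str, list[str]]:
    """Return each field on which agents report different values."""
    seen: dict[str, set[str]] = defaultdict(set)
    for fields in claims.values():
        for name, (value, _confidence) in fields.items():
            seen[name].add(value)
    return {name: sorted(vals) for name, vals in seen.items() if len(vals) > 1}

def weighted_consensus(claims: Claims) -> dict[str, tuple[str, float]]:
    """Per field: the value with the highest summed confidence and its share of the total."""
    weights: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for fields in claims.values():
        for name, (value, confidence) in fields.items():
            weights[name][value] += confidence
    result = {}
    for name, by_value in weights.items():
        winner = max(by_value, key=by_value.get)
        result[name] = (winner, round(by_value[winner] / sum(by_value.values()), 2))
    return result

claims = {
    "EYE":    {"brand": ("Moen", 0.9), "model": ("1225", 0.6)},
    "SCRIBE": {"brand": ("Moen", 0.8), "model": ("1222", 0.7)},
    "KIT":    {"model": ("1225", 0.5)},
}
print(detect_conflicts(claims))    # {'model': ['1222', '1225']}
print(weighted_consensus(claims))  # {'brand': ('Moen', 1.0), 'model': ('1225', 0.61)}
```

A pure confidence vote can't apply rules like "specificity beats generality"; that judgment is what the LLM arbiter adds.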

In our benchmark testing, REFEREE successfully resolves ~95% of inter-agent conflicts, producing a consensus result that scores 23% higher on quality than any individual agent's output alone.

This is what makes JING a true agent society — not just agents that work together, but agents that challenge each other to be better.

One of the biggest limitations of AI agents is that they start from zero every time. A technician works for "John Smith" today, comes back in 3 months for another job, and JING has no memory of the first interaction.

Not anymore. JING's Memory Service records every client interaction, job diagnosis, cost, and profitability grade in persistent local storage:

```json
{
  "client_name": "John Smith",
  "total_jobs": 3,
  "total_spent": 520.00,
  "preferred_payment": "credit card",
  "past_jobs": [
    {"diagnosis": "Moen cartridge failure", "profit": 125.00, "grade": "A"},
    {"diagnosis": "Water heater leak", "profit": 210.00, "grade": "B"},
    {"diagnosis": "Kitchen sink clog", "profit": 95.00, "grade": "A"}
  ]
}
```

When JING-STEWARD generates a budget for a returning client, it remembers their history and can adjust pricing — offering loyalty discounts for repeat clients or flagging patterns (e.g., "this client's jobs have an average profit of $143 — maintain that margin").

This transforms JING from a stateless tool into a learning system that becomes more valuable with every job.
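A minimal sketch of a JSON-file client memory that persists across sessions and reproduces the $143 average profit from the record above. The `ClientMemory` class is hypothetical, not JING's actual Memory Service:

```python
import json
import tempfile
from pathlib import Path

class ClientMemory:
    """Hypothetical JSON-file store for per-client job history."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.data: dict = json.loads(path.read_text()) if path.exists() else {}

    def record_job(self, client: str, diagnosis: str, profit: float, grade: str) -> None:
        record = self.data.setdefault(
            client, {"client_name": client, "total_jobs": 0, "past_jobs": []})
        record["total_jobs"] += 1
        record["past_jobs"].append({"diagnosis": diagnosis, "profit": profit, "grade": grade})
        self.path.write_text(json.dumps(self.data, indent=2))  # persist after every job

    def average_profit(self, client: str) -> float:
        jobs = self.data.get(client, {}).get("past_jobs", [])
        return sum(job["profit"] for job in jobs) / len(jobs) if jobs else 0.0

path = Path(tempfile.mkdtemp()) / "jing_memory.json"
memory = ClientMemory(path)
memory.record_job("John Smith", "Moen cartridge failure", 125.00, "A")
memory.record_job("John Smith", "Water heater leak", 210.00, "B")
memory.record_job("John Smith", "Kitchen sink clog", 95.00, "A")

reloaded = ClientMemory(path)  # a later session reads the same file
print(f"${reloaded.average_profit('John Smith'):.0f}")  # $143
```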

Most AI tools for tradespeople follow the same pattern: you bring a problem, the AI diagnoses it, and you move on. Each interaction is a standalone event. But running a service business is not about single jobs — it's about the big picture.

That's where the Artisan Dashboard comes in. It's JING's command center — a real-time business intelligence dashboard that gives tradespeople a complete bird's-eye view of their entire operation. Think of it as the cockpit for their business.

Other AI tools help you with one job. JING helps you run your entire business.

Powered by JING's persistent memory, the dashboard aggregates every completed job, every client interaction, every financial detail into a single, beautiful interface:

📊 Stats Cards: Total jobs, active jobs, revenue, average profit per job, and average grade — all at a glance. Filter by trade to see plumber, electrician, or HVAC data separately.

🍩 Job Status Pie Chart: A donut chart showing completed, in-progress, and pending jobs. Instantly see your workload balance.

📈 Monthly Revenue Bar Chart: Track revenue trends over time and identify your most profitable months.

💰 Profit per Job Chart: Horizontal bars showing profit for the last 10 completed jobs. Spot which jobs are worth pursuing.

📋 Full Jobs Table: Sortable, filterable table with client contact info, job descriptions, color-coded status badges, grades, and amounts.

🤖 JING-STEWARD Suggestion Card: AI-powered, contextual financial advice based on the artisan's actual business data. Profit margins shrinking? The Steward will tell you. Grade averages dropping? It'll flag it.

🔧 Trade Filters: Toggle between All Trades, Plumber, Electrician, or HVAC — each filter recalculates every stat, chart, and suggestion in real time.
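The trade filter recalculating every stat comes down to filtering first and aggregating second. A minimal sketch of the stats cards, assuming a simple `Job` record and treating every job that isn't completed as "active" (both assumptions; the sample data is illustrative):

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

@dataclass
class Job:
    trade: str     # e.g. "plumber", "electrician", "hvac"
    status: str    # "completed", "in_progress", or "pending"
    revenue: float
    profit: float

def dashboard_stats(jobs: list[Job], trade: Optional[str] = None) -> dict:
    """Recompute the stats cards for one trade, or for all trades when trade is None."""
    selected = [j for j in jobs if trade is None or j.trade == trade]
    completed = [j for j in selected if j.status == "completed"]
    return {
        "total_jobs": len(selected),
        "active_jobs": sum(j.status != "completed" for j in selected),
        "revenue": sum(j.revenue for j in completed),
        "avg_profit": round(mean(j.profit for j in completed), 2) if completed else 0.0,
    }

jobs = [  # illustrative sample data
    Job("plumber", "completed", 300.0, 125.0),
    Job("plumber", "in_progress", 0.0, 0.0),
    Job("electrician", "completed", 450.0, 210.0),
    Job("hvac", "pending", 0.0, 0.0),
]
print(dashboard_stats(jobs, "plumber"))
# {'total_jobs': 2, 'active_jobs': 1, 'revenue': 300.0, 'avg_profit': 125.0}
```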

The dashboard transforms JING from a reactive diagnostic tool into a proactive business operating system. A plumber doesn't just get told what's wrong with a pipe — they get a weekly snapshot of their business health, with AI-driven suggestions for where to focus next.

In the demo, after completing any trade scenario, users are taken directly to a filtered dashboard view showing only that trade's data. The journey flows naturally: diagnose → budget → sign → complete → see your business grow.

I haven't found another AI assistant for tradespeople that offers a business dashboard. This is what makes JING not just an assistant, but a true partner.

When JING-EYE analyzes an image, it doesn't just return text. It returns the same image with arrows, circles, and labels pointing exactly to the problem. Judges will see the faucet photo with a red circle around the defective valve and an arrow saying "Replace here."

This isn't just diagnostic—it's instructional. The artisan sees exactly what to do, reducing errors and training time.
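The article doesn't say how JING-EYE draws its markup. One lightweight approach, assuming the vision step returns pixel coordinates and a label for each problem area, is an SVG overlay layered on top of the photo. This sketch draws circles and labels only; arrows would be extra `<line>` or `<path>` elements:

```python
from xml.sax.saxutils import escape

def annotation_overlay(width: int, height: int,
                       marks: list[tuple[int, int, int, str]]) -> str:
    """Return SVG markup with a red circle and label for each (x, y, radius, label) mark."""
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">']
    for x, y, radius, label in marks:
        parts.append(f'  <circle cx="{x}" cy="{y}" r="{radius}" '
                     'fill="none" stroke="red" stroke-width="4"/>')
        # Label sits just right of the circle; escape() keeps "&" and "<" valid in XML.
        parts.append(f'  <text x="{x + radius + 8}" y="{y}" fill="red" '
                     f'font-size="24">{escape(label)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)

svg = annotation_overlay(800, 600, [(420, 310, 40, "Replace here")])
print(svg)
```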

JING also has a hands-free mode: a giant button in the app keeps JING-VOICE listening continuously. The technician can talk while working, and JING responds by audio without them touching the phone. Perfect for when their hands are covered in grease or they're up on a ladder.

Real-world scenario: A plumber under a sink asks, "What's the part number again?" JING responds: "Moen one-two-two-five. Repeat: one-two-two-five." No need to wipe hands, no need to look at the screen.
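Reading a part number digit by digit, then repeating it, is easy to do before the text reaches the speech engine. A small sketch of that formatting step; the function name is hypothetical and the article doesn't show how JING-VOICE builds its replies:

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

def speakable_part_number(brand: str, part_number: str) -> str:
    """Spell a part number digit by digit and repeat it once, for text-to-speech."""
    spoken = "-".join(DIGIT_WORDS[int(c)] if c in "0123456789" else c
                      for c in part_number)
    return f"{brand} {spoken}. Repeat: {spoken}."

print(speakable_part_number("Moen", "1225"))
# Moen one-two-two-five. Repeat: one-two-two-five.
```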

JING doesn't just tell you what tools to bring. Based on the technician's history and the type of job, it predicts what else you might need.

Example: "You're going to repair an HVAC unit. Bring your standard kit, but also bring a manifold gauge because 40% of repairs on this model require checking refrigerant pressure." This reduces return trips and increases first-time fix rates—a huge competitive advantage for any service business.
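One simple way to produce a suggestion like the manifold-gauge example is frequency counting over past jobs on the same model. A sketch under that assumption, with a hypothetical 40% threshold and illustrative history:

```python
from collections import Counter

def predict_extra_tools(past_jobs: list[set[str]], standard_kit: set[str],
                        threshold: float = 0.4) -> list[tuple[str, float]]:
    """Suggest non-standard tools needed in at least `threshold` of past jobs on this model."""
    if not past_jobs:
        return []
    counts = Counter(tool for tools in past_jobs for tool in tools if tool not in standard_kit)
    rates = [(tool, n / len(past_jobs)) for tool, n in counts.items()]
    # Most frequently needed first; ties broken alphabetically.
    return sorted((pair for pair in rates if pair[1] >= threshold),
                  key=lambda pair: (-pair[1], pair[0]))

history = [  # illustrative: tools used on five past repairs of the same HVAC model
    {"screwdriver", "multimeter", "manifold gauge"},
    {"screwdriver", "multimeter"},
    {"multimeter", "manifold gauge"},
    {"screwdriver"},
    {"screwdriver", "fin comb"},
]
print(predict_extra_tools(history, {"screwdriver", "multimeter"}))
# [('manifold gauge', 0.4)]
```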

One of my proudest technical achievements is the MCP (Model Context Protocol) integration. By exposing JING's capabilities as MCP tools, external systems can interact with the agent society seamlessly.

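```python
# Assumes an already-initialized MCP ClientSession named `session`,
# running inside an async function.
await session.call_tool(
    "jing_full_diagnosis",
    {"image_source": "faucet.jpg", "voice_text": "This Moen is dripping"},
)
```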

This means Claude Desktop, Cursor, or any custom MCP client can invoke JING's full multi-agent workflow with a single call. It's not just a standalone app; it's a platform.
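Under the hood, an MCP client's tool call is a JSON-RPC 2.0 request with the method `tools/call`, whose params carry the tool `name` and its `arguments`. A minimal sketch that builds the message for the call above; transport (stdio or HTTP) and the `initialize` handshake are omitted:

```python
import json

def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

msg = tools_call_request(1, "jing_full_diagnosis",
                         {"image_source": "faucet.jpg", "voice_text": "This Moen is dripping"})
print(msg)
```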

To validate the multi-agent approach, I built a benchmark suite (scripts/benchmark.py) that runs 10 trials of both JING (8 agents) and a single-agent baseline (one Qwen-Max model doing everything in one prompt). Here are the results:

| Metric | JING (8 agents) | Single-agent | Improvement |
|---|---|---|---|
| ⏱️ Response time | ~11,066 ms | ~45,205 ms | 4.1x faster |
| 🎯 Quality score | 92.5/100 | 69.0/100 | 34% better |
| 💰 Cost per job | ~$0.056 | ~$0.151 | 2.7x cheaper |
| 📋 Completeness | 100% | 60% | 67% more complete |

Run the benchmark yourself: uv run python scripts/benchmark.py

Every metric is generated by the benchmark script rather than estimated by hand, so you can check the numbers yourself; because LLM outputs vary from run to run, expect small differences between runs. In these trials the multi-agent architecture wins on every dimension, which I attribute to specialized agents working in parallel and debating their conflicts instead of one model handling everything in a single prompt.
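The Improvement column follows directly from the raw averages in the table. A small sketch that recomputes it: ratios for the lower-is-better metrics (time, cost) and percentage gains for the higher-is-better ones (quality, completeness):

```python
def times_lower(baseline: float, candidate: float) -> float:
    """How many times smaller the candidate is (for time and cost)."""
    return round(baseline / candidate, 1)

def percent_gain(candidate: float, baseline: float) -> int:
    """Percentage improvement of candidate over baseline (for quality, completeness)."""
    return round((candidate - baseline) / baseline * 100)

print(times_lower(45_205, 11_066))  # 4.1  response time (ms)
print(times_lower(0.151, 0.056))    # 2.7  cost per job (USD)
print(percent_gain(92.5, 69.0))     # 34   quality score
print(percent_gain(100, 60))        # 67   completeness (%)
```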

JING is more than a hackathon project. It's a blueprint for how multi-agent systems can empower the real-world workforce. What we've already shipped in Phase 2:

Persistent memory: ✅ Every client interaction, diagnosis, and profitability grade is saved and recalled across sessions. JING learns from every job.

Agent debate & consensus: ✅ JING-REFEREE resolves inter-agent conflicts before they reach the financial layer.

Self-reflection & fallbacks: ✅ FOREMAN checks quality after every wave and executes fallback handlers when tasks fail.

Artisan Dashboard: ✅ Business intelligence dashboard with real-time stats, charts, and AI-powered financial suggestions — your entire business in one view.

Production deployment: ✅ Docker multi-stage build + docker-compose for Alibaba Cloud deployment.

108 automated tests: ✅ Unit tests for all models, agents, memory, and API endpoints.

The Phase 3 roadmap includes:

Mobile native app: React Native for true field-ready experience

Payment integration: Stripe for in-app invoicing and collection

Supplier integrations: Direct ordering from Home Depot, Lowe's, Amazon

Marketplace: Technicians sharing and rating repair procedures

Dashboard enhancements: Custom date ranges, export to PDF, multi-business support, forecasting with ML

AR training mode: Step-by-step visual guides overlaid on real equipment

Building JING taught me that the most powerful AI systems aren't those that replace humans—they're those that amplify human expertise.

As a carpenter, I know that the artisan's hands will always shape the world. But those hands deserve support. They deserve tools that handle the boring stuff, that catch what the eye might miss, that turn a day of frustration into a day of mastery.

JING is here to shape the artisan's day.

Source: original article on dev.to
