I Ran Five Small Multimodal Models on a Jetson. The Fastest One Was Not the Best Baseline.

wpnews.pro

cd /news/artificial-intelligence/i-ran-five-small-multimodal-models-o… · home › topics › artificial-intelligence › article

[ARTICLE · art-32012] src=dev.to ↗ pub=2026-06-18T02:45Z topic=artificial-intelligence verified=true sentiment=· neutral

I Ran Five Small Multimodal Models on a Jetson. The Fastest One Was Not the Best Baseline.

A developer building WearEdge Pro, a wearable industrial edge AI runtime, tested five small multimodal models on a Jetson device to find the best baseline for an industrial edge agent. Gemma 4 E2B emerged as the best product baseline due to its reliability and workflow compliance, while Qwen2.5-VL proved a strong challenger for OCR-heavy tasks. SmolVLM2 was fastest but lacked grounding, and InternVL3 was too slow and risky for baseline use.

read3 min views33 publishedJun 18, 2026

I have been building WearEdge Pro, a wearable industrial edge AI runtime. Think of a frontline operator wearing a smart-glasses device, capturing a first-person image of a machine, and getting back a structured action card from a local Jetson box.

The key phrase is "structured action card." This is not a chat demo. In a factory setting, an answer needs an audit trail, a mode boundary, a human-confirmation gate, and a way to hand off to maintenance, quality, EHS, or work-instruction workflows.

I recently tested five compact multimodal models on the same Jetson path:

The goal was not to crown a universal benchmark champion. I wanted to know which model was the best current baseline for an industrial edge Agent runtime.

Every model was exposed through a local OpenAI-compatible llama.cpp endpoint on the Jetson. Each model got the same five prompts and images:

The main run used 560 image tokens, which matches the current WearEdge gateway budget. Qwen2.5-VL also got a 1024-image-token pass because grounding can improve with more visual tokens.

Model	Completion	Avg latency	Takeaway
Gemma 4 E2B	5/5	37.51s raw	Best product baseline
Qwen2.5-VL-3B	5/5	39.72s	Best OCR challenger
SmolVLM2-2.2B	5/5	12.84s	Fastest, but weak grounding
InternVL3-2B	5/5 only after ctx4096	80.35s	Too slow/risky for baseline
Qwen2.5-Omni-3B	5/5	50.09s	Interesting future audio/video branch

SmolVLM2 was the speed star. But the answers were often too generic for real operator guidance. In changeover and work-instruction tasks, it returned fields that looked more like placeholders than grounded industrial guidance.

Qwen2.5-VL was the most impressive challenger. It nailed a changeover OCR task with LABELER-FL1

and SKU-C500

, where Gemma had a machine-label typo. It also produced useful IQC defect scores. If I were building a pure OCR or visual inspection assistant, I would take Qwen very seriously.

InternVL3 reminded me that token speed is not the whole story. At 2048 context it failed three of five tasks with context errors. At 4096 context it finished, but the latency was high and one raw IQC answer had unsafe release-style wording.

Qwen2.5-Omni ran cleanly, but its strongest value is probably a future audio/video workflow rather than this current image+text industrial baseline.

Gemma 4 E2B did not win every micro-test. It stayed the baseline because it fit the product runtime:

In an industrial setting, "fast and fluent" is not enough. The model has to behave inside a system that can say: this came from this image, this route, this required field, this action boundary, and this audit record.

That is why Gemma remained the WearEdge baseline, while Qwen2.5-VL became the serious A/B challenger for OCR-heavy branches.

Edge AI model selection is not just a leaderboard exercise. The right question is:

Can this model run locally, understand the evidence, obey the workflow boundary, and produce an action that the system can audit?

For WearEdge Pro today, the answer is Gemma 4 E2B as the baseline, Qwen2.5-VL as the next challenger, and a clear path to keep testing without pretending every benchmark cell means the same thing.

Public artifact link: Benchmark results and public discussion: [https://www.hackster.io/ryanon2008/wearedge-pro-jetson-edge-ai-agent-50ec35](https://www.hackster.io/ryanon2008/wearedge-pro-jetson-edge-ai-agent-50ec35)

source & further reading

dev.to — original article I Gave an AI Two Empty Servers and One Prompt (Kimi K3) Stop Prompt Engineering, Start Context Engineering EU AI Act Article 50: What the 2026 Transparency Rules Mean for AI Teams

~/api · this article 200

$curl api.wpnews.pro/v1/news/i-ran-five-small-multimo…

Read original on dev.to → dev.to/ryan_hsu_wearedge/i-ran-five-small-multim…

mentioned entities

WearEdge Pro

Jetson

Gemma 4 E2B

Qwen2.5-VL

SmolVLM2

InternVL3

Qwen2.5-Omni

llama.cpp

metadata

slugi-ran-five-small-multimodal-models-on-a-jetson-the-fastest-one-was-not-the-best

topic#artificial-intelligence

secondary3 topics

sentimentneutral

canonicaldev.to

navigation

← prevGet – tiny agent that gets thing…

next →NEON-CITY/CosySim Local agent si…

── more in #artificial-intelligence 4 stories · sorted by recency

arxiv.org · 3 Aug · #artificial-intelligence

SCMA: Structure-Conditioned and Metal-Aware Flow Matching for CT Metal Artifact Reduction

arxiv.org · 3 Aug · #artificial-intelligence

WaiT for the Signal: Simple Frequency-Aware Flow-Matching

arxiv.org · 3 Aug · #artificial-intelligence

Predicting Steel Fatigue Life from Micrographs Using Physics-Informed Deep Learning

arxiv.org · 3 Aug · #artificial-intelligence

Adjudicated Captioning: Multi-Agent Alignment Scoring and Consensus-Distilled Beam Arbitration for Strict Zero-Shot Image Captioning

── more on @wearedge pro 3 stories trending now

wpnews · 2 Aug · #artificial-intelligence

I Ran 8 AI APIs Through the Same 50 Prompts — Here's the Real Cost Breakdown

wpnews · 2 Aug · #developer-tools

Agent-Browser – Browser Automation for AI

wpnews · 2 Aug · #artificial-intelligence

Payment Rail vs. Settlement Layer: What AEON's Coinbase x402 Partnership Actually Validates

sponsored brought to you by zahid.host 4,200+ EU-deployed projects

reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required