{"slug": "meet-openjarvis-a-local-first-framework-for-on-device-personal-ai-agents-with", "title": "Meet OpenJarvis: A Local-First Framework for On-Device Personal AI Agents with Tools, Memory, and Learning", "summary": "Researchers at Stanford University and Lambda Labs released OpenJarvis, an open-source framework that runs AI inference, agents, memory, and learning entirely on-device. The framework achieves performance within 3.2 percentage points of leading cloud models while reducing API costs by roughly 800 times and latency by about four times per query. OpenJarvis uses a modular architecture of five swappable primitives and an LLM-guided spec search that jointly optimizes across components, recovering 13 to 32 percentage points of the cloud-local performance gap at significantly lower optimization cost.", "body_md": "Researchers at Stanford University and Lambda Labs have published the [research paper for OpenJarvis](https://arxiv.org/pdf/2605.17172v1), an open-source framework that runs inference, agents, memory, and learning entirely on-device.\n\nThe open-weight models configured through OpenJarvis land within 3.2 percentage points of the best cloud model on average, at roughly 800× lower marginal API cost per query and roughly 4× lower latency under the paper’s benchmark protocol. This work builds on the research team’s earlier [Intelligence Per Watt study](https://arxiv.org/pdf/2511.07885), which reported that local models already handle 88.7% of single-turn chat and reasoning queries at interactive latency, with intelligence efficiency improving 5.3× from 2023 to 2025.\n\n**Model Overview & Access**\n\nOpenJarvis is not a single model. 
It is a framework that composes any supported model with a configurable agent stack, evaluated across 11 local models from four families.\n\n| Property | Value |\n|---|---|\n| License | Apache 2.0 |\n| Framework release | March 12, 2026 |\n| Paper | arXiv:2605.17172 (posted May 16, 2026) |\n| Repository | github.com/open-jarvis/OpenJarvis |\n| Stars / forks | ~5.4k / ~1.2k (June 2026) |\n| Languages | Python (~83%), Rust (~9%), TypeScript (~7%) |\n| Evaluated models | 11 local models across 4 families: Qwen3.5, Gemma4, Nemotron, Granite |\n| Cloud baselines | Claude Opus 4.6, GPT-5.4, Gemini 3.1 Pro |\n| Supported engines | Ollama, vLLM, SGLang, llama.cpp, Apple Foundation Models, Exo (among others) |\n| Context window | Model-dependent |\n| Installation | Single command; ~3 minutes on broadband |\n| Hardware | Tested on 7 platforms, from Mac Mini M4 to NVIDIA DGX Spark |\n\n**Architecture: Five Primitives and a Spec**\n\nOpenJarvis decomposes a personal AI system into five typed primitives, composed through a single declarative configuration object called a **spec**.\n\n- **Intelligence**: the model, weights, generation parameters, and quantization format.\n- **Engine**: the inference runtime (Ollama, vLLM, SGLang, etc.), batching, KV-cache settings, and hardware path.\n- **Agents**: the reasoning loop (ReAct or CodeAct), system prompts, tool-use policy, and turn limits.\n- **Tools & Memory**: external interfaces, retrieval backends, 25+ data connectors, and 32+ messaging channels, with native MCP support and interchangeable memory backends.\n- **Learning**: the optimizer that updates the spec from traces. This slot accepts LoRA, DSPy, GEPA, or LLM-guided spec search.\n\nEach primitive is independently swappable, and a spec serializes all five into a TOML file. Two specs can share the same agent and tool configuration and differ only in model and engine, so the same behavior runs on a Mac Mini and a workstation without rewriting prompts.\n\n**LLM-guided spec search** is the second contribution. 
It is a local–cloud collaboration: a frontier cloud model acts as a teacher at search time, reading traces, diagnosing failure clusters, and proposing edits across Intelligence, Engine, Agents, and Tools & Memory. An edit is accepted only if it improves the target failure cluster without causing meaningful regressions elsewhere — the research team calls this the **gate** (default tolerance 1%). The optimized spec then runs entirely on-device at inference time, with zero cloud calls. The teacher is used only at search time; at 100 queries per day, the amortized teacher cost falls below $0.001 per query within six months.\n\nPrior work (GEPA, DSPy, LoRA) optimizes one primitive at a time, and prompt optimizers alone recover only about 5 pp of the cloud–local gap. LLM-guided spec search recovers 13–32 pp because it edits across primitives jointly, at 7–11× lower optimization cost than single-primitive baselines. The four-primitive move space contributes 5.5–16.5 pp, and the LLM proposer adds about 10 pp on average over an evolutionary search at the same move space.\n\n**Capabilities & Performance**\n\nOpenJarvis was evaluated across 8 benchmarks spanning 508 tasks: tool calling (ToolCall-15), agentic workflows (PinchBench), coding (LiveCodeBench), customer service (τ-Bench V2, τ²-Bench Telecom), general assistance (GAIA), and deep research (LiveResearchBench, DeepResearchBench).\n\n**The swap test**: Replacing the intended cloud model with Qwen3.5-9B in existing frameworks (OpenClaw, Hermes Agent) drops accuracy by 25–39 pp. With the same model under an OpenJarvis spec, the residual drop shrinks to 5.6–16.5 pp — recovering 56–77% of the portability loss.\n\n**The accuracy frontier**: The best single local model, Qwen3.5-122B, reaches 80.3% average accuracy versus Claude Opus 4.6 at 83.5% — a 3.2 pp gap. 
Local specs match or exceed cloud on 4 of 8 benchmarks: ToolCall-15, PinchBench, LiveCodeBench, and τ-Bench V2.\n\n**Cost and latency**: Local configurations form the accuracy–efficiency frontier. Qwen3.5-122B delivers its 80.3% at roughly a thousandth of a cent per query, versus $0.009 per query for Claude Opus 4.6, an approximately 800× marginal API-cost advantage. End-to-end latency drops by roughly 4× on the agentic workloads, though the paper notes single-shot prompts can favor cloud serving.\n\n**Search gains**: LLM-guided spec search improves the Qwen3.5-9B student to 100% on PinchBench, 83% on LiveCodeBench, and 91% on LiveResearchBench. Across the full eight-benchmark suite, average gains per student model range from 13.1 to 31.5 pp. The authors report that these gains survive their robustness checks (reward-weight variants, search-seed variance, and random restarts).\n\n**How to Use It**\n\nInstallation is one command. On macOS, Linux, or WSL2:\n\n```\ncurl -fsSL https://open-jarvis.github.io/OpenJarvis/install.sh | bash\n```\n\nWindows users run an equivalent PowerShell script (`irm … | iex`). The installer provisions `uv`, a Python virtual environment, Ollama, and a starter model in about three minutes on broadband. A desktop GUI ships as a `.dmg`, `.exe`, `.deb`, `.rpm`, or `.AppImage` from the releases page.\n\nAfter install, `jarvis` starts a chat session. Starter presets cover common workflows:\n\n```\njarvis init --preset morning-digest-mac    # daily briefing with TTS\njarvis init --preset deep-research         # multi-hop research with citations\njarvis init --preset code-assistant        # agent with code execution and shell access\njarvis init --preset scheduled-monitor     # stateful agent on a schedule\n```\n\nThe framework ships with eight built-in agents across three execution modes: on-demand, scheduled, and continuous. 
It connects to 25+ data sources (Gmail, Calendar, iMessage, Notion, Obsidian, Slack, GitHub, and others) and exposes agents over 32+ messaging channels (WhatsApp, Telegram, Discord, iMessage, Signal, and others).\n\nSkills can be imported from external catalogs (about 150 from Hermes Agent and about 13,700 community skills from OpenClaw), all following the agentskills.io specification. A `jarvis optimize skills --policy dspy` command refines them from local trace history.\n\n**Key Takeaways**\n\n- OpenJarvis runs inference, agents, memory, and learning fully on-device, landing within 3.2 pp of the best cloud model at ~800× lower marginal API cost and ~4× lower latency.\n- A typed \"spec\" decomposes the stack into five swappable primitives (Intelligence, Engine, Agents, Tools & Memory, and Learning), serialized to portable TOML.\n- LLM-guided spec search uses a frontier cloud model as a search-time teacher to recover 13–32 pp of the cloud–local gap at 7–11× lower optimization cost, then runs locally with zero cloud calls.\n- Local specs match or exceed cloud on 4 of 8 benchmarks (ToolCall-15, PinchBench, LiveCodeBench, τ-Bench V2); the remaining gap concentrates on reasoning- and research-heavy tasks.\n\nCheck out the [Paper](https://arxiv.org/pdf/2605.17172v1) and [Repo](https://github.com/open-jarvis/OpenJarvis).", "url": "https://wpnews.pro/news/meet-openjarvis-a-local-first-framework-for-on-device-personal-ai-agents-with", "canonical_source": "https://www.marktechpost.com/2026/06/03/meet-openjarvis-a-local-first-framework-for-on-device-personal-ai-agents-with-tools-memory-and-learning/", "published_at": "2026-06-04 06:23:10+00:00", "updated_at": "2026-06-04 06:28:10.345541+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "ai-agents", "ai-research"], "entities": ["Stanford University", "Lambda Labs", "OpenJarvis", "Qwen3.5", "Gemma4", "Nemotron", "Granite", "Ollama"], "alternates": {"html": "https://wpnews.pro/news/meet-openjarvis-a-local-first-framework-for-on-device-personal-ai-agents-with", "markdown": "https://wpnews.pro/news/meet-openjarvis-a-local-first-framework-for-on-device-personal-ai-agents-with.md", "text": "https://wpnews.pro/news/meet-openjarvis-a-local-first-framework-for-on-device-personal-ai-agents-with.txt", "jsonld": "https://wpnews.pro/news/meet-openjarvis-a-local-first-framework-for-on-device-personal-ai-agents-with.jsonld"}}