# We built a coding harness that beats frontier models using open ones. It's in open beta.

> Source: <https://dev.to/jon_at_backboardio/we-built-a-coding-harness-that-beats-frontier-models-using-open-ones-its-in-open-beta-15g3>
> Published: 2026-06-06 21:42:46+00:00

Here is the bet we made: build software **memory-first, not model-first**, and it will outperform.

Everyone else is racing to wrap the next model. We did the opposite. We built the memory layer first, the routing first, the tool-calling first, and now the recursive engine. Then we let the model be a swappable part.

Today that bet has a name: **Backboard Development Studio**. It starts with the R-CLI.

The headline result? It beats frontier models using open ones. Keep reading: the numbers are below, and there is a promo code at the bottom.

The beta is open. Two lines and you are running.

```
# macOS / Linux
curl -fsSL https://app.backboard.io/api/cli | bash

# Windows (PowerShell)
irm https://app.backboard.io/api/cli/windows | iex
```

Get your API key: [https://app.backboard.io](https://app.backboard.io)

Promo code: **DEVTOCLI**, for credit toward inference while you put it through its paces. Submit it in the Promo field in the top right corner of the billing page.

Model-first thinking says: pick the smartest model, prompt it well, hope it remembers.

Memory-first thinking says: give the system real persistence, real routing, real recall, and a "smaller" model will outwork a "smarter" one that forgets everything between turns.

We believed the second one. So we built it. The R-CLI is powered by our memory algorithms (the same ones that rank **#1 on LoCoMo and LongMemEval**) and runs on Backboard's unified API: memory, routing across **17,000+ models**, RAG, and stateful threads behind one key.

Then we tested it in public. That part did not go quietly.

Read that second line again. An open model, inside our harness, posting numbers that go toe to toe with Claude Code, at a fraction of the cost.

And to be clear: we are **not** the cheap open-source alternative. We run the full frontier lineup too. We just happen to beat frontier results with open models like GLM 5.1 and DeepSeek V4. Same harness, your choice of brain.

You do not have to pick one model. You can use two in a single task.

Try **/expert mode**:

The expensive model architects. The fast cheap one ships. The harness orchestrates the handoff. Frontier reasoning where it counts, frontier-beating cost where it does not. One command.
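To make the handoff concrete, here is a minimal sketch of the architect-then-builder pattern in plain Python. It is not the R-CLI's implementation: `run_expert_handoff`, `HandoffResult`, and the two stub models are hypothetical names invented for illustration, and a "model" is assumed to be any function from prompt text to response text.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a "model" is any callable that maps a prompt string to a reply string.
Model = Callable[[str], str]


@dataclass
class HandoffResult:
    plan: list[str]
    outputs: list[str]


def run_expert_handoff(task: str, architect: Model, builder: Model) -> HandoffResult:
    """Ask the architect once for a plan, then send each step to the builder."""
    raw_plan = architect(f"Break this task into short numbered steps:\n{task}")
    # Treat every non-empty line of the architect's reply as one step.
    steps = [line.strip() for line in raw_plan.splitlines() if line.strip()]
    outputs = [builder(f"Task: {task}\nImplement this step: {step}") for step in steps]
    return HandoffResult(plan=steps, outputs=outputs)


# Hypothetical stand-ins for real models so the sketch runs offline.
def stub_architect(prompt: str) -> str:
    return "1. Add a failing test\n2. Implement the fix\n"


def stub_builder(prompt: str) -> str:
    step = prompt.rsplit("Implement this step: ", 1)[1]
    return f"done: {step}"


result = run_expert_handoff("Fix the login bug", stub_architect, stub_builder)
print(result.plan)
print(result.outputs)
```

The expensive model is called once per task and the cheap model once per step, which is where the cost difference in this pattern would come from.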

Nobody else is selling that, because nobody else built memory and routing first.

We launched. A serious builder showed up in the comments and pushed back hard.

Well-tooled local repo. His own RAG, skills, memory, a knowledge graph he had clearly invested months in. He ran the CLI and came back with a fair verdict: "kind of specific, not super helpful for a setup like mine."

Serious builder. Serious objection. The strongest one a developer can make: **"I already hand-built the thing you are selling."**

Then one fact flipped the whole conversation.

**The R-CLI is stateful by default.**

The persistence he was hand-building? The session-priming file he writes and re-reads every time? The weekly cron jobs auditing how often his agents drift? The pre-commit hooks keeping them on the rails?

Native on our side. Not a layer you bolt on. The default behavior. That is what memory-first actually means in your terminal.
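For a rough picture of what "stateful by default" replaces, the sketch below shows a thread that reloads its own history on open, so there is no priming file to write and re-read by hand. This is an assumption-laden toy, not Backboard's memory layer: `StatefulThread` and the JSON file store are hypothetical, and real recall would involve ranking rather than replaying every turn.

```python
import json
import tempfile
from pathlib import Path


class StatefulThread:
    """Hypothetical thread that persists every turn and reloads it on open."""

    def __init__(self, path: Path):
        self.path = path
        # Recall happens here automatically, not in a hand-written priming step.
        self.turns = json.loads(path.read_text()) if path.exists() else []

    def add(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})
        self.path.write_text(json.dumps(self.turns))

    def context(self) -> str:
        return "\n".join(f"{t['role']}: {t['content']}" for t in self.turns)


store = Path(tempfile.mkdtemp()) / "thread.json"

first_session = StatefulThread(store)
first_session.add("user", "Use tabs, not spaces, in this repo.")

# A later session opens the same thread; the earlier instruction is already there.
second_session = StatefulThread(store)
second_session.add("user", "Refactor utils.py")
print(second_session.context())
```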

So for him it was never "adopt a whole new ecosystem." It was a harness swap: keep your own RAG, memory, and graph, drop the maintenance tax.

The thread went from "not for me" to "let me talk to your CLI lead." A demo call got booked. The objection did not get argued away. It got dissolved by a capability he did not know was there.

The lesson we took: the pitch was never "we are better." It was "you are doing by hand what we do by default." A developer handed us that line for free.

**Best in the world.** Performance is the bar, not a tagline. We ran benchmarks internally because we expect to be measured.

**Easiest to use.** One key. The same key that runs your R-CLI also unlocks memory, routing, multi-agent, and parallel tool calls, all behind one integrated surface. No stitching eight services together and praying the glue holds.

**Most accessible.** Frontier coding quality, your choice of model to get there. Closed, open, or mixed in one workflow. GLM 5.1 and DeepSeek V4 are the proof, not the promise.

**People stay by choice.** Any model, your own embeddings, modular layers, your data exportable through real endpoints. No lock-in, no theatrics, no fear-mongering. If you stay, it is because the flexibility is unrivaled.

The R-CLI is the first surface of Backboard Development Studio. The IDE is close.

Same engine, same performance, plus multi-agent sessions, Pi extension integrations, and coding-theme skills pre-built. The CLI is the foundation. We nail the harness with the community first. Then the IDE lands on something already proven.

The best feedback we have gotten so far came from someone telling us we were wrong. He pushed, we answered, he booked a call, his team switched.

So: paste the command, claim your key, redeem **DEVTOCLI**, and try to break it. Then drop a comment with what held up, what did not, and what your current setup still does better.

Memory-first or model-first. We made our bet. Come test it.

*Backboard.io is full-stack, model-agnostic AI infrastructure. Backboard Development Studio is our recursive coding environment, stateful by default, built on the unified API.*