# Show HN: Prismag – Per-block model routing for the terminal and any IDE

> Source: <https://github.com/rufus-SD/prismag>
> Published: 2026-06-22 09:00:55+00:00

**One prompt enters. Each block routes to the right model.**

Tag any block with `@@model` and PRISMAG sends it to the model you chose: planning to Opus, implementation to Composer, summaries to a fast model, all without switching the IDE picker or juggling chats.

```
prismag> @@opus: design the auth flow   @@composer: implement the middleware

  ── @@opus → claude-4.6-opus-high-thinking ───────────────────────────
  Use short-lived access tokens with rotating refresh tokens because…

  ── @@composer → composer-2.5-fast ───────────────────────────────────
  // middleware/auth.go
  func RequireAuth(next http.Handler) http.Handler { … }

  routed 2 blocks · chained · 1.8s
```

Today's AI coding tools force a binary choice:

- Pick **one model** for the whole conversation, or
- Open **multiple chats** and split the work by hand.

Neither matches how you actually work. Planning wants depth (Opus). Implementation wants speed (Composer). Review wants a different lens entirely.

| Without PRISMAG | With PRISMAG |
|---|---|
| One model per chat | A model per block, in one prompt |
| Switch the picker between tasks | `@@opus:` … `@@composer:` … and go |
| Manual context copy-paste between chats | Output of block N chains into block N+1 |
| Auto-routing by cost/latency (OpenRouter) | You choose the model per block |
| YAML/Python pipelines (LangGraph/CrewAI) | Chat-native `@@` syntax, zero config |

```
Prompt with @@tags ──▶ parser ──▶ orchestrator ──▶ model backends ──▶ sectioned result
                                       ▲
                                       └── ContextStore (in-memory · or maind)
```

- The trigger is `@@`, not `@`: a bare `@` collides with the IDE's mention menu. `@@` travels as plain text through every chat surface.
- Routing is **deterministic** and owned by the CLI + `registry.yaml`.
- Blocks run **serial + chained** by default (output N → context N+1), or `--parallel` for independent blocks.
- Context flows through a pluggable store: in-memory by default, or [maind](https://github.com/rufus-SD/maind) for encrypted, cross-session memory.
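The difference between the two execution modes can be sketched in Go. This is a minimal illustration, not PRISMAG's orchestrator: the `Block`, `Result`, and `Runner` types, the `ErrBackend` sentinel, and the choice to append each output to a plain-string context are all hypothetical. Serial mode feeds block N's output into block N+1 and stops at the first error (the README states that chained runs fail fast); parallel mode runs every block against the shared context only and keeps whatever succeeds.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

// Block is one @@alias section of a prompt (hypothetical type for this sketch).
type Block struct {
	Alias string
	Task  string
}

// Result holds one block's output or its error.
type Result struct {
	Alias  string
	Output string
	Err    error
}

// Runner stands in for a model backend call; ctx is the context the block sees.
type Runner func(b Block, ctx string) (string, error)

// ErrBackend is a stand-in failure used by the demo below.
var ErrBackend = errors.New("backend unavailable")

// RunChained runs blocks in order. Each block's output is appended to the
// context of the next one, and the run stops at the first failure (fail fast).
func RunChained(shared string, blocks []Block, run Runner) ([]Result, error) {
	ctx := shared
	results := make([]Result, 0, len(blocks))
	for _, b := range blocks {
		out, err := run(b, ctx)
		if err != nil {
			results = append(results, Result{Alias: b.Alias, Err: err})
			return results, fmt.Errorf("block @@%s: %w", b.Alias, err)
		}
		results = append(results, Result{Alias: b.Alias, Output: out})
		ctx = strings.TrimSpace(ctx + "\n" + out) // output N becomes context for N+1
	}
	return results, nil
}

// RunParallel runs independent blocks concurrently against the shared context
// only. A failing block is recorded but does not cancel the others.
func RunParallel(shared string, blocks []Block, run Runner) []Result {
	results := make([]Result, len(blocks))
	var wg sync.WaitGroup
	for i, b := range blocks {
		wg.Add(1)
		go func(i int, b Block) {
			defer wg.Done()
			out, err := run(b, shared)
			results[i] = Result{Alias: b.Alias, Output: out, Err: err} // distinct index per goroutine
		}(i, b)
	}
	wg.Wait()
	return results
}

func main() {
	model := func(b Block, ctx string) (string, error) {
		if b.Alias == "broken" {
			return "", ErrBackend
		}
		return fmt.Sprintf("%s done (saw %d bytes of context)", b.Task, len(ctx)), nil
	}
	chained, err := RunChained("repo: api", []Block{{"opus", "plan"}, {"composer", "implement"}}, model)
	fmt.Println(len(chained), err)
	for _, r := range RunParallel("repo: api", []Block{{"opus", "plan"}, {"broken", "lint"}}, model) {
		fmt.Println(r.Alias, r.Output, r.Err)
	}
}
```

Chaining trades latency for context: block N+1 cannot start until block N finishes, which is why independent blocks are better served by `--parallel`.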

```
# Go 1.26+
go install github.com/rufus-SD/prismag@latest

# or clone and build
git clone https://github.com/rufus-SD/prismag.git
cd prismag && make install

# 1. Guided onboarding — environment, optional API keys, model discovery, registry
prismag setup

# 2. Wire routing into your editor (auto-detects the tool)
prismag init

# 3. Route a prompt
prismag run "@@opus: plan the cache layer" "@@composer: implement it"
```

Or just run `prismag` with no args to drop into the interactive `prismag>` session.

PRISMAG works in **two ways**, from the same global config:

- **CLI / REPL**: runs in any terminal, on any OS. Executes each block via provider APIs using your keys. Universal, deterministic.
- **In your IDE**: `prismag init` writes a rule that teaches the agent to route `@@` blocks through PRISMAG. Where the IDE supports per-task subagents, each block is dispatched to its own subagent + model using your subscription (no API keys needed).

| Editor | Rule file | Dispatch |
|---|---|---|
| Cursor | `.cursor/rules/prismag-routing.mdc` + `.cursor/agents/` | subagents (any model) |
| Claude Code | `CLAUDE.md` + `.claude/agents/` | subagents (Claude) + API fallback |
| Windsurf | `.windsurf/rules/prismag-routing.md` | runs via `prismag run` |
| GitHub Copilot | `.github/copilot-instructions.md` | runs via `prismag run` |
| Cline | `.clinerules/prismag-routing.md` | runs via `prismag run` |
| Roo Code | `.roo/rules/prismag-routing.md` | runs via `prismag run` |
| Aider | `CONVENTIONS.md` | runs via `prismag run` |
| generic | `.prismag/rules.md` | runs via `prismag run` |

```
prismag connect cursor      # or: claude, windsurf, copilot, cline, roo, aider, generic
```

Subagent dispatch gives true per-block model switching where the editor exposes it (Cursor, Claude Code). Everywhere else, the agent runs `prismag run` and shows the sectioned output verbatim: same routing, same result.

```
@@<alias>: <task>
context shared with every block goes here, before the first tag

@@opus: review the security implications of this auth module
@@composer: write the unit tests for AuthService
@@fast: summarize the diff in 3 bullets
```

- `@@alias` is case-insensitive and maps to a model via `registry.yaml`.
- Text before the first `@@` is shared context for all blocks.
- **Serial + chained** by default; `--parallel` for independent blocks.
- Chained runs fail fast; parallel runs tolerate partial failure.
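The tagging rules above are small enough to sketch as a parser. The Go code below is an assumption about how such a parser could work, not PRISMAG's actual parser: it treats `@@<alias>:` as a tag wherever it appears (so two blocks can share one line, as in the REPL example at the top), lower-cases the alias, and returns the text before the first tag as shared context. The regular expression, the allowed alias characters, and the `Block` type are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tagRE matches an @@alias: tag; the allowed alias characters are an assumption
// (dots are included so aliases like opus4.8 parse).
var tagRE = regexp.MustCompile(`@@([A-Za-z0-9._-]+):`)

// Block is one routed section: a normalized alias and its task text.
type Block struct {
	Alias string
	Task  string
}

// Parse splits a prompt into shared context (text before the first tag) and
// blocks. Aliases are lower-cased, so @@Opus and @@opus route identically.
func Parse(prompt string) (shared string, blocks []Block) {
	locs := tagRE.FindAllStringSubmatchIndex(prompt, -1)
	if len(locs) == 0 {
		return strings.TrimSpace(prompt), nil // untagged prompt: no blocks
	}
	shared = strings.TrimSpace(prompt[:locs[0][0]])
	for i, loc := range locs {
		end := len(prompt) // a block's task runs until the next tag or the end
		if i+1 < len(locs) {
			end = locs[i+1][0]
		}
		blocks = append(blocks, Block{
			Alias: strings.ToLower(prompt[loc[2]:loc[3]]), // capture group 1
			Task:  strings.TrimSpace(prompt[loc[1]:end]),  // text after the colon
		})
	}
	return shared, blocks
}

func main() {
	shared, blocks := Parse("this repo uses Go\n@@Opus: review auth   @@composer: write tests")
	fmt.Printf("shared=%q\n", shared)
	for _, b := range blocks {
		fmt.Printf("@@%s -> %q\n", b.Alias, b.Task)
	}
}
```

An untagged prompt yields no blocks in this sketch; per the README, the real CLI would then route it to the `default:` alias.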

```
aliases:
  opus:
    model: claude-opus-4-6        # concrete id + offline fallback
    match: claude-opus-4-6        # family resolved against the live model list
    provider: anthropic
    agent: opus-planner           # subagent used when routing in-IDE
    description: Deep reasoning, architecture, security review
  composer:
    model: composer-2.5-fast
    provider: cursor
    agent: composer-implementer
    description: Fast implementation, multi-file edits
  fast:
    model: gpt-5.3-codex
    provider: openai
    description: Cheap, quick summaries and simple transforms
```

Two optional top-level keys remove friction for everyday use:

```
default: opus4.8       # untagged prompts route here, so `prismag "do X"` needs no @@tag
exec:                  # CLI tool-loop defaults — set permissions once, no flags per run
  enabled: true        # let blocks act on this machine (write files, …)
  shell: true          # also allow run_shell
  approve: ask         # ask = confirm each action y/N (default) · auto = no prompt
  # root: ~/Desktop    # optional: confine file actions to one tree
```

The same model has a different id in every context: `claude-opus-4-8` on the Anthropic API, `claude-opus-4-8-thinking-high` in Cursor, a local tag in Ollama. Pinning one string breaks the moment a provider renames or bumps a model.

So PRISMAG treats an alias as a **family** and resolves it to a currently valid id from the live model list for the active context. In the CLI that list is queried with your keys and cached for 12h; in the IDE it comes from the agent-maintained cache. PRISMAG picks the best match deterministically, self-heals across renames, and falls back to the pinned `model` when offline. Set `match:` to make the family explicit; otherwise `model` doubles as the family. Inspect what's available at any time with `prismag models`.
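The README does not spell out the matching rule, so the Go sketch below only illustrates the idea under stated assumptions: a live id belongs to a family if it starts with the family string (case-insensitively); among matches, the shortest id wins and ties go to lexical order, so the result never depends on the order a provider lists its models; and the pinned `model` is returned when nothing matches or the list is empty (offline). `Resolve` is a hypothetical name, and PRISMAG's real scoring may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Resolve maps an alias family to a currently valid model id.
// Assumed rule: prefix match, then shortest id, then lexically smallest.
// With no match (or an empty list, e.g. offline) the pinned id is returned.
func Resolve(family, pinned string, live []string) string {
	f := strings.ToLower(family)
	var matches []string
	for _, id := range live {
		if strings.HasPrefix(strings.ToLower(id), f) {
			matches = append(matches, id)
		}
	}
	if len(matches) == 0 {
		return pinned
	}
	sort.Slice(matches, func(i, j int) bool {
		if len(matches[i]) != len(matches[j]) {
			return len(matches[i]) < len(matches[j])
		}
		return matches[i] < matches[j]
	})
	return matches[0]
}

func main() {
	cursorModels := []string{"composer-2.5-fast", "claude-opus-4-8-thinking-high"}
	fmt.Println(Resolve("claude-opus-4-8", "claude-opus-4-8", cursorModels))
	fmt.Println(Resolve("claude-opus-4-8", "claude-opus-4-8", nil)) // offline fallback
}
```

Following the README, `family` would be the alias's `match:` value when set and its `model` value otherwise.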

| Command | What it does |
|---|---|
| `prismag` | Interactive `prismag>` session (or onboarding on first run) |
| `prismag setup` | First-time setup: keys, model discovery, starter registry |
| `prismag init [tool]` | Wire routing into this project (auto-detects the editor) |
| `prismag connect <tool>` | Write the integration rule (+ subagents where supported) |
| `prismag run "@@..."` | Route and execute a tagged prompt (untagged → `default:` alias; `--exec` / `exec:` lets blocks act) |
| `prismag route "@@..."` | Show the delegation plan without executing (`--json` too) |
| `prismag list` | List `@@aliases` with availability marks |
| `prismag models` | Show models available right now |
| `prismag doctor` | Diagnose keys, registry, and environment |
| `prismag sessions` | List saved REPL session transcripts |
| `prismag resume [id]` | Reopen a past session with its context |

PRISMAG calls provider APIs **directly**: keys go straight to the vendor, never to a gateway. Keys are read from the environment or from `~/.config/prismag/.env`, or stored encrypted in [maind](https://github.com/rufus-SD/maind) when present.

```
ANTHROPIC_API_KEY only:            + OPENAI_API_KEY:
  @@opus      ✓ ready                @@opus      ✓ ready
  @@fast      ✗ needs OPENAI_API_KEY @@fast      ✓ ready
```

Inside an IDE that dispatches subagents, blocks route via your subscription — no API keys required.
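The readiness marks above amount to checking, per alias, whether its provider's key is set. A minimal Go sketch under assumptions: the provider-to-variable table is hypothetical except for `ANTHROPIC_API_KEY` and `OPENAI_API_KEY`, which come from this README; treating `ollama`/`vllm` as keyless follows from the local-model section; and the `lookup` parameter is an illustrative hook (pass `os.LookupEnv` in practice) so the logic can be tested without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
)

// keyVars maps a provider to the env var holding its API key. Only the
// Anthropic and OpenAI names come from the README; the rest is assumed.
var keyVars = map[string]string{
	"anthropic": "ANTHROPIC_API_KEY",
	"openai":    "OPENAI_API_KEY",
	"ollama":    "", // local: no key needed
	"vllm":      "", // local: no key needed
}

// Status reports whether an alias on the given provider can run from the CLI.
func Status(provider string, lookup func(string) (string, bool)) string {
	v, known := keyVars[provider]
	switch {
	case !known:
		return "? unknown provider " + provider
	case v == "":
		return "✓ ready"
	}
	if val, ok := lookup(v); ok && val != "" {
		return "✓ ready"
	}
	return "✗ needs " + v
}

func main() {
	for _, a := range []struct{ alias, provider string }{
		{"opus", "anthropic"}, {"fast", "openai"}, {"local", "ollama"},
	} {
		fmt.Printf("@@%-8s %s\n", a.alias, Status(a.provider, os.LookupEnv))
	}
}
```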

Route any block to a model running on your own machine — no API key, no cloud,
$0 per token. Both [Ollama](https://ollama.com) and
[vLLM](https://github.com/vllm-project/vllm) expose an OpenAI-compatible API, so
PRISMAG talks to them natively (streaming included).

```
# 1. Pull a model (Ollama serves on http://localhost:11434)
ollama pull qwen2.5-coder:7b

# 2. Add an alias to registry.yaml
aliases:
  local:
    model: qwen2.5-coder:7b
    provider: ollama                # or: vllm
    # base_url: http://localhost:11434/v1   # optional override
    description: "Local model: private, free, offline"

# 3. Route a block to it; it runs entirely on your box
prismag run "@@local: refactor this function"
```

Endpoints default to `http://localhost:11434/v1` (Ollama) and `http://localhost:8000/v1` (vLLM); override them per alias with `base_url` or globally with `OLLAMA_BASE_URL` / `VLLM_BASE_URL`. Mix freely, for example drafting locally and reviewing in the cloud: `@@local: draft` then `@@opus: review`.
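The endpoint lookup can be sketched in a few lines of Go. The defaults and variable names come from the paragraph above; the precedence between a per-alias `base_url` and the global variable is not stated, so this sketch assumes the per-alias value wins, then the environment variable, then the built-in default. `BaseURL` and its `getenv` parameter are hypothetical (pass `os.Getenv` in practice).

```go
package main

import (
	"fmt"
	"os"
)

// defaults are the endpoints documented in the README.
var defaults = map[string]string{
	"ollama": "http://localhost:11434/v1",
	"vllm":   "http://localhost:8000/v1",
}

// envVars are the documented global overrides.
var envVars = map[string]string{
	"ollama": "OLLAMA_BASE_URL",
	"vllm":   "VLLM_BASE_URL",
}

// BaseURL picks the endpoint for a local provider. Assumed precedence:
// per-alias base_url, then the global env var, then the default.
func BaseURL(provider, aliasBaseURL string, getenv func(string) string) (string, error) {
	if aliasBaseURL != "" {
		return aliasBaseURL, nil
	}
	def, ok := defaults[provider]
	if !ok {
		return "", fmt.Errorf("provider %q is not a local OpenAI-compatible backend", provider)
	}
	if v := getenv(envVars[provider]); v != "" {
		return v, nil
	}
	return def, nil
}

func main() {
	u, err := BaseURL("ollama", "", os.Getenv)
	fmt.Println(u, err)
}
```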

By default a CLI block returns **text** — PRISMAG is a router, not an agent. Turn
on exec and a block can take real actions through a small, **permission-gated**
tool loop: it asks before every step, so you grant rights action-by-action.

Set it once in `registry.yaml` (`exec.enabled: true`) plus a `default:` alias, and the everyday flow needs no tag and no flags, like an agent that asks first:

```
prismag "create a folder on my desktop named poems"
  ⚠ allow run_shell: mkdir -p ~/Desktop/poems ? [y/N] y
  ✓ run_shell: mkdir -p ~/Desktop/poems
```

Prefer per-run control instead? Skip the config and pass `--exec` (flags always override config):

```
prismag run --exec "@@opus4.8: create ~/Desktop/poem.txt with a short flower poem"
```
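"Flags always override config" is a small merge rule. The Go sketch below assumes the hypothetical `ExecConfig` struct mirrors the `exec:` keys shown earlier, and it models unset flags as nil pointers so that only flags the user actually passed take effect. The mapping of `--exec`, `--exec-shell`, and `--yes` onto fields is an illustrative reading of this README, not PRISMAG's code.

```go
package main

import "fmt"

// ExecConfig mirrors the exec: keys in registry.yaml (field names assumed).
type ExecConfig struct {
	Enabled bool
	Shell   bool
	Approve string // "ask" or "auto"
	Root    string
}

// ExecFlags holds per-run flags; nil means "not passed on the command line".
type ExecFlags struct {
	Exec      *bool // --exec
	ExecShell *bool // --exec-shell
	Yes       *bool // --yes
}

// Effective applies flags over config: any flag that was passed wins.
func Effective(cfg ExecConfig, f ExecFlags) ExecConfig {
	out := cfg
	if out.Approve == "" {
		out.Approve = "ask" // README: ask is the default
	}
	if f.Exec != nil {
		out.Enabled = *f.Exec
	}
	if f.ExecShell != nil {
		out.Shell = *f.ExecShell
	}
	if f.Yes != nil && *f.Yes {
		out.Approve = "auto"
	}
	return out
}

func main() {
	on := true
	cfg := ExecConfig{} // no exec: block in registry.yaml
	fmt.Printf("%+v\n", Effective(cfg, ExecFlags{Exec: &on}))
}
```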

- Tools: `write_file`, `read_file`, and `run_shell` (`exec.shell: true` / `--exec-shell`).
- Every action needs approval; `approve: auto` (or `--yes`) skips the prompt (use with care), and a non-interactive shell denies by default.
- `root:` confines file actions to one tree.
- **Destructive commands are refused by default**: `rm -rf /`, `mkfs`, `dd of=/dev/…`, fork bombs, `shutdown`, etc. are blocked *even if approved*, so a careless `y` (or `approve: auto`) can't wreck your machine. Ordinary deletes still work via the normal prompt. Override only with `exec.allow_destructive: true`.
- The protocol is provider-agnostic (a fenced `prismag` JSON action), so it works on Anthropic, OpenAI, OpenRouter, **and local** Ollama/vLLM models alike.
- **CLI-only by design**: inside an IDE the agent already has its own tools, so PRISMAG just emits a delegation plan there.
- In the `prismag>` REPL, toggle it with `:exec` (`:exec shell`, `:exec yes`, `:exec off`).
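A refusal list like the one described can be sketched as pattern checks that run before the approval prompt, so neither a `y` nor `approve: auto` lets a match through. The patterns below are a hypothetical, deliberately small subset built from the examples this README names (`rm -rf /`, `mkfs`, `dd of=/dev/…`, fork bombs, `shutdown`, plus `reboot` as an assumed "etc."); they are not PRISMAG's actual list. Pattern matching alone is easy to evade, which is why approval still gates every other action.

```go
package main

import (
	"fmt"
	"regexp"
)

// destructive is a small illustrative denylist, NOT PRISMAG's real one.
var destructive = []*regexp.Regexp{
	regexp.MustCompile(`\brm\s+(?:-\S+\s+)*/\*?(?:\s|$)`),           // rm [opts] / or /*
	regexp.MustCompile(`\bmkfs\b`),                                 // mkfs, mkfs.ext4, ...
	regexp.MustCompile(`\bdd\b.*\bof=/dev/`),                       // dd writing to a device
	regexp.MustCompile(`:\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:`), // classic bash fork bomb
	regexp.MustCompile(`\b(?:shutdown|reboot)\b`),                  // power state changes
}

// Refused reports whether cmd must be blocked regardless of approval.
// allowDestructive mirrors exec.allow_destructive and disables the check.
func Refused(cmd string, allowDestructive bool) bool {
	if allowDestructive {
		return false
	}
	for _, re := range destructive {
		if re.MatchString(cmd) {
			return true
		}
	}
	return false
}

func main() {
	for _, c := range []string{
		"mkdir -p ~/Desktop/poems",
		"rm -rf ./build",
		"rm -rf /",
		"dd if=disk.img of=/dev/sda",
	} {
		fmt.Printf("%-30q refused=%v\n", c, Refused(c, false))
	}
}
```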

PRISMAG already *is* the router, so it calls provider REST APIs directly with no
self-hosted proxy, DB, or admin UI to trust and patch. That keeps the
dependency/supply-chain surface tiny — direct APIs, a single static binary.

PRISMAG is a routing protocol any agent can speak; no SDK required. Shell out to `prismag route --json` to get a deterministic plan (which model runs which block), then dispatch with your own model access; or use `prismag run --api` to have PRISMAG execute and return the result. See [INTEGRATIONS.md](/rufus-SD/prismag/blob/main/INTEGRATIONS.md).
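An agent written in Go could shell out as sketched below. The sketch relies only on what this README states: `prismag route` accepts `--json`, and flags precede the prompt as in `prismag run --exec "…"`. It also assumes the plan is written to stdout. The plan's schema is documented in INTEGRATIONS.md rather than here, so the sketch decodes into a generic value instead of guessing field names. `RoutePlan` and `decodePlan` are hypothetical helpers.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"os/exec"
)

// decodePlan parses the plan JSON without assuming its schema.
func decodePlan(data []byte) (any, error) {
	var plan any
	if err := json.Unmarshal(data, &plan); err != nil {
		return nil, fmt.Errorf("decode plan: %w", err)
	}
	return plan, nil
}

// RoutePlan runs `prismag route --json <prompt>` and returns the decoded plan.
// It assumes prismag is on PATH and prints the JSON plan to stdout. The prompt
// is passed as a single argv entry, so no shell quoting is involved.
func RoutePlan(prompt string) (any, error) {
	var stdout, stderr bytes.Buffer
	cmd := exec.Command("prismag", "route", "--json", prompt)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return nil, fmt.Errorf("prismag route: %w: %s", err, stderr.String())
	}
	return decodePlan(stdout.Bytes())
}

func main() {
	plan, err := RoutePlan("@@opus: plan the cache layer @@composer: implement it")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, _ := json.MarshalIndent(plan, "", "  ")
	fmt.Println(string(out))
}
```

Once the schema from INTEGRATIONS.md is known, `any` can be swapped for a typed struct so the agent can read each block's model directly.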

[maind](https://github.com/rufus-SD/maind) is the optional memory backend: an
encrypted, local-first store the CLI and your IDE agent share. With both wired in,
context survives across blocks, sessions, and editors.

See [CONTRIBUTING.md](/rufus-SD/prismag/blob/main/CONTRIBUTING.md).

See [SECURITY.md](/rufus-SD/prismag/blob/main/SECURITY.md) for credential handling and vulnerability reporting.