Show HN: Prismag – Per-block model routing for the terminal and any IDE

One prompt enters. Each block routes to the right model.

Tag any block with `@@model` and PRISMAG sends it to the model you chose (planning to Opus, implementation to Composer, summaries to a fast model) without switching the IDE picker or juggling chats.

prismag> @@opus: design the auth flow   @@composer: implement the middleware

  ── @@opus → claude-4.6-opus-high-thinking ───────────────────────────
  Use short-lived access tokens with rotating refresh tokens because…

  ── @@composer → composer-2.5-fast ───────────────────────────────────
  // middleware/auth.go
  func RequireAuth(next http.Handler) http.Handler { … }

  routed 2 blocks · chained · 1.8s

Today's AI coding tools force a binary choice:

- Pick one model for the whole conversation, or
- Open multiple chats and split the work by hand.

Neither matches how you actually work. Planning wants depth (Opus). Implementation wants speed (Composer). Review wants a different lens entirely.

Without PRISMAG	With PRISMAG
One model per chat	A model per block, in one prompt
Switch the picker between tasks	`@@opus:` … `@@composer:` … and go
Manual context copy-paste between chats	Output of block N chains into block N+1
Auto-routing by cost/latency (OpenRouter)	You choose the model per block
YAML/Python pipelines (LangGraph/CrewAI)	Chat-native `@@` syntax, zero config

Prompt with @@tags ──▶ parser ──▶ orchestrator ──▶ model backends ──▶ sectioned result
                                       ▲
                                       └── ContextStore (in-memory · or maind)

- The trigger is `@@`, not `@`: a bare `@` collides with the IDE's mention menu, while `@@` travels as plain text through every chat surface.
- Routing is deterministic and owned by the CLI + `registry.yaml`.
- Blocks run serial + chained by default (output N → context N+1), or `--parallel` for independent blocks.
- Context flows through a pluggable store: in-memory by default, or `maind` for encrypted, cross-session memory.
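The serial + chained behaviour described above can be sketched in a few lines of Go. This is a minimal illustration, not PRISMAG's actual code: the names `Block`, `ContextStore`, `MemoryStore`, `Backend`, and `RunChained` are hypothetical, and the backend is a plain function standing in for a real model call.

```go
package main

import (
	"fmt"
	"strings"
)

// Block is one tagged unit of work: an alias plus its task text.
type Block struct {
	Alias string
	Task  string
}

// ContextStore is a hypothetical pluggable store for accumulated context.
type ContextStore interface {
	Append(text string)
	Snapshot() string
}

// MemoryStore keeps context in process memory only.
type MemoryStore struct{ parts []string }

func (m *MemoryStore) Append(text string) { m.parts = append(m.parts, text) }
func (m *MemoryStore) Snapshot() string   { return strings.Join(m.parts, "\n") }

// Backend stands in for a model call (assumption: real backends are HTTP clients).
type Backend func(alias, task, context string) (string, error)

// RunChained runs blocks in order. Each output is appended to the store,
// so block N+1 sees the output of block N. It stops at the first error.
func RunChained(blocks []Block, store ContextStore, call Backend) ([]string, error) {
	outputs := make([]string, 0, len(blocks))
	for _, b := range blocks {
		out, err := call(b.Alias, b.Task, store.Snapshot())
		if err != nil {
			return outputs, fmt.Errorf("block @@%s failed: %w", b.Alias, err)
		}
		store.Append(out)
		outputs = append(outputs, out)
	}
	return outputs, nil
}
```

Swapping `MemoryStore` for another implementation of the same interface is what "pluggable" would mean in this sketch; a parallel mode would instead call every block with the same starting snapshot.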

go install github.com/rufus-SD/prismag@latest

git clone https://github.com/rufus-SD/prismag.git
cd prismag && make install
prismag setup

prismag init

prismag run "@@opus: plan the cache layer" "@@composer: implement it"

Or just run `prismag` with no args to drop into the interactive `prismag>` session.

PRISMAG works in two ways, from the same global config:

- **CLI / REPL**: runs in any terminal, on any OS. Executes each block via provider APIs using your keys. Universal, deterministic.
- **In your IDE**: `prismag init` writes a rule that teaches the agent to route `@@` blocks through PRISMAG. Where the IDE supports per-task subagents, each block is dispatched to its own subagent + model using your subscription (no API keys needed).

Editor	Rule file	Dispatch
Cursor	`.cursor/rules/prismag-routing.mdc` + `.cursor/agents/`	subagents (any model)
Claude Code	`CLAUDE.md` + `.claude/agents/`	subagents (Claude) + API fallback
Windsurf	`.windsurf/rules/prismag-routing.md`	runs via `prismag run`
GitHub Copilot	`.github/copilot-instructions.md`	runs via `prismag run`
Cline	`.clinerules/prismag-routing.md`	runs via `prismag run`
Roo Code	`.roo/rules/prismag-routing.md`	runs via `prismag run`
Aider	`CONVENTIONS.md`	runs via `prismag run`
generic	`.prismag/rules.md`	runs via `prismag run`

prismag connect cursor      # or: claude, windsurf, copilot, cline, roo, aider, generic

Subagent dispatch gives true per-block model switching where the editor exposes it (Cursor, Claude Code). Everywhere else, the agent runs `prismag run` and shows the sectioned output verbatim: same routing, same result.

@@<alias>: <task>
context shared with every block goes here, before the first tag

@@opus: review the security implications of this auth module
@@composer: write the unit tests for AuthService
@@fast: summarize the diff in 3 bullets

- `@@alias` is case-insensitive and maps to a model via `registry.yaml`.
- Text before the first `@@` is shared context for all blocks.
- Serial + chained by default; `--parallel` for independent blocks.
- Chained runs fail fast; parallel runs tolerate partial failure.
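The syntax rules above are simple enough to sketch as a small parser. The following Go snippet is only an illustration of those rules, not the project's parser: `ParsedBlock`, `ParsePrompt`, and the set of characters allowed in an alias (letters, digits, `.`, `_`, `-`) are assumptions.

```go
package main

import (
	"regexp"
	"strings"
)

// ParsedBlock is one "@@alias: task" unit found in a prompt.
type ParsedBlock struct {
	Alias string
	Task  string
}

// tagPattern matches "@@alias:". The alias character set is an assumption.
var tagPattern = regexp.MustCompile(`@@([A-Za-z0-9._-]+):`)

// ParsePrompt splits a prompt into shared context (everything before the
// first tag) and the tagged blocks that follow. Aliases are lower-cased,
// which makes lookups case-insensitive.
func ParsePrompt(prompt string) (shared string, blocks []ParsedBlock) {
	locs := tagPattern.FindAllStringSubmatchIndex(prompt, -1)
	if len(locs) == 0 {
		// No tags at all: the whole prompt is returned as shared text.
		return strings.TrimSpace(prompt), nil
	}
	shared = strings.TrimSpace(prompt[:locs[0][0]])
	for i, loc := range locs {
		end := len(prompt)
		if i+1 < len(locs) {
			end = locs[i+1][0] // a block ends where the next tag starts
		}
		blocks = append(blocks, ParsedBlock{
			Alias: strings.ToLower(prompt[loc[2]:loc[3]]),
			Task:  strings.TrimSpace(prompt[loc[1]:end]),
		})
	}
	return shared, blocks
}
```

In this sketch an untagged prompt comes back with no blocks, leaving it to the caller to decide what to do with it (for example, sending it to a default alias).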

aliases:
  opus:
    model: claude-opus-4-6        # concrete id + offline fallback
    match: claude-opus-4-6        # family resolved against the live model list
    provider: anthropic
    agent: opus-planner           # subagent used when routing in-IDE
    description: Deep reasoning, architecture, security review
  composer:
    model: composer-2.5-fast
    provider: cursor
    agent: composer-implementer
    description: Fast implementation, multi-file edits
  fast:
    model: gpt-5.3-codex
    provider: openai
    description: Cheap, quick summaries and simple transforms

Two optional top-level keys remove friction for everyday use:

default: opus4.8       # untagged prompts route here, so `prismag "do X"` needs no @@tag
exec:                  # CLI tool-loop defaults — set permissions once, no flags per run
  enabled: true        # let blocks act on this machine (write files, …)
  shell: true          # also allow run_shell
  approve: ask         # ask = confirm each action y/N (default) · auto = no prompt

The same model has a different id in every context: `claude-opus-4-8` on the Anthropic API, `claude-opus-4-8-thinking-high` in Cursor, a local tag in Ollama. Pinning one string breaks the moment a provider renames or bumps a model.

So PRISMAG treats an alias as a family and resolves it to a currently-valid id from the live model list for the active context (queried with your keys in the CLI, cached 12h; the agent-maintained cache in the IDE). It picks the best match deterministically, self-heals across renames, and falls back to the pinned `model` when offline. Set `match:` to make the family explicit; otherwise `model` doubles as it. Inspect what's available any time with `prismag models`.
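One way to picture family resolution is the Go sketch below. The text does not say how "best match" is ranked, so the rule used here is an assumption: an exact id wins, otherwise the lexicographically greatest id that starts with the family string. `ResolveModel` is a hypothetical name.

```go
package main

import (
	"sort"
	"strings"
)

// ResolveModel maps a family to a concrete id from the live model list.
// match is the optional family string; pinned is the alias's model value,
// which doubles as the family when match is empty and as the fallback
// when the live list is empty or has no candidate.
func ResolveModel(match, pinned string, live []string) string {
	family := match
	if family == "" {
		family = pinned
	}
	var candidates []string
	for _, id := range live {
		if id == family {
			return id // exact id is still valid
		}
		if strings.HasPrefix(id, family) {
			candidates = append(candidates, id)
		}
	}
	if len(candidates) == 0 {
		return pinned // offline or renamed beyond recognition
	}
	// Sorting makes the choice deterministic regardless of list order.
	sort.Strings(candidates)
	return candidates[len(candidates)-1]
}
```

Because the result depends only on the inputs and a stable sort, the same live list always yields the same id, which is the property the paragraph above calls deterministic.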

Command	What it does
`prismag`	Interactive `prismag>` session (or onboarding on first run)
`prismag setup`	First-time setup: keys, model discovery, starter registry
`prismag init [tool]`	Wire routing into this project (auto-detects the editor)
`prismag connect <tool>`	Write the integration rule (+ subagents where supported)
`prismag run "@@..."`	Route and execute a tagged prompt (untagged → `default:` alias; `--exec` / `exec:` lets blocks act)
`prismag route "@@..."`	Show the delegation plan without executing (`--json` too)
`prismag list`	List `@@aliases` with availability marks
`prismag models`	Show models available right now
`prismag doctor`	Diagnose keys, registry, and environment
`prismag sessions`	List saved REPL session transcripts
`prismag resume [id]`	Reopen a past session with its context

PRISMAG calls provider APIs directly: keys go straight to the vendor, never to a gateway. Keys are read from the environment, a `~/.config/prismag/.env` file, or stored encrypted in `maind` when present.

ANTHROPIC_API_KEY only:            + OPENAI_API_KEY:
  @@opus      ✓ ready                @@opus      ✓ ready
  @@fast      ✗ needs OPENAI_API_KEY @@fast      ✓ ready
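The readiness marks above amount to a lookup from provider to required environment variable. A minimal Go sketch follows, with hypothetical names (`keyFor`, `Availability`); only the two variable names come from the text, and treating providers without an entry as needing no key is an assumption.

```go
package main

// keyFor maps a provider to the environment variable it requires.
// Providers not listed here are treated as keyless (assumption).
var keyFor = map[string]string{
	"anthropic": "ANTHROPIC_API_KEY",
	"openai":    "OPENAI_API_KEY",
}

// Availability reports whether a provider is usable and, if not,
// which variable is missing. getenv is injected so the check can be
// exercised without touching the real environment.
func Availability(provider string, getenv func(string) string) (ready bool, missing string) {
	name, needsKey := keyFor[provider]
	if !needsKey {
		return true, ""
	}
	if getenv(name) == "" {
		return false, name
	}
	return true, ""
}
```

In a real program `os.Getenv` would be passed as `getenv`.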

Inside an IDE that dispatches subagents, blocks route via your subscription — no API keys required.

Route any block to a model running on your own machine — no API key, no cloud, $0 per token. Both Ollama and vLLM expose an OpenAI-compatible API, so PRISMAG talks to them natively (streaming included).

ollama pull qwen2.5-coder:7b        # serves on http://localhost:11434

# registry.yaml
aliases:
  local:
    model: qwen2.5-coder:7b
    provider: ollama                # or: vllm
    description: Local model — private, free, offline

prismag run "@@local: refactor this function"   # runs entirely on your box

Endpoints default to `http://localhost:11434/v1` (Ollama) and `http://localhost:8000/v1` (vLLM); override per-alias with `base_url` or globally with `OLLAMA_BASE_URL` / `VLLM_BASE_URL`. Mix freely (plan locally, implement in the cloud): `@@local: draft` then `@@opus: review`.
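The endpoint lookup can be sketched as a three-step fallback. The defaults and variable names below come from the text; the precedence order (per-alias `base_url` first, then the environment variable, then the default) is an assumption, and `BaseURL` is a hypothetical name.

```go
package main

// Default endpoints for the two local providers named in the text.
var defaultBaseURL = map[string]string{
	"ollama": "http://localhost:11434/v1",
	"vllm":   "http://localhost:8000/v1",
}

// Environment variables that override the default globally.
var baseURLEnv = map[string]string{
	"ollama": "OLLAMA_BASE_URL",
	"vllm":   "VLLM_BASE_URL",
}

// BaseURL resolves the endpoint for a local provider.
// Assumed order: alias-level base_url, then environment, then default.
func BaseURL(provider, aliasBaseURL string, getenv func(string) string) string {
	if aliasBaseURL != "" {
		return aliasBaseURL
	}
	if name, ok := baseURLEnv[provider]; ok {
		if v := getenv(name); v != "" {
			return v
		}
	}
	return defaultBaseURL[provider]
}
```

Passing `os.Getenv` as `getenv` gives the real behaviour; a stub keeps the function easy to test.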

By default a CLI block returns text — PRISMAG is a router, not an agent. Turn on exec and a block can take real actions through a small, permission-gated tool loop: it asks before every step, so you grant rights action-by-action.

Set it once in `registry.yaml` (`exec.enabled: true`) plus a `default:` alias, and the everyday flow needs no tag and no flags, like an agent that asks first:

prismag "create a folder on my desktop named poems"
  ⚠ allow run_shell: mkdir -p ~/Desktop/poems ? [y/N] y
  ✓ run_shell: mkdir -p ~/Desktop/poems

Prefer per-run control instead? Skip the config and pass `--exec` (flags always override config):

prismag run --exec "@@opus4.8: create ~/Desktop/poem.txt with a short flower poem"

- Tools: `write_file`, `read_file`, and `run_shell` (`exec.shell: true` / `--exec-shell`).
- Every action needs approval; `approve: auto` (or `--yes`) skips the prompt (use with care), and a non-interactive shell denies by default.
- `root:` confines file actions to one tree.
- Destructive commands are refused by default: `rm -rf /`, `mkfs`, `dd of=/dev/…`, fork bombs, `shutdown`, etc. are blocked even if approved, so a careless `y` (or `approve: auto`) can't wreck your machine. Ordinary deletes still work via the normal prompt. Override only with `exec.allow_destructive: true`.
- The protocol is provider-agnostic (a fenced `prismag` JSON action), so it works on Anthropic, OpenAI, OpenRouter, and local Ollama/vLLM models alike.
- CLI-only by design: inside an IDE the agent already has its own tools, so PRISMAG just emits a delegation plan there.
- In the `prismag>` REPL, toggle it with `:exec` (`:exec shell`, `:exec yes`, `:exec off`).
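The approval rules above can be read as a small decision function. The Go sketch below is deliberately naive and is not PRISMAG's guard: `IsDestructive` only inspects the first word and its arguments (it would miss `sudo`, pipes, or fork bombs), and the names `Decision`, `Decide`, and `IsDestructive` are hypothetical. It also assumes that `approve: auto` allows actions even in a non-interactive shell.

```go
package main

import "strings"

// Decision is what the tool loop does with a proposed shell command.
type Decision int

const (
	Deny Decision = iota
	Ask
	Allow
)

// IsDestructive is an illustrative, incomplete check for a few of the
// command shapes listed in the text.
func IsDestructive(cmd string) bool {
	f := strings.Fields(cmd)
	if len(f) == 0 {
		return false
	}
	if strings.HasPrefix(f[0], "mkfs") || f[0] == "shutdown" {
		return true
	}
	for _, arg := range f[1:] {
		if f[0] == "rm" && arg == "/" {
			return true // rm aimed at the filesystem root
		}
		if f[0] == "dd" && strings.HasPrefix(arg, "of=/dev/") {
			return true // dd writing to a device
		}
	}
	return false
}

// Decide applies the rules in order: destructive commands are refused
// unless explicitly allowed, auto-approval skips the prompt, and a
// non-interactive shell denies because nobody can answer y/N.
func Decide(cmd, approve string, interactive, allowDestructive bool) Decision {
	if IsDestructive(cmd) && !allowDestructive {
		return Deny
	}
	if approve == "auto" {
		return Allow
	}
	if !interactive {
		return Deny
	}
	return Ask
}
```

Checking the destructive case first is what makes the block hold "even if approved": neither `approve: auto` nor a `y` at the prompt is consulted before that test.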

PRISMAG already is the router, so it calls provider REST APIs directly with no self-hosted proxy, DB, or admin UI to trust and patch. That keeps the dependency/supply-chain surface tiny — direct APIs, a single static binary.

PRISMAG is a routing protocol any agent can speak, with no SDK required. Shell out to `prismag route --json` to get a deterministic plan (which model runs which block), then dispatch with your own model access; or use `prismag run --api` to have PRISMAG execute and return the result. See INTEGRATIONS.md.

maind is the optional memory backend: an encrypted, local-first store the CLI and your IDE agent share. With both wired in, context survives across blocks, sessions, and editors.

See CONTRIBUTING.md.

See SECURITY.md for credential handling and vulnerability reporting.

