Piper – DevOps copilot where the LLM picks typed actions, not shell

Piper, a new DevOps copilot, operates with a safety-first architecture where a large language model never directly executes commands — it only selects typed actions from a fixed catalog, which are then validated by deterministic code and run locally on the user's machine. The tool uses a conversational terminal interface to drive existing tools like SSH, kubectl, and Docker, but gates any mutating operations behind explicit human approval, preventing the LLM from reaching infrastructure without consent. By separating the LLM's planning role from command execution, Piper aims to provide a secure, auditable alternative to traditional AI-powered command-line tools that generate arbitrary shell strings.

DevOps at the speed of thought.

A terminal-first, LLM-driven DevOps copilot that is safe by construction: the LLM proposes, deterministic code validates, the human approves anything that mutates.

Important: The LLM never executes anything. It only picks an action from a fixed catalog. PIPER then validates the choice and runs the command on your own machine through a single audited executor. The LLM is a planner, not a shell.

This is the entire product. PIPER pulls the relevant runbook from its knowledge base, runs read-only diagnostics over SSH, finds the planted issues, proposes fixes, and refuses to apply them, because M1 is read-only. The LLM proposes; the deterministic gate validates; the human stays in the loop.

PIPER drives the tools you already trust (`ssh`, `kubectl`, `docker`, `gh`, `aws`, `gcloud`, `journalctl`, ...) from a conversational terminal UI, but every command runs locally, is picked from a typed action catalog, validated by a path denylist + secret scrubber, and, for anything that mutates, gated behind an explicit human approval. The LLM can hallucinate freely; it cannot reach your infrastructure unless a real human says yes.

    › uptime, memory and disk on staging - tail nginx logs if anything looks off

    PIPER planning… 3 actions chosen from the catalog
      1. system.uptime
      2. system.memory
      3. system.disk_usage

    ✓ system.uptime      (520ms, ran locally)
    ✓ system.memory      (340ms, ran locally)
    ✓ system.disk_usage  (410ms, ran locally)

    ▌ Y ◉ ◉ Y
    ▌ Staging has been up for 14 days with a 0.43 load average [ev-1]. Memory
    ▌ has plenty of headroom - 12 GB free out of 16 GB total [ev-2] - and the
    ▌ root volume sits at 38% [ev-3]. Nothing worth flagging on the resource
    ▌ side, no need to dig into the nginx logs right now.

Every `[ev-N]` is a link back to the exact command output that produced the claim.
PIPER cannot make claims without evidence: the verifier rejects ungrounded synthesis and retries.

This is the heart of the product. Read it twice.

| | What most LLM CLIs do | What PIPER does |
|---|---|---|
| Who composes the command | The LLM writes a shell string (`tail -f …`, `kubectl get …`) | The LLM picks an action name + typed args from a closed catalog |
| Who runs it | An execution layer that runs whatever the LLM wrote | PIPER's local executor runs a fixed command template bound to that action |
| What if the LLM hallucinates | A bogus command might run on your infrastructure | The catalog has no entry for a bogus action → the executor refuses |
| What you can audit | Prompts + arbitrary shell history | A typed list of actions in source (`src/actions/builtin/`) plus the verbatim local exec in the audit log |
| Where the command runs | Sometimes a remote sandbox, sometimes your machine | Always your machine. Local subprocess, optionally SSH'ing into an allowlisted host you registered |

Concretely: when the LLM wants to check disk space, it does not emit `"df -h /"`. It emits a typed tool call, `{ "name": "system.disk_usage", "args": { "host": "staging", "path": "/" } }`, and PIPER's executor (only `src/exec/executor.ts` runs anything) translates that into `df -h /` and spawns a local subprocess. The shell string is built in PIPER's source code, not by the LLM. The args are validated by Zod before the spawn. Secrets are stripped on the way out, before going to the audit log and before going back to the LLM.

The LLM can ask for `system.disk_usage`. It cannot ask for `system.evil_undocumented_thing`. That's the safety property.

We built PIPER for two people, both real:

- The lone developer with no DevOps support. You shipped an app, you need to keep it running, and there's no one to call when the staging container won't come up at 11pm. Today your fallback is pasting logs into ChatGPT and hoping.
- The DevOps engineer doing the same diagnostic dance fifty times a day.
  Tail the logs on that node. Check why the deploy is stuck. Verify the cron. You don't need a tutor; you need an editor for infrastructure with audit trail and rollback wired in.

Both meet on the same contract: PIPER never silently mutates anything, and you can always see exactly what it is about to do.

- Not an autonomous agent. PIPER does not act on `mutate` / `destructive` actions without approval, and never will.
- Not a chat product. The TUI is a working surface, not a conversation.
- Not a Kubernetes admin panel, not a CI replacement, not a monitoring tool. PIPER drives the CLIs you already trust and adds the safety + grounding layer.
- Not a black box. Every action, prompt, approval rule and audit log entry is readable in source.

| Milestone | What | State |
|---|---|---|
| M0 | Spike: Bun `--compile` + Ink + PGlite WASM | ✅ shipped |
| M1 | Read-only diagnostics: SSH, logs, health, container/pod status, deterministic gate | ✅ shipped |
| M1.5 | RAG/memory layer, 3 embedding backends, sessions + resume, auto-compaction, interactive `/model` & `/memory`, HUMAN/YOLO modes, 40+ read actions | ✅ shipped |
| M2 | Mutations behind HITL: docker deploy, env updates, migrations, rollback | ⏳ next |
| M3 | Scale: Kubernetes deploys, continuous monitor loop, repo suggestions | ⏳ |
| M4 | On-prem / regulated: local-model-only path, encrypted audit, runbook ingestion at install | ⏳ |

No `mutate` or `destructive` tier actions exist in the catalog yet, and the runner explicitly refuses them. M1.5 is fully diagnostic by design.

Download the binary for your platform from the [latest release](https://github.com/antoniociccia/piper/releases/latest) (no Bun, no `node_modules`, single ~76 MB file).

macOS (Apple Silicon):

    curl -fsSLO https://github.com/antoniociccia/piper/releases/latest/download/piper-darwin-arm64
    chmod +x piper-darwin-arm64 && mv piper-darwin-arm64 /usr/local/bin/piper
    piper

Linux (x64):

    curl -fsSLO https://github.com/antoniociccia/piper/releases/latest/download/piper-linux-x64
    chmod +x piper-linux-x64 && sudo mv piper-linux-x64 /usr/local/bin/piper
    piper

A `.sha256` is published alongside each binary; verify the download before running.

To run from source you need [Bun](https://bun.sh) ≥ 1.2 (one-line install: `curl -fsSL https://bun.sh/install | bash`).

    git clone https://github.com/antoniociccia/piper
    cd piper
    bun install
    bun dev

On first launch PIPER detects that `~/.piper/credentials.json` doesn't exist and runs an interactive wizard:

1. Backend: probes for any local LLM server running (Ollama `:11434`, LM Studio `:1234`, llama.cpp `:8080`, vLLM `:8000`), or asks for an OpenRouter API key.
2. Model: pick a tier (Featherweight ~$0.10/M, Economy ~$0.44/M, Balanced ~$3/M, Premium $30+/M) or a local model from the listed catalog.
3. Embedding backend: `wasm` (default, in-process, offline after first run), `http` (local OpenAI-compatible endpoint), `openrouter` (cloud, paid), or `none` (disable RAG).
4. Budget: per-session USD cap (default $0.50; hard stop, not a warning).
5. SSH environment: optionally add a first host PIPER will be able to reach.

The wizard writes `~/.piper/credentials.json` with mode `0600`. From there:

    › check uptime and disk usage on staging

To resume a previous session at startup:

    bun dev -- --resume    # opens a picker over recent sessions

To build and run the standalone binary:

    bun run build          # → ./dist/piper, ~76 MB
    ./dist/piper           # runs without Bun, without node_modules

The binary embeds PostgreSQL WASM (~13 MB) and Yoga layout. The embedding model is not bundled: it lazy-fetches on first RAG use (~120 MB, one time) and caches at `~/.piper/cache/models/`.

You cannot get "no hallucination" from an LLM. Don't try. Instead, make being wrong safe.
PIPER's LLM lives inside a deterministic cage:

- LLM (any model): proposes actions as tool calls to the action catalog.
- Action catalog (`read` | `mutate` | `destructive`): args are validated with Zod before anything reaches the executor.
- Executor (the ONLY side-effect surface): spawns `kubectl` / `docker` / `ssh` / `nc` / `gh` / ...
- stdout/stderr are scrubbed, then flow back to the LLM as scrubbed messages.
- PGlite + pgvector: stores the audit log, evidence and knowledge.

Three permission tiers with no overrides:

| Tier | Examples | Approval |
|---|---|---|
| `read` | `uptime`, `docker.ps`, `kubectl get` | None. Executes directly. Safe by definition. |
| `mutate` | (M2) `docker deploy`, `env update` | Per-env approval prompt; remembered |
| `destructive` | (M2) `delete`, `drop`, `prune`, `force-push` | Fresh prompt every time. Never remembered. Ever. |

Five overlapping defenses, applied at every layer:

1. Architectural: SSH keys never leave the OS `ssh` binary; API keys never enter `messages[].content`. Single-module discipline + CI rule.
2. Path denylist: `~/.ssh/id`, `~/.aws/credentials`, `~/.kube/config`, `~/.gnupg/`, `~/.docker/config.json`, `~/.netrc`, `~/.piper/`, `.env`. Non-disablable. User config can extend the list, never weaken it.
3. Two-pass scrubbing: write-time (every Executor output → audit log) and pre-LLM (every message body → HTTP call). Defense in depth.
4. Args refuse: if the LLM tries to embed a recognisable secret (`AKIA…`, `sk-or-…`, JWTs, PEM blocks, `Bearer …`) in an action's args, the Executor refuses the action; it does not redact. Redaction would mutate semantics.
5. Provider-level privacy: OpenRouter requests set `body.provider.data_collection = 'deny'`. Local mode routes inference through Ollama / llama.cpp / LM Studio / vLLM, so network egress for inference is zero.
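
The gate described above (closed catalog lookup, argument validation, refusal when args contain a recognisable secret) can be sketched in a few lines. This is a minimal illustration, not PIPER's source: `ActionDef`, `gate`, `SECRET_PATTERNS` and the hand-written checks are hypothetical names and shapes, the hand-written validation stands in for the Zod schemas the project uses so the sketch stays dependency-free, and SSH routing for `host` is left out.

```typescript
// Hypothetical sketch of the deterministic gate; names and shapes are
// illustrative assumptions, not PIPER's actual implementation.
type Tier = "read" | "mutate" | "destructive";

interface ActionDef {
  tier: Tier;
  // Validates raw args (stand-in for a Zod schema) and returns a fixed argv.
  // Throws when the args are malformed.
  plan(raw: unknown): string[];
}

// A few recognisable-secret patterns; assumed shapes, not an exhaustive list.
const SECRET_PATTERNS: RegExp[] = [
  /AKIA[0-9A-Z]{16}/,                   // AWS access key id
  /sk-or-[\w-]+/,                       // OpenRouter-style API key
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/, // PEM private key block
  /Bearer\s+\S+/,                       // bearer token
];

// The closed catalog: the command template lives here, in code.
const catalog = new Map<string, ActionDef>([
  ["system.disk_usage", {
    tier: "read",
    plan(raw) {
      if (typeof raw !== "object" || raw === null) {
        throw new Error("args must be an object");
      }
      const { host, path = "/" } = raw as { host?: unknown; path?: unknown };
      if (typeof host !== "string" || host.length === 0) {
        throw new Error("host must be a non-empty string");
      }
      if (typeof path !== "string" || !path.startsWith("/")) {
        throw new Error("path must be an absolute path");
      }
      // Returned as an argv array, never as a shell string.
      return ["df", "-h", path];
    },
  }],
]);

type GateResult =
  | { ok: true; tier: Tier; argv: string[] }
  | { ok: false; reason: string };

function gate(call: { name: string; args: unknown }): GateResult {
  const action = catalog.get(call.name);
  if (!action) return { ok: false, reason: `unknown action: ${call.name}` };

  // Refuse (do not redact) when the args carry something that looks like a secret.
  const serialized = JSON.stringify(call.args) ?? "";
  if (SECRET_PATTERNS.some((pattern) => pattern.test(serialized))) {
    return { ok: false, reason: "args contain a recognisable secret" };
  }

  try {
    return { ok: true, tier: action.tier, argv: action.plan(call.args) };
  } catch (err) {
    return { ok: false, reason: err instanceof Error ? err.message : String(err) };
  }
}
```

Calling `gate({ name: "system.disk_usage", args: { host: "staging", path: "/" } })` yields the argv `["df", "-h", "/"]`, while an unknown action name or a secret-looking argument comes back as a refusal with a reason. Returning an argv array instead of a string keeps the model's output out of any shell parser in this sketch.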

Full design rationale in [docs/architecture.md](/antoniociccia/piper/blob/main/docs/architecture.md) and [docs/decisions/ADR-001-deterministic-gate.md](/antoniociccia/piper/blob/main/docs/decisions/ADR-001-deterministic-gate.md).

40+ read-tier actions across the major DevOps surfaces. Every action is a typed object registered in `src/actions/builtin/`, validated by Zod, executed only through `src/exec/executor.ts`. Free-form shell from the LLM is not representable in the type system.

| Category | Action | What it does |
|---|---|---|
| System | `system.uptime` | `uptime` (load average + time up) |
| | `system.os_info` | `uname -a` + `/etc/os-release` |
| | `system.memory` | `free -h` |
| | `system.disk_usage` | `df -h [path?]` |
| | `system.process_list` | `ps -eo pid,user,pcpu,pmem,args -ww` |
| | `system.list_dir` | `ls -la <path>` (deny-list enforced) |
| | `system.file_stat` | `stat <path>` |
| | `system.cpu_info` | `lscpu` / `/proc/cpuinfo` |
| | `system.dmesg` | Kernel ring buffer tail |
| | `system.package_list` | Installed packages (`dpkg -l` / `rpm -qa`) |
| | `system.cron_list` | User + system crontabs |
| | `system.systemctl_list` | `systemctl list-units --type=service` |
| | `system.iptables_list` | `iptables -L -n -v` |
| Network | `network.connections` | `ss -tunap` |
| | `network.port_check` | `nc -zv` (open / refused / timeout / closed) |
| | `network.ping` | `ping -c N -W T` |
| | `network.dns_lookup` | `dig` / `host` lookup |
| | `ssh.connect` | Probe SSH reachability against an allowlisted host |
| Logs | `logs.tail` | `tail -n N <path>` with optional grep |
| Services | `service.status` | `systemctl status <unit>` |
| | `service.journal` | `journalctl -u <unit> -n N` |
| Docker | `docker.ps` | Container list (JSON) |
| | `docker.logs` | Container log tail |
| | `docker.inspect` | Container inspect (summarised) |
| | `docker.compose_ps` | `docker compose ps` for a project |
| Kubernetes | `kubernetes.get` | `kubectl get <kind>` (pods, deploys, services…) |
| | `kubernetes.logs` | `kubectl logs <pod>` with `-c`, `--previous`, tail-N |
| | `kubernetes.describe` | `kubectl describe <kind>/<name>` |
| | `kubernetes.top_pod` | `kubectl top pod` |
| | `kubernetes.events` | `kubectl get events --sort-by=.lastTimestamp` |
| | `kubernetes.context_current` | `kubectl config current-context` |
| Git | `git.status` | `git status --porcelain=v1` |
| | `git.log` | `git log -n N --oneline --decorate` |
| GitHub | `github.pr_list` | `gh pr list` |
| | `github.pr_view` | `gh pr view <number>` |
| | `github.run_list` | `gh run list` (Actions) |
| | `github.run_view` | `gh run view <id>` (logs, conclusion) |
| | `github.issue_list` | `gh issue list` |
| AWS | `aws.s3_ls` | `aws s3 ls` |
| | `aws.ec2_describe` | `aws ec2 describe-instances` |
| | `aws.cloudwatch_tail` | `aws logs tail` (CloudWatch) |
| | `aws.rds_describe` | `aws rds describe-db-instances` |
| GCP | `gcp.compute_list` | `gcloud compute instances list` |
| | `gcp.logging_read` | `gcloud logging read` |
| Azure | `azure.vm_list` | `az vm list` |
| Database | `postgres.pg_isready` | `pg_isready` against host:port |
| Memory | `memory.search` | In-process semantic search over the local knowledge base |

PIPER ships a `memory.search` action. It is not a shell action: it's in-process semantic search over a local PGlite + pgvector store of:

- `runbook`: markdown under `docs/runbooks/`
- `adr`: architecture decision records under `docs/decisions/`
- `session-summary`: produced by `/session-report`
- `solved-case`: distilled incident notes (annex format, opt-in)
- `note`: free-form knowledge you add yourself

The planner is instructed to call `memory.search` first when the user's prompt looks like a known incident pattern, a deploy procedure, or references a host that has prior session notes. The agent stays grounded in your runbooks instead of the model's training data.

| Backend | Model | Dim | Cost | Notes |
|---|---|---|---|---|
| `wasm` (default) | `Xenova/multilingual-e5-small` | 384 | free | In-process via `@huggingface/transformers`. 94 languages. ~120 MB downloaded once, then fully offline. Cached at `~/.piper/cache/models/`. |
| `http` | OpenAI-compatible local endpoint | varies | free | Ollama (`nomic-embed-text`, 768-dim), LM Studio, llama.cpp, vLLM. |
| `openrouter` | Cloud paid embedding model | varies | paid | Only offered if an API key is configured. |
| `none` | - | - | - | Disables RAG. `memory.search` returns empty. |

The schema auto-recreates if the dimension mismatches: switching e.g. from Ollama 768-dim to WASM 384-dim drops the old vectors and rebuilds from source. Zero manual migration.

Toggle modes with `Shift+Tab`:

- HUMAN (default): PIPER asks for approval per planned step. Verbatim command is shown before any run.
- YOLO: read-tier actions execute without per-step approval. `mutate` and `destructive` actions still always ask, every time, by design.

Slash commands:

    /model            interactive model picker (Local / OpenRouter tabs, paging, filter)
    /memory           knowledge-base viewer (Overview + Sources, delete with d)
    /mem, /rag        aliases for /memory
    /resume           pick a recent session and reload its history into scrollback
    /env add <name> <user@host>[:port] --key <path> --desc "..." --tag a,b
    /env list
    /env remove <name>
    /session-report   summarise the current session into the knowledge base
    /debug            toggle verbose agent events (costs, synth status, RAG hits, LLM trace)
    /help             show context-sensitive help
    /save file.md     export the last report to a file
    /quit             exit PIPER (Ctrl+C also works)

Keyboard:

| Keys | Effect |
|---|---|
| `Enter` | Send |
| `Shift+Enter` | New line (multi-line input) |
| `Shift+Tab` | Toggle HUMAN ↔ YOLO |
| `Ctrl+O` | Collapse reasoning: hide agent-event lines from future turns |
| `?` | Context-sensitive help |
| `Esc` | Clear current input |
| `Ctrl+C` | Quit |

The bottom strip of the TUI shows everything at a glance:

    Y ◉ ◉ Y  diagnosing staging  $0.0123 | google/gemini-pro-1.5 | OR $4.32 left | 12.4k/128k 10% ███▒▒▒▒▒▒▒  HUMAN

- Alien mascot: color-cycles while PIPER thinks; idle when waiting on you.
- Session title: auto-generated from the first user prompt by a tiny LLM call.
- Cost: running session cost in USD, real provider pricing.
- Model id: the model currently driving the planner (`/model` to switch).
- OpenRouter remaining credit: live-fetched every 60s on paid backends.
- Token meter: `N/limit (%)` against the model's `maxContextTokens` (minus 4k reserved for output), measured with real `gpt-tokenizer` (`cl100k_base`).
- Mode badge: HUMAN (green) or YOLO (red).

Sessions and reports:

- Persistent by default. PGlite stores sessions at `~/.piper/data/pglite/`. Override with `PIPER_DATA_DIR=/path`. Force in-memory ephemeral with `PIPER_EPHEMERAL=1`.
- Auto-titled. Small LLM call on the first user prompt names the session.
- Auto-saved reports. Every `done` writes the final answer to `~/.piper/data/reports/{sessionId}/run-{ts}.md`.
- Resume. `bun dev -- --resume` at startup, or `/resume` mid-session.
- Auto-compaction. When the planner's context exceeds 70% of the model's `maxContextTokens`, older turns are rolled into a single summary message.
- Grounded synthesis. Every claim cites `[ev-N]`. A run passes the verifier if ≥75% of substantive lines are cited; ungrounded answers retry.
- `<Static>` scrollback persistence. History stays in the terminal's native scrollback: append-only, no redraw, no flicker, no loss when you scroll up.

| Concern | Choice |
|---|---|
| Runtime | Bun ≥ 1.2 (single-binary via `bun build --compile`) |
| Language | TypeScript strict (`noUncheckedIndexedAccess`, `exactOptionalPropertyTypes`, no `any`) |
| Terminal UI | Ink (React for the terminal) |
| Persistence | PGlite (PostgreSQL in WASM), single embedded DB |
| Vectors | pgvector inside the same PGlite DB (HNSW index) |
| Embeddings (default) | `@huggingface/transformers` + `Xenova/multilingual-e5-small` (WASM) |
| Tokenizer | `gpt-tokenizer` (`cl100k_base`) |
| Schema validation | Zod |
| Model API | OpenAI-compatible `/v1/chat/completions` |

Why these choices: see [docs/decisions/](/antoniociccia/piper/blob/main/docs/decisions).
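
The grounded-synthesis rule stated above (a run passes when at least 75% of substantive lines cite an `ev-N`) can be sketched as a small pure function. This is a guess at the shape of the check, not the project's verifier: `groundingRatio`, `passesVerifier` and the definition of a "substantive" line (any line containing a letter or digit) are assumptions made for illustration.

```typescript
// Hypothetical sketch of the grounding check; the function names and the
// notion of a "substantive" line are assumptions, not PIPER's verifier.

// Matches an evidence reference such as ev-1 or [ev-12].
const CITATION = /\bev-\d+\b/;

// Assumption: a line is substantive when it contains at least one letter or digit.
function isSubstantive(line: string): boolean {
  return /[A-Za-z0-9]/.test(line);
}

// Fraction of substantive lines that carry at least one citation.
function groundingRatio(answer: string): number {
  const lines = answer.split("\n").filter(isSubstantive);
  if (lines.length === 0) return 0; // an empty answer is treated as ungrounded
  const cited = lines.filter((line) => CITATION.test(line)).length;
  return cited / lines.length;
}

// True when the answer meets the threshold (0.75 per the text above).
function passesVerifier(answer: string, threshold = 0.75): boolean {
  return groundingRatio(answer) >= threshold;
}
```

A caller would retry synthesis whenever `passesVerifier` returns `false`. Counting per line rather than per sentence is a simplification chosen here to keep the sketch short.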

`~/.piper/credentials.json` (created by the wizard, mode `0600`):

    {
      "openrouter_api_key": "sk-or-v1-...",
      "default_provider": "openrouter",
      "default_model": "deepseek/deepseek-v4-pro",
      "embedding_backend": "wasm",
      "max_session_cost_usd": 0.50,
      "max_followup_iterations": 1,
      "compaction_keep_recent": 6,
      "compaction_trigger_pct": 0.70,
      "environments": {
        "prod-web": {
          "host": "192.0.2.10",
          "ssh_user": "deploy",
          "port": 22,
          "identity_file": "/Users/me/.ssh/id_ed25519",
          "description": "production web tier",
          "tags": ["prod", "web"]
        }
      }
    }

Environment variables override the file (useful in CI):

| Variable | Purpose |
|---|---|
| `PIPER_PROVIDER` | `openrouter` \| `ollama` \| `lmstudio` \| `llamacpp` \| `vllm` \| `custom` |
| `PIPER_BASE_URL` | Endpoint override |
| `PIPER_API_KEY` / `OPENROUTER_API_KEY` | API key |
| `PIPER_MODEL` | Model id |
| `PIPER_EMBEDDING_BACKEND` | `wasm` \| `http` \| `openrouter` \| `none` |
| `PIPER_MAX_SESSION_COST_USD` | Hard budget cap |
| `PIPER_DATA_DIR` | Persistent storage (default: `~/.piper/data/pglite/`) |
| `PIPER_EPHEMERAL` | Set to `1` for in-memory storage (loses sessions at exit) |

If an env var doesn't look like a valid API key (e.g. a leftover test value), PIPER ignores it with a warning and falls back to the file.

    bun test             # 386 unit + gate tests (no Docker, no network)
    bun run e2e          # Docker sshd fixture, E2E tests, teardown
    bun run typecheck    # tsc --noEmit, strict

Coverage focuses on the security-critical layers: catalog gate, path denylist, secret scrubber, audit log persistence, verifier, embedding-dim migration. CI runs `license-checker` and rejects any GPL transitive dependency.

PIPER is built around a deterministic safety gate. Vulnerability disclosure process: see [SECURITY.md](/antoniociccia/piper/blob/main/SECURITY.md). Coordinated disclosure, 90-day default.
Particular care for:

- Prompt-injection that smuggles a command into the gate
- Any code path that runs shell outside the Executor
- Any code path that logs or sends unredacted secrets
- Any code path that lets a remembered rule auto-approve a `destructive` action
- Any code path that bypasses the SSH host allowlist

The full architecture + threat model is at [docs/architecture.md](/antoniociccia/piper/blob/main/docs/architecture.md).

Apache-2.0. See [LICENSE](/antoniociccia/piper/blob/main/LICENSE) and [NOTICE](/antoniociccia/piper/blob/main/NOTICE); the NOTICE file discloses the Apache-2.0 transitive deps and the LGPL transitive disclosure for `@img/sharp-libvips` (pulled in by the embedding pipeline).

Contributions welcome. See [CONTRIBUTING.md](/antoniociccia/piper/blob/main/CONTRIBUTING.md) for the flow. Two-eyes rule on anything touching `src/exec/`, `src/security/`, or `src/actions/`: the maintainer reviews these personally.

Built with the conviction that making being wrong safe beats trying to make the LLM never wrong.