Piper – DevOps copilot where the LLM picks typed actions, not shell


DevOps at the speed of thought.

A terminal-first, LLM-driven DevOps copilot that is safe by construction — the LLM proposes, deterministic code validates, the human approves anything that mutates.


Important

The LLM never executes anything. It only picks an action from a fixed catalog. PIPER then validates the choice and runs the command on your own machine through a single audited executor. The LLM is a planner, not a shell. This is the entire product.

PIPER pulls the relevant runbook from its knowledge base, runs read-only diagnostics over SSH, finds the planted issues, proposes fixes — and refuses to apply them, because M1 is read-only. The LLM proposes; the deterministic gate validates; the human stays in the loop.

PIPER drives the tools you already trust (`ssh`, `kubectl`, `docker`, `gh`, `aws`, `gcloud`, `journalctl`, ...) from a conversational terminal UI, but every command runs locally, picked from a typed action catalog, validated by a path denylist + secret scrubber, and (for anything that mutates) gated behind an explicit human approval. The LLM can hallucinate freely; it cannot reach your infrastructure unless a real human says yes.

›  uptime, memory and disk on staging — tail nginx logs if anything looks off

PIPER planning…  (3 actions chosen from the catalog)
  1. system.uptime
  2. system.memory
  3. system.disk_usage
   ✓ system.uptime         (520ms, ran locally)
   ✓ system.memory         (340ms, ran locally)
   ✓ system.disk_usage     (410ms, ran locally)

▌ Y(◉ ◉)Y
▌ Staging has been up for 14 days with a 0.43 load average [ev-1]. Memory
▌ has plenty of headroom — 12 GB free out of 16 GB total [ev-2] — and the
▌ root volume sits at 38% [ev-3]. Nothing worth flagging on the resource
▌ side, no need to dig into the nginx logs right now.

Every `[ev-N]` is a link back to the exact command output that produced the claim. PIPER cannot make claims without evidence: the verifier rejects ungrounded synthesis and retries.

This is the heart of the product. Read it twice.

| | What most LLM CLIs do | What PIPER does |
|---|---|---|
| Who composes the command | The LLM writes a shell string (`tail -f …`, `kubectl get …`) | The LLM picks an action name + typed args from a closed catalog |
| Who runs it | An execution layer that runs whatever the LLM wrote | PIPER's local executor runs a fixed command template bound to that action |
| What if the LLM hallucinates | A bogus command might run on your infrastructure | The catalog has no entry for a bogus action → the executor refuses |
| What you can audit | Prompts + arbitrary shell history | A typed list of actions in source (`src/actions/builtin/`) plus the verbatim local exec in `audit_log` |
| Where the command runs | Sometimes a remote sandbox, sometimes your machine | Always your machine. Local subprocess, optionally SSH'ing into an allowlisted host you registered |

Concretely: when the LLM wants to check disk space, it does not emit `"df -h /"`. It emits a typed tool call:

{ "name": "system.disk_usage", "args": { "host": "staging", "path": "/" } }

PIPER's executor (only `src/exec/executor.ts` runs anything) then translates that into `df -h /` and spawns a local subprocess. The shell string is built in PIPER's source code, not by the LLM. The args are validated by Zod before the spawn. Secrets are stripped on the way out, before going to the audit log and before going back to the LLM.

The LLM can ask for `system.disk_usage`. It cannot ask for `system.evil_undocumented_thing`. That's the safety property.
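The closed-catalog idea can be sketched in a few lines. This is a minimal illustration, not PIPER's actual source: the `gate` function, the `ActionDef` shape, and the hand-rolled `parseDiskUsageArgs` validator (standing in for a Zod schema so the snippet has no dependencies) are all hypothetical names. SSH wrapping and the actual spawn are left out.

```typescript
// Hypothetical sketch of a closed action catalog. Names are illustrative only.
type Tier = "read" | "mutate" | "destructive";

interface ToolCall {
  name: string;
  args: unknown;
}

interface PlannedCommand {
  host: string;
  argv: string[]; // an argv array, never a shell string
}

type GateResult =
  | { ok: true; command: PlannedCommand }
  | { ok: false; reason: string };

interface ActionDef {
  tier: Tier;
  // Validates raw args, then builds the command from a fixed template.
  plan(rawArgs: unknown): PlannedCommand;
}

// Stand-in for a Zod schema: reject anything that is not a plain host + path.
function parseDiskUsageArgs(raw: unknown): { host: string; path: string } {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("args must be an object");
  }
  const rec = raw as Record<string, unknown>;
  const host = rec["host"];
  const path = rec["path"] === undefined ? "/" : rec["path"];
  if (typeof host !== "string" || !/^[a-z0-9-]+$/.test(host)) {
    throw new Error("host must be a short lowercase name");
  }
  if (typeof path !== "string" || !/^\/[\w./-]*$/.test(path)) {
    throw new Error("path must be a plain absolute path");
  }
  return { host, path };
}

const catalog = new Map<string, ActionDef>([
  [
    "system.disk_usage",
    {
      tier: "read",
      plan: (raw) => {
        const { host, path } = parseDiskUsageArgs(raw);
        // The template lives in source; the model only supplies typed args.
        return { host, argv: ["df", "-h", path] };
      },
    },
  ],
]);

function gate(call: ToolCall, allowedHosts: ReadonlySet<string>): GateResult {
  const action = catalog.get(call.name);
  if (action === undefined) {
    return { ok: false, reason: `unknown action: ${call.name}` };
  }
  try {
    const command = action.plan(call.args);
    if (!allowedHosts.has(command.host)) {
      return { ok: false, reason: `host not allowlisted: ${command.host}` };
    }
    return { ok: true, command };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { ok: false, reason };
  }
}
```

In this sketch a hallucinated action name fails at the map lookup, and a path such as `/; rm -rf /` fails validation before any command is built.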

We built PIPER for two people, both real:

- The lone developer with no DevOps support. You shipped an app, you need to keep it running, and there's no one to call when the staging container won't come up at 11pm. Today your fallback is pasting logs into ChatGPT and hoping.
- The DevOps engineer doing the same diagnostic dance fifty times a day. Tail the logs on that node. Check why the deploy is stuck. Verify the cron. You don't need a tutor; you need an editor for infrastructure with audit trail and rollback wired in.

Both meet on the same contract: PIPER never silently mutates anything, and you can always see exactly what it is about to do.

- Not an autonomous agent. PIPER does not act on `mutate`/`destructive` actions without approval, and never will.
- Not a chat product. The TUI is a working surface, not a conversation.
- Not a Kubernetes admin panel, not a CI replacement, not a monitoring tool. PIPER drives the CLIs you already trust and adds the safety + grounding layer.
- Not a black box. Every action, prompt, approval rule and audit log entry is readable in source.

| Milestone | What | State |
|---|---|---|
| M0 | Spike: Bun `--compile` + Ink + PGlite WASM | ✅ shipped |
| M1 | Read-only diagnostics: SSH, logs, health, container/pod status, deterministic gate | ✅ shipped |
| M1.5 | RAG/memory layer, 3 embedding backends, sessions + resume, auto-compaction, interactive `/model` & `/memory`, HUMAN/YOLO modes, 40+ read actions | ✅ shipped |
| M2 | Mutations behind HITL: docker deploy, env updates, migrations, rollback | ⏳ next |
| M3 | Scale: Kubernetes deploys, continuous monitor loop, repo suggestions | ⏳ |
| M4 | On-prem / regulated: local-model-only path, encrypted audit, runbook ingestion at install | ⏳ |

No `mutate` or `destructive` tier actions exist in the catalog yet, and the runner explicitly refuses them. M1.5 is fully diagnostic by design.

Download the binary for your platform from the latest release: no Bun, no `node_modules`, single ~76 MB file.

curl -fsSLO https://github.com/antoniociccia/piper/releases/latest/download/piper-darwin-arm64
chmod +x piper-darwin-arm64 && mv piper-darwin-arm64 /usr/local/bin/piper
piper

curl -fsSLO https://github.com/antoniociccia/piper/releases/latest/download/piper-linux-x64
chmod +x piper-linux-x64 && sudo mv piper-linux-x64 /usr/local/bin/piper
piper

A `.sha256` is published alongside each binary; verify the download before running.

To run from source you need Bun ≥ 1.2 (one-line install: `curl -fsSL https://bun.sh/install | bash`).

git clone https://github.com/antoniociccia/piper
cd piper
bun install
bun dev

On first launch PIPER detects that `~/.piper/credentials.json` doesn't exist and runs an interactive wizard:

1. Backend: probes for any local LLM server running (Ollama `:11434`, LM Studio `:1234`, llama.cpp `:8080`, vLLM `:8000`), or asks for an OpenRouter API key.
2. Model: pick a tier (Featherweight ~$0.10/M, Economy ~$0.44/M, Balanced ~$3/M, Premium $30+/M) or a local model from the listed catalog.
3. Embedding backend: `wasm` (default, in-process, offline after first run), `http` (local OpenAI-compatible endpoint), `openrouter` (cloud, paid), or `none` (disable RAG).
4. Budget: per-session USD cap (default $0.50; hard stop, not a warning).
5. SSH environment: optionally add a first host PIPER will be able to reach.
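The "hard stop, not a warning" budget can be pictured as a guard that is checked before every model call. The sketch below is an assumption about how such a cap might work, not PIPER's implementation: `SessionBudget`, `record`, and `ensureCanCall` are hypothetical names, and the choice to check before the call (instead of after) is a guess.

```typescript
// Hypothetical per-session budget guard: once the cap is reached,
// further model calls are refused instead of merely warned about.
class SessionBudget {
  private readonly capUsd: number;
  private spentUsd = 0;

  constructor(capUsd: number) {
    if (!(capUsd > 0)) throw new Error("cap must be positive");
    this.capUsd = capUsd;
  }

  // Add the provider-reported cost of a finished call.
  record(costUsd: number): void {
    this.spentUsd += costUsd;
  }

  // Call before every model request; throws once the cap is reached.
  ensureCanCall(): void {
    if (this.spentUsd >= this.capUsd) {
      throw new Error(
        `session budget exhausted: $${this.spentUsd.toFixed(4)} of $${this.capUsd.toFixed(2)}`,
      );
    }
  }

  remainingUsd(): number {
    return Math.max(0, this.capUsd - this.spentUsd);
  }
}
```

With the default cap of $0.50, a planner loop would call `ensureCanCall()` before each request and `record()` after it, so a runaway retry loop ends with an error instead of a growing bill.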

The wizard writes `~/.piper/credentials.json` with mode `0600`. From there:

›  check uptime and disk usage on staging

To resume a previous session at startup:

bun dev -- --resume      # opens a picker over recent sessions

To build the standalone binary:

bun run build      # ./dist/piper, ~76 MB
./dist/piper       # runs without Bun, without node_modules

The binary embeds PostgreSQL WASM (~13 MB) and Yoga layout. The embedding model is not bundled: it lazy-fetches on first RAG use (~120 MB, one time) and caches at `~/.piper/cache/models/`.

You cannot get "no hallucination" from an LLM. Don't try. Instead, make being wrong safe. PIPER's LLM lives inside a deterministic cage:

       ┌────────────┐  proposes actions      ┌────────────────┐
       │    LLM     │ ─────tool_calls──────► │  Action catalog│
       │ (any model)│                        │  (read|mutate| │
       │            │                        │   destructive) │
       └────────────┘                        └───────┬────────┘
              ▲                                      │ validate
              │ scrubbed                             │ args (Zod)
              │ messages                             ▼
              │                              ┌────────────────┐
              │                              │   Executor     │
              │                              │   (the ONLY    │
              │                              │   side-effect  │
              │                              │   surface)     │
              │                              └───────┬────────┘
              │                                      │
              │           scrub stdout/stderr        │ spawns kubectl /
              └──────────────────────────────────────┤ docker / ssh /
                                                     │ nc / gh / ...
                                                     ▼
                                           ┌──────────────────┐
                                           │ PGlite + pgvector│
                                           │ audit_log,       │
                                           │ evidence,        │
                                           │ knowledge        │
                                           └──────────────────┘

Three permission tiers with no overrides:

| Tier | Examples | Approval |
|---|---|---|
| `read` | `uptime`, `docker.ps`, `kubectl get` | None. Executes directly. Safe by definition. |
| `mutate` | (M2) `docker deploy`, env update | Per-env approval prompt; remembered |
| `destructive` | (M2) `delete`, `drop`, `prune`, force-push | Fresh prompt every time. Never remembered. Ever. |
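The tier table can be read as a small decision function. The following sketch models only that table and makes assumptions the source does not confirm: `ApprovalRules` is a hypothetical name, remembered approvals are keyed by environment plus action name, and the action names in the usage note (`docker.deploy`, `docker.prune`) are illustrative since no mutate or destructive actions ship yet.

```typescript
// Hypothetical model of the three permission tiers.
type PermissionTier = "read" | "mutate" | "destructive";
type Decision = "run" | "prompt";

class ApprovalRules {
  private readonly remembered = new Set<string>();

  private static key(env: string, action: string): string {
    return `${env}::${action}`;
  }

  // read runs directly, destructive always prompts,
  // mutate prompts unless an approval was remembered for this env.
  decide(tier: PermissionTier, env: string, action: string): Decision {
    if (tier === "read") return "run";
    if (tier === "destructive") return "prompt";
    return this.remembered.has(ApprovalRules.key(env, action)) ? "run" : "prompt";
  }

  // Only mutate approvals can be remembered. Returns false when the
  // request is ignored, so a destructive action can never be auto-approved.
  remember(tier: PermissionTier, env: string, action: string): boolean {
    if (tier !== "mutate") return false;
    this.remembered.add(ApprovalRules.key(env, action));
    return true;
  }
}
```

For example, after `remember("mutate", "staging", "docker.deploy")` the same action on `staging` returns `"run"`, while on `prod` it still returns `"prompt"`; `remember("destructive", "staging", "docker.prune")` returns `false` and changes nothing.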

Five overlapping defenses, applied at every layer:

1. Architectural: SSH keys never leave the OS `ssh` binary; API keys never enter `messages[].content`. Single-module discipline + CI rule.
2. Path denylist: `~/.ssh/id_*`, `~/.aws/credentials`, `~/.kube/config`, `~/.gnupg/`, `~/.docker/config.json`, `~/.netrc`, `~/.piper/`, `.env*`. Non-disablable. User config can extend the list, never weaken it.
3. Two-pass scrubbing: write-time (every Executor output → audit log) and pre-LLM (every message body → HTTP call). Defense in depth.
4. Args refuse: if the LLM tries to embed a recognisable secret (`AKIA…`, `sk-or-…`, JWTs, PEM blocks, `Bearer …`) in an action's args, the Executor refuses the action; it does not redact. Redaction would mutate semantics.
5. Provider-level privacy: OpenRouter requests set `body.provider.data_collection = 'deny'`. Local mode routes inference through Ollama / llama.cpp / LM Studio / vLLM, so network egress for inference is zero.
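The difference between scrubbing output and refusing args can be shown with one shared pattern list. This is a rough sketch under stated assumptions: the regular expressions are simplified guesses at what "recognisable secret" means, and `scrub`, `findSecret`, and `checkArgs` are hypothetical names, not PIPER's API.

```typescript
// Simplified, hypothetical secret patterns. Real detectors are stricter.
const SECRET_PATTERNS: ReadonlyArray<{ label: string; re: RegExp }> = [
  { label: "aws-access-key", re: /AKIA[0-9A-Z]{16}/g },
  { label: "openrouter-key", re: /sk-or-[A-Za-z0-9_-]{8,}/g },
  { label: "bearer-token", re: /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g },
  { label: "jwt", re: /eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g },
  { label: "pem-block", re: /-----BEGIN [A-Z ]+-----[\s\S]*?-----END [A-Z ]+-----/g },
];

// Output path: redact, because the surrounding text is still useful.
function scrub(text: string): string {
  let out = text;
  for (const { label, re } of SECRET_PATTERNS) {
    out = out.replace(re, `[REDACTED:${label}]`);
  }
  return out;
}

// Walk strings nested anywhere inside the args and report the first hit.
function findSecret(value: unknown): string | null {
  if (typeof value === "string") {
    for (const { label, re } of SECRET_PATTERNS) {
      // search() ignores lastIndex, so the global flag is harmless here.
      if (value.search(re) !== -1) return label;
    }
    return null;
  }
  if (typeof value === "object" && value !== null) {
    for (const inner of Object.values(value) as unknown[]) {
      const hit = findSecret(inner);
      if (hit !== null) return hit;
    }
  }
  return null;
}

// Input path: refuse outright. Redacting an argument would silently
// change what the action does.
function checkArgs(args: unknown): { ok: true } | { ok: false; reason: string } {
  const hit = findSecret(args);
  return hit === null
    ? { ok: true }
    : { ok: false, reason: `args contain a secret (${hit}); action refused` };
}
```

The same `scrub` function could serve both scrubbing passes (before the audit log and before the LLM request), while `checkArgs` sits in front of the executor.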

Full design rationale in docs/architecture.md and docs/decisions/ADR-001-deterministic-gate.md.

40+ read-tier actions across the major DevOps surfaces. Every action is a typed object registered in `src/actions/builtin/`, validated by Zod, executed only through `src/exec/executor.ts`. Free-form shell from the LLM is not representable in the type system.


| Category | Action | What it does |
|---|---|---|
| System | `system.uptime` | `uptime` (load average + time up) |
| | `system.os_info` | `uname -a` + `/etc/os-release` |
| | `system.memory` | `free -h` |
| | `system.disk_usage` | `df -h [path?]` |
| | `system.process_list` | `ps -eo pid,user,pcpu,pmem,args -ww` |
| | `system.list_dir` | `ls -la <path>` (deny-list enforced) |
| | `system.file_stat` | `stat <path>` |
| | `system.cpu_info` | `lscpu` / `/proc/cpuinfo` |
| | `system.dmesg` | Kernel ring buffer tail |
| | `system.package_list` | Installed packages (`dpkg -l` / `rpm -qa`) |
| | `system.cron_list` | User + system crontabs |
| | `system.systemctl_list` | `systemctl list-units --type=service` |
| | `system.iptables_list` | `iptables -L -n -v` |
| Network | `network.connections` | `ss -tunap` |
| | `network.port_check` | `nc -zv` (open / refused / timeout / closed) |
| | `network.ping` | `ping -c N -W T` |
| | `network.dns_lookup` | `dig` / `host` lookup |
| | `ssh.connect` | Probe SSH reachability against an allowlisted host |
| Logs | `logs.tail` | `tail -n N <path>` with optional grep |
| Services | `service.status` | `systemctl status <unit>` |
| | `service.journal` | `journalctl -u <unit> -n N` |
| Docker | `docker.ps` | Container list (JSON) |
| | `docker.logs` | Container log tail |
| | `docker.inspect` | Container inspect (summarised) |
| | `docker.compose_ps` | `docker compose ps` for a project |
| Kubernetes | `kubernetes.get` | `kubectl get <kind>` (pods, deploys, services…) |
| | `kubernetes.logs` | `kubectl logs <pod>` (with `-c`, `--previous`, tail-N) |
| | `kubernetes.describe` | `kubectl describe <kind>/<name>` |
| | `kubernetes.top_pod` | `kubectl top pod` |
| | `kubernetes.events` | `kubectl get events --sort-by=.lastTimestamp` |
| | `kubernetes.context_current` | `kubectl config current-context` |
| Git | `git.status` | `git status --porcelain=v1` |
| | `git.log` | `git log -n N --oneline --decorate` |
| GitHub | `github.pr_list` | `gh pr list` |
| | `github.pr_view` | `gh pr view <number>` |
| | `github.run_list` | `gh run list` (Actions) |
| | `github.run_view` | `gh run view <id>` (logs, conclusion) |
| | `github.issue_list` | `gh issue list` |
| AWS | `aws.s3_ls` | `aws s3 ls` |
| | `aws.ec2_describe` | `aws ec2 describe-instances` |
| | `aws.cloudwatch_tail` | `aws logs tail` (CloudWatch) |
| | `aws.rds_describe` | `aws rds describe-db-instances` |
| GCP | `gcp.compute_list` | `gcloud compute instances list` |
| | `gcp.logging_read` | `gcloud logging read` |
| Azure | `azure.vm_list` | `az vm list` |
| Database | `postgres.pg_isready` | `pg_isready` against host:port |
| Memory | `memory.search` | In-process semantic search over the local knowledge base |

PIPER ships a `memory.search` action. It is not a shell action; it's in-process semantic search over a local PGlite + pgvector store of:

- `runbook`: markdown under `docs/runbooks/`
- `adr`: architecture decision records under `docs/decisions/`
- `session-summary`: produced by `/session-report`
- `solved-case`: distilled incident notes (annex format, opt-in)
- `note`: free-form knowledge you add yourself

The planner is instructed to call `memory.search` first when the user's prompt looks like a known incident pattern, a deploy procedure, or references a host that has prior session notes. The agent stays grounded in your runbooks instead of the model's training data.

| Backend | Model | Dim | Cost | Notes |
|---|---|---|---|---|
| `wasm` (default) | `Xenova/multilingual-e5-small` | 384 | free | In-process via `@huggingface/transformers`. 94 languages. ~120 MB downloaded once, then fully offline. Cached at `~/.piper/cache/models/`. |
| `http` | OpenAI-compatible local endpoint | varies | free | Ollama (`nomic-embed-text`, 768-dim), LM Studio, llama.cpp, vLLM. |
| `openrouter` | Cloud paid embedding model | varies | paid | Only offered if an API key is configured. |
| `none` | - | - | - | Disables RAG. `memory.search` returns empty. |

The schema auto-recreates if the dimension mismatches — switching e.g. from Ollama 768-dim to WASM 384-dim drops the old vectors and rebuilds from source. Zero manual migration.
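The drop-and-rebuild behaviour on a dimension mismatch might look roughly like the sketch below. It uses an in-memory map in place of PGlite + pgvector and a synchronous embedder interface for brevity; `VectorStore`, `Embedder`, and `ensureDimension` are hypothetical names, and a real embedding call would be asynchronous.

```typescript
// Hypothetical in-memory stand-in for the vector table.
interface Embedder {
  dim: number;
  embed(text: string): number[];
}

interface SourceDoc {
  id: string;
  text: string;
}

interface VectorStore {
  dim: number;
  vectors: Map<string, number[]>;
}

// If the stored dimension differs from the active backend's dimension,
// drop every old vector and re-embed from the source documents.
// Returns true when a rebuild happened.
function ensureDimension(
  store: VectorStore,
  embedder: Embedder,
  sources: readonly SourceDoc[],
): boolean {
  if (store.dim === embedder.dim) return false;
  store.vectors.clear();
  store.dim = embedder.dim;
  for (const doc of sources) {
    store.vectors.set(doc.id, embedder.embed(doc.text));
  }
  return true;
}

// Toy deterministic embedder, for demonstration only.
function toyEmbedder(dim: number): Embedder {
  return {
    dim,
    embed(text) {
      const v = new Array<number>(dim).fill(0);
      for (let i = 0; i < text.length; i++) {
        const slot = i % dim;
        v[slot] = (v[slot] ?? 0) + text.charCodeAt(i);
      }
      return v;
    },
  };
}
```

Rebuilding from source text is what makes the switch safe: vectors from a 768-dim model cannot be compared with 384-dim ones, so nothing from the old index is reused.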

Toggle modes with Shift+Tab:

- HUMAN (default): PIPER asks for approval per planned step. Verbatim command is shown before any run.
- YOLO: read-tier actions execute without per-step approval. `mutate` and `destructive` actions still always ask, every time, by design.

Slash commands

/model               interactive model picker (Local / OpenRouter tabs, paging, filter)
/memory              knowledge-base viewer (Overview + Sources, delete with d)
/mem, /rag           aliases for /memory
/resume              pick a recent session and reload its history into scrollback
/env add <name> <user@host[:port]> [--key <path>] [--desc "..."] [--tag a,b]
/env list
/env remove <name>
/session-report      summarise the current session into the knowledge base
/debug               toggle verbose agent events (costs, synth status, RAG hits, LLM trace)
/help                show context-sensitive help
/save [file.md]      export the last report to a file
/quit                exit PIPER (Ctrl+C also works)

Keyboard

| Keys | Effect |
|---|---|
| `Enter` | Send |
| `Shift+Enter` | New line (multi-line input) |
| `Shift+Tab` | Toggle HUMAN ↔ YOLO |
| `Ctrl+O` | Collapse reasoning: hide agent-event lines from future turns |
| `?` | Context-sensitive help |
| `Esc` | Clear current input |
| `Ctrl+C` | Quit |

The bottom strip of the TUI shows everything at a glance:

Y(◉ ◉)Y  diagnosing staging      $0.0123 | google/gemini-pro-1.5 | OR $4.32 left | 12.4k/128k (10%) ███▒▒▒▒▒▒▒  HUMAN

- Alien mascot: color-cycles while PIPER thinks; idle when waiting on you.
- Session title: auto-generated from the first user prompt by a tiny LLM call.
- Cost: running session cost in USD, real provider pricing.
- Model id: the model currently driving the planner (`/model` to switch).
- OpenRouter remaining credit: live-fetched every 60s on paid backends.
- Token meter: `N/limit (%)` against the model's `maxContextTokens` (minus 4k reserved for output), measured with real `gpt-tokenizer` cl100k_base.
- Mode badge: HUMAN (green) or YOLO (red).

- Persistent by default. PGlite stores sessions at `~/.piper/data/pglite/`. Override with `PIPER_DATA_DIR=/path`. Force in-memory (ephemeral) with `PIPER_EPHEMERAL=1`.
- Auto-titled. Small LLM call on the first user prompt names the session.
- Auto-saved reports. Every `done` writes the final answer to `~/.piper/data/reports/{sessionId}/run-{ts}.md`.
- Resume. `bun dev -- --resume` at startup, or `/resume` mid-session.
- Auto-compaction. When the planner's context exceeds 70% of the model's `maxContextTokens`, older turns are rolled into a single summary message.
- Grounded synthesis. Every claim cites `[ev-N]`. A run passes the verifier if ≥75% of substantive lines are cited; ungrounded answers retry.
- `<Static>` scrollback persistence. History stays in the terminal's native scrollback: append-only, no redraw, no flicker, no loss when you scroll up.
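The grounded-synthesis rule (at least 75% of substantive lines carry an `[ev-N]` citation) can be approximated as follows. The definition of "substantive" used here (any non-blank line containing a letter or digit) and the extra check that cited ids actually exist are assumptions; `verify` and `Verdict` are hypothetical names.

```typescript
// Hypothetical groundedness check over a synthesized answer.
interface Verdict {
  grounded: boolean;
  citedRatio: number;
  unknownIds: string[];
}

// Assumption: a line is "substantive" if it contains any letter or digit.
function isSubstantive(line: string): boolean {
  return /[A-Za-z0-9]/.test(line);
}

function verify(
  answer: string,
  evidenceIds: ReadonlySet<string>,
  threshold = 0.75,
): Verdict {
  const lines = answer
    .split("\n")
    .map((l) => l.trim())
    .filter(isSubstantive);

  let cited = 0;
  const unknown = new Set<string>();
  for (const line of lines) {
    const refs = line.match(/\[ev-\d+\]/g) ?? [];
    if (refs.length > 0) cited++;
    for (const ref of refs) {
      const id = ref.slice(1, -1); // "[ev-2]" -> "ev-2"
      if (!evidenceIds.has(id)) unknown.add(id);
    }
  }

  // An empty answer has nothing to ground, so it is treated as ungrounded.
  const citedRatio = lines.length === 0 ? 0 : cited / lines.length;
  return {
    grounded: citedRatio >= threshold && unknown.size === 0,
    citedRatio,
    unknownIds: Array.from(unknown),
  };
}
```

A caller would retry synthesis whenever `grounded` is false, passing `unknownIds` back to the model as feedback if desired.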

| Concern | Choice |
|---|---|
| Runtime | Bun ≥ 1.2 (single-binary via `bun build --compile`) |
| Language | TypeScript strict (`noUncheckedIndexedAccess`, `exactOptionalPropertyTypes`, no `any`) |
| Terminal UI | Ink (React for the terminal) |
| Persistence | PGlite (PostgreSQL in WASM, single embedded DB) |
| Vectors | pgvector inside the same PGlite DB (HNSW index) |
| Embeddings (default) | `@huggingface/transformers` + `Xenova/multilingual-e5-small` (WASM) |
| Tokenizer | `gpt-tokenizer` (cl100k_base) |
| Schema validation | Zod |
| Model API | OpenAI-compatible `/v1/chat/completions` |

Why these choices: see docs/decisions/.

`~/.piper/credentials.json` (created by the wizard, mode 0600)

{
  "openrouter_api_key": "sk-or-v1-...",
  "default_provider": "openrouter",
  "default_model": "deepseek/deepseek-v4-pro",
  "embedding_backend": "wasm",
  "max_session_cost_usd": 0.50,
  "max_followup_iterations": 1,
  "compaction_keep_recent": 6,
  "compaction_trigger_pct": 0.70,
  "environments": {
    "prod-web": {
      "host": "192.0.2.10",
      "ssh_user": "deploy",
      "port": 22,
      "identity_file": "/Users/me/.ssh/id_ed25519",
      "description": "production web tier",
      "tags": ["prod", "web"]
    }
  }
}
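The `compaction_trigger_pct` and `compaction_keep_recent` fields above suggest a simple rule: once token usage passes the trigger fraction of the context window, everything except the most recent messages is folded into one summary. The sketch below illustrates that rule under assumptions: `compact` and its parameter names are hypothetical, the summary is stored as a `system` message, and `summarize` is a stub where a real system would make an LLM call.

```typescript
// Hypothetical compaction step driven by the two config fields.
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

interface CompactionConfig {
  triggerPct: number; // e.g. 0.70
  keepRecent: number; // e.g. 6
}

function compact(
  messages: readonly Message[],
  usedTokens: number,
  maxContextTokens: number,
  cfg: CompactionConfig,
  summarize: (older: readonly Message[]) => string,
): Message[] {
  // Below the trigger, or nothing old enough to fold: leave history alone.
  if (usedTokens <= maxContextTokens * cfg.triggerPct) return [...messages];
  if (messages.length <= cfg.keepRecent) return [...messages];

  const cut = messages.length - cfg.keepRecent;
  const older = messages.slice(0, cut);
  const recent = messages.slice(cut);

  // Older turns collapse into a single summary message.
  return [{ role: "system", content: summarize(older) }, ...recent];
}
```

With the values shown in the config, a 10-message history at 90,000 of 128,000 tokens would shrink to 7 messages: one summary plus the 6 most recent turns.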

Environment variables (override the file — useful in CI)

| Variable | Purpose |
|---|---|
| `PIPER_PROVIDER` | `openrouter`, `ollama`, … |
| `PIPER_BASE_URL` | Endpoint override |
| `PIPER_API_KEY` / `OPENROUTER_API_KEY` | API key |
| `PIPER_MODEL` | Model id |
| `PIPER_EMBEDDING_BACKEND` | `wasm`, `http`, … |
| `PIPER_MAX_SESSION_COST_USD` | Hard budget cap |
| `PIPER_DATA_DIR` | Persistent storage (default: `~/.piper/data/pglite/`) |
| `PIPER_EPHEMERAL` | Set to `1` for in-memory storage (loses sessions at exit) |

If an env var doesn't look like a valid API key (e.g. a leftover `test` value), PIPER ignores it with a warning and falls back to the file.
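The fallback behaviour described above could be sketched like this. Everything beyond the two variable names is an assumption: the shape check in `looksLikeApiKey`, the precedence of `PIPER_API_KEY` over `OPENROUTER_API_KEY`, and the `resolveApiKey` function itself are illustrative, not taken from PIPER's source.

```typescript
// Hypothetical resolution of the API key from env vars and the credentials file.
type KeySource = "env" | "file" | "none";

interface ResolvedKey {
  key: string | null;
  source: KeySource;
  warning: string | null;
}

// Assumed shape check; the real rule is not documented here.
function looksLikeApiKey(value: string): boolean {
  return /^sk-[A-Za-z0-9_-]{16,}$/.test(value);
}

function resolveApiKey(
  env: Readonly<Record<string, string | undefined>>,
  fileKey: string | null,
): ResolvedKey {
  // Assumed precedence: PIPER_API_KEY first, then OPENROUTER_API_KEY.
  const fromEnv = env["PIPER_API_KEY"] ?? env["OPENROUTER_API_KEY"];
  if (fromEnv !== undefined && looksLikeApiKey(fromEnv)) {
    return { key: fromEnv, source: "env", warning: null };
  }
  // The warning never echoes the rejected value itself.
  const warning =
    fromEnv === undefined
      ? null
      : "ignoring env API key: value does not look like a key";
  if (fileKey !== null) return { key: fileKey, source: "file", warning };
  return { key: null, source: "none", warning };
}
```

A leftover `PIPER_API_KEY=test` in a shell profile would then produce a warning and the key from `~/.piper/credentials.json` would be used instead.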

bun test                       # 386 unit + gate tests (no Docker, no network)
bun run e2e                    # Docker sshd fixture, E2E tests, teardown
bun run typecheck              # tsc --noEmit, strict

Coverage focuses on the security-critical layers: catalog gate, path denylist, secret scrubber, audit log persistence, verifier, embedding-dim migration.

CI runs `license-checker` and rejects any GPL transitive dependency.

PIPER is built around a deterministic safety gate. Vulnerability disclosure process: see SECURITY.md. Coordinated disclosure, 90-day default. Particular care for:

- Prompt-injection that smuggles a command into the gate
- Any code path that runs shell outside the Executor
- Any code path that logs or sends unredacted secrets
- Any code path that lets a remembered rule auto-approve a `destructive` action
- Any code path that bypasses the SSH host allowlist

The full architecture + threat model is at docs/architecture.md.

Apache-2.0. See LICENSE and NOTICE; the NOTICE file discloses the Apache-2.0 transitive deps and the LGPL transitive disclosure for `@img/sharp-libvips` (pulled in by the embedding pipeline).

Contributions welcome. See CONTRIBUTING.md for the flow.

Two-eyes rule on anything touching `src/exec/`, `src/security/`, or `src/actions/`: the maintainer reviews these personally.

Built with the conviction that making being wrong safe beats trying to make the LLM never wrong.

