[ARTICLE · art-17473] src=github.com pub= topic=ai-tools verified=true sentiment=↑ positive

Piper – DevOps copilot where the LLM picks typed actions, not shell

Piper, a new DevOps copilot, operates with a safety-first architecture where a large language model never directly executes commands — it only selects typed actions from a fixed catalog, which are then validated by deterministic code and run locally on the user's machine. The tool uses a conversational terminal interface to drive existing tools like SSH, kubectl, and Docker, but gates any mutating operations behind explicit human approval, preventing the LLM from reaching infrastructure without consent. By separating the LLM's planning role from command execution, Piper aims to provide a secure, auditable alternative to traditional AI-powered command-line tools that generate arbitrary shell strings.

15 min read · published May 29, 2026

DevOps at the speed of thought.

A terminal-first, LLM-driven DevOps copilot that is safe by construction — the LLM proposes, deterministic code validates, the human approves anything that mutates.

Important

The LLM never executes anything. It only picks an action from a fixed catalog. PIPER then validates the choice and runs the command on your own machine through a single audited executor. The LLM is a planner, not a shell. This is the entire product.

PIPER pulls the relevant runbook from its knowledge base, runs read-only diagnostics over SSH, finds the planted issues, proposes fixes — and refuses to apply them, because M1 is read-only. The LLM proposes; the deterministic gate validates; the human stays in the loop.

PIPER drives the tools you already trust (ssh, kubectl, docker, gh, aws, gcloud, journalctl, ...) from a conversational terminal UI, but every command runs locally, picked from a typed action catalog, validated by a path denylist + secret scrubber, and (for anything that mutates) gated behind an explicit human approval. The LLM can hallucinate freely; it cannot reach your infrastructure unless a real human says yes.

›  uptime, memory and disk on staging — tail nginx logs if anything looks off

PIPER planning…  (3 actions chosen from the catalog)
  1. system.uptime
  2. system.memory
  3. system.disk_usage
   ✓ system.uptime         (520ms, ran locally)
   ✓ system.memory         (340ms, ran locally)
   ✓ system.disk_usage     (410ms, ran locally)

▌ Y(◉ ◉)Y
▌ Staging has been up for 14 days with a 0.43 load average [ev-1]. Memory
▌ has plenty of headroom — 12 GB free out of 16 GB total [ev-2] — and the
▌ root volume sits at 38% [ev-3]. Nothing worth flagging on the resource
▌ side, no need to dig into the nginx logs right now.

Every [ev-N] is a link back to the exact command output that produced the claim. PIPER cannot make claims without evidence: the verifier rejects ungrounded synthesis and retries.

This is the heart of the product. Read it twice.

| | What most LLM CLIs do | What PIPER does |
| --- | --- | --- |
| Who composes the command | The LLM writes a shell string (tail -f …, kubectl get …) | The LLM picks an action name + typed args from a closed catalog |
| Who runs it | An execution layer that runs whatever the LLM wrote | PIPER's local executor runs a fixed command template bound to that action |
| What if the LLM hallucinates | A bogus command might run on your infrastructure | The catalog has no entry for a bogus action → the executor refuses |
| What you can audit | Prompts + arbitrary shell history | A typed list of actions in source (src/actions/builtin/) plus the verbatim local exec in audit_log |
| Where the command runs | Sometimes a remote sandbox, sometimes your machine | Always your machine. Local subprocess, optionally SSH'ing into an allowlisted host you registered |

Concretely: when the LLM wants to check disk space, it does not emit "df -h /". It emits a typed tool call:

{ "name": "system.disk_usage", "args": { "host": "staging", "path": "/" } }

PIPER's executor (only src/exec/executor.ts runs anything) translates that into df -h / and spawns a local subprocess. The shell string is built in PIPER's source code, not by the LLM. The args are validated by Zod before the spawn. Secrets are stripped on the way out, before going to the audit log and before going back to the LLM.

The LLM can ask for system.disk_usage. It cannot ask for system.evil_undocumented_thing. That's the safety property.
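A minimal sketch of that property, under stated assumptions: the names Action, planDiskUsage, planToolCall and registeredHosts are hypothetical, a hand-written guard stands in for the Zod schema so the example has no dependencies, and routing the command to the host over SSH is left out. It only illustrates the shape of the idea: the model supplies a name and args, and the argv comes from a template in source.

```typescript
// Hypothetical sketch of a closed, typed action catalog (not PIPER's real code).
type Tier = "read" | "mutate" | "destructive";
type Argv = readonly string[];

interface Action {
  readonly tier: Tier;
  // Validates raw LLM-supplied args, then fills a fixed command template.
  plan(rawArgs: unknown): Argv;
}

// Assumed allowlist of hosts the user registered (illustrative only).
const registeredHosts: ReadonlySet<string> = new Set<string>(["staging"]);

function planDiskUsage(rawArgs: unknown): Argv {
  if (typeof rawArgs !== "object" || rawArgs === null) {
    throw new Error("args must be an object");
  }
  const { host, path } = rawArgs as Record<string, unknown>;
  if (typeof host !== "string" || !registeredHosts.has(host)) {
    throw new Error("host is not registered");
  }
  if (typeof path !== "string" || !/^\/[A-Za-z0-9._\/-]*$/.test(path)) {
    throw new Error("path must be absolute and free of shell metacharacters");
  }
  // The template lives in source code; the LLM never writes this string.
  return ["df", "-h", path];
}

const catalog: ReadonlyMap<string, Action> = new Map<string, Action>([
  ["system.disk_usage", { tier: "read", plan: planDiskUsage }],
]);

type PlanResult =
  | { ok: true; argv: Argv }
  | { ok: false; reason: string };

// The gate: unknown names and invalid args are refused before any spawn.
function planToolCall(call: { name: string; args: unknown }): PlanResult {
  const action = catalog.get(call.name);
  if (action === undefined) {
    return { ok: false, reason: `unknown action: ${call.name}` };
  }
  try {
    return { ok: true, argv: action.plan(call.args) };
  } catch (err) {
    return { ok: false, reason: err instanceof Error ? err.message : String(err) };
  }
}
```

Returning an argv array instead of a single string means nothing is handed to a shell for interpretation, which is one plausible way to keep free-form shell out of the data path.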

We built PIPER for two people, both real:

  • The lone developer with no DevOps support. You shipped an app, you need to keep it running, and there's no one to call when the staging container won't come up at 11pm. Today your fallback is pasting logs into ChatGPT and hoping.

  • The DevOps engineer doing the same diagnostic dance fifty times a day. Tail the logs on that node. Check why the deploy is stuck. Verify the cron. You don't need a tutor; you need an editor for infrastructure with audit trail and rollback wired in.

Both meet on the same contract: PIPER never silently mutates anything, and you can always see exactly what it is about to do.

  • Not an autonomous agent. PIPER does not act on mutate/destructive actions without approval, and never will.

  • Not a chat product. The TUI is a working surface, not a conversation.

  • Not a Kubernetes admin panel, not a CI replacement, not a monitoring tool. PIPER drives the CLIs you already trust and adds the safety + grounding layer.

  • Not a black box. Every action, prompt, approval rule and audit log entry is readable in source.

| Milestone | What | State |
| --- | --- | --- |
| M0 | Spike: Bun --compile + Ink + PGlite WASM | ✅ shipped |
| M1 | Read-only diagnostics: SSH, logs, health, container/pod status, deterministic gate | ✅ shipped |
| M1.5 | RAG/memory layer, 3 embedding backends, sessions + resume, auto-compaction, interactive /model & /memory, HUMAN/YOLO modes, 40+ read actions | ✅ shipped |
| M2 | Mutations behind HITL: docker deploy, env updates, migrations, rollback | ⏳ next |
| M3 | Scale: Kubernetes deploys, continuous monitor loop, repo suggestions | |
| M4 | On-prem / regulated: local-model-only path, encrypted audit, runbook ingestion at install | |

No mutate or destructive tier actions exist in the catalog yet, and the runner explicitly refuses them. M1.5 is fully diagnostic by design.

Download the binary for your platform from the latest release: no Bun, no node_modules, single ~76 MB file.

curl -fsSLO https://github.com/antoniociccia/piper/releases/latest/download/piper-darwin-arm64
chmod +x piper-darwin-arm64 && mv piper-darwin-arm64 /usr/local/bin/piper
piper

curl -fsSLO https://github.com/antoniociccia/piper/releases/latest/download/piper-linux-x64
chmod +x piper-linux-x64 && sudo mv piper-linux-x64 /usr/local/bin/piper
piper

A .sha256 is published alongside each binary. Verify the download before running.

Running from source needs Bun ≥ 1.2 (one-line install: curl -fsSL https://bun.sh/install | bash).

git clone https://github.com/antoniociccia/piper
cd piper
bun install
bun dev

On first launch PIPER detects that ~/.piper/credentials.json doesn't exist and runs an interactive wizard:

  • Backend: probes for any local LLM server running (Ollama :11434, LM Studio :1234, llama.cpp :8080, vLLM :8000), or asks for an OpenRouter API key.

  • Model: pick a tier (Featherweight ~$0.10/M, Economy ~$0.44/M, Balanced ~$3/M, Premium $30+/M) or a local model from the listed catalog.

  • Embedding backend: wasm (default, in-process, offline after first run), http (local OpenAI-compatible endpoint), openrouter (cloud, paid), or none (disable RAG).

  • Budget: per-session USD cap (default $0.50; hard stop, not a warning).

  • SSH environment: optionally add a first host PIPER will be able to reach.
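The "hard stop, not a warning" budget can be pictured with a small guard like the one below. This is a sketch under assumptions: SessionBudget and tryCharge are hypothetical names, and whether PIPER checks the cap before or after a provider call is not stated in the text, so the sketch simply refuses any charge that would cross the cap.

```typescript
// Hypothetical per-session budget guard (illustrative, not PIPER's implementation).
class SessionBudget {
  private readonly capUsd: number;
  private spentUsd = 0;

  constructor(capUsd: number) {
    this.capUsd = capUsd;
  }

  // Returns false, and records nothing, when the charge would exceed the cap.
  tryCharge(costUsd: number): boolean {
    if (!Number.isFinite(costUsd) || costUsd < 0) return false;
    if (this.spentUsd + costUsd > this.capUsd) return false;
    this.spentUsd += costUsd;
    return true;
  }

  // Budget left in the session, never negative.
  get remainingUsd(): number {
    return Math.max(0, this.capUsd - this.spentUsd);
  }
}
```

With the default cap of 0.50, a first charge of 0.30 is accepted and a second charge of 0.30 is refused, leaving roughly 0.20 unspent.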

The wizard writes ~/.piper/credentials.json with mode 0600. From there:

›  check uptime and disk usage on staging

To resume a previous session at startup:

bun dev -- --resume      # opens a picker over recent sessions

To build the standalone binary:

bun run build      # ./dist/piper, ~76 MB
./dist/piper       # runs without Bun, without node_modules

The binary embeds PostgreSQL WASM (~13 MB) and Yoga layout. The embedding model is not bundled: it lazy-fetches on first RAG use (~120 MB, one time) and caches at ~/.piper/cache/models/.

You cannot get "no hallucination" from an LLM. Don't try. Instead, make being wrong safe. PIPER's LLM lives inside a deterministic cage:

       ┌────────────┐  proposes actions      ┌────────────────┐
       │    LLM     │ ─────tool_calls──────► │  Action catalog│
       │ (any model)│                        │  (read|mutate| │
       │            │                        │   destructive) │
       └────────────┘                        └───────┬────────┘
              ▲                                      │ validate
              │ scrubbed                             │ args (Zod)
              │ messages                             ▼
              │                              ┌────────────────┐
              │                              │   Executor     │
              │                              │   (the ONLY    │
              │                              │   side-effect  │
              │                              │   surface)     │
              │                              └───────┬────────┘
              │                                      │
              │           scrub stdout/stderr        │ spawns kubectl /
              └──────────────────────────────────────┤ docker / ssh /
                                                     │ nc / gh / ...
                                                     ▼
                                           ┌──────────────────┐
                                           │ PGlite + pgvector│
                                           │ audit_log,       │
                                           │ evidence,        │
                                           │ knowledge        │
                                           └──────────────────┘

Three permission tiers with no overrides:

| Tier | Examples | Approval |
| --- | --- | --- |
| read | uptime, docker.ps, kubectl get | None. Executes directly. Safe by definition. |
| mutate (M2) | docker deploy, env update | Per-env approval prompt; remembered |
| destructive (M2) | delete, drop, prune, force-push | Fresh prompt every time. Never remembered. Ever. |
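The tier rules can be sketched as a tiny approval memory. Everything here is an assumption for illustration: the class name ApprovalMemory, the action name "docker.deploy" (no mutate actions exist in the catalog yet), and the choice to remember approvals per environment and action pair. The one property taken directly from the table is that destructive approvals are never stored.

```typescript
// Hypothetical sketch of tier-based approval memory (not PIPER's real code).
type Tier = "read" | "mutate" | "destructive";

class ApprovalMemory {
  private readonly remembered = new Set<string>();

  // Assumed granularity: one remembered rule per (environment, action) pair.
  private static key(env: string, action: string): string {
    return JSON.stringify([env, action]);
  }

  // True when a human must be prompted before the action may run.
  needsPrompt(tier: Tier, env: string, action: string): boolean {
    if (tier === "read") return false;        // no approval needed
    if (tier === "destructive") return true;  // fresh prompt every time
    return !this.remembered.has(ApprovalMemory.key(env, action));
  }

  // Only mutate approvals are stored; destructive ones are deliberately dropped.
  recordApproval(tier: Tier, env: string, action: string): void {
    if (tier === "mutate") {
      this.remembered.add(ApprovalMemory.key(env, action));
    }
  }
}
```

Keeping the "never remember destructive" rule inside recordApproval, and not only in the prompt logic, means a remembered rule for a destructive action cannot exist in the first place.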

Five overlapping defenses, applied at every layer:

  • Architectural: SSH keys never leave the OS ssh binary; API keys never enter messages[].content. Single-module discipline + CI rule.

  • Path denylist: ~/.ssh/id_*, ~/.aws/credentials, ~/.kube/config, ~/.gnupg/, ~/.docker/config.json, ~/.netrc, ~/.piper/, .env*. Non-disablable. User config can extend the list, never weaken it.

  • Two-pass scrubbing: write-time (every Executor output → audit log) and pre-LLM (every message body → HTTP call). Defense in depth.

  • Args refuse: if the LLM tries to embed a recognisable secret (AKIA…, sk-or-…, JWTs, PEM blocks, Bearer …) in an action's args, the Executor refuses the action; it does not redact. Redaction would mutate semantics.

  • Provider-level privacy: OpenRouter requests set body.provider.data_collection = 'deny'. Local mode routes inference through Ollama / llama.cpp / LM Studio / vLLM, so network egress for inference is zero.
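The difference between refusing args and scrubbing output can be sketched as follows. The regular expressions are illustrative guesses at the secret shapes named in the list, not PIPER's actual rules, and findSecretInArgs and scrub are hypothetical names.

```typescript
// Hypothetical sketch: refuse secrets in args, redact secrets in output.
interface SecretPattern {
  readonly name: string;
  readonly regex: RegExp; // non-global, so .test() keeps no state between calls
}

// Illustrative patterns only; the real rule set is not shown in the text.
const SECRET_PATTERNS: readonly SecretPattern[] = [
  { name: "aws-access-key", regex: /AKIA[0-9A-Z]{16}/ },
  { name: "openrouter-key", regex: /sk-or-[A-Za-z0-9_-]{8,}/ },
  { name: "jwt", regex: /eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/ },
  { name: "pem-block", regex: /-----BEGIN [A-Z ]+-----[\s\S]*?-----END [A-Z ]+-----/ },
  { name: "bearer-token", regex: /Bearer\s+[A-Za-z0-9._~+\/-]+=*/ },
];

// Args path: detect and refuse. The args are never rewritten, because a
// redacted argument would silently change what the action means.
function findSecretInArgs(args: Readonly<Record<string, string>>): string | null {
  for (const key of Object.keys(args)) {
    const value = args[key];
    if (value === undefined) continue;
    for (const pattern of SECRET_PATTERNS) {
      if (pattern.regex.test(value)) {
        return `arg "${key}" matches ${pattern.name}`;
      }
    }
  }
  return null;
}

// Output path: redact before text reaches the audit log or the LLM.
function scrub(text: string): string {
  let out = text;
  for (const pattern of SECRET_PATTERNS) {
    const global = new RegExp(pattern.regex.source, "g");
    out = out.replace(global, `[REDACTED:${pattern.name}]`);
  }
  return out;
}
```

A caller would treat a non-null result from findSecretInArgs as a refusal reason and skip the spawn entirely, while scrub would run on every stdout/stderr chunk and again on every outgoing message body.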

Full design rationale in docs/architecture.md and docs/decisions/ADR-001-deterministic-gate.md.

40+ read-tier actions across the major DevOps surfaces. Every action is a typed object registered in src/actions/builtin/, validated by Zod, executed only through src/exec/executor.ts. Free-form shell from the LLM is not representable in the type system.


| Category | Action | What it does |
| --- | --- | --- |
| System | system.uptime | uptime (load average + time up) |
| | system.os_info | uname -a + /etc/os-release |
| | system.memory | free -h |
| | system.disk_usage | df -h [path?] |
| | system.process_list | ps -eo pid,user,pcpu,pmem,args -ww |
| | system.list_dir | ls -la <path> (deny-list enforced) |
| | system.file_stat | stat <path> |
| | system.cpu_info | lscpu / /proc/cpuinfo |
| | system.dmesg | Kernel ring buffer tail |
| | system.package_list | Installed packages (dpkg -l / rpm -qa) |
| | system.cron_list | User + system crontabs |
| | system.systemctl_list | systemctl list-units --type=service |
| | system.iptables_list | iptables -L -n -v |
| Network | network.connections | ss -tunap |
| | network.port_check | nc -zv (open / refused / timeout / closed) |
| | network.ping | ping -c N -W T |
| | network.dns_lookup | dig / host lookup |
| | ssh.connect | Probe SSH reachability against an allowlisted host |
| Logs | logs.tail | tail -n N <path> with optional grep |
| Services | service.status | systemctl status <unit> |
| | service.journal | journalctl -u <unit> -n N |
| Docker | docker.ps | Container list (JSON) |
| | docker.logs | Container log tail |
| | docker.inspect | Container inspect (summarised) |
| | docker.compose_ps | docker compose ps for a project |
| Kubernetes | kubernetes.get | kubectl get <kind> (pods, deploys, services…) |
| | kubernetes.logs | kubectl logs <pod> (with -c, --previous, tail-N) |
| | kubernetes.describe | kubectl describe <kind>/<name> |
| | kubernetes.top_pod | kubectl top pod |
| | kubernetes.events | kubectl get events --sort-by=.lastTimestamp |
| | kubernetes.context_current | kubectl config current-context |
| Git | git.status | git status --porcelain=v1 |
| | git.log | git log -n N --oneline --decorate |
| GitHub | github.pr_list | gh pr list |
| | github.pr_view | gh pr view <number> |
| | github.run_list | gh run list (Actions) |
| | github.run_view | gh run view <id> (logs, conclusion) |
| | github.issue_list | gh issue list |
| AWS | aws.s3_ls | aws s3 ls |
| | aws.ec2_describe | aws ec2 describe-instances |
| | aws.cloudwatch_tail | aws logs tail (CloudWatch) |
| | aws.rds_describe | aws rds describe-db-instances |
| GCP | gcp.compute_list | gcloud compute instances list |
| | gcp.logging_read | gcloud logging read |
| Azure | azure.vm_list | az vm list |
| Database | postgres.pg_isready | pg_isready against host:port |
| Memory | memory.search | In-process semantic search over the local knowledge base |

PIPER ships a memory.search action. It is not a shell action: it's in-process semantic search over a local PGlite + pgvector store of:

  • runbook: markdown under docs/runbooks/

  • adr: architecture decision records under docs/decisions/

  • session-summary: produced by /session-report

  • solved-case: distilled incident notes (annex format, opt-in)

  • note: free-form knowledge you add yourself

The planner is instructed to call memory.search first when the user's prompt looks like a known incident pattern, a deploy procedure, or references a host that has prior session notes. The agent stays grounded in your runbooks instead of the model's training data.

| Backend | Model | Dim | Cost | Notes |
| --- | --- | --- | --- | --- |
| wasm (default) | Xenova/multilingual-e5-small | 384 | free | In-process via @huggingface/transformers. 94 languages. ~120 MB downloaded once, then fully offline. Cached at ~/.piper/cache/models/. |
| http | OpenAI-compatible local endpoint | varies | free | Ollama (nomic-embed-text, 768-dim), LM Studio, llama.cpp, vLLM. |
| openrouter | Cloud paid embedding model | varies | paid | Only offered if an API key is configured. |
| none | | | | Disables RAG. memory.search returns empty. |

The schema auto-recreates if the dimension mismatches — switching e.g. from Ollama 768-dim to WASM 384-dim drops the old vectors and rebuilds from source. Zero manual migration.

Toggle modes with Shift+Tab:

  • HUMAN (default): PIPER asks for approval per planned step. Verbatim command is shown before any run.

  • YOLO: read-tier actions execute without per-step approval. mutate and destructive actions still always ask, every time, by design.
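The two modes reduce to a small decision function. The sketch below is an assumption about how such a check could look; requiresApproval and toggleMode are hypothetical names, and the real runner may consult more state than mode and tier.

```typescript
// Hypothetical sketch of the HUMAN/YOLO approval decision.
type Mode = "HUMAN" | "YOLO";
type Tier = "read" | "mutate" | "destructive";

// HUMAN asks before every step; YOLO only skips the prompt for read-tier actions.
function requiresApproval(mode: Mode, tier: Tier): boolean {
  if (tier !== "read") return true;
  return mode === "HUMAN";
}

// Shift+Tab in the TUI flips between the two modes.
function toggleMode(mode: Mode): Mode {
  return mode === "HUMAN" ? "YOLO" : "HUMAN";
}
```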

Slash commands

/model               interactive model picker (Local / OpenRouter tabs, paging, filter)
/memory              knowledge-base viewer (Overview + Sources, delete with d)
/mem, /rag           aliases for /memory
/resume              pick a recent session and reload its history into scrollback
/env add <name> <user@host[:port]> [--key <path>] [--desc "..."] [--tag a,b]
/env list
/env remove <name>
/session-report      summarise the current session into the knowledge base
/debug               toggle verbose agent events (costs, synth status, RAG hits, LLM trace)
/help                show context-sensitive help
/save [file.md]      export the last report to a file
/quit                exit PIPER (Ctrl+C also works)

Keyboard

| Keys | Effect |
| --- | --- |
| Enter | Send |
| Shift+Enter | New line (multi-line input) |
| Shift+Tab | Toggle HUMAN ↔ YOLO |
| Ctrl+O | Collapse reasoning: hide agent-event lines from future turns |
| ? | Context-sensitive help |
| Esc | Clear current input |
| Ctrl+C | Quit |

The bottom strip of the TUI shows everything at a glance:

Y(◉ ◉)Y  diagnosing staging      $0.0123 | google/gemini-pro-1.5 | OR $4.32 left | 12.4k/128k (10%) ███▒▒▒▒▒▒▒  HUMAN

  • Alien mascot: color-cycles while PIPER thinks; idle when waiting on you.

  • Session title: auto-generated from the first user prompt by a tiny LLM call.

  • Cost: running session cost in USD, real provider pricing.

  • Model id: the model currently driving the planner (/model to switch).

  • OpenRouter remaining credit: live-fetched every 60s on paid backends.

  • Token meter: N/limit (%) against the model's maxContextTokens (minus 4k reserved for output), measured with real gpt-tokenizer cl100k_base.

  • Mode badge: HUMAN (green) or YOLO (red).

  • Persistent by default. PGlite stores sessions at ~/.piper/data/pglite/. Override with PIPER_DATA_DIR=/path. Force in-memory (ephemeral) with PIPER_EPHEMERAL=1.

  • Auto-titled. Small LLM call on the first user prompt names the session.

  • Auto-saved reports. Every done writes the final answer to ~/.piper/data/reports/{sessionId}/run-{ts}.md.

  • Resume. bun dev -- --resume at startup, or /resume mid-session.

  • Auto-compaction. When the planner's context exceeds 70% of the model's maxContextTokens, older turns are rolled into a single summary message.

  • Grounded synthesis. Every claim cites [ev-N]. A run passes the verifier if ≥75% of substantive lines are cited; ungrounded answers retry.

  • <Static> scrollback persistence. History stays in the terminal's native scrollback: append-only, no redraw, no flicker, no loss when you scroll up.
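The grounded-synthesis rule can be sketched as a line counter. Assumptions are flagged in the code: what counts as a "substantive" line is not defined in the text, so the sketch treats any line containing a letter or digit as substantive, and it only counts a line as cited when every [ev-N] it mentions refers to evidence that actually exists. verifyGrounding is a hypothetical name.

```typescript
// Hypothetical sketch of the [ev-N] grounding check (not PIPER's verifier).
interface GroundingResult {
  readonly substantiveLines: number;
  readonly citedLines: number;
  readonly passed: boolean;
}

function verifyGrounding(
  answer: string,
  knownEvidenceIds: ReadonlySet<number>,
  threshold: number = 0.75,
): GroundingResult {
  const lines = answer
    .split("\n")
    .map((line) => line.trim())
    // Assumption: a line is "substantive" if it contains a letter or digit.
    .filter((line) => /[A-Za-z0-9]/.test(line));

  let citedLines = 0;
  for (const line of lines) {
    const markers: string[] = line.match(/\[ev-\d+\]/g) ?? [];
    // "[ev-12]" -> 12: drop the leading "[ev-" and the trailing "]".
    const ids = markers.map((marker) => Number(marker.slice(4, -1)));
    // Assumption: a citation to unknown evidence does not count.
    if (ids.length > 0 && ids.every((id) => knownEvidenceIds.has(id))) {
      citedLines += 1;
    }
  }

  const passed = lines.length > 0 && citedLines / lines.length >= threshold;
  return { substantiveLines: lines.length, citedLines, passed };
}
```

An answer with four substantive lines passes when three of them carry valid citations, since 3/4 meets the 75% threshold exactly; a caller would retry synthesis when passed is false.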

| Concern | Choice |
| --- | --- |
| Runtime | Bun ≥ 1.2 (single-binary via bun build --compile) |
| Language | TypeScript strict (noUncheckedIndexedAccess, exactOptionalPropertyTypes, no any) |
| Terminal UI | Ink (React for the terminal) |
| Persistence | PGlite (PostgreSQL in WASM, single embedded DB) |
| Vectors | pgvector inside the same PGlite DB (HNSW index) |
| Embeddings (default) | @huggingface/transformers + Xenova/multilingual-e5-small (WASM) |
| Tokenizer | gpt-tokenizer (cl100k_base) |
| Schema validation | Zod |
| Model API | OpenAI-compatible /v1/chat/completions |

Why these choices: see docs/decisions/.

~/.piper/credentials.json (created by the wizard, mode 0600)

{
  "openrouter_api_key": "sk-or-v1-...",
  "default_provider": "openrouter",
  "default_model": "deepseek/deepseek-v4-pro",
  "embedding_backend": "wasm",
  "max_session_cost_usd": 0.50,
  "max_followup_iterations": 1,
  "compaction_keep_recent": 6,
  "compaction_trigger_pct": 0.70,
  "environments": {
    "prod-web": {
      "host": "192.0.2.10",
      "ssh_user": "deploy",
      "port": 22,
      "identity_file": "/Users/me/.ssh/id_ed25519",
      "description": "production web tier",
      "tags": ["prod", "web"]
    }
  }
}
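The two compaction settings in this file, compaction_trigger_pct and compaction_keep_recent, suggest a policy along the following lines. This is a sketch under assumptions: the function names are hypothetical, token counting is left to the caller (the project uses gpt-tokenizer), the summarize callback stands in for an LLM call, and giving the summary the "system" role is a guess.

```typescript
// Hypothetical sketch of auto-compaction (not PIPER's implementation).
interface Message {
  readonly role: "system" | "user" | "assistant" | "tool";
  readonly content: string;
}

interface CompactionConfig {
  readonly triggerPct: number; // e.g. 0.70, from compaction_trigger_pct
  readonly keepRecent: number; // e.g. 6, from compaction_keep_recent
}

// Compaction is due once the context exceeds the configured share of the window.
function shouldCompact(
  usedTokens: number,
  maxContextTokens: number,
  cfg: CompactionConfig,
): boolean {
  return usedTokens > maxContextTokens * cfg.triggerPct;
}

// Rolls everything except the most recent turns into one summary message.
function compact(
  history: readonly Message[],
  cfg: CompactionConfig,
  summarize: (older: readonly Message[]) => string,
): Message[] {
  if (history.length <= cfg.keepRecent) return [...history];
  const cut = history.length - cfg.keepRecent;
  const older = history.slice(0, cut);
  const recent = history.slice(cut);
  // Assumption: the summary is carried as a single system message.
  const summary: Message = { role: "system", content: summarize(older) };
  return [summary, ...recent];
}
```

With a 128k window and the default 0.70 trigger, compaction would start somewhere just under 90k tokens, and a ten-turn history would shrink to one summary plus the six most recent turns.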

Environment variables (override the file — useful in CI)

| Variable | Purpose |
| --- | --- |
| PIPER_PROVIDER | openrouter, ollama, … |
| PIPER_BASE_URL | Endpoint override |
| PIPER_API_KEY / OPENROUTER_API_KEY | API key |
| PIPER_MODEL | Model id |
| PIPER_EMBEDDING_BACKEND | wasm, http, … |
| PIPER_MAX_SESSION_COST_USD | Hard budget cap |
| PIPER_DATA_DIR | Persistent storage (default: ~/.piper/data/pglite/) |
| PIPER_EPHEMERAL | Set to 1 for in-memory storage (loses sessions at exit) |

If an env var doesn't look like a valid API key (e.g. a leftover test value), PIPER ignores it with a warning and falls back to the file.
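The override-with-fallback behaviour might look like the sketch below. Several details are assumptions, since the text does not specify them: the plausibility check in looksLikeApiKey, the precedence of PIPER_API_KEY over OPENROUTER_API_KEY, and the names resolveApiKey and KeyResolution.

```typescript
// Hypothetical sketch of env-over-file key resolution (not PIPER's real code).
type KeySource = "env" | "file" | "none";

interface KeyResolution {
  readonly key: string | undefined;
  readonly source: KeySource;
  readonly warnings: readonly string[];
}

// Assumption: a rough plausibility check; the real rule is not documented here.
function looksLikeApiKey(value: string): boolean {
  return /^sk-[A-Za-z0-9_-]{16,}$/.test(value);
}

function resolveApiKey(
  env: Readonly<Record<string, string | undefined>>,
  fileKey: string | undefined,
): KeyResolution {
  const warnings: string[] = [];
  // Assumed precedence: PIPER_API_KEY is consulted before OPENROUTER_API_KEY.
  for (const name of ["PIPER_API_KEY", "OPENROUTER_API_KEY"]) {
    const value = env[name];
    if (value === undefined || value === "") continue;
    if (looksLikeApiKey(value)) {
      return { key: value, source: "env", warnings };
    }
    // An implausible value is ignored with a warning, never used.
    warnings.push(`${name} does not look like an API key; ignoring it`);
  }
  if (fileKey !== undefined) {
    return { key: fileKey, source: "file", warnings };
  }
  return { key: undefined, source: "none", warnings };
}
```

Passing the environment in as a plain record, instead of reading a global, keeps the function easy to exercise in tests and in CI.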

bun test                       # 386 unit + gate tests (no Docker, no network)
bun run e2e                    # Docker sshd fixture, E2E tests, teardown
bun run typecheck              # tsc --noEmit, strict

Coverage focuses on the security-critical layers: catalog gate, path denylist, secret scrubber, audit log persistence, verifier, embedding-dim migration.

CI runs license-checker and rejects any GPL transitive dependency.

PIPER is built around a deterministic safety gate. Vulnerability disclosure process: see SECURITY.md. Coordinated disclosure, 90-day default. Particular care for:

  • Prompt-injection that smuggles a command into the gate

  • Any code path that runs shell outside the Executor

  • Any code path that logs or sends unredacted secrets

  • Any code path that lets a remembered rule auto-approve a destructive action

  • Any code path that bypasses the SSH host allowlist

The full architecture + threat model is at docs/architecture.md.

Apache-2.0. See LICENSE and NOTICE. The NOTICE file lists the Apache-2.0 transitive deps and discloses the LGPL transitive dependency @img/sharp-libvips (pulled in by the embedding pipeline).

Contributions welcome. See CONTRIBUTING.md for the flow.

Two-eyes rule on anything touching src/exec/, src/security/, or src/actions/: the maintainer reviews these personally.

Built with the conviction that making being wrong safe beats trying to make the LLM never wrong.
