AGENTS.md, SYSTEM.md, APPEND_SYSTEM.md, skills, and extension hooks.
Most coding agents present themselves as finished products: you install them, learn their commands, and work within the boundaries the authors chose. That can be fine if the built-in workflow matches your needs. It becomes limiting when you want to change how prompts are assembled, how tools are registered, how sessions are summarized, or how the agent is embedded inside your own application.
Pi Coding Agent takes a different path.
Based on the official Pi homepage, documentation, and repository, Pi from Earendil Works is better understood as a minimal agent harness with a coding-oriented runtime than as a fixed end-user product. It ships with useful defaults, but its architecture assumes users may want to replace or extend large parts of the workflow. The project explicitly positions advanced behavior such as plan-like workflows, extra commands, and other higher-level capabilities as things that can live in extensions or packages instead of being hardcoded into the core.
That design choice matters for engineers building AI tooling. It affects maintainability, portability, and how easily the system can adapt to terminals, IDE wrappers, automation pipelines, or internal developer platforms.
In this article, we will look at how Pi is structured, why its layering matters, how its context pipeline works, and what tradeoffs appear once you start using extensions, RPC mode, or SDK embedding.
A coding agent has to do several jobs at once:
Many tools solve all four inside one tightly coupled application. That can make the initial experience simple, but it often makes customization expensive. If you want to change prompt composition or session summarization, you may end up forking the project or working against internal assumptions.
Pi’s architecture addresses this by splitting responsibilities into layers.
According to the repository README, Pi is organized as a monorepo with distinct packages:
@earendil-works/pi-ai
@earendil-works/pi-agent-core
@earendil-works/pi-coding-agent
@earendil-works/pi-tui
This package split is the clearest way to understand the system.
pi-ai
This is the provider abstraction layer. Its role is to present a unified interface across multiple model providers.
Why this layer exists:
This is a standard but important decision. If provider-specific details leak into higher layers, the whole system becomes harder to test and evolve.
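To make the idea of a provider abstraction concrete, here is a minimal sketch. The `ChatMessage` and `ModelProvider` types, the two stand-in providers, and the `ask` helper are all hypothetical names invented for illustration; they are not pi-ai's actual exports.

```typescript
// Hypothetical types for illustration; these are not pi-ai's real exports.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface ModelProvider {
  name: string;
  complete(messages: ChatMessage[]): Promise<string>;
}

// Two stand-in providers whose differences are hidden behind one interface.
const echoProvider: ModelProvider = {
  name: "echo",
  async complete(messages) {
    return messages[messages.length - 1]?.content ?? "";
  },
};

const upperProvider: ModelProvider = {
  name: "upper",
  async complete(messages) {
    return (messages[messages.length - 1]?.content ?? "").toUpperCase();
  },
};

// Higher layers depend only on ModelProvider, so swapping providers
// requires no change here.
async function ask(provider: ModelProvider, prompt: string): Promise<string> {
  return provider.complete([{ role: "user", content: prompt }]);
}
```

Because `ask` only sees the interface, a test can pass in a fake provider without touching any network code, which is the testability benefit the layering is meant to protect.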
pi-agent-core
This is the runtime layer for core agent behavior, including tool calling and state management.
Why this matters:
Architecturally, this is the part that keeps Pi from being “just a CLI.”
pi-coding-agent
This is where Pi becomes a coding agent rather than a generic agent harness.
This layer includes:
This package is the operational center of the project. It contains the logic that most users think of as “Pi,” while still remaining separable from the lower-level runtime and the higher-level UI.
pi-tui
This is the terminal UI layer.
Its presence as a distinct package is important because it suggests the user interface is not the agent itself. The same runtime can support different frontends.
That leads directly to one of Pi’s strongest architectural decisions: frontend/runtime separation.
The official docs describe four major usage modes:
That means Pi is not tied to its terminal interface, even if the terminal is the primary experience.
This is the user-facing CLI workflow most people will start with. It combines the runtime with the terminal UI and built-in commands.
These modes are useful for automation or simple scripting where you want structured output without a long-lived interactive session.
RPC mode exposes Pi through a JSONL protocol over stdin/stdout. This is the mode that makes IDE integrations, editor plugins, and service wrappers plausible without reimplementing the core runtime.
For example:
pi --mode rpc [options]
{"id": "req-1", "type": "prompt", "message": "Hello, world!"}
This is a strong design choice because subprocess embedding is often the easiest integration path for tools written in another language or running in another environment.
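On the client side, writing a request could look like the sketch below. The field names `id`, `type`, and `message` come from the example request above; the `RpcPrompt` type and the `encodeRequest` helper are hypothetical, and the sketch covers only this one request shape.

```typescript
// Field names mirror the example request; RpcPrompt and encodeRequest
// are hypothetical helpers, not part of Pi.
interface RpcPrompt {
  id: string;
  type: "prompt";
  message: string;
}

// Serialize one request as a single JSONL record terminated by LF.
// JSON.stringify escapes newlines inside strings, so the record
// always stays on one line.
function encodeRequest(request: RpcPrompt): string {
  return JSON.stringify(request) + "\n";
}

const line = encodeRequest({
  id: "req-1",
  type: "prompt",
  message: "Hello,\nworld!",
});
```

A wrapper would write `line` to the subprocess's stdin. Keeping the serializer separate from the process handling makes the framing easy to test on its own.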
For Node.js and TypeScript applications, Pi can be embedded in-process through its SDK.
import {
  type CreateAgentSessionRuntimeFactory,
  createAgentSessionFromServices,
  createAgentSessionRuntime,
  createAgentSessionServices,
  getAgentDir,
  runRpcMode,
  SessionManager,
} from "@earendil-works/pi-coding-agent";

const createRuntime: CreateAgentSessionRuntimeFactory = async ({ cwd, sessionManager, sessionStartEvent }) => {
  const services = await createAgentSessionServices({ cwd });
  return {
    ...(await createAgentSessionFromServices({
      services,
      sessionManager,
      sessionStartEvent,
    })),
    services,
    diagnostics: services.diagnostics,
  };
};

const runtime = await createAgentSessionRuntime(createRuntime, {
  cwd: process.cwd(),
  agentDir: getAgentDir(),
  sessionManager: SessionManager.create(process.cwd()),
});

await runRpcMode(runtime);
This snippet shows the decomposition clearly: services, session manager, runtime creation, then a mode runner on top.
For AI agents, architecture is really about workflow under constraints. Pi’s runtime appears to follow a loop like this:
The interesting part is that this pipeline is not fully hardcoded. The extension system lets you intercept multiple stages.
The extension docs describe lifecycle events around startup, provider requests, tool calls, compaction, tree navigation, and shutdown. Examples mentioned in the source material include:
session_start
before_agent_start
tool_call
before_provider_request
after_provider_response
session_before_compact
session_compact
session_before_tree
session_tree
session_shutdown
That event model suggests a publish/subscribe architecture around the core loop instead of a single monolithic pipeline. This is one of the biggest reasons Pi feels more like a toolkit than a product.
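The following sketch shows one way such a hook-based loop could be shaped. It assumes handlers run in registration order and may return a partial event that is merged into the original. Only the event names come from the list above; the `HookBus` class, the event payloads, and the merge rule are assumptions, and Pi's real dispatch may differ.

```typescript
// Hypothetical event payloads; only the event names come from the article.
interface HookEvents {
  before_agent_start: { systemPrompt: string };
  session_shutdown: { reason: string };
}

// A handler may return a partial event to override fields, or undefined.
type Handler<E> = (event: E) => Partial<E> | undefined;

class HookBus {
  private handlers = new Map<string, Handler<any>[]>();

  on<K extends keyof HookEvents>(name: K, handler: Handler<HookEvents[K]>): void {
    const list = this.handlers.get(name) ?? [];
    list.push(handler);
    this.handlers.set(name, list);
  }

  // Run handlers in registration order, merging each returned patch.
  emit<K extends keyof HookEvents>(name: K, event: HookEvents[K]): HookEvents[K] {
    let current = event;
    for (const handler of this.handlers.get(name) ?? []) {
      const patch = handler(current);
      if (patch !== undefined) current = { ...current, ...patch };
    }
    return current;
  }
}

const bus = new HookBus();
bus.on("before_agent_start", (event) => ({
  systemPrompt: `${event.systemPrompt}\nPrefer small diffs.`,
}));
bus.on("before_agent_start", () => undefined); // observers may return nothing

const started = bus.emit("before_agent_start", {
  systemPrompt: "You are a coding agent.",
});
```

The core loop only calls `emit` at fixed points, so new behavior can be attached without editing the loop itself.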
A lot of agent systems treat prompt engineering as text pasted into a config file. Pi treats it as infrastructure.
According to the docs and homepage, Pi can load:
AGENTS.md and CLAUDE.md from user/global and project directories
SYSTEM.md to replace the default system prompt
APPEND_SYSTEM.md to append to it
This is not a minor convenience feature. It changes how the system is operated.
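As a rough illustration of file-based prompt assembly, the sketch below combines these sources in a pure function. The replace and append roles follow the description above. The `PromptSources` shape, the `assembleSystemPrompt` name, and the ordering of the parts are assumptions, not Pi's documented behavior.

```typescript
// Hypothetical inputs: each field holds a file's text if the file was found.
interface PromptSources {
  defaultSystemPrompt: string;
  systemMd?: string; // SYSTEM.md: replaces the default prompt
  appendSystemMd?: string; // APPEND_SYSTEM.md: appended to the prompt
  contextFiles: string[]; // AGENTS.md / CLAUDE.md contents, global before project
}

// Assumed ordering: base prompt, then appended text, then context files.
function assembleSystemPrompt(sources: PromptSources): string {
  const base = sources.systemMd ?? sources.defaultSystemPrompt;
  const parts = [base, sources.appendSystemMd, ...sources.contextFiles];
  return parts
    .filter((part): part is string => typeof part === "string" && part.trim() !== "")
    .join("\n\n");
}
```

Treating assembly as a function of discovered files means a project can change agent behavior by committing a file rather than editing tool configuration.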
Skills are loaded only when needed instead of always being included in the prompt. That helps avoid bloating context windows and prompt caches.
This is a practical tradeoff:
Pi chooses the second option, which fits its broader design: minimal default core, dynamic behavior at runtime.
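On-demand loading can be sketched as a small registry. It assumes each skill exposes a cheap description and a loader for its full text, and that only the descriptions are placed in every prompt. The `Skill` shape and the `SkillRegistry` class are hypothetical, not Pi's implementation.

```typescript
// Hypothetical skill shape: a cheap description plus a loader for the body.
interface Skill {
  name: string;
  description: string;
  load: () => string; // e.g. read the skill's file from disk
}

class SkillRegistry {
  private cache = new Map<string, string>();

  constructor(private skills: Skill[]) {}

  // Assumption: only names and descriptions are included in every prompt.
  promptIndex(): string {
    return this.skills.map((s) => `${s.name}: ${s.description}`).join("\n");
  }

  // The full body is loaded on first use and cached afterwards.
  body(name: string): string {
    const cached = this.cache.get(name);
    if (cached !== undefined) return cached;
    const skill = this.skills.find((s) => s.name === name);
    if (!skill) throw new Error(`Unknown skill: ${name}`);
    const text = skill.load();
    this.cache.set(name, text);
    return text;
  }
}
```

The prompt cost of an unused skill is then one short line instead of its full text.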
Pi also allows extensions to modify the assembled system prompt before model execution.
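import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";

// addToolGuidance and mergeWithUserAppend are helper functions assumed to be
// defined elsewhere in the extension; their definitions are not shown here.
export default function promptCustomizer(pi: ExtensionAPI) {
  // Receives the assembled system prompt before model execution.
  pi.on("before_agent_start", async (event) => {
    const { systemPrompt, systemPromptOptions } = event;
    const customPrompt = addToolGuidance(systemPromptOptions, systemPrompt);
    const appendSection = mergeWithUserAppend(systemPromptOptions);
    // Return the modified system prompt.
    return {
      systemPrompt: `${customPrompt}${appendSection}`,
    };
  });
}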
This is a strong example of Pi’s philosophy. Prompt composition is not just a file-loading step; it is part of the runtime and open to modification.
Pi stores sessions in JSONL and supports commands such as /resume, /new, /tree, /fork, and /clone.
That combination implies that the session model is not a flat transcript. It supports branching workflows where a user can explore alternate paths.
JSONL is a practical format for agent session storage because it is:
For terminal-first tools, that is often a better fit than requiring a heavier database.
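To illustrate how an append-only JSONL file can still represent branches, the sketch below assumes each entry carries an `id` and a `parentId`. That entry shape and both function names are assumptions made for illustration, not Pi's documented session schema.

```typescript
// Assumed entry shape: each JSONL line has an id and a parentId
// (null at the root). This is not Pi's documented schema.
interface SessionEntry {
  id: string;
  parentId: string | null;
  text: string;
}

// Parse an append-only JSONL log, one entry per non-empty line.
function parseSession(jsonl: string): SessionEntry[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as SessionEntry);
}

// Walk parent links from a leaf back to the root to recover one branch
// in chronological order.
function branchTo(entries: SessionEntry[], leafId: string): SessionEntry[] {
  const byId = new Map(entries.map((e) => [e.id, e] as const));
  const path: SessionEntry[] = [];
  let cursor = byId.get(leafId);
  while (cursor) {
    path.unshift(cursor);
    cursor = cursor.parentId === null ? undefined : byId.get(cursor.parentId);
  }
  return path;
}
```

Under this model, forking only appends a new entry whose parent is an older entry. Nothing already written has to be rewritten.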
The source material notes that branch summarization is used when switching branches so that context from the abandoned branch can be injected into the new branch’s working context.
That matters because branching is not just a UI feature. It affects memory and continuity.
Pi also distinguishes between full history and in-memory working context. Compaction affects the latter, not the underlying stored session history. That is an important operational detail if you are debugging behavior or writing extensions that depend on prior entries.
Most agent systems eventually need summarization because context windows are finite. Pi exposes compaction as a visible architectural feature rather than hiding it as internal bookkeeping.
The docs describe two summarization mechanisms:
They also define cut-point rules. For example, tool results must remain attached to their tool calls, so valid compaction boundaries are restricted.
That is exactly the kind of implementation detail extension authors need to know. If your extension assumes history can be split anywhere, you may break tool-call coherence.
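The constraint can be expressed as a small check. The sketch assumes a simplified history in which tool results directly follow the call that produced them. The `Kind` and `Entry` types and the `validCutPoints` function are hypothetical, and the real cut-point rules may cover more entry types.

```typescript
// Hypothetical, simplified entry kinds for illustration.
type Kind = "user" | "assistant" | "tool_call" | "tool_result";

interface Entry {
  id: string;
  kind: Kind;
}

// Index i is a valid cut point if the kept tail entries[i..] can start there.
// Starting the tail at a tool result would separate it from its tool call,
// so those indices are excluded.
function validCutPoints(entries: Entry[]): number[] {
  const points: number[] = [];
  entries.forEach((entry, i) => {
    if (entry.kind !== "tool_result") points.push(i);
  });
  return points;
}
```

An extension that chooses its own boundary could run a check like this first and move the cut back to the nearest valid index.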
Pi even allows custom compaction logic through hooks.
pi.on("session_before_compact", async (event, ctx) => {
  const { preparation, branchEntries, customInstructions, signal } = event;

  // Option 1: cancel this compaction.
  return { cancel: true };

  // Option 2 (instead of option 1): supply a custom summary.
  // A real handler returns only one of these.
  return {
    compaction: {
      summary: "...",
      firstKeptEntryId: preparation.firstKeptEntryId,
      tokensBefore: preparation.tokensBefore,
    },
  };
});
This makes compaction a policy surface, not just an implementation detail.
The flexibility is useful, but it increases the burden on extension authors.
You need to understand:
firstKeptEntryId
tokensBefore
If you ignore those details, summaries may be technically valid but operationally misleading.
Pi’s homepage explicitly says it skips some built-in features and expects users to add them through extensions or packages. That is one of the most unusual and important aspects of the project.
Tools are not fixed at compile time. An extension can register them during session startup.
import type { ExtensionAPI } from "@earendil-works/pi-coding-agent";
import { Type } from "typebox";

const ECHO_PARAMS = Type.Object({
  message: Type.String({ description: "Message to echo" }),
});

export default function dynamicToolsExtension(pi: ExtensionAPI) {
  const registeredToolNames = new Set<string>();

  const registerEchoTool = (
    name: string,
    label: string,
    prefix: string,
  ): boolean => {
    if (registeredToolNames.has(name)) {
      return false;
    }
    registeredToolNames.add(name);
    pi.registerTool({
      name,
      label,
      description: `Echo a message with prefix: ${prefix}`,
      promptSnippet: `Echo back user-provided text with ${prefix.trim()} prefix`,
      promptGuidelines: [
        "Use echo_session when the user asks for exact echo output.",
      ],
      parameters: ECHO_PARAMS,
      async execute(_toolCallId, params) {
        return {
          content: [{ type: "text", text: `${prefix}${params.message}` }],
          details: { tool: name, prefix },
        };
      },
    });
    return true;
  };

  pi.on("session_start", (_event, ctx) => {
    registerEchoTool("echo_session", "Echo Session", "[session] ");
    ctx.ui.notify("Registered dynamic tool: echo_session", "info");
  });
}
This is a clear signal that Pi’s workflow surface is intended to be extended, not merely configured.
Based on the provided material, extensions can influence:
That is unusually broad. It also explains why Pi can remain small at the core while still supporting highly specialized workflows.
RPC mode is one of Pi’s most practical features for teams building wrappers or custom frontends. But the protocol details matter.
The docs specify strict JSONL semantics with LF as the record delimiter.
The source material calls out a concrete gotcha: Node’s readline is not protocol-compliant for this use case because it can split on Unicode line separators such as U+2028 and U+2029, which are valid inside JSON strings.
That means a robust client should:
split records on \n only
handle \r\n by stripping the trailing \r
This is a good example of a small but important systems detail. If you are embedding Pi inside an editor extension or orchestrator, protocol correctness matters more than convenience.
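A framing layer that follows these rules could look like the sketch below. The `JsonlDecoder` class is a hypothetical helper, not part of Pi. It assumes its input is already decoded text and buffers partial records across chunks.

```typescript
// Strict JSONL framing: split on LF only, tolerate CRLF, and buffer
// partial records between chunks. JsonlDecoder is a hypothetical helper.
class JsonlDecoder {
  private buffer = "";

  // Feed a decoded text chunk; returns every record completed by it.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const records: unknown[] = [];
    let newline = this.buffer.indexOf("\n");
    while (newline !== -1) {
      let line = this.buffer.slice(0, newline);
      this.buffer = this.buffer.slice(newline + 1);
      // Accept CRLF input by dropping the trailing carriage return.
      if (line.endsWith("\r")) line = line.slice(0, -1);
      if (line.length > 0) records.push(JSON.parse(line));
      newline = this.buffer.indexOf("\n");
    }
    return records;
  }
}
```

Because the decoder searches only for `\n`, a U+2028 or U+2029 character inside a JSON string stays within its record. When reading raw bytes from a subprocess, decode them with a streaming decoder first, so that a multi-byte character split across two chunks is not corrupted.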
Pi’s flexibility does not remove operational risk.
The repository README states that Pi does not provide a built-in permission system for filesystem, process, network, or credential access. It runs with the launching user’s permissions.
That has an obvious implication: if you need stronger isolation, you should containerize or otherwise sandbox it externally.
Before trust is granted, Pi loads only a subset of context and extension sources. According to the docs, project-local extensions, package-managed project extensions, and project settings are loaded only after trust resolution.
In non-interactive modes, trust prompts are not shown, so automation behavior depends on defaults or explicit CLI overrides.
If you are building tooling around Pi, document this clearly. Otherwise, a project may behave differently in interactive use versus CI-like or subprocess-driven environments.
After /fork or /clone, Pi emits session_shutdown for the old extension instance, reloads and rebinds extensions, and then emits session_start for the new session.
That means in-memory extension state is not automatically preserved. If state matters, persist it into session entries or rebuild it during startup.
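One way to rebuild state during startup is to replay entries the extension wrote earlier. The sketch below is deliberately generic: the `StateEntry` shape and the `rebuildState` function are hypothetical, and how an extension actually writes and reads its entries depends on Pi's extension API, which is not shown here.

```typescript
// Hypothetical entry shape: small state records an extension persisted
// into the session earlier.
interface StateEntry {
  type: "custom";
  key: string;
  value: number;
}

// Replay persisted entries in order; later entries override earlier ones.
function rebuildState(entries: StateEntry[]): Map<string, number> {
  const state = new Map<string, number>();
  for (const entry of entries) state.set(entry.key, entry.value);
  return state;
}
```

A `session_start` handler would call something like `rebuildState` on the entries of the new session, so a forked or cloned session starts with state derived from its own history rather than from memory left over in the old instance.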
Pi’s design is especially useful when you need one of the following:
In other words, Pi is less about delivering one ideal workflow and more about providing a stable substrate for many workflows.
That is the real architectural difference.
Pi Coding Agent stands out because it treats extensibility as the default architecture rather than an afterthought. The minimal core is not a limitation by accident; it is the mechanism that keeps the system adaptable.
That makes Pi especially interesting for engineers who want more than a terminal chatbot. If you need a coding agent that can be embedded, wrapped, or reshaped without forking the entire application, Pi’s layered design is worth studying.
The practical next step is to evaluate it in the mode closest to your real use case:
In Pi, the architecture is the product.