Halyard – open AI work ledger for developers (time, tokens, cost, invoices)

wpnews.pro

A halyard is the line that raises the sails. Pull on it, the sails go up. Pull on this one, your AI work comes into focus.

Your AI work leaves a trail. Halyard makes that trail legible, auditable, and client-safe.

Every AI session — time, tokens, model, cost, project — captured where the work happens, stored as plain text on your machine, owned by you. No account. No cloud service. No prompt or code capture. Ever. MIT licensed.

Status: alpha, open source — capture loop, reports, invoices, The Bridge, and TUI in daily use.

You're doing AI-assisted work. At the end of a sprint, a month, or a client engagement, you can't answer three basic questions:

What did AI actually cost on this project?
What did AI help produce — and can you prove it?
Is your AI spend going in the right direction?

Your tools don't record this. Halyard does.

It runs as lightweight hooks inside Claude Code, Cursor, and Gemini CLI, with manual/editor-task capture for tools like VS Code where no public AI-session hook exists. Every session writes one line to a plain-text log you own. From that log: cost breakdowns, project attribution, invoice evidence, and eventually a signed, verifiable AI work appendix you can hand to a client.

The privacy promise is unconditional: Halyard captures session metadata, never prompt content, code context, file contents, or transcripts.

Individual developers and freelancers — your primary audience right now. Your time, your AI spend, and your invoice evidence live as plain text on your laptop. Halyard helps you prove what happened without exposing prompts or code. Git it, back it up, sync it however you want. No account. No SaaS. No proprietary format.

Small AI shops — share the same local ledger format across a team. Project spend, trust-labeled cost allocation, and client-safe appendices built from individual plain-text logs.

Enterprise — the same format will support governance, cost centers, and cross-tool AI Work Intelligence later. That layer is additive, gated on design-partner pull, and will not change what local files mean.

Halyard has three layers:

Collection — Lightweight hooks that run where AI work happens. Claude Code, Cursor, Windsurf, and Gemini CLI hooks capture sessions automatically; a VS Code extension captures editing time, branch, and code delta — token counts unavailable until VS Code/Copilot exposes a public session hook. Since v3.5, Claude Code sessions are tagged with an advisory client surface (cli

/ desktop

/ ide

) detected from the local environment. Captured fields include time, tokens when available, model, cost, project, and branch. Written to a plain-text log you own. New sessions are appended, and the current hardening track is making corrections explicit and auditable. Nothing is lost silently.

Intelligence — Analytics built on that log. Local CLI reports, cost-by-project breakdowns, per-model spend, budget alerts, and trust-labeled totals (captured vs. calculated vs. allocated). Works offline, no account required.

AI Work Ledger — Cost allocation for seat subscriptions and credit plans. If you pay $200/month for Claude Max, Halyard allocates that cost across your projects proportionally — by active minutes, session count, or credit usage — so you know what each client engagement actually costs. Runs on top of ai-sessions.log

and ai-plans.toml

; nothing is written back to the raw log.

Proof Artifacts — Invoice evidence today, and a signed attestable AI work appendix next. The goal is a client-safe artifact that proves AI-assisted work without showing prompts, transcripts, source code, or file contents.

The Bridge — A local dashboard for watching capture happen in real time. Run halyard dashboard

inside any Halyard project.

Rich Session Telemetry — Where tools expose it, Halyard captures operational metadata beyond cost: tool call counts, error rates, wall time vs. active agent time, code delta, and per-model breakdowns. Gemini CLI sessions include full multi-model breakdowns from the history file. These signals surface in the TUI and The Bridge as work-health indicators — not productivity scores, but honest signals of session shape.

Honors — A service record that rewards clean proof, not raw hours. Ranks advance on attributed sessions (Deckhand → Commodore), stripes track your watch streak, and eight medals recognize behaviors that matter: completing your first watch, keeping a clean manifest, rescuing adrift sessions, and more. Run halyard honors

to see your record.

Friends of the Sea — One sea creature per completed project, auto-assigned by personality. Projects move through nautical voyage stages (Anchors Aweigh → Making Headway → Rounding the Mark → Flying Colors → Shipshape · Moored) as sessions accumulate. Auto-completes on target hit or inactivity. Run halyard voyage

to see the roster. Your Captain's Quarters on The Bridge shows a Passport — one stamp per AI tool you've used.

Halyard is a Python 3.11+ local-first CLI and dashboard, not a hosted billing service. The halyard

command is a Typer app, reports use Rich, the terminal dashboard uses Textual, and The Bridge is a small 127.0.0.1

HTTP server. The durable data model is plain text in the project folder; SQLite is only a rebuildable read-model cache for faster queries.

The capture pipeline is intentionally simple:

AI tool hook/importer/manual command
  -> normalized AiSession object
  -> append-focused ai-sessions.log line
  -> reports, ledger allocation, dashboard, invoice evidence

So: is Halyard "just looking at logs"? Not exactly. It uses the best public signal each tool exposes:

Claude Code: installsUserPromptSubmit

andStop

hooks. The start hook records session start time and git SHA. The stop hook reads the structured hook payload; for newer Claude Code formats it can also aggregate token/model metadata from the local transcript JSONL path passed by the hook.Cursor: installsbeforeSubmitPrompt

andstop

hooks. It reads the stop payload and prefersworkspace_roots

for attribution because that is the actual editor workspace, not necessarily the shell CWD.Gemini CLI: installsSessionStart

,AfterModel

, andAfterAgent

hooks.AfterModel

accumulates token usage fromusageMetadata

;AfterAgent

finalizes the session. It integrates deeply with OpenTelemetry (OTLP) to measure exact API and tool-execution durations (api_seconds

,tool_seconds

) and enriches from Gemini's local history file for accurate multi-model token breakdowns, tool-call counts, and deterministic cost.Codex Desktop: imports local~/.codex/sessions/.../rollout-*.jsonl

files, extracts timing/model/token metadata, and records imported session IDs so repeated imports do not duplicate entries.VS Code / GitHub Copilot: a local VS Code extension (vscode-extension/

) tracks active editing time, captures branch and code-delta via git, and writes sessions throughhalyard record-session

. Install the extension from a local.vsix

build; no public Copilot session hook exists yet so token counts are not available.

Every collector writes the same normalized record shape: timestamps, tool, model, tokens when available, cache tokens, cost, billing type, project, branch, capture source, and attribution provenance. Halyard does not store prompts, source code, file contents, or full transcripts in ai-sessions.log

. When a collector temporarily reads a local transcript or history file, it is only to extract session metadata.

The log is append-focused. Session records are s ...

lines; corrections are separate amendment records keyed by a hash of the original line. Writers hold an exclusive OS-level file lock (fcntl.flock

on POSIX, msvcrt.locking

on Windows) so concurrent hooks do not interleave writes. Malformed records are quarantined instead of crashing report generation.

Cost handling is explicit about trust. Direct API usage can be captured or calculated from tokens and the local pricing table. Seat or credit plans are allocated at report time from ai-plans.toml

by active minutes, session count, or credits. Reports label the result as captured, calculated, allocated, inferred, mixed, or unallocated so client-facing evidence does not pretend an estimate is a measurement.

Platform:macOS, Linux, and Windows. Thehalyard service install

command is macOS-only; other platforms can runhalyard dashboard

in a long-lived terminal instead.

pipx install halyard
cd ~/businesses/my-freelance
halyard init

halyard setup

halyard doctor --first-capture

halyard dashboard
halyard start acme/auth-migration
halyard stop

halyard tui

halyard log "what did I spend this month?"   # from the CLI (local by

halyard invoice acme --month 2026-05 --include-ai-evidence

View Full Command Reference

halyard usage --range 30d
halyard usage --range 7d --json   # machine-readable

halyard report --all --json | jq '.totals.cost_usd'
halyard budget --json | jq '.[] | select(.month.state=="over")'   # spend gate

halyard service install

pipx inject halyard mcp   # add the MCP extra to the pipx install
halyard mcp            # stdio MCP server for Claude Code / Cursor

halyard install-hook          # Claude Code
halyard install-cursor-hook   # Cursor
halyard install-gemini-hook   # Gemini CLI
halyard install-vscode-tasks  # VS Code manual capture task

halyard record-session --tool vscode --model github-copilot --minutes 15 --note "Copilot chat"

halyard import-gemini

halyard set-budget acme --daily 10.00 --monthly 200.00

halyard report --ledger

halyard confirm-attribution

halyard update-pricing

halyard honors
halyard voyage

See docs/demo.md for a full walkthrough — self-guided and live presentation script in one document. If capture does not show up, start with

.

docs/troubleshooting.md

halyard mcp

runs a read-only MCP server over stdio so Claude Code, Cursor, or any MCP client can query your local ledger in-context — e.g. "how much did I spend this week?", "did this work ship?", "what's my adrift rate?". It exposes work_summary

, sessions

, spend_in_range

, project_breakdown

, cost_by_model

, and outcomes_status

. No tool writes anything; only metadata already in the ledger is returned (never prompts, code, or transcripts). It needs the optional extra (pipx inject halyard mcp

, or pipx install 'halyard[mcp]'

from a fresh install).

halyard init

(and halyard setup

) auto-registers it with every MCP client detected on your PATH — Claude Code, Cursor, Gemini CLI — so you never edit a config file. Re-run any time, or register one client explicitly:

halyard install-mcp-claude     # or -cursor / -gemini

It writes a single mcpServers.halyard

entry to each client's config (~/.claude.json

, ~/.cursor/mcp.json

, ~/.gemini/settings.json

), preserving every other server. The Halyard repo also ships a ready .mcp.json for Claude Code zero-config pickup in-repo:

{ "mcpServers": { "halyard": { "command": "halyard", "args": ["mcp"] } } }

Tool	How it's captured	Status
Claude Code	`Stop` hook — fires on every session end
Shipped
Cursor	`stop` hook — fires when agent completes
Shipped
Gemini CLI	`SessionStart` / `AfterModel` / `AfterAgent` hooks + history file enrichment
Shipped
Codex Desktop	JSONL session importer	Shipped
VS Code / GitHub Copilot	VS Code task + `record-session --tool vscode` ; no public Copilot hook yet
Manual capture
Windsurf	TBD	Future
OpenAI API direct	SDK wrapper or proxy	Future

Gemini CLI sessions include per-model token breakdowns (flash vs. pro vs. thinking), tool call counts, and accurate multi-model cost — derived from the same history file Gemini CLI uses for its own shutdown summary.

Tools that are not written in Python can emit sessions directly to the local Hub:

halyard spec
samples/emit-session.sh

The Hub accepts POST http://127.0.0.1:4318/v1/ingest

with either a raw canonical ai-sessions.log

line or a structured fields

object. Structured payloads must include start

, end

, tool

, model

, input_tokens

, output_tokens

, and cost_usd

; optional metadata keys are the same keys shown by halyard spec

.

curl -X POST http://127.0.0.1:4318/v1/ingest \
  -H "Content-Type: application/json" \
  -d '{"fields":{"start":"2026-05-23T10:00:00","end":"2026-05-23T10:05:00","tool":"custom-tool","model":"model-x","input_tokens":100,"output_tokens":50,"cost_usd":0.01}}'

Per session (one line in ai-sessions.log

):

Start and end time
Tool ( claude-code

,cursor

,gemini-cli

,vscode

, …) - Model identifier

Input tokens, output tokens, cache read/write
Cost in USD (from local pricing table, snapshotted at capture)
Project attribution ( client:project

) - Git branch

Billing model ( api

,credits

,seat

) - Capture source ( hook

,sdk

,manual

)

What is not captured: prompt content, code context, file contents, any user data beyond session metadata.

Set per-project spend limits in your personal ~/.halyard/budgets.toml

— never committed to the repo. Warnings fire at session start when you've exceeded a daily or monthly threshold. Sessions always proceed; this is instrumentation, not a gate.

halyard set-budget acme --daily 15.00 --monthly 300.00
halyard budget   # shows current spend vs limits across all projects
my-business/
├── halyard.toml          # business name, currency, invoice counter
├── clients.toml          # array of clients
├── projects.toml         # array of projects
├── time.timeclock        # hledger-compatible human time log
├── ai-sessions.log       # AI usage events (plain text, append-focused)
├── ai-plans.toml         # seat/credit plan definitions for cost allocation
└── invoices/             # generated invoice markdown + PDF

Agent state (hooks, API keys, budgets, active timer) lives in ~/.halyard/

, separate from the project folder.

Some captured sessions don't carry token data. The dashboard flags them as missing tokens

and halyard usage

reports them under token_data_missing_sessions

. Common reasons:

Claude Code seat sessions— when a session is on a credits/seat plan rather than the API, token counts may not be available in the hook payload. The session is still recorded withtokens_available=false

; cost is reported as$0.00

because there is no per-session API charge. Usethe AI Work Ledger(halyard report --ledger

) to see allocated subscription cost for these.Manual or VS Code editor-task captures— sessions recorded viahalyard record-session

without an explicit--input-tokens

/--output-tokens

arrive without token data. This is expected; the capture is preserved for time/attribution.Codex imports— the Codex Desktop import path reads from the local rollout JSONL but may not surface token counts in all session formats. Import is still useful for attribution and time.

These rows are not quarantined and not lost — they appear in the log with tokens_available=false

so downstream tooling can distinguish "no data" from "zero usage".

Halyard records two cost classes:

Direct API cost(cost_usd

on the session line) — per-call charge from the provider, captured when the hook payload includes pricing.Allocated subscription cost— your $20/month Claude Pro or $200/month Claude Max is allocated across captured sessions proportionally. Runhalyard report --ledger

to see this. Cost trust labels in the dashboard (captured

/calculated

/allocated

) tell you which lens you're looking through.

If cost_usd

is consistently $0.00 and you have no ai-plans.toml

, create one in the project folder (define your seat/credit plans) so halyard report --ledger

can allocate subscription cost.

That's the same token_data_missing_sessions

count surfaced as a pill on the Usage Analytics panel. It's a counter, not an error — see "missing tokens" above. If the number is climbing, check that your active AI tool's hook is sending token data (halyard doctor

confirms hook health) and that recent sessions on the API plan aren't being dropped.

This project uses OpenSpec for spec-driven development. Every feature lives as a change folder under openspec/changes/

with a proposal, specs, design, and task checklist.

Versioning note:the published PyPI package is0.x

(currently0.2.1

). ThevN.x

identifiers below (e.g.v2.24

) are internal OpenSpec changeset IDs,notrelease versions — they track design history, not what youpipx install

.

Change	Description
-	No active focus; all current changes shipped.

Change	Description
`v3.6-windsurf-collector`

v3.5-claude-code-surface

v0-time-and-invoice

halyard init

, human time tracking, invoice generationv0.1-log-and-invoice

halyard log

natural-language query + halyard invoice

v0.2-ai-agent-loop

halyard log

v0.3-provider-neutral-log

halyard log --agent openai

v1-ai-intelligence

v1.5-multi-tool-collectors

v2-ai-work-ledger

confirm-attribution

, invoice evidence appendixv2-local-activity-dashboard

halyard dashboard

)v2.1-dynamic-pricing

halyard update-pricing

— live pricing table syncv2.2-budget-limits

v2.3-gemini-history

halyard import-gemini

v2.4-data-integrity

v2.5-cli-decoupling

v2.6-rich-session-telemetry

v2.7-ai-work-health

v2.8-calendar-blocks

v2.9-onboarding-doctor

halyard doctor

setup diagnosisv2.10-guided-setup

v2.11-hook-normalization

v2.12-glass-cockpit-service

v2.13-backtracking-attribution

v2.14-sqlite-read-model

v2.15-transaction-history

v4-tui

halyard tui

)v2.16-distribution-and-security

v2.17-log-integrity

v2.18-cache-and-audit-hardening

v2.20-security-remediation

v2.21-attribution-provenance

attr_method

) for billing and audit clarityv2.22-security-architecture

v2.23-usage-analytics

v2.24-outcome-metadata

halyard outcome sync

)v2.25-honors-and-achievements

halyard honors

)v2.26-passport-and-friends

halyard voyage

)| Change | Description | |---|---| v3.0-outcome-graph |

v3-org-admin-dashboard

Shipped— OSS launch (v0.2.1), honors and achievements (halyard honors

), Passport stamps, Friends of the Sea voyage stages (halyard voyage

), outcome metadata v2.24: branch field, commit count, code delta,halyard outcome sync

, and Claude Code client-surface detection (v3.5).Then— Attestable AI work appendix (v2.19): signed, client-safe proof of AI-assisted work, enriched with commit and PR signals.Later, if design partners ask— Outcome graph (v3.0): connect sessions to commits, PRs, tests, and deliverables.** Further out**— Redacted sync, org rollups, governance, finance exports, and enterprise reporting.

These hold at every tier:

Local-first. The core product runs offline. Cloud is optional and additive.Plain text forever. Your data is yours, in formats that outlast any startup.Files are the source of truth. No hidden state, no proprietary database.Append-only direction. New sessions are appended. Corrections are explicit and auditable; attribution cleanup is being hardened toward append-only correction records.No silent writes. Every AI-proposed change is shown before it's applied.MIT licensed. Permissively. Forever.

Early but open. The project uses OpenSpec for spec-driven development — every feature has a proposal.md

, design.md

, specs/

, and tasks.md

before code is written. See openspec/changes/

for what is actively being built.

To contribute:

Browse openspec/changes/

for open changesets. - Check the tasks.md

in any changeset for unchecked items. - Open an issue before a PR if you are proposing a new feature — start with a one-paragraph proposal so we can align on fit before you write code.

Bug reports and docs improvements need no prior discussion.
The test suite is pytest

; coverage requirements are enforced. Runpython -m pytest

before submitting.

If something is confusing, a docs issue is as valuable as a code PR.

MIT.

A Kormilo LLC project.

source & further reading

github.com — original article

Halyard – open AI work ledger for developers (time, tokens, cost, invoices)

Run your AI side-project on zahid.host