Show HN: Promptloop – create, run, and improve prompt evals from the terminal

wpnews.pro


Source: github.com · Published 2026-05-29 16:06 UTC · Topic: ai-tools


Promptloop, a new interactive CLI tool built on LangChain's deepagents framework, lets developers create, run, and improve prompt evaluations entirely from the terminal. The tool saves methodology, test cases, reports, prompt history, and chat checkpoints under a `.evals/` directory in the target project, and it supports metrics including latency, JSON schema validation, fuzzy matching, and LLM judge scoring. With Promptloop, users can register prompts, add test cases, run evaluations, generate failure analysis reports, and approve prompt diffs without leaving the command line.


An interactive CLI agent for the full prompt-eval loop: create test cases, run evals, generate reports, and approve prompt diffs without leaving your terminal.

Built on LangChain deepagents.

Agent harnesses are getting better, but prompts still shape what they do. promptloop turns a prompt and an eval intent into a repeatable loop.

It saves the methodology, test cases, reports, prompt history, and chat checkpoints under `.evals/` in the target project:

```
.evals/
  prompts/        # registered prompts + version history
  test_cases/     # per-prompt test suites
  eval_configs/   # methodology (metrics, models, judges)
  results/        # eval runs and reports
  chat.db         # SQLite checkpoint of conversation threads
```
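
As a rough illustration of this layout, the sketch below creates an empty `.evals/` skeleton using only Python's standard library. The helper name `init_evals_dir` is hypothetical and not part of promptloop. The only assumption it makes about `chat.db` is that it is an SQLite file, as the layout notes. The real tool's checkpointer would create its own tables inside that file.

```python
import sqlite3
import tempfile
from pathlib import Path

# Subdirectories from the .evals/ layout; the helper itself is hypothetical.
EVALS_SUBDIRS = ("prompts", "test_cases", "eval_configs", "results")


def init_evals_dir(project_dir: Path) -> Path:
    """Create an empty .evals/ skeleton under project_dir and return its path."""
    root = project_dir / ".evals"
    for name in EVALS_SUBDIRS:
        # exist_ok makes the helper safe to call on an existing project.
        (root / name).mkdir(parents=True, exist_ok=True)
    # Opening an SQLite connection creates chat.db if it does not exist yet.
    sqlite3.connect(root / "chat.db").close()
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = init_evals_dir(Path(tmp))
        print(sorted(p.name for p in root.iterdir()))
        # → ['chat.db', 'eval_configs', 'prompts', 'results', 'test_cases']
```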

Example metrics:

- `latency`: response time
- `json_schema`: validates structured output
- `fuzzy_match`: compares text similarity
- `llm_judge`: scores output with a judge prompt
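
Each of these metrics can be approximated in a few lines of standard-library Python. This is a sketch under stated assumptions, not promptloop's implementation:

- `check_required_keys` is a simplified stand-in for full JSON Schema validation. It only checks required top-level keys.
- `fuzzy_match` uses `difflib.SequenceMatcher.ratio()` as its similarity measure.
- The thresholds and the `judge` callable are hypothetical. In practice, `judge` would wrap a model call with a judge prompt.

```python
import difflib
import json
import time
from typing import Callable


def measure_latency_ms(call: Callable[[], str]) -> tuple[str, float]:
    """latency: run one call and report its wall-clock time in milliseconds."""
    start = time.perf_counter()
    output = call()
    return output, (time.perf_counter() - start) * 1000


def check_required_keys(output: str, required: list[str]) -> tuple[bool, str]:
    """Simplified json_schema stand-in: a JSON object with required top-level keys."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError as exc:
        return False, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return False, "expected a JSON object"
    for key in required:
        if key not in data:
            return False, f"schema mismatch: '{key}' is a required property"
    return True, "valid JSON with required keys"


def fuzzy_match(output: str, expected: str, threshold: float = 0.8) -> tuple[bool, float]:
    """fuzzy_match: difflib similarity ratio in [0, 1] compared to a threshold."""
    ratio = difflib.SequenceMatcher(None, output, expected).ratio()
    return ratio >= threshold, ratio


def llm_judge(output: str, judge: Callable[[str], float], threshold: float = 0.7) -> tuple[bool, float]:
    """llm_judge: delegate scoring to a judge callable (a model call in practice)."""
    score = judge(output)
    return score >= threshold, score


if __name__ == "__main__":
    out, ms = measure_latency_ms(lambda: '{"summary": ["a", "b", "c"]}')
    print(check_required_keys(out, ["summary", "action_items"]))
    # → (False, "schema mismatch: 'action_items' is a required property")
    print(llm_judge(out, judge=lambda text: 0.86))
    # → (True, 0.86)
```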

Suppose your project has a prompt at `prompts/summarize.md`:

```
Summarize the user's note in three bullets.
Return JSON.
```

Start promptloop and describe the behavior you want to test:

```
$ uv run promptloop --project-dir ~/work/notes-app

promptloop> Evaluate the prompt at prompts/summarize.md.

Registered prompt 'summarize' (v1)
Source: /Users/me/work/notes-app/prompts/summarize.md

promptloop> Add a test case where the note includes action items, dates, and unrelated chatter.

Added test case 'tc_action_items' for prompt 'summarize'
(metrics: json_schema, llm_judge).

promptloop> Run the eval.

Run complete - ID: run_20260529_091214_a3f2
Results: 2 passed / 1 failed / 3 total
Avg latency: 1840ms
Max concurrency: 3

  passed [tc_basic_summary] anthropic:claude-sonnet-4-6
    json_schema: valid JSON matching schema | llm_judge: 0.86
  failed [tc_action_items] anthropic:claude-sonnet-4-6
    json_schema: schema mismatch: 'action_items' is a required property
  passed [tc_noise] anthropic:claude-sonnet-4-6
    json_schema: valid JSON matching schema | llm_judge: 0.82
```
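
The run summary (pass/fail counts, average latency, and "Max concurrency: 3") suggests that test cases run in parallel under a concurrency cap. Below is a minimal sketch of that pattern using `asyncio.Semaphore`. The names `run_cases`, `CaseResult`, and `summarize` are hypothetical. Each check is assumed to be an async callable that returns pass/fail.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Awaitable, Callable

Check = Callable[[], Awaitable[bool]]  # hypothetical: one async pass/fail check per test case


@dataclass
class CaseResult:
    case_id: str
    passed: bool
    latency_ms: float


async def run_cases(cases: dict[str, Check], max_concurrency: int = 3) -> list[CaseResult]:
    """Run every test case, with at most max_concurrency checks in flight at once."""
    sem = asyncio.Semaphore(max_concurrency)

    async def run_one(case_id: str, check: Check) -> CaseResult:
        async with sem:
            start = time.perf_counter()
            passed = await check()
            return CaseResult(case_id, passed, (time.perf_counter() - start) * 1000)

    # gather returns results in the same order as the input cases.
    return list(await asyncio.gather(*(run_one(cid, chk) for cid, chk in cases.items())))


def summarize(results: list[CaseResult]) -> str:
    passed = sum(r.passed for r in results)
    avg = sum(r.latency_ms for r in results) / len(results) if results else 0.0
    return (f"Results: {passed} passed / {len(results) - passed} failed / {len(results)} total\n"
            f"Avg latency: {avg:.0f}ms")


if __name__ == "__main__":
    async def fake_check(ok: bool) -> bool:
        await asyncio.sleep(0.01)  # stands in for a model call plus metric checks
        return ok

    cases = {
        "tc_basic_summary": lambda: fake_check(True),
        "tc_action_items": lambda: fake_check(False),
        "tc_noise": lambda: fake_check(True),
    }
    print(summarize(asyncio.run(run_cases(cases))))
```

Because `asyncio.gather` preserves input order, the results line up with the case IDs even though the checks overlap in time.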

Ask for a fix, and promptloop proposes a diff instead of editing blindly:

```
promptloop> Propose a prompt change for the failing action-items case.

Proposed changes to 'summarize' from v1:
--- summarize (current)
+++ summarize (proposed)
@@
-Summarize the user's note in three bullets.
-Return JSON.
+Summarize the user's note in three bullets.
+If the note contains follow-up tasks, extract them into an action_items array.
+Each action item should include a task, owner if mentioned, and due_date if mentioned.
+
+Return only valid JSON with this shape:
+{
+  "summary": ["...", "...", "..."],
+  "action_items": [
+    {"task": "...", "owner": "...", "due_date": "..."}
+  ]
+}
```
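
The propose-then-approve flow can be sketched with `difflib.unified_diff` and an append-only list of versions. `PromptHistory` is a hypothetical in-memory class; promptloop itself keeps history under `.evals/prompts/`. `propose` only renders a diff, and nothing changes until `apply` is called after approval. Note that difflib keeps unchanged lines as context, so its hunks may not match the sample diff line for line.

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class PromptHistory:
    """Hypothetical append-only version store for one prompt (v1, v2, ...)."""
    name: str
    versions: list[str] = field(default_factory=list)

    def register(self, text: str) -> int:
        self.versions.append(text)
        return len(self.versions)  # 1-based version number

    def propose(self, new_text: str) -> str:
        """Render a unified diff of current vs. proposed; stores nothing."""
        diff = difflib.unified_diff(
            self.versions[-1].splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"{self.name} (current)",
            tofile=f"{self.name} (proposed)",
        )
        return "".join(diff)

    def apply(self, new_text: str) -> int:
        """Call only after the user approves the diff; records a new version."""
        return self.register(new_text)


if __name__ == "__main__":
    history = PromptHistory("summarize")
    history.register("Summarize the user's note in three bullets.\nReturn JSON.\n")  # v1
    proposed = (
        "Summarize the user's note in three bullets.\n"
        "If the note contains follow-up tasks, extract them into an action_items array.\n"
        "Return only valid JSON.\n"
    )
    print(history.propose(proposed))
    approved = True  # in the CLI, this decision would come from the user
    if approved:
        print(f"Saved v{history.apply(proposed)}")
```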

It also generates a report you can inspect before approving the change:


```markdown
**Run:** run_20260529_091214_a3f2
**Models:** anthropic:claude-sonnet-4-6
**Pass rate:** 67% (2/3)
**Avg latency:** 1840ms

## Failure Analysis

The action-items case failed because the prompt only requested "three bullets"
and "JSON"; it did not define a required JSON shape or explain how to handle
dates, owners, and follow-up tasks.

## Recommendations

1. Add an explicit `action_items` field to the schema.
2. Tell the model to preserve due dates and owners when present.
3. Require JSON-only output so downstream parsing is stable.
```
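
The header fields in the sample report follow from simple arithmetic on the run results: 2 of 3 passing is 66.7%, which rounds to the 67% shown. The sketch below renders those fields. `RunSummary` and `render_report_header` are hypothetical, and the rounding rule is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    run_id: str
    models: list[str]
    passed: int
    total: int
    avg_latency_ms: float


def render_report_header(s: RunSummary) -> str:
    """Render the report's header fields; the layout mirrors the sample report."""
    # 2/3 -> 66.67 -> 67; note that round() sends exact halves to the even integer.
    rate = round(100 * s.passed / s.total) if s.total else 0
    return "\n".join([
        f"**Run:** {s.run_id}",
        f"**Models:** {', '.join(s.models)}",
        f"**Pass rate:** {rate}% ({s.passed}/{s.total})",
        f"**Avg latency:** {s.avg_latency_ms:.0f}ms",
    ])


if __name__ == "__main__":
    print(render_report_header(RunSummary(
        "run_20260529_091214_a3f2", ["anthropic:claude-sonnet-4-6"], 2, 3, 1840.0)))
```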

To install and run promptloop:

```
git clone <this repo>
cd promptloop
uv sync
uv run promptloop --project-dir /path/to/your/project
```

You'll get an interactive chat. Try things like:

- "Evaluate the prompt at `src/prompts/summarize.txt`"
- "Add three more test cases for edge cases"
- "Re-run with `openai:gpt-4o-mini` and compare to the last run"
- "Propose a fix for the failing JSON schema cases"

| Command | Description |
|---|---|
| `/help` | Show help |
| `/clear` | Start a new conversation thread |
| `/threads` | List saved threads |
| `/thread <id>` | Switch to a thread in-session |
| `/quit` | Exit |
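
Below is one plausible way a REPL could route these slash commands. Everything in it is hypothetical. `Session` keeps thread IDs in memory, whereas promptloop checkpoints threads to `.evals/chat.db`, and short random hex strings stand in for real thread IDs. Lines that do not start with `/` return `None` so the caller can forward them to the agent.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


def new_thread_id() -> str:
    return uuid.uuid4().hex[:8]  # hypothetical ID format


@dataclass
class Session:
    """Hypothetical in-memory session; promptloop persists threads in chat.db."""
    thread_id: str = field(default_factory=new_thread_id)
    threads: list[str] = field(default_factory=list)
    running: bool = True

    def __post_init__(self) -> None:
        self.threads.append(self.thread_id)


def handle_command(session: Session, line: str) -> Optional[str]:
    """Handle a slash command; return None if the line should go to the agent."""
    if not line.startswith("/"):
        return None
    cmd, _, arg = line.strip().partition(" ")
    arg = arg.strip()
    if cmd == "/help":
        return "Commands: /help /clear /threads /thread <id> /quit"
    if cmd == "/clear":
        session.thread_id = new_thread_id()
        session.threads.append(session.thread_id)
        return f"Started thread {session.thread_id}"
    if cmd == "/threads":
        return "\n".join(session.threads)
    if cmd == "/thread":
        if arg not in session.threads:
            return f"Unknown thread: {arg}"
        session.thread_id = arg
        return f"Switched to thread {arg}"
    if cmd == "/quit":
        session.running = False
        return "Bye"
    return f"Unknown command: {cmd}"
```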

Resume past sessions with `promptloop --thread <id>`. Press Esc to interrupt a streaming response.

The agent has a small set of typed tools on top of deepagents' filesystem access:

- `register_prompt`, `propose_prompt_changes`, `apply_prompt_changes`, `show_prompt_history`
- `add_test_case`, `infer_json_schema`, `save_eval_config`
- `run_eval`, `list_eval_runs`
- `generate_report`, `read_report`, `compare_runs`
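
`compare_runs` is only named here, so its exact behavior is not documented. Below is a minimal sketch of one reasonable comparison. It assumes each run has been reduced to a mapping from test-case ID to pass/fail. The signature and the category names are hypothetical.

```python
def compare_runs(old: dict[str, bool], new: dict[str, bool]) -> dict[str, list[str]]:
    """Hypothetical compare_runs: bucket test cases by how their status changed."""
    report: dict[str, list[str]] = {
        "fixed": [], "regressed": [], "still_failing": [], "added": [], "removed": [],
    }
    for case_id in sorted(old.keys() | new.keys()):
        if case_id not in new:
            report["removed"].append(case_id)
        elif case_id not in old:
            report["added"].append(case_id)
        elif not old[case_id] and new[case_id]:
            report["fixed"].append(case_id)
        elif old[case_id] and not new[case_id]:
            report["regressed"].append(case_id)
        elif not new[case_id]:
            report["still_failing"].append(case_id)
        # Cases that passed in both runs are not listed.
    return report


if __name__ == "__main__":
    before = {"tc_basic_summary": True, "tc_action_items": False, "tc_noise": True}
    # Illustrative only: a hypothetical rerun in which the action-items case now passes.
    after = {"tc_basic_summary": True, "tc_action_items": True, "tc_noise": True}
    print(compare_runs(before, after)["fixed"])  # → ['tc_action_items']
```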

For more detail on the agent runtime behind this project, see "The Harness Behind Deep Agent."

Early / experimental. Feedback and issues welcome.


Original repository: github.com/Bella3202019/promptloop
