Papa flags hard sentences, passive voice, adverbs, and complex phrasing, and scores prose with a readability grade. It is an open-source alpha CLI and Python library that runs locally or in CI, emits JSON for automation, and generates a self-contained HTML report.
**View a sample HTML report** generated by Papa. It highlights hard sentences, passive voice, adverbs, and complex phrases with a readability grade panel. The report is self-contained and uses no network assets.
Papa is currently alpha software. The supported surface is intentionally small:
- CLI command: `papa`
- Python package: `papa-lint`
- Input formats: Markdown and plain text
- Reports: terminal, JSON, and self-contained HTML
- CI gating: use `--max-grade` and the process exit code
GitHub Action packaging, SARIF/Markdown reporters, config files, optional external linters, and built-in LLM rewriting are roadmap items.
The Hemingway Editor is useful, but it is closed-source, manual, and not designed for docs pipelines. Other open-source writing tools exist, but each has its own output format and workflow.
Papa focuses on a narrow first job: make prose readability checks scriptable.
- Scores readability with ARI, Flesch-Kincaid, and the Gunning fog index.
- Highlights prose the way Hemingway does, marking hard sentences, passive voice, adverbs, and complex phrases.
- Handles Markdown safely by ignoring frontmatter, fenced code, inline code, and SVG while preserving offsets back to the original file.
- Emits JSON so agents, scripts, or dashboards can consume the findings.
- Gates CI by exiting non-zero when prose is harder than your max grade.
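As a rough illustration of offset-preserving Markdown handling, the sketch below drops frontmatter, fenced code, inline code, and `<svg>` blocks, and records for every kept character its index in the original file. The regular expressions and the `strip_non_prose` name are assumptions for this example, not Papa's implementation; real Markdown has more edge cases (indented code, nested fences, HTML comments).

```python
import re

# Simplified patterns for non-prose regions (assumptions, not Papa's rules).
FRONTMATTER = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)  # YAML block at file start
FENCED_CODE = re.compile(r"^```.*?^```[^\n]*\n?", re.DOTALL | re.MULTILINE)
INLINE_CODE = re.compile(r"`[^`\n]+`")
SVG = re.compile(r"<svg\b.*?</svg>", re.DOTALL | re.IGNORECASE)


def strip_non_prose(source: str) -> tuple[str, list[int]]:
    """Return (prose, offsets), where offsets[i] is the index in `source`
    of the character prose[i]."""
    dropped = [False] * len(source)
    for pattern in (FRONTMATTER, FENCED_CODE, SVG, INLINE_CODE):
        for match in pattern.finditer(source):
            # Mark every character of the non-prose region for removal.
            for i in range(match.start(), match.end()):
                dropped[i] = True
    # Keep the remaining characters and remember where each one came from.
    offsets = [i for i, gone in enumerate(dropped) if not gone]
    prose = "".join(source[i] for i in offsets)
    return prose, offsets


doc = "---\ntitle: Demo\n---\nRun `papa` now.\n```sh\npapa post.md\n```\nDone.\n"
prose, offsets = strip_non_prose(doc)
# Each kept character maps back to the identical character in the original.
assert all(doc[o] == ch for o, ch in zip(offsets, prose))
```

Because `offsets[i]` points back into the source, a finding computed on the stripped prose can still be reported at its original position.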
pipx install papa-lint
papa post.md
papa post.md --report html -o report.html
papa post.md --report json > findings.json
papa post.md --max-grade 10
Example terminal output:
post.md - Grade 11, hard to read
✗ fails --max-grade 10
18 hard · 4 very hard · 6 passive · 9 adverbs · 12 complex
ARI 11.2 · FK 10.8 · Fog 12.1 (142 sentences)
L42 very hard Very hard to read (grade 16): "Because the detour squeezes..."
L58 passive Passive voice: 'is measured'
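The ARI, FK, and Fog figures in the summary line are grade-level formulas; Papa computes them with textstat. The sketch below shows the textbook versions over precomputed counts, purely as an illustration. The function names are hypothetical, and textstat's own rules for counting characters, syllables, and complex words may differ, so these numbers will not match Papa's exactly.

```python
def ari(chars: int, words: int, sentences: int) -> float:
    """Automated Readability Index from character, word, and sentence counts."""
    return 4.71 * chars / words + 0.5 * words / sentences - 21.43


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level."""
    return 0.39 * words / sentences + 11.8 * syllables / words - 15.59


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Gunning fog index; 'complex' words have three or more syllables."""
    return 0.4 * (words / sentences + 100 * complex_words / words)
```

For example, 480 characters, 100 words, and 5 sentences give an ARI of about 11.2.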
Papa does not need a dedicated GitHub Action yet. Install it in your workflow and let `--max-grade` fail the job when prose crosses your threshold:
name: readability
on: pull_request
jobs:
  papa:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: python -m pip install papa-lint
      - run: papa README.md docs/*.md --max-grade 10
For local development, run the same command before opening a PR.
Papa does not call an LLM directly. It emits structured JSON that you can hand to an agent or script:
papa post.md --report json > findings.json
The JSON includes the file path, document score, and findings with offsets into the original source:
{
  "version": "0.1",
  "file": "post.md",
  "score": {
    "ari": 11.2,
    "flesch_kincaid": 10.8,
    "gunning_fog": 12.1,
    "reading_grade": "Grade 11, hard to read",
    "verdict": "fail"
  },
  "findings": [
    {
      "start": 1423,
      "end": 1490,
      "category": "passive",
      "message": "Passive voice: 'is measured'",
      "severity": "warn"
    }
  ]
}
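A consumer can turn those offsets into editor-style `file:line:col` locations. This sketch assumes `start` and `end` are 0-based character offsets with `end` exclusive (the unit is not specified here), and the helper names are hypothetical.

```python
import bisect
import json


def load_report(path: str) -> dict:
    """Read the output of `papa post.md --report json > findings.json`."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def line_starts(source: str) -> list[int]:
    """Offsets at which each line of `source` begins."""
    starts = [0]
    for i, ch in enumerate(source):
        if ch == "\n":
            starts.append(i + 1)
    return starts


def locate(starts: list[int], offset: int) -> tuple[int, int]:
    """Convert a 0-based character offset to a 1-based (line, column)."""
    line = bisect.bisect_right(starts, offset)  # lines that begin at or before offset
    return line, offset - starts[line - 1] + 1


def annotate(source: str, report: dict) -> list[str]:
    """Render each finding as 'file:line:col category: message'."""
    starts = line_starts(source)
    out = []
    for finding in report["findings"]:
        line, col = locate(starts, finding["start"])
        out.append(f'{report["file"]}:{line}:{col} {finding["category"]}: {finding["message"]}')
    return out
```

For a real run, pass the text of `post.md` together with `load_report("findings.json")`.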
See docs/llm-contract.md for a prompt pattern that uses Papa findings to guide a rewrite.
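As an illustration only (this is not the pattern in docs/llm-contract.md), a script could fold findings into a plain-text rewrite request for an agent. The function name and prompt wording are hypothetical, and slicing `source` with `start`/`end` assumes character offsets.

```python
def build_prompt(source: str, report: dict, max_items: int = 20) -> str:
    """Turn Papa JSON findings into a rewrite request (hypothetical format)."""
    lines = [
        f"Rewrite the flagged passages in {report['file']} to lower the reading grade.",
        f"Current score: {report['score']['reading_grade']}.",
        "Keep the meaning, Markdown, and code unchanged. Flagged passages:",
    ]
    for finding in report["findings"][:max_items]:
        # Quote the exact flagged text so the model edits the right span.
        excerpt = source[finding["start"]:finding["end"]]
        lines.append(f"- [{finding['category']}] {finding['message']}: {excerpt!r}")
    return "\n".join(lines)
```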
input -> Ingestor -> Analyzers -> Aggregator -> Reporters
         strip code  readability  merge spans   terminal
         + SVG       passive      + scores      json
         offset map  adverbs                    html
                     complex phrase
- **Ingest**: detect Markdown or text, strip non-prose, and preserve an offset map back to the original source.
- **Analyze**: run built-in analyzers for readability, passive voice, adverbs, and complex phrases.
- **Aggregate**: merge overlapping findings, compute document scores, and apply the optional max-grade gate.
- **Report**: render terminal, JSON, or HTML output and set the exit code.
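The Aggregate step's merge can be pictured as interval merging. In this sketch, the `Finding` class and the rule that only overlapping findings of the same category collapse (keeping the first message) are assumptions, not documented Papa behavior.

```python
from dataclasses import dataclass, replace
from itertools import groupby


@dataclass(frozen=True)
class Finding:
    start: int
    end: int  # exclusive, so touching spans do not overlap
    category: str
    message: str


def merge_overlapping(findings: list[Finding]) -> list[Finding]:
    """Merge same-category findings whose spans overlap."""
    merged: list[Finding] = []
    ordered = sorted(findings, key=lambda f: (f.category, f.start, f.end))
    for _, group in groupby(ordered, key=lambda f: f.category):
        current = next(group)  # groupby never yields an empty group
        for f in group:
            if f.start < current.end:
                # Overlap: stretch the current span to cover both findings.
                current = replace(current, end=max(current.end, f.end))
            else:
                merged.append(current)
                current = f
        merged.append(current)
    # Return findings in document order.
    return sorted(merged, key=lambda f: (f.start, f.category))
```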
From Python, call `analyze` directly:
from pathlib import Path
from papa import analyze
text = Path("post.md").read_text(encoding="utf-8")
result = analyze(text, path="post.md", max_grade=10)
print(result.score.reading_grade)
print(result.score.verdict)
- Config file support, likely `papa.toml`
- GitHub Action wrapper
- SARIF and Markdown reporters for PR annotations and summaries
- Built-in `--suggest` workflow for LLM-assisted rewrites
- MDX, HTML, and reStructuredText ingestion
- Optional integrations with tools such as proselint, alex, and vale
- npm, Homebrew, and Docker distribution
| | Papa alpha | Hemingway App | write-good | vale |
|---|---|---|---|---|
| Readability grade | ✅ | ✅ | ❌ | ❌ |
| Sentence highlights | ✅ | ✅ | | |
| Markdown code-block awareness | ✅ | ❌ | ❌ | ✅ |
| CLI | ✅ | ❌ | ✅ | ✅ |
| CI gate via exit code | ✅ | ❌ | ✅ | ✅ |
| JSON for automation | ✅ | ❌ | ✅ | |
| Open source | ✅ | ❌ | ✅ | ✅ |
Papa currently uses textstat for readability formulas and includes small built-in heuristics for the Hemingway-style findings. Future optional integrations may include proselint, write-good, vale, and alex.
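To give a feel for what a small built-in heuristic can look like, here is a crude passive-voice detector: a form of "to be", an optional "-ly" adverb, then a word that looks like a past participle. The pattern, word list, and `find_passive` name are assumptions for illustration; Papa's actual rules may differ, and a regex like this misses some passives and flags some non-passives (for example, "is indeed").

```python
import re

# Hypothetical heuristic, not Papa's rule set.
_BE = r"(?:am|is|are|was|were|be|been|being)"
_IRREGULAR = (
    r"(?:built|chosen|done|driven|given|held|hidden|kept|known|left|lost|made|"
    r"paid|put|read|said|seen|sent|set|shown|spoken|taken|told|written)"
)
PASSIVE = re.compile(
    rf"\b{_BE}\s+(?:\w+ly\s+)?(?:\w+ed|{_IRREGULAR})\b",
    re.IGNORECASE,
)


def find_passive(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, matched_text) for each likely passive construction."""
    return [(m.start(), m.end(), m.group(0)) for m in PASSIVE.finditer(text)]
```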
Issues and PRs welcome. See CONTRIBUTING.md and our Code of Conduct. Good first issues are labeled.
MIT © Bharadwaj Pendyala