# Show HN: Papa – open-source Hemingway-style readability linting for Markdown

> Source: <https://github.com/bharadwaj-pendyala/papa>
> Published: 2026-07-04 00:03:25+00:00

Papa flags hard sentences, passive voice, adverbs, and complex phrasing with a
readability grade. It is an open-source **alpha** CLI and Python library that
can run locally or in CI, emit JSON for automation, and generate a self-contained
HTML report.

[Quickstart](#-quickstart) · [**CI usage**](#-ci-usage) · [**LLM workflow**](#-llm-workflow) · [**How it works**](#-how-it-works) · **Roadmap**

**View a sample HTML report** generated by Papa. It highlights hard sentences,
passive voice, adverbs, and complex phrases with a readability grade panel. The
report is self-contained and uses no network assets.

Papa is currently **alpha software**. The supported surface is intentionally
small:

- CLI command: `papa`
- Python package: `papa-lint`
- Input formats: Markdown and plain text
- Reports: terminal, JSON, and self-contained HTML
- CI gating: use `--max-grade` and the process exit code

GitHub Action packaging, SARIF/Markdown reporters, config files, optional external linters, and built-in LLM rewriting are roadmap items.

The [Hemingway Editor](https://hemingwayapp.com) is useful, but it is closed,
manual, and not designed for docs pipelines. Other open-source writing tools
exist, but they tend to have separate output formats and workflows.

Papa focuses on a narrow first job: make prose readability checks scriptable.

- **Scores readability** with ARI, Flesch-Kincaid, and Gunning fog.
- **Highlights like Hemingway** by marking hard sentences, passive voice, adverbs, and complex phrases.
- **Handles Markdown safely** by ignoring frontmatter, fenced code, inline code, and SVG while preserving offsets back to the original file.
- **Emits JSON** so agents, scripts, or dashboards can consume the findings.
- **Gates CI** by exiting non-zero when prose is harder than your max grade.
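For reference, the Automated Readability Index (ARI) is a short formula over character, word, and sentence counts. The sketch below is illustrative only: the function name and the naive regex tokenization are assumptions, and Papa delegates its readability formulas to textstat, so its numbers may differ.

```python
import re


def automated_readability_index(text: str) -> float:
    """ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43."""
    # Naive tokenization (assumption): words are runs of letters, digits, or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Naive sentence split (assumption): break on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    # ARI counts letters and digits only, not punctuation or whitespace.
    characters = sum(ch.isalnum() for ch in text)
    return 4.71 * (characters / len(words)) + 0.5 * (len(words) / len(sentences)) - 21.43
```

Longer words and longer sentences both raise the score, which is why splitting a sentence or swapping a complex phrase for a plain one lowers the grade.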

```
pipx install papa-lint

papa post.md
papa post.md --report html -o report.html
papa post.md --report json > findings.json
papa post.md --max-grade 10
```

Example terminal output:

```
post.md  -  Grade 11, hard to read
  ✗ fails --max-grade 10

  18 hard · 4 very hard · 6 passive · 9 adverbs · 12 complex
  ARI 11.2 · FK 10.8 · Fog 12.1  (142 sentences)

  L42   very hard  Very hard to read (grade 16): "Because the detour squeezes..."
  L58   passive    Passive voice: 'is measured'
```

Papa does not need a dedicated GitHub Action yet. Install it in your workflow
and let `--max-grade` fail the job when prose crosses your threshold:

```
name: readability
on: pull_request
jobs:
  papa:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: python -m pip install papa-lint
      - run: papa README.md docs/*.md --max-grade 10
```

For local development, run the same command before opening a PR.
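If you prefer to wire the same check into a local script or Git hook, a thin Python wrapper around the CLI is enough. This is a minimal sketch: `build_papa_command` and `run_papa` are hypothetical helper names, and the only Papa behavior it relies on is the documented `--max-grade` flag and the process exit code.

```python
import subprocess


def build_papa_command(paths, max_grade=10):
    """Return the argv list for one papa run over the given files."""
    return ["papa", *(str(p) for p in paths), "--max-grade", str(max_grade)]


def run_papa(paths, max_grade=10, runner=subprocess.run):
    """Run papa and return its exit code.

    papa exits non-zero when prose is harder than max_grade, so the
    caller can pass the code straight through to the shell.
    The runner parameter exists only so the call can be stubbed in tests.
    """
    completed = runner(build_papa_command(paths, max_grade))
    return completed.returncode
```

A hook script could then end with `sys.exit(run_papa(["README.md", *glob.glob("docs/*.md")]))` to mirror the workflow above.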

Papa does not call an LLM directly. It emits structured JSON that you can hand to an agent or script:

```
papa post.md --report json > findings.json
```

The JSON includes the file path, document score, and findings with offsets into the original source:

```
{
  "version": "0.1",
  "file": "post.md",
  "score": {
    "ari": 11.2,
    "flesch_kincaid": 10.8,
    "gunning_fog": 12.1,
    "reading_grade": "Grade 11, hard to read",
    "verdict": "fail"
  },
  "findings": [
    {
      "start": 1423,
      "end": 1490,
      "category": "passive",
      "message": "Passive voice: 'is measured'",
      "severity": "warn"
    }
  ]
}
```

See [docs/llm-contract.md](/bharadwaj-pendyala/papa/blob/main/docs/llm-contract.md) for a prompt pattern that uses
Papa findings to guide a rewrite.
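Before handing findings to an agent, it often helps to resolve each offset pair to a line number and the flagged text. The sketch below assumes `start` and `end` are character offsets into the decoded source and that `end` is exclusive. Neither detail is confirmed here, so check the contract document before relying on it. The helper names are hypothetical.

```python
from bisect import bisect_right


def line_starts(source: str) -> list:
    """Offsets at which each line of the source begins."""
    starts = [0]
    for index, ch in enumerate(source):
        if ch == "\n":
            starts.append(index + 1)
    return starts


def locate_findings(report: dict, source: str) -> list:
    """Attach a 1-based line number and the flagged text to each finding."""
    starts = line_starts(source)
    located = []
    for finding in report["findings"]:
        start, end = finding["start"], finding["end"]
        located.append({
            # bisect_right counts the line starts at or before the offset.
            "line": bisect_right(starts, start),
            "category": finding["category"],
            "message": finding["message"],
            # Assumption: offsets are half-open, [start, end).
            "text": source[start:end],
        })
    return located
```

Load `findings.json` with `json.load`, read `post.md` as UTF-8 text, and pass both to `locate_findings` to get entries that are easy to quote in a prompt or a review comment.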

```
input -> Ingestor -> Analyzers -> Aggregator -> Reporters
         strip code  readability  merge spans  terminal
         + SVG       passive      + scores     json
         offset map  adverbs                   html
                     complex phrase
```

- **Ingest**: detect Markdown or text, strip non-prose, and preserve an offset map back to the original source.
- **Analyze**: run built-in analyzers for readability, passive voice, adverbs, and complex phrases.
- **Aggregate**: merge overlapping findings, compute document scores, and apply the optional max-grade gate.
- **Report**: render terminal, JSON, or HTML output and set the exit code.
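To see why offset preservation matters in the ingest step, here is one simple way to hide code from analyzers without shifting positions: overwrite it with spaces. This is a swapped-in masking technique for illustration, not necessarily how Papa builds its offset map. It covers only fenced and inline code (no frontmatter or SVG), and `mask_code` is a hypothetical name.

```python
import re

# A fence opener at line start, lazily up to a matching closer on its own line.
FENCE = re.compile(r"^(`{3,}|~{3,}).*?^\1[ \t]*$", re.MULTILINE | re.DOTALL)
# A single-backtick inline code span on one line.
INLINE = re.compile(r"`[^`\n]+`")


def _blank(match: re.Match) -> str:
    # Replace every character except newlines with a space, so both the
    # length and the line structure of the text stay unchanged.
    return re.sub(r"[^\n]", " ", match.group(0))


def mask_code(markdown: str) -> str:
    """Return text of identical length with code regions blanked out."""
    masked = FENCE.sub(_blank, markdown)
    return INLINE.sub(_blank, masked)
```

Because the masked text has the same length as the input, any offset an analyzer reports against it points at the same character in the original file.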

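``` python
from pathlib import Path

from papa import analyze

# Read the document, then analyze it against a grade threshold of 10.
text = Path("post.md").read_text(encoding="utf-8")
result = analyze(text, path="post.md", max_grade=10)

print(result.score.reading_grade)  # grade label for the document
print(result.score.verdict)        # verdict against max_grade
```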

- Config file support, likely `papa.toml`
- GitHub Action wrapper
- SARIF and Markdown reporters for PR annotations and summaries
- Built-in `--suggest` workflow for LLM-assisted rewrites
- MDX, HTML, and reStructuredText ingestion
- Optional integrations with tools such as `proselint`, `alex`, and `vale`
- npm, Homebrew, and Docker distribution

| | Papa alpha | Hemingway App | write-good | vale |
|---|---|---|---|---|
| Readability grade | ✅ | ✅ | ❌ | ❌ |
| Sentence highlights | ✅ | ✅ | | |
| Markdown code-block awareness | ✅ | ❌ | ❌ | ✅ |
| CLI | ✅ | ❌ | ✅ | ✅ |
| CI gate via exit code | ✅ | ❌ | ✅ | ✅ |
| JSON for automation | ✅ | ❌ | ✅ | |
| Open source | ✅ | ❌ | ✅ | ✅ |

Papa currently uses [textstat](https://github.com/textstat/textstat) for
readability formulas and includes small built-in heuristics for the Hemingway
style findings. Future optional integrations may include
[proselint](https://github.com/amperser/proselint),
[write-good](https://github.com/btford/write-good),
[vale](https://github.com/errata-ai/vale), and
[alex](https://github.com/get-alex/alex).

Issues and PRs welcome. See [CONTRIBUTING.md](/bharadwaj-pendyala/papa/blob/main/CONTRIBUTING.md) and our
[Code of Conduct](/bharadwaj-pendyala/papa/blob/main/CODE_OF_CONDUCT.md). Good first issues are
[labeled](https://github.com/bharadwaj-pendyala/papa/labels/good%20first%20issue).

MIT © Bharadwaj Pendyala