
Show HN: Suture – a reverse proxy that repairs truncated JSON in LLM streams

Suture, an ultra-low-latency reverse proxy, repairs truncated and malformed JSON in LLM streaming responses on the fly to prevent JSONDecodeError and similar parsing failures. The tool sits between applications and providers like OpenAI, Anthropic, and Google Vertex AI, emitting missing characters to make reassembled JSON valid without buffering the stream or adding meaningful latency. Suture is available as a standalone binary or Rust library, requiring no SDK changes, retries, or regenerated tokens.

4 min read · published Jun 4, 2026

Ultra-low-latency reverse proxy that repairs truncated and malformed JSON in LLM streaming responses, on the fly.

The story:[Why your LLM tool calls silently break — and a ~10µs fix]

When an upstream LLM stream is cut off (by max_tokens, a context-window limit, or a dropped socket), the JSON it was emitting (a tool call's arguments, or structured-output content) is left unterminated, and your application throws JSONDecodeError / serde_json "EOF while parsing" errors. Suture sits between your app and the provider, watches the stream, and emits exactly the missing characters to make the reassembled JSON valid, without buffering the stream or adding meaningful latency.

A tool-call stream truncated at max_tokens leaves your client reassembling invalid JSON:

// what the client reassembles from the delta events:
{"city": "Par      // ← unterminated → JSONDecodeError / serde_json: EOF while parsing a string

Suture closes it on the wire, so the client gets valid JSON instead — no SDK changes, no retry, no regenerated tokens:

{"city": "Par"}    // ← valid; the string and object are safely closed
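
To make the idea concrete, here is a minimal Python sketch of computing the closing characters for a truncated JSON prefix. It is not Suture's engine (which the project describes as a byte-level state machine written in Rust); the function name closing_suffix is hypothetical, and the final json.loads check is a shortcut that re-parses the whole text, which a streaming proxy would presumably avoid.

```python
import json
from typing import Optional


def closing_suffix(prefix: str) -> Optional[str]:
    """Return the characters that close a truncated JSON prefix, or None.

    None means "do not touch": the prefix cannot be made valid just by
    appending a closing quote and closing brackets.
    """
    stack = []          # closers still owed, innermost last
    in_string = False   # currently inside a JSON string
    escaped = False     # previous character was a backslash inside a string
    for ch in prefix:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("}")
        elif ch == "[":
            stack.append("]")
        elif ch in "}]":
            # A closer that does not match the innermost opener is inconsistent.
            if not stack or stack.pop() != ch:
                return None
    suffix = ('"' if in_string else "") + "".join(reversed(stack))
    # Sketch-only safety net: accept the suffix only if the result parses.
    # This rejects awkward cut points such as a dangling key or backslash.
    try:
        json.loads(prefix + suffix)
    except json.JSONDecodeError:
        return None
    return suffix
```

With the example above, closing_suffix('{"city": "Par') yields the two characters that close the string and the object. A prefix cut in the middle of a key, such as '{"cit', returns None in this sketch, meaning the text would pass through unchanged.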

You're in the right place if your LLM app has thrown any of these on a streaming response:

  • json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column …
  • json.decoder.JSONDecodeError: Expecting value: line 1 column … (char …)
  • serde_json::Error: EOF while parsing a string / EOF while parsing an object
  • pydantic_core.ValidationError on truncated tool-call arguments
  • Tool / function-call arguments that won't parse when the model hits max_tokens
  • Truncated structured-output / JSON-mode content across streamed deltas

…on OpenAI, Anthropic, Google Vertex AI (Gemini / Claude), or AWS Bedrock.

  • Repairs OpenAI (/v1/chat/completions), Anthropic (/v1/messages), GCP Vertex AI (Gemini + Claude-on-Vertex), and AWS Bedrock (ConverseStream) streaming responses.
  • SSE-aware: repairs the reassembled tool-call arguments / structured content accumulated across delta events, not just raw wire bytes.
  • Streaming + compressed: transparently decodes gzip/brotli/deflate, repairs, and re-encodes per the client's Accept-Encoding; never buffers the whole body. Added overhead is ~10 µs per chunk.
  • Holds no credentials: your provider API key / bearer token is forwarded verbatim.
  • The byte-level repair engine is usable as a standalone library: cargo add suture-repair (then use suture::…), or just the engine via cargo add suture-repair-core (use suture_core::repair_str).

cargo install suture-repair    # installs the `suture` binary; or: docker build -t suture .
suture                         # listens on 127.0.0.1:8787

Point your SDK's base URL at Suture (your API key still flows through):

import os

from openai import OpenAI

# Send requests to the local Suture proxy; the API key is forwarded to the provider.
client = OpenAI(base_url="http://localhost:8787/v1", api_key=os.environ["OPENAI_API_KEY"])

Routes: POST /v1/chat/completions → OpenAI, POST /v1/messages → Anthropic, POST /v1/projects/* → Vertex, POST /model/* → Bedrock (each when enabled), GET /health.
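
The route list can be read as a small dispatch table. The Python sketch below only illustrates that mapping; pick_upstream and its return labels are hypothetical names, and it assumes that "when enabled" applies to the Vertex and Bedrock routes (the two with an enable flag in the configuration).

```python
from typing import Optional


def pick_upstream(
    method: str,
    path: str,
    vertex_enabled: bool = False,
    bedrock_enabled: bool = False,
) -> Optional[str]:
    """Map a request to a provider label, or None if no route matches."""
    if method == "GET" and path == "/health":
        return "health"
    if method != "POST":
        return None
    if path == "/v1/chat/completions":
        return "openai"
    if path == "/v1/messages":
        return "anthropic"
    # Wildcard routes: anything under the prefix, only if the route is enabled.
    if path.startswith("/v1/projects/"):
        return "vertex" if vertex_enabled else None
    if path.startswith("/model/"):
        return "bedrock" if bedrock_enabled else None
    return None
```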

Three layers, each independently tested:

  • suture-core: a byte-level JSON repair state machine. Given any prefix of a valid JSON value, it computes the characters needed to close it (or reports that the input is inconsistent and should pass through untouched). No allocation beyond nesting depth.
  • suture-sse: an incremental SSE parser + per-provider extractors that reassemble the JSON-bearing field across delta events, drive the core engine, and synthesize a closing event at stream end (before the terminator).
  • suture: an axum/reqwest reverse proxy. Forwards your request verbatim, then on the response: text/event-stream is repaired via the SSE layer; a single application/json body is closed with the core engine; anything else streams through unchanged.
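
As a rough illustration of what "reassembling the JSON-bearing field across delta events" involves, the Python sketch below parses an SSE body and concatenates OpenAI-style tool-call argument fragments per call index. It is not the suture-sse implementation: it works on a complete string rather than incrementally, handles only the OpenAI chat-completions chunk shape, and does not synthesize a closing event. The function names are hypothetical.

```python
import json
from typing import Dict, Iterator


def sse_data_payloads(raw: str) -> Iterator[str]:
    """Yield the data payload of each SSE event (events end at a blank line)."""
    for block in raw.replace("\r\n", "\n").split("\n\n"):
        parts = []
        for line in block.split("\n"):
            if line.startswith("data:"):
                value = line[5:]
                # Per the SSE format, one leading space after the colon is dropped.
                parts.append(value[1:] if value.startswith(" ") else value)
        if parts:
            yield "\n".join(parts)


def reassemble_tool_arguments(raw: str) -> Dict[int, str]:
    """Concatenate streamed tool-call argument fragments, keyed by call index."""
    arguments: Dict[int, str] = {}
    for payload in sse_data_payloads(raw):
        if payload == "[DONE]":  # OpenAI's stream terminator
            break
        event = json.loads(payload)
        for choice in event.get("choices", []):
            delta = choice.get("delta") or {}
            for call in delta.get("tool_calls") or []:
                index = call.get("index", 0)
                fragment = (call.get("function") or {}).get("arguments") or ""
                arguments[index] = arguments.get(index, "") + fragment
    return arguments
```

If the stream was cut at max_tokens, the reassembled string for a call is exactly the kind of unterminated prefix shown earlier, which is the text a repair engine would need to close before the terminator is sent.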

| Env var | Default | Purpose |
| --- | --- | --- |
| SUTURE_LISTEN | 127.0.0.1:8787 | listen address |
| SUTURE_OPENAI_BASE | https://api.openai.com | OpenAI upstream |
| SUTURE_ANTHROPIC_BASE | https://api.anthropic.com | Anthropic upstream |
| SUTURE_VERTEX_ENABLED | 0 | enable the Vertex route (host derived from the path) |
| SUTURE_VERTEX_BASE | (none) | optional Vertex upstream override |
| SUTURE_BEDROCK_ENABLED | 0 | enable the Bedrock route (host from the validated Host header) |
| SUTURE_BEDROCK_BASE | (none) | optional Bedrock upstream override |
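
For launching the proxy from a script, a small helper that builds the process environment and rejects misspelled variable names can be handy. This is a sketch only: suture_env is a hypothetical helper, the variable names and defaults are copied from the table above, and the value "1" for the enable flags is an assumption inferred from the documented default of 0.

```python
import os
from typing import Dict, Mapping, Optional

# Names and defaults as documented; None means no default is listed.
SUTURE_DEFAULTS: Dict[str, Optional[str]] = {
    "SUTURE_LISTEN": "127.0.0.1:8787",
    "SUTURE_OPENAI_BASE": "https://api.openai.com",
    "SUTURE_ANTHROPIC_BASE": "https://api.anthropic.com",
    "SUTURE_VERTEX_ENABLED": "0",
    "SUTURE_VERTEX_BASE": None,
    "SUTURE_BEDROCK_ENABLED": "0",
    "SUTURE_BEDROCK_BASE": None,
}


def suture_env(
    overrides: Mapping[str, str],
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the base environment with Suture settings applied."""
    unknown = sorted(set(overrides) - set(SUTURE_DEFAULTS))
    if unknown:
        raise ValueError("unknown Suture setting(s): " + ", ".join(unknown))
    env = dict(os.environ if base is None else base)
    env.update(overrides)
    return env


# Example launch (not run here; "1" as the enabling value is an assumption):
#   import subprocess
#   subprocess.Popen(["suture"], env=suture_env({
#       "SUTURE_LISTEN": "127.0.0.1:9000",
#       "SUTURE_VERTEX_ENABLED": "1",
#   }))
```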

See deploy/ for a Dockerfile and Cloud Run, ECS/Fargate, and Kubernetes-sidecar manifests, plus operational notes (don't buffer the stream, TLS at the edge, health checks). The sidecar pattern (co-located, localhost) best matches the low-latency design.

OpenAI, Anthropic, GCP Vertex AI, and AWS Bedrock (ConverseStream) are supported, with transparent compression handling. Bedrock uses credential-free SigV4 passthrough: the client signs for the real Bedrock host and Suture forwards verbatim, so Suture never sees a reusable AWS secret (the secret key never leaves the client; only a per-request signature transits).

Dual-licensed under either of MIT or Apache-2.0, at your option. Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you shall be dual-licensed as above, without any additional terms.
