Show HN: KV-psi, using Linux PSI to trim an LLM KV cache

Source: github.com · Published: 2026-06-27T22:50Z · Topic: large-language-models

A developer released KV-psi, a reference implementation that uses Linux Pressure Stall Information (PSI) to trim an LLM KV cache under memory pressure. In the NVIDIA Jetson benchmark reported below, the PSI variant finished with a final KV value of 1291 and 1004 versus 1547 for the fixed policy (roughly 17% and 35% smaller). Decode throughput ranged from about 5.5% lower to about 7% higher than the fixed policy, depending on which variant ran first.

PSI KV Governor is a small reference implementation for using Linux Pressure Stall Information to trim an LLM KV cache when the system is under memory pressure.

Requirements:

Linux with PSI enabled (cgroup memory.pressure or /proc/pressure/memory)
Python 3.10+
llama.cpp build dependencies for the runner
a GGUF model, for example models/SmolLM2-135M-Instruct-Q2_K.gguf

Check PSI:

cat /proc/pressure/memory
PYTHONPATH=src python benchmarks/pressure_bench.py --preflight-only
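
For readers who want to inspect the same signal from Python, here is a minimal sketch of parsing the kernel's PSI text format. The function name parse_psi and the sample numbers are illustrative assumptions; this is not code from the KV-psi repository.

```python
def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI text into {"some": {"avg10": ..., ...}, "full": {...}}."""
    result: dict[str, dict[str, float]] = {}
    for line in text.strip().splitlines():
        # Each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
        kind, *fields = line.split()
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


# Sample in the kernel's PSI format; the numbers are made up for illustration.
SAMPLE = (
    "some avg10=1.50 avg60=0.40 avg300=0.10 total=123456\n"
    "full avg10=0.75 avg60=0.20 avg300=0.05 total=65432\n"
)

psi = parse_psi(SAMPLE)
print(psi["some"]["avg10"], psi["full"]["avg10"])
```

On a PSI-enabled system the same function can be fed the live file, for example parse_psi(open("/proc/pressure/memory").read()).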

Run the reference simulator:

PYTHONPATH=src python -m psi_kv_governor.cli simulate

Build the llama.cpp runner:

scripts/build_llama_runner.sh

Download the small benchmark model if needed:

python scripts/download_demo_model.py

Run both variant orders. This matters because PSI avg10, cache, and zram/swap state can carry over from the first pressure run into the second.

PYTHONPATH=src python benchmarks/pressure_bench.py \
  -c 2048 \
  -n 1536 \
  --keep 64 \
  --tail 256 \
  --min-prune 64 \
  --pressure-mib 6000 \
  --pressure-step-mib 1024 \
  --pressure-warmup-s 10 \
  --variant-cooldown-s 45 \
  --out-dir data/bench-pressure/fixed-first

PYTHONPATH=src python benchmarks/pressure_bench.py \
  --variant-order psi-first \
  -c 2048 \
  -n 1536 \
  --keep 64 \
  --tail 256 \
  --min-prune 64 \
  --pressure-mib 6000 \
  --pressure-step-mib 1024 \
  --pressure-warmup-s 10 \
  --variant-cooldown-s 45 \
  --out-dir data/bench-pressure/psi-first
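
The article does not define --keep, --tail, or --min-prune. The sketch below shows one plausible shape for a pressure-triggered trim, offered purely as an assumption: retain a prefix of keep positions and the most recent tail positions, drop the middle, and skip the prune if fewer than min_prune positions would be freed. The function names, the threshold value, and this reading of the three numbers are all hypothetical and may not match what KV-psi actually does.

```python
from typing import Optional


def plan_prune(n_cached: int, keep: int, tail: int, min_prune: int) -> Optional[tuple[int, int]]:
    """Return the half-open [start, end) range of cache positions to drop, or None."""
    start = keep              # assumed: the first `keep` positions are always retained
    end = n_cached - tail     # assumed: the last `tail` positions are always retained
    if end - start < min_prune:
        return None           # too little to free, so skip this prune
    return start, end


def maybe_prune(
    n_cached: int,
    psi_some_avg10: float,
    threshold: float,         # hypothetical trigger level, not taken from the source
    keep: int = 64,
    tail: int = 256,
    min_prune: int = 64,
) -> Optional[tuple[int, int]]:
    """Plan a prune only when the PSI 'some' avg10 reading reaches the threshold."""
    if psi_some_avg10 < threshold:
        return None
    return plan_prune(n_cached, keep, tail, min_prune)


print(maybe_prune(1000, psi_some_avg10=12.0, threshold=10.0))  # (64, 744)
print(maybe_prune(1000, psi_some_avg10=2.0, threshold=10.0))   # None: below threshold
print(maybe_prune(350, psi_some_avg10=12.0, threshold=10.0))   # None: only 30 positions free
```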

Recent Jetson result:

run          variant  decoded  tok/s  prunes  final KV  external PSI some/full
fixed-first  fixed    1536     94.00  0       1547      1.61/1.61
fixed-first  PSI      1536     88.80  4       1291      4.14/3.94
psi-first    PSI      1536     96.16  2       1004      2.46/2.33
psi-first    fixed    1536     89.76  0       1547      5.56/5.56
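
To make the comparison concrete, the following sketch computes the relative change of the PSI variant against the fixed variant within each run, using only the numbers in the table above. The variable and function names are illustrative.

```python
FIXED_FINAL_KV = 1547  # final KV for the fixed variant in both runs

# PSI-variant final KV and tok/s, plus the fixed-variant tok/s from the same run.
rows = {
    "fixed-first": {"psi_kv": 1291, "psi_tps": 88.80, "fixed_tps": 94.00},
    "psi-first": {"psi_kv": 1004, "psi_tps": 96.16, "fixed_tps": 89.76},
}


def pct_change(new: float, base: float) -> float:
    """Percentage change of `new` relative to `base`."""
    return 100.0 * (new - base) / base


for run, r in rows.items():
    kv = pct_change(r["psi_kv"], FIXED_FINAL_KV)
    tps = pct_change(r["psi_tps"], r["fixed_tps"])
    print(f"{run}: final KV {kv:+.1f}%, tok/s {tps:+.1f}%")
# fixed-first: final KV -16.5%, tok/s -5.5%
# psi-first: final KV -35.1%, tok/s +7.1%
```

In both runs the variant that ran first posted the higher tok/s, which is consistent with the carry-over caveat about run order.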

Result directories:

data/bench-pressure/real-psi-6000m-1536tok-cooldown

data/bench-pressure/real-psi-6000m-1536tok-cooldown-psi-first

source & further reading

Read original on github.com → github.com/infiniteregrets/kv-psi
