Source: github.com

Show HN: KV-psi, using Linux PSI to trim an LLM KV cache

A developer released KV-psi, a reference implementation that uses Linux Pressure Stall Information (PSI) to trim an LLM KV cache under memory pressure. In benchmarks on an NVIDIA Jetson, PSI-based trimming ended with a final KV of 1291 and 1004 versus 1547 for a fixed cache policy, roughly 17% and 35% smaller. Throughput was comparable: in both run orders, whichever variant ran first decoded about 6% to 7% faster, regardless of policy.

Published Jun 27, 2026

PSI KV Governor is a small reference implementation for using Linux Pressure Stall Information to trim an LLM KV cache when the system is under memory pressure.

  • Linux with PSI enabled: cgroup memory.pressure or /proc/pressure/memory

  • Python 3.10+
  • llama.cpp build dependencies for the runner
  • a GGUF model, for example models/SmolLM2-135M-Instruct-Q2_K.gguf

Check PSI:

cat /proc/pressure/memory
PYTHONPATH=src python benchmarks/pressure_bench.py --preflight-only
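
For readers who want to inspect the PSI values programmatically, the following is a minimal sketch of parsing the text that the kernel exposes in /proc/pressure/memory (one `some` line and one `full` line, each with `avg10`, `avg60`, `avg300`, and `total` fields). The function name `parse_psi` and the sample numbers are illustrative assumptions, not part of the project.

```python
def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI text (e.g. the contents of /proc/pressure/memory).

    Hypothetical helper for illustration; not taken from the KV-psi source.
    Returns a mapping such as {"some": {"avg10": ..., ...}, "full": {...}}.
    """
    result: dict[str, dict[str, float]] = {}
    for line in text.strip().splitlines():
        # Each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


# Illustrative sample text, not a measurement from the benchmark.
sample = (
    "some avg10=1.50 avg60=0.40 avg300=0.10 total=123456\n"
    "full avg10=0.75 avg60=0.20 avg300=0.05 total=65432\n"
)

psi = parse_psi(sample)
print(psi["some"]["avg10"], psi["full"]["avg10"])
```

On a live system the same function could be fed the result of `open("/proc/pressure/memory").read()`, assuming PSI is enabled.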

Run the reference simulator:

PYTHONPATH=src python -m psi_kv_governor.cli simulate

Build the llama.cpp runner:

scripts/build_llama_runner.sh

Download the small benchmark model if needed:

python scripts/download_demo_model.py

Run both variant orders. This matters because PSI avg10, cache, and zram/swap state can carry over from the first pressure run into the second.

PYTHONPATH=src python benchmarks/pressure_bench.py \
  -c 2048 \
  -n 1536 \
  --keep 64 \
  --tail 256 \
  --min-prune 64 \
  --pressure-mib 6000 \
  --pressure-step-mib 1024 \
  --pressure-warmup-s 10 \
  --variant-cooldown-s 45 \
  --out-dir data/bench-pressure/fixed-first

PYTHONPATH=src python benchmarks/pressure_bench.py \
  --variant-order psi-first \
  -c 2048 \
  -n 1536 \
  --keep 64 \
  --tail 256 \
  --min-prune 64 \
  --pressure-mib 6000 \
  --pressure-step-mib 1024 \
  --pressure-warmup-s 10 \
  --variant-cooldown-s 45 \
  --out-dir data/bench-pressure/psi-first
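
To make the trimming flags easier to reason about, here is a rough sketch of one way a pressure-gated prune decision could work. It assumes that `--keep` protects a prefix of the cache, `--tail` protects the most recent tokens, and `--min-prune` is the smallest span worth removing; those meanings, the `threshold` parameter, and both function names are guesses for illustration. The source does not describe the actual algorithm, and this sketch drops the whole unprotected span at once, which may not match the runner's behaviour.

```python
from typing import Optional, Tuple


def plan_prune(
    n_tokens: int, keep: int = 64, tail: int = 256, min_prune: int = 64
) -> Optional[Tuple[int, int]]:
    """Return a half-open range [start, end) of cache positions to drop.

    Hypothetical policy: protect the first `keep` and last `tail` positions
    and drop everything in between, but only if that span has at least
    `min_prune` positions. Returns None when nothing should be pruned.
    """
    start = keep
    end = n_tokens - tail
    if end - start < min_prune:
        return None
    return (start, end)


def governor_step(
    n_tokens: int, some_avg10: float, threshold: float = 1.0
) -> Optional[Tuple[int, int]]:
    """Prune only while memory pressure is at or above an assumed threshold."""
    if some_avg10 < threshold:
        return None
    return plan_prune(n_tokens)


print(governor_step(1000, some_avg10=2.5))  # pressure high, span large enough
print(governor_step(1000, some_avg10=0.2))  # pressure low, nothing pruned
print(governor_step(350, some_avg10=2.5))   # span too small, nothing pruned
```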

Recent Jetson result:

run          variant  decoded  tok/s  prunes  final KV  external PSI some/full
fixed-first  fixed    1536     94.00  0       1547      1.61/1.61
fixed-first  PSI      1536     88.80  4       1291      4.14/3.94
psi-first    PSI      1536     96.16  2       1004      2.46/2.33
psi-first    fixed    1536     89.76  0       1547      5.56/5.56
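
The relative KV savings can be checked directly from the final KV column. The short calculation below uses only the numbers in the table; the variable and function names are illustrative.

```python
# Final KV values copied from the Jetson result table.
FIXED_FINAL_KV = 1547
psi_final_kv = {"fixed-first": 1291, "psi-first": 1004}


def reduction_pct(baseline: int, trimmed: int) -> float:
    """Percentage by which `trimmed` is smaller than `baseline`."""
    return 100.0 * (baseline - trimmed) / baseline


for run, kv in psi_final_kv.items():
    print(f"{run}: final KV {reduction_pct(FIXED_FINAL_KV, kv):.1f}% smaller than fixed")
# fixed-first: final KV 16.5% smaller than fixed
# psi-first: final KV 35.1% smaller than fixed
```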

Result directories:

data/bench-pressure/real-psi-6000m-1536tok-cooldown

data/bench-pressure/real-psi-6000m-1536tok-cooldown-psi-first
