Show HN: KV-psi, using Linux PSI to trim an LLM KV cache

A developer released KV-psi, a reference implementation that uses Linux Pressure Stall Information (PSI) to trim an LLM KV cache under memory pressure. In benchmarks on an NVIDIA Jetson, PSI-based trimming reduced the final KV cache size by up to 35% compared to a fixed cache policy (final KV of 1004 vs. 1547), with throughput in a similar range (88.80 to 96.16 tok/s vs. 89.76 to 94.00 tok/s).

PSI KV Governor is a small reference implementation for using Linux Pressure Stall Information to trim an LLM KV cache when the system is under memory pressure.

Requirements:

- Linux with PSI enabled: cgroup `memory.pressure` or `/proc/pressure/memory`
- Python 3.10+
- llama.cpp build dependencies for the runner
- a GGUF model, for example `models/SmolLM2-135M-Instruct-Q2_K.gguf`

Check PSI:

```
cat /proc/pressure/memory
PYTHONPATH=src python benchmarks/pressure_bench.py --preflight-only
```

Run the reference simulator:

```
PYTHONPATH=src python -m psi_kv_governor.cli simulate
```

Build the llama.cpp runner:

```
scripts/build_llama_runner.sh
```

Download the small benchmark model if needed:

```
python scripts/download_demo_model.py
```

Run both variant orders. This matters because PSI `avg10`, cache, and zram/swap state can carry over from the first pressure run into the second.
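To make the `avg10` figure concrete, here is a minimal sketch of parsing the text that `/proc/pressure/memory` exposes. It is not taken from the KV-psi source: `parse_psi`, `is_settled`, the 1.0 limit, and the sample values are hypothetical, chosen only for illustration.

```python
def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI text into {"some": {...}, "full": {...}}.

    Each line starts with "some" or "full", followed by key=value fields:
    avg10, avg60, avg300 (percent of time stalled) and total (microseconds).
    """
    result: dict[str, dict[str, float]] = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, fields = parts[0], parts[1:]
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def is_settled(psi: dict[str, dict[str, float]], limit: float = 1.0) -> bool:
    """Hypothetical check: has the 10-second 'some' average dropped below limit?"""
    return psi["some"]["avg10"] < limit


# Made-up sample in the kernel's PSI format (not a measurement from the benchmark).
sample = (
    "some avg10=1.50 avg60=0.40 avg300=0.10 total=250000\n"
    "full avg10=1.25 avg60=0.30 avg300=0.05 total=200000\n"
)

psi = parse_psi(sample)
print(psi["some"]["avg10"], psi["full"]["avg10"], is_settled(psi))
```

On a PSI-enabled system, the same parser could be fed `open("/proc/pressure/memory").read()`. Polling a check like `is_settled` between variants is one way to see whether pressure from the first run has decayed before the second starts.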
```
PYTHONPATH=src python benchmarks/pressure_bench.py \
  -c 2048 \
  -n 1536 \
  --keep 64 \
  --tail 256 \
  --min-prune 64 \
  --pressure-mib 6000 \
  --pressure-step-mib 1024 \
  --pressure-warmup-s 10 \
  --variant-cooldown-s 45 \
  --out-dir data/bench-pressure/fixed-first

PYTHONPATH=src python benchmarks/pressure_bench.py \
  --variant-order psi-first \
  -c 2048 \
  -n 1536 \
  --keep 64 \
  --tail 256 \
  --min-prune 64 \
  --pressure-mib 6000 \
  --pressure-step-mib 1024 \
  --pressure-warmup-s 10 \
  --variant-cooldown-s 45 \
  --out-dir data/bench-pressure/psi-first
```

Recent Jetson result:

| run | variant | decoded | tok/s | prunes | final KV | external PSI some/full |
|---|---|---|---|---|---|---|
| fixed-first | fixed | 1536 | 94.00 | 0 | 1547 | 1.61/1.61 |
| fixed-first | PSI | 1536 | 88.80 | 4 | 1291 | 4.14/3.94 |
| psi-first | PSI | 1536 | 96.16 | 2 | 1004 | 2.46/2.33 |
| psi-first | fixed | 1536 | 89.76 | 0 | 1547 | 5.56/5.56 |

Result directories:

- `data/bench-pressure/real-psi-6000m-1536tok-cooldown`
- `data/bench-pressure/real-psi-6000m-1536tok-cooldown-psi-first`
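The relative differences in the Jetson table can be worked out directly. The sketch below copies the tok/s and final KV columns from the table and compares the PSI variant against the fixed variant within each run. The `compare` helper is hypothetical and not part of the repository.

```python
# Rows copied from the Jetson table: (run, variant, tok/s, final KV).
rows = [
    ("fixed-first", "fixed", 94.00, 1547),
    ("fixed-first", "PSI", 88.80, 1291),
    ("psi-first", "PSI", 96.16, 1004),
    ("psi-first", "fixed", 89.76, 1547),
]


def compare(run: str) -> tuple[float, float]:
    """Return (KV reduction %, throughput change %) of PSI vs. fixed for one run."""
    by_variant = {variant: (tps, kv) for r, variant, tps, kv in rows if r == run}
    fixed_tps, fixed_kv = by_variant["fixed"]
    psi_tps, psi_kv = by_variant["PSI"]
    kv_reduction = 100.0 * (fixed_kv - psi_kv) / fixed_kv
    tps_change = 100.0 * (psi_tps - fixed_tps) / fixed_tps
    return kv_reduction, tps_change


for run in ("fixed-first", "psi-first"):
    kv_reduction, tps_change = compare(run)
    print(f"{run}: final KV {kv_reduction:.1f}% smaller, throughput {tps_change:+.1f}%")
# fixed-first: final KV 16.5% smaller, throughput -5.5%
# psi-first: final KV 35.1% smaller, throughput +7.1%
```

In both runs the variant that ran second was the slower one and saw the higher external PSI. That fits the carry-over concern, and it suggests the throughput difference may reflect run order more than the trimming policy.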