[ARTICLE · art-41862] src=dev.to ↗ pub= topic=large-language-models verified=true sentiment=↑ positive

Resurrecting Kepler: Getting Modern LLMs Running on a GTX 770 (Kernel 7.x)

An engineer patched NVIDIA's proprietary driver to run modern LLM inference on a GeForce GTX 770 (Kepler) GPU with a Linux 7.x kernel. The fix involves a five-byte binary patch to libcuda.so that bypasses a buffer-size negotiation bug causing cuInit to fail with error 802. The project demonstrates that Kepler GPUs, abandoned by NVIDIA after driver 470.256.02, remain viable for small-to-medium LLM workloads, reducing e-waste.

4 min read · 1 view · published Jun 27, 2026

⚠️ Experimental hack: Use on non-critical systems only, and make sure you have backups. This patches a proprietary binary at the instruction level: no warranty, no support.

Kepler GPUs (2012–2014) are e-waste by NVIDIA's timeline, but they remain capable hardware for inference workloads. The GTX 770 has 1536 CUDA cores and 2 GB of GDDR5, enough for small-to-medium LLMs. This project shows that with a five-byte fix and some kernel backports, these GPUs can stay useful on modern Linux systems, reducing e-waste and teaching real systems engineering along the way.

The goal: keep an NVIDIA GeForce GTX 770 (GK104, sm_30), a Kepler GPU abandoned by NVIDIA's driver stack after driver 470.256.02 and CUDA 10.2, running CUDA workloads on a modern Linux kernel (6.15 → 7.x, Ubuntu 26.04).

Two problems made stock software a dead end: the driver's kernel module no longer builds, and cuInit returns error 802. With the second problem, nvidia-smi works, yet every CUDA program fails with CUDA_ERROR_SYSTEM_NOT_READY ("system not yet initialized").

The proprietary 470.256.02 driver source does not build against kernels ≥ 6.15 because of removed or renamed APIs. I used community-sourced patch sets (primarily from Fedora/Debian packaging by Joan Bruguera Mico and Andreas Beckmann) to resolve issues like:

- screen_info → sysfb_primary_display.screen
- del_timer_sync → timer_delete_sync
- follow_pfn → unsafe_follow_pfn
- dma_fence_signal now returns void
- efi_enabled cast and UBSAN mismatches

After these backports, nvidia-smi reports the GTX 770 correctly. But cuInit still fails.
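
Before applying patch sets, it can help to see which of the old symbols a source tree still references. The sketch below is a hypothetical helper, not part of the author's tooling. It scans text for the three renamed symbols listed above, and it treats each as a plain rename, which is a simplification (screen_info, for instance, moved into a struct member).

```python
import re

# Old -> new kernel symbols, taken from the list above. Treating each as a
# plain rename is a simplification made for illustration only.
RENAMED_APIS = {
    "screen_info": "sysfb_primary_display.screen",
    "del_timer_sync": "timer_delete_sync",
    "follow_pfn": "unsafe_follow_pfn",
}


def find_stale_apis(source_text):
    """Return {old_name: [line numbers]} for every old symbol still in use."""
    hits = {}
    for lineno, line in enumerate(source_text.splitlines(), start=1):
        for old in RENAMED_APIS:
            # \b keeps follow_pfn from matching inside unsafe_follow_pfn.
            if re.search(rf"\b{re.escape(old)}\b", line):
                hits.setdefault(old, []).append(lineno)
    return hits


# Made-up driver snippet used only to exercise the helper.
SAMPLE = """\
del_timer_sync(&t);
ret = unsafe_follow_pfn(vma, addr, &pfn);
base = screen_info.lfb_base;
"""

for old, lines in find_stale_apis(SAMPLE).items():
    print(f"{old} (lines {lines}) -> {RENAMED_APIS[old]}")
```

In practice you would read each .c and .h file of the driver source and pass its contents to find_stale_apis. Changes such as dma_fence_signal returning void are not renames and need manual review.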

cuInit Error 802

All rm_ioctl kernel calls return NV_OK, so the kernel module is fine. The failure lives in userspace. With gdb, I traced cuInit calling rm_ioctl(0x2a) twice; both calls succeed at the kernel level, yet the library still returns 802.

Disassembly of the RM response handler in libcuda.so.470.256.02:

3436a0: mov   0xc(%rsp),%eax      ; load status from RM response
3436a4: cmp   $0x2,%eax           ; status == 2?
3436a7: je    3436f0              ; β†’ return 802
3436a9: jbe   3436e0              ; status <= 1?
3436e0: cmp   $0x1,%eax
3436e3: jne   3436c5              ; status != 1 β†’ return 999
3436e5: xor   %eax,%eax           ; return 0 (success)
...
3436f0: add   $0x18,%rsp
3436f4: mov   $0x322,%eax         ; return 802
3436f9: pop; ret

Root cause: The Resource Manager firmware on Kepler returns status code 2 (NV_ERR_BUFFER_TOO_SMALL) for the second initialization rm_ioctl. The library treats 1 and 4 as success, but 2 is fatal and maps to 802. This is likely a buffer-size negotiation mismatch between the GTX 770's VBIOS firmware and the final 470.x userspace library. NVIDIA presumably never fixed it because Kepler was already on legacy support.
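
The decision logic described above can be modelled in a few lines. This is a sketch reconstructed from the disassembly excerpt and the prose, not NVIDIA's code. The function name is hypothetical, and mapping every other status to 999 is an assumption, since the excerpt elides those paths. The `patched` flag models replacing the 802 return with 0.

```python
CUDA_SUCCESS = 0
CUDA_ERROR_SYSTEM_NOT_READY = 802  # 0x322, the value loaded at 3436f4
CUDA_ERROR_UNKNOWN = 999


def rm_status_to_cuda(status, patched=False):
    """Model how the RM response status becomes cuInit's return code."""
    if status in (1, 4):
        # Per the text, the library treats these as success.
        return CUDA_SUCCESS
    if status == 2:
        # NV_ERR_BUFFER_TOO_SMALL: fatal in the stock library.
        return CUDA_SUCCESS if patched else CUDA_ERROR_SYSTEM_NOT_READY
    # Assumption: any other status falls through to the generic 999 path.
    return CUDA_ERROR_UNKNOWN


print(rm_status_to_cuda(2))                # stock behaviour on the GTX 770
print(rm_status_to_cuda(2, patched=True))  # behaviour with the 802 path neutralised
```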

The fix: One instruction at offset 0x3436f4. Instead of mov $0x322, %eax (return 802), return 0:

        Bytes           Instruction
Before  b8 22 03 00 00  mov $0x322, %eax
After   31 c0 90 90 90  xor %eax, %eax; nop; nop; nop
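
As a sanity check on the byte values, the short sketch below decodes the original instruction and confirms that the replacement has the same length. The helper name is hypothetical. The encoding facts are standard x86: opcode 0xb8 is mov imm32 to %eax with a little-endian immediate, 31 c0 is xor %eax, %eax, and 90 is nop.

```python
import struct

BEFORE = bytes.fromhex("b822030000")  # mov $0x322, %eax
AFTER = bytes.fromhex("31c0909090")   # xor %eax, %eax; nop; nop; nop


def mov_eax_imm32(code):
    """Decode a 5-byte `mov $imm32, %eax` and return the immediate."""
    if len(code) != 5 or code[0] != 0xB8:
        raise ValueError("not a 5-byte mov $imm32, %eax")
    (imm,) = struct.unpack("<I", code[1:5])  # x86 immediates are little-endian
    return imm


print(mov_eax_imm32(BEFORE))      # the error code the original path returns
print(len(BEFORE) == len(AFTER))  # equal length, so no later instruction shifts
```

Padding with nop keeps the patch at exactly five bytes, which is why nothing else in the function needs to move.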

Subsequent rm_ioctl calls succeed; only this specific init ioctl is broken. Patch script:

#!/usr/bin/env python3
"""Patch libcuda.so so the failing init path returns 0 instead of 802."""
import os
import shutil
import sys

libpath = "/usr/lib/x86_64-linux-gnu/libcuda.so.470.256.02"
backup_path = libpath + ".bak"

offset = 0x3436f4
expected = bytes([0xb8, 0x22, 0x03, 0x00, 0x00])  # mov $0x322, %eax
patched = bytes([0x31, 0xc0, 0x90, 0x90, 0x90])   # xor %eax, %eax; nop x3

with open(libpath, "rb") as f:
    data = bytearray(f.read())

actual = bytes(data[offset:offset + 5])

if actual == patched:
    print("Already patched!")
    sys.exit(0)
if actual != expected:
    # Different build or wrong offset: refuse to touch the file.
    print(f"UNEXPECTED at 0x{offset:x}: {actual.hex()}")
    sys.exit(1)

# Keep a pristine copy before the first modification.
if not os.path.exists(backup_path):
    shutil.copy2(libpath, backup_path)

data[offset:offset + 5] = patched
with open(libpath, "wb") as f:
    f.write(data)
print(f"Patched: {actual.hex()} -> {patched.hex()}")
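
Because the script keeps a .bak copy next to the library, rolling back is a single file copy. The helper below is a hypothetical companion, not part of the original write-up, and assumes the same .bak naming as the patch script.

```python
import os
import shutil


def restore_backup(lib_path, backup_suffix=".bak"):
    """Copy <lib_path>.bak back over lib_path. Return True if restored."""
    backup_path = lib_path + backup_suffix
    if not os.path.exists(backup_path):
        # Nothing to restore from; leave the library untouched.
        return False
    shutil.copy2(backup_path, lib_path)
    return True
```

Calling restore_backup("/usr/lib/x86_64-linux-gnu/libcuda.so.470.256.02") with root privileges would put the unmodified library back in place.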

sm_30 support was dropped in CUDA 11, so we need CUDA 10.2's ptxas. But nvcc rejects GCC 15 (the Ubuntu 26.04 default). clang++ bridges the legacy CUDA 10.2 headers and the modern system libraries.

llama.cpp uses cg::this_grid() (CUDA 11+). I patched softmax.cu for CUDA 10.2:

// Before (CUDA >= 11.0):
const cg::grid_group g = cg::this_grid();

// After (CUDA < 11.0):
const cg::thread_block g = cg::this_thread_block();

Build flags:

cmake .. -DLLAMA_CUDA=ON \
  -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
  -DCUDAToolkit_ROOT=/usr/local/cuda-10.2 \
  -DCMAKE_CUDA_COMPILER=clang++ \
  -DCMAKE_CUDA_ARCHITECTURES=30 \
  -DGGML_CUDA_GRAPHS=OFF

-DGGML_CUDA_GRAPHS=OFF is critical: CUDA graph capture requires sm_35+ and crashes on sm_30.

Hardware: GTX 770 (2 GB VRAM), Ubuntu 26.04, kernel 7.0.0-27, llama.cpp c16c35b81.

Quant   Test   t/s
Q4_K_M  pp64   69.50 ± 0.95
Q4_K_M  tg512  25.84 ± 0.20

Quant   Test   t/s
Q4_K_M  pp64   39.03 ± 1.09

GPU offload gives a ~1.8× speedup on prompt processing for this model (69.50 t/s in the first table vs. 39.03 t/s in the second, both pp64).
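
The speedup figure can be reproduced from the two pp64 rows. This is a small worked calculation under two assumptions: the first table is the GPU-offload run and the second is the baseline, which the surviving text implies but does not label, and the two ± values are independent standard errors.

```python
import math

gpu_pp64, gpu_err = 69.50, 0.95  # pp64 row of the first table (assumed GPU offload)
cpu_pp64, cpu_err = 39.03, 1.09  # pp64 row of the second table (assumed baseline)

speedup = gpu_pp64 / cpu_pp64

# Relative errors add in quadrature for a ratio of independent measurements.
rel_err = math.sqrt((gpu_err / gpu_pp64) ** 2 + (cpu_err / cpu_pp64) ** 2)

print(f"speedup = {speedup:.2f} +/- {speedup * rel_err:.2f}")
```

The ratio comes out at roughly 1.78, consistent with the rounded ~1.8× quoted in the text.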

Quant   Test   t/s
Q3_K_M  pp64   36.18 ± 0.33
Q3_K_M  tg256  10.11 ± 0.11

Qwen 3B at Q4_K_M (1.95 GiB) does not fit in 2 GB of VRAM, so Q3_K_M (1.60 GiB) is required for full offload.
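
A quick unit conversion shows how tight the fit is. The sketch below uses the 1998 MiB that llama-bench reports for this card and the two model sizes quoted above. It computes only the space left after the weights; how much the KV cache and compute buffers need on top is not stated in the article, so no threshold is assumed.

```python
MIB_PER_GIB = 1024
VRAM_MIB = 1998  # as reported by llama-bench --list-devices


def headroom_mib(model_gib, vram_mib=VRAM_MIB):
    """VRAM left over (in MiB) after loading weights of the given size."""
    return vram_mib - model_gib * MIB_PER_GIB


for name, size_gib in [("Q4_K_M", 1.95), ("Q3_K_M", 1.60)]:
    print(f"{name}: {headroom_mib(size_gib):.1f} MiB left after weights")
```

At Q4_K_M the weights alone leave about 1 MiB free, which explains why full offload is not possible, while Q3_K_M leaves roughly 360 MiB for everything else.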

$ nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 770 (UUID: GPU-3a93c548-...)

$ /tmp/test_cuinit
cuInit=0

$ llama-bench --list-devices
CUDA0: NVIDIA GeForce GTX 770 (1998 MiB, ...)
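
The source of /tmp/test_cuinit is not shown, so here is an equivalent minimal check written with ctypes. It is a sketch, not the author's test program. It assumes the driver library is reachable under its usual soname libcuda.so.1 and calls the real driver API entry point cuInit(0). The function name try_cuinit is made up for this example.

```python
import ctypes


def try_cuinit(lib_name="libcuda.so.1"):
    """Return cuInit(0)'s result code, or None if the library cannot be loaded."""
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError:
        return None
    libcuda.cuInit.argtypes = [ctypes.c_uint]  # CUresult cuInit(unsigned int Flags)
    libcuda.cuInit.restype = ctypes.c_int
    return libcuda.cuInit(0)


print(f"cuInit={try_cuinit()}")
```

A result of 0 corresponds to the cuInit=0 line above; 802 would indicate the unpatched failure path.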

Full working stack: kernel module → patched libcuda.so → CUDA 10.2 runtime → llama.cpp CUDA backend, all on Linux 7.x with a 2013 Kepler GPU.

Register the patched driver with DKMS so module rebuilds happen automatically:

sudo apt install dkms
sudo dkms add nvidia/470.256.02
sudo dkms build nvidia/470.256.02 -k $(uname -r)
sudo dkms install nvidia/470.256.02 -k $(uname -r)

For the complete debugging log, kernel patch table, patch scripts, and build instructions, see the GitHub Gist.
