Resurrecting Kepler: Getting Modern LLMs Running on a GTX 770 (Kernel 7.x)


Source: dev.to · published 2026-06-27T13:29Z · topic: large-language-models


An engineer patched NVIDIA's proprietary driver to run modern LLM inference on a GeForce GTX 770 (Kepler) GPU with a Linux 7.x kernel. The fix involves a five-byte binary patch to libcuda.so that bypasses a buffer-size negotiation bug causing cuInit to fail with error 802. The project demonstrates that Kepler GPUs, abandoned by NVIDIA after driver 470.256.02, remain viable for small-to-medium LLM workloads, reducing e-waste.

4 min read · published Jun 27, 2026

⚠️ Experimental hack: Use on non-critical systems. Ensure you have backups. This patches a proprietary binary at the instruction level — no warranty, no support.

Kepler GPUs (2012–2014) are e-waste by NVIDIA's timeline, but they are perfectly capable hardware for inference workloads. The GTX 770 has 1536 CUDA cores and 2 GB GDDR5 — enough for small-to-medium LLMs. This project proves that with a five-byte fix and some kernel backports, these GPUs can be kept useful on modern Linux systems, reducing e-waste and teaching real systems engineering along the way.

Goal: keep an NVIDIA GeForce GTX 770 (GK104, sm_30), a Kepler GPU whose last supported software is driver 470.256.02 and CUDA 10.2, running CUDA workloads on a modern Linux kernel (6.15 → 7.x, Ubuntu 26.04).

Two problems made stock software a dead end:

1. cuInit returns error 802: nvidia-smi works, but every CUDA program fails with CUDA_ERROR_SYSTEM_NOT_READY ("system not yet initialized").
2. The proprietary 470.256.02 driver source does not build against kernels ≥6.15 due to removed or renamed APIs.

I used community-sourced patch sets (primarily from Fedora/Debian packaging by Joan Bruguera Mico and Andreas Beckmann) to resolve issues like:

screen_info → sysfb_primary_display.screen
del_timer_sync → timer_delete_sync
follow_pfn → unsafe_follow_pfn
dma_fence_signal now returns void
efi_enabled cast and UBSAN mismatches

After these backports, nvidia-smi reports the GTX 770 correctly. But cuInit still fails.
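As a rough illustration of how the renames listed above might be located in a driver source tree before patching, here is a minimal Python sketch. It is not the community patch set: the `RENAMES` table only repeats the three pure renames from the list, `find_removed_apis` is a hypothetical helper, and the real patches also have to deal with changed signatures and return types, which a text search cannot do.

```python
import re

# Old name -> replacement, copied from the rename list in the article.
RENAMES = {
    "screen_info": "sysfb_primary_display.screen",
    "del_timer_sync": "timer_delete_sync",
    "follow_pfn": "unsafe_follow_pfn",
}


def find_removed_apis(source: str):
    """Yield (line_number, old_name, new_name) for each old identifier found."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        for old, new in RENAMES.items():
            # \b keeps follow_pfn from matching inside unsafe_follow_pfn.
            if re.search(rf"\b{re.escape(old)}\b", line):
                yield lineno, old, new


if __name__ == "__main__":
    sample = "del_timer_sync(&timer);\nret = unsafe_follow_pfn(vma, addr, &pfn);\n"
    for hit in find_removed_apis(sample):
        print(hit)
```

Only the first sample line is reported; the second already uses the newer name.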

cuInit Error 802

All rm_ioctl kernel calls return NV_OK, so the kernel module is fine. The failure lives in userspace. With gdb, I traced cuInit calling rm_ioctl(0x2a) twice; both calls succeed at the kernel level, yet the library still returns 802.

Disassembly of the RM response handler in libcuda.so.470.256.02:

3436a0: mov   0xc(%rsp),%eax      ; load status from RM response
3436a4: cmp   $0x2,%eax           ; status == 2?
3436a7: je    3436f0              ; → return 802
3436a9: jbe   3436e0              ; status <= 1?
3436e0: cmp   $0x1,%eax
3436e3: jne   3436c5              ; status != 1 → return 999
3436e5: xor   %eax,%eax           ; return 0 (success)
...
3436f0: add   $0x18,%rsp
3436f4: mov   $0x322,%eax         ; return 802
3436f9: pop; ret

Root cause: The Resource Manager firmware on Kepler returns status code 2 (NV_ERR_BUFFER_TOO_SMALL) for the second initialization rm_ioctl. The library treats 1 and 4 as success, but 2 is fatal → 802. This is likely a buffer-size negotiation mismatch between the GTX 770's VBIOS firmware and the final 470.x userspace library. NVIDIA never fixed it because Kepler was already on legacy support.
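To make the control flow easier to follow, the branches visible in the listing can be modelled in a few lines of Python. This is only a sketch of the excerpt: `handle_rm_status` and the constant names are hypothetical, and the elided instructions (where status 4 is presumably handled) are deliberately not modelled.

```python
# Toy model of the status dispatch in the disassembly excerpt.
# Only branches that appear in the listing are covered.

CUDA_SUCCESS = 0
CUDA_ERROR_802 = 0x322       # 802, loaded by `mov $0x322, %eax`
CUDA_ERROR_UNKNOWN = 999


def handle_rm_status(status: int, patched: bool = False) -> int:
    """Map an RM status code to the value the handler returns (hypothetical)."""
    if status > 2:
        # These values go through instructions elided from the listing.
        raise ValueError("status not covered by the excerpt")
    if status == 2:                      # cmp $0x2 / je 3436f0
        return CUDA_SUCCESS if patched else CUDA_ERROR_802
    if status == 1:                      # cmp $0x1 at 3436e0, no jump taken
        return CUDA_SUCCESS
    return CUDA_ERROR_UNKNOWN            # jne 3436c5


if __name__ == "__main__":
    for status in (0, 1, 2):
        print(status, handle_rm_status(status), handle_rm_status(status, patched=True))
```

The `patched=True` path previews the effect of replacing the `mov` at 3436f4: status 2 stops being fatal and everything else is unchanged.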

The fix: One instruction at offset 0x3436f4. Instead of mov $0x322, %eax (return 802), return 0:

State	Bytes	Instruction
Before	b8 22 03 00 00	mov $0x322, %eax
After	31 c0 90 90 90	xor %eax, %eax; nop; nop; nop
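The byte sequences can be sanity-checked without a disassembler. The short Python sketch below decodes the 32-bit immediate from the original instruction and confirms that the replacement has the same length, so no following instruction shifts; the variable names are illustrative only.

```python
before = bytes.fromhex("b8 22 03 00 00")   # mov $0x322, %eax
after = bytes.fromhex("31 c0 90 90 90")    # xor %eax, %eax; nop; nop; nop

# Opcode 0xb8 is `mov imm32, %eax`; the next four bytes hold the
# immediate in little-endian order.
immediate = int.from_bytes(before[1:5], "little")

print(hex(immediate), immediate)     # 0x322 802
print(len(before) == len(after))     # True
```

The three nop bytes exist only as padding: `xor %eax, %eax` is two bytes long, while the instruction it replaces is five.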

Subsequent rm_ioctl calls succeed; only this specific init ioctl is broken. Patch script:

#!/usr/bin/env python3
"""Patch libcuda.so.470.256.02 so the RM status-2 path returns 0 instead of 802."""
import os
import shutil
import sys

libpath = "/usr/lib/x86_64-linux-gnu/libcuda.so.470.256.02"
backup_path = libpath + ".bak"

offset = 0x3436f4
expected = bytes([0xb8, 0x22, 0x03, 0x00, 0x00])  # mov $0x322, %eax
patched = bytes([0x31, 0xc0, 0x90, 0x90, 0x90])   # xor %eax, %eax; nop; nop; nop

with open(libpath, "rb") as f:
    data = bytearray(f.read())

actual = bytes(data[offset:offset + 5])

if actual == patched:
    print("Already patched!")
    sys.exit(0)
if actual != expected:
    # Different build or layout: refuse to touch the file.
    print(f"UNEXPECTED at 0x{offset:x}: {actual.hex()}")
    sys.exit(1)

# Keep a pristine copy before the first modification.
if not os.path.exists(backup_path):
    shutil.copy2(libpath, backup_path)

data[offset:offset + 5] = patched
with open(libpath, "wb") as f:
    f.write(data)
print(f"Patched: {actual.hex()} -> {patched.hex()}")
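Before or after running the patch script, it can be useful to inspect a library file without writing to it, and to roll back from the `.bak` copy. The following is a minimal sketch under the same offset and byte values as the script; `patch_state` and `restore` are hypothetical helper names, not part of the original project.

```python
import shutil

OFFSET = 0x3436f4
ORIGINAL = bytes.fromhex("b822030000")   # mov $0x322, %eax
PATCHED = bytes.fromhex("31c0909090")    # xor %eax, %eax; nop; nop; nop


def patch_state(path: str, offset: int = OFFSET) -> str:
    """Classify the five bytes at `offset` without modifying the file."""
    with open(path, "rb") as f:
        f.seek(offset)
        found = f.read(len(ORIGINAL))
    if found == ORIGINAL:
        return "original"
    if found == PATCHED:
        return "patched"
    return "unknown"


def restore(path: str, backup_path: str) -> None:
    """Copy the backup made by the patch script back over the library."""
    shutil.copy2(backup_path, path)
```

A result of "unknown" suggests a different driver build, in which case the offset from this article should not be trusted.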

sm_30 support was dropped in CUDA 11, so we need CUDA 10.2's ptxas. But nvcc rejects GCC 15 (Ubuntu 26.04 default). clang++ bridges legacy CUDA 10.2 headers and modern system libraries.

llama.cpp uses cg::this_grid() (CUDA 11+). I patched softmax.cu for CUDA 10.2:

// Before (CUDA >= 11.0):
const cg::grid_group g = cg::this_grid();

// After (CUDA < 11.0):
const cg::thread_block g = cg::this_thread_block();

Build flags:

cmake .. -DLLAMA_CUDA=ON \
  -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
  -DCUDAToolkit_ROOT=/usr/local/cuda-10.2 \
  -DCMAKE_CUDA_COMPILER=clang++ \
  -DCMAKE_CUDA_ARCHITECTURES=30 \
  -DGGML_CUDA_GRAPHS=OFF

-DGGML_CUDA_GRAPHS=OFF is critical: CUDA graph capture requires sm_35+ and crashes on sm_30.

Hardware: GTX 770 (2 GB VRAM), Ubuntu 26.04, kernel 7.0.0-27, llama.cpp c16c35b81.

Quant	Test	t/s
Q4_K_M	pp64	69.50±0.95
Q4_K_M	tg512	25.84±0.20

Quant	Test	t/s
Q4_K_M	pp64	39.03±1.09

GPU offload gives ~1.8× speedup on prompt processing for this model.
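The ~1.8× figure can be reproduced from the pp64 rows of the two tables above. The sketch assumes the first table is the GPU-offloaded run and the second is the CPU-only run; the table captions did not survive in this copy of the article, so that mapping is inferred from the sentence above.

```python
gpu_pp64 = 69.50   # t/s, first table (assumed: GPU offload)
cpu_pp64 = 39.03   # t/s, second table (assumed: CPU only)

speedup = gpu_pp64 / cpu_pp64
print(f"{speedup:.2f}x")   # 1.78x
```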

Quant	Test	t/s
Q3_K_M	pp64	36.18±0.33
Q3_K_M	tg256	10.11±0.11

Qwen 3B at Q4_K_M (1.95 GiB) exceeds 2 GB VRAM, so Q3_K_M (1.60 GiB) is required for full offload.
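A quick unit conversion shows how tight the fit is. The sketch below compares each quantized model size with the 1998 MiB that llama-bench reports for this card in the article; it counts weights only and ignores the KV cache and compute buffers, which need additional memory on top. `headroom_mib` is an illustrative helper name.

```python
MIB_PER_GIB = 1024
reported_vram_mib = 1998   # from `llama-bench --list-devices` in the article


def headroom_mib(model_gib: float) -> float:
    """VRAM left after loading the weights alone, in MiB."""
    return reported_vram_mib - model_gib * MIB_PER_GIB


for quant, size_gib in (("Q4_K_M", 1.95), ("Q3_K_M", 1.60)):
    print(f"{quant}: {headroom_mib(size_gib):.1f} MiB left")
```

Q4_K_M leaves roughly 1 MiB, which is no room at all for runtime buffers, while Q3_K_M leaves about 360 MiB.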

$ nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 770 (UUID: GPU-3a93c548-...)

$ /tmp/test_cuinit
cuInit=0

$ llama-bench --list-devices
CUDA0: NVIDIA GeForce GTX 770 (1998 MiB, ...)
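The source of /tmp/test_cuinit is not shown in the article. A comparable check can be sketched in Python with ctypes, assuming the driver library is installed under its usual soname `libcuda.so.1`; `describe` and `try_cuinit` are hypothetical helpers, and the name table covers only the result codes mentioned in this article.

```python
import ctypes

# Names as defined in cuda.h; the message text for 802 is
# "system not yet initialized".
CU_RESULT_NAMES = {
    0: "CUDA_SUCCESS",
    802: "CUDA_ERROR_SYSTEM_NOT_READY",
    999: "CUDA_ERROR_UNKNOWN",
}


def describe(result: int) -> str:
    """Format a CUresult the way the article's test program prints it."""
    return f"cuInit={result} ({CU_RESULT_NAMES.get(result, 'unrecognized')})"


def try_cuinit(libname: str = "libcuda.so.1"):
    """Call cuInit(0); return the CUresult, or None if the library is absent."""
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return None
    lib.cuInit.argtypes = [ctypes.c_uint]
    lib.cuInit.restype = ctypes.c_int
    return lib.cuInit(0)


if __name__ == "__main__":
    result = try_cuinit()
    print("libcuda not found" if result is None else describe(result))
```

On an unpatched library this setup would be expected to report 802; after the patch it should report 0, matching the output above.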

Full working stack: kernel module → patched libcuda.so → CUDA 10.2 runtime → llama.cpp CUDA backend, all on Linux 7.x with a 2013 Kepler GPU.

sudo apt install dkms
sudo dkms add nvidia/470.256.02
sudo dkms build nvidia/470.256.02 -k $(uname -r)
sudo dkms install nvidia/470.256.02 -k $(uname -r)

For the complete debugging log, kernel patch table, patch scripts, and build instructions, see the GitHub Gist.


Read original on dev.to → dev.to/skyne/resurrecting-kepler-getting-modern-…
