# Resurrecting Kepler: Getting Modern LLMs Running on a GTX 770 (Kernel 7.x)

> Source: <https://dev.to/skyne/resurrecting-kepler-getting-modern-llms-running-on-a-gtx-770-kernel-7x-4na>
> Published: 2026-06-27 13:29:09+00:00

⚠️ Experimental hack: Use on non-critical systems. Ensure you have backups. This patches a proprietary binary at the instruction level — no warranty, no support.

Kepler GPUs (2012–2014) are e-waste by NVIDIA's timeline, but they remain capable hardware for inference workloads. The GTX 770 has 1536 CUDA cores and 2 GB of GDDR5, enough for small quantized LLMs. This project shows that with a **five-byte fix** and some kernel backports, these GPUs can be kept useful on modern Linux systems, reducing e-waste and teaching real systems engineering along the way.

The goal: keep an **NVIDIA GeForce GTX 770 (GK104, sm_30)**, a Kepler GPU abandoned by NVIDIA's driver stack after driver 470.256.02 and CUDA 10.2, running CUDA workloads on a modern Linux kernel (6.15 → 7.x, Ubuntu 26.04).

Two problems made stock software a dead end:

- `cuInit` returns error 802: `nvidia-smi` works, but every CUDA program fails with `CUDA_ERROR_SYSTEM_NOT_READY`.
- The proprietary 470.256.02 driver source does not build against kernels ≥6.15 due to removed/renamed APIs.

I used community-sourced patch sets (primarily from [Fedora/Debian packaging](https://src.fedoraproject.org/rpms/nvidia-kmod) by Joan Bruguera Mico and Andreas Beckmann) to resolve issues like:

- `screen_info` → `sysfb_primary_display.screen`
- `del_timer_sync` → `timer_delete_sync`
- `follow_pfn` → `unsafe_follow_pfn`
- `dma_fence_signal` now returns void
- `efi_enabled` cast and UBSAN mismatches

After these backports, `nvidia-smi` reports the GTX 770 correctly. But `cuInit` still fails.
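As a rough illustration of how one might locate the call sites these backports touch, the sketch below scans source text for the old identifiers listed above. It is not part of the Fedora/Debian patch sets: `RENAMED_APIS`, `find_stale_calls`, and `scan_tree` are hypothetical names, and the mapping covers only the three plain renames.

```python
import re
from pathlib import Path

# Renames named in the article; the scanner itself is a hypothetical helper.
RENAMED_APIS = {
    "screen_info": "sysfb_primary_display.screen",
    "del_timer_sync": "timer_delete_sync",
    "follow_pfn": "unsafe_follow_pfn",
}


def find_stale_calls(text):
    """Return (line_number, old_name, new_name) for each old identifier found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for old, new in RENAMED_APIS.items():
            # \b keeps 'unsafe_follow_pfn' from matching the old 'follow_pfn'.
            if re.search(rf"\b{re.escape(old)}\b", line):
                hits.append((lineno, old, new))
    return hits


def scan_tree(root):
    """Yield (path, line_number, old_name, new_name) for .c/.h files under root."""
    for path in sorted(Path(root).rglob("*.[ch]")):
        for hit in find_stale_calls(path.read_text(errors="replace")):
            yield (path, *hit)
```

Pointing `scan_tree` at the unpacked driver source would list candidate lines to review by hand; the behavioural changes (`dma_fence_signal`, `efi_enabled`) still need the actual patches.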

## `cuInit` Error 802

All `rm_ioctl` kernel calls return `NV_OK`, so the kernel module is fine. The failure lives in userspace. With `gdb`, I traced `cuInit` calling `rm_ioctl(0x2a)` twice; both calls succeed at the kernel level, yet the library still returns 802.

Disassembly of the RM response handler in `libcuda.so.470.256.02`:

```
3436a0: mov   0xc(%rsp),%eax      ; load status from RM response
3436a4: cmp   $0x2,%eax           ; status == 2?
3436a7: je    3436f0              ; → return 802
3436a9: jbe   3436e0              ; status <= 1?
3436e0: cmp   $0x1,%eax
3436e3: jne   3436c5              ; status != 1 → return 999
3436e5: xor   %eax,%eax           ; return 0 (success)
...
3436f0: add   $0x18,%rsp
3436f4: mov   $0x322,%eax         ; return 802
3436f9: pop; ret
```

**Root cause:** The Resource Manager firmware on Kepler returns status code `2` (`NV_ERR_BUFFER_TOO_SMALL`) for the second initialization `rm_ioctl`. The library treats `1` and `4` as success, but `2` is fatal → 802. This is likely a buffer-size negotiation mismatch between the GTX 770's VBIOS firmware and the final 470.x userspace library. NVIDIA never fixed it because Kepler was already on legacy support.
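To make the branch structure easier to follow, here is a minimal Python model of the visible part of the listing. It mirrors only the instructions shown; the path for status values above 2 is elided in the disassembly, so the model returns `None` there. The function name `handler_result` is hypothetical.

```python
RETURN_802 = 0x322  # value loaded by 'mov $0x322,%eax' at 3436f4
RETURN_999 = 999    # per the listing's comment on the jne at 3436e3


def handler_result(status):
    """Mirror the visible branches of the handler starting at 0x3436a0.

    'status' is treated as an unsigned value, as the jbe comparison implies.
    """
    if status == 2:        # cmp $0x2,%eax / je 3436f0
        return RETURN_802
    if status <= 1:        # jbe 3436e0
        if status != 1:    # cmp $0x1,%eax / jne 3436c5
            return RETURN_999
        return 0           # xor %eax,%eax: success
    return None            # status > 2: handled in the elided part of the listing


for status in (0, 1, 2):
    print(status, "->", handler_result(status))
```

The only input that reaches the 802 return is status `2`, which is why a single instruction at `3436f4` is enough to change the outcome.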

**The fix:** One instruction at offset `0x3436f4`. Instead of `mov $0x322, %eax` (return 802), return 0:

| | Bytes | Instruction |
|---|---|---|
| Before | `b8 22 03 00 00` | `mov $0x322, %eax` |
| After | `31 c0 90 90 90` | `xor %eax, %eax; nop; nop; nop` |
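As a quick sanity check on the byte sequences, the sketch below decodes the little-endian immediate of the original `mov` and confirms the replacement has the same length, so no later instruction shifts. The helper name `mov_eax_immediate` is made up for this illustration.

```python
BEFORE = bytes.fromhex("b822030000")  # mov $0x322, %eax
AFTER = bytes.fromhex("31c0909090")   # xor %eax, %eax; nop; nop; nop


def mov_eax_immediate(code):
    """Decode the imm32 of a 5-byte 'mov $imm32, %eax' (opcode 0xb8)."""
    if len(code) != 5 or code[0] != 0xB8:
        raise ValueError("not a 'mov $imm32, %eax' encoding")
    return int.from_bytes(code[1:], "little")


print(mov_eax_immediate(BEFORE))   # 802
print(len(BEFORE) == len(AFTER))   # True
```

`xor %eax, %eax` takes two bytes, so three one-byte `nop`s pad the patch to the original five.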

Subsequent `rm_ioctl` calls succeed; only this specific init ioctl is broken. Patch script:

``` python
#!/usr/bin/env python3
"""Patch libcuda.so.470.256.02 so the RM status-2 path returns 0 instead of 802."""
import os
import shutil
import sys

libpath = "/usr/lib/x86_64-linux-gnu/libcuda.so.470.256.02"
backup_path = libpath + ".bak"

offset = 0x3436f4
expected = bytes([0xb8, 0x22, 0x03, 0x00, 0x00])  # mov $0x322, %eax
patched = bytes([0x31, 0xc0, 0x90, 0x90, 0x90])   # xor %eax, %eax; nop; nop; nop

with open(libpath, "rb") as f:
    data = bytearray(f.read())

actual = bytes(data[offset:offset + 5])

if actual == patched:
    print("Already patched!")
    sys.exit(0)
if actual != expected:
    # Different driver build or file layout: refuse to touch the library.
    print(f"UNEXPECTED at 0x{offset:x}: {actual.hex()}")
    sys.exit(1)

# Back up only a verified, unpatched library.
if not os.path.exists(backup_path):
    shutil.copy2(libpath, backup_path)

data[offset:offset + 5] = patched
with open(libpath, "wb") as f:
    f.write(data)
print(f"Patched: {actual.hex()} -> {patched.hex()}")
```

sm_30 support was dropped in CUDA 11, so we need CUDA 10.2's `ptxas`. But `nvcc` rejects GCC 15 (Ubuntu 26.04 default). **clang++** bridges legacy CUDA 10.2 headers and modern system libraries.

llama.cpp uses `cg::this_grid()` (CUDA 11+). Patched `softmax.cu` for CUDA 10.2:

``` cpp
// Before (CUDA >= 11.0):
const cg::grid_group g = cg::this_grid();

// After (CUDA < 11.0):
const cg::thread_block g = cg::this_thread_block();
```

Build flags:

``` bash
cmake .. -DLLAMA_CUDA=ON \
  -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ \
  -DCUDAToolkit_ROOT=/usr/local/cuda-10.2 \
  -DCMAKE_CUDA_COMPILER=clang++ \
  -DCMAKE_CUDA_ARCHITECTURES=30 \
  -DGGML_CUDA_GRAPHS=OFF
```

`-DGGML_CUDA_GRAPHS=OFF` is critical: CUDA graph capture requires sm_35+ and crashes on sm_30.

Hardware: **GTX 770 (2 GB VRAM)**, **Ubuntu 26.04**, **kernel 7.0.0-27**, **llama.cpp c16c35b81**.

| Quant | Test | t/s |
|---|---|---|
| Q4_K_M | pp64 | 69.50±0.95 |
| Q4_K_M | tg512 | 25.84±0.20 |

| Quant | Test | t/s |
|---|---|---|
| Q4_K_M | pp64 | 39.03±1.09 |

Comparing the two pp64 results above (69.50 vs 39.03 t/s), GPU offload gives ~1.8× speedup on prompt processing for this model.

| Quant | Test | t/s |
|---|---|---|
| Q3_K_M | pp64 | 36.18±0.33 |
| Q3_K_M | tg256 | 10.11±0.11 |

Qwen 3B at Q4_K_M (1.95 GiB) exceeds 2 GB VRAM — Q3_K_M (1.60 GiB) is required for full offloading.
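The VRAM arithmetic behind that choice can be sketched as follows, using the 1998 MiB that `llama-bench --list-devices` reports for this card and the two model sizes quoted above. This counts weights only; the KV cache and compute buffers also need VRAM, and their sizes are not given in the article. `headroom_mib` is a hypothetical helper.

```python
MIB_PER_GIB = 1024
VRAM_MIB = 1998  # as reported by llama-bench --list-devices for the GTX 770


def headroom_mib(vram_mib, model_gib):
    """MiB left after loading the weights alone; negative means they do not fit."""
    return vram_mib - model_gib * MIB_PER_GIB


for quant, size_gib in [("Q4_K_M", 1.95), ("Q3_K_M", 1.60)]:
    print(f"{quant}: {headroom_mib(VRAM_MIB, size_gib):.1f} MiB left")
# Q4_K_M: 1.2 MiB left
# Q3_K_M: 359.6 MiB left
```

With Q4_K_M the weights alone consume essentially the whole card, while Q3_K_M leaves a few hundred MiB for everything else.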

``` bash
$ nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 770 (UUID: GPU-3a93c548-...)

$ /tmp/test_cuinit
cuInit=0

$ llama-bench --list-devices
CUDA0: NVIDIA GeForce GTX 770 (1998 MiB, ...)
```
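The source of `/tmp/test_cuinit` is not shown in the article. A roughly equivalent check, sketched here with Python's `ctypes`, calls the driver API's `cuInit(0)` directly and prints the result in the same format. The soname `libcuda.so.1` is an assumption about the install, and `cu_init_status` and `main` are hypothetical names.

```python
import ctypes


def cu_init_status(lib):
    """Call cuInit(0) on a loaded driver library and return the CUresult as an int."""
    lib.cuInit.argtypes = [ctypes.c_uint]  # CUresult cuInit(unsigned int Flags)
    lib.cuInit.restype = ctypes.c_int
    return lib.cuInit(0)


def main():
    # Assumption: the patched library is reachable through the usual soname.
    lib = ctypes.CDLL("libcuda.so.1")
    print(f"cuInit={cu_init_status(lib)}")
```

Calling `main()` on the patched system should print `cuInit=0`, matching the output above; a result of 802 would suggest the unpatched library is still the one being loaded.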

Full working stack: kernel module → patched `libcuda.so` → CUDA 10.2 runtime → llama.cpp CUDA backend, all on Linux 7.x with a 2013 Kepler GPU.

Register the patched driver with DKMS so module rebuilds happen automatically:

``` bash
sudo apt install dkms
sudo dkms add nvidia/470.256.02
sudo dkms build nvidia/470.256.02 -k $(uname -r)
sudo dkms install nvidia/470.256.02 -k $(uname -r)
```

For the complete debugging log, kernel patch table, patch scripts, and build instructions, see the [GitHub Gist](https://gist.github.com/skyne/fa150c6e4b025903a2dc0d34d1d9065f).
