Cosmicgpt – A GPT-in-space simulator to research SpaceX AI satellite viability

A new open-source simulator called Cosmicgpt models how space radiation, including cosmic-ray bit flips and other faults, affects GPT inference on satellites. The tool injects single-event effects into model weights, activations, and the KV cache across radiation environments such as low Earth orbit (LEO) and the South Atlantic Anomaly (SAA), and generates reports on failure modes and output degradation. It aims to support research into the viability of running AI models on SpaceX satellites.

Simulate what happens to GPT inference under space conditions: cosmic-ray bit flips and other radiation-induced faults corrupting a model's weights, activations, KV cache, and output.

See what radiation does to an AI model's output:

- a single-run report: https://davedx.github.io/cosmicgpt/report.html
- an environment comparison: https://davedx.github.io/cosmicgpt/comparison.html

See DESIGN.md (/davedx/cosmicgpt/blob/main/DESIGN.md) for goals and the conditions we model, and ARCHITECTURE.md (/davedx/cosmicgpt/blob/main/ARCHITECTURE.md) for the technical design.

The end-to-end loop covers the full Single-Event-Effect taxonomy across three corruptible regions, with faults either hand-specified or derived from a physical radiation environment: build a seeded nanoGPT (/davedx/cosmicgpt/blob/main/src/cosmicgpt/model/nanogpt.py) with a real KV cache, generate a clean baseline, get faults (manual or from the flux scheduler, /davedx/cosmicgpt/blob/main/src/cosmicgpt/environment/scheduler.py), inject them (weight mutations, activation forward-hooks, KV-cache mutations), regenerate with the same sampling seed, and diff.

- Fault kinds (--kind): SEU (single bit flip), MBU (multi-bit upset), STUCK_AT (cell pinned 0/1), SEL (latch-up: a whole tensor zeroed), SET (transient activation glitch), SEFI (NaN/garbage cascade).
- Regions (--region): weight, activation (incl. lm_head → logits), kv_cache.
- Environments (--orbit): LEO, SAA, POLAR, GEO, INTERPLANETARY, SOLAR_STORM, with an optional solar-flare burst window raising λ(t) mid-inference.

Every run also reports a failure mode (silent_correct / subtle_wrong / repetition / garbage / nan_garbage / crash), time-to-failure, and mean KL divergence of the output distribution, and can emit a per-step RunTrace (/davedx/cosmicgpt/blob/main/src/cosmicgpt/eval/trace.py) JSON (the data the upcoming visualizations consume).
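To make the SEU fault kind concrete, here is a minimal sketch of a single bit flip on one float32 value, using only the Python standard library. The helper name `flip_bit` is hypothetical and is not taken from the project's `bitops.py`; the real injector operates on tensors and may be structured quite differently.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of the float32 encoding of `value`.

    Bit 0 is the mantissa LSB, bits 23-30 are the exponent, bit 31 is the sign.
    Hypothetical helper for illustration only.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in 0..31")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR toggles exactly one bit, then reinterpret back as float32.
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


weight = 0.5
low_mantissa = flip_bit(weight, 0)    # 0.5 + 2**-24: a negligible perturbation
high_exponent = flip_bit(weight, 30)  # 2.0**127: magnitude explodes
sign = flip_bit(weight, 31)           # -0.5: same magnitude, opposite sign
```

The same XOR applied twice restores the original value, which is why an injected flip can be undone exactly when a run needs to return to the clean baseline.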
Physically derived faults from an orbit (flux scaled so a short run shows effects):

    cosmicgpt run --orbit SAA --flux-mult 1e4 --tokens 120

A mission with a mid-inference solar flare:

    cosmicgpt run scenarios/mission_solar_storm.yaml

Write a self-contained HTML report (token diff + degradation timeline + raster):

    cosmicgpt run --orbit SOLAR_STORM --flux-mult 1e4 --report report.html

Regenerate a report from a saved trace (no re-inference):

    cosmicgpt report runs/storm/trace.json -o report.html

Compare conditions side by side (View C):

    cosmicgpt compare --orbits LEO,SAA,SOLAR_STORM -o comparison.html

Reports are fully self-contained (inline CSS + inline SVG, no external assets, no matplotlib), so they're emailable and archivable.

Install into a virtual environment:

    python -m venv .venv && source .venv/bin/activate
    pip install -e ".[dev]"

Run the smallest scenario (SEU):

    cosmicgpt run scenarios/walking_skeleton.yaml

Drive the taxonomy directly:

    cosmicgpt run --kind SEFI --n-flips 1 --tokens 120 --fault-seed 3
    cosmicgpt run --kind SEL --n-flips 8 --tokens 100

Verify the bit-flip foundation + injection mechanisms:

    pytest

- Single faults on low-impact sites (biases, low mantissa bits) are routinely masked. This is realistic: most cosmic-ray hits do nothing visible.
- Exponent/sign flips and SEL are far more destructive than mantissa flips.
- SET (transient activation glitch) is gentle: without persistence it affects one step, and only if it lands on the emitted position.
- The model now has a real KV cache (--region kv_cache): a strike there mutates the entry only once, but the damage persists, because every later token re-reads the corrupted entry through attention. Region is independent of fault kind: --region weight|activation|kv_cache.
- A single short inference in LEO is essentially fault-free at realistic upset rates; meaningful corruption needs the SAA, a solar storm, or long exposure. With a flare burst window, divergence visibly begins right when the flux spikes.
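The interaction between the base upset rate, the flux multiplier, and a flare burst window can be sketched as a per-step Poisson draw with a time-varying rate λ(t). This is an illustration under stated assumptions, not the project's `scheduler.py`: the function names, parameters, and the rates used in the usage line are all hypothetical, and the unit "expected upsets per generation step" is chosen only for simplicity.

```python
import math
import random


def rate_at(step, base_rate, flux_mult=1.0, flare=None, flare_mult=1.0):
    """Expected upsets at `step`: base rate, scaled, boosted inside the flare window.

    `flare` is an optional (start, end) pair of steps, end exclusive (assumed form).
    """
    lam = base_rate * flux_mult
    if flare is not None and flare[0] <= step < flare[1]:
        lam *= flare_mult
    return lam


def poisson(rng, lam):
    """Draw from Poisson(lam) with Knuth's method; adequate for small rates."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def schedule_faults(base_rate, steps, flux_mult=1.0, flare=None,
                    flare_mult=1.0, seed=0):
    """Return the number of upsets to inject at each generation step."""
    rng = random.Random(seed)  # seeded so a schedule is reproducible
    return [
        poisson(rng, rate_at(step, base_rate, flux_mult, flare, flare_mult))
        for step in range(steps)
    ]


# Hypothetical numbers: a quiet rate, scaled up, with a flare on steps 60-79.
counts = schedule_faults(base_rate=0.001, steps=120, flux_mult=10.0,
                         flare=(60, 80), flare_mult=50.0, seed=3)
```

With a very small rate nearly every step draws zero upsets, which matches the observation that a short LEO inference is essentially fault-free; a multiplier or a flare window raises the rate enough for strikes to appear within a short run.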
The model is a small, seeded, randomly-initialized char-level GPT, so the baseline text is gibberish. That is fine for the skeleton: the point is to demonstrate the fault-injection loop and to show that flips, especially in the float exponent, measurably corrupt the output. Train a coherent model later via scripts/train_tiny.py (roadmap).

    src/cosmicgpt/
      model/        nanogpt.py (+KV cache), adapter.py, sites.py   # model + fault registry
      faults/       bitops.py, types.py, injector.py               # taxonomy + injection
      environment/  flux.py, presets.py, scheduler.py              # scaled-physical flux
      eval/         runner, metrics, classify, trace               # loop + metrics + RunTrace
      viz/          svg, diffview, timeline, report                # inline-SVG/HTML reports
      config.py, cli.py
    scenarios/      walking_skeleton.yaml, sefi_cascade.yaml, mission_solar_storm.yaml
    tests/          test_bitops, test_injection, test_kvcache, test_scheduler, test_eval, test_viz

See ARCHITECTURE.md §11 (/davedx/cosmicgpt/blob/main/ARCHITECTURE.md). Next (step 6): mitigation wrappers (ECC / TMR voting / scrubbing / NaN guards) with cost-benefit experiments, then a pluggable larger-GPT backend to test whether findings generalize.
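As an illustration of one of the planned mitigations, TMR (triple modular redundancy) voting can be sketched as a bitwise majority over three copies of a stored word. The mitigation wrappers are roadmap items, so nothing here reflects the project's code; all names are hypothetical.

```python
import struct


def float_to_bits(value: float) -> int:
    """Reinterpret a float32 value as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def bits_to_float(bits: int) -> float:
    """Reinterpret an unsigned 32-bit integer as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority: each output bit is the value held by at least two inputs."""
    return (a & b) | (a & c) | (b & c)


clean = float_to_bits(0.5)
# One of the three replicas takes an upset in a high exponent bit.
replicas = [clean, clean ^ (1 << 30), clean]
recovered = bits_to_float(tmr_vote(*replicas))  # 0.5: the single upset is outvoted
```

Voting corrects any upset confined to one replica, at the cost of three times the storage; if two replicas are hit in the same bit position, the majority is wrong, which is one reason a cost-benefit comparison against ECC and scrubbing is worthwhile.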