# ISA_recovery: Auto-generates a Ghidra SLEIGH spec for undocumented ISAs

> Source: <https://github.com/infobyte/isa_recovery>
> Published: 2026-06-18 00:33:49+00:00

A reverse-engineering pipeline that turns a firmware binary and its (possibly-wrong) disassembly into a working Ghidra processor specification. When you hit a proprietary processor with no documentation and no Ghidra support, this tool recovers the real encoding of each instruction — which bits are the opcode, which are registers, which are immediates — and writes out a SLEIGH spec you can load directly into Ghidra to decompile the firmware.

Under the hood it is an **agentic workflow**: a fixed pipeline where each step is a large language model prompted for a narrow job. The workflow is orchestrated by deterministic code (not by the LLMs themselves), and every SLEIGH constructor generated at the end is verified by compiling it with Ghidra's `sleigh` binary before being accepted. Failed compilations are fed back to the model for up to three repair attempts.

```
Objdump
   │
   ▼
Bootstrap ─── deterministic clustering (no LLM)
       │
       ▼
   ┌─ Processing Loop ──────────────────────────┐
   │  Text Interpreter → Bit Interpreter ──┐    │
   │       → Knowledge Manager             │    │
   │            → Supervisor               │    │
   │                 │          split ─────┘    │
   │                 └── next cluster ──────────┤
   └────────────────────────────────────────────┘
       │
       ▼
   Knowledge Base
       │
       ▼
   SLEIGH Generator ─── compile-verify-retry loop
       │
       ▼
   Ghidra .slaspec
```

Instructions are grouped into **clusters** by structure (byte size, token pattern, fixed-bit mask). Each cluster is then analyzed by a chain of specialized LLM steps:
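As a rough illustration of structural clustering, the sketch below groups instructions by byte size and a normalized token pattern, then computes which bits stay constant within each group. The function names, the operand-classification regexes, and the big-endian byte order are assumptions made for the example; the tool's actual bootstrap step may work differently.

```python
import re
from collections import defaultdict

def token_pattern(text: str) -> str:
    """Normalize a disassembly line: keep the mnemonic, replace operands with placeholders."""
    mnemonic, _, operands = text.strip().partition(" ")
    parts = []
    for op in filter(None, (o.strip() for o in operands.split(","))):
        if re.fullmatch(r"-?(0x[0-9a-fA-F]+|\d+)", op):
            parts.append("{IMM}")      # numeric literal
        elif re.fullmatch(r"\$?[A-Za-z]+\d*", op):
            parts.append("{REG}")      # register-like name (hypothetical heuristic)
        else:
            parts.append("{OTHER}")
    return (mnemonic + " " + ", ".join(parts)) if parts else mnemonic

def fixed_bit_mask(words, width_bits):
    """Return a mask whose set bits have the same value in every word."""
    varying = 0
    for w in words:
        varying |= w ^ words[0]        # any bit that ever differs from the first word
    return ((1 << width_bits) - 1) & ~varying

def cluster(instructions):
    """instructions: iterable of (raw_bytes, disassembly_text) pairs."""
    groups = defaultdict(list)
    for raw, text in instructions:
        # Byte order is an assumption; an unknown ISA may be little-endian.
        groups[(len(raw), token_pattern(text))].append(int.from_bytes(raw, "big"))
    return {key: (fixed_bit_mask(ws, key[0] * 8), ws) for key, ws in groups.items()}

demo = cluster([
    (bytes.fromhex("01020300"), "add r1, r2, r3"),
    (bytes.fromhex("01040500"), "add r4, r5, r6"),
    (bytes.fromhex("0201"), "jmp 0x10"),
])
for (size, pattern), (mask, words) in demo.items():
    print(size, pattern, hex(mask), len(words))
```

Bits set in the mask are candidates for the opcode; the cleared bits are where operand fields are likely to live.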

- **Text Interpreter** extracts the text pattern (`add {REG1}, {REG2}, {REG3}`).
- **Bit Interpreter** maps each placeholder to a bit range using field-correlation tools; can request a split if a cluster mixes encodings.
- **Knowledge Manager** integrates per-cluster evidence into a typed knowledge base of registers, instructions, addressing modes, and architecture traits.
- **Supervisor** is primarily a deterministic gatekeeper (structural checks on match rates, unmapped placeholders, opcode overlap). It only invokes an LLM when a check fails, and it can either accept, re-run a specific agent with feedback, or escalate to the human via the TUI.
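The field-correlation tools are not described here, so the following is only one plausible way to map a placeholder to a bit range: brute-force every contiguous field and keep those whose extracted bits equal the operand value in all samples. `candidate_fields` and its parameters are hypothetical names; the sample words use the standard MIPS R-type layout, where `rd` occupies bits 15..11.

```python
from typing import List, Tuple

def candidate_fields(samples: List[Tuple[int, int]], width_bits: int,
                     max_field_bits: int = 8) -> List[Tuple[int, int]]:
    """Return every (lsb, size) whose extracted bits equal the operand value in all samples."""
    # A field must be at least wide enough to hold the largest observed value.
    min_size = max(max(value for _, value in samples).bit_length(), 1)
    hits = []
    for size in range(min_size, max_field_bits + 1):
        field_mask = (1 << size) - 1
        for lsb in range(width_bits - size + 1):
            if all(((word >> lsb) & field_mask) == value for word, value in samples):
                hits.append((lsb, size))
    return hits

# Three MIPS `add rd, rs, rt` words, each paired with the rd number from the disassembly text.
samples = [(0x00430820, 1), (0x00A62020, 4), (0x01093820, 7)]
hits = candidate_fields(samples, width_bits=32)
print(hits)
```

With only a few samples, several widths starting at the same bit can match (here bit 11 matches at widths 3 through 5), so more instructions or knowledge of the register count is needed to settle the field boundary.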

When the knowledge base is complete, a separate **SLEIGH generator** builds the Ghidra spec in two phases: first a deterministic skeleton of all constructors marked `unimpl`, then an LLM pass that fills in the p-code semantics one instruction at a time, compiling each against Ghidra's `sleigh` binary and retrying on failure.
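The compile-verify-retry loop can be sketched as below. This is a minimal illustration, not the project's code: `compile_fn` stands in for a wrapper that runs Ghidra's `sleigh` compiler on the candidate spec, `repair_fn` stands in for the LLM call that receives the error log, and both names are hypothetical. The limit of three repairs follows the description earlier in this README.

```python
from typing import Callable, Optional, Tuple

CompileFn = Callable[[str], Tuple[bool, str]]  # spec text -> (compiled ok?, error log)
RepairFn = Callable[[str, str], str]           # (spec text, error log) -> revised spec text

def compile_verify_retry(spec: str, compile_fn: CompileFn, repair_fn: RepairFn,
                         max_repairs: int = 3) -> Tuple[Optional[str], int]:
    """Return (accepted spec or None, number of repair attempts used)."""
    ok, log = compile_fn(spec)
    attempts = 0
    while not ok and attempts < max_repairs:
        spec = repair_fn(spec, log)   # feed the compiler output back to the model
        attempts += 1
        ok, log = compile_fn(spec)    # never accept a constructor without recompiling
    return (spec if ok else None), attempts

# Stand-ins so the sketch runs without Ghidra or an API key.
def fake_compile(spec: str) -> Tuple[bool, str]:
    return ("export" in spec, "error: missing export statement")

def fake_repair(spec: str, log: str) -> str:
    return spec + " export rd;"

accepted, used = compile_verify_retry("rd = rs + rt;", fake_compile, fake_repair)
print(accepted, used)
```

Returning `None` after the last failed repair lets the caller decide what to do next; one option (an assumption, not confirmed by the source) is to leave that constructor as `unimpl` in the skeleton.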

Designed as a **co-pilot for the analyst, not a replacement**: the TUI exposes every decision, the supervisor escalates ambiguous clusters to a human, and the full LLM conversation, tool-call, and token-usage history is written to disk.

Tested on LEGv8, MIPS, pi32v2, and x86.

```
# Docker (recommended)
echo "ANTHROPIC_API_KEY=sk-ant-..." > .env
./docker/run.sh integration_tests/mips

# Local
pip install -e ".[all]"
python -m main --config config.yaml
```

**Input**: a firmware binary and an objdump disassembly — even one produced against the *wrong* architecture. The tool does not solve the disassembly problem itself; output quality scales with input disassembly quality.

**Output**: a Ghidra `.slaspec` file plus a JSON knowledge base of registers, instruction encodings, addressing modes, and architecture traits.

Full documentation — architecture, agent internals, worked examples, configuration reference — lives in the wiki:

```
pip install -e ".[docs]"
cd wiki && mkdocs serve
```

Then open [http://localhost:8000](http://localhost:8000).

- Python >= 3.11
- `ANTHROPIC_API_KEY` environment variable
- Docker (optional, for `run.sh`)
- Ghidra (required for the SLEIGH compile-verify step)
