
ISA_recovery: Auto-generates a Ghidra SLEIGH spec for undocumented ISAs

A new reverse-engineering pipeline called ISA_recovery automatically generates Ghidra SLEIGH processor specifications for undocumented instruction set architectures by using an agentic workflow of large language models to recover instruction encodings from firmware binaries and disassembly. The tool, designed as a co-pilot for analysts, outputs a Ghidra .slaspec file and a JSON knowledge base, and has been tested on LEGv8, MIPS, pi32v2, and x86.

Published Jun 18, 2026

ISA_recovery is a reverse-engineering pipeline that turns a firmware binary and its (possibly wrong) disassembly into a working Ghidra processor specification. When you hit a proprietary processor with no documentation and no Ghidra support, this tool recovers the real encoding of each instruction (which bits are the opcode, which are registers, which are immediates) and writes out a SLEIGH spec you can load directly into Ghidra to decompile the firmware.

Under the hood it is an agentic workflow: a fixed pipeline where each step is a large language model prompted for a narrow job. The workflow is orchestrated by deterministic code, not by the LLMs themselves, and every SLEIGH constructor generated at the end is verified by compiling it with Ghidra's sleigh binary before being accepted. Failed compilations are fed back to the model for up to three repair attempts.
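The compile-verify-retry idea can be sketched as a small deterministic driver. This is a minimal illustration, not the project's code: `verify_with_repairs`, `compile_fn`, and `repair_fn` are hypothetical names, and the compile and repair steps are passed in as callables (in the real pipeline they would presumably wrap Ghidra's sleigh binary and an LLM call).

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures: compile_fn returns (ok, error_text);
# repair_fn takes the failing text plus the errors and returns a new candidate.
CompileFn = Callable[[str], Tuple[bool, str]]
RepairFn = Callable[[str, str], str]


def verify_with_repairs(constructor: str,
                        compile_fn: CompileFn,
                        repair_fn: RepairFn,
                        max_repairs: int = 3) -> Optional[str]:
    """Return a constructor that compiles, or None if all repairs fail."""
    candidate = constructor
    for attempt in range(max_repairs + 1):  # 1 initial try + max_repairs repairs
        ok, errors = compile_fn(candidate)
        if ok:
            return candidate
        if attempt == max_repairs:
            break  # repair budget exhausted
        candidate = repair_fn(candidate, errors)
    return None


# Toy stand-ins so the sketch runs without Ghidra or an API key.
def fake_compile(text: str) -> Tuple[bool, str]:
    return ("}" in text, "" if "}" in text else "missing closing brace")


def fake_repair(text: str, errors: str) -> str:
    return text + " }"


accepted = verify_with_repairs(":nop is op=0 { ", fake_compile, fake_repair)
rejected = verify_with_repairs(":nop is op=0 { ", fake_compile, lambda t, e: t)
```

Keeping the loop in plain code means the retry budget is enforced by the orchestrator rather than left to the model, which matches the article's description of deterministic orchestration.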

Objdump
   │
   ▼
Bootstrap ─── deterministic clustering (no LLM)
       │
       ▼
   ┌─ Processing Loop ──────────────────────────┐
   │  Text Interpreter → Bit Interpreter ──┐    │
   │       → Knowledge Manager             │    │
   │            → Supervisor               │    │
   │                 │          split ─────┘    │
   │                 └── next cluster ───────────
   └────────────────────────────────────────────┘
       │
       ▼
   Knowledge Base
       │
       ▼
   SLEIGH Generator ─── compile-verify-retry loop
       │
       ▼
   Ghidra .slaspec

Instructions are grouped into clusters by structure (byte size, token pattern, fixed-bit mask). Each cluster is then analyzed by a chain of specialized LLM steps:
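A rough sketch of what such structural clustering could look like follows. It is an assumption-laden illustration, not the tool's bootstrap code: the helper names (`token_pattern`, `fixed_bit_mask`, `cluster`), the operand classification rule, and the big-endian byte order are all hypothetical. It groups by byte size and token pattern, then reports which bits stay constant within each group.

```python
import re
from collections import defaultdict


def token_pattern(text):
    """Turn 'add $t0, $t1, $t2' into 'add {REG1}, {REG2}, {REG3}'."""
    mnemonic, _, operands = text.strip().partition(" ")
    counts = defaultdict(int)
    slots = []
    for op in (o.strip() for o in operands.split(",")):
        if not op:
            continue
        # Crude assumption: numeric literals are immediates, the rest registers.
        kind = "IMM" if re.fullmatch(r"-?(0x[0-9a-fA-F]+|\d+)", op) else "REG"
        counts[kind] += 1
        slots.append("{%s%d}" % (kind, counts[kind]))
    return (mnemonic + " " + ", ".join(slots)) if slots else mnemonic


def fixed_bit_mask(words, width):
    """Bits that hold the same value in every word of the group."""
    all_ones = (1 << width) - 1
    always_one, always_zero = all_ones, all_ones
    for w in words:
        always_one &= w
        always_zero &= ~w & all_ones
    return always_one | always_zero


def cluster(instructions):
    """instructions: iterable of (raw_bytes, disassembly_text) pairs."""
    groups = defaultdict(list)
    for raw, text in instructions:
        # Assumes big-endian instruction words.
        groups[(len(raw), token_pattern(text))].append(int.from_bytes(raw, "big"))
    return {key: (fixed_bit_mask(words, key[0] * 8), words)
            for key, words in groups.items()}


samples = [
    (bytes.fromhex("012a4020"), "add $t0, $t1, $t2"),
    (bytes.fromhex("02328020"), "add $s0, $s1, $s2"),
    (bytes.fromhex("21280004"), "addi $t0, $t1, 4"),
]
for (size, pattern), (mask, words) in cluster(samples).items():
    print(size, pattern, f"{mask:#010x}", len(words))
```

With only a few samples the mask overestimates the fixed bits; it converges on the true opcode bits as more instances of the same instruction are seen.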

- Text Interpreter extracts the text pattern (add {REG1}, {REG2}, {REG3}).
- Bit Interpreter maps each placeholder to a bit range using field-correlation tools; it can request a split if a cluster mixes encodings.
- Knowledge Manager integrates per-cluster evidence into a typed knowledge base of registers, instructions, addressing modes, and architecture traits.
- Supervisor is primarily a deterministic gatekeeper (structural checks on match rates, unmapped placeholders, opcode overlap). It only invokes an LLM when a check fails, and it can accept, re-run a specific agent with feedback, or escalate to the human via the TUI.

When the knowledge base is complete, a separate SLEIGH generator builds the Ghidra spec in two phases: a deterministic skeleton of all constructors marked unimpl, then an LLM fills in the p-code semantics one instruction at a time, compiling each against Ghidra's sleigh binary and retrying on failure.
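The first phase can be pictured as plain string templating over recovered encodings. The sketch below is illustrative only: `skeleton_constructor` and its argument layout are assumptions, the field names (`opcode`, `funct`, `rd`, and so on) would need matching token definitions elsewhere in the spec, and the emitted line has not been compiled against sleigh.

```python
def skeleton_constructor(mnemonic, operands, fixed_fields):
    """Emit one SLEIGH constructor with no semantics yet.

    operands:     operand field names, in display order
    fixed_fields: mapping of field name -> required constant value
    """
    display = ", ".join(operands)
    constraints = [f"{name}={value:#x}" for name, value in fixed_fields.items()]
    pattern = " & ".join(constraints + list(operands))
    # `unimpl` stands in for the semantic section until phase two fills it in.
    return f":{mnemonic} {display} is {pattern} unimpl"


line = skeleton_constructor("add", ["rd", "rs", "rt"],
                            {"opcode": 0x00, "funct": 0x20})
print(line)
# :add rd, rs, rt is opcode=0x0 & funct=0x20 & rd & rs & rt unimpl
```

Emitting the whole skeleton deterministically means disassembly can work even before any semantics exist, and each later LLM edit touches exactly one constructor.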

Designed as a co-pilot for the analyst, not a replacement: the TUI exposes every decision, the supervisor escalates ambiguous clusters to a human, and the full LLM conversation, tool-call, and token-usage history is written to disk.

Tested on LEGv8, MIPS, pi32v2, and x86.

echo "ANTHROPIC_API_KEY=sk-ant-..." > .env
./docker/run.sh integration_tests/mips

pip install -e ".[all]"
python -m main --config config.yaml

Input: a firmware binary and an objdump disassembly, even one produced against the wrong architecture. The tool does not solve the disassembly problem itself; output quality scales with input disassembly quality.

Output: a Ghidra .slaspec file plus a JSON knowledge base of registers, instruction encodings, addressing modes, and architecture traits.
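The article does not show the knowledge base schema, so the following is only a guess at what a typed structure serialized to JSON might look like. Every class and field name here (`BitField`, `Instruction`, `KnowledgeBase`, `traits`, and so on) is hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class BitField:
    name: str
    lo: int  # least significant bit of the field
    hi: int  # most significant bit of the field


@dataclass
class Instruction:
    mnemonic: str
    text_pattern: str
    size_bytes: int
    fixed_mask: int   # bits that identify the instruction
    fixed_value: int  # required value of those bits
    fields: list[BitField] = field(default_factory=list)


@dataclass
class KnowledgeBase:
    registers: dict[str, int] = field(default_factory=dict)
    instructions: list[Instruction] = field(default_factory=list)
    addressing_modes: list[str] = field(default_factory=list)
    traits: dict[str, str] = field(default_factory=dict)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses and lists.
        return json.dumps(asdict(self), indent=2)


kb = KnowledgeBase(
    registers={"$t0": 8, "$t1": 9, "$t2": 10},
    instructions=[Instruction(
        mnemonic="add",
        text_pattern="add {REG1}, {REG2}, {REG3}",
        size_bytes=4,
        fixed_mask=0xFC0007FF,
        fixed_value=0x00000020,
        fields=[BitField("rd", 11, 15), BitField("rs", 21, 25),
                BitField("rt", 16, 20)],
    )],
    traits={"endianness": "big", "instruction_width": "fixed-32"},
)
print(kb.to_json())
```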

Full documentation (architecture, agent internals, worked examples, configuration reference) lives in the wiki:

pip install -e ".[docs]"
cd wiki && mkdocs serve

Then open http://localhost:8000.

Requirements:

- Python >= 3.11
- ANTHROPIC_API_KEY environment variable
- Docker (optional, for run.sh)
- Ghidra (required for the SLEIGH compile-verify step)
