09:14
2025-03-13
gist.github.com
artificial-intelligence
white-box LLM jailbreak using weight orthogonization
The provided text contains a Python script for a "white-box LLM jailbreak" technique that uses weight orthogonalization. The script loads harmful and harmless instruction datasets, extracts hidden sta…