09:44
2026-06-14
lesswrong.com
ai-safety
I Bet Abliteration's Cost Was Sloppy Implementation. I Was Wrong
A researcher found that a clean implementation of abliteration on Qwen3.5-27B costs only about 1.4 TruthfulQA points, far less than the 5.5+ points lost by HuiHui AI's crude method, confirming that mo…