HuiHui AI

mentions: 1 | type: Organization

// recent coverage (1 mention)

2026-06-14 09:44 | lesswrong.com | ai-safety

I Bet Abliteration's Cost Was Sloppy Implementation. I Was Wrong

A researcher found that a clean implementation of abliteration on Qwen3.5-27B costs only about 1.4 TruthfulQA points, far less than the 5.5+ points lost by HuiHui AI's crude method, confirming that mo…

// co-occurs with top 5 entities

Qwen3.5-27B (1), Arditi (1), TruthfulQA (1), Qwen-72B (1), TransformerLens (1)

// topics top 3 topics

ai safety (1), large language models (1), ai research (1)