[ARTICLE · art-21134] src=arxiv.org pub= topic=large-language-models verified=true sentiment=· neutral

Expert-Aware Refusal Steering

Researchers have extended refusal steering methods to Mixture-of-Experts (MoE) large language models, demonstrating that steering vectors can effectively suppress safety-aligned refusal behavior in these architectures. The team proposed two expert-aware refusal steering techniques that leverage routing patterns and expert-specific directions, finding that refusal behavior can be controlled based on a single expert's output. The results indicate that refusal signals captured by steering methods differ from expert routing behavior, suggesting attention mechanisms play a substantial role in MoE refusal responses.

1 min read · published Jun 4, 2026

arXiv:2606.04160v1 · Announce Type: new

Abstract: Safety alignment in instruction-tuned large language models (LLMs) depends on a model's ability to reliably refuse to respond to harmful or disallowed requests. Recent work has shown that a steering vector can be applied to a dense LLM during inference to effectively suppress refusal behavior, inducing response to harmful requests. We extend this refusal steering method to three open-source Mixture-of-Experts (MoE) LLMs and find that steering performance is uninhibited by the complex routing patterns inherent to the MoE architecture. We then propose two expert-aware refusal steering methods that leverage refusal-specific expert routing patterns and expert-specific steering directions to suppress normal refusal behavior. We find that refusal behavior can be effectively steered based on the output of a single expert. Our results show that refusal signals captured by steering methods differ from expert routing behavior, suggesting a substantial role for attention in MoE refusal behavior.
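To make the mechanics concrete, here is a minimal toy sketch in NumPy of what steering a single expert's output could look like. It is not the paper's implementation: the abstract does not say how the steering direction is computed or where it is applied, so the difference-of-means direction, the projection-based ablation, the linear experts, the top-k softmax router, and every function and parameter name below are assumptions chosen for illustration. The arrays stand in for hidden activations; no real model is involved.

```python
import numpy as np


def mean_difference_direction(acts_a, acts_b):
    """Unit vector from the mean of acts_b to the mean of acts_a.

    Assumption: the steering direction is a difference of mean activations
    between two prompt sets; the abstract does not confirm this choice.
    """
    d = acts_a.mean(axis=0) - acts_b.mean(axis=0)
    return d / np.linalg.norm(d)


def ablate_direction(h, direction):
    """Remove the component of each row of h along a unit-norm direction."""
    return h - np.outer(h @ direction, direction)


def top_k_route(h, router_w, k):
    """Pick the k highest-scoring experts per token and softmax their scores."""
    logits = h @ router_w                                # (tokens, experts)
    idx = np.argsort(-logits, axis=1)[:, :k]             # (tokens, k)
    top = np.take_along_axis(logits, idx, axis=1)
    w = np.exp(top - top.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return idx, w


def moe_layer(h, router_w, expert_ws, k=2, steer_expert=None, direction=None):
    """Toy MoE layer with linear experts (hypothetical, for illustration).

    If steer_expert is set, the direction is ablated from that one expert's
    output only; all other experts are left untouched.
    """
    idx, w = top_k_route(h, router_w, k)
    out = np.zeros_like(h)
    for e, expert_w in enumerate(expert_ws):
        token_w = (w * (idx == e)).sum(axis=1)           # routing weight per token
        if not token_w.any():
            continue                                     # expert unused this batch
        y = h @ expert_w
        if e == steer_expert and direction is not None:
            y = ablate_direction(y, direction)
        out += token_w[:, None] * y
    return out
```

In this sketch, calling `moe_layer(h, router_w, expert_ws, steer_expert=0, direction=d)` changes the layer output only for tokens routed to expert 0, and only along `d`. That is one way to picture the abstract's distinction between what the router selects and what a steering direction captures.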
