grep -l @roger /news/*.json | wc -l → 1

@Roger

Mentions: 1 · Type: Organization
2026-06-04 19:23 · lesswrong.com · large-language-models

(Mis)generalization of Helpful-Only Fine-tuning

Researchers studying helpful-only (H-only) large language models found that existing models exhibit emergent misalignment, residual refusal behaviors, poor steerability, sycophancy, and incoherent cha…
