Can an LLM lose conceptual continuity while remaining coherent?

Researchers at Hugging Face debate whether large language models can lose conceptual continuity while remaining coherent, proposing falsification-first controls to distinguish genuine architectural gains from artifacts. The discussion centers on testing importance scores against random baselines and isolating epistemic state selection from mere context presence.

Since you both posted the negatives instead of the headline - @oldman-dev (https://discuss.huggingface.co/u/oldman-dev) putting the “:::::” collapse in a failure-analysis section, and @Hstre (https://discuss.huggingface.co/u/hstre) listing the loop traps and the confound control that would’ve turned a neutral result into a fake architectural gain - that’s the whole game. So here are a couple of tests each to keep that going.

@oldman-dev - on the LITM number and the “:::::” collapse:

Before you trust 46.1% @ 50% as “oracle-tier,” run a random-eviction control: same exact pipeline, but swap your TIS importance scores for random scores, and do a second run with the oracle labels shuffled. If random eviction at 50% budget lands anywhere near 46.1%, then TIS isn’t doing the work at that budget - the number belongs to the task, not your system. Look at your own table: LITM @ 25% is 33.3% for every method including Vanilla. That’s almost certainly the chance floor, which means nothing is doing anything at 25%. You want to be sure 50% isn’t a softer version of the same thing. If TIS clearly beats random, you’ve got a real signal. If it doesn’t, you just saved yourself months.

The “:::::” output is the textbook signature of the LM objective finding a trivial minimum - one constant minimal-entropy token gets near-zero cross-entropy on your training set, so the adapter collapses there. Two moves: (a) add a collapse guard you watch during training - output entropy, % unique tokens, or KL to the frozen base - and early-stop the second it craters; that state was entropy→0 long before inference and would’ve been caught live. (b) Your own data hands you the fix: TIS-only Stage 2 ≈ Stage 1 oracle (NIAH identical), so the head architecture is fine - it’s the LM-gradient fine-tune that’s toxic.
Decouple the ImportanceUpdateHead from the LM loss: train it with a direct supervised/ranking loss against your oracle importance labels instead of letting LM cross-entropy dominate it. If the head matches oracle under supervised loss but collapses under LM loss, you’ve localized the ghost to the objective, not the architecture.

@Hstre - you asked for controls that expose strong results as artifacts, so:

The load-bearing one: a wrong-slice ablation. Your claim is that the structure of the epistemic state (the right slice for this pass) is what helps. So feed the model the wrong slice - permute which slice goes with which pass, or inject a plausible-but-irrelevant one - and measure. If wrong-slice ≈ correct-slice, the gain is just “extra context present,” not your selection, and the control surface is decorative. If it drops sharply, the selection is real. Same shape as the random-eviction control above.

The thing you actually claim is novel is status/validity, not salience - a claim can be salient but contradicted, unverified, or inadmissible. Isolate it with a status-stripped ablation: keep the claim text in the slice, remove the validity/status/role metadata. If status-stripped ≈ full-state, the “epistemic” part isn’t earning its keep yet and you’re doing smart context selection (still useful - different claim).

And since your honest result is “didn’t beat the best single model on any isolated metric but avoided degeneration,” turn that into a number: report loop-trap / degeneration incidence with vs without the layer across your density sweep. You already found “clean” low-density states that were actually loop traps - that’s exactly the metric where the layer might genuinely win even when single-metric optima say it doesn’t.

Both of these come back to the same move: the most valuable test you can run is the one trying to prove your own result is fake. You’re both already doing it, which is more than most.
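Since the random-eviction control and the wrong-slice ablation share one shape, here is a minimal sketch of that shape in Python. It assumes a hypothetical `evaluate(kept_indices)` callable standing in for your real pipeline's metric. The names `keep_top`, `run_control`, and `rotate_slices` are illustrative, not taken from either of your codebases, and the toy data at the bottom is made up purely to show the call pattern.

```python
import random
from statistics import mean, stdev


def keep_top(scores, budget):
    """Return the indices of the top `budget` fraction of items by score."""
    k = max(1, int(len(scores) * budget))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])


def run_control(evaluate, scores, budget, n_seeds=20, seed=0):
    """Score the real importance ranking against random-score eviction.

    `evaluate(kept_indices) -> float` is a hypothetical stand-in for the
    real pipeline metric (e.g. retrieval accuracy at this budget).
    """
    rng = random.Random(seed)
    real = evaluate(keep_top(scores, budget))
    rand = []
    for _ in range(n_seeds):
        # Same pipeline, same budget, but the scores carry no information.
        random_scores = [rng.random() for _ in scores]
        rand.append(evaluate(keep_top(random_scores, budget)))
    return {"real": real, "random_mean": mean(rand), "random_std": stdev(rand)}


def rotate_slices(slices):
    """Wrong-slice ablation: pass i receives the slice meant for pass i+1.

    Only a true mismatch if the slices are distinct from each other.
    """
    if len(slices) < 2:
        raise ValueError("need at least two slices to mismatch them")
    return slices[1:] + slices[:1]


if __name__ == "__main__":
    # Toy setup (made up): 10 "needle" items among 100, perfect scorer.
    needles = set(range(10))
    scores = [1.0 if i in needles else 0.0 for i in range(100)]

    def evaluate(kept):
        return len(kept & needles) / len(needles)

    print(run_control(evaluate, scores, budget=0.5))
    print(rotate_slices(["slice_A", "slice_B", "slice_C"]))
```

The comparison that matters is the gap between `real` and `random_mean` relative to `random_std`: if the real number sits inside the random spread, the scorer is not doing the work at that budget.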
Falsify-first, publish the negative, revise.
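
For the collapse guard suggested above for @oldman-dev, a minimal sketch in Python, assuming you can sample a batch of generated token ids every so many training steps. It computes the empirical entropy and unique-token fraction of the sampled tokens, which is a cheap proxy and not the model's predictive entropy or a KL to the frozen base. The function names and thresholds are illustrative assumptions and would need calibrating against the frozen base model's own outputs.

```python
import math
from collections import Counter


def token_entropy(token_ids):
    """Shannon entropy (in nats) of the empirical token distribution."""
    counts = Counter(token_ids)
    total = len(token_ids)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def unique_fraction(token_ids):
    """Fraction of sampled tokens that are distinct."""
    return len(set(token_ids)) / len(token_ids)


def looks_collapsed(token_ids, min_entropy=1.0, min_unique=0.05):
    """True if sampled output tokens look degenerate.

    Thresholds are placeholder assumptions, not recommended values.
    An empty sample is treated as collapsed.
    """
    if not token_ids:
        return True
    return (
        token_entropy(token_ids) < min_entropy
        or unique_fraction(token_ids) < min_unique
    )


if __name__ == "__main__":
    constant_output = [7] * 200             # one token repeated, like ":::::"
    varied_output = list(range(50)) * 4     # 50 distinct tokens, evenly used
    print(looks_collapsed(constant_output))
    print(looks_collapsed(varied_output))
```

In a training loop this would run on a small fixed set of prompts at each evaluation interval, with early stopping the first time `looks_collapsed` fires.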