# Can an LLM lose conceptual continuity while remaining coherent?

> Source: <https://discuss.huggingface.co/t/can-an-llm-lose-conceptual-continuity-while-remaining-coherent/176469?page=2#post_22>
> Published: 2026-06-15 14:50:35+00:00

You both posted the negatives instead of just the headline: [@oldman-dev](https://discuss.huggingface.co/u/oldman-dev) putting the ::::: collapse in a failure-analysis section, and [@Hstre](https://discuss.huggingface.co/u/hstre) listing the loop traps and the confound control without which a neutral result would have passed as a fake architectural gain. That’s the whole game. So here are a couple of tests for each of you to keep that going.

[@oldman-dev](https://discuss.huggingface.co/u/oldman-dev) - on the LITM number and the ::::: collapse:

Before you trust 46.1% @ 50% as “oracle-tier,” run a random-eviction control: same exact pipeline, but swap your TIS importance scores for random scores (and a second run with the oracle labels shuffled). If random eviction at 50% budget lands anywhere near 46.1%, then TIS isn’t doing the work at that budget - the number belongs to the task, not your system. Look at your own table: LITM @ 25% is 33.3% for every method including Vanilla. That’s almost certainly the chance floor, which means nothing is doing anything at 25%. You want to be sure 50% isn’t a softer version of the same thing. If TIS clearly beats random, you’ve got a real signal. If it doesn’t, you just saved yourself months.
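A minimal sketch of that control, assuming your pipeline can be reduced to "score every cache position, keep the top fraction, run the eval". `keep_top_k`, `accuracy`, `toy_tis` and `toy_eval` are hypothetical names. The toy task (the answer counts as correct only if the needle position survives eviction) is a stand-in for your real TIS scorer and LITM eval, not a model of them. The point is the shape: the eviction and eval code stays fixed, and only the scorer is swapped.

```python
import random
from typing import Callable, Sequence

def keep_top_k(scores: Sequence[float], budget: float) -> set[int]:
    """Keep the highest-scoring `budget` fraction of cache positions; evict the rest."""
    k = max(1, int(len(scores) * budget))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])

def accuracy(examples, score_fn: Callable, eval_fn: Callable, budget: float) -> float:
    """Same eviction + eval pipeline every time; only the scorer changes."""
    hits = [bool(eval_fn(ex, keep_top_k(score_fn(ex), budget))) for ex in examples]
    return sum(hits) / len(hits)

def make_random_scorer(seed: int) -> Callable:
    """Control 1: replace importance scores with uniform random scores."""
    rng = random.Random(seed)
    return lambda ex: [rng.random() for _ in range(ex[0])]

def make_shuffled_scorer(oracle_fn: Callable, seed: int) -> Callable:
    """Control 2: keep the oracle's score values but shuffle which position gets which."""
    rng = random.Random(seed)
    def scorer(ex):
        scores = list(oracle_fn(ex))
        rng.shuffle(scores)
        return scores
    return scorer

# ---- Toy stand-ins (hypothetical): swap in your real TIS scorer and LITM eval. ----
def toy_tis(ex):
    """Pretend importance scorer that ranks the needle position highest."""
    n, needle = ex
    return [1.0 if i == needle else 0.0 for i in range(n)]

def toy_eval(ex, kept: set[int]) -> bool:
    """'Correct' here just means the needle position survived eviction."""
    _, needle = ex
    return needle in kept

rng = random.Random(0)
examples = [(64, rng.randrange(64)) for _ in range(400)]  # (context_len, needle_pos)
budget = 0.5
acc_tis = accuracy(examples, toy_tis, toy_eval, budget)
acc_rand = accuracy(examples, make_random_scorer(1), toy_eval, budget)
acc_shuf = accuracy(examples, make_shuffled_scorer(toy_tis, 2), toy_eval, budget)
print(f"TIS {acc_tis:.3f} | random {acc_rand:.3f} | shuffled oracle {acc_shuf:.3f}")
```

In the toy, random eviction at a 50% budget lands near 0.5 by construction, which is exactly the kind of floor to compare against. On the real run, if `acc_rand` comes out close to 46.1%, the number belongs to the task. Run the random control over several seeds so one lucky draw doesn't decide it.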

The ::::: output is the textbook signature of the LM objective finding a trivial minimum: one constant, minimal-entropy token gets near-zero cross-entropy on your training set, so the adapter collapses there. Two moves: (a) add a collapse guard you watch during training (output entropy, % unique tokens, or KL to the frozen base) and early-stop the moment it craters. That state had entropy→0 long before inference, so a live guard would have caught it. (b) Your own data hands you the fix: TIS-only Stage 2 ≈ Stage 1 oracle (NIAH identical), so the head architecture is fine and it’s the LM-gradient fine-tune that’s toxic. Decouple the ImportanceUpdateHead from the LM loss: train it with a direct supervised/ranking loss against your oracle importance labels instead of letting LM cross-entropy dominate it. If the head matches oracle under supervised loss but collapses under LM loss, you’ve localized the ghost to the objective, not the architecture.
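Both moves can be sketched in plain Python, assuming you can pull per-position probability rows from the adapter and from the frozen base, plus a sampled continuation. `CollapseGuard`, its thresholds, and `pairwise_ranking_loss` are hypothetical and illustrative, not tuned values. The ranking loss is a RankNet-style pairwise logistic loss, one reasonable choice for "direct supervised/ranking loss". In real training it would be a tensor op on the ImportanceUpdateHead outputs with no LM loss term in its gradient.

```python
import math
from typing import Sequence

def mean_entropy(prob_rows: Sequence[Sequence[float]]) -> float:
    """Average next-token entropy in nats; a ':::::' collapse drives this toward 0."""
    def h(p):
        return -sum(x * math.log(x) for x in p if x > 0)
    return sum(h(p) for p in prob_rows) / len(prob_rows)

def unique_token_fraction(tokens: Sequence[int]) -> float:
    """Share of distinct token ids in a sampled continuation."""
    return len(set(tokens)) / max(1, len(tokens))

def mean_kl(adapter_rows, base_rows, eps: float = 1e-12) -> float:
    """Mean KL(adapter || frozen base) over positions."""
    def kl(p, q):
        return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)
    return sum(kl(p, q) for p, q in zip(adapter_rows, base_rows)) / len(adapter_rows)

class CollapseGuard:
    """Hypothetical early-stop hook. Thresholds are illustrative, not tuned."""
    def __init__(self, baseline_entropy: float, min_entropy_ratio: float = 0.2,
                 min_unique: float = 0.05, max_kl: float = 5.0):
        self.baseline_entropy = baseline_entropy
        self.min_entropy_ratio = min_entropy_ratio
        self.min_unique = min_unique
        self.max_kl = max_kl

    def should_stop(self, adapter_rows, base_rows, sampled_tokens) -> bool:
        # Stop if any one signal craters; each catches a different flavor of collapse.
        return (mean_entropy(adapter_rows) < self.min_entropy_ratio * self.baseline_entropy
                or unique_token_fraction(sampled_tokens) < self.min_unique
                or mean_kl(adapter_rows, base_rows) > self.max_kl)

def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def pairwise_ranking_loss(pred: Sequence[float], oracle: Sequence[float]) -> float:
    """For every pair the oracle ranks i above j, penalize pred[i] - pred[j] being small."""
    terms = [softplus(-(pred[i] - pred[j]))
             for i in range(len(pred)) for j in range(len(pred)) if oracle[i] > oracle[j]]
    return sum(terms) / max(1, len(terms))

healthy = [[0.25, 0.25, 0.25, 0.25]] * 8
collapsed = [[0.997, 0.001, 0.001, 0.001]] * 8
guard = CollapseGuard(baseline_entropy=mean_entropy(healthy))
print(guard.should_stop(healthy, healthy, [3, 1, 4, 1, 5, 9, 2, 6]))  # False
print(guard.should_stop(collapsed, healthy, [7] * 32))                # True

oracle = [0.9, 0.1, 0.5]
print(pairwise_ranking_loss([3.0, -3.0, 0.0], oracle)
      < pairwise_ranking_loss([-3.0, 3.0, 0.0], oracle))              # True
```

Calibrate `baseline_entropy` on the adapter's first few steps (or on the frozen base) rather than hard-coding it, since healthy entropy depends on vocabulary and data.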

[@Hstre](https://discuss.huggingface.co/u/hstre) - you asked for controls that expose strong results as artifacts, so:

The load-bearing one: a wrong-slice ablation. Your claim is that the structure of the epistemic state (the right slice for this pass) is what helps. So feed the model the wrong slice - permute which slice goes with which pass, or inject a plausible-but-irrelevant one - and measure. If wrong-slice ≈ correct-slice, the gain is just “extra context present,” not your selection, and the control surface is decorative. If it drops sharply, the selection is real. Same shape as the random-eviction control above.
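A minimal sketch of the permutation version, assuming each pass can be scored given whatever slice you hand it. `wrong_slice_ablation` and `toy_score` are hypothetical. A derangement guarantees that no pass accidentally keeps its own slice. The "plausible-but-irrelevant" variant is the same harness with the mismatched slices drawn from a distractor pool instead.

```python
import random
from typing import Callable, Sequence

def derangement(n: int, rng: random.Random) -> list[int]:
    """Random permutation with no fixed points: no pass keeps its own slice."""
    if n < 2:
        raise ValueError("need at least two passes to permute slices")
    while True:  # rejection sampling; roughly 1/e of shuffles are derangements
        perm = list(range(n))
        rng.shuffle(perm)
        if all(perm[i] != i for i in range(n)):
            return perm

def wrong_slice_ablation(passes: Sequence, slices: Sequence, score_fn: Callable,
                         seed: int = 0) -> tuple[float, float]:
    """Mean score with the correct slice per pass vs a deliberately mismatched one."""
    perm = derangement(len(passes), random.Random(seed))
    correct = sum(score_fn(p, s) for p, s in zip(passes, slices)) / len(passes)
    wrong = sum(score_fn(p, slices[perm[i]]) for i, p in enumerate(passes)) / len(passes)
    return correct, wrong

# Toy stand-in (hypothetical): a pass scores 1.0 only when it gets its own slice.
passes = list(range(6))
slices = [f"slice{i}" for i in range(6)]
toy_score = lambda p, s: float(s == f"slice{p}")
correct, wrong = wrong_slice_ablation(passes, slices, toy_score)
print(correct, wrong)  # 1.0 0.0
```

With a real model the interesting quantity is the gap `correct - wrong`: near zero means "extra context present" is doing the work.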

The thing you actually claim is novel is status/validity, not salience - a claim can be salient but contradicted, unverified, or inadmissible. Isolate it with a status-stripped ablation: keep the claim text in the slice, remove the validity/status/role metadata. If status-stripped ≈ full-state, the “epistemic” part isn’t earning its keep yet and you’re doing smart context selection (still useful - different claim). And since your honest result is “didn’t beat the best single model on any isolated metric but avoided degeneration,” turn that into a number: report loop-trap / degeneration incidence with vs without the layer across your density sweep. You already found “clean” low-density states that were actually loop traps - that’s exactly the metric where the layer might genuinely win even when single-metric optima say it doesn’t.
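Both measurements can be sketched as follows, assuming a slice is a list of claim records carrying status/role metadata. The `Claim` fields are assumptions, not the real schema. The loop-trap check is a crude repeated-n-gram heuristic that I've swapped in for illustration; substitute whatever detector found your "clean" loop traps. The toy outputs below are made up to exercise the code and are not results.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Sequence

@dataclass(frozen=True)
class Claim:
    """Hypothetical slice item; field names are assumptions, not the real schema."""
    text: str
    status: Optional[str] = None  # e.g. "verified", "contradicted", "unverified"
    role: Optional[str] = None    # e.g. "premise", "inadmissible"

def strip_status(slice_: List[Claim]) -> List[Claim]:
    """Status-stripped ablation: identical claim text, validity/status/role removed."""
    return [replace(c, status=None, role=None) for c in slice_]

def is_loop_trap(tokens: Sequence[str], n: int = 4, max_repeat_share: float = 0.5) -> bool:
    """Crude degeneration check: flag outputs where most n-grams are repeats."""
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return False
    return (len(grams) - len(set(grams))) / len(grams) > max_repeat_share

def degeneration_incidence(outputs_by_density: Dict[float, List[List[str]]]) -> Dict[float, float]:
    """Share of runs flagged as loop traps at each density in the sweep."""
    return {d: sum(is_loop_trap(o) for o in outs) / len(outs)
            for d, outs in outputs_by_density.items()}

# Toy data only (not results): swap in real outputs with and without the layer.
slice_ = [Claim("X holds for small n", status="contradicted", role="premise")]
stripped = strip_status(slice_)
loop = ("a b c d " * 4).split()
clean = "the claim was checked against the second source and held".split()
with_layer = {0.2: [clean, clean], 0.8: [clean, loop]}
without_layer = {0.2: [loop, clean], 0.8: [loop, loop]}
print(degeneration_incidence(with_layer))     # {0.2: 0.0, 0.8: 0.5}
print(degeneration_incidence(without_layer))  # {0.2: 0.5, 0.8: 1.0}
```

Feed `strip_status(slice_)` through the exact pipeline used for the full-state run so that the metadata is the only thing that changes, and report the two incidence curves side by side across the density sweep.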

Both of these come back to the same move: the most valuable test you can run is the one trying to prove your own result is fake. You’re both already doing it, which is more than most. Falsify-first, publish the negative, revise.
