Hi everyone,
Over the last few months I’ve been working on an independent research project exploring the internal dynamics of small and medium-sized language models.
Rather than evaluating models only by their outputs (benchmarks, perplexity, etc.), I’m trying to characterize how their hidden representations evolve during inference.
The project currently covers 7 open models:
The first part of the framework studies internal trajectories through hidden-state dynamics.
Instead of asking “Which model is more accurate?”, I ask:
This produced several reproducible dynamical fingerprints and architecture clusters.
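For readers less familiar with the idea, here is a minimal sketch of how a hidden-state trajectory could be summarized layer by layer. It is illustrative only: the function name `trajectory_stats`, the two statistics it returns, and the assumed input shape `(num_layers, d_model)` are assumptions made for this example, not necessarily the metrics behind the fingerprints mentioned above.

```python
import numpy as np

def trajectory_stats(hidden):
    """Summarize one token's path through the layers.

    hidden: array-like of shape (num_layers, d_model), one row per layer
            (hypothetical layout; adapt to however activations are stored).
    """
    hidden = np.asarray(hidden, dtype=float)
    # Layer-to-layer update vectors and their lengths.
    steps = np.diff(hidden, axis=0)
    step_norms = np.linalg.norm(steps, axis=1)
    # Cosine similarity between consecutive layers' states.
    norms = np.linalg.norm(hidden, axis=1)
    cosine = np.sum(hidden[:-1] * hidden[1:], axis=1) / (norms[:-1] * norms[1:])
    return {"step_norms": step_norms, "consecutive_cosine": cosine}
```

Curves like these, computed per model and averaged over prompts, are one simple way to compare how quickly and in which direction representations move with depth.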
The second phase moves away from pure dynamics and investigates whether different functional properties become linearly decodable inside hidden representations.
Across multiple probe experiments I observed evidence that:
One interesting observation is that the position of these high-capacity regions varies across architectures rather than appearing at identical absolute depths.
The result that surprised me the most came from a series of control experiments.
After training linear probes I compared:
Gaussian noise and feature permutation substantially reduced decodability.
Orthogonal rotations, however, preserved it almost entirely.
That suggests (at least empirically) that the functional signal depends more on the geometry of the representation space than on specific embedding dimensions.
This seems broadly consistent with ideas discussed in mechanistic interpretability about distributed feature directions.
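A minimal sketch of this kind of control comparison, run on synthetic data rather than real hidden states. Everything specific here is an assumption for illustration: the "hidden states" are Gaussian vectors with a label-dependent shift along one random direction, the probe is a plain least-squares classifier refit on each transformed copy, "feature permutation" is read as shuffling each feature column independently across samples, and helper names such as `run_controls` are hypothetical.

```python
import numpy as np

def fit_probe(X, y):
    """Least-squares linear probe with a bias term; y holds 0/1 labels."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(Xb, 2.0 * y - 1.0, rcond=None)
    return w

def probe_accuracy(w, X, y):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return float(np.mean((Xb @ w > 0) == (y == 1)))

def random_orthogonal(d, rng):
    """Random orthogonal matrix from the QR factorization of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # sign fix; columns stay orthonormal

def shuffle_columns(X, rng):
    """Shuffle each feature column across samples, independently per column."""
    Xs = X.copy()
    for j in range(Xs.shape[1]):
        Xs[:, j] = rng.permutation(Xs[:, j])
    return Xs

def run_controls(seed=0, n=2000, d=32, noise_scale=5.0):
    rng = np.random.default_rng(seed)
    # Synthetic stand-in for hidden states: label signal lives on one direction.
    direction = rng.standard_normal(d)
    direction /= np.linalg.norm(direction)
    y = rng.integers(0, 2, size=n)
    X = rng.standard_normal((n, d)) + 3.0 * np.outer(2 * y - 1, direction)

    variants = {
        "clean": X,
        "rotated": X @ random_orthogonal(d, rng),
        "noisy": X + noise_scale * rng.standard_normal((n, d)),
        "shuffled": shuffle_columns(X, rng),
    }
    half = n // 2  # first half trains the probe, second half evaluates it
    results = {}
    for name, Xv in variants.items():
        w = fit_probe(Xv[:half], y[:half])
        results[name] = probe_accuracy(w, Xv[half:], y[half:])
    return results

if __name__ == "__main__":
    for name, acc in run_controls().items():
        print(f"{name:>9}: {acc:.3f}")
```

On this toy data the rotated accuracy matches the clean one, while noise and column shuffling lower it. That holds for a probe refit after the rotation; a probe frozen in the original basis would need the same rotation applied to its weights to give the same predictions.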
Across several independent audits, the models repeatedly separate into two broad behavioral groups.
Cluster A
These models consistently exhibit similar dynamic and functional profiles.
Cluster B
Despite architectural differences, these models repeatedly cluster together across multiple analyses.
Seeing the same grouping emerge from different metrics was one of the motivations for continuing the project.
I’m now moving from observation toward causal testing.
The next experiments aim to answer questions such as:
This is entirely independent research, so I’d genuinely appreciate feedback.
I’m especially interested in hearing from people working on:
I’d love to know whether these observations resonate with existing work, or whether there are obvious control experiments I should run next.