I’m sharing an experimental research project called DRM Language Emitter.
It is a geometry-first language model lab for exploring generative AI without Transformer blocks, without self-attention, without Q/K/V attention, and without a KV cache inside the DRM model. The central idea is to treat language generation not as attention over a token window, but as controlled motion through a learned relational manifold.
In DRM, generation follows a different path:
token
-> latent state
-> active directions
-> learned relational metric
-> controlled latent motion
-> next latent state
-> token logits
The working hypothesis is simple:
Maybe language can be generated as motion through a learned relational state space.
This is not a production model.
This is not a claim that DRM is better than Transformers in general.
This is not a claim that DRM is better than world models in general.
It is a research scaffold for testing whether explicit geometry, active directions, latent dynamics, and learned metrics can become useful components for small language models and symbolic dynamics. The README describes the model’s central computation as a latent trajectory through state, directions, gates, metric, velocity, state update, and logits.
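As a rough illustration of that trajectory, one step from state to logits might look like the sketch below. It is plain Python rather than the repository's code, and every name in it (`drm_step`, `gate_weights`, `metric_diag`, `readout`, `step_size`) is hypothetical. The additive token injection, the softmax gating, and the diagonal-only metric are simplifying assumptions, not a description of the actual implementation.

```python
import math

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def drm_step(state, token_embedding, directions, gate_weights,
             metric_diag, readout, step_size=0.1):
    """One hypothetical step: state -> gates -> velocity -> next state -> logits."""
    # Inject the current token into the latent state (assumed additive).
    driven = [s + e for s, e in zip(state, token_embedding)]
    # Gates decide which directions are active for this step.
    gates = softmax(matvec(gate_weights, driven))
    # Raw motion is the gate-weighted mix of the candidate directions.
    raw = [sum(g * d[i] for g, d in zip(gates, directions))
           for i in range(len(state))]
    # The metric rescales the motion; a diagonal metric is assumed here.
    velocity = [r / m for r, m in zip(raw, metric_diag)]
    # Controlled update: a small step along the velocity.
    next_state = [s + step_size * v for s, v in zip(driven, velocity)]
    logits = matvec(readout, next_state)
    return next_state, gates, logits

# Toy sizes: latent dim 3, two directions, vocabulary of 4 tokens.
state = [0.0, 0.0, 0.0]
embedding = [0.5, -0.2, 0.1]
directions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
gate_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
metric_diag = [1.0, 2.0, 1.0]
readout = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]

next_state, gates, logits = drm_step(
    state, embedding, directions, gate_weights, metric_diag, readout)
```

The point of the sketch is only that the next state depends on the previous state and the current token, never on a stored window of earlier tokens.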
Most current language modeling research is organized around the Transformer paradigm.
That makes sense. Transformers work extremely well.
But I wanted to test a different question:
What happens if the model does not attend backward over a context window, but instead carries an evolving latent state through a learned geometry?
DRM Language Emitter is my attempt to explore that question in code.
The model has:
diag + U U^T
The model is autoregressive, but its memory is the evolving latent state rather than attention over a token sequence.
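To make the `diag + U U^T` form concrete, here is a minimal sketch of how a quadratic "action" of a velocity under such a metric could be computed without building the dense matrix. It assumes that the action is the quadratic form v^T (diag + U U^T) v, which is my reading and not something the repository confirms; the function names `metric_action` and `dense_action` are hypothetical.

```python
def metric_action(velocity, diag, low_rank):
    """Return v^T (diag + U U^T) v without building the dense matrix.

    low_rank is U as a list of rows with shape (dim, rank).
    """
    diag_term = sum(d * v * v for d, v in zip(diag, velocity))
    rank = len(low_rank[0])
    # Project the velocity onto each low-rank column: (U^T v)_k.
    projected = [sum(row[k] * v for row, v in zip(low_rank, velocity))
                 for k in range(rank)]
    return diag_term + sum(p * p for p in projected)

def dense_action(velocity, diag, low_rank):
    """Reference version that materializes G = diag + U U^T."""
    dim = len(velocity)
    metric = [[(diag[i] if i == j else 0.0)
               + sum(a * b for a, b in zip(low_rank[i], low_rank[j]))
               for j in range(dim)] for i in range(dim)]
    return sum(velocity[i] * metric[i][j] * velocity[j]
               for i in range(dim) for j in range(dim))

# Toy example: dim 2, rank 1.
velocity = [1.0, 2.0]
diag = [1.0, 1.0]
low_rank = [[1.0], [0.0]]
action = metric_action(velocity, diag, low_rank)  # 5.0 from diag + 1.0 from U
```

One reason to like this parameterization: if every diagonal entry is positive, the matrix is positive definite for any U, and the cost of the quadratic form grows with dim times rank rather than dim squared.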
DRM Language Emitter does not use:
nn.MultiheadAttention
Instead, it tries to make the geometry of generation explicit and measurable.
The project logs diagnostics such as cross-entropy, approximate perplexity, metric action, active dimension, gate entropy, metric norm, condition proxy, recurrence, stability, low-action path diagnostics, and symbolic world-modeling metrics.
That matters because I do not want the model to be only a black box that outputs tokens.
I want to inspect how it moves.
I want to measure whether the learned geometry collapses, stabilizes, expands, or forms useful trajectories.
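Two of the diagnostics listed above, gate entropy and active dimension, can be sketched in a few lines. The definitions here are assumptions on my part for illustration: Shannon entropy of a normalized gate vector, and its exponential as an "effective" number of active directions. The repository may define them differently, and the function names are hypothetical.

```python
import math

def gate_entropy(gates):
    """Shannon entropy (in nats) of a gate distribution that sums to 1."""
    return -sum(g * math.log(g) for g in gates if g > 0.0)

def effective_active_dimension(gates):
    """exp(entropy): how many directions are effectively in use."""
    return math.exp(gate_entropy(gates))

uniform = [0.25, 0.25, 0.25, 0.25]   # all four directions equally active
collapsed = [1.0, 0.0, 0.0, 0.0]     # a single direction dominates

uniform_dim = effective_active_dimension(uniform)
collapsed_dim = effective_active_dimension(collapsed)
```

Under these definitions, a value near the number of directions suggests the gates are spread out, while a value near 1 suggests collapse onto one direction.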
The repository includes:
src/drm_language_emitter/   DRM model package
transformer/                tiny Transformer baseline
world_model/                tiny symbolic world-model baseline
scripts/                    training, generation, evaluation, sweeps, dashboards
configs/                    DRM and benchmark configs
docs/                       math, limitations, competition notes, benchmark artifacts
tests/                      smoke and invariant tests
It is CPU-runnable, with CUDA optional. The latest local benchmark ran on CPU only, so stronger CUDA and time-matched comparisons are still future work.
The long-term research question is:
Can learned geometry become a useful primitive for language generation?
More specifically:
I do not know the final answer yet.
That is why the repo exists.
Install:
pip install -e .
Train a tiny DRM model:
python scripts/train_tiny.py --config configs/tiny.yaml --text data/tiny.txt
Generate text:
python scripts/generate.py --checkpoint runs/tiny/drm_tiny.pt --prompt "DRM "
Run geometry diagnostics:
python scripts/eval_geometry.py --checkpoint runs/tiny/drm_tiny.pt
python scripts/eval_geodesic_paths.py --checkpoint runs/tiny/drm_tiny.pt
At the end of the current README, I also include benchmark artifacts comparing:
The latest tiny symbolic world-model benchmark used a deterministic gridworld serialized as text. It produced:
runs: 72
aggregate rows: 24
The top result by next-state exact match was:
drm_tiny @ 2000 steps
next_state_exact_match = 0.0751
In the same benchmark, transformer_tiny_220k @ 3000 had a lower invalid-state rate of 0.0026, and the tiny supervised world model reached low CE but weak exact-match and rollout metrics.
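For readers unfamiliar with the two metrics named above, here is a minimal sketch of how an exact-match score and an invalid-state rate could be computed over text-serialized states. The serialization (one-row grids with a single agent marked `A`) and the validity rule are hypothetical stand-ins, not the benchmark's actual format, and the toy numbers are not benchmark results.

```python
def next_state_exact_match(predictions, targets):
    """Fraction of predicted states that equal the target string exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(1 for p, t in zip(predictions, targets) if p == t)
    return hits / len(targets)

def invalid_state_rate(predictions, is_valid):
    """Fraction of predicted states rejected by a caller-supplied validity check."""
    if not predictions:
        return 0.0
    return sum(1 for p in predictions if not is_valid(p)) / len(predictions)

def has_one_agent(state):
    """Hypothetical validity rule: exactly one agent marker in the grid."""
    return state.count("A") == 1

# Toy data only, to show the two metrics can disagree.
predictions = ["A..", ".A.", "AA.", "..A"]
targets = ["A..", "..A", ".A.", "..A"]

exact = next_state_exact_match(predictions, targets)
invalid = invalid_state_rate(predictions, has_one_agent)
```

A prediction can be a valid state and still be the wrong one, which is why a model can have a low invalid-state rate without leading on exact match.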
The honest interpretation is:
DRM showed an early signal on symbolic next-state prediction, but the absolute accuracy is still low. This is diagnostic, not decisive.
The benchmark does not prove that DRM is better than Transformers.
It does not prove that DRM is better than world models.
It does not say anything about large multimodal world models.
It only shows that this geometry-first emitter is now testable against baselines in a controlled tiny symbolic environment.
Allowed claim:
DRM Language Emitter is a functional non-Transformer language model prototype with explicit, measurable geometry and controlled tiny comparisons against Transformer and symbolic world-model baselines.
Not allowed:
DRM is broadly better than Transformers or world models.
That distinction matters.
I am currently working on:
Repository:
https://github.com/gnai-creator/drm-language-emitter
Feedback, criticism, reproduction attempts, and benchmark suggestions are very welcome.