Introducing DRM Language Emitter

A researcher released DRM Language Emitter, an experimental language model that generates text through learned geometric motion instead of Transformer attention. The project tests whether language can be modeled as controlled latent trajectories on a relational manifold, using no self-attention or KV cache. It is not a production system but a research scaffold for exploring geometry-first generative AI.

I’m sharing an experimental research project called DRM Language Emitter. It is a geometry-first language model lab for exploring generative AI without Transformer blocks, without self-attention, without Q/K/V attention, and without a KV cache inside the DRM model.

The central idea is to treat language generation not as attention over a token window, but as controlled motion through a learned relational manifold. In DRM, generation follows a different path:

    token -> latent state -> active directions -> learned relational metric -> controlled latent motion -> next latent state -> token logits

The working hypothesis is simple: maybe language can be generated as motion through a learned relational state space.

This is not a production model. This is not a claim that DRM is better than Transformers in general. This is not a claim that DRM is better than world models in general. It is a research scaffold for testing whether explicit geometry, active directions, latent dynamics, and learned metrics can become useful components for small language models and symbolic dynamics. The README describes the model’s central computation as a latent trajectory through state, directions, gates, metric, velocity, state update, and logits.

Most current language modeling research is organized around the Transformer paradigm. That makes sense: Transformers work extremely well. But I wanted to test a different question: what happens if the model does not attend backward over a context window, but instead carries an evolving latent state through a learned geometry? DRM Language Emitter is my attempt to explore that question in code.

The model has a learned metric of the form diag + U U^T. The model is autoregressive, but its memory is the evolving latent state rather than attention over a token sequence.

DRM Language Emitter does not use nn.MultiheadAttention. Instead, it tries to make the geometry of generation explicit and measurable.
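
To make the generation path above more concrete, here is a minimal sketch of one step in plain Python. It is not the repository's implementation. The parameter names (`embed`, `gate_weights`, `directions`, `diag`, `U`, `out_weights`, `step_size`), the additive token injection, the softmax gating, and the metric-normalized step are all assumptions chosen for illustration. The only detail taken from the post is the form diag + U U^T, read here as a diagonal-plus-low-rank metric.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def softmax(xs):
    """Numerically stable softmax."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def metric_action(diag, U, v):
    """Quadratic form v^T (diag + U U^T) v, without building the full matrix."""
    rank = len(U[0])
    # Project v onto the low-rank factor: (U^T v)_k = sum_i U[i][k] * v[i]
    proj = [sum(U[i][k] * v[i] for i in range(len(v))) for k in range(rank)]
    return sum(d * x * x for d, x in zip(diag, v)) + sum(p * p for p in proj)


def drm_step(state, token_id, params):
    """One hypothetical step: token -> state -> gates -> velocity -> next state -> logits."""
    # Assumption: the token enters by adding its embedding to the carried state.
    h = [s + e for s, e in zip(state, params["embed"][token_id])]
    # Assumption: softmax gates select a mixture of learned directions.
    gates = softmax(matvec(params["gate_weights"], h))
    velocity = [
        sum(g * direction[i] for g, direction in zip(gates, params["directions"]))
        for i in range(len(h))
    ]
    # Measure the velocity under the metric, then rescale so every step has
    # the same metric length ("controlled" motion in this sketch).
    action = metric_action(params["diag"], params["U"], velocity)
    scale = params["step_size"] / math.sqrt(action + 1e-8)
    next_state = [x + scale * v for x, v in zip(h, velocity)]
    logits = matvec(params["out_weights"], next_state)
    return next_state, logits, gates, action


def init_params(vocab=5, dim=4, n_dirs=3, rank=2, seed=0):
    """Random toy parameters; a real model would learn these."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    return {
        "embed": mat(vocab, dim),
        "gate_weights": mat(n_dirs, dim),
        "directions": mat(n_dirs, dim),
        "diag": [1.0] * dim,   # positive diagonal keeps the metric positive definite
        "U": mat(dim, rank),   # low-rank factor, shape dim x rank
        "out_weights": mat(vocab, dim),
        "step_size": 0.5,
    }


params = init_params()
state = [0.0] * 4
for token_id in [1, 3, 2]:  # feed a short token sequence; no attention, no cache
    state, logits, gates, action = drm_step(state, token_id, params)
print("next-token logits:", [round(x, 3) for x in logits])
```

In this sketch the only memory carried between tokens is `state`, which mirrors the post's description of an evolving latent state. The returned `action` and `gates` are the kind of per-step quantities a run could log as metric action and gate statistics, though the repository may define those diagnostics differently.
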
The project logs diagnostics such as cross-entropy, approximate perplexity, metric action, active dimension, gate entropy, metric norm, condition proxy, recurrence, stability, low-action path diagnostics, and symbolic world-modeling metrics. That matters because I do not want the model to be only a black box that outputs tokens. I want to inspect how it moves. I want to measure whether the learned geometry collapses, stabilizes, expands, or forms useful trajectories.

The repository includes:

    src/drm_language_emitter/   DRM model package
    transformer/                tiny Transformer baseline
    world_model/                tiny symbolic world-model baseline
    scripts/                    training, generation, evaluation, sweeps, dashboards
    configs/                    DRM and benchmark configs
    docs/                       math, limitations, competition notes, benchmark artifacts
    tests/                      smoke and invariant tests

It is CPU-runnable, with CUDA optional. The latest local benchmark reported CPU-only execution, so stronger CUDA and time-matched comparisons are still future work.

The long-term research question is: can learned geometry become a useful primitive for language generation? I do not know the final answer yet. That is why the repo exists.

Install:

    pip install -e .

Train a tiny DRM model:

    python scripts/train_tiny.py --config configs/tiny.yaml --text data/tiny.txt

Generate text:

    python scripts/generate.py --checkpoint runs/tiny/drm_tiny.pt --prompt "DRM "

Run geometry diagnostics:

    python scripts/eval_geometry.py --checkpoint runs/tiny/drm_tiny.pt
    python scripts/eval_geodesic_paths.py --checkpoint runs/tiny/drm_tiny.pt

At the end of the current README, I also include benchmark artifacts comparing DRM with the tiny Transformer and tiny symbolic world-model baselines. The latest tiny symbolic world-model benchmark used a deterministic gridworld serialized as text.
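
As a rough illustration of what a deterministic gridworld serialized as text and its next-state scores can look like, the sketch below defines a toy environment and two metrics. Everything in it is hypothetical: the `x,y` state format, the grid size, and the helper names (`step`, `serialize`, `is_valid_state`, `exact_match`, `invalid_rate`) are assumptions for illustration, not the repository's benchmark format or metric definitions, and the predictions are made up rather than real model outputs.

```python
GRID_SIZE = 4  # hypothetical 4x4 grid
MOVES = {"U": (0, -1), "D": (0, 1), "L": (-1, 0), "R": (1, 0)}


def step(pos, action):
    """Deterministic transition: move one cell, clamped to the grid walls."""
    dx, dy = MOVES[action]
    x = min(max(pos[0] + dx, 0), GRID_SIZE - 1)
    y = min(max(pos[1] + dy, 0), GRID_SIZE - 1)
    return (x, y)


def serialize(pos):
    """Write a state as text, e.g. (1, 2) -> '1,2'."""
    return f"{pos[0]},{pos[1]}"


def is_valid_state(text):
    """True if the text parses as an in-bounds 'x,y' state."""
    parts = text.split(",")
    if len(parts) != 2 or not all(p.isdecimal() for p in parts):
        return False
    return all(int(p) < GRID_SIZE for p in parts)


def exact_match(predictions, targets):
    """Fraction of predicted next states that equal the target string exactly."""
    hits = sum(p == t for p, t in zip(predictions, targets))
    return hits / len(targets)


def invalid_rate(predictions):
    """Fraction of predictions that are not well-formed, in-bounds states."""
    return sum(not is_valid_state(p) for p in predictions) / len(predictions)


# Build targets from the true dynamics, then score some made-up outputs.
cases = [((0, 0), "R"), ((3, 3), "D"), ((2, 1), "U"), ((0, 2), "L")]
targets = [serialize(step(pos, act)) for pos, act in cases]
predictions = ["1,0", "3,4", "2,0", "x,2"]  # made-up, not real model results

print(targets)                             # ['1,0', '3,3', '2,0', '0,2']
print(exact_match(predictions, targets))   # 0.5
print(invalid_rate(predictions))           # 0.5
```

Keeping the two scores separate is useful because a model can emit well-formed states that are still the wrong next state, so a low invalid-state rate does not by itself imply a high exact-match score.
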
The benchmark produced:

    runs: 72
    aggregate rows: 24

The top result by next-state exact match was:

    drm_tiny @ 2000 steps
    next_state_exact_match = 0.0751

In the same benchmark, transformer_tiny_220k @ 3000 had a lower invalid-state rate of 0.0026, and the tiny supervised world model reached low CE but weak exact-match and rollout metrics.

The honest interpretation is: DRM showed an early signal on symbolic next-state prediction, but the absolute accuracy is still low. This is diagnostic, not decisive. The benchmark does not prove that DRM is better than Transformers. It does not prove that DRM is better than world models. It does not say anything about large multimodal world models. It only shows that this geometry-first emitter is now testable against baselines in a controlled tiny symbolic environment.

Allowed claim: DRM Language Emitter is a functional non-Transformer language model prototype with explicit, measurable geometry and controlled tiny comparisons against Transformer and symbolic world-model baselines.

Not allowed: DRM is broadly better than Transformers or world models.

That distinction matters.

Repository: https://github.com/gnai-creator/drm-language-emitter

Feedback, criticism, reproduction attempts, and benchmark suggestions are very welcome.