{"slug": "introducing-drm-language-emitter-language-generation-as-motion-through-learned", "title": "Introducing DRM Language Emitter: Language Generation as Motion Through Learned Geometry", "summary": "A developer introduced DRM Language Emitter, an experimental language model that replaces the Transformer's attention mechanism with controlled latent motion through a learned relational manifold. The model generates language by evolving a latent state through a learned geometry, treating language generation as motion rather than attention over a context window. The repository includes tiny Transformer comparisons to benchmark the alternative approach.", "body_md": "Most language models today are built around the Transformer paradigm.\n\nThat makes sense.\n\nTransformers work.\n\nThey scale.\n\nThey dominate modern NLP.\n\nBut I wanted to explore a different question:\n\nWhat if language generation does not need to be modeled as attention over a context window?\n\nWhat if a model could generate language by carrying an evolving latent state through a learned geometry?\n\nThat is the idea behind **DRM Language Emitter**.\n\nRepository:\n\n[https://github.com/gnai-creator/drm-language-emitter](https://github.com/gnai-creator/drm-language-emitter)\n\n**DRM Language Emitter** is an experimental, geometry-first language model lab.\n\nIt is not a Transformer.\n\nInside the DRM model, it does not use:\n\n`nn.MultiheadAttention`\n\nInstead, it treats language generation as controlled motion through a learned relational manifold.\n\nThe basic flow is:\n\n```\ntoken\n  -> latent state z_t\n  -> active directions\n  -> learned relational metric\n  -> controlled latent motion\n  -> next latent state z_{t+1}\n  -> token logits\n```\n\nThe model is still autoregressive.\n\nBut its memory is not attention over a token sequence.\n\nIts memory is the evolving latent state.\n\nThe working hypothesis is:\n\nLanguage generation can be modeled as motion through a learned relational 
state space.\n\nThat means the model does not simply ask:\n\n```\nWhich previous tokens should I attend to?\n```\n\nIt asks something closer to:\n\n```\nWhere am I in latent space?\nWhich directions are active?\nHow expensive is movement under the learned metric?\nHow should the state move before emitting the next token?\n```\n\nThis is why I call it a geometry-first language emitter.\n\nThe architecture can be summarized as:\n\n```\ninput_ids\n   |\nTokenEmbedding\n   |\nfor each time step:\n   |\n   z_t\n   |\nDirectionField(z_t)\n   -> directions V(z_t)\n   -> gates a(z_t)\n   -> effective active dimension dimD\n   |\nRelationalMetric(z_t)\n   -> diag + U U^T\n   |\nDRMFlow(z_t, token_embedding, directions, gates)\n   -> dz\n   |\nMetric action g_z(dz, dz)\n   |\nStateUpdater\n   -> z_{t+1}\n   |\nLanguageEmitter(z_{t+1})\n   -> logits\n```\n\nA minimal conceptual version looks like this:\n\n```\nfor token in sequence:\n    embedding = token_embedding(token)\n\n    directions, gates = direction_field(z)\n    metric = relational_metric(z)\n\n    dz = drm_flow(z, embedding, directions, gates)\n    action = metric_action(metric, dz)\n\n    z = state_updater(z, dz)\n    logits = language_emitter(z)\n```\n\nThe important part is that the model has an explicit internal geometry.\n\nIt can log and measure:\n\nThis makes the model interesting not only as a generator, but also as an object of study.\n\nA Transformer is the correct baseline.\n\nThat is why the repository includes tiny Transformer comparisons.\n\nBut the goal of DRM is not to replace Transformers by declaration.\n\nThe goal is to test whether a different computational primitive can be useful in small regimes.\n\nThe Transformer primitive is attention.\n\nThe DRM primitive is controlled latent motion under a learned metric.\n\nThese are very different assumptions.\n\nA Transformer builds context by looking backward.\n\nDRM carries context by evolving state forward.\n\nA Transformer computes token-token 
interactions.\n\nDRM computes state-motion-emission dynamics.\n\nBecause geometry gives us measurable structure.\n\nIf language is treated as a trajectory, we can ask questions like:\n\nThis opens the door to diagnostics that are harder to express in a standard black-box token predictor.\n\nThe goal is not mystical geometry.\n\nThe goal is measurable geometry.\n\nThe repository contains:\n\n```\nsrc/drm_language_emitter/   DRM model package\ntransformer/                tiny Transformer baseline\nworld_model/                tiny symbolic world-model baseline\nscripts/                    training, generation, evaluation, sweeps, dashboards\nconfigs/                    DRM and benchmark configs\ndocs/                       math, limitations, competition notes, benchmark artifacts\ntests/                      smoke and invariant tests\n```\n\nThe project is CPU-runnable.\n\nCUDA is optional.\n\nInstall:\n\n```\npip install -e .\n```\n\nTrain a tiny DRM model:\n\n```\npython scripts/train_tiny.py \\\n  --config configs/tiny.yaml \\\n  --text data/tiny.txt\n```\n\nGenerate text:\n\n```\npython scripts/generate.py \\\n  --checkpoint runs/tiny/drm_tiny.pt \\\n  --prompt \"DRM \"\n```\n\nRun geometry diagnostics:\n\n```\npython scripts/eval_geometry.py \\\n  --checkpoint runs/tiny/drm_tiny.pt\n\npython scripts/eval_geodesic_paths.py \\\n  --checkpoint runs/tiny/drm_tiny.pt\n```\n\nThe repository also includes a small symbolic benchmark.\n\nThis benchmark compares:\n\nThe task is a deterministic symbolic gridworld serialized as text.\n\nThe models need to predict symbolic transitions such as:\n\n```\nstate + action -> next state + reward + done\n```\n\nThis is not visual world modeling.\n\nThis is not a benchmark against large multimodal world models.\n\nIt is a tiny symbolic text-world designed to test whether models can learn discrete dynamics expressed as language.\n\nThe benchmark reports:\n\nThis is important because low loss alone does not necessarily mean correct 
symbolic dynamics.\n\nA model can learn token-level regularities while still failing to predict exact state transitions.\n\nThe completed benchmark produced:\n\n```\nruns: 72\naggregate rows: 24\n```\n\nTop results by next-state exact match:\n\n| Model | Steps | Family | Next-state exact match | Rollout exact match | Best CE | Invalid state rate | Params |\n|---|---|---|---|---|---|---|---|\n| `drm_tiny` | 2000 | DRM | 0.0751 | 0.0058 | 0.5511 | 0.1328 | 92,710 |\n| `transformer_tiny_220k` | 3000 | Transformer | 0.0563 | 0.0000 | 0.4008 | 0.0026 | 220,208 |\n| `transformer_tiny_93k` | 2000 | Transformer | 0.0516 | 0.0000 | 0.4594 | 0.2969 | 93,872 |\n| `world_model_tiny` | 2000 | World Model | 0.0476 | 0.0000 | 0.2573 | 0.4668 | 102,051 |\n| `world_model_tiny` | 3000 | World Model | 0.0415 | 0.0000 | 0.2497 | 0.4668 | 102,051 |\n\nThe most interesting part is not that DRM “wins everything”.\n\nIt does not.\n\nThe result is more nuanced:\n\nSo the honest interpretation is:\n\nDRM shows an early signal on symbolic next-state prediction, but the benchmark is still diagnostic, not decisive.\n\nFor me, the most important takeaway is:\n\nLow token-level cross-entropy does not automatically imply correct symbolic transition modeling.\n\nThat matters for world-model-like tasks.\n\nIf a model is supposed to represent dynamics, then we should not only ask whether it predicts likely tokens.\n\nWe should also ask whether it predicts valid states, exact transitions, and coherent rollouts.\n\nI am not claiming that DRM is better than Transformers in general.\n\nI am not claiming that DRM is better than world models in general.\n\nI am not claiming that this benchmark says anything about large multimodal world models.\n\nI am not claiming robust long-horizon planning.\n\nThis is a small research scaffold.\n\nThe results are early.\n\nThe exact-match values are still low.\n\nThe model needs more work.\n\nDRM Language Emitter is a functional non-Transformer language model 
prototype.\n\nIt has explicit, measurable geometry.\n\nIt can be compared against Transformer and symbolic world-model baselines.\n\nAnd in a tiny symbolic text-world benchmark, it showed an interesting signal on next-state exact match.\n\nThat is enough to keep investigating.\n\nGenerate the dataset:\n\n```\npython scripts/make_tiny_world_dataset.py \\\n  --output-root data/tiny_world \\\n  --seed 1 \\\n  --grid-size 5 \\\n  --num-train 20000 \\\n  --num-val 2000 \\\n  --max-rollout-len 8\n```\n\nRun the sweep:\n\n```\npython scripts/sweep_world_model_competition.py \\\n  --steps 1000 2000 3000 \\\n  --seeds 1 2 3 \\\n  --dataset-root data/tiny_world \\\n  --output-root runs/world_model_competition\n```\n\nGenerate the dashboard:\n\n```\npython scripts/make_world_model_dashboard.py \\\n  --root runs/world_model_competition \\\n  --title \"DRM vs Transformer vs Tiny Symbolic World Model\"\n```\n\nThe next things I want to improve are:\n\nThis project started from a simple intuition:\n\nMaybe language generation can be treated as movement.\n\nNot metaphorically.\n\nComputationally.\n\nA token enters.\n\nA state moves.\n\nA geometry shapes the motion.\n\nA new token is emitted.\n\nThat is DRM Language Emitter.\n\nRepository:\n\n[https://github.com/gnai-creator/drm-language-emitter](https://github.com/gnai-creator/drm-language-emitter)\n\nFeedback, criticism, reproduction attempts, and benchmark suggestions are welcome.", "url": "https://wpnews.pro/news/introducing-drm-language-emitter-language-generation-as-motion-through-learned", "canonical_source": "https://dev.to/felipe_muniz_grsba/introducing-drm-language-emitter-language-generation-as-motion-through-learned-geometry-3a5l", "published_at": "2026-06-18 07:33:44+00:00", "updated_at": "2026-06-18 07:51:26.622196+00:00", "lang": "en", "topics": ["large-language-models", "machine-learning", "ai-research", "developer-tools"], "entities": ["DRM Language Emitter", "Transformer", "GitHub"], "alternates": {"html": 
"https://wpnews.pro/news/introducing-drm-language-emitter-language-generation-as-motion-through-learned", "markdown": "https://wpnews.pro/news/introducing-drm-language-emitter-language-generation-as-motion-through-learned.md", "text": "https://wpnews.pro/news/introducing-drm-language-emitter-language-generation-as-motion-through-learned.txt", "jsonld": "https://wpnews.pro/news/introducing-drm-language-emitter-language-generation-as-motion-through-learned.jsonld"}}