Source: dev.to · Topic: large-language-models

Introducing DRM Language Emitter: Language Generation as Motion Through Learned Geometry

A developer introduced DRM Language Emitter, an experimental language model that replaces the Transformer's attention mechanism with controlled latent motion through a learned relational manifold. The model generates language by evolving a latent state through a learned geometry, treating language generation as motion rather than attention over a context window. The repository includes tiny Transformer comparisons to benchmark the alternative approach.

Published Jun 18, 2026 · 6 min read

Most language models today are built around the Transformer paradigm.

That makes sense.

Transformers work.

They scale.

They dominate modern NLP.

But I wanted to explore a different question:

What if language generation does not need to be modeled as attention over a context window?

What if a model could generate language by carrying an evolving latent state through a learned geometry?

That is the idea behind DRM Language Emitter.

Repository:

https://github.com/gnai-creator/drm-language-emitter

DRM Language Emitter is an experimental, geometry-first language model lab.

It is not a Transformer.

Inside the DRM model, it does not use:

nn.MultiheadAttention

Instead, it treats language generation as controlled motion through a learned relational manifold.

The basic flow is:

token
  -> latent state z_t
  -> active directions
  -> learned relational metric
  -> controlled latent motion
  -> next latent state z_{t+1}
  -> token logits

The model is still autoregressive.

But its memory is not attention over a token sequence.

Its memory is the evolving latent state.

The working hypothesis is:

Language generation can be modeled as motion through a learned relational state space.

That means the model does not simply ask:

Which previous tokens should I attend to?

It asks something closer to:

Where am I in latent space?
Which directions are active?
How expensive is movement under the learned metric?
How should the state move before emitting the next token?

This is why I call it a geometry-first language emitter.

The architecture can be summarized as:

input_ids
   |
TokenEmbedding
   |
for each time step:
   |
   z_t
   |
DirectionField(z_t)
   -> directions V(z_t)
   -> gates a(z_t)
   -> effective active dimension dimD
   |
RelationalMetric(z_t)
   -> diag + U U^T
   |
DRMFlow(z_t, token_embedding, directions, gates)
   -> dz
   |
Metric action g_z(dz, dz)
   |
StateUpdater
   -> z_{t+1}
   |
LanguageEmitter(z_{t+1})
   -> logits
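The metric step in the diagram can be made concrete. The following is a minimal sketch, assuming the metric at z_t is the matrix diag(d) + U U^T with positive entries in d, so that g_z(dz, dz) = sum_i d_i dz_i^2 + ||U^T dz||^2. The function name and the plain-list representation are illustrative choices, not the repository's API.

```python
def metric_action(diag, U, dz):
    """Quadratic form dz^T (diag(d) + U U^T) dz, without building the full matrix.

    diag: list of n positive floats (assumed diagonal part of the metric)
    U:    list of n rows, each with r floats (assumed low-rank factor)
    dz:   list of n floats (proposed latent displacement)
    """
    n = len(dz)
    rank = len(U[0]) if U else 0
    # Diagonal contribution: sum_i d_i * dz_i^2
    diag_term = sum(d * v * v for d, v in zip(diag, dz))
    # Low-rank contribution: squared norm of U^T dz
    proj = [sum(U[i][k] * dz[i] for i in range(n)) for k in range(rank)]
    low_rank_term = sum(p * p for p in proj)
    return diag_term + low_rank_term


# Toy values, chosen only for illustration
diag = [1.0, 2.0, 0.5]
U = [[1.0], [0.0], [2.0]]
dz = [0.1, -0.2, 0.3]
action = metric_action(diag, U, dz)
```

With a positive diagonal, this form is positive for any nonzero dz, and it costs on the order of n * r operations instead of n^2, which is one plausible reason to choose a diagonal plus low-rank parameterization.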

A minimal conceptual version looks like this:

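# z is the latent state carried across steps; it must be initialized before the loop
for token in sequence:
    embedding = token_embedding(token)

    # Geometry at the current state
    directions, gates = direction_field(z)
    metric = relational_metric(z)

    # Proposed motion and its cost under the metric, g_z(dz, dz)
    dz = drm_flow(z, embedding, directions, gates)
    action = metric_action(metric, dz)

    # Move the state, then emit logits from the new state
    z = state_updater(z, dz)
    logits = language_emitter(z)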
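To make the loop above executable, here is a toy sketch in pure Python. Everything in it is a stand-in chosen for illustration: the sizes, the random linear maps, the sigmoid gates, the tanh update, and the diagonal metric 1 + z_i^2 are all assumptions, and none of them are taken from the repository.

```python
import math
import random

random.seed(0)
VOCAB, DIM, K = 5, 4, 2  # toy vocabulary size, state size, number of directions


def rand_matrix(rows, cols, scale=0.3):
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


E = rand_matrix(VOCAB, DIM)                        # token embeddings
W_dir = [rand_matrix(DIM, DIM) for _ in range(K)]  # one linear map per direction
W_gate = rand_matrix(K, DIM)                       # gate logits
W_in = rand_matrix(DIM, DIM)                       # token drive
W_out = rand_matrix(VOCAB, DIM)                    # emitter


def step(z, token):
    emb = E[token]
    directions = [matvec(W, z) for W in W_dir]                       # V(z)
    gates = [1.0 / (1.0 + math.exp(-g)) for g in matvec(W_gate, z)]  # a(z) in (0, 1)
    drive = matvec(W_in, emb)
    # Proposed motion: token drive plus a gated sum of the directions
    dz = [drive[i] + sum(gates[k] * directions[k][i] for k in range(K))
          for i in range(DIM)]
    # Stand-in diagonal metric; the action is the cost of the proposed motion
    action = sum((1.0 + zi * zi) * d * d for zi, d in zip(z, dz))
    z_next = [math.tanh(zi + d) for zi, d in zip(z, dz)]             # bounded update
    logits = matvec(W_out, z_next)
    return z_next, logits, action


z = [0.0] * DIM
for token in [0, 3, 1]:
    z, logits, action = step(z, token)
```

The only memory carried between tokens is the list z; no earlier token is revisited.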

The important part is that the model has an explicit internal geometry.

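It can log and measure quantities from that geometry at every step, such as the gates, the effective active dimension, and the metric action.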

This makes the model interesting not only as a generator, but also as an object of study.

A Transformer is the correct baseline.

That is why the repository includes tiny Transformer comparisons.

But the goal of DRM is not to replace Transformers by declaration.

The goal is to test whether a different computational primitive can be useful in small regimes.

The Transformer primitive is attention.

The DRM primitive is controlled latent motion under a learned metric.

These are very different assumptions.

A Transformer builds context by looking backward.

DRM carries context by evolving state forward.

A Transformer computes token-token interactions.

DRM computes state-motion-emission dynamics.
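One practical consequence of the contrast above is how memory grows during generation. The sketch below is a rough illustration, assuming a standard Transformer decoder with key/value caching and a model whose only memory is a single fixed-size state. The function names and the example sizes are hypothetical.

```python
def transformer_kv_cache_floats(num_tokens, num_layers, d_model):
    # One key vector and one value vector per layer for every past token
    return 2 * num_layers * num_tokens * d_model


def latent_state_floats(state_dim):
    # A single latent state, independent of how many tokens were seen
    return state_dim


cache_100 = transformer_kv_cache_floats(num_tokens=100, num_layers=2, d_model=64)
cache_200 = transformer_kv_cache_floats(num_tokens=200, num_layers=2, d_model=64)
state = latent_state_floats(state_dim=64)
```

The cache doubles when the sequence doubles, while the state size stays constant. The trade-off is that everything the model needs to remember has to fit in that one state.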

Geometry matters because it gives us measurable structure.

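If language is treated as a trajectory, we can ask questions about the path itself, such as how far the state moves, which directions are active, and how costly the motion is under the learned metric.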

This opens the door to diagnostics that are harder to express in a standard black-box token predictor.

The goal is not mystical geometry.

The goal is measurable geometry.

The repository contains:

src/drm_language_emitter/   DRM model package
transformer/                tiny Transformer baseline
world_model/                tiny symbolic world-model baseline
scripts/                    training, generation, evaluation, sweeps, dashboards
configs/                    DRM and benchmark configs
docs/                       math, limitations, competition notes, benchmark artifacts
tests/                      smoke and invariant tests

The project is CPU-runnable.

CUDA is optional.

Install:

pip install -e .

Train a tiny DRM model:

python scripts/train_tiny.py \
  --config configs/tiny.yaml \
  --text data/tiny.txt

Generate text:

python scripts/generate.py \
  --checkpoint runs/tiny/drm_tiny.pt \
  --prompt "DRM "

Run geometry diagnostics:

python scripts/eval_geometry.py \
  --checkpoint runs/tiny/drm_tiny.pt

python scripts/eval_geodesic_paths.py \
  --checkpoint runs/tiny/drm_tiny.pt

The repository also includes a small symbolic benchmark.

This benchmark compares the DRM model, tiny Transformer baselines, and a tiny symbolic world-model baseline.

The task is a deterministic symbolic gridworld serialized as text.

The models need to predict symbolic transitions such as:

state + action -> next state + reward + done

This is not visual world modeling.

This is not a benchmark against large multimodal world models.

It is a tiny symbolic text-world designed to test whether models can learn discrete dynamics expressed as language.
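As an illustration of what such a text-world can look like, here is a minimal sketch of a deterministic 5x5 gridworld whose transitions are serialized as text. The line format, the goal cell, the reward rule, and the action names are assumptions made for this example; the dataset script in the repository may use a different encoding.

```python
GRID_SIZE = 5
GOAL = (4, 4)  # assumed goal cell
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def transition(x, y, action):
    dx, dy = MOVES[action]
    # Clamp to the grid, so moving into a wall leaves the agent in place
    nx = min(max(x + dx, 0), GRID_SIZE - 1)
    ny = min(max(y + dy, 0), GRID_SIZE - 1)
    done = (nx, ny) == GOAL
    reward = 1 if done else 0
    return nx, ny, reward, done


def serialize(x, y, action):
    nx, ny, reward, done = transition(x, y, action)
    return (f"state x={x} y={y} action={action} -> "
            f"next x={nx} y={ny} reward={reward} done={int(done)}")


goal_line = serialize(4, 3, "down")
wall_line = serialize(0, 0, "left")
```

Because the dynamics are deterministic, every serialized line has exactly one correct continuation, which is what makes exact-match scoring meaningful.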

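The benchmark reports next-state exact match, rollout exact match, best cross-entropy (CE), invalid state rate, and parameter count.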

This is important because low loss alone does not necessarily mean correct symbolic dynamics.

A model can learn token-level regularities while still failing to predict exact state transitions.
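The gap between token-level and state-level correctness is easy to see in a small scoring sketch. The state format `x=<int> y=<int>`, the validity rule, and the function names below are assumptions for illustration, not the repository's evaluation code.

```python
import re

STATE_RE = re.compile(r"^x=(\d+) y=(\d+)$")


def is_valid_state(text, grid_size=5):
    # Valid means: parses as a state and both coordinates lie inside the grid
    m = STATE_RE.match(text)
    return bool(m) and all(0 <= int(v) < grid_size for v in m.groups())


def score(predictions, targets, grid_size=5):
    n = len(targets)
    exact = sum(p == t for p, t in zip(predictions, targets))
    invalid = sum(not is_valid_state(p, grid_size) for p in predictions)
    return {
        "next_state_exact_match": exact / n,
        "invalid_state_rate": invalid / n,
    }


targets = ["x=1 y=2", "x=0 y=0", "x=4 y=4", "x=3 y=1"]
predictions = ["x=1 y=2", "x=0 y=1", "x=4 y=9", "x=3 y="]
metrics = score(predictions, targets)
```

The second prediction differs from its target by a single character, so a token-level loss would treat it as nearly right, yet it is a different state and counts as a miss. The third and fourth predictions are not valid states at all.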

The completed benchmark produced:

runs: 72
aggregate rows: 24

Top results by next-state exact match:

Model                  Steps  Family       Next-state exact match  Rollout exact match  Best CE  Invalid state rate  Params
drm_tiny               2000   DRM          0.0751                  0.0058               0.5511   0.1328              92,710
transformer_tiny_220k  3000   Transformer  0.0563                  0.0000               0.4008   0.0026              220,208
transformer_tiny_93k   2000   Transformer  0.0516                  0.0000               0.4594   0.2969              93,872
world_model_tiny       2000   World Model  0.0476                  0.0000               0.2573   0.4668              102,051
world_model_tiny       3000   World Model  0.0415                  0.0000               0.2497   0.4668              102,051

The most interesting part is not that DRM “wins everything”.

It does not.

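The result is more nuanced. Among the rows above, DRM has the highest next-state exact match (0.0751) and the only nonzero rollout exact match (0.0058), but it also has the highest cross-entropy (0.5511). The world-model baseline reaches the lowest cross-entropy (0.2497), and the 220k-parameter Transformer has the lowest invalid state rate (0.0026).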

So the honest interpretation is:

DRM shows an early signal on symbolic next-state prediction, but the benchmark is still diagnostic, not decisive.

For me, the most important takeaway is:

Low token-level cross-entropy does not automatically imply correct symbolic transition modeling.

That matters for world-model-like tasks.

If a model is supposed to represent dynamics, then we should not only ask whether it predicts likely tokens.

We should also ask whether it predicts valid states, exact transitions, and coherent rollouts.
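A rollout check is stricter than a single-step check because each prediction is fed back as the next input. The sketch below assumes states are (x, y) tuples on a 5x5 grid and uses a deliberately flawed stand-in model; all names are hypothetical.

```python
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def true_step(state, action, size=5):
    # Ground-truth dynamics: move, then clamp to the grid
    dx, dy = MOVES[action]
    return (min(max(state[0] + dx, 0), size - 1),
            min(max(state[1] + dy, 0), size - 1))


def sloppy_step(state, action):
    # Stand-in for an imperfect model: it ignores walls
    dx, dy = MOVES[action]
    return (state[0] + dx, state[1] + dy)


def rollout_exact_match(predict_next, start, actions, true_states):
    state = start
    for action, expected in zip(actions, true_states):
        state = predict_next(state, action)  # feed the prediction back in
        if state != expected:
            return False                     # one wrong step fails the rollout
    return True


start = (3, 0)
actions = ["right", "right", "down"]
true_states = []
s = start
for a in actions:
    s = true_step(s, a)
    true_states.append(s)

perfect = rollout_exact_match(true_step, start, actions, true_states)
flawed = rollout_exact_match(sloppy_step, start, actions, true_states)
```

The flawed model gets the first step right and still fails the rollout, because its second step walks through the wall. This is one way a model can post a nonzero next-state score and a rollout score of zero.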

I am not claiming that DRM is better than Transformers in general.

I am not claiming that DRM is better than world models in general.

I am not claiming that this benchmark says anything about large multimodal world models.

I am not claiming robust long-horizon planning.

This is a small research scaffold.

The results are early.

The exact-match values are still low.

The model needs more work.

DRM Language Emitter is a functional non-Transformer language model prototype.

It has explicit, measurable geometry.

It can be compared against Transformer and symbolic world-model baselines.

And in a tiny symbolic text-world benchmark, it showed an interesting signal on next-state exact match.

That is enough to keep investigating.

Generate the dataset:

python scripts/make_tiny_world_dataset.py \
  --output-root data/tiny_world \
  --seed 1 \
  --grid-size 5 \
  --num-train 20000 \
  --num-val 2000 \
  --max-rollout-len 8

Run the sweep:

python scripts/sweep_world_model_competition.py \
  --steps 1000 2000 3000 \
  --seeds 1 2 3 \
  --dataset-root data/tiny_world \
  --output-root runs/world_model_competition
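The reported counts of 72 runs and 24 aggregate rows are consistent with these sweep flags under two assumptions: every model configuration is run once per (steps, seed) pair, and aggregation averages over seeds. The number of configurations below is inferred from the arithmetic, not stated in the article.

```python
steps = [1000, 2000, 3000]   # from the sweep command
seeds = [1, 2, 3]            # from the sweep command
total_runs = 72              # reported
aggregate_rows = 24          # reported

runs_per_config = len(steps) * len(seeds)             # one run per (steps, seed) pair
implied_configs = total_runs // runs_per_config       # inferred, not stated
rows_if_seed_averaged = implied_configs * len(steps)  # one row per (config, steps)
```

Under those assumptions the sweep covers 8 model configurations, and averaging over the 3 seeds yields exactly the 24 aggregate rows.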

Generate the dashboard:

python scripts/make_world_model_dashboard.py \
  --root runs/world_model_competition \
  --title "DRM vs Transformer vs Tiny Symbolic World Model"

The next things I want to improve are:

This project started from a simple intuition:

Maybe language generation can be treated as movement.

Not metaphorically.

Computationally.

A token enters.

A state moves.

A geometry shapes the motion.

A new token is emitted.

That is DRM Language Emitter.

Repository:

https://github.com/gnai-creator/drm-language-emitter

Feedback, criticism, reproduction attempts, and benchmark suggestions are welcome.
