Introducing DRM Language Emitter: Language Generation as Motion Through Learned Geometry

wpnews.pro

Most language models today are built around the Transformer paradigm.

That makes sense.

Transformers work.

They scale.

They dominate modern NLP.

But I wanted to explore a different question:

What if language generation does not need to be modeled as attention over a context window?

What if a model could generate language by carrying an evolving latent state through a learned geometry?

That is the idea behind DRM Language Emitter.

Repository:

https://github.com/gnai-creator/drm-language-emitter

DRM Language Emitter is an experimental, geometry-first language model lab.

It is not a Transformer.

Inside the DRM model, it does not use:

nn.MultiheadAttention

Instead, it treats language generation as controlled motion through a learned relational manifold.

The basic flow is:

token
  -> latent state z_t
  -> active directions
  -> learned relational metric
  -> controlled latent motion
  -> next latent state z_{t+1}
  -> token logits

The model is still autoregressive.

But its memory is not attention over a token sequence.

Its memory is the evolving latent state.

The working hypothesis is:

Language generation can be modeled as motion through a learned relational state space.

That means the model does not simply ask:

Which previous tokens should I attend to?

It asks something closer to:

Where am I in latent space?
Which directions are active?
How expensive is movement under the learned metric?
How should the state move before emitting the next token?

This is why I call it a geometry-first language emitter.

The architecture can be summarized as:

input_ids
   |
TokenEmbedding
   |
for each time step:
   |
   z_t
   |
DirectionField(z_t)
   -> directions V(z_t)
   -> gates a(z_t)
   -> effective active dimension dimD
   |
RelationalMetric(z_t)
   -> diag + U U^T
   |
DRMFlow(z_t, token_embedding, directions, gates)
   -> dz
   |
Metric action g_z(dz, dz)
   |
StateUpdater
   -> z_{t+1}
   |
LanguageEmitter(z_{t+1})
   -> logits
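Reading the diagram literally, the relational metric is a diagonal term plus a low-rank term, and the metric action is the quadratic form it induces on the proposed motion dz. The following is a worked expansion under assumptions that the article does not state: the diagonal entries d_i(z) are strictly positive, and U(z) is a D × r matrix whose rank r is much smaller than D. The names d, U, D and r are illustrative.

```latex
g_z = \operatorname{diag}\big(d(z)\big) + U(z)\,U(z)^{\top},
\qquad
g_z(dz, dz) = dz^{\top} g_z\, dz
= \sum_{i=1}^{D} d_i(z)\, dz_i^{2} + \big\lVert U(z)^{\top} dz \big\rVert_2^{2}
\;>\; 0 \quad \text{for } dz \neq 0 .
```

Under these assumptions the action is strictly positive for any nonzero motion, so it behaves like a movement cost. The low-rank factor lets the metric couple latent dimensions while storing only O(Dr) values instead of a full D × D matrix.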

A minimal conceptual version looks like this:

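# z holds the latent state carried from step to step (initialized before the loop)
for token in sequence:
    # embed the incoming token
    embedding = token_embedding(token)

    # state-dependent active directions, gates, and local metric
    directions, gates = direction_field(z)
    metric = relational_metric(z)

    # proposed latent motion and its cost under the metric, g_z(dz, dz)
    dz = drm_flow(z, embedding, directions, gates)
    action = metric_action(metric, dz)

    # advance the state to z_{t+1}, then emit next-token logits from it
    z = state_updater(z, dz)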
    logits = language_emitter(z)
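To make that loop concrete, here is a dependency-free toy version that actually runs. Every component in it is an assumption made for illustration: tiny random weights, sigmoid gates, a softplus diagonal plus low-rank metric, a gated sum of directions as the flow, and a tanh-bounded additive state update. It mirrors the function names in the loop above, but it is not the repository's implementation.

```python
import math
import random

# Hypothetical toy sizes, not taken from the repository.
VOCAB, D, K, R = 5, 4, 2, 2  # vocab size, state dim, number of directions, metric rank
rng = random.Random(0)

def mat(rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

EMB = mat(VOCAB, D)                    # token_embedding table
W_DIR = [mat(D, D) for _ in range(K)]  # direction_field: one map per direction
W_GATE = mat(K, D)                     # direction_field: gate logits
W_DIAG = mat(D, D)                     # relational_metric: diagonal part
W_U = [mat(D, D) for _ in range(R)]    # relational_metric: low-rank factors
W_OUT = mat(VOCAB, D)                  # language_emitter

def direction_field(z):
    dirs = [matvec(w, z) for w in W_DIR]
    gates = [1.0 / (1.0 + math.exp(-g)) for g in matvec(W_GATE, z)]  # sigmoid gates in (0, 1)
    return dirs, gates

def relational_metric(z):
    # softplus keeps the diagonal positive, so diag + U U^T is positive definite
    diag = [math.log1p(math.exp(x)) for x in matvec(W_DIAG, z)]
    U = [matvec(w, z) for w in W_U]  # R column vectors of length D
    return diag, U

def drm_flow(z, emb, dirs, gates):
    # toy flow: gated sum of directions, each scaled by its alignment with the token
    # (z enters only through dirs and gates here)
    dz = [0.0] * D
    for v, a in zip(dirs, gates):
        coeff = a * math.tanh(dot(v, emb))
        dz = [x + coeff * vi for x, vi in zip(dz, v)]
    return dz

def metric_action(metric, dz):
    diag, U = metric
    # g_z(dz, dz) = sum_i diag_i * dz_i^2 + sum_r (u_r . dz)^2
    return sum(d * x * x for d, x in zip(diag, dz)) + sum(dot(u, dz) ** 2 for u in U)

def state_updater(z, dz, step=0.1):
    return [math.tanh(zi + step * dzi) for zi, dzi in zip(z, dz)]

def language_emitter(z):
    return matvec(W_OUT, z)

z = [0.1] * D
actions = []
for token in [0, 3, 1, 4]:
    embedding = EMB[token]
    directions, gates = direction_field(z)
    metric = relational_metric(z)
    dz = drm_flow(z, embedding, directions, gates)
    actions.append(metric_action(metric, dz))
    z = state_updater(z, dz)
    logits = language_emitter(z)

print(len(logits), [round(a, 4) for a in actions])
```

The per-step `actions` list is the kind of quantity one could log as a motion cost; how the real model uses it during training is not shown here.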

The important part is that the model has an explicit internal geometry.

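It can log and measure, at every step, quantities such as the active directions and gates, the effective active dimension, and the metric action g_z(dz, dz).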

This makes the model interesting not only as a generator, but also as an object of study.
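The article does not define how the effective active dimension is computed from the gates. One common way to turn a vector of non-negative activations into an "effective count" is the participation ratio, sketched below as an assumption; the repository may use a different definition, and the gate logs are made up.

```python
def effective_dimension(gates):
    """Participation ratio (sum a)^2 / sum a^2 of non-negative gate values.

    Equals k when k gates are equally active and the rest are zero,
    and 0 when every gate is closed. (Assumed definition, for illustration.)
    """
    total = sum(gates)
    sq = sum(a * a for a in gates)
    return 0.0 if sq == 0.0 else total * total / sq

# Hypothetical per-step gate logs for a 4-direction field.
gate_log = [
    [1.0, 1.0, 0.0, 0.0],
    [0.9, 0.1, 0.1, 0.0],
    [0.5, 0.5, 0.5, 0.5],
]
for t, gates in enumerate(gate_log):
    print(t, round(effective_dimension(gates), 3))
```

A measure like this is convenient for logging because it is continuous: it shrinks smoothly as activity concentrates on fewer directions instead of jumping when a gate crosses a threshold.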

A Transformer is the correct baseline.

That is why the repository includes tiny Transformer comparisons.

But the goal of DRM is not to replace Transformers by declaration.

The goal is to test whether a different computational primitive can be useful in small regimes.

The Transformer primitive is attention.

The DRM primitive is controlled latent motion under a learned metric.

These are very different assumptions.

A Transformer builds context by looking backward.

DRM carries context by evolving state forward.

A Transformer computes token-token interactions.

DRM computes state-motion-emission dynamics.
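The "looking backward" versus "evolving forward" contrast can be seen in how memory grows. The sketch below is deliberately simplified and hedged: the attention side is single-head with no learned projections (keys and values are the raw inputs), and the state side is a generic decay-and-tanh update standing in for DRM's flow, not its actual dynamics.

```python
import math

def attention_step(cache, x):
    """Transformer-style memory (toy): append x, then attend over everything seen so far."""
    cache.append(x)
    scores = [sum(a * b for a, b in zip(k, x)) / math.sqrt(len(x)) for k in cache]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * k[i] for w, k in zip(weights, cache)) / total for i in range(len(x))]

def state_step(z, x, decay=0.9):
    """Recurrent-style memory (toy stand-in for DRM): fold x into a fixed-size state."""
    return [math.tanh(decay * zi + xi) for zi, xi in zip(z, x)]

inputs = [[0.1 * t, -0.05 * t, 0.2] for t in range(6)]
cache, z = [], [0.0, 0.0, 0.0]
for x in inputs:
    context = attention_step(cache, x)
    z = state_step(z, x)

print(len(cache), len(z))  # 6 3: the cache grows with the sequence, the state does not
```

The trade-off is the usual one: the cache keeps every past input available for exact lookup, while a fixed-size state must compress the past into whatever the update rule preserves.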

Why geometry? Because it gives us measurable structure.

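If language is treated as a trajectory, we can ask questions about that trajectory, such as which directions are active at each step and how costly each movement is under the learned metric.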

This opens the door to diagnostics that are harder to express in a standard black-box token predictor.
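As an example of trajectory-level diagnostics, the sketch below summarizes a sequence of latent states z_0..z_T with three hypothetical measures: the total metric action, the Euclidean path length, and a straightness ratio (endpoint displacement over path length). None of these names come from the repository; they are illustrative choices.

```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def trajectory_diagnostics(states, step_actions):
    """Summaries of a latent trajectory z_0..z_T (hypothetical diagnostics).

    states: list of state vectors; step_actions: g_z(dz, dz) for each step.
    """
    path_length = sum(euclid(a, b) for a, b in zip(states, states[1:]))
    displacement = euclid(states[0], states[-1])
    return {
        "total_action": sum(step_actions),
        "path_length": path_length,
        # 1.0 for a straight path, closer to 0 for a path that wanders or loops back
        "straightness": displacement / path_length if path_length > 0 else 1.0,
    }

states = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(trajectory_diagnostics(states, [1.0, 2.0, 0.5]))
# total_action 3.5, path_length 3.0, straightness 1/3
```

The path length here ignores the learned metric; the metric enters only through the per-step actions, which is why both are reported side by side.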

The goal is not mystical geometry.

The goal is measurable geometry.

The repository contains:

src/drm_language_emitter/   DRM model package
transformer/                tiny Transformer baseline
world_model/                tiny symbolic world-model baseline
scripts/                    training, generation, evaluation, sweeps, dashboards
configs/                    DRM and benchmark configs
docs/                       math, limitations, competition notes, benchmark artifacts
tests/                      smoke and invariant tests

The project is CPU-runnable.

CUDA is optional.

Install:

pip install -e .

Train a tiny DRM model:

python scripts/train_tiny.py \
  --config configs/tiny.yaml \
  --text data/tiny.txt

Generate text:

python scripts/generate.py \
  --checkpoint runs/tiny/drm_tiny.pt \
  --prompt "DRM "

Run geometry diagnostics:

python scripts/eval_geometry.py \
  --checkpoint runs/tiny/drm_tiny.pt

python scripts/eval_geodesic_paths.py \
  --checkpoint runs/tiny/drm_tiny.pt

The repository also includes a small symbolic benchmark.

This benchmark compares DRM against the tiny Transformer baseline and the tiny symbolic world-model baseline.

The task is a deterministic symbolic gridworld serialized as text.

The models need to predict symbolic transitions such as:

state + action -> next state + reward + done

This is not visual world modeling.

This is not a benchmark against large multimodal world models.

It is a tiny symbolic text-world designed to test whether models can learn discrete dynamics expressed as language.
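To illustrate what "discrete dynamics expressed as language" can look like, here is a sketch of a deterministic gridworld transition serialized as one line of text. Only the 5 × 5 grid size comes from the dataset command in this article; the action set, goal cell, reward rule, edge clamping, and text layout are all assumptions, and the repository's actual format may differ.

```python
# Hypothetical dynamics and serialization; not the repository's actual format.
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def step(pos, action, goal, size=5):
    """Deterministic transition: move one cell, clamped to the grid edges."""
    dx, dy = MOVES[action]
    x = min(max(pos[0] + dx, 0), size - 1)
    y = min(max(pos[1] + dy, 0), size - 1)
    done = (x, y) == goal
    reward = 1 if done else 0
    return (x, y), reward, done

def serialize(pos, action, next_pos, reward, done):
    """Render state + action -> next state + reward + done as one text line."""
    return (f"state: agent={pos[0]},{pos[1]} action: {action} -> "
            f"next: agent={next_pos[0]},{next_pos[1]} reward: {reward} done: {int(done)}")

goal = (4, 4)
pos = (3, 4)
for action in ["left", "right", "right"]:
    next_pos, reward, done = step(pos, action, goal)
    print(serialize(pos, action, next_pos, reward, done))
    pos = next_pos
```

Because the dynamics are deterministic, every prompt has exactly one correct continuation, which is what makes exact-match metrics meaningful for this task.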

The benchmark reports next-state exact match, rollout exact match, best cross-entropy (CE), and invalid state rate.

This is important because low loss alone does not necessarily mean correct symbolic dynamics.

A model can learn token-level regularities while still failing to predict exact state transitions.
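A small example shows how token-level quality and exact transitions can diverge. The sketch below uses character accuracy as a stand-in for token-level quality and assumes a hypothetical "agent=x,y" state format on a 5 × 5 grid; the metric names echo the benchmark's, but their exact definitions in the repository are not shown in this article.

```python
def char_accuracy(pred, target):
    """Fraction of target characters matched position-by-position (a token-level proxy)."""
    hits = sum(p == t for p, t in zip(pred, target))
    return hits / len(target)

def parse_state(text, size=5):
    """Parse a hypothetical 'agent=x,y' state; return None if it is not a valid grid state."""
    try:
        key, value = text.split("=")
        x, y = (int(v) for v in value.split(","))
    except ValueError:
        return None
    return (x, y) if key == "agent" and 0 <= x < size and 0 <= y < size else None

def evaluate(predictions, targets):
    n = len(targets)
    return {
        "char_accuracy": sum(char_accuracy(p, t) for p, t in zip(predictions, targets)) / n,
        "exact_match": sum(p == t for p, t in zip(predictions, targets)) / n,
        "invalid_rate": sum(parse_state(p) is None for p in predictions) / n,
    }

def rollout_exact_match(pred_rollouts, target_rollouts):
    """A rollout counts only if every predicted step matches exactly."""
    return sum(p == t for p, t in zip(pred_rollouts, target_rollouts)) / len(target_rollouts)

targets = ["agent=2,4", "agent=3,4", "agent=4,4"]
predictions = ["agent=2,3", "agent=3,4", "agent=4,9"]
print(evaluate(predictions, targets))
print(rollout_exact_match([predictions], [targets]))
```

Here roughly 93% of characters are right, yet only one of three next states is exact, one prediction is not even a valid grid state, and the three-step rollout fails as a whole. Errors compound over a rollout, which is consistent with rollout exact match sitting well below next-state exact match.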

The completed benchmark produced:

runs: 72
aggregate rows: 24

Top results by next-state exact match:

| Model | Steps | Family | Next-state exact match | Rollout exact match | Best CE | Invalid state rate | Parameters |
|---|---|---|---|---|---|---|---|
| `drm_tiny` | 2000 | DRM | 0.0751 | 0.0058 | 0.5511 | 0.1328 | 92,710 |
| `transformer_tiny_220k` | 3000 | Transformer | 0.0563 | 0.0000 | 0.4008 | 0.0026 | 220,208 |
| `transformer_tiny_93k` | 2000 | Transformer | 0.0516 | 0.0000 | 0.4594 | 0.2969 | 93,872 |
| `world_model_tiny` | 2000 | World Model | 0.0476 | 0.0000 | 0.2573 | 0.4668 | 102,051 |
| `world_model_tiny` | 3000 | World Model | 0.0415 | 0.0000 | 0.2497 | 0.4668 | 102,051 |

The most interesting part is not that DRM “wins everything”.

It does not.

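The result is more nuanced. Among the top results above, `drm_tiny` has the highest next-state exact match and the only nonzero rollout exact match, but also the highest best CE. The tiny world model reaches the lowest CE with lower exact match, and `transformer_tiny_220k` has by far the lowest invalid state rate.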

So the honest interpretation is:

DRM shows an early signal on symbolic next-state prediction, but the benchmark is still diagnostic, not decisive.

For me, the most important takeaway is:

Low token-level cross-entropy does not automatically imply correct symbolic transition modeling.

That matters for world-model-like tasks.

If a model is supposed to represent dynamics, then we should not only ask whether it predicts likely tokens.

We should also ask whether it predicts valid states, exact transitions, and coherent rollouts.
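The five rows of the "Top results by next-state exact match" table already illustrate this. The sketch below re-sorts them by next-state exact match and by best CE (assuming lower CE is better). It uses only the reported aggregate values and ignores seed spread and the rows not shown, so it is a reading of the table, not a statistical test.

```python
# Rows copied from the "Top results by next-state exact match" table:
# (model, steps, next-state exact match, best CE)
rows = [
    ("drm_tiny", 2000, 0.0751, 0.5511),
    ("transformer_tiny_220k", 3000, 0.0563, 0.4008),
    ("transformer_tiny_93k", 2000, 0.0516, 0.4594),
    ("world_model_tiny", 2000, 0.0476, 0.2573),
    ("world_model_tiny", 3000, 0.0415, 0.2497),
]

# Highest exact match first vs. lowest cross-entropy first
by_exact = [f"{name}@{steps}" for name, steps, *_ in sorted(rows, key=lambda r: -r[2])]
by_ce = [f"{name}@{steps}" for name, steps, *_ in sorted(rows, key=lambda r: r[3])]

print("best next-state exact match first:", by_exact)
print("lowest cross-entropy first:       ", by_ce)
```

The two orderings are nearly reversed: the row with the best exact match has the highest CE among these rows, and the rows with the lowest CE have the lowest exact match.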

I am not claiming that DRM is better than Transformers in general.

I am not claiming that DRM is better than world models in general.

I am not claiming that this benchmark says anything about large multimodal world models.

I am not claiming robust long-horizon planning.

This is a small research scaffold.

The results are early.

The exact-match values are still low.

The model needs more work.

DRM Language Emitter is a functional non-Transformer language model prototype.

It has explicit, measurable geometry.

It can be compared against Transformer and symbolic world-model baselines.

And in a tiny symbolic text-world benchmark, it showed an interesting signal on next-state exact match.

That is enough to keep investigating.

Generate the dataset:

python scripts/make_tiny_world_dataset.py \
  --output-root data/tiny_world \
  --seed 1 \
  --grid-size 5 \
  --num-train 20000 \
  --num-val 2000 \
  --max-rollout-len 8

Run the sweep:

python scripts/sweep_world_model_competition.py \
  --steps 1000 2000 3000 \
  --seeds 1 2 3 \
  --dataset-root data/tiny_world \
  --output-root runs/world_model_competition
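Assuming the completed benchmark used this sweep grid, the reported counts (72 runs, 24 aggregate rows) can be checked with a bit of arithmetic. The number of model configurations is inferred here, not stated in the article.

```python
steps = [1000, 2000, 3000]     # --steps in the sweep command
seeds = [1, 2, 3]              # --seeds in the sweep command
runs, aggregate_rows = 72, 24  # counts reported for the completed benchmark

# Inferred: runs = configurations x step budgets x seeds
configs = runs // (len(steps) * len(seeds))
assert configs * len(steps) * len(seeds) == runs

# One aggregate row per (configuration, step budget), i.e. averaged over seeds
print(configs, configs * len(steps) == aggregate_rows)
```

Under that assumption the sweep covers 8 model configurations, and each aggregate row summarizes 3 seeds at one step budget.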

Generate the dashboard:

python scripts/make_world_model_dashboard.py \
  --root runs/world_model_competition \
  --title "DRM vs Transformer vs Tiny Symbolic World Model"

The next things I want to improve are:

This project started from a simple intuition:

Maybe language generation can be treated as movement.

Not metaphorically.

Computationally.

A token enters.

A state moves.

A geometry shapes the motion.

A new token is emitted.

That is DRM Language Emitter.

Repository:

https://github.com/gnai-creator/drm-language-emitter

Feedback, criticism, reproduction attempts, and benchmark suggestions are welcome.

Originally published on dev.to.
