OLMo-core + Engram graft: 2B/600M-A debug comparison


[ARTICLE · art-39586] src=discuss.huggingface.co ↗ pub=2026-06-25T16:14Z topic=large-language-models verified=true sentiment=↑ positive


A researcher ran a 200-step debug comparison between a base OLMo3 600M model and a variant with a DeepSeek-style Engram memory graft, and found the graft stable with improved early learning behavior. The Engram variant added roughly 1B trainable parameters but only 40k more active parameters per token, and the integration required careful handling of FSDP/HSDP wrapping and optimizer policies.


I ran a 200-step debug comparison, with global_batch_size=32, between a base OLMo3 600M model and the same dense backbone with a DeepSeek-style Engram memory graft.

The goal was to check whether the custom module was wired correctly, whether FSDP/HSDP wrapping and optimizer handling were stable, and whether the training/eval curves looked coherent.

Setup

Base model:

~676M trainable parameters

Engram variant:

~1.7B trainable parameters

Engram injected into layers 1 and 5

Most of the added parameters come from sparse/hash-memory capacity, but the Engram variant used only 40k more active parameters per token than the dense backbone.
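The gap between total and active parameters can be illustrated with a minimal sketch of a hashed lookup table. Everything here is an assumption made for illustration: the `HashMemory` class, the `slot_indices` helper, and the table sizes are hypothetical and are not the configuration used in this run.

```python
from dataclasses import dataclass


@dataclass
class HashMemory:
    """Toy model of a hashed memory table (hypothetical, illustrative sizes)."""
    num_slots: int    # rows in the hashed embedding table
    dim: int          # width of each row
    num_lookups: int  # rows read per token, e.g. one per hash head

    def total_params(self) -> int:
        # Every row is trainable, so capacity grows with the table size.
        return self.num_slots * self.dim

    def active_params_per_token(self) -> int:
        # A single token only touches the rows selected by its hashes.
        return self.num_lookups * self.dim


def slot_indices(ngram: tuple[int, ...], num_lookups: int, num_slots: int) -> list[int]:
    # Hypothetical multi-head hashing: one deterministic slot per head.
    return [hash((head, ngram)) % num_slots for head in range(num_lookups)]


memory = HashMemory(num_slots=1_000_000, dim=512, num_lookups=8)
print(memory.total_params())             # 512000000
print(memory.active_params_per_token())  # 4096
print(slot_indices((17, 42, 7), memory.num_lookups, memory.num_slots))
```

With these made-up sizes the table holds 512M trainable values while a token reads only 4,096 of them, which is the same shape of tradeoff as the one reported above.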

Both are trained with the Dion optimizer.

What I observed

Under the same short debug setup, the Engram variant showed:

The early signal is encouraging: the Engram graft is training-shaped, stable, and appears to improve early learning behavior in this setup.

Main systems lesson

Custom architecture work is not just “does the forward pass run?”

For this integration, the parameter hierarchy, wrapping policy, optimizer handling, memory profile, and training curves all had to line up. Earlier versions were mathematically correct and trained, but had poor memory behavior because the custom modules were not placed inside the wrapped block hierarchy.
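Why placement matters can be sketched with a toy module tree. This is not PyTorch or OLMo-core code; it only models the general FSDP-style rule that a parameter belongs to the shard group of its nearest wrapped ancestor. The `Module` class, the `shard_groups` and `build` helpers, the block count, and the parameter counts are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Module:
    """Toy stand-in for a module tree node (not torch.nn.Module)."""
    name: str
    own_params: int = 0
    wrapped: bool = False  # True if this module gets its own shard group
    children: list[Module] = field(default_factory=list)


def shard_groups(root: Module) -> dict[str, int]:
    """Sum parameters per shard group: each module joins its nearest wrapped ancestor."""
    groups: dict[str, int] = {}

    def visit(module: Module, owner: str) -> None:
        if module.wrapped:
            owner = module.name
        groups[owner] = groups.get(owner, 0) + module.own_params
        for child in module.children:
            visit(child, owner)

    visit(root, root.name)
    return groups


def build(engram_inside_block: bool) -> Module:
    # Illustrative sizes: a large memory module next to small transformer blocks.
    engram = Module("engram", own_params=500)
    blocks = [Module(f"block{i}", own_params=100, wrapped=True) for i in range(3)]
    top_level = [Module("embeddings", own_params=50), *blocks]
    if engram_inside_block:
        blocks[1].children.append(engram)  # nested inside a wrapped block
    else:
        top_level.append(engram)           # attached beside the blocks
    return Module("model", wrapped=True, children=top_level)


print(shard_groups(build(engram_inside_block=False)))
# {'model': 550, 'block0': 100, 'block1': 100, 'block2': 100}
print(shard_groups(build(engram_inside_block=True)))
# {'model': 50, 'block0': 100, 'block1': 600, 'block2': 100}
```

In the first layout the memory parameters fall into the root group instead of being gathered and released together with the block that uses them, which is one plausible way a mathematically correct graft could still show a poor memory profile.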

W&B logs: [Weights & Biases](https://wandb.ai/jenwei0312/olmo3-engram-experiments)

I wrote an additional tradeoff analysis from the ablation design and eval metric point of view. Cross-posting here for completeness.

Post here



