OLMo-core + Engram graft: 2B/600M-A debug comparison

A researcher ran a 200-step debug comparison between a base OLMo3 600M model and a DeepSeek-style Engram memory graft variant, finding the graft stable and showing improved early learning behavior. The Engram variant added ~1B parameters but only 40k more active parameters per token, and the integration required careful handling of FSDP/HSDP wrapping and optimizer policies.

I ran a 200-step debug comparison, with a global batch size of 32, between a base OLMo3 600M model and the same dense backbone with a DeepSeek-style Engram memory graft. The goal was to check whether the custom module was wired correctly, whether FSDP/HSDP wrapping and optimizer handling were stable, and whether the training/eval curves looked coherent.

Setup

Base model:
- ~676M trainable parameters

Engram variant:
- ~1.7B trainable parameters
- Engram injected into layers 1 and 5
- Most added parameters come from sparse/hash-memory capacity, but the active parameter count per token with Engram was only 40k higher than that of the dense backbone.

Both models were trained with the Dion optimizer (https://github.com/microsoft/dion/).

What I observed

Under the same short debug setup, the Engram variant showed:

The early signal is encouraging: the Engram graft is training-shaped, stable, and appears to improve early learning behavior in this setup.

Main systems lesson

Custom architecture work is not just “does the forward pass run?” For this integration, the parameter hierarchy, wrapping policy, optimizer handling, memory profile, and training curves all had to line up. Earlier versions trained correctly from a mathematical standpoint, but had poor memory behavior because the custom modules were not placed inside the wrapped block hierarchy.

W&B logs: https://wandb.ai/jenwei0312/olmo3-engram-experiments

I also wrote a tradeoff analysis from the ablation-design and eval-metric point of view. Cross-posting it here for completeness.
Post here: https://www.linkedin.com/posts/jenweiprofile_share-machinelearning-deeplearning-ugcPost-7475318099727003649-eHys/?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAXh_gcBauwhR3xIRxe6bS1hQWshSOirhVI
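
To make the wrapping lesson above a bit more concrete, here is a minimal sketch of the kind of sanity check that can catch a misplaced custom module before launching a run. It is not the code used in this experiment. It assumes each transformer block is wrapped as its own FSDP unit, and that any parameter living outside the blocks ends up in the root unit. All parameter names and sizes below are hypothetical and exist only for illustration. In a real setup, the name-to-size mapping would come from something like `model.named_parameters()`.

```python
from collections import defaultdict


def group_params_by_wrap_unit(param_sizes, block_prefix="blocks."):
    """Sum parameter counts per wrapped unit, based on dotted parameter names.

    param_sizes: dict mapping parameter name -> number of elements.
    block_prefix: hypothetical prefix under which each child
        (blocks.0, blocks.1, ...) is assumed to be wrapped individually.
    Parameters outside any block are attributed to "<root>".
    """
    units = defaultdict(int)
    for name, numel in param_sizes.items():
        if name.startswith(block_prefix):
            # "blocks.1.engram.table.weight" -> unit "blocks.1"
            block_id = name[len(block_prefix):].split(".", 1)[0]
            units[block_prefix + block_id] += numel
        else:
            units["<root>"] += numel
    return dict(units)


# Hypothetical toy model: one memory table sits inside a block,
# the other was accidentally registered at the top level.
toy_params = {
    "embeddings.weight": 1000,
    "blocks.0.attention.w_q.weight": 100,
    "blocks.1.attention.w_q.weight": 100,
    "blocks.1.engram.table.weight": 5000,  # inside the wrapped hierarchy
    "engram_5.table.weight": 5000,         # outside: lands in the root unit
}

units = group_params_by_wrap_unit(toy_params)
for unit, numel in sorted(units.items()):
    print(f"{unit}: {numel}")
```

In this toy case the misplaced table inflates the root unit well beyond the embeddings alone, which is the kind of imbalance worth looking for when the math is right but the memory profile is not.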