Gefen: Optimized Stochastic Optimizer

wpnews.pro


[ARTICLE · art-27560] src=arxiv.org ↗ pub=2026-06-15T04:00Z topic=machine-learning verified=true sentiment=↑ positive


Researchers propose Gefen, a memory-efficient optimizer that reduces AdamW's memory footprint by ~8x while maintaining performance, enabling larger microbatches and improved throughput in deep learning training. Gefen automatically shares second-moment estimates across parameter blocks and quantizes the first moment using a learned codebook, reducing memory by 6.5 GiB per billion parameters. The method is validated across diverse experiments and is available as a drop-in replacement for AdamW.
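The two headline figures can be checked against each other with a short calculation. This is a sketch under assumptions the source does not state: both AdamW moment buffers are stored in fp32 (4 bytes per value) and the reduction is exactly 8x. The function names are illustrative only.

```python
GIB = 2 ** 30  # bytes per GiB


def adamw_state_bytes(num_params, bytes_per_value=4):
    """Bytes held by AdamW's two parameter-sized moment buffers (fp32 assumed)."""
    return 2 * num_params * bytes_per_value


def saved_gib(num_params, reduction_factor=8.0, bytes_per_value=4):
    """GiB saved if optimizer state shrinks by `reduction_factor` (assumed exact)."""
    full = adamw_state_bytes(num_params, bytes_per_value)
    return (full - full / reduction_factor) / GIB


if __name__ == "__main__":
    n = 1_000_000_000  # one billion parameters
    print(f"AdamW states: {adamw_state_bytes(n) / GIB:.2f} GiB")
    print(f"Saved at 8x:  {saved_gib(n):.2f} GiB")
```

Under these assumptions the saving comes out at roughly 6.5 GiB per billion parameters, consistent with the figure quoted above.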

Published Jun 15, 2026 · 1 min read

arXiv:2606.13894v1. Abstract: AdamW is a default optimizer for modern deep learning, but its first and second moment states add roughly two parameter-sized buffers to training memory. We propose Gefen, a memory-efficient optimizer that automatically shares second-moment estimates across parameter blocks and quantizes the first moment using a learned codebook. This reduces AdamW's memory footprint by ~8x while maintaining the same performance, a saving of 6.5 GiB per billion parameters. The method is motivated by a theoretical result showing that large mixed Hessian entries constrain the ratio of squared gradients toward one, suggesting that Hessian-aligned parameters are natural candidates for sharing second-moment statistics. Since computing Hessians is impractical at scale, Gefen infers block structure from the initial squared gradients, requiring no architecture-specific metadata or hyperparameters beyond AdamW defaults. Gefen learns an exact histogram-based dynamic-programming quantization codebook and reuses the same blocks for first-moment scaling. Across diverse experiments, Gefen achieves the lowest peak optimizer memory among the compared AdamW-like methods while maintaining AdamW-level performance. In FSDP and DDP training, the reduced memory footprint enables larger microbatches and improves throughput significantly over AdamW. Gefen is therefore a practical drop-in replacement whose memory savings can be spent on higher throughput, larger models, or larger batch sizes. We provide the complete Python implementation, including fused CUDA kernels, at https://github.com/ndvbd/Gefen
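To make the idea of sharing second-moment estimates across parameter blocks concrete, here is a minimal pure-Python sketch of an AdamW-style step in which each block keeps a single second-moment scalar. It is not the authors' implementation: the block partition is a hypothetical, hand-specified list of index ranges (Gefen infers blocks from the initial squared gradients), the function name and signature are invented for illustration, and first-moment quantization is left out.

```python
import math


def shared_v_adamw_step(params, grads, m, v_blocks, blocks, step,
                        lr=1e-3, beta1=0.9, beta2=0.999,
                        eps=1e-8, weight_decay=0.01):
    """One AdamW-like update with one second-moment scalar per block.

    params, grads, m: flat lists of floats with equal length.
    v_blocks: one float per block (the shared second moment).
    blocks: list of (start, end) ranges; a hypothetical fixed partition.
    All state is updated in place; `step` starts at 1.
    """
    bc1 = 1.0 - beta1 ** step  # bias correction, first moment
    bc2 = 1.0 - beta2 ** step  # bias correction, second moment
    for b, (start, end) in enumerate(blocks):
        # The block's shared statistic is the mean squared gradient.
        mean_sq = sum(g * g for g in grads[start:end]) / (end - start)
        v_blocks[b] = beta2 * v_blocks[b] + (1.0 - beta2) * mean_sq
        denom = math.sqrt(v_blocks[b] / bc2) + eps
        for i in range(start, end):
            m[i] = beta1 * m[i] + (1.0 - beta1) * grads[i]
            params[i] -= lr * weight_decay * params[i]  # decoupled decay
            params[i] -= lr * (m[i] / bc1) / denom


if __name__ == "__main__":
    params = [1.0, 1.0, 1.0, 1.0]
    grads = [0.1, 0.1, 2.0, 2.0]
    m = [0.0] * 4
    v_blocks = [0.0, 0.0]  # 2 scalars instead of 4 per-parameter values
    shared_v_adamw_step(params, grads, m, v_blocks, [(0, 2), (2, 4)], step=1)
    print(params, v_blocks)
```

In this toy setup the second-moment state shrinks from one value per parameter to one value per block, so the first moment is what remains parameter-sized.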

source & further reading


Read original on arxiv.org → arxiv.org/abs/2606.13894
