Ångstrom used Claude Code to train a model that beat Meta's UMA-OMC

Ångstrom AI (YC S24), with the University of Cambridge (the Csanyi group) and AstraZeneca, released the paper DFT Accuracy on Crystal Structure Prediction with Machine Learning Interatomic Potentials. The paper presented CSP-MACE-Å, a machine learning model designed to replace DFT, the expensive quantum mechanical calculation at the heart of crystal structure prediction, with the same accuracy but a 10,000x speedup.

CSP-MACE-Å also significantly outperformed UMA-OMC on crystal-structure prediction benchmarks. UMA is Meta's general-purpose model for atoms and molecules; UMA-OMC is the version adapted for organic molecular crystals.

Ångstrom built CSP-MACE-Å on anycloud, a CLI that runs GPU jobs across your own cloud accounts. Ångstrom pointed Claude Code at anycloud: the agent called the anycloud CLI to drive the experiment loop, resulting in roughly 100,000 GPU jobs, almost entirely on multi-cloud spot, on their own cloud accounts.

Why CSP-MACE-Å matters to AstraZeneca

Crystal structure prediction (CSP) answers a deceptively simple question: given a molecule, what solid crystal structures can it form? It matters because one molecule can pack into different crystal structures (known as polymorphs) with different physical characteristics. This creates a major risk for pharmaceutical development, especially when late-appearing forms emerge during manufacturing or storage and alter product performance. In 1998, that nearly sank the HIV drug ritonavir. The drug had to be pulled and reformulated when an unexpected, more stable crystal form of the same molecule appeared 2 years after market release. This cost Abbott more than $250 million. Veritasium tells the story well in the video The Crystal That Could Destroy All Medicine. Drugmakers therefore need to map all the possible crystal forms of a molecule before release, to reduce the risk of an unexpected shift to a more stable form that could render the drug unusable once it has been distributed.

The workhorse of CSP is DFT (density functional theory). DFT is a quantum-mechanical calculation that serves as the gold standard for CSP in industry and academia. However, DFT is extremely expensive and slow. The calculations for one molecule can take days to weeks, which slows down the scientists using it and caps how many structures they can explore.

Ångstrom’s machine learning model, CSP-MACE-Å, is 10,000 times faster than DFT. Calculations go from taking weeks with DFT to minutes with CSP-MACE-Å. Not only does this save scientists time, but it ultimately means that far more candidate crystal structures may be evaluated, providing greater confidence when derisking crystal forms.
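As a rough check on the "weeks to minutes" claim, the arithmetic can be sketched as below. The 7-day DFT runtime is an illustrative assumption, not a figure from the paper; only the 10,000x factor comes from the text, and `accelerated_minutes` is a hypothetical helper name.

```python
def accelerated_minutes(dft_days: float, speedup: float = 10_000.0) -> float:
    """Convert a DFT wall-clock time in days to minutes after applying a speedup."""
    dft_minutes = dft_days * 24 * 60
    return dft_minutes / speedup


# Hypothetical example: one week of DFT at a 10,000x speedup.
print(round(accelerated_minutes(7), 3))
```

Under that assumption, a week of DFT compute shrinks to roughly one minute, which is consistent with the order of magnitude the article describes.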

CSP-MACE-Å was also shown to outperform Meta's UMA-OMC model across Ångstrom’s and AstraZeneca's evaluation suites. Meta's UMA-OMC was the previous state-of-the-art machine learning interatomic potential for CSP; however, its accuracy was inferior to gold-standard DFT. CSP-MACE-Å is the first model to demonstrate the accuracy of DFT for CSP, delivering a massive speed improvement without sacrificing accuracy.

The agent-driven experiment loop

The bottleneck in developing CSP-MACE-Å at Ångstrom was the speed at which the team could iterate on the loop that underlies many AI research orgs: forming a hypothesis, deciding what computational experiments to run to test it, launching the GPU jobs, pulling results back, analyzing the results, and deciding on the next hypothesis to test. All the while, the team also had to keep GPU costs down and manage hardware failures (and bugs!).

Ångstrom researchers used Claude Code in that loop. They talked through what computational experiments to run, which batches of jobs to launch, what outputs to compare, and what plots/metrics would answer the current question. Claude then turned that plan into concrete work: launching batches of anycloud jobs, monitoring status, pulling down results, and generating plots and summaries for the next research decision.

Claude used the same local anycloud CLI and cloud configuration the team used by hand. The researchers stayed focused on the experiment plan and interpretation; Claude handled the execution: the fan-out and bookkeeping between decisions. However, the same fan-out that made the loop fast also made it dangerous: the wrong batch of GPU jobs could become thousands of dollars of real spend before anyone noticed.
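The shape of that loop can be sketched in Python. This is a minimal illustration, not Ångstrom's code: `Experiment`, `run_loop`, and the injected `plan`, `launch`, `collect`, and `analyze` callables are hypothetical names, and no real anycloud command is invoked here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Experiment:
    """One planned computational experiment (hypothetical structure)."""
    name: str
    params: Dict[str, float]


def run_loop(
    plan: Callable[[Optional[Dict[str, float]]], List[Experiment]],
    launch: Callable[[Experiment], str],
    collect: Callable[[str], float],
    analyze: Callable[[Dict[str, float]], Dict[str, float]],
    rounds: int,
) -> List[Dict[str, float]]:
    """Iterate plan -> launch -> collect -> analyze for a fixed number of rounds."""
    history: List[Dict[str, float]] = []
    summary: Optional[Dict[str, float]] = None
    for _ in range(rounds):
        batch = plan(summary)                         # decide what to run next
        job_ids = {e.name: launch(e) for e in batch}  # fan out one job per experiment
        results = {name: collect(job_id) for name, job_id in job_ids.items()}
        summary = analyze(results)                    # feeds the next planning step
        history.append(summary)
    return history
```

Passing the launcher in as a callable keeps the sketch independent of any particular CLI; in the workflow described above, the researchers own `plan` and the interpretation of `analyze`, while the agent handles the fan-out in between.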

How anycloud kept the AI research experiment loop under control

"anycloud gives me the confidence to really let my agents loose without stressing that they will burn through all our compute. These days they continue to work throughout night, autonomously managing my research experiments, while I sleep."

Laurence Midgley, Co-founder & CTO, Ångstrom AI

The most recent anycloud feature that Ångstrom loves is spend controls scoped to the agent session. This proved to be the primitive researchers needed to comfortably let their agents manage research experiments autonomously. The framing was not "give the agent cloud access" but "let this session spend up to X today / Y per hour." That matters because a runaway agent could keep launching GPU work, which could turn into thousands of dollars of wasted cloud spend.

anycloud let the team set two independent guardrails for each Claude Code session: a throttle to limit concurrent GPU jobs, measured as estimated live spend in $/hr, and a budget to cap total spend over a calendar window. When either cap is hit, new jobs wait in the queue; running jobs keep going.

Throttle

Limits concurrent jobs

Measures: estimated live $/hr
Blocks at: running + next VM >= cap
Clears: when running VMs finish

$ anycloud throttle set 300 --agent-session

Throttle  $/hr right now
  per agent-session  cap $300/hr each
    csp-rank            ██████████  96%   $289.25 / $300/hr     $10.75/hr headroom
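The throttle rule summarized above (block when running + next VM >= cap) can be sketched as a small admission check. This is an illustration of the stated rule only, not anycloud's implementation; `throttle_allows` and `headroom` are hypothetical names, and the per-VM hourly rates are made up.

```python
from typing import Iterable


def headroom(running_rates: Iterable[float], cap_per_hour: float) -> float:
    """Remaining $/hr before the throttle cap is reached."""
    return cap_per_hour - sum(running_rates)


def throttle_allows(
    running_rates: Iterable[float], next_vm_rate: float, cap_per_hour: float
) -> bool:
    """Return True if a new VM may start.

    Mirrors the stated rule: block when running + next VM >= cap.
    Running VMs are never stopped; only the new one waits.
    """
    return sum(running_rates) + next_vm_rate < cap_per_hour
```

With the figures in the output above ($289.25/hr live against a $300/hr cap), a hypothetical $5/hr VM would be admitted, while a $12/hr VM would wait until running VMs finish.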

Budget

Limits total spend

Measures: settled + estimated spend
Blocks at: period spend >= cap
Clears: at UTC reset

$ anycloud budget set 4000 --per day --agent-session

Budget  day  (resets in 19h)
  per agent-session  cap $4,000 each
    csp-rank            ██████████  96%    $3,848 / $4,000      $152 left
    free-energy-rerank  ████████░░  78%    $3,136 / $4,000      $864 left
    blind-test-eval     █░░░░░░░░░   9%      $368 / $4,000      $3,632 left
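The budget rule (block when period spend >= cap, clear at the UTC reset) can be sketched the same way. This is a hedged illustration, not anycloud's code: `budget_allows` and `next_daily_reset` are hypothetical names, and the sketch assumes the daily window resets at 00:00 UTC, which the article implies but does not state exactly.

```python
from datetime import datetime, timedelta, timezone


def budget_allows(settled: float, estimated: float, cap: float) -> bool:
    """Allow new jobs only while period spend (settled + estimated) is below the cap."""
    return settled + estimated < cap


def next_daily_reset(now: datetime) -> datetime:
    """Next midnight UTC after `now`.

    Assumption: the daily budget window resets at 00:00 UTC.
    `now` should be timezone-aware.
    """
    now_utc = now.astimezone(timezone.utc)
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)
```

Under that assumption, a session checked at 05:00 UTC would see its daily budget reset in 19 hours, and a session that has reached its $4,000 cap would queue new jobs until then while running jobs continue.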

Slack makes spend and blocked work visible without watching a terminal. anycloud notifications enable slack --webhook ... posts a daily digest with total spend, job counts, interruption rate, median runtime, and active users. If a budget or rate cap starts blocking new jobs, anycloud posts a waiting-on-spend-cap alert. A daily budget block clears at the next daily reset; a rate-cap block clears when live spend falls back under the configured ceiling. Caps block only new jobs; already-running jobs keep running.
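For readers who want a feel for what such a digest involves, the sketch below builds a message from the fields listed above and posts it to a Slack incoming webhook, which accepts a JSON body with a `text` field. The message wording, the `build_digest` and `post_to_slack` names, and the input values are assumptions for illustration; this is not the format anycloud actually sends.

```python
import json
import urllib.request
from statistics import median
from typing import Dict, List


def build_digest(
    spend_usd: float, runtimes_min: List[float], interrupted: int, users: List[str]
) -> Dict[str, str]:
    """Summarize a day's jobs as a Slack webhook payload (hypothetical format)."""
    jobs = len(runtimes_min)
    rate = interrupted / jobs if jobs else 0.0
    med = median(runtimes_min) if runtimes_min else 0.0
    text = (
        f"Daily digest: ${spend_usd:,.2f} spent, {jobs} jobs, "
        f"{rate:.0%} interrupted, median runtime {med:.1f} min, "
        f"{len(set(users))} active users"
    )
    return {"text": text}


def post_to_slack(webhook_url: str, payload: Dict[str, str]) -> None:
    """POST the payload as JSON to a Slack incoming webhook URL."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()
```

Keeping payload construction separate from the network call makes the message easy to inspect or test without a live webhook.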

What this unlocked for Ångstrom

"Our monthly compute spend is often more than 2x higher than our cash burn - so compute cost is a serious problem for us. anycloud has been critical for letting us use our credits across all major providers efficiently. We run our experiments almost exclusively on spot, which has significantly extended our compute runway. The bottleneck for an AI research company is the rate at which we iterate on the

run experiments -> analyse results -> plan next experiments loop - anycloud lets us orchestrate hundreds of experiments each day."

Laurence Midgley, Co-founder & CTO, Ångstrom AI

Two problems sit behind Laurence's quote: cost and iteration speed. Cloud credits are the cheapest GPUs a startup will ever touch, but they're stranded - spread across providers, each with its own quotas, regions, and spot pools. And the rate of research is capped by how fast you can run the next batch of experiments. anycloud is built for exactly this: schedule across every connected account, take the cheapest capacity that's actually available, and run on spot without the workload having to care which cloud it lands on. In Ångstrom's case, Claude Code drove that research loop by calling the anycloud CLI directly against the team's own clouds - exactly the workflow anycloud is built around.
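The idea of taking "the cheapest capacity that's actually available" across accounts can be sketched as a simple selection over offers. This is a minimal illustration under stated assumptions, not anycloud's scheduler: the `Offer` fields, the `pick_cheapest` name, and the provider labels and prices are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Offer:
    """A spot capacity offer from one connected account (hypothetical fields)."""
    provider: str
    region: str
    price_per_hour: float
    available_gpus: int


def pick_cheapest(offers: List[Offer], gpus_needed: int) -> Optional[Offer]:
    """Return the lowest-priced offer with enough free GPUs, or None if none fits."""
    candidates = [o for o in offers if o.available_gpus >= gpus_needed]
    return min(candidates, key=lambda o: o.price_per_hour, default=None)


offers = [
    Offer("cloud-a", "region-1", 2.10, 0),   # cheapest, but no capacity right now
    Offer("cloud-b", "region-2", 2.40, 8),
    Offer("cloud-c", "region-3", 3.00, 16),
]
choice = pick_cheapest(offers, gpus_needed=4)
print(choice.provider if choice else "no capacity")
```

Filtering on availability before comparing prices is the key step: the nominally cheapest pool is skipped when it has no capacity, so the job lands on the next-cheapest provider instead.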

Ångstrom AI is one of a new generation of AI research companies using Claude Code to increase the speed at which they can iterate on research. anycloud sits at the core of their infrastructure, powering the computational experiments behind that research loop.

source & further reading

anycloud.sh — original article