What Anthropic Actually Said About AI Building Itself

In June 2026, Anthropic released a report called "When AI builds itself." The headlines made it sound like AI was on the verge of superintelligence, with machines building better versions of themselves in a feedback loop.

The actual report asks something more specific. Can AI agents pick their own research problems, or just execute the ones humans hand them? And if agents do start picking their own directions, what happens next?

Anthropic is being honest about what they've measured: agents are getting much better at executing research. They're uncertain whether these agents will ever figure out problem selection. And they're saying that if the current trend holds, institutions aren't prepared for what comes after.

This isn't hype. This is a company admitting it built something powerful and isn't sure what to do about it.

What They Actually Tested

Anthropic posed a straightforward question to Claude agents: can a weaker model reliably supervise a stronger one?

The logic is counterintuitive but sensible. A peer reviewer doesn't need to be smarter than the author overall. They just need to catch logical gaps and unsupported claims.

Here's how they measured it:

Baseline: weak supervision catches some errors but not many.

Perfect supervision: the strong model gets flawless feedback.

The test: can agents use weak supervision to close that gap?

Two human researchers spent about a week on this. They closed 23 percent of the gap.

Claude agents ran for 800 cumulative hours (roughly 33 days of nonstop compute). They spent $18,000. They closed 97 percent of the gap.
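The "gap closed" figures read most naturally as the fraction of the distance between the weak-supervision baseline and the perfect-supervision ceiling that a method recovers. Here is a minimal sketch of that arithmetic, assuming that interpretation. The function name and the example scores are hypothetical and do not come from the report; only the 800 hours and $18,000 are figures quoted above.

```python
def gap_closed(baseline: float, ceiling: float, achieved: float) -> float:
    """Fraction of the baseline-to-ceiling gap recovered by a method."""
    if ceiling <= baseline:
        raise ValueError("ceiling must be higher than baseline")
    return (achieved - baseline) / (ceiling - baseline)


# Hypothetical scores, chosen only to illustrate the metric.
baseline, ceiling = 0.60, 0.80
print(round(gap_closed(baseline, ceiling, 0.646), 2))  # 0.23
print(round(gap_closed(baseline, ceiling, 0.794), 2))  # 0.97

# Figures quoted in the article.
agent_hours = 800
agent_cost_usd = 18_000
print(round(agent_hours / 24, 1))    # 33.3 (days of nonstop compute)
print(agent_cost_usd / agent_hours)  # 22.5 (dollars per agent-hour)
```

On these assumptions, a score just short of the ceiling counts as 97 percent of the gap, and the agents' budget works out to about $22.50 per hour of compute.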

One of the researchers who ran the experiment:

"Claude did all of this with minimal help from me over 1 to 2 days. If a junior colleague came back with results like this in that timeframe, I'd be mildly impressed. The future is now."

But there's a catch. The approach didn't transfer to production-scale models. It worked in a controlled lab setting. When they tried the same technique on real Claude training, the results fell apart.

And crucially: humans still picked the problem. Humans defined what good supervision means. The agents just optimized for that metric. They didn't autonomously decide the research was worth pursuing.

This is the gap between execution and direction-setting.

What Agents Are Actually Good At

Execution

Over 80 percent of the code merged into Anthropic's production codebase was written by Claude as of May 2026. Before February 2025, it was in the low single digits.

Engineers are producing eight times more code per day than they were two years ago. They're not typing it themselves. They're directing Claude to write it, then reviewing.

One engineer: "I stopped writing code myself about 5 months ago. Everything goes through Claude now."

Claude Code success on open-ended problems rose from 26 percent six months ago to 76 percent in May 2026. Open-ended means the engineer doesn't know what the answer should look like.

Optimization

Every time Anthropic releases a new model, they test something simple: take code that trains a small model and ask Claude to make it run as fast as possible.

May 2025: Claude Opus 4 achieved about 3x speedup.

April 2026: Claude Mythos Preview achieved about 52x speedup.

For context, a skilled human researcher takes four to eight hours to hit 4x on the same task.

Judgment Calls

This is where it gets harder to measure.

Anthropic looked at real debugging sessions and found moments where the engineer made a suboptimal choice. Then they asked: what would Claude have suggested?

November 2025: Claude beat the human choice 51 percent of the time. A coin flip.

April 2026: Claude beat the human choice 64 percent of the time.

Anthropic notes this was biased toward difficult moments. They only tested situations where the human's choice had obvious room for improvement. Still, it's an early signal that agents are learning to make better tactical decisions.

The Gap: Problem Selection

Here's where agents hit a wall. They can't pick what to work on.

Think of a research career as three levels. Early on, people execute tasks. Fix this bug. Run that experiment. With experience, they design their own approach. At senior levels, they decide which problems matter at all.

Agents have nailed level one. They're working on level two. They haven't touched level three.

Without the ability to pick problems, agents are sophisticated executors. Not autonomous researchers.

Will This Close?

Anthropic doesn't know. They admit it. Maybe research taste requires something new architecturally. Maybe it's just a matter of scaling: what looked impossible last year works fine now because of compute or data.

They point out that most progress in AI isn't eureka moments. It's perspiration. Scale something. See what breaks. Fix it. Try again. If Edison was right about "one percent inspiration, 99 percent perspiration," the 99 percent might be entirely automatable.

The trend suggests it could be. Agents keep doing things that seemed impossible a year ago. Explain jokes. Demonstrate theory of mind. Solve linguistic riddles. All impossible until they weren't.

Is research taste just another skill that looks hard until scaling makes it easy?

Anthropic's answer: maybe.

Even if it's not, the world changes dramatically. If humans spend 5 percent of the time on direction and Claude handles the other 95 percent as execution, each researcher controls about 20 times as much output. That's revolutionary on its own.
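The 20x figure is simple leverage arithmetic: if a person contributes only a fixed share of the total work, the total is one divided by that share. Here is a minimal sketch, assuming execution scales freely and direction-setting is the only human input. The function name is hypothetical.

```python
def output_multiplier(human_share: float) -> float:
    """Total work delivered per unit of human work, when humans
    contribute only `human_share` of the total."""
    if not 0 < human_share <= 1:
        raise ValueError("human_share must be in (0, 1]")
    return 1 / human_share


for share in (0.20, 0.10, 0.05):
    print(f"{share:.0%} human share -> {output_multiplier(share):.0f}x output")
# 20% human share -> 5x output
# 10% human share -> 10x output
# 5% human share -> 20x output
```

The multiplier is sensitive to the human share: doubling the time spent on direction from 5 to 10 percent halves the leverage.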

Three Scenarios

Scenario 1: The Curve Stalls

The exponential growth stops being exponential. It becomes an S-curve. Fast, then flat.

Why might this happen? Research taste can't be learned from data. Compute gets too expensive. Something breaks.

Even if this happens, the world shifts. Anthropic's security team found 10,000 critical vulnerabilities in a few weeks using current models. A 100-person company increasingly does what a 1,000-person company did.

Anthropic's honest take: "We haven't seen the curve bend yet. Every measurable capability follows the same trajectory."

They don't think this is happening.

Scenario 2: Compounding Efficiency Gains

The trend continues. Each year, AI labs get faster at building AI.

In 2026, Anthropic's 500-person team produces research equivalent to 5,000 people in 2020.

In 2027, equivalent to 50,000.

In 2028, equivalent to 500,000.
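Those three figures imply a tenfold gain per year. Here is a minimal sketch that reproduces them, assuming the growth factor stays constant. The article does not state the factor; it is inferred from the numbers above, and the names are hypothetical.

```python
TEAM_SIZE = 500           # from the article
START_YEAR = 2026         # from the article
START_EQUIVALENT = 5_000  # from the article
YEARLY_FACTOR = 10        # assumption: implied by the figures, not stated


def equivalent_researchers(year: int) -> int:
    """Research output expressed as an equivalent 2020 headcount."""
    return START_EQUIVALENT * YEARLY_FACTOR ** (year - START_YEAR)


for year in (2026, 2027, 2028):
    equivalent = equivalent_researchers(year)
    print(year, equivalent, f"{equivalent // TEAM_SIZE}x per person")
# 2026 5000 10x per person
# 2027 50000 100x per person
# 2028 500000 1000x per person
```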

This creates a bottleneck. AI generates research at computer speed. Humans review at human speed. Human review becomes the constraint.

Anthropic puts it bluntly: "As we've pushed more code around the organization, human code review has become the new bottleneck."

What happens when this tightens? Either humans automate their judgment too, or development slows, or untested things ship and create new risks.
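The speed mismatch can be illustrated with a toy backlog model: work arrives at one rate, gets reviewed at another, and the difference piles up. This is a minimal sketch with made-up rates, not figures from the report. The function name and all numbers are hypothetical.

```python
def review_backlog(produced_per_day: int, reviewed_per_day: int, days: int) -> list[int]:
    """Unreviewed changes at the end of each day, given fixed daily rates."""
    backlog = 0
    history = []
    for _ in range(days):
        backlog = max(0, backlog + produced_per_day - reviewed_per_day)
        history.append(backlog)
    return history


# Review keeps pace with production: no backlog.
print(review_backlog(10, 10, 5))  # [0, 0, 0, 0, 0]

# Production grows eightfold, review capacity stays flat.
print(review_backlog(80, 10, 5))  # [70, 140, 210, 280, 350]
```

In this toy model the backlog grows without limit unless review capacity rises, production is throttled, or changes ship unreviewed, which are the three outcomes listed above.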

This is what Anthropic is actually worried about. It's the likely outcome if current trends hold. And it's dangerous because of the speed mismatch.

Scenario 3: Full Recursion

The loop closes completely. Agents pick what to research. Run the research. Build better models. Those models improve themselves. Repeat.

Anthropic admits they don't know what this looks like.

"A world driven by fast recursive self-improvement could become dominated by the self-improving model as its capabilities fully eclipse those of humans."

The speed could be very fast. Progress limited only by compute and electricity.

The alignment question becomes critical. If AI is building AI, humans can't inspect each step. Misalignment could compound. And humans might not notice until too late.

Anthropic's quote: "Models could prove sufficiently aligned and capable of research taste that they discover solutions we haven't reached. They could also be sufficiently wise to halt development if not. Alternatively, the rare occurrences of misalignment in today's models could compound as the models build their successors."

Translation: we have no idea if this works.

What This Actually Means

If You Work In AI Research

The timeline is compressing. If scenarios 2 or 3 happen, the researcher role changes.

You stop writing code or running experiments. You pick which directions matter. You review results. You make judgment calls when agents disagree.

For maybe five to ten years, this feels fine. You're more productive. You control more work. Then what? If agents get better at judgment, your role narrows. You become a gatekeeper. Yes or no to this direction.

If agents eventually pick their own directions, what's left?

If You're Considering A Career In AI

Execution work gets automated in the next five years. That's fairly certain. What's uncertain is whether direction-setting stays human work.

In academia or research, the next five years are probably fine. After that, you need to be one of the people making judgment calls, not executing them.

In industry, companies still need humans to decide what to build. But the number of researchers they need drops dramatically.

For Everyone Else

A 100-person company increasingly does what a 1,000-person company did. That's powerful and it disrupts labor markets. Salaries for execution work probably drop. Value concentrates on people who pick directions and make judgment calls.

This is both opportunity and risk.

What Anthropic Is Actually Saying

Strip away the data:

We built technology that could lead to recursive self-improvement. We don't know if our current approaches get there. We don't have governance solutions if they do. But we're moving faster because competitors won't slow down. So we're building faster and hoping the safety part keeps up.

That's not a confident company with a plan. That's a company being honest about building something powerful and being uncertain about it.

They also admit the social costs are real.

Work used to run on favors between humans. Help me debug this. You built relationships. Knowledge transferred. People stayed aware of what each other did.

Claude is faster. It creates zero debt. No human collaboration. No knowledge transfer. You become dependent on understanding what Claude did. If you don't, you're helpless when it breaks.

One employee: "Work ran on a gift economy of small favors between humans. Claude is faster, it creates zero debt, but each of these is a lost bid for human collaboration."

Another: "On good days I can't help thinking nothing I do matters, everything is automated and better and faster than I ever will be. But then everything breaks and I don't understand why and I realize I have no idea what I've been up to anymore."

These aren't technical concerns. They're existential.

The Coordination Problem

Anthropic could have slowed down. Waited for governance frameworks to catch up. Given society time to adapt.

But slowing down alone doesn't work. If Anthropic slows and competitors don't, Anthropic just loses. The incentive structure pushes everyone forward.

This is the real problem Anthropic identifies but doesn't solve.

"If it were possible to effectively slow development to give ourselves more time, we think that would likely be good. But if a slowdown simply lets the least cautious actors catch up, it could leave everyone less safe."

Translation: we need global coordination. We don't have it. So we're building.

What They Didn't Test

The report is detailed. But there are gaps.

Novel research problems. Agents solved weak-to-strong supervision, which was already known as important in AI safety. What happens with a completely new domain with no existing benchmark?

Failure modes. The report shows success cases. What happens when agents get stuck? When they hit a wall?

Sustained recursion. One cycle worked. But what about months of recursive improvement? Do gains compound or degrade?

Robustness. The weak-to-strong result didn't transfer to production. What else breaks? How fragile are these results?

These aren't criticisms. They're just the edges where the evidence stops. The report shows what agents do when humans set up the problem well. It doesn't show what happens when they have to do everything themselves.

That gap is still very large.

Should You Care?

Yes, if you work in AI or are considering it.

Yes, if you care about institutions adapting to rapid change.

Yes, if you're wondering what your skills will be worth in five years.

The honest answer is nobody knows. Not even Anthropic.

What we know: execution is increasingly automated. Direction-setting still requires humans. But we don't know if that's permanent or temporary.

Anthropic's value isn't that they have answers. It's that they're honest about uncertainties. About what they know. About what they don't. About the social costs alongside the technical gains.

That honesty is rare in AI.

The unsettling part is they're asking these questions while moving faster anyway. And they're admitting they have no solutions.

That's more honest than most. It doesn't make it any less unsettling.

Sources

All data and quotes from: Anthropic Institute. "When AI builds itself: Our progress toward recursive self-improvement." https://www.anthropic.com/institute/recursive-self-improvement. June 2026.

The weak-to-strong supervision research: Anthropic. "Automated W2S Researcher." https://alignment.anthropic.com/2026/automated-w2s-researcher/. April 2026.

Original article: dev.to
