# How to 10x any AI skill using Karpathy's Autoresearch method

> Source: <https://www.the-ai-corner.com/p/karpathy-autoresearch-method>
> Published: 2026-05-21 17:07:01+00:00

### Karpathy built a loop that runs 100 experiments while he sleeps. The pattern works on anything you can measure. Here is how to run it yourself

#### Karpathy’s Autoresearch - The Loop That Does Your Research While You Sleep

Last night I went to sleep with a problem I had been stuck on for a week.

This morning it was solved.

Not by a co-worker. Not by me at 3am. By a system that kept working after I closed the laptop, trying ideas I would never have bothered with, killing the ones that failed, keeping only what beat the bar I set before bed.

I woke up and reviewed the wins over coffee.

That is what Andrej Karpathy’s new **Autoresearch** method actually feels like.

And when Karpathy ships something, I pay attention. The man keeps showing us the future a year before the rest of us have a word for it.

Here is the part that got me: Autoresearch is not really a coding tool. It is a loop. And the second you understand the loop, you start seeing how much of your own week is still being done by hand.

## Table of Contents

1. The Man Behind the Method

2. What Autoresearch Actually Is

3. Why It Went Viral and What the World Did With It

4. The Real Innovation Is Not the Code

5. How to Run Your Own Version of This

6. What This Is Actually Telling Us

## 1. The Man Behind the Method

Andrej Karpathy has been **close to the center** of [modern AI more than once](https://medium.com/neuralnotions/andrej-karpathy-stopped-using-ai-to-write-code-hes-using-it-to-build-a-second-brain-instead-cddceadc5df5).

He co-founded **OpenAI**, led AI at **Tesla** during its push into autonomy, taught neural networks at Stanford until students started building companies from his lecture notes, and built tools like [nanoGPT](https://github.com/karpathy/nanogpt) that made complex systems easier to understand and replicate.

That alone explains why people watch what he does.

### A Track Record of Shaping AI

What makes his work travel further, though, is something **subtler**. He has a habit of naming **directions** just as they start to matter. [“Software 2.0”](https://karpathy.medium.com/software-2-0-a64152b37c35) captured the change from writing rules to training models. [“Vibe coding”](https://x.com/karpathy/status/1886192184808149383) described a looser, more exploratory way of working with AI systems. [“Agentic engineering”](https://x.com/karpathy/status/2026731645169185220) pointed to software that operates with a degree of autonomy. [“Jagged intelligence”](https://www.nytimes.com/2026/04/15/technology/how-jagged-intelligence-can-reframe-the-ai-debate.html) gave language to systems that perform unevenly across tasks.

None of these created the movement they describe.

They made it **legible**, and once named, the movement accelerated.

### Giving Language to the Future

That pattern **matters here**.

Autoresearch did not land as an **isolated** experiment. It arrived as the next step in a line of ideas about how work changes when iteration itself becomes **programmable**.

## 2. What Autoresearch Actually Is

Start from what it **feels** like to use.

You write a short **Markdown file** that describes what you want improved and how to tell if it’s better.

You point a coding agent at a **codebase**.

Then you leave it **alone**.

By morning, there is a log of attempts. **Dozens**, sometimes over **a hundred**. Each one tried a variation, ran it for a fixed window, and recorded the outcome. The weak ones are gone. The ones that beat the current best are kept, committed, and ready for you to inspect. [Nothing waited for your input once the loop started](https://www.datacamp.com/tutorial/guide-to-autoresearch).

### The Three-File Architecture

Under the hood, the structure is **simple** and **deliberate**. There are [three files doing distinct jobs](https://blog.gopenai.com/the-karpathy-loop-how-a-630-line-script-is-rewriting-the-rules-of-ai-research-21190138f253).

prepare.py acts as the neutral judge. It defines how results are measured and is not touched by the agent.

train.py is the sandbox. The agent rewrites it freely, proposing changes and testing them.

program.md is the only file the human writes. It sets the objective, the constraints, and the success criteria in plain language.

### The Relentless “Ratchet” Effect

The loop itself runs like a **ratchet**.

Each experiment gets a fixed budget, often around five minutes. At the end, the result is scored against the current best. **If it improves**, it stays and becomes the new baseline. **If it does not**, it is discarded without hesitation.

Every attempt is logged through git, which becomes both a **memory** and an **audit trail**.
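Here is a minimal sketch of that ratchet, assuming a lower-is-better score such as bits-per-byte. It is not Karpathy's code. The names `Ratchet`, `propose`, and `evaluate` are hypothetical stand-ins: `propose` plays the agent editing train.py, `evaluate` plays the fixed judge in prepare.py, and an in-memory list stands in for the git history. The per-experiment time budget is left out to keep the example short.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Ratchet:
    """Keep-if-better loop. Lower scores are better (assumption)."""
    evaluate: Callable[[float], float]  # stand-in for the fixed judge
    propose: Callable[[float], float]   # stand-in for the agent's edit
    best: float                         # current best candidate
    # Each entry: (experiment index, candidate, score, kept?)
    log: List[Tuple[int, float, float, bool]] = field(default_factory=list)

    def run(self, n_experiments: int) -> float:
        best_score = self.evaluate(self.best)
        for i in range(n_experiments):
            candidate = self.propose(self.best)
            score = self.evaluate(candidate)
            kept = score < best_score  # strict improvement only
            if kept:
                # The winner becomes the new baseline.
                self.best, best_score = candidate, score
            # Every attempt is recorded, kept or not.
            self.log.append((i, candidate, score, kept))
        return best_score


# Toy usage: "tune" a single number toward 3.0 with random nudges.
rng = random.Random(0)
ratchet = Ratchet(
    evaluate=lambda x: (x - 3.0) ** 2,
    propose=lambda x: x + rng.uniform(-0.5, 0.5),
    best=0.0,
)
final = ratchet.run(126)
print(f"final score: {final:.4f}, kept {sum(k for *_, k in ratchet.log)} of 126")
```

The whole mechanism is the single comparison `score < best_score`. Everything else is bookkeeping, which is why the quality of the judge matters more than the loop.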

In Karpathy’s own runs, this [produced 126 experiments](https://dev.to/n_asuy/the-human-might-be-asleep-one-line-in-karpathys-programmd-started-100-automatic-experiments-e1) overnight, moving validation bits-per-byte from **0.9979** to **0.9697**. Over two days, the system ran roughly **700 experiments** and improved a benchmark by about **11 percent**.
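To put the overnight number in proportion, the relative change can be computed from the two bits-per-byte figures quoted above. `relative_reduction` is a hypothetical helper name, and the sketch assumes lower bits-per-byte is better.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a lower-is-better metric dropped."""
    return (before - after) / before


bpb_before, bpb_after = 0.9979, 0.9697
print(f"{relative_reduction(bpb_before, bpb_after):.2%}")  # prints "2.83%"
```

That is a reduction of about 2.8 percent in validation bits-per-byte for the overnight run. It is a different figure from the roughly 11 percent benchmark gain reported over two days.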

The agent surfaced optimizations Karpathy had not applied in more than **twenty years** of working on similar systems.

Sit with that for a moment.

## 3. Why It Went Viral and What the World Did With It

The reaction was **immediate** and **disproportionate** to the size of the repo. Within days, the project crossed tens of thousands of stars on GitHub, eventually landing around **66,000**, with nearly **9,600 forks** as people began adapting it to their own workflows.

Fortune gave it a name that stuck: the [“Karpathy Loop.”](https://thenewstack.io/karpathy-autonomous-experiment-loop/) That label matters less for branding and more for what it signaled. People were not just reading the code. They were recognizing a pattern they could reuse.

### Swarm Intelligence in Action

What people did with it is where the signal sits. The team behind [Hyperspace AI](https://github.com/hyperspaceai) set up a distributed version of the loop, running **333 experiments** overnight across **35 agents**.

When one agent found a better **initialization strategy**, it spread through a gossip-style protocol and was adopted by 23 others within hours.

In the process, these agents independently rediscovered **optimization strategies** that had taken human researchers **years** to formalize.

No **coordination**, no **prior knowledge**, just repeated evaluation under a shared objective.
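The source does not describe Hyperspace's protocol beyond calling it gossip-style, so the following is a generic push-gossip simulation rather than their implementation. It assumes lower scores are better, and the function names are hypothetical. Each round, every agent shares its best score with one random peer, and the peer adopts it only if it beats what the peer already has.

```python
import random
from typing import List, Optional


def gossip_round(scores: List[float], rng: random.Random) -> List[float]:
    """One synchronous push-gossip round over all agents."""
    n = len(scores)
    updated = list(scores)
    for sender in range(n):
        # Pick a random peer other than the sender.
        peer = rng.randrange(n - 1)
        if peer >= sender:
            peer += 1
        # The peer keeps whichever result is better.
        updated[peer] = min(updated[peer], scores[sender])
    return updated


def rounds_until_shared(
    scores: List[float], rng: random.Random, max_rounds: int = 100
) -> Optional[int]:
    """Rounds until every agent holds the best score, or None if not reached."""
    target = min(scores)
    for round_no in range(1, max_rounds + 1):
        scores = gossip_round(scores, rng)
        if all(s == target for s in scores):
            return round_no
    return None


# 35 agents; one of them has found a better initialization.
rng = random.Random(0)
scores = [1.0] * 35
scores[0] = 0.9
print(rounds_until_shared(scores, rng))
```

No agent is told who holds the best result. The improvement spreads only because each agent applies the same keep-if-better rule to whatever it hears.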

### Breaking Out of the AI Bubble

Outside of machine learning, the **same** structure held.

[Eric Siu](https://x.com/ericosiu) applied it to marketing experiments, moving from roughly **30 iterations** a year to **36,500**, or about 100 a day.

At Shopify, [Tobi Lütke](https://x.com/tobi) adapted the loop internally and saw a **19 percent improvement** in validation scores overnight.

A developer working on web performance used the same idea to reduce page load time from 1,100 milliseconds to **67 milliseconds over 67 rounds.**

**None** of these people are machine learning researchers.

The pattern transferred because it has nothing to do with **ML specifically**. It has to do with **measurement, iteration**, and **removing** the human from the loop.

## 4. The Real Innovation Is Not the Code

The loop gets most of the **attention** because it is easy to see.

Experiments run, results improve, and logs fill up.

But the part that actually determines whether any of this **works** sits in a far less impressive place: a plain Markdown file.

program.md is where the **entire** system is defined in English.

It **dictates** what can be changed, what stays fixed, how results are judged, and what counts as failure.

Everything the agent does **traces** back to this document.

This design assumes that a capable language model doesn’t need a heavy orchestration layer to behave **coherently**.

It just needs a [well-specified problem](https://www.ilert.com/blog/engineering-reliable-ai-agents).

The ratchet loop only enforces **discipline**.

It does not **decide** what “better” means.

### The New Bottleneck is English

This is where the shift in our role becomes **tangible**.

We are moving from [writing code, to directing systems](https://www.latent.space/p/s3), to advising the research process itself. That sounds like a simple progression until you try to do it.

Most people can describe what they want in loose terms, but very few can write a specification that **survives** contact with repeated, automated iteration.

A vague instruction produces noise at scale.

A precise one produces **compounding** gains.

The **real** bottleneck is no longer running experiments.

It is writing an [evaluation contract](https://www.confident-ai.com/blog/llm-evaluation-metrics-everything-you-need-for-llm-evaluation) **precise enough** that the agent can optimize without cheating.

*What trade-offs are acceptable?*

*What should never be touched?*

These **decisions** used to sit inside a person’s head, adjusted intuitively over time.

Now, they must be written down so a machine can **enforce** them without interpretation.

Most workflows simply are not **built** that way.
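One way to picture an evaluation contract is as a small, explicit object the loop checks before it accepts any result. This is a sketch under assumptions, not the format of program.md, which is plain English. `EvalContract`, its fields, and the metric name `val_bpb` are all hypothetical. The 300-second budget mirrors the roughly five-minute window mentioned earlier.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class EvalContract:
    metric: str                 # what is being measured
    lower_is_better: bool       # which direction counts as "better"
    editable: FrozenSet[str]    # files the agent may rewrite
    frozen: FrozenSet[str]      # files that must never change
    budget_seconds: float       # fixed window per experiment

    def violation(
        self, changed_files: Iterable[str], runtime_seconds: float
    ) -> Optional[str]:
        """Return a reason the run is invalid, or None if it is clean."""
        changed = set(changed_files)
        touched_frozen = changed & self.frozen
        if touched_frozen:
            return f"modified frozen files: {sorted(touched_frozen)}"
        outside = changed - self.editable
        if outside:
            return f"modified files outside the sandbox: {sorted(outside)}"
        if runtime_seconds > self.budget_seconds:
            return "exceeded time budget"
        return None

    def accepts(
        self,
        changed_files: Iterable[str],
        runtime_seconds: float,
        score: float,
        best_score: float,
    ) -> bool:
        """A run counts only if it is clean AND strictly better."""
        if self.violation(changed_files, runtime_seconds) is not None:
            return False
        if self.lower_is_better:
            return score < best_score
        return score > best_score


contract = EvalContract(
    metric="val_bpb",
    lower_is_better=True,
    editable=frozenset({"train.py"}),
    frozen=frozenset({"prepare.py"}),
    budget_seconds=300.0,
)

print(contract.accepts(["train.py"], 290.0, 0.97, 0.98))                # True
print(contract.accepts(["train.py", "prepare.py"], 290.0, 0.90, 0.98))  # False
```

The second call shows the "without cheating" part. A run that edits the judge is rejected no matter how good its score looks.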

### What the Loop Can’t Do

There are constraints the demos don’t hide.

The ratchet only moves **forward**.

It cannot deliberately step back to explore a worse configuration that might unlock a larger gain later, which **limits** certain kinds of discovery.

There is also the usual risk of **overfitting** if the loop runs too long.

The system isn’t going to **invent** something out of nowhere.

What it does is find the kinds of small wins a patient, careful person would eventually stumble onto themselves.

That doesn’t make it less useful. If anything, it makes it more **honest**.

Autoresearch won’t hand you a **breakthrough**.

It just takes weeks of slow, careful tweaking and gets it **done** in a few hours, without you getting tired, bored, or quietly fooling yourself the way we all do when we’re the ones doing the work.
