Mini-SWE-agent scores up to 74% on SWE-bench in 100 lines of Python code

wpnews.pro

cd /news/ai-agents/mini-swe-agent-scores-up-to-74-on-sw… · home › topics › ai-agents › article

[ARTICLE · art-16095] src=mini-swe-agent.com ↗ pub=2026-05-28T05:05Z topic=ai-agents verified=true sentiment=↑ positive

Mini-SWE-agent scores up to 74% on SWE-bench in 100 lines of Python code

A new open-source coding agent, mini-SWE-agent, achieves up to 74% on the SWE-bench verified benchmark using just 100 lines of Python code. Developed by the Princeton and Stanford team behind SWE-bench, the agent operates with no tools other than bash and a completely linear history, making it simpler and faster than alternatives like Claude Code. The tool is already adopted by Meta, NVIDIA, IBM, and other major organizations for research and daily workflows.

read4 min views10 publishedMay 28, 2026

This is mini-swe-agent v2

Read the migration guide. For the previous version, check out the v1 documentation or the v1 branch.

In 2024, SWE-bench & SWE-agent helped kickstart the coding agent revolution.

We now ask: What if our agent was 100x simpler, and still worked nearly as well?

mini

Widely adopted: Used by Meta, NVIDIA, Essential AI, IBM, Nebius, Anyscale, Princeton University, Stanford University, and many more.** Minimal**: Just100 lines of python(+100 total forenv,model,script) — no fancy dependencies!Performant: Scores >74% on theSWE-bench verified benchmark; starts much faster than Claude CodeDeployable: Supportslocal environments,** docker/podman**,** singularity/apptainer**,** bublewrap**,** contree**, and more** Compatible:Supports all models via litellm**,** openrouter**,** portkey**, and more. Support for/completion

and/response

endpoints, interleaved thinking etc.- Built by the Princeton & Stanford team behind SWE-bench,SWE-agent, and more Tested:

Why use mini-SWE-agent for research? #

SWE-agent jump-started the development of AI agents in 2024. Back then, we placed a lot of emphasis on tools and special interfaces for the agent. However, one year later, a lot of this is not needed at all to build a useful agent!

In fact, the mini

agent:

Does not have any tools other than bash— it doesn't even use the tool-calling interface of the LMs. This means that you can run it with literally any model. When running in sandboxed environments you also don't need to take care of installing a single package — all it needs is bash.Has a completely linear history— every step of the agent just appends to the messages and that's it. So there's no difference between the trajectory and the messages that you pass on to the LM. Great for debugging & fine-tuning.Executes actions with— every action is completely independent (as opposed to keeping a stateful shell session running). This makes it trivial to execute the actions in sandboxes (literally just switch outsubprocess.run

subprocess.run

withdocker exec

) and to scale up effortlessly. Seriously, this isa big deal, trust me.

This makes it perfect as a baseline system and for a system that puts the language model (rather than the agent scaffold) in the middle of our attention. You can see the result on the SWE-bench (bash only) leaderboard, that evaluates the performance of different LMs with mini

Why use mini-SWE-agent as a tool? #

Some agents are overfitted research artifacts. Others are UI-heavy frontend monsters.

The mini

agent wants to be a hackable tool, not a black box.

Simple enough to understand at a glanceConvenient enough to use in daily workflowsFlexible to extend

Unlike other agents (including our own swe-agent), it is radically simpler, because it:

Does not have any tools other than bash— it doesn't even use the tool-calling interface of the LMs. Instead of implementing custom tools for every specific thing the agent might want to do, the focus is fully on the LM utilizing the shell to its full potential. Want it to do something specific like opening a PR? Just tell the LM to figure it out rather than spending time to implement it in the agent.Executes actions with— every action is completely independent (as opposed to keeping a stateful shell session running). This issubprocess.run

a big dealfor the stability of the agent, trust me.Has a completely linear history— every step of the agent just appends to the messages that are passed to the LM in the next step and that's it. This is great for debugging and understanding what the LM is prompted with.

Should I use mini-SWE-agent or swe-agent? #

You should consider mini-swe-agent

your default choice. In particular, you should use mini-swe-agent

You want a quick command line tool that works locally
You want an agent with a very simple control flow
You want even faster, simpler & more stable sandboxing & benchmark evaluations
You are doing FT or RL and don't want to overfit to a specific agent scaffold

You should use swe-agent

You want to experiment with different sets of tools, each with their own interface
You want to experiment with different history processors

What you get with both

Excellent performance on SWE-Bench
A trajectory browser

| CLI |

mini

) Batch inferenceTrajectory browserPython bindings

agent = DefaultAgent(
    LitellmModel(model_name=...),
    LocalEnvironment(),
)
agent.run("Write a sudoku game")

Upgrading to v2?

Check out our v2 migration guide for all the changes and how to update your code.

Continue reading: #

📣 News #

Run mini-swe-agent on our new & extremely challenging benchmark, ProgramBench New tutorial on building minimal AI agents- Nov 19: Gemini 3 Pro reaches 74% on SWE-bench verified with mini-swe-agent! - Aug 19: New blogpost: Randomly switching between GPT-5 and Sonnet 4 boosts performance

📣 New features #

Please check the github release notes for the latest updates.

📣 Documentation updates #

Jul 27: More notes on local models

source & further reading

mini-swe-agent.com — original article

~/api · this article 200

$curl api.wpnews.pro/v1/news/mini-swe-agent-scores-up…

Read original on mini-swe-agent.com → mini-swe-agent.com/latest/

mentioned entities

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main

→ Live at https://your-agent.zahid.host ✓

Get free account → Pricing

from €0/mo · no card required