# Why Most Multi-Agent AI Systems Waste 90% of Their Time (And How to Fix It)

> Source: <https://pub.towardsai.net/why-most-multi-agent-ai-systems-waste-90-of-their-time-and-how-to-fix-it-c0ce81f0e323?source=rss----98111c9905da---4>
> Published: 2026-06-16 07:52:18+00:00

Most engineers think multi-agent performance is a concurrency problem.

I did too.

So when five AI agents running in parallel barely outperformed a sequential run, I assumed something was wrong with my orchestration.

I was looking in the wrong place.

**Each agent was spending more time preparing to work than actually working.**

The fix wasn’t more threads, better async code, or a faster model.

**It was a memory snapshot.**

And once I saw where the time was really going, an entire class of multi-agent bottlenecks suddenly made sense.

Here is what that looks like, what took me three iterations to get right, and where it still has rough edges.

Let’s get the mental model first.

**The idea is straightforward:** instead of five agents each spending 90 seconds installing the same tools, install them once, freeze that environment, and stamp out five identical copies.

**Each copy runs a different analysis in parallel. A lead LLM reads all five results and tells you what to fix first.**

**In code:**
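A toy version of that idea, with asyncio.sleep standing in for the tool install and plain dicts standing in for sandboxes. Nothing here calls Tensorlake: install_tools, build_snapshot, and forked_agent are illustrative names, and the only point is that setup runs once no matter how many agents fork from the result.

``` python
import asyncio

PERSPECTIVES = ["Security", "Complexity", "Docstrings", "Tests", "Structure"]
setup_runs = 0  # counts how often the expensive install step executes


async def install_tools() -> list[str]:
    """Stand-in for the slow pip install (scaled down to milliseconds)."""
    global setup_runs
    setup_runs += 1
    await asyncio.sleep(0.01)
    return ["bandit", "radon"]


async def build_snapshot() -> dict:
    """Install once and freeze the prepared environment."""
    return {"tools": await install_tools()}


async def forked_agent(perspective: str, snapshot: dict) -> str:
    """Each fork gets its own copy of the frozen environment and starts warm."""
    env = {"tools": list(snapshot["tools"])}
    await asyncio.sleep(0.001)  # the actual analysis work
    return f"{perspective}: analyzed with {', '.join(env['tools'])}"


async def run_swarm() -> list[str]:
    snapshot = await build_snapshot()  # setup paid once, upfront
    return list(await asyncio.gather(
        *(forked_agent(p, snapshot) for p in PERSPECTIVES)
    ))


reports = asyncio.run(run_swarm())
print(f"{len(reports)} reports, install ran {setup_runs} time(s)")
```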

Setup time is paid once, upfront, before any agent runs. The rest of this article explains how.

**If you have not worked with sandboxes before:** think of one as a disposable computer that lives in the cloud.

You spin it up, run whatever code you need inside it, and throw it away when you’re done. It has its own filesystem, its own processes, its own network. Nothing it does can touch your machine or any other sandbox running at the same time.

**In short:** Sandboxes provide the agent with a secure and isolated environment.

**That isolation is the whole point.**

Your agent can install packages, write files, crash badly, or spin up a browser, and none of it bleeds out. When the task is done, you terminate the VM, and it is gone.

**The next agent starts clean**.

Most agent frameworks treat the execution environment as an afterthought. The LLM call is the interesting part. The environment is just “wherever the code runs.”

**That works fine for single-turn tasks. It breaks down fast for anything multi-step.**

When an agent needs to install packages, write intermediate files, maintain a browser session across multiple pages, or resume a task from a different machine, you need the execution environment to behave like a persistent object, not a function call that resets on every invocation.

**Tensorlake gives each agent a MicroVM backed by Firecracker and CloudHypervisor, optimized for fast boot times and strong isolation. Each sandbox is a full Linux VM.**

It boots in hundreds of milliseconds, persists filesystem and memory state across sessions, and can be snapshotted at any point in its lifecycle.

**Tensorlake** also lets you spin up multiple sandboxes in parallel for concurrent agent execution, and honestly it is one of my favourite things about it.

It also ranks in the **top 5 of SandboxBenchmarks**.

**What changes the math is a single question:** what does the snapshot actually capture?

Quick vocabulary before the details. Tensorlake sandboxes have four lifecycle modes.

**This project uses the last two.**

**Suspend** and **Snapshot** both preserve state but serve different purposes: Suspend is for pausing *this* sandbox to resume later, while a snapshot is a reusable artifact for retrying from a checkpoint or cloning an environment.

**Tensorlake supports two checkpoint types. Most tutorials only mention one.**

The checkpoint type is not a performance detail. It determines whether your fork is a clone or a restart.

**The default when you call sandbox.checkpoint() with no arguments is filesystem.** That is the wrong choice for a parallel swarm where agents share a prepared environment. You want memory.

**One more constraint worth knowing upfront:** for memory snapshots, resources (CPUs, RAM) are baked into the snapshot at checkpoint time. You cannot override them when creating forks. Set the right cpus and memory_mb on the base sandbox before you checkpoint. Every fork inherits them automatically.

The pattern has five distinct phases. Each one has a single responsibility.

**Phase 1: Base Snapshot.** Spins up a single baseline sandbox, installs analysis tools (bandit, radon), writes the target code, and checkpoints the entire running VM state using CheckpointType.MEMORY. The base sandbox is then terminated, leaving behind the reusable snapshot ID.

**Phase 2: Agent Forking.** Restores 5 independent sandboxes from the base snapshot by calling AsyncSandbox.create(snapshot_id=…). Each fork is a warm start that inherits all installed tools, environment settings, and target files.

**Phase 3: Sequential Baseline (Timing).** Runs each agent’s analysis script (analyze.py) one-by-one inside its respective sandbox to measure sequential time as a benchmark denominator.

**Phase 4: Parallel Swarm.** Executes all 5 agents concurrently using asyncio.gather(…). Each agent runs the same analysis script inside its isolated sandbox but with a different focus configuration passed via the PERSPECTIVE environment variable.

**Phase 5: LLM Aggregation.** Collects the individual reports (Security, Complexity, Docstrings, Tests, Structure) alongside the timing data, and passes them to the lead LLM (GPT-4o) to synthesize a single prioritized fix list.

Phase 1 runs once. Phases 2 through 4 run every time you want results. The fork is cheap. The base environment build is not, but you only pay that cost once per snapshot.

The base sandbox installs the analysis tools, writes the target codebase into the VM, then snapshots the entire state. Every fork inherits both the tools and the target project automatically.

``` python
from tensorlake.sandbox import AsyncSandbox, CheckpointType


async def build_base_snapshot() -> str:
    async with await AsyncSandbox.create(
        name="base-swarm-env",
        cpus=2.0,
        memory_mb=2048,
        timeout_secs=600,
    ) as sandbox:
        # Install analysis tools. These are baked into the snapshot
        # and available to every forked agent at no extra install cost.
        result = await sandbox.run(
            "pip",
            ["install", "bandit", "radon", "--user", "--break-system-packages", "-q"],
            timeout=180,
        )
        if result.exit_code != 0:
            raise RuntimeError(f"pip install failed:\n{result.stderr}")

        # Write a sample Python project with intentional issues for agents to find.
        # All forks inherit this from the snapshot; no need to write per-agent.
        target_files = {
            "/workspace/target/auth.py": b'''import subprocess

DB_PASSWORD = "hardcoded_secret_123"

def authenticate(user_input):
    return eval(user_input)

def run_command(cmd):
    return subprocess.call(cmd, shell=True)
''',
            "/workspace/target/logic.py": b'''def classify(a, b, c, d, e, f, g, h):
    if a and b:
        if c or d:
            if not e and f:
                return "path_a"
            elif e and not f:
                return "path_b"
            elif g and h:
                return "path_c"
            else:
                return "path_d"
        elif g:
            return "path_e"
    return "path_f"
''',
        }
        for path, content in target_files.items():
            parent = "/".join(path.split("/")[:-1])
            await sandbox.run("mkdir", ["-p", parent])
            await sandbox.write_file(path, content)

        # Verify tools work before snapshotting.
        # A broken tool in the snapshot means broken forks.
        verify = await sandbox.run(
            "python3", ["-m", "bandit", "--version"]
        )
        if verify.exit_code != 0:
            raise RuntimeError(f"Tool verification failed:\n{verify.stderr}")

        snapshot = await sandbox.checkpoint(
            checkpoint_type=CheckpointType.MEMORY
        )
    # Context manager terminates the base sandbox here.

    if snapshot.status.value != "completed":
        raise RuntimeError(f"Snapshot failed: {snapshot.status.value}")
    return snapshot.snapshot_id
```

The async with pattern guarantees terminate() is called on exit, including on exceptions. Without it, any exception before a manual terminate() call leaves an orphaned VM running in the background. Tensorlake’s async documentation shows this pattern explicitly.

result.exit_code comes from CommandResult, the SDK’s return type for run(). It has stdout: str, stderr: str, and exit_code: int. Note that stdout is already a string, not bytes, so no .decode() is needed anywhere.

The status check after checkpoint(): SnapshotStatus is an enum, so .value gives you “completed”, “in_progress”, or “failed”. The documentation shows checkpoint() returns a SnapshotInfo with a status field. Checking that status before proceeding is a useful defensive practice. I learned this after a failed snapshot left me debugging downstream agent failures.

This is the actual fork.

The call is AsyncSandbox.create(snapshot_id=snapshot_id). No special fork() method. No copy-on-write API. Just create() with a snapshot ID. Every call produces a fully independent VM starting from that snapshot’s frozen state.

``` python
import json
import time

from tensorlake.sandbox import AsyncSandbox

# AgentReport (the per-agent result record) and ANALYSIS_SCRIPT (the dispatch
# script) come from the rest of the project.
PERSPECTIVES = ["Security", "Complexity", "Docstrings", "Tests", "Structure"]


async def run_agent(agent_id: int, snapshot_id: str) -> AgentReport:
    perspective = PERSPECTIVES[agent_id % len(PERSPECTIVES)]
    t_start = time.time()

    # cpus and memory_mb intentionally omitted.
    # For MEMORY snapshots, resources are inherited from the snapshot
    # and cannot be overridden at restore time.
    async with await AsyncSandbox.create(
        snapshot_id=snapshot_id,
        allow_internet_access=False,  # code analysis is offline; no outbound needed
        timeout_secs=120,
    ) as sandbox:
        await sandbox.write_file(
            "/workspace/analyze.py",
            ANALYSIS_SCRIPT.encode("utf-8")
        )
        result = await sandbox.run(
            "python3",
            ["/workspace/analyze.py"],
            env={"PERSPECTIVE": perspective},
            timeout=60,
        )
    elapsed = time.time() - t_start

    if result.exit_code != 0:
        raise RuntimeError(f"Agent {agent_id} failed:\n{result.stderr}")

    output = json.loads(result.stdout.strip())
    return AgentReport(
        agent_id=agent_id,
        perspective=perspective,
        score=output["score"],
        finding=output["finding"],
        execution_time_s=elapsed,
    )
```

**allow_internet_access=False** is safe here because bandit and radon analyze source code and do not make network calls. This parameter is not locked by MEMORY snapshots. Tensorlake’s networking documentation recommends disabling outbound internet access for untrusted code.

**The dispatch script gets written fresh into each forked VM via sandbox.write_file(). Each agent’s VM is fully isolated:** writing to /workspace/analyze.py in fork 0 does not affect fork 1. The target project files are already there, inherited from the snapshot.

Since result.stdout is already a Python string, json.loads(result.stdout.strip()) works directly. The .strip() handles the trailing newline from print() inside the sandbox.
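The code in this article uses AgentReport and SwarmResult without showing their definitions. A minimal sketch of what they might look like, assuming plain dataclasses: the field names are the ones the surrounding code reads and writes, while the types and the choice of dataclass are guesses.

``` python
from dataclasses import dataclass, field


@dataclass
class AgentReport:
    """One agent's result, built from the JSON line it printed."""
    agent_id: int
    perspective: str
    score: int              # 0-100, as computed by the analysis script
    finding: str            # one-line summary of the top issue
    execution_time_s: float


@dataclass
class SwarmResult:
    """Outcome of one full run over all agents."""
    mode: str               # "sequential" or "parallel"
    total_time_s: float     # wall-clock time for the whole run
    reports: list[AgentReport] = field(default_factory=list)


# Usage with made-up values, just to show the shape.
demo = SwarmResult(
    mode="parallel",
    total_time_s=10.0,
    reports=[AgentReport(0, "Security", 70, "example finding", 9.5)],
)
```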

The sequential baseline exists for one reason: to give the speedup calculation a real denominator. Without it, you have a time with no context.

``` python
import asyncio


async def run_sequential(snapshot_id: str, count: int) -> SwarmResult:
    reports = []
    for i in range(count):
        reports.append(await run_agent(i, snapshot_id))
    return SwarmResult(mode="sequential", ...)


async def run_parallel(snapshot_id: str, count: int) -> SwarmResult:
    # asyncio.gather returns a list of results when awaited.
    reports = await asyncio.gather(
        *(run_agent(i, snapshot_id) for i in range(count))
    )
    reports.sort(key=lambda r: r.agent_id)
    return SwarmResult(mode="parallel", ...)
```

asyncio.gather is what Tensorlake’s async documentation recommends for concurrent sandbox fan-out. The ThreadPoolExecutor approach works too (the sync Sandbox API supports it), but if you are already in an async context, gather is cleaner.
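One behavior of plain asyncio.gather worth knowing: the first exception propagates to the caller, and the results of the agents that did finish are lost. A hedged sketch of a more forgiving fan-out using return_exceptions=True follows. It is not part of the demo, and run_all and fake_agent are hypothetical names.

``` python
import asyncio


async def run_all(agent_fn, count: int):
    """Run `count` agents concurrently and split successes from failures."""
    results = await asyncio.gather(
        *(agent_fn(i) for i in range(count)),
        return_exceptions=True,  # exceptions come back as values, in order
    )
    reports = [r for r in results if not isinstance(r, BaseException)]
    failures = {
        i: r for i, r in enumerate(results) if isinstance(r, BaseException)
    }
    return reports, failures


async def fake_agent(agent_id: int) -> int:
    """Stand-in for run_agent: agent 2 fails, the rest return their ID."""
    await asyncio.sleep(0)
    if agent_id == 2:
        raise RuntimeError("agent 2 crashed")
    return agent_id


reports, failures = asyncio.run(run_all(fake_agent, 5))
```

The four healthy agents still report, and the failure is kept with its agent index so it can be retried or logged.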

The dispatch script runs inside each forked sandbox. It reads the PERSPECTIVE environment variable, routes to the right analysis function, and prints one JSON line to stdout.

**All five analyses are fully offline, with no network calls needed.**

``` python
# ANALYSIS_SCRIPT: runs INSIDE each forked sandbox
import json, os, subprocess, ast, pathlib, sys

PERSPECTIVE = os.environ["PERSPECTIVE"]
TARGET = "/workspace/target"


def run_security():
    """bandit: find hardcoded secrets, unsafe eval, shell injection."""
    r = subprocess.run(
        ["python3", "-m", "bandit", "-r", TARGET, "-f", "json", "-q"],
        capture_output=True, text=True
    )
    try:
        data = json.loads(r.stdout)
    except json.JSONDecodeError:
        return {"score": 0, "finding": "bandit parse error"}
    issues = data.get("results", [])
    high = [i for i in issues if i.get("issue_severity") == "HIGH"]
    return {
        "issues": len(issues), "high": len(high),
        "score": max(0, 100 - len(issues) * 10),
        "finding": high[0]["issue_text"] if high else ("Minor issues" if issues else "Clean"),
    }


def run_complexity():
    """radon: cyclomatic complexity per function."""
    r = subprocess.run(
        ["python3", "-m", "radon", "cc", TARGET, "-j"],
        capture_output=True, text=True
    )
    try:
        data = json.loads(r.stdout)
    except json.JSONDecodeError:
        return {"score": 0, "finding": "radon parse error"}
    blocks = [b for file_blocks in data.values() for b in file_blocks]
    complex_blocks = [b for b in blocks if b.get("complexity", 0) > 5]
    avg = sum(b["complexity"] for b in blocks) / len(blocks) if blocks else 0
    top = f"{complex_blocks[0]['name']} (cc={complex_blocks[0]['complexity']})" if complex_blocks else "All within threshold"
    return {
        "functions": len(blocks), "complex_count": len(complex_blocks),
        "avg_cc": round(avg, 2),
        "score": max(0, 100 - len(complex_blocks) * 15),
        "finding": top,
    }


def run_docstrings():
    """ast: count functions and classes that lack docstrings."""
    total, documented = 0, 0
    for path in pathlib.Path(TARGET).rglob("*.py"):
        tree = ast.parse(path.read_text())
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                total += 1
                if ast.get_docstring(node):
                    documented += 1
    pct = int(documented / total * 100) if total else 100
    return {"total": total, "documented": documented, "score": pct,
            "finding": f"{documented}/{total} documented ({pct}%)"}


def run_tests():
    """Count test files relative to source files."""
    all_py = list(pathlib.Path(TARGET).rglob("*.py"))
    test_files = [f for f in all_py if f.stem.startswith("test_") or f.stem.endswith("_test")]
    ratio = len(test_files) / len(all_py) * 100 if all_py else 0
    return {
        "source_files": len(all_py), "test_files": len(test_files),
        "score": min(100, int(ratio * 2)),
        "finding": f"{len(test_files)}/{len(all_py)} files are tests ({ratio:.0f}%)",
    }


def run_structure():
    """ast: count functions, classes, imports across the codebase."""
    stats = {"functions": 0, "classes": 0, "imports": 0, "files": 0}
    for path in pathlib.Path(TARGET).rglob("*.py"):
        stats["files"] += 1
        tree = ast.parse(path.read_text())
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                stats["functions"] += 1
            elif isinstance(node, ast.ClassDef):
                stats["classes"] += 1
            elif isinstance(node, (ast.Import, ast.ImportFrom)):
                stats["imports"] += 1
    fpr = stats["functions"] / stats["files"] if stats["files"] else 0
    return {**stats, "functions_per_file": round(fpr, 1),
            "score": min(100, int(fpr * 20)),
            "finding": f"{stats['functions']} functions across {stats['files']} files"}


dispatch = {
    "Security":   run_security,
    "Complexity": run_complexity,
    "Docstrings": run_docstrings,
    "Tests":      run_tests,
    "Structure":  run_structure,
}

fn = dispatch.get(PERSPECTIVE)
if fn is None:
    print(json.dumps({"error": f"Unknown perspective: {PERSPECTIVE}"}))
    sys.exit(1)

result = fn()
result["perspective"] = PERSPECTIVE
print(json.dumps(result))
```

**Two things worth keeping when you adapt this.**

**Parameters via environment variables:** sandbox.run(env={"KEY": "val"}) passes per-command variables and avoids shell escaping issues when values contain spaces or special characters. It also keeps the dispatch script stateless, with no hardcoded perspective names inside the script itself.

**JSON to stdout:** the orchestrator reads result.stdout.strip() and passes it directly to json.loads(). The script has one job: print exactly one valid JSON line. Any other stdout output (debug prints, progress bars) breaks the parse. Keep it strict.
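Both rules can be tried locally without a sandbox. The sketch below is a local analogue, assuming the standard-library subprocess module in place of sandbox.run(): it passes the parameter through the environment and rejects any output that is not exactly one JSON line. CHILD, parse_single_json_line, and run_child are hypothetical names.

``` python
import json
import os
import subprocess
import sys

# Tiny child script: reads PERSPECTIVE from the environment, prints one JSON line.
CHILD = (
    "import json, os; "
    "print(json.dumps({'perspective': os.environ['PERSPECTIVE'], 'score': 100}))"
)


def parse_single_json_line(stdout: str) -> dict:
    """Accept exactly one non-empty line of JSON; fail loudly otherwise."""
    lines = [ln for ln in stdout.splitlines() if ln.strip()]
    if len(lines) != 1:
        raise ValueError(f"expected exactly one JSON line, got {len(lines)}")
    return json.loads(lines[0])


def run_child(perspective: str) -> dict:
    """Pass the parameter via env, so no shell quoting is involved."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        env={**os.environ, "PERSPECTIVE": perspective},
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_single_json_line(proc.stdout)


tricky = 'Security & "quotes" $HOME'
echoed = run_child(tricky)
```

The value with spaces, quotes, and a dollar sign arrives unchanged because it never passes through a shell, and a stray debug print in the child shows up as a clear error instead of a confusing JSON parse failure.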

After all five agents return, a single GPT-4o call synthesizes their findings into a prioritized action list.

``` python
from openai import OpenAI


def aggregate_with_llm(parallel: SwarmResult, sequential: SwarmResult) -> str:
    client = OpenAI()
    speedup = sequential.total_time_s / parallel.total_time_s
    reports_block = "\n".join(
        f"[{r.perspective}] Score: {r.score}/100 | {r.finding}"
        for r in parallel.reports
    )
    prompt = (
        "You are a senior engineering lead reviewing a parallel code analysis report.\n\n"
        f"Agent Findings:\n{reports_block}\n\n"
        "Benchmark:\n"
        f"  Sequential : {sequential.total_time_s:.2f}s\n"
        f"  Parallel   : {parallel.total_time_s:.2f}s\n"
        f"  Speedup    : {speedup:.2f}x\n\n"
        "Provide: overall codebase health score, top three issues to fix immediately "
        "(with file and severity), recommended next actions, and one sentence on what "
        "the parallel speedup means for running this at scale."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

The lead agent sees both the analysis findings and the timing benchmark in the same context. That is the reduce step in a map-reduce agent pattern: give the aggregator everything the workers produced, not just the domain data. The call is synchronous because there is nothing left to concurrently await at this point.

Both timelines contain the same agents doing the same work. What changes is when setup happens.

**These numbers are structural projections based on typical pip install times and sandbox warm-restore behavior, not measured results.** Your numbers will vary by workload and network conditions. Run the demo to measure your case.

**Without memory snapshots:**

```
Agent 0: [setup ~90s][work ~8s]
Agent 1: [setup ~90s][work ~9s]
Agent 2: [setup ~90s][work ~8s]
Agent 3: [setup ~90s][work ~9s]
Agent 4: [setup ~90s][work ~8s]

Sequential total: ~490s
Parallel total:   ~100s  (setup still paid by each fork separately)
```

The speedup ratio looks similar on paper. The absolute time is not. At five agents the gap is 450 seconds versus 5 seconds of overhead. **At fifty agents it is 4,500 seconds versus 50 seconds.**

Setup time does not scale down with parallelism. It multiplies. The snapshot moves it outside the loop entirely.
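The arithmetic behind those figures can be reproduced in a few lines. This is a back-of-the-envelope sketch using the projected numbers above, not measurements. The 1-second warm restore is an assumption inferred from "5 seconds of overhead" across five agents, and the parallel total assumes the agents overlap perfectly.

``` python
WORK_S = [8, 9, 8, 9, 8]   # per-agent work times from the timeline
COLD_SETUP_S = 90          # per-agent install time without a snapshot
WARM_RESTORE_S = 1         # assumed per-fork restore cost with a snapshot


def totals(setup_s: float, work_s: list[float]) -> tuple[float, float]:
    """Return (sequential_total, ideal_parallel_total) in seconds."""
    per_agent = [setup_s + w for w in work_s]
    return sum(per_agent), max(per_agent)


def setup_overhead(agents: int, setup_s: float) -> float:
    """Total seconds spent on setup across all agents."""
    return agents * setup_s


cold_seq, cold_par = totals(COLD_SETUP_S, WORK_S)
print(f"cold: sequential {cold_seq}s, parallel {cold_par}s")
for n in (5, 50):
    print(
        f"{n} agents: cold setup {setup_overhead(n, COLD_SETUP_S)}s, "
        f"warm restore {setup_overhead(n, WARM_RESTORE_S)}s"
    )
```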

**The benchmark captures four numbers:** sequential total time (the denominator), parallel total time (wall-clock from first fork to last return), speedup (sequential divided by parallel), and efficiency (speedup divided by agent count, multiplied by 100).

**Efficiency is the one most benchmarks skip**.

A 4.2x speedup across five agents is 84% parallel efficiency: 16% is lost to fork startup, scheduling, and I/O contention. That number matters when you scale from five agents to fifty.
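The two derived numbers are simple to compute from the two measured ones. A small sketch, where speedup_and_efficiency is a hypothetical helper and the timings are made up to reproduce the 4.2x example:

``` python
def speedup_and_efficiency(
    sequential_s: float, parallel_s: float, agents: int
) -> tuple[float, float]:
    """Return (speedup, efficiency as a percentage of ideal linear scaling)."""
    if parallel_s <= 0 or agents <= 0:
        raise ValueError("parallel_s and agents must be positive")
    speedup = sequential_s / parallel_s
    efficiency_pct = speedup / agents * 100
    return speedup, efficiency_pct


# Illustrative timings: 42 s sequential, 10 s parallel, five agents.
speedup, efficiency = speedup_and_efficiency(42.0, 10.0, 5)
print(f"speedup {speedup:.1f}x, efficiency {efficiency:.0f}%")
```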

The demo covers the happy path. Three things to add before production:

**Use it when:**

**Skip it when:**

**On filesystem performance:** Tensorlake publishes performance benchmarks on their GitHub comparing sandbox execution times across providers. Refer to their repository for current numbers.

**Running This**

```
pip install tensorlake openai
export TENSORLAKE_API_KEY="your-key"
export OPENAI_API_KEY="your-key"
python3 agent.py
```

**Free tier at cloud.tensorlake.ai, no credit card required.** The demo takes 3–5 minutes end to end. After it runs, benchmark_results.json has the full per-agent timing data.

Phase 1 (base build and snapshot) runs once. If you want to run the benchmark multiple times, pass your existing snapshot ID directly and skip Phase 1. The snapshot persists between runs until you delete it.
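One way to make that reuse automatic is to cache the snapshot ID on disk. The sketch below is an assumption about how you might wire it up, not something the demo does: get_snapshot_id and the cache file layout are hypothetical, and fake_build stands in for build_base_snapshot().

``` python
import asyncio
import json
import tempfile
from pathlib import Path
from typing import Awaitable, Callable


async def get_snapshot_id(
    cache_path: Path,
    build: Callable[[], Awaitable[str]],
) -> str:
    """Return a cached snapshot ID, building a new snapshot only on a miss."""
    if cache_path.exists():
        return json.loads(cache_path.read_text())["snapshot_id"]
    snapshot_id = await build()  # Phase 1: the expensive part
    cache_path.write_text(json.dumps({"snapshot_id": snapshot_id}))
    return snapshot_id


build_calls = []


async def fake_build() -> str:
    """Stand-in builder that records each call and returns a made-up ID."""
    build_calls.append(1)
    return "snap-demo-123"


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "snapshot.json"
    first = asyncio.run(get_snapshot_id(cache, fake_build))
    second = asyncio.run(get_snapshot_id(cache, fake_build))
```

A cached ID goes stale once the snapshot is deleted, so a real version would need to fall back to rebuilding when a restore from the cached ID fails.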

**The first version** had plain await sandbox.terminate() at the end of each function. Two exceptions during testing left sandboxes running and billing for idle compute. Switched to async with await AsyncSandbox.create(…) as sandbox: and that stopped.
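The difference is easy to reproduce with a stand-in class. FakeSandbox below is purely illustrative and only counts live instances; it is not the Tensorlake SDK, but it follows the same async context manager protocol.

``` python
import asyncio


class FakeSandbox:
    """Counts live instances so leaks are visible."""
    live = 0

    async def start(self) -> "FakeSandbox":
        FakeSandbox.live += 1
        return self

    async def terminate(self) -> None:
        FakeSandbox.live -= 1

    async def __aenter__(self) -> "FakeSandbox":
        return await self.start()

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        await self.terminate()  # runs on normal exit and on exceptions
        return False            # do not swallow the exception


async def failing_work() -> None:
    raise RuntimeError("analysis crashed")


async def manual_cleanup() -> None:
    sandbox = await FakeSandbox().start()
    await failing_work()
    await sandbox.terminate()  # skipped when failing_work raises


async def context_managed() -> None:
    async with FakeSandbox():
        await failing_work()


def leaked_after(coro_fn) -> int:
    """Run a scenario, ignore its crash, and report how many sandboxes leaked."""
    FakeSandbox.live = 0
    try:
        asyncio.run(coro_fn())
    except RuntimeError:
        pass
    return FakeSandbox.live


print("manual:", leaked_after(manual_cleanup))
print("async with:", leaked_after(context_managed))
```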

**The second version** called sandbox.checkpoint(sandbox.sandbox_id). I had copied the pattern from a CLI reference (tl sbx checkpoint <sandbox-id>) and assumed the Python SDK matched. It does not. The Python instance method takes no positional arguments: sandbox.checkpoint(checkpoint_type=CheckpointType.MEMORY). That is it.

**The third version** was the first to run end-to-end, but it used CheckpointType.FILESYSTEM, the default, because I had not read the snapshots documentation carefully. The benchmark looked reasonable, but the forks were doing full cold boots and I was measuring those boots alongside the actual work. Switching to CheckpointType.MEMORY was the change that made setup time disappear from per-fork timing.

**Small mistakes individually. What they share:** Tensorlake’s API is well documented, but the snapshot docs, the SDK reference, and the async docs are three separate pages. Read only the quickstart, and you miss two of the three things that matter most for this pattern.

**You can also check the complete project on my GitHub here:**

Running the same five agents sequentially and then in parallel is one of those moments where the architecture becomes legible in a way that documentation does not fully convey.

The snapshot moves setup cost from inside the loop to outside it. The agents still do the same work on the same hardware. **The savings come from not rebuilding an environment five times when it only needed to be built once.**

**Most multi-agent optimization advice focuses on LLM calls:** batching, caching, cheaper models. That advice is right. But if you have five agents each spending 90 seconds on pip installs before making a single inference call, no amount of LLM optimization helps until you address setup time first.

**The bottleneck was never the agents.**

It was rebuilding the same environment on every run. Snapshot it once, fork cheaply, and parallel execution finally delivers what you expected when you first wrote asyncio.gather.

[Why Most Multi-Agent AI Systems Waste 90% of Their Time (And How to Fix It)](https://pub.towardsai.net/why-most-multi-agent-ai-systems-waste-90-of-their-time-and-how-to-fix-it-c0ce81f0e323) was originally published in [Towards AI](https://pub.towardsai.net) on Medium, where people are continuing the conversation by highlighting and responding to this story.
