Why Most Multi-Agent AI Systems Waste 90% of Their Time (And How to Fix It)

Engineers often assume multi-agent AI performance is a concurrency problem, but the real bottleneck is agents spending most of their time preparing to work rather than working. The fix is using memory snapshots to clone a pre-configured environment, allowing parallel agents to skip setup and run analyses simultaneously. Tensorlake's sandbox technology enables this by capturing full memory state for instant cloning, dramatically reducing overhead.

Most engineers think multi-agent performance is a concurrency problem. I did too. So when five AI agents running in parallel barely outperformed a sequential run, I assumed something was wrong with my orchestration.

I was looking in the wrong place. Each agent was spending more time preparing to work than actually working. The fix wasn’t more threads, better async code, or a faster model. It was a memory snapshot. And once I saw where the time was really going, an entire class of multi-agent bottlenecks suddenly made sense.

Here is what that looks like, what took me three iterations to get right, and where it still has rough edges.

Let’s get the mental model first. The idea is straightforward: instead of five agents each spending 90 seconds installing the same tools, install them once, freeze that environment, and stamp out five identical copies. Each copy runs a different analysis in parallel. A lead LLM reads all five results and tells you what to fix first. Setup time is paid once, upfront, before any agent runs. The rest of this article explains how.

If you have not worked with sandboxes before, think of one as a disposable computer that lives in the cloud. You spin it up, run whatever code you need inside it, and throw it away when you’re done. It has its own filesystem, its own processes, and its own network. Nothing it does can touch your machine or any other sandbox running at the same time. In short, a sandbox gives an agent a secure, isolated environment.

That isolation is the whole point. Your agent can install packages, write files, crash badly, or spin up a browser, and none of it bleeds out. When the task is done, you terminate the VM and it is gone. The next agent starts clean.

Most agent frameworks treat the execution environment as an afterthought. The LLM call is the interesting part; the environment is just “wherever the code runs.” That works fine for single-turn tasks. It breaks down fast for anything multi-step.
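The mental model above can be run locally as a toy. Everything in it is a stand-in, and none of it is Tensorlake API. The "environment" is a plain dict, the hypothetical fork() is just copy.deepcopy(), and asyncio.sleep() replaces the real analysis. The lead-LLM step is left out. The toy shows the two properties the rest of the article relies on: setup runs exactly once, and each fork can write files without affecting its siblings.

```python
import asyncio
import copy

setup_runs = 0  # counts how many times the expensive setup actually ran


def build_base() -> dict:
    """Hypothetical stand-in for Phase 1: install tools and write target files, once."""
    global setup_runs
    setup_runs += 1
    return {"tools": ["bandit", "radon"], "files": {"/workspace/target/auth.py": "source"}}


def fork(base: dict) -> dict:
    """Hypothetical stand-in for restoring a snapshot: an independent deep copy."""
    return copy.deepcopy(base)


async def analyze(env: dict, perspective: str) -> str:
    env["files"]["/workspace/analyze.py"] = perspective  # private to this fork
    await asyncio.sleep(0.01)  # stands in for the real analysis work
    return f"{perspective}: {len(env['tools'])} tools ready"


async def swarm(perspectives: list[str]) -> tuple[list[str], list[dict]]:
    base = build_base()  # setup paid once, before any agent runs
    forks = [fork(base) for _ in perspectives]
    findings = await asyncio.gather(*(analyze(f, p) for f, p in zip(forks, perspectives)))
    return list(findings), forks


findings, forks = asyncio.run(
    swarm(["Security", "Complexity", "Docstrings", "Tests", "Structure"])
)
print(setup_runs, findings[0])
```

Setup ran once for five agents. Each fork’s `/workspace/analyze.py` holds only its own perspective, which mirrors the isolation a real sandbox gives you.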
When an agent needs to install packages, write intermediate files, maintain a browser session across multiple pages, or resume a task from a different machine, you need the execution environment to behave like a persistent object, not a function call that resets on every invocation.

Tensorlake gives each agent a MicroVM backed by Firecracker and Cloud Hypervisor, optimized for fast boot times and strong isolation. Each sandbox is a full Linux VM. It boots in hundreds of milliseconds, persists filesystem and memory state across sessions, and can be snapshotted at any point in its lifecycle. Tensorlake also lets you spin up multiple sandboxes in parallel for concurrent agent execution, which is honestly one of my favourite things about it. It also ranks in the top 5 of SandboxBenchmarks.

What changes the math is a single question: what does the snapshot actually capture?

Quick vocabulary before the details. Tensorlake sandboxes have four lifecycle modes, and this project uses the last two. Suspend and Snapshot both preserve state, but they serve different purposes. Suspend pauses this sandbox so you can resume it later. A snapshot is a reusable artifact for retrying from a checkpoint or cloning an environment.

Tensorlake supports two checkpoint types, filesystem and memory, and most tutorials only mention one. A filesystem checkpoint preserves the disk, so a sandbox restored from it still has to boot. A memory checkpoint captures the entire running VM, so a restored sandbox resumes warm. The checkpoint type is not a performance detail. It determines whether your fork is a clone or a restart. The default when you call sandbox.checkpoint() with no arguments is filesystem. That is the wrong choice for a parallel swarm where agents share a prepared environment. You want memory.

One more constraint worth knowing upfront: for memory snapshots, resources (CPUs, RAM) are baked into the snapshot at checkpoint time. You cannot override them when creating forks. Set the right cpus and memory_mb on the base sandbox before you checkpoint, and every fork inherits them automatically.

The pattern has five distinct phases. Each one has a single responsibility.
Phase 1 (Base Snapshot): Spins up a single baseline sandbox, installs analysis tools (bandit, radon), writes the target code, and checkpoints the entire running VM state using CheckpointType.MEMORY. The base sandbox is then terminated, leaving behind the reusable snapshot ID.

Phase 2 (Agent Forking): Restores five independent sandboxes concurrently from the base snapshot with AsyncSandbox.create(snapshot_id=...). Each fork is a warm start that inherits all installed tools, environment settings, and target files.

Phase 3 (Sequential Baseline Timing): Runs each agent’s analysis script (analyze.py) one by one inside its own sandbox. The sequential time becomes the benchmark denominator.

Phase 4 (Parallel Swarm): Executes all five agents concurrently using asyncio.gather(...). Each agent runs the same analysis script inside its isolated sandbox, but gets a different focus through the PERSPECTIVE environment variable.

Phase 5 (LLM Aggregation): Collects the individual reports (Security, Complexity, Docstrings, Tests, Structure) together with the timing data. It passes them to the lead LLM (GPT-4o), which synthesizes a single prioritized fix list.

Phase 1 runs once. Phases 2 through 4 run every time you want results. The fork is cheap. The base environment build is not, but you only pay that cost once per snapshot.

The base sandbox installs the analysis tools, writes the target codebase into the VM, and then snapshots the entire state. Every fork inherits both the tools and the target project automatically.

```python
from tensorlake.sandbox import AsyncSandbox, CheckpointType


async def build_base_snapshot() -> str:
    async with await AsyncSandbox.create(
        name="base-swarm-env",
        cpus=2.0,
        memory_mb=2048,
        timeout_secs=600,
    ) as sandbox:
        # Install analysis tools. These are baked into the snapshot and
        # available to every forked agent at no extra install cost.
        result = await sandbox.run(
            "pip",
            ["install", "bandit", "radon", "--user", "--break-system-packages", "-q"],
            timeout=180,
        )
        if result.exit_code != 0:
            raise RuntimeError(f"pip install failed:\n{result.stderr}")

        # Write a sample Python project with intentional issues for agents to find.
        # All forks inherit this from the snapshot; no need to write per-agent.
        target_files = {
            "/workspace/target/auth.py": b'''import subprocess

DB_PASSWORD = "hardcoded_secret_123"

def authenticate(user_input):
    return eval(user_input)

def run_command(cmd):
    return subprocess.call(cmd, shell=True)
''',
            "/workspace/target/logic.py": b'''def classify(a, b, c, d, e, f, g, h):
    if a and b:
        if c or d:
            if not e and f:
                return "path_a"
            elif e and not f:
                return "path_b"
            elif g and h:
                return "path_c"
            else:
                return "path_d"
        elif g:
            return "path_e"
    return "path_f"
''',
        }
        for path, content in target_files.items():
            parent = "/".join(path.split("/")[:-1])
            await sandbox.run("mkdir", ["-p", parent])
            await sandbox.write_file(path, content)

        # Verify tools work before snapshotting. A broken tool in the snapshot
        # means broken forks.
        verify = await sandbox.run("python3", ["-m", "bandit", "--version"])
        if verify.exit_code != 0:
            raise RuntimeError(f"Tool verification failed:\n{verify.stderr}")

        snapshot = await sandbox.checkpoint(checkpoint_type=CheckpointType.MEMORY)
    # Context manager terminates the base sandbox here.

    if snapshot.status.value != "completed":
        raise RuntimeError(f"Snapshot failed: {snapshot.status.value}")
    return snapshot.snapshot_id
```

The async with pattern guarantees that terminate() is called on exit, including on exceptions. Without it, any exception raised before a manual terminate() call leaves an orphaned VM running in the background. Tensorlake’s async documentation shows this pattern explicitly.

result.exit_code comes from CommandResult, the SDK’s return type for run(). It has stdout: str, stderr: str, and exit_code: int. Note that stdout is already a string, not bytes, so no .decode() is needed anywhere.
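You can check the cleanup guarantee without a cloud account. The sketch below uses a hypothetical FakeSandbox class, which is not part of the Tensorlake SDK. The class records whether terminate() ran, and orphans() counts the sandboxes left unterminated after a worker raises mid-task.

```python
import asyncio


class FakeSandbox:
    """Hypothetical stand-in (not a Tensorlake class) that records terminate() calls."""

    created: list["FakeSandbox"] = []

    def __init__(self) -> None:
        self.terminated = False
        FakeSandbox.created.append(self)

    async def terminate(self) -> None:
        self.terminated = True

    async def __aenter__(self) -> "FakeSandbox":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        await self.terminate()  # runs on normal exit and when the body raises
        return False            # re-raise whatever the body raised


async def manual_cleanup(fail: bool) -> None:
    sandbox = FakeSandbox()
    if fail:
        raise RuntimeError("analysis failed")
    await sandbox.terminate()  # skipped when the line above raises


async def managed_cleanup(fail: bool) -> None:
    async with FakeSandbox():
        if fail:
            raise RuntimeError("analysis failed")


async def orphans(worker) -> int:
    """Run a worker that fails, then count sandboxes that were never terminated."""
    FakeSandbox.created.clear()
    try:
        await worker(fail=True)
    except RuntimeError:
        pass
    return sum(not s.terminated for s in FakeSandbox.created)


print(asyncio.run(orphans(manual_cleanup)), asyncio.run(orphans(managed_cleanup)))
```

This prints `1 0`. The manual version leaks one sandbox, and the async with version leaks none. That is the guarantee the article relies on from `async with await AsyncSandbox.create(...)`.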
The status check after checkpoint(): SnapshotStatus is an enum, so .value gives you “completed”, “in_progress”, or “failed”. The documentation shows that checkpoint() returns a SnapshotInfo with a status field. Checking that status before proceeding is a useful defensive practice. I learned this after a failed snapshot left me debugging downstream agent failures.

This is the actual fork. The call is AsyncSandbox.create(snapshot_id=snapshot_id). No special fork method. No copy-on-write API. Just create() with a snapshot ID. Every call produces a fully independent VM starting from that snapshot’s frozen state.

```python
import json
import time

from tensorlake.sandbox import AsyncSandbox

# AgentReport and ANALYSIS_SCRIPT are defined elsewhere in the project.
PERSPECTIVES = ["Security", "Complexity", "Docstrings", "Tests", "Structure"]


async def run_agent(agent_id: int, snapshot_id: str) -> AgentReport:
    perspective = PERSPECTIVES[agent_id % len(PERSPECTIVES)]
    t_start = time.time()

    # cpus and memory_mb intentionally omitted. For MEMORY snapshots, resources
    # are inherited from the snapshot and cannot be overridden at restore time.
    async with await AsyncSandbox.create(
        snapshot_id=snapshot_id,
        allow_internet_access=False,  # code analysis is offline; no outbound needed
        timeout_secs=120,
    ) as sandbox:
        await sandbox.write_file("/workspace/analyze.py", ANALYSIS_SCRIPT.encode("utf-8"))
        result = await sandbox.run(
            "python3",
            ["/workspace/analyze.py"],
            env={"PERSPECTIVE": perspective},
            timeout=60,
        )
        elapsed = time.time() - t_start

    if result.exit_code != 0:
        raise RuntimeError(f"Agent {agent_id} failed:\n{result.stderr}")

    output = json.loads(result.stdout.strip())
    return AgentReport(
        agent_id=agent_id,
        perspective=perspective,
        score=output["score"],
        finding=output["finding"],
        execution_time_s=elapsed,
    )
```

The dispatch script gets written fresh into each forked VM via sandbox.write_file().

allow_internet_access=False is safe here because bandit and radon analyze source code and do not make network calls. This parameter is not locked by MEMORY snapshots. Tensorlake’s networking documentation recommends disabling outbound internet access for untrusted code.
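A memory snapshot locks some create() options, such as cpus and memory_mb, but not others, such as allow_internet_access. It can help to reject the locked ones before any VM is requested. The helper below, fork_options(), is a hypothetical guard. SnapshotKind is a local stand-in for the SDK’s CheckpointType, and its values are assumptions. The article only describes the lock for memory snapshots, so the filesystem branch passes options through unchanged; this is not a claim about how filesystem restores behave.

```python
from enum import Enum


class SnapshotKind(Enum):
    """Local stand-in for the SDK's CheckpointType (values are assumptions)."""
    FILESYSTEM = "filesystem"
    MEMORY = "memory"


# Options the article says are frozen into a MEMORY snapshot at checkpoint time.
LOCKED_BY_MEMORY = frozenset({"cpus", "memory_mb"})


def fork_options(snapshot_id: str, kind: SnapshotKind, **options) -> dict:
    """Build keyword arguments for a fork, rejecting options a memory snapshot locks.

    Hypothetical helper: the result is meant to be unpacked into AsyncSandbox.create().
    """
    if kind is SnapshotKind.MEMORY:
        locked = sorted(LOCKED_BY_MEMORY.intersection(options))
        if locked:
            raise ValueError(
                f"{', '.join(locked)} cannot be overridden for memory snapshot "
                f"{snapshot_id}; set them on the base sandbox before checkpointing"
            )
    return {"snapshot_id": snapshot_id, **options}


kwargs = fork_options("snap-123", SnapshotKind.MEMORY, allow_internet_access=False, timeout_secs=120)
print(kwargs)
```

In the orchestrator you would then call `AsyncSandbox.create(**kwargs)`. A stray `cpus=4.0` now fails immediately with a readable message, before you get any confusing restore-time behavior.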
Each agent’s VM is fully isolated: writing to /workspace/analyze.py in fork 0 does not affect fork 1. The target project files are already there, inherited from the snapshot. Since result.stdout is already a Python string, json.loads(result.stdout.strip()) works directly. The .strip() removes the trailing newline that print() adds inside the sandbox.

The sequential baseline exists for one reason: to give the speedup calculation a real denominator. Without it, you have a time with no context.

```python
import asyncio

# run_agent() and SwarmResult are defined elsewhere in the project.


async def run_sequential(snapshot_id: str, count: int) -> SwarmResult:
    reports = []
    for i in range(count):
        reports.append(await run_agent(i, snapshot_id))  # one fork at a time
    return SwarmResult(
        mode="sequential",
        # ... (remaining fields elided)
    )


async def run_parallel(snapshot_id: str, count: int) -> SwarmResult:
    # asyncio.gather returns a list of results when awaited, already in
    # submission order; the sort is a harmless safeguard.
    reports = await asyncio.gather(*(run_agent(i, snapshot_id) for i in range(count)))
    reports.sort(key=lambda r: r.agent_id)
    return SwarmResult(
        mode="parallel",
        # ... (remaining fields elided)
    )
```

asyncio.gather is what Tensorlake’s async documentation recommends for concurrent sandbox fan-out. The ThreadPoolExecutor approach works too (the sync Sandbox API supports it), but if you are already in an async context, gather is cleaner.

The dispatch script runs inside each forked sandbox. It reads the PERSPECTIVE environment variable, routes to the right analysis function, and prints one JSON line to stdout. All five analyses are fully offline, with no network calls needed.
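The contract the dispatch script implements can be exercised locally before reading the full script. Two assumptions: TOY_SCRIPT is a hypothetical two-perspective stand-in, not the article’s ANALYSIS_SCRIPT, and a local subprocess plays the role of sandbox.run(). The sketch shows the three moving parts. An environment variable selects the handler, unknown perspectives exit non-zero, and stdout carries exactly one JSON line.

```python
import json
import os
import subprocess
import sys

# Hypothetical toy dispatch script: read PERSPECTIVE, route, print one JSON line.
TOY_SCRIPT = r'''
import json, os, sys
handlers = {"Security": lambda: {"score": 90}, "Tests": lambda: {"score": 0}}
fn = handlers.get(os.environ["PERSPECTIVE"])
if fn is None:
    print(json.dumps({"error": "unknown perspective"}))
    sys.exit(1)
print(json.dumps({**fn(), "perspective": os.environ["PERSPECTIVE"]}))
'''


def run_locally(perspective: str) -> dict:
    """Run the toy script in a child process, the way the orchestrator runs it in a fork."""
    proc = subprocess.run(
        [sys.executable, "-c", TOY_SCRIPT],
        env={**os.environ, "PERSPECTIVE": perspective},  # per-command env var
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        raise RuntimeError(f"{perspective} failed: {proc.stdout.strip()}")
    return json.loads(proc.stdout.strip())  # exactly one JSON line, so this parse is safe


print(run_locally("Security"))
```

Running a check like this before baking anything into a snapshot catches contract bugs such as stray prints or missing keys, without paying for a VM.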
```python
# ANALYSIS_SCRIPT: runs INSIDE each forked sandbox
import json, os, subprocess, ast, pathlib, sys

PERSPECTIVE = os.environ["PERSPECTIVE"]
TARGET = "/workspace/target"


def run_security():
    """bandit: find hardcoded secrets, unsafe eval, shell injection."""
    r = subprocess.run(
        ["python3", "-m", "bandit", "-r", TARGET, "-f", "json", "-q"],
        capture_output=True, text=True,
    )
    try:
        data = json.loads(r.stdout)
    except json.JSONDecodeError:
        return {"score": 0, "finding": "bandit parse error"}
    issues = data.get("results", [])
    high = [i for i in issues if i.get("issue_severity") == "HIGH"]
    return {
        "issues": len(issues),
        "high": len(high),
        "score": max(0, 100 - len(issues) * 10),
        "finding": high[0]["issue_text"] if high else ("Minor issues" if issues else "Clean"),
    }


def run_complexity():
    """radon: cyclomatic complexity per function."""
    r = subprocess.run(
        ["python3", "-m", "radon", "cc", TARGET, "-j"],
        capture_output=True, text=True,
    )
    try:
        data = json.loads(r.stdout)
    except json.JSONDecodeError:
        return {"score": 0, "finding": "radon parse error"}
    blocks = [b for file_blocks in data.values() for b in file_blocks]
    complex_blocks = [b for b in blocks if b.get("complexity", 0) > 5]
    avg = sum(b["complexity"] for b in blocks) / len(blocks) if blocks else 0
    top = (f"{complex_blocks[0]['name']} (cc={complex_blocks[0]['complexity']})"
           if complex_blocks else "All within threshold")
    return {
        "functions": len(blocks),
        "complex_count": len(complex_blocks),
        "avg_cc": round(avg, 2),
        "score": max(0, 100 - len(complex_blocks) * 15),
        "finding": top,
    }


def run_docstrings():
    """ast: count functions and classes that lack docstrings."""
    total, documented = 0, 0
    for path in pathlib.Path(TARGET).rglob("*.py"):
        tree = ast.parse(path.read_text())
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                total += 1
                if ast.get_docstring(node):
                    documented += 1
    pct = int(documented / total * 100) if total else 100
    return {"total": total, "documented": documented, "score": pct,
            "finding": f"{documented}/{total} documented ({pct}%)"}


def run_tests():
    """Count test files relative to source files."""
    all_py = list(pathlib.Path(TARGET).rglob("*.py"))
    test_files = [f for f in all_py if f.stem.startswith("test_") or f.stem.endswith("_test")]
    ratio = len(test_files) / len(all_py) * 100 if all_py else 0
    return {
        "source_files": len(all_py),
        "test_files": len(test_files),
        "score": min(100, int(ratio * 2)),
        "finding": f"{len(test_files)}/{len(all_py)} files are tests ({ratio:.0f}%)",
    }


def run_structure():
    """ast: count functions, classes, imports across the codebase."""
    stats = {"functions": 0, "classes": 0, "imports": 0, "files": 0}
    for path in pathlib.Path(TARGET).rglob("*.py"):
        stats["files"] += 1
        tree = ast.parse(path.read_text())
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                stats["functions"] += 1
            elif isinstance(node, ast.ClassDef):
                stats["classes"] += 1
            elif isinstance(node, (ast.Import, ast.ImportFrom)):
                stats["imports"] += 1
    fpr = stats["functions"] / stats["files"] if stats["files"] else 0
    return {
        **stats,
        "functions_per_file": round(fpr, 1),
        "score": min(100, int(fpr * 20)),
        "finding": f"{stats['functions']} functions across {stats['files']} files",
    }


dispatch = {
    "Security": run_security,
    "Complexity": run_complexity,
    "Docstrings": run_docstrings,
    "Tests": run_tests,
    "Structure": run_structure,
}

fn = dispatch.get(PERSPECTIVE)
if fn is None:
    print(json.dumps({"error": f"Unknown perspective: {PERSPECTIVE}"}))
    sys.exit(1)

result = fn()
result["perspective"] = PERSPECTIVE
print(json.dumps(result))
```

Two things are worth keeping when you adapt this.

Parameters via environment variables: sandbox.run(..., env={"KEY": "val"}) passes per-command variables and avoids shell-escaping issues when values contain spaces or special characters. It also keeps the dispatch script stateless, because the choice of perspective is never hardcoded inside the script itself.

JSON to stdout: the orchestrator reads result.stdout.strip() and passes it directly to json.loads(). The script has one job: print exactly one valid JSON line.
Any other stdout output (debug prints, progress bars) breaks the parse. Keep it strict.

After all five agents return, a single GPT-4o call synthesizes their findings into a prioritized action list.

```python
from openai import OpenAI

# SwarmResult is defined elsewhere in the project.


def aggregate_with_llm(parallel: SwarmResult, sequential: SwarmResult) -> str:
    client = OpenAI()
    speedup = sequential.total_time_s / parallel.total_time_s
    reports_block = "\n".join(
        f"  {r.perspective} Score: {r.score}/100 | {r.finding}"
        for r in parallel.reports
    )
    prompt = (
        "You are a senior engineering lead reviewing a parallel code analysis report.\n\n"
        f"Agent Findings:\n{reports_block}\n\n"
        "Benchmark:\n"
        f"  Sequential : {sequential.total_time_s:.2f}s\n"
        f"  Parallel   : {parallel.total_time_s:.2f}s\n"
        f"  Speedup    : {speedup:.2f}x\n\n"
        "Provide: overall codebase health score, top three issues to fix immediately "
        "(with file and severity), recommended next actions, and one sentence on what "
        "the parallel speedup means for running this at scale."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

The lead agent sees both the analysis findings and the timing benchmark in the same context. That is the reduce step in a map-reduce agent pattern: give the aggregator everything the workers produced, not just the domain data. The call is synchronous because nothing is left to await concurrently at this point.

Both timelines contain the same agents doing the same work. What changes is when setup happens. These numbers are structural projections based on typical pip install times and sandbox warm-restore behavior, not measured results. Your numbers will vary by workload and network conditions. Run the demo to measure your case.
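The projections in the timeline are plain arithmetic, and the sketch below reconstructs a model that produces them. It is a toy built on three assumptions. First, project() treats parallel time as the slowest single agent, which means perfect concurrency. Second, the per-agent work values are copied from the timeline. Third, the 1 s warm-restore figure is only what the article’s “5 seconds of overhead for five agents” comparison implies, not a measurement. efficiency_pct() follows the benchmark’s definition: speedup divided by agent count, times 100.

```python
def project(per_agent_overhead_s: float, work_s: list[float]) -> dict:
    """Back-of-the-envelope model: every agent pays overhead, then does its work.

    Assumes perfect concurrency in the parallel case (no scheduling or I/O contention).
    """
    runs = [per_agent_overhead_s + w for w in work_s]
    return {
        "loop_overhead_s": per_agent_overhead_s * len(runs),
        "sequential_s": sum(runs),
        "parallel_s": max(runs),
    }


def efficiency_pct(sequential_s: float, parallel_s: float, agents: int) -> float:
    """Speedup divided by agent count, times 100."""
    return sequential_s / parallel_s / agents * 100


work = [8, 9, 8, 9, 8]   # per-agent work in seconds, taken from the timeline
cold = project(90, work)  # every agent installs its own tools
warm = project(1, work)   # assumed ~1 s memory-snapshot restore per fork
print(cold)
print(warm)
```

For the cold case, the model gives 450 s of in-loop overhead, 492 s sequential and 99 s parallel, which match the rounded figures in the timeline. The warm case shrinks in-loop overhead to 5 s. Scaling `work` to fifty agents reproduces the 4,500 s versus 50 s comparison.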
Without memory snapshots:

  Agent 0: setup ~90s | work ~8s
  Agent 1: setup ~90s | work ~9s
  Agent 2: setup ~90s | work ~8s
  Agent 3: setup ~90s | work ~9s
  Agent 4: setup ~90s | work ~8s

  Sequential total: ~490s
  Parallel total:   ~100s (setup still paid by each fork separately)

The speedup ratio looks similar on paper. The absolute time is not. At five agents, the gap is 450 seconds versus 5 seconds of overhead. At fifty agents, it is 4,500 seconds versus 50 seconds. Setup time does not scale down with parallelism; it multiplies with the number of agents. The snapshot moves it outside the loop entirely.

The benchmark captures four numbers:
- sequential total time (the denominator)
- parallel total time (wall-clock from first fork to last return)
- speedup (sequential divided by parallel)
- efficiency (speedup divided by agent count, multiplied by 100)

Efficiency is the one most benchmarks skip. A 4.2x speedup across five agents is 84% parallel efficiency; the remaining 16% is lost to fork startup, scheduling, and I/O contention. That number matters when you scale from five agents to fifty.

The demo covers the happy path. Three things to add before production:

Use it when:

Skip it when:

On filesystem performance: Tensorlake publishes performance benchmarks on its GitHub comparing sandbox execution times across providers. Refer to its repository for current numbers.

Running This

```shell
pip install tensorlake openai
export TENSORLAKE_API_KEY="your-key"
export OPENAI_API_KEY="your-key"
python3 agent.py
```

Free tier at cloud.tensorlake.ai, no credit card required. The demo takes 3–5 minutes end to end. After it runs, benchmark_results.json has the full per-agent timing data.

Phase 1 (base build and snapshot) runs once. If you want to run the benchmark multiple times, pass your existing snapshot ID directly and skip Phase 1. The snapshot persists between runs until you delete it.

The first version had a plain await sandbox.terminate() at the end of each function. Two exceptions during testing left sandboxes running and billing for idle compute.
I switched to async with await AsyncSandbox.create(...) as sandbox:, and that stopped.

The second version called sandbox.checkpoint(sandbox.sandbox_id). I had copied the pattern from a CLI reference (tl sbx checkpoint <sandbox-id>) and assumed the Python SDK matched. It does not. The Python instance method takes no positional arguments: sandbox.checkpoint(checkpoint_type=CheckpointType.MEMORY). That is it.

The third version was the first to run end to end, but it used CheckpointType.FILESYSTEM by default because I had not read the snapshots documentation carefully. The benchmark looked reasonable, but the forks were doing full cold boots, and I was timing those boots alongside the actual work. Switching to CheckpointType.MEMORY was the change that made setup time disappear from per-fork timing.

Small mistakes individually. What they share: Tensorlake’s API is well documented, but the snapshot docs, the SDK reference, and the async docs are three separate pages. Read only the quickstart, and you miss two of the three things that matter most for this pattern.

You can also check the complete project on my GitHub here:

Running the same five agents sequentially and then in parallel is one of those moments where the architecture becomes legible in a way that documentation does not fully convey. The snapshot moves setup cost from inside the loop to outside it. The agents still do the same work on the same hardware. The savings come from not rebuilding an environment five times when it only needed to be built once.

Most multi-agent optimization advice focuses on LLM calls: batching, caching, cheaper models. That advice is right. But if you have five agents each spending 90 seconds on pip installs before making a single inference call, no amount of LLM optimization helps until you address setup time first.

The bottleneck was never the agents. It was rebuilding the same environment on every run.
Snapshot it once, fork cheaply, and parallel execution finally delivers what you expected when you first wrote asyncio.gather.

Why Most Multi-Agent AI Systems Waste 90% of Their Time (And How to Fix It) https://pub.towardsai.net/why-most-multi-agent-ai-systems-waste-90-of-their-time-and-how-to-fix-it-c0ce81f0e323 was originally published in Towards AI https://pub.towardsai.net on Medium.