Open Source Project of the Day (#104): AgentScope 2.0 — Alibaba's Production-Ready Agent Framework Built Around Model Reasoning

Alibaba DAMO Academy released AgentScope 2.0, an open-source production-ready agent framework designed to leverage LLM reasoning without rigid pipeline constraints. The framework adds event systems, permission controls, multi-tenant isolation, and sandbox execution for shipping reliable agent systems.

"Build and run agents you can see, understand, and trust."

This is article 104 in the Open Source Project of the Day series. Today's project is AgentScope 2.0, Alibaba DAMO Academy's open-source production-ready agent framework.

The agent framework space is crowded. LangChain centers on chain-based orchestration. AutoGen centers on multi-agent conversation. CrewAI centers on role-based collaboration. AgentScope's differentiation is in its design philosophy: when LLM reasoning is strong enough, the framework should step back rather than constraining the model's decision space with rigid pipelines.

AgentScope 2.0 adds the production infrastructure that philosophy requires: event system, permission controls, multi-tenant isolation, sandbox execution, middleware hooks. The goal is not a demo that runs; it is a system that ships.

AgentScope 2.0 is a production-ready agent framework: "an agent development platform with essential abstractions, designed to work with rising model capability, with built-in production support."

The core problem it addresses: traditional agent frameworks constrain LLMs with rigid pipelines and opinionated prompt templates. As LLM reasoning capability has improved rapidly, that constraint has become a bottleneck. AgentScope shifts to "letting the model's native reasoning and tool-use capabilities drive agent behavior": the framework provides production infrastructure, not execution path constraints.

The minimum working unit in AgentScope 2.0 is an `Agent`, extended by composing systems:

```python
import asyncio

from agentscope import Agent, Toolkit, DashScopeChatModel, DashScopeCredential
from agentscope.tools import Bash, Grep, Glob, Read, Write
from agentscope.message import UserMsg
# NOTE: EventType is used below, but the original snippet does not show its import.

# Define a toolkit
toolkit = Toolkit(tools=[Bash(), Grep(), Glob(), Read(), Write()])

# Create an agent
agent = Agent(
    name="code-assistant",
    system_prompt="You are a code assistant that helps users analyze and modify codebases.",
    model=DashScopeChatModel(
        credential=DashScopeCredential(api_key="your key"),
        model="qwen3.6-plus",
    ),
    toolkit=toolkit,
)

# Streaming reasoning loop
async def run():
    async for evt in agent.reply_stream(UserMsg("user", "Analyze the structure of this codebase")):
        match evt.type:
            case EventType.TEXT_BLOCK_DELTA:
                print(evt.delta, end="", flush=True)
            case EventType.TOOL_CALL_START:
                print(f"\nTool call: {evt.tool_name}")

asyncio.run(run())
```

1. Event System

A unified event bus connecting all phases of the agent's reasoning process:

- `EventType.REPLY_START`: Agent begins responding
- `EventType.MODEL_CALL_START`: Model call initiated
- `EventType.TEXT_BLOCK_START`: Text block starts
- `EventType.TEXT_BLOCK_DELTA`: Streaming text delta
- `EventType.TEXT_BLOCK_END`: Text block complete
- `EventType.TOOL_CALL_START`: Tool call initiated
- `EventType.TOOL_CALL_END`: Tool call complete

Human-in-the-loop workflows attach through the event system: pause the agent on a specific event, wait for human confirmation, resume execution.

2. Permission System

Fine-grained control over which tool calls require approval vs.
automatic execution:

```python
from agentscope.permission import PermissionConfig, ApprovalMode
from agentscope.tools import Bash, Read, Write

# NOTE: the constructor shape (a tool-to-mode mapping plus a keyword argument)
# is reconstructed from the original snippet; check the AgentScope docs for the
# exact signature.
config = PermissionConfig(
    {
        # File writes require confirmation
        Write: ApprovalMode.ALWAYS,
        # Shell execution requires confirmation
        Bash: ApprovalMode.ALWAYS,
        # Reads are automatic
        Read: ApprovalMode.NEVER,
    },
    # Operations over $0.10 require confirmation
    default_cost_threshold=0.10,
)
```

Permission Bypass Mode: for testing or trusted scenarios, disable all approvals and let the agent run fully autonomously.

3. Multi-Tenancy / Session Isolation

The FastAPI service layer provides production-grade tenant and session isolation.

4. Workspace / Sandbox Execution

Three backend options for isolated tool execution:

| Backend | Best for |
|---|---|
| Local | Development and testing, fastest |
| Docker | Production, dependency isolation |
| E2B | Cloud sandbox, highest security |

5. Middleware System

Insert composable hooks into the agent's reasoning-acting loop without modifying core agent code:

```python
from agentscope.middleware import LoggingMiddleware, GuardrailMiddleware

agent = Agent(
    ...,  # name, system_prompt, model, toolkit as in the earlier example
    middlewares=[
        LoggingMiddleware(log_tool_calls=True),
        GuardrailMiddleware(blocked_patterns=["rm -rf", "DROP TABLE"]),
    ],
)
```

Leader-Worker pattern: a Leader Agent decomposes tasks and creates Worker agents via built-in team tools, then aggregates results.

```python
from agentscope import Agent, Toolkit
from agentscope.tools import TeamTools

# Leader has team tools, so it can create and coordinate workers
leader = Agent(
    name="research-leader",
    system_prompt="You lead a research team. Decompose tasks and synthesize results.",
    model=model,  # a chat model instance defined elsewhere
    toolkit=Toolkit(tools=[TeamTools()]),
)

# At runtime, the leader automatically decomposes:
# "Analyze the core arguments of these 5 papers"
# → Creates 5 workers, one per paper
# → Aggregates results
```

Worker agents' capabilities are determined dynamically by the leader at runtime, so there is no need to predefine all possible worker types.

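The fan-out and aggregation shape of the Leader-Worker pattern can be illustrated without AgentScope at all. The sketch below uses plain `asyncio`; the `worker` and `lead` coroutines are hypothetical stand-ins for model-backed agents, and nothing here reflects how `TeamTools` is actually implemented.

```python
import asyncio


async def worker(paper: str) -> str:
    """Hypothetical worker: a real one would call a model on `paper`."""
    await asyncio.sleep(0)  # yield control, as a real model call would
    return f"summary of {paper}"


async def lead(papers: list[str]) -> dict[str, str]:
    """Hypothetical leader: one worker per paper, run concurrently, then aggregate."""
    summaries = await asyncio.gather(*(worker(p) for p in papers))
    # gather() returns results in the order the awaitables were passed in
    return dict(zip(papers, summaries))


if __name__ == "__main__":
    result = asyncio.run(lead(["paper-1", "paper-2", "paper-3"]))
    for paper, summary in result.items():
        print(paper, "->", summary)
```

In a real deployment the leader would presumably also decide how many workers to create and what each should do; here that decision is reduced to "one worker per input" to keep the control flow visible.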
Agents decompose complex tasks into tracked plan steps, updating state in real time as execution proceeds:

Task: "Write a complete test suite for this Python project"

Agent generates plan:

- Step 1 (In progress): Scan project structure, identify all modules
- Step 2 (Waiting): Analyze public API of each module
- Step 3 (Waiting): Generate unit tests
- Step 4 (Waiting): Generate integration tests
- Step 5 (Waiting): Run test suite, fix failures

Step 1 completes → Step 2 starts automatically, plan state updates.

Long-running tool calls (file processing, network requests, code compilation) shift to background without blocking the agent conversation stream:

- User: "Compile this large C++ project and run the tests"
- Agent: (launches background task, continues conversation immediately)
- Agent: "Compilation started in background, estimated 5 minutes. I can help with other things while you wait."
- ... (5 minutes later)
- System notification: background task complete
- Agent: "Compilation complete. Test results: ..."

This is the most fundamental difference between AgentScope 2.0 and many comparable frameworks.

Traditional approach (LangChain-style):

- Developer defines a fixed chain: Step 1 → Step 2 → Step 3 (the developer decides what happens at each step)
- The model fills in blanks within each step

AgentScope approach:

- Developer provides: toolkit + permissions + constraints
- Model decides: what to do, in what order, with which tools
- Framework handles: production safety, observability, human-in-the-loop

When model reasoning was weak, fixed pipelines were correct: models needed guidance. When model reasoning is strong enough, fixed pipelines become constraints: the model has better plans it can't execute. AgentScope 2.0's timing judgment: mainstream models from 2025 onward are capable enough to deserve more autonomy.

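The contrast between the two control flows can be made concrete with a toy example. This is a minimal sketch, not AgentScope code: `TOOLS`, `fixed_pipeline`, `choose_next`, and `model_led_loop` are all hypothetical names, and `choose_next` is a hand-written rule standing in for a model's decision.

```python
from typing import Callable, Optional

Tool = Callable[[str], str]

# Two toy tools that record their effect in a state string
TOOLS: dict[str, Tool] = {
    "scan": lambda state: state + " scanned",
    "test": lambda state: state + " tested",
}


def fixed_pipeline(state: str) -> str:
    """Developer-defined chain: the order is hard-coded and always runs in full."""
    for name in ("scan", "test"):
        state = TOOLS[name](state)
    return state


def choose_next(state: str) -> Optional[str]:
    """Hypothetical stand-in for the model: returns a tool name, or None to stop."""
    if "scanned" not in state:
        return "scan"
    if "tested" not in state:
        return "test"
    return None


def model_led_loop(state: str, max_steps: int = 10) -> str:
    """The policy picks each step; the loop only enforces a step budget."""
    for _ in range(max_steps):
        name = choose_next(state)
        if name is None:
            break
        state = TOOLS[name](state)
    return state


if __name__ == "__main__":
    print(fixed_pipeline("repo scanned"))   # repeats the scan step
    print(model_led_loop("repo scanned"))   # skips straight to testing
```

The step budget (`max_steps`) is the kind of constraint the framework side would own in this division of labor: the policy is free to choose, but the loop still bounds how long it can run.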
The standard `async for evt in agent.reply_stream(...)` pattern is how client code consumes the event stream. A separate AgentScope Runtime (runtime.agentscope.io) provides a complete production service layer.

AgentScope is not just a framework; there is a complete toolchain behind it:

| Component | Function |
|---|---|
| AgentScope Studio | Visual debugging tool for agent runs |
| ReMe | Cross-session persistent memory (file-based + vector-based) |
| OpenJudge | 50+ judges (code, math, tool use, multimodal output) |
| Trinity-RFT | Agent fine-tuning framework (decoupled Explorer/Trainer/Buffer) |
| Mem0 integration | Long-term memory (added June 2026) |

| Dimension | LangChain | AutoGen | AgentScope 2.0 |
|---|---|---|---|
| Core pattern | Chain-based | Multi-agent conversation | Model-reasoning-led |
| Production infra | Third-party | Third-party | Built-in |
| Sandbox execution | None | Limited | Local / Docker / E2B |
| Human-in-the-loop | Plugin | Native | Event system native |
| Evaluation system | None | None | OpenJudge (50+ judges) |
| Fine-tuning support | None | None | Trinity-RFT |
| Academic backing | Yes | Yes | Yes (2 arXiv papers) |

The most significant gap: AgentScope covers the full agent lifecycle (framework → memory → evaluation → fine-tuning → apps). LangChain and AutoGen stop at the framework and memory layers.

Install:

```bash
pip install agentscope
```

Or from source:

```bash
git clone https://github.com/agentscope-ai/agentscope.git
cd agentscope
pip install -e .
```

Run the web UI (from the cloned `agentscope` directory):

```bash
pnpm install && pnpm run dev   # frontend
python -m agentscope.service   # backend
```

AgentScope 2.0's timing is deliberate: at a moment when LLM reasoning capability is advancing fast, it chooses "reduce framework constraints, let the model lead" as its direction. The five core systems (Event / Permission / Workspace / Multi-tenancy / Middleware) address the production pain points of traditional frameworks: poor observability, no fine-grained tool permission control, difficulty serving multiple users, and security constraints mixed into business logic.

The ecosystem coverage is what separates it most clearly. Framework → memory → evaluation → fine-tuning is a complete chain that LangChain and AutoGen haven't built. OpenJudge alone, with 50+ judges covering code, math, tool use, and multimodal output, fills a gap that most teams solve by writing evaluation scripts from scratch.

The project has 27.1k stars, 40 releases, two arXiv papers, and an Alibaba engineering team behind it. Among production-grade agent frameworks, AgentScope 2.0 is one of the most thorough options currently available.