"Build and run agents you can see, understand, and trust."
This is article #104 in the Open Source Project of the Day series. Today's project is AgentScope 2.0 — Alibaba DAMO Academy's open-source production-ready agent framework.
The agent framework space is crowded. LangChain centers on chain-based orchestration. AutoGen centers on multi-agent conversation. CrewAI centers on role-based collaboration. AgentScope's differentiation is in its design philosophy: when LLM reasoning is strong enough, the framework should step back rather than constraining the model's decision space with rigid pipelines.
AgentScope 2.0 adds the production infrastructure that philosophy requires: event system, permission controls, multi-tenant isolation, sandbox execution, middleware hooks. The goal is not a demo that runs — it's a system that ships.
AgentScope 2.0 is a production-ready agent framework — "an agent development platform with essential abstractions, designed to work with rising model capability, with built-in production support."
The core problem it addresses: traditional agent frameworks constrain LLMs with rigid pipelines and opinionated prompt templates. As LLM reasoning capability has improved rapidly, that constraint has become a bottleneck. AgentScope shifts to "letting the model's native reasoning and tool-use capabilities drive agent behavior" — the framework provides production infrastructure, not execution path constraints.
The minimum working unit in AgentScope 2.0 is an Agent, extended by composing systems:
import asyncio

from agentscope import Agent, Toolkit, DashScopeChatModel, DashScopeCredential
from agentscope.event import EventType  # assumed import path; confirm against the AgentScope docs
from agentscope.message import UserMsg
from agentscope.tools import Bash, Grep, Glob, Read, Write

# Give the agent shell, search, and file tools.
toolkit = Toolkit(tools=[Bash(), Grep(), Glob(), Read(), Write()])

agent = Agent(
    name="code-assistant",
    system_prompt="You are a code assistant that helps users analyze and modify codebases.",
    model=DashScopeChatModel(
        credential=DashScopeCredential(api_key="your_key"),
        model="qwen3.6-plus",
    ),
    toolkit=toolkit,
)


async def run():
    # Stream events as the agent works, printing text deltas and tool calls as they arrive.
    async for evt in agent.reply_stream(UserMsg("user", "Analyze the structure of this codebase")):
        match evt.type:
            case EventType.TEXT_BLOCK_DELTA:
                print(evt.delta, end="", flush=True)
            case EventType.TOOL_CALL_START:
                print(f"\n[Tool call] {evt.tool_name}")


asyncio.run(run())
1. Event System
A unified event bus connecting all phases of the agent's reasoning process:
EventType.REPLY_START # Agent begins responding
EventType.MODEL_CALL_START # Model call initiated
EventType.TEXT_BLOCK_START # Text block starts
EventType.TEXT_BLOCK_DELTA # Streaming text delta
EventType.TEXT_BLOCK_END # Text block complete
EventType.TOOL_CALL_START # Tool call initiated
EventType.TOOL_CALL_END # Tool call complete
Human-in-the-loop workflows attach through the event system: pause the agent on a specific event, wait for human confirmation, then resume execution.
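The pause, confirm, resume flow can be sketched without the framework. This is a minimal illustration in plain asyncio, not AgentScope's implementation: `Event`, `fake_agent_stream`, `run_with_approval`, and the two approver functions are hypothetical names, and the event type strings only mirror the names listed above.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Awaitable, Callable


@dataclass
class Event:
    """Hypothetical event record; not the AgentScope event class."""
    type: str
    payload: str = ""


async def fake_agent_stream() -> AsyncIterator[Event]:
    """Stand-in for an agent's event stream."""
    yield Event("REPLY_START")
    yield Event("TEXT_BLOCK_DELTA", "Listing files. ")
    yield Event("TOOL_CALL_START", "Bash: rm build/")
    yield Event("TOOL_CALL_END", "Bash: rm build/")
    yield Event("TEXT_BLOCK_DELTA", "Done.")


async def run_with_approval(
    stream: AsyncIterator[Event],
    approve: Callable[[Event], Awaitable[bool]],
) -> list[str]:
    """Consume events, pausing on each tool call until a human decides."""
    log: list[str] = []
    async for evt in stream:
        if evt.type == "TOOL_CALL_START":
            # Pause point: no further events are consumed until approve() returns.
            if not await approve(evt):
                log.append(f"rejected: {evt.payload}")
                break  # in this sketch a rejection ends the run
            log.append(f"approved: {evt.payload}")
        elif evt.type == "TEXT_BLOCK_DELTA":
            log.append(f"text: {evt.payload}")
    return log


async def always_yes(evt: Event) -> bool:
    return True


async def always_no(evt: Event) -> bool:
    return False
```

In a real service, `approve` would await a UI callback or a message queue rather than returning immediately. The point of the sketch is that the consumer of the stream, not the agent, decides when execution continues.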
2. Permission System
Fine-grained control over which tool calls require approval vs. automatic execution:
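from agentscope.permission import PermissionConfig, ApprovalMode
from agentscope.tools import Bash, Read, Write

# `tool_permissions` is an assumed parameter name for the per-tool mapping;
# check the AgentScope API reference for the exact signature.
config = PermissionConfig(
    tool_permissions={
        Write: ApprovalMode.ALWAYS,
        Bash: ApprovalMode.ALWAYS,
        Read: ApprovalMode.NEVER,
    },
    default_cost_threshold=0.10,
)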
Permission Bypass Mode: For testing or trusted scenarios, disable all approvals and let the agent run fully autonomously.
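To make the decision logic concrete, here is a minimal sketch of how per-tool approval modes, a cost threshold, and bypass mode could combine. It is not AgentScope's code. `Mode` and `needs_approval` are hypothetical, and the fallback rule for tools without an explicit entry (ask only when the estimated cost exceeds the threshold) is an assumption about what a cost threshold might mean.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical stand-in for an approval mode."""
    ALWAYS = "always"  # always ask a human
    NEVER = "never"    # never ask, run automatically


def needs_approval(
    tool: str,
    rules: dict[str, Mode],
    estimated_cost: float = 0.0,
    cost_threshold: float = 0.10,
    bypass: bool = False,
) -> bool:
    """Decide whether a tool call should wait for human approval."""
    if bypass:
        # Bypass mode: no approvals at all (testing or trusted scenarios).
        return False
    mode = rules.get(tool)
    if mode is Mode.ALWAYS:
        return True
    if mode is Mode.NEVER:
        return False
    # Assumed fallback for tools with no explicit rule: ask only when the
    # estimated cost of the call exceeds the threshold.
    return estimated_cost > cost_threshold


rules = {"Write": Mode.ALWAYS, "Bash": Mode.ALWAYS, "Read": Mode.NEVER}
```

Explicit rules win over the cost check here, so a `Read` call never prompts regardless of cost. Whether that ordering is desirable depends on the deployment.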
3. Multi-Tenancy / Session Isolation
The FastAPI service layer provides production-grade tenant and session isolation.
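The core idea of isolation is that state is only reachable through a key that combines tenant and session. Here is a minimal in-memory sketch of that idea; `SessionStore` is a hypothetical class and says nothing about how the AgentScope service layer actually stores state.

```python
from collections import defaultdict


class SessionStore:
    """Hypothetical in-memory store; keys combine tenant and session IDs."""

    def __init__(self) -> None:
        self._history: dict[tuple[str, str], list[str]] = defaultdict(list)

    def append(self, tenant_id: str, session_id: str, message: str) -> None:
        self._history[(tenant_id, session_id)].append(message)

    def history(self, tenant_id: str, session_id: str) -> list[str]:
        # Return a copy so callers cannot mutate another request's state.
        return list(self._history.get((tenant_id, session_id), []))


store = SessionStore()
store.append("tenant-a", "s1", "hello from A")
store.append("tenant-b", "s1", "hello from B")
```

Two tenants can reuse the same session ID without seeing each other's messages, because the tenant ID is part of every lookup.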
4. Workspace / Sandbox Execution
Three backend options for isolated tool execution:
| Backend | Best for |
|---|---|
| Local | Development and testing, fastest |
| Docker | Production, dependency isolation |
| E2B | Cloud sandbox, highest security |
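Swappable backends usually come down to a shared interface. The sketch below shows that shape with a local backend only. `Workspace`, `LocalWorkspace`, and `execute` are hypothetical names, not AgentScope classes, and the local backend here offers no real isolation.

```python
import subprocess
import sys
from typing import Protocol


class Workspace(Protocol):
    """Hypothetical interface that every execution backend would satisfy."""

    def run(self, code: str, timeout: float = 10.0) -> str: ...


class LocalWorkspace:
    """Runs Python code in a child process on the host. No real isolation."""

    def run(self, code: str, timeout: float = 10.0) -> str:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return result.stdout.strip()


def execute(workspace: Workspace, code: str) -> str:
    """Tool code depends only on the interface, so backends are swappable."""
    return workspace.run(code)
```

A Docker or cloud backend would implement the same `run` method and could be passed to `execute` unchanged, which is what makes moving from development to production a configuration change rather than a rewrite.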
5. Middleware System
Insert composable hooks into the agent's reasoning-acting loop without modifying core agent code:
from agentscope.middleware import LoggingMiddleware, GuardrailMiddleware

agent = Agent(
    ...
    middlewares=[
        LoggingMiddleware(log_tool_calls=True),
        GuardrailMiddleware(blocked_patterns=["rm -rf", "DROP TABLE"]),
    ]
)
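For readers who have not met the pattern, a middleware chain can be sketched as nested function wrappers. This is a generic illustration, not how AgentScope implements its hooks. Every name here (`guardrail`, `logging_mw`, `compose`, `fake_bash`) is hypothetical.

```python
from typing import Callable

ToolCall = Callable[[str], str]            # takes a command, returns its output
Middleware = Callable[[ToolCall], ToolCall]


def guardrail(blocked_patterns: list[str]) -> Middleware:
    """Refuse any command containing a blocked substring."""
    def wrap(next_call: ToolCall) -> ToolCall:
        def call(command: str) -> str:
            for pattern in blocked_patterns:
                if pattern in command:
                    return f"blocked: contains {pattern!r}"
            return next_call(command)
        return call
    return wrap


def logging_mw(log: list[str]) -> Middleware:
    """Record each call and its result in the given list."""
    def wrap(next_call: ToolCall) -> ToolCall:
        def call(command: str) -> str:
            log.append(f"call: {command}")
            result = next_call(command)
            log.append(f"result: {result}")
            return result
        return call
    return wrap


def compose(tool: ToolCall, middlewares: list[Middleware]) -> ToolCall:
    """Wrap the tool so the first middleware in the list runs outermost."""
    for mw in reversed(middlewares):
        tool = mw(tool)
    return tool


def fake_bash(command: str) -> str:
    """Stand-in for a real shell tool; it only echoes the command."""
    return f"ran {command}"


log: list[str] = []
safe_bash = compose(fake_bash, [logging_mw(log), guardrail(["rm -rf", "DROP TABLE"])])
```

Because logging sits outside the guardrail, blocked calls are still recorded. Reversing the list order would hide them from the log, so ordering is a real design decision.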
Leader-Worker pattern: a Leader Agent decomposes tasks and creates Worker agents via built-in team tools, then aggregates results.
from agentscope.tools import TeamTools

leader = Agent(
    name="research-leader",
    system_prompt="You lead a research team. Decompose tasks and synthesize results.",
    model=model,  # a chat model instance, such as the DashScopeChatModel configured earlier
    toolkit=Toolkit(tools=[*TeamTools()]),
)
Worker agents' capabilities are determined dynamically by the leader at runtime — no need to predefine all possible worker types.
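The decompose, fan out, aggregate shape of the Leader-Worker pattern can be sketched in plain asyncio. This is an illustration only: `worker`, `decompose`, and `leader` are hypothetical functions, and the decomposition is hard-coded where a real leader would ask its model.

```python
import asyncio


async def worker(role: str, subtask: str) -> str:
    """Hypothetical worker; a real one would call a model with its own prompt."""
    await asyncio.sleep(0)  # yield control, standing in for model latency
    return f"[{role}] finished: {subtask}"


def decompose(task: str) -> list[tuple[str, str]]:
    """Stand-in for the leader's model-driven decomposition (hard-coded here)."""
    return [
        ("searcher", f"find sources on {task}"),
        ("summarizer", f"summarize findings on {task}"),
    ]


async def leader(task: str) -> str:
    subtasks = decompose(task)
    # Workers are created at runtime from the plan, then run concurrently.
    results = await asyncio.gather(*(worker(role, sub) for role, sub in subtasks))
    return "\n".join(results)
```

`asyncio.gather` returns results in the order the workers were passed in, so the leader can aggregate deterministically even when workers finish out of order.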
Agents decompose complex tasks into tracked plan steps, updating state in real time as execution proceeds:
Task: "Write a complete test suite for this Python project"
Agent generates plan:
Step 1: [In progress] Scan project structure, identify all modules
Step 2: [Waiting] Analyze public API of each module
Step 3: [Waiting] Generate unit tests
Step 4: [Waiting] Generate integration tests
Step 5: [Waiting] Run test suite, fix failures
Step 1 completes → Step 2 starts automatically, plan state updates
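The plan-tracking behavior above amounts to a small state machine. Here is a minimal sketch with a hypothetical `Plan` class. The `Done` status is an assumption, since the example output only shows `In progress` and `Waiting`.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Hypothetical plan tracker: one active step at a time."""
    steps: list[str]
    status: list[str] = field(init=False)

    def __post_init__(self) -> None:
        self.status = ["Waiting"] * len(self.steps)
        if self.steps:
            self.status[0] = "In progress"

    def complete_current(self) -> None:
        """Mark the active step done and start the next one, if any."""
        if "In progress" not in self.status:
            return  # nothing is active, so the plan is already finished
        i = self.status.index("In progress")
        self.status[i] = "Done"  # assumed label for finished steps
        if i + 1 < len(self.steps):
            self.status[i + 1] = "In progress"

    def render(self) -> str:
        return "\n".join(
            f"Step {n}: [{state}] {text}"
            for n, (state, text) in enumerate(zip(self.status, self.steps), start=1)
        )
```

Calling `complete_current()` after each step finishes reproduces the automatic hand-off from Step 1 to Step 2, and `render()` produces the same line format as the example.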
Long-running tool calls (file processing, network requests, code compilation) shift to background without blocking the agent conversation stream:
User: "Compile this large C++ project and run the tests"
Agent: [Launches background task, continues conversation immediately]
Agent: "Compilation started in background, estimated 5 minutes.
I can help with other things while you wait."
...(5 minutes later)
System notification: background task complete
Agent: "Compilation complete. Test results: ..."
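The non-blocking behavior in this exchange maps onto a standard asyncio idiom: start the work as a task, keep going, and await the result later. The sketch below uses hypothetical names (`slow_tool`, `conversation`) and a 50 ms sleep in place of a real compile. It does not show AgentScope's own background-task mechanism.

```python
import asyncio


async def slow_tool(name: str, seconds: float) -> str:
    """Stand-in for a long-running tool such as a compile step."""
    await asyncio.sleep(seconds)
    return f"{name} complete"


async def conversation() -> list[str]:
    transcript: list[str] = []
    # Start the tool without awaiting it, so the conversation is not blocked.
    task = asyncio.create_task(slow_tool("compilation", 0.05))
    transcript.append("agent: compilation started in background")
    transcript.append("agent: answering another question meanwhile")
    # Later: collect the result once the background task has finished.
    transcript.append(f"system: {await task}")
    return transcript
```

The two agent lines are appended before the tool finishes, which is the whole point: the conversation loop stays responsive while the task runs.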
Model-led control flow is the most fundamental difference between AgentScope 2.0 and many comparable frameworks:
Traditional approach (LangChain-style):
    Developer defines a fixed chain:
    Step 1 → Step 2 → Step 3 (developer decides what happens at each step)
    The model fills in blanks within each step
AgentScope approach:
    Developer provides: toolkit + permissions + constraints
    Model decides: what to do, in what order, with which tools
    Framework handles: production safety, observability, human-in-the-loop
When model reasoning was weak, fixed pipelines were the right call: models needed guidance. Once reasoning is strong enough, the same pipelines become constraints, because the model can form better plans than the pipeline lets it execute. AgentScope 2.0's bet is that mainstream models from 2025 onward are capable enough to deserve more autonomy.
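The contrast between the two approaches can be sketched in a few lines. Everything here is a toy: `TOOLS`, `fixed_chain`, `model_led`, and `toy_policy` are hypothetical, and the policy function merely stands in for a model's decision.

```python
from typing import Callable, Optional

Tool = Callable[[str], str]

# Toy tools that just record that they ran.
TOOLS: dict[str, Tool] = {
    "read": lambda state: state + " +read",
    "edit": lambda state: state + " +edit",
    "test": lambda state: state + " +test",
}


def fixed_chain(state: str) -> str:
    """Developer-defined order: always read, then edit, then test."""
    for name in ("read", "edit", "test"):
        state = TOOLS[name](state)
    return state


def model_led(
    state: str,
    choose: Callable[[str], Optional[str]],
    max_steps: int = 10,
) -> str:
    """The chooser (a stand-in for the model) picks each next tool, or None to stop."""
    for _ in range(max_steps):  # framework-side limit on loop length
        name = choose(state)
        if name is None:
            break
        state = TOOLS[name](state)
    return state


def toy_policy(state: str) -> Optional[str]:
    """Hypothetical policy: test first, then edit once, then stop."""
    if "+test" not in state:
        return "test"
    if "+edit" not in state:
        return "edit"
    return None
```

In `fixed_chain` the order lives in the code. In `model_led` the loop only enforces a step limit and leaves ordering to the chooser, which here skips `read` entirely and runs `test` before `edit`.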
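The standard async for evt in agent.reply_stream() pattern enables the uses shown earlier: printing text as it streams, surfacing tool calls as they start, and pausing on an event for human confirmation.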
A separate AgentScope Runtime (runtime.agentscope.io) provides a complete production service layer.
AgentScope is not just a framework — there's a complete toolchain behind it:
| Component | Function |
|---|---|
| AgentScope Studio | Visual debugging tool for agent runs |
| ReMe | Cross-session persistent memory (file-based + vector-based) |
| OpenJudge | 50+ judges (code, math, tool use, multimodal output) |
| Trinity-RFT | Agent fine-tuning framework (decoupled Explorer/Trainer/Buffer) |
| Mem0 integration | Long-term memory (added June 2026) |
| Dimension | LangChain | AutoGen | AgentScope 2.0 |
|---|---|---|---|
| Core pattern | Chain-based | Multi-agent conversation | Model-reasoning-led |
| Production infra | Third-party | Third-party | Built-in |
| Sandbox execution | None | Limited | Local / Docker / E2B |
| Human-in-the-loop | Plugin | Native | Event system native |
| Evaluation system | None | None | OpenJudge (50+ judges) |
| Fine-tuning support | None | None | Trinity-RFT |
| Academic backing | Yes | Yes | Yes (2 arXiv papers) |
The most significant gap: AgentScope covers the full agent lifecycle — framework → memory → evaluation → fine-tuning → apps. LangChain and AutoGen stop at the framework and memory layers.
Install:
pip install agentscope
Or from source:
git clone https://github.com/agentscope-ai/agentscope.git
cd agentscope
pip install -e .
Run the web UI:
cd agentscope
pnpm install && pnpm run dev # frontend
python -m agentscope.service # backend
AgentScope 2.0's timing is deliberate: at a moment when LLM reasoning capability is advancing fast, it chooses "reduce framework constraints, let the model lead" as its direction.
The five core systems (Event / Permission / Workspace / Multi-tenancy / Middleware) address the production pain points of traditional frameworks: poor observability, no fine-grained tool permission control, difficulty serving multiple users, and security constraints mixed into business logic.
The ecosystem coverage is what separates it most clearly. Framework → memory → evaluation → fine-tuning is a complete chain that LangChain and AutoGen haven't built. OpenJudge alone — 50+ judges covering code, math, tool use, and multimodal output — fills a gap that most teams solve by writing evaluation scripts from scratch.
27.1k Stars, 40 releases, two arXiv papers, and an Alibaba engineering team behind it. Among production-grade agent frameworks, AgentScope 2.0 is one of the most thorough options currently available.