Open Source Project of the Day (#104): AgentScope 2.0 — Alibaba's Production-Ready Agent Framework Built Around Model Reasoning


"Build and run agents you can see, understand, and trust."

This is article #104 in the Open Source Project of the Day series. Today's project is AgentScope 2.0 — Alibaba DAMO Academy's open-source production-ready agent framework.

The agent framework space is crowded. LangChain centers on chain-based orchestration. AutoGen centers on multi-agent conversation. CrewAI centers on role-based collaboration. AgentScope's differentiation is in its design philosophy: when LLM reasoning is strong enough, the framework should step back rather than constraining the model's decision space with rigid pipelines.

AgentScope 2.0 adds the production infrastructure that philosophy requires: event system, permission controls, multi-tenant isolation, sandbox execution, middleware hooks. The goal is not a demo that runs — it's a system that ships.

AgentScope 2.0 is a production-ready agent framework — "an agent development platform with essential abstractions, designed to work with rising model capability, with built-in production support."

The core problem it addresses: traditional agent frameworks constrain LLMs with rigid pipelines and opinionated prompt templates. As LLM reasoning capability has improved rapidly, that constraint has become a bottleneck. AgentScope shifts to "letting the model's native reasoning and tool-use capabilities drive agent behavior" — the framework provides production infrastructure, not execution path constraints.

The minimum working unit in AgentScope 2.0 is an Agent, extended by composing systems:

import asyncio
from agentscope import Agent, Toolkit, DashScopeChatModel, DashScopeCredential
from agentscope.tools import Bash, Grep, Glob, Read, Write
from agentscope.message import UserMsg
# Note: EventType (used below) must also be imported; the source does not show its module path.

toolkit = Toolkit(tools=[Bash(), Grep(), Glob(), Read(), Write()])

agent = Agent(
    name="code-assistant",
    system_prompt="You are a code assistant that helps users analyze and modify codebases.",
    model=DashScopeChatModel(
        credential=DashScopeCredential(api_key="your_key"),
        model="qwen3.6-plus"
    ),
    toolkit=toolkit
)

async def run():
    async for evt in agent.reply_stream(UserMsg("user", "Analyze the structure of this codebase")):
        match evt.type:
            case EventType.TEXT_BLOCK_DELTA:
                print(evt.delta, end="", flush=True)
            case EventType.TOOL_CALL_START:
                print(f"\n[Tool call] {evt.tool_name}")

asyncio.run(run())

1. Event System

A unified event bus connecting all phases of the agent's reasoning process:

EventType.REPLY_START          # Agent begins responding
EventType.MODEL_CALL_START     # Model call initiated
EventType.TEXT_BLOCK_START     # Text block starts
EventType.TEXT_BLOCK_DELTA     # Streaming text delta
EventType.TEXT_BLOCK_END       # Text block complete
EventType.TOOL_CALL_START      # Tool call initiated
EventType.TOOL_CALL_END        # Tool call complete

Human-in-the-loop workflows attach through the event system: the agent pauses on a specific event, waits for human confirmation, then resumes execution.
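A minimal sketch of that pause-and-resume flow, using only the standard library. The Event dataclass, the agent_events generator, and the approve callbacks are hypothetical stand-ins, not AgentScope's API; only the event names are taken from the list above.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum, auto


class EventType(Enum):
    # Local stand-in enum; names mirror the event list above.
    TEXT_BLOCK_DELTA = auto()
    TOOL_CALL_START = auto()
    TOOL_CALL_END = auto()


@dataclass
class Event:
    type: EventType
    payload: str


async def agent_events():
    """Hypothetical agent: emits some text, then wants to run one tool."""
    yield Event(EventType.TEXT_BLOCK_DELTA, "Checking the repo. ")
    yield Event(EventType.TOOL_CALL_START, "Bash")
    yield Event(EventType.TOOL_CALL_END, "Bash")


async def run_with_approval(approve):
    """Consume events; pause on TOOL_CALL_START until `approve` answers."""
    log = []
    async for evt in agent_events():
        if evt.type is EventType.TOOL_CALL_START:
            if not await approve(evt.payload):
                log.append(f"denied:{evt.payload}")
                break  # stop consuming, so the tool call never completes
            log.append(f"approved:{evt.payload}")
        else:
            log.append(evt.type.name)
    return log


async def always_yes(tool_name):
    await asyncio.sleep(0)  # stands in for waiting on a human
    return True


async def always_no(tool_name):
    await asyncio.sleep(0)
    return False


approved_log = asyncio.run(run_with_approval(always_yes))
denied_log = asyncio.run(run_with_approval(always_no))
print(approved_log)
print(denied_log)
```

Because the consumer owns the loop, the approval step is just an awaited call between two events; nothing in the agent itself has to know a human is involved.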

2. Permission System

Fine-grained control over which tool calls require approval vs. automatic execution:

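from agentscope.permission import PermissionConfig, ApprovalMode
from agentscope.tools import Bash, Read, Write

# Schematic per-tool mapping as given in the source; the exact
# PermissionConfig call signature is not shown there.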
config = PermissionConfig(
    Write: ApprovalMode.ALWAYS,
    Bash: ApprovalMode.ALWAYS,
    Read: ApprovalMode.NEVER,
    default_cost_threshold=0.10
)

Permission Bypass Mode: For testing or trusted scenarios, disable all approvals and let the agent run fully autonomously.
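To make the approval logic concrete, here is a small standalone sketch of a per-tool lookup with a bypass switch. The needs_approval function and the string tool names are assumptions made for illustration, and ApprovalMode is redefined locally rather than imported. The cost threshold from the configuration above is left out because the source does not say how it is applied.

```python
from enum import Enum


class ApprovalMode(Enum):
    ALWAYS = "always"  # always ask a human first
    NEVER = "never"    # run without asking


def needs_approval(tool_name, rules, default=ApprovalMode.ALWAYS, bypass=False):
    """Return True when a tool call must wait for human approval."""
    if bypass:  # bypass mode skips every approval
        return False
    # Tools without an explicit rule fall back to the default mode.
    return rules.get(tool_name, default) is ApprovalMode.ALWAYS


rules = {
    "Write": ApprovalMode.ALWAYS,
    "Bash": ApprovalMode.ALWAYS,
    "Read": ApprovalMode.NEVER,
}

print(needs_approval("Bash", rules))
print(needs_approval("Read", rules))
print(needs_approval("Grep", rules))
print(needs_approval("Bash", rules, bypass=True))
```

Defaulting unlisted tools to ALWAYS is a deliberately conservative choice in this sketch; a real deployment might pick a different default.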

3. Multi-Tenancy / Session Isolation

The FastAPI service layer provides production-grade tenant and session isolation.
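The source does not show how the isolation is implemented. As a rough illustration of the idea only, the sketch below keys conversation state by a (tenant, session) pair so that two tenants using the same session ID never see each other's history. SessionStore and its methods are hypothetical names, not part of AgentScope.

```python
from collections import defaultdict


class SessionStore:
    """Keeps conversation history per (tenant_id, session_id) pair."""

    def __init__(self):
        self._histories = defaultdict(list)

    def append(self, tenant_id, session_id, message):
        self._histories[(tenant_id, session_id)].append(message)

    def history(self, tenant_id, session_id):
        # Return a copy so callers cannot mutate stored state by accident.
        return list(self._histories.get((tenant_id, session_id), []))


store = SessionStore()
store.append("acme", "s1", "hello from acme")
store.append("globex", "s1", "hello from globex")

print(store.history("acme", "s1"))
print(store.history("globex", "s1"))
print(store.history("acme", "s2"))
```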

4. Workspace / Sandbox Execution

Three backend options for isolated tool execution:

Backend	Best for
Local	Development and testing, fastest
Docker	Production, dependency isolation
E2B	Cloud sandbox, highest security
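One way to picture interchangeable backends is a shared interface that tool code calls without knowing where the command runs. The sketch below is an assumption about the shape of such a design, not AgentScope's workspace API: Backend, LocalBackend, and execute are hypothetical names, and only a local backend is shown. Running in a temporary directory on the host is not real isolation, which is the reason the Docker and E2B options exist.

```python
import subprocess
import sys
import tempfile
from typing import Protocol


class Backend(Protocol):
    def run(self, argv: list[str]) -> str: ...


class LocalBackend:
    """Runs a command on the host inside a throwaway working directory."""

    def run(self, argv: list[str]) -> str:
        with tempfile.TemporaryDirectory() as workdir:
            done = subprocess.run(
                argv, cwd=workdir, capture_output=True, text=True, check=True
            )
        return done.stdout


def execute(backend: Backend, argv: list[str]) -> str:
    # Callers depend only on the Backend interface, so a container- or
    # cloud-backed class could be swapped in without changing this code.
    return backend.run(argv)


output = execute(LocalBackend(), [sys.executable, "-c", "print('ok')"])
print(output.strip())
```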

5. Middleware System

Insert composable hooks into the agent's reasoning-acting loop without modifying core agent code:

from agentscope.middleware import LoggingMiddleware, GuardrailMiddleware

agent = Agent(
    ...,  # other arguments (name, system_prompt, model, toolkit) as in the first example
    middlewares=[
        LoggingMiddleware(log_tool_calls=True),
        GuardrailMiddleware(blocked_patterns=["rm -rf", "DROP TABLE"]),
    ]
)
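The hook mechanism can be sketched without the framework. In the standalone example below, each middleware receives the tool call and a next handler, and can log, block, or pass the call through. The function names and the string-based call format are assumptions for illustration; the real middleware classes may work differently. The stand-in run_tool only returns a string and executes nothing.

```python
def guardrail(blocked_patterns):
    """Middleware factory: refuse tool calls containing a blocked pattern."""
    def middleware(call, next_handler):
        if any(pattern in call for pattern in blocked_patterns):
            return f"blocked: {call}"
        return next_handler(call)
    return middleware


def logging_middleware(log):
    """Middleware factory: record every call that reaches this layer."""
    def middleware(call, next_handler):
        log.append(call)
        return next_handler(call)
    return middleware


def compose(middlewares, handler):
    """Wrap `handler` so the first middleware in the list runs outermost."""
    for mw in reversed(middlewares):
        # Bind mw and the current handler as defaults to avoid late binding.
        handler = lambda call, mw=mw, nxt=handler: mw(call, nxt)
    return handler


def run_tool(call):
    # Stand-in for real tool execution; nothing is actually run.
    return f"ran: {call}"


log = []
pipeline = compose(
    [logging_middleware(log), guardrail(["rm -rf", "DROP TABLE"])],
    run_tool,
)

print(pipeline("ls -la"))
print(pipeline("rm -rf /tmp/x"))
print(log)
```

Because logging sits outside the guardrail here, blocked calls are still recorded; swapping the order would log only the calls that were allowed through.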

Leader-Worker pattern: a Leader Agent decomposes tasks and creates Worker agents via built-in team tools, then aggregates results.

from agentscope.tools import TeamTools

leader = Agent(
    name="research-leader",
    system_prompt="You lead a research team. Decompose tasks and synthesize results.",
    model=model,
    toolkit=Toolkit(tools=[*TeamTools()])
)

Worker agents' capabilities are determined dynamically by the leader at runtime — no need to predefine all possible worker types.
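The control flow of the pattern can be illustrated with plain asyncio. In this sketch the leader's decomposition is hard-coded so the example stays deterministic, whereas a real leader would ask the model to produce it. The worker and leader functions are hypothetical and do not use TeamTools.

```python
import asyncio


async def worker(role, subtask):
    """Hypothetical worker: a real one would be an agent with its own model."""
    await asyncio.sleep(0)  # placeholder for model and tool calls
    return f"[{role}] done: {subtask}"


async def leader(task):
    # Worker roles are chosen at runtime from the plan, not predefined.
    plan = [
        ("searcher", f"find sources on {task}"),
        ("summarizer", f"summarize findings on {task}"),
    ]
    # Run all workers concurrently; gather returns results in plan order.
    results = await asyncio.gather(*(worker(role, sub) for role, sub in plan))
    return "\n".join(results)


report = asyncio.run(leader("agent frameworks"))
print(report)
```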

Agents decompose complex tasks into tracked plan steps, updating state in real time as execution proceeds:

Task: "Write a complete test suite for this Python project"
Agent generates plan:
  Step 1: [In progress] Scan project structure, identify all modules
  Step 2: [Waiting]     Analyze public API of each module
  Step 3: [Waiting]     Generate unit tests
  Step 4: [Waiting]     Generate integration tests
  Step 5: [Waiting]     Run test suite, fix failures

Step 1 completes → Step 2 starts automatically, plan state updates
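A small sketch of the state tracking behind such a plan, assuming a simple three-state model (Waiting, In progress, Done). The Plan class is hypothetical and only mirrors the behavior described above: finishing the active step starts the next one.

```python
class Plan:
    """Tracks an ordered list of steps with one active step at a time."""

    def __init__(self, steps):
        if not steps:
            raise ValueError("a plan needs at least one step")
        self.steps = list(steps)
        # The first step starts immediately; the rest wait.
        self.statuses = ["In progress"] + ["Waiting"] * (len(self.steps) - 1)

    def complete_current(self):
        """Mark the active step done and start the next one, if any."""
        # index() raises ValueError once every step is already done.
        i = self.statuses.index("In progress")
        self.statuses[i] = "Done"
        if i + 1 < len(self.steps):
            self.statuses[i + 1] = "In progress"

    def render(self):
        return [
            f"Step {n}: [{status}] {text}"
            for n, (status, text) in enumerate(zip(self.statuses, self.steps), 1)
        ]


plan = Plan(["Scan project structure", "Analyze public API", "Generate unit tests"])
plan.complete_current()
print("\n".join(plan.render()))
```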

Long-running tool calls (file processing, network requests, code compilation) shift to background without blocking the agent conversation stream:

User: "Compile this large C++ project and run the tests"
Agent: [Launches background task, continues conversation immediately]
Agent: "Compilation started in background, estimated 5 minutes.
        I can help with other things while you wait."
...(5 minutes later)
System notification: background task complete
Agent: "Compilation complete. Test results: ..."
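The non-blocking behavior in that exchange maps onto a familiar asyncio idiom: start the long job as a task, keep handling the conversation, and await the result later. The sketch below assumes nothing about AgentScope's internals; compile_project is a hypothetical placeholder with a made-up result string.

```python
import asyncio


async def compile_project():
    await asyncio.sleep(0.05)  # stands in for a long compile
    return "42 tests passed"   # placeholder result for the example


async def conversation():
    transcript = []
    # Start the long job without awaiting it, so the loop stays responsive.
    job = asyncio.create_task(compile_project())
    transcript.append("Agent: compilation started in background")
    transcript.append("Agent: (answers another question meanwhile)")
    result = await job  # later: collect the result once it is ready
    transcript.append(f"Agent: compilation complete, {result}")
    return transcript


transcript = asyncio.run(conversation())
print("\n".join(transcript))
```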

The most fundamental difference between AgentScope 2.0 and many comparable frameworks is who decides the execution path:

Traditional approach (LangChain-style):

Developer defines a fixed chain:
Step 1 → Step 2 → Step 3 (developer decides what happens at each step)
The model fills in blanks within each step

AgentScope approach:

Developer provides: toolkit + permissions + constraints
Model decides:      what to do, in what order, with which tools
Framework handles:  production safety, observability, human-in-the-loop
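The contrast can be shown in a few lines. In the sketch below, fixed_chain hard-codes the order of two toy text tools, while model_led lets a policy function pick the next tool at each step. The stub_policy function is a hypothetical stand-in for the model's decision, and none of these names come from either framework.

```python
def strip_blank_lines(text):
    return "\n".join(line for line in text.splitlines() if line.strip())


def first_line(text):
    return text.splitlines()[0] if text else ""


TOOLS = {"strip_blank_lines": strip_blank_lines, "first_line": first_line}


def fixed_chain(text):
    # Developer fixes the order: every input goes through both steps.
    return first_line(strip_blank_lines(text))


def stub_policy(state, history):
    """Stand-in for the model: pick the next tool, or None to stop."""
    if "strip_blank_lines" not in history and "\n\n" in state:
        return "strip_blank_lines"
    if "first_line" not in history and "\n" in state:
        return "first_line"
    return None


def model_led(text, policy, tools, max_steps=5):
    """Loop until the policy stops or the step budget runs out."""
    state, history = text, []
    for _ in range(max_steps):
        name = policy(state, history)
        if name is None:
            break
        state = tools[name](state)
        history.append(name)
    return state, history


print(fixed_chain("a\n\nb"))
print(model_led("a\n\nb", stub_policy, TOOLS))
print(model_led("single", stub_policy, TOOLS))
```

The step budget (max_steps) is the kind of constraint the framework side would own: the policy chooses freely, but only inside limits the developer sets.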

When model reasoning was weak, fixed pipelines were the right call because models needed guidance. Once model reasoning is strong enough, fixed pipelines become constraints: the model has better plans it cannot execute. AgentScope 2.0's bet is that mainstream models from 2025 onward are capable enough to deserve more autonomy.

The standard async for evt in agent.reply_stream() pattern streams each reply as a sequence of typed events that callers can consume as they arrive.

A separate AgentScope Runtime (runtime.agentscope.io) provides a complete production service layer.

AgentScope is not just a framework — there's a complete toolchain behind it:

Component	Function
AgentScope Studio	Visual debugging tool for agent runs
ReMe	Cross-session persistent memory (file-based + vector-based)
OpenJudge	50+ judges (code, math, tool use, multimodal output)
Trinity-RFT	Agent fine-tuning framework (decoupled Explorer/Trainer/Buffer)
Mem0 integration	Long-term memory (added June 2026)

Dimension	LangChain	AutoGen	AgentScope 2.0
Core pattern	Chain-based	Multi-agent conversation	Model-reasoning-led
Production infra	Third-party	Third-party	Built-in
Sandbox execution	None	Limited	Local / Docker / E2B
Human-in-the-loop	Plugin	Native	Event system native
Evaluation system	None	None	OpenJudge (50+ judges)
Fine-tuning support	None	None	Trinity-RFT
Academic backing	Yes	Yes	Yes (2 arXiv papers)

The most significant gap: AgentScope covers the full agent lifecycle — framework → memory → evaluation → fine-tuning → apps. LangChain and AutoGen stop at the framework and memory layers.

Install:

pip install agentscope

Or from source:

git clone https://github.com/agentscope-ai/agentscope.git
cd agentscope
pip install -e .

Run the web UI:

cd agentscope
pnpm install && pnpm run dev   # frontend
python -m agentscope.service   # backend

AgentScope 2.0's timing is deliberate: at a moment when LLM reasoning capability is advancing fast, it chooses "reduce framework constraints, let the model lead" as its direction.

The five core systems (Event / Permission / Workspace / Multi-tenancy / Middleware) address the production pain points of traditional frameworks: poor observability, no fine-grained tool permission control, difficulty serving multiple users, and security constraints mixed into business logic.

The ecosystem coverage is what separates it most clearly. Framework → memory → evaluation → fine-tuning is a complete chain that LangChain and AutoGen haven't built. OpenJudge alone — 50+ judges covering code, math, tool use, and multimodal output — fills a gap that most teams solve by writing evaluation scripts from scratch.

The project has 27.1k stars, 40 releases, two arXiv papers, and an Alibaba engineering team behind it. Among production-grade agent frameworks, AgentScope 2.0 is one of the most thorough options currently available.


source & further reading

dev.to: original article
