{"slug": "scalable-voice-agent-design-with-amazon-nova-sonic-multi-agent-tools-and-session", "title": "Scalable voice agent design with Amazon Nova Sonic: multi-agent, tools, and session segmentation", "summary": "Amazon has released design patterns for scalable voice agents using Amazon Nova Sonic, Amazon Bedrock AgentCore, and Strands BidiAgent to address challenges like high latency and multi-agent coordination. The three patterns—tool-driven agents, sub-agents acting as tools, and session segmentation—enable organizations to build faster, more reliable voice experiences by decomposing large assistants into specialized, reusable components.", "body_md": "[Artificial Intelligence](https://aws.amazon.com/blogs/machine-learning/)\n\n# Scalable voice agent design with Amazon Nova Sonic: multi-agent, tools, and session segmentation\n\nDesign patterns for scalable voice agents matter for organizations that need to deliver fast, natural, and reliable voice experiences. Many teams face challenges like high latency, managing real-time audio, and coordinating multiple agents in complex workflows.\n\nIn this post, you’ll learn how to use [Amazon Nova Sonic](https://docs.aws.amazon.com/ai/responsible-ai/nova-2-sonic/overview.html), [Amazon Bedrock AgentCore](https://aws.amazon.com/blogs/machine-learning/securely-launch-and-scale-your-agents-and-tools-on-amazon-bedrock-agentcore-runtime/), and [Strands BidiAgent](https://strandsagents.com/docs/user-guide/concepts/bidirectional-streaming/agent/) to build scalable, maintainable voice agents that handle these challenges efficiently, resulting in more responsive and intelligent customer interactions.\n\nWe’ll explore three popular architectural patterns for voice agents, highlighting their trade-offs and best practices for minimizing latency.\n\n## The building blocks\n\nBefore diving deeper into the architecture patterns, here’s a quick overview of the three key components used as the sample solution in this post.\n\nAmazon Nova Sonic is a foundation model that creates natural, human-like speech-to-speech conversations for generative AI applications. Users can interact with AI through voice in real time, with capabilities for understanding tone, natural conversational flow, and performing actions.\n\nAmazon Bedrock AgentCore Runtime is a serverless hosting environment for AI agents. You package your agent as a container, deploy to AgentCore Runtime, and it handles scaling, session isolation, and billing. For voice agents, it provides [bidirectional WebSocket streaming](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-bidirectional-streaming.html) with SigV4 auth, microVM-level session isolation to avoid noisy-neighbor latency spikes, [AgentCore Gateway](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-get-started-websocket.html) for shared tool hosting using the Model Context Protocol (MCP) open source protocol, persistent memory across sessions, and telemetry for voice-specific metrics like time-to-first-audio.\n\n[Strands Agents](https://strandsagents.com/) is an open source framework for building AI agents. Its [BidiAgent](https://strandsagents.com/docs/user-guide/concepts/bidirectional-streaming/quickstart/) class is one integration option between Nova Sonic and your application. It manages the bidirectional stream lifecycle, routes tool calls, and handles session management, simplifying the voice agent application through the model SDK interface.\n\n## Three integration patterns: tool, agent-as-tool (sub-agent), and session segmentation\n\nInstead of building one all-powerful agent, modern voice systems are increasingly composed of **tool-driven agents**, **sub-agents acting as tools** and **session segmentation strategies** that isolate prompts, memory, and permissions. These patterns allow teams to decompose large assistants into smaller, specialized, and reusable components while maintaining clear security boundaries.\n\nBefore running the samples in the following sections, install Python and the required dependencies, including strands-agents and boto3, and make sure your IAM setup has the necessary permissions for the required services. For the full example, refer to the [GitHub repository](https://github.com/awslabs/agentcore-samples/tree/main/01-tutorials/01-AgentCore-runtime/06-bi-directional-streaming/02-strands-ws).\n\n### Pattern 1: AgentCore Gateway – tool selection for low latency\n\nA tool call is when a voice agent sends input to an external function or service, which processes it and returns output. It lets the agent perform tasks like querying a database or triggering a service quickly and securely, without extra reasoning steps.\n\nWith [AgentCore Gateway](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-get-started-websocket.html), you expose your existing business logic as **tools**, discrete functions that Nova Sonic can call directly during a conversation. The voice model selects which tool to invoke, passes parameters, gets a result, and speaks it back. There’s no intermediate reasoning layer between the model and the tool.\n\nAgentCore Gateway hosts MCP servers as managed endpoints. MCP is the protocol, AgentCore Gateway is the AWS feature that runs them. The voice agent connects via Gateway ARNs.\n\nWhen a user says “What’s my account balance?”, Nova Sonic:\n\n- Understands the intent from speech.\n- Selects\n`get_account_balance`\n\nfrom the available MCP tools. - Calls the tool with the right parameters.\n- Speaks the result back.\n\n**Trade-off:** Nova Sonic makes all the decisions. If a tool call requires multi-step validation, conditional logic, or chaining multiple operations together, that reasoning burden falls entirely on the voice model’s system prompt. For simple tools this is fine. For complex workflows, it gets brittle.\n\n### Pattern 2: Sub-agent – additional reasoning with decoupled agents\n\nWith the sub-agent or agent-as-tool pattern, your existing business logic runs in autonomous agents, each with its own model, system prompt, tools, and reasoning capabilities. The voice orchestrator delegates whole tasks to these sub-agents instead of calling individual tools.\n\nThere are many ways to connect to a sub-agent from your voice agent. Agent-to-Agent (A2A) and Strands Agent-as-Tool are two common approaches:\n\n**Local agent-as-tool:** The sub-agent runs in-process, wrapped as a`@tool`\n\nfunction using the[Agents as Tools pattern](https://aws.amazon.com/blogs/machine-learning/multi-agent-collaboration-patterns-with-strands-agents-and-amazon-nova/)in Strands. This is the most straightforward approach with no network hop and no separate deployment. The trade-off is that the sub-agent shares the same process and scales with the orchestrator.**Remote agent via A2A protocol:** The sub-agent is deployed as an independent[A2A server on AgentCore Runtime](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-a2a.html)(or a remote server) and invoked over the network. A2A is an open protocol for agent-to-agent communication. As MCP connects agents to tools, A2A connects agents to other agents. As the AWS blog on[A2A protocol support in AgentCore Runtime](https://aws.amazon.com/blogs/machine-learning/introducing-agent-to-agent-protocol-support-in-amazon-bedrock-agentcore-runtime/)explains, agents built with different frameworks (Strands, OpenAI, LangGraph, Google ADK) can share context and reasoning in a common format. This provides full deployment independence and cross-framework interoperability.\n\nStrands Agents has built-in support for both protocols, MCP for tool access and A2A for agent-to-agent communication. For a hands-on walkthrough, see the community guide on [Agent Collaboration: Strands Agents, MCP, and the Agent2Agent Protocol](https://community.aws/content/2xu47bH8LZYPr11tDN2eTDiEL95/agent-collaboration-strands-agents-mcp-and-the-agent2agent-protocol).\n\nHere’s the local agent-as-tool approach, each sub-agent is a `@tool`\n\nwrapping a full Strands `Agent`\n\n:\n\nThe voice orchestrator then uses BidiAgent with these sub-agent tools:\n\nThe sub-agent does its own thinking. Nova Sonic doesn’t need to orchestrate the individual steps. It delegates and speaks the result.\n\n**Trade-off:** Each sub-agent call adds latency: the sub-agent’s own model inference plus its tool calls. In a voice conversation, this means longer silence while the sub-agent reasons. The AWS blog on [multi-agent voice assistants](https://aws.amazon.com/blogs/machine-learning/building-a-multi-agent-voice-assistant-with-amazon-nova-sonic-and-amazon-bedrock-agentcore/) recommends starting with smaller, efficient models like Amazon Nova 2 Lite for sub-agents to reduce latency while still handling specialized tasks effectively.\n\nAmazon Nova 2 Sonic supports [asynchronous tool calling](https://docs.aws.amazon.com/nova/latest/nova2-userguide/sonic-async-tools.html), so the conversation continues naturally while tools run in the background. It keeps accepting input, can run multiple tools in parallel, and gracefully adapts if the user changes their request mid-process, delivering all results while focusing on what’s still relevant.\n\n## Pattern 3: Session segmentation for ultra-low latency\n\nThere’s a third approach worth considering. It doesn’t map neatly to the MCP or sub-agent patterns, but is purpose-built for voice scenarios where latency is the overriding concern.\n\nInstead of delegating external tools or sub-agents, you **segment the conversation into logical phases**, each with its own Nova Sonic session, system prompt, and tool set. When the conversation transitions from one phase to the next (for example, from authentication to account inquiry), you close the current session and open a new one with a different prompt and tools, within the same WebSocket connection. Each sub-voice-agent can use its own MCP gateways, tools, or even sub-agents — the differences that it operates with a focused prompt and minimal tool surface, reducing reasoning overhead and latency.\n\nThink of a banking voice assistant with three conversation phases: authentication, account management, and mortgage inquiry. Rather than loading one massive system prompt with every tool, you run each phase as a focused Nova Sonic session:\n\nEach phase gets a clean Nova Sonic session with:\n\n- A focused system prompt: Shorter, more specific, less room for the model to get confused.\n- Only the relevant tools: via MCP gateways, local tools, or both. The model doesn’t waste reasoning cycles choosing between 15 tools when it only needs 3.\n- Optionally its own sub-agents: a phase that requires deeper reasoning can use Pattern 2 internally, while simpler phases stay tool-only.\n- The previous session context can be passed into the new session as chat history, so the overall conversation retains continuity.\n\n### Compared to tool, sub-agent, and session segmentation patterns\n\nFactor |\nTool |\nSub-Agent (Agent-as-Tool) |\nSession Segmentation |\n| Latency | Low | Higher (sub-agent reasoning) | Lowest (with latency during session transitions) |\n| Tool set per turn | Tools loaded | Sub-agent’s tools | Only phase-relevant tools |\n| System prompt | One large prompt | Orchestrator + sub-agent prompts | Small, phase-specific prompts |\n| Reasoning depth | Voice model only | Voice model + sub-agent | Voice model only (per phase) |\n| Reuse of existing agents | High (same MCP tools) | Highest (same sub-agents) | Medium (composes tools/sub-agents per phase) |\n| Conversation continuity | Seamless | Seamless | Requires handoff logic between phases |\n\n## Latency best practices for voice agents\n\nLatency is a key consideration when building voice versus text agents. Here are practical techniques to keep response times fast and responsive:\n\n**Start with small models for sub-agents.** Your voice orchestrator uses Nova Sonic for the conversation, but sub-agents don’t need a large model. Start with Amazon Nova 2 Lite or Nova 2 Micro. They’re fast, cost optimized, and handle most specialized tasks well. You can always upgrade a specific sub-agent to a larger model if quality requires it, but default to small.\n\n**Design stateful sub-agents with caching.** A stateless sub-agent that hits a database or API on every call adds latency every time. Instead, design sub-agents to cache results from data sources (APIs, AWS Lambda functions, databases) within a session. If the banking sub-agent fetches account details once, it should hold that data in memory and serve subsequent questions (balance, transactions, summary) from cache rather than making repeated backend calls.\n\n**Prefetch data after authentication.** This is especially valuable for contact center scenarios. After a customer authenticates, you already know who they are. Don’t wait for them to ask before pulling their data. Immediately fetch account balances, recent transactions, pending alerts, and mortgage status in the background. When the customer asks “What’s my balance?”, the answer is already in memory.\n\n**Parallelize independent tool calls.** If the user asks “Give me an overview of my accounts”, don’t call `get_checking_balance`\n\n, then `get_savings_balance`\n\n, then `get_credit_card_balance`\n\nsequentially. Use concurrent execution so three calls happen at once. Strands supports this natively. The agent’s tool executor runs independent calls in parallel by default.\n\n**Use filler phrases to mask tool latency.** When a tool call or sub-agent delegation is unavoidable, instruct the voice model to speak a brief filler while waiting: “Let me check that for you…” or “One moment while I look that up…” This keeps the conversation feeling alive instead of dropping into silence.\n\n**Minimize tool count per session.** Tool selection gets slower as the number of available tools grows. If your agent has 15 tools but a typical conversation only uses 3 to 4, consider the session segmentation pattern to load only the relevant tools per phase.\n\n## Clean up\n\nAfter you finish testing the sample, remember to clean up the resources you created to avoid unnecessary costs. Follow the repository instructions to stop services and delete any deployed infrastructure.\n\n## Conclusion\n\nMigrating a text chatbot to a voice assistant isn’t a straightforward wrapper job. The interaction model is fundamentally different, from response design to latency budgets to turn-taking behavior. But with a well-structured multi-agent architecture and Amazon Bedrock AgentCore, the business logic layer stays intact.\n\nThe sub-agents you’ve already built are your biggest asset. Reuse them.\n\nFor a working example of a Strands BidiAgent voice assistant deployed on AgentCore Runtime with WebSocket streaming, see the [AgentCore bidirectional streaming sample](https://github.com/awslabs/amazon-bedrock-agentcore-samples/tree/main/01-tutorials/01-AgentCore-runtime/06-bi-directional-streaming/02-strands-ws).\n\n## Next steps\n\nNext, you can extend the sample to fit your own use case, integrate your business tools, refine prompts for voice interactions, and test the agent in real-world scenarios to prepare for production deployment. To learn more about voice agents on AWS, visit:\n\n[Amazon Nova 2 Sonic Service Card](https://docs.aws.amazon.com/ai/responsible-ai/nova-2-sonic/overview.html)[Amazon Bedrock AgentCore Runtime Bidirectional Streaming](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/runtime-bidirectional-streaming.html)[Strands BidiAgent](https://strandsagents.com/docs/user-guide/concepts/bidirectional-streaming/agent/)[Amazon Bedrock AgentCore Bidirectional Streaming Sample Code](https://github.com/awslabs/agentcore-samples/tree/main/01-tutorials/01-AgentCore-runtime/06-bi-directional-streaming)", "url": "https://wpnews.pro/news/scalable-voice-agent-design-with-amazon-nova-sonic-multi-agent-tools-and-session", "canonical_source": "https://aws.amazon.com/blogs/machine-learning/scalable-voice-agent-design-with-amazon-nova-sonic-multi-agent-tools-and-session-segmentation/", "published_at": "2026-05-19 15:26:37+00:00", "updated_at": "2026-05-26 08:08:36.644275+00:00", "lang": "en", "topics": ["artificial-intelligence", "generative-ai", "ai-agents", "natural-language-processing", "ai-tools"], "entities": ["Amazon Nova Sonic", "Amazon Bedrock AgentCore", "Strands BidiAgent", "AWS"], "alternates": {"html": "https://wpnews.pro/news/scalable-voice-agent-design-with-amazon-nova-sonic-multi-agent-tools-and-session", "markdown": "https://wpnews.pro/news/scalable-voice-agent-design-with-amazon-nova-sonic-multi-agent-tools-and-session.md", "text": "https://wpnews.pro/news/scalable-voice-agent-design-with-amazon-nova-sonic-multi-agent-tools-and-session.txt", "jsonld": "https://wpnews.pro/news/scalable-voice-agent-design-with-amazon-nova-sonic-multi-agent-tools-and-session.jsonld"}}