{"slug": "build-self-service-aws-health-analytics-to-find-actionable-health-insights-with", "title": "Build self-service AWS Health analytics to find actionable health insights with AI agents powered by Amazon Bedrock", "summary": "AWS announced Chaplin, an open-source solution using AI agents powered by Amazon Bedrock and the Model Context Protocol, to enable self-service health event analytics. The tool lets operations teams query AWS Health notifications in natural language, reducing reliance on Technical Account Managers and helping prioritize events like Linux end-of-life or EC2 retirements across accounts.", "body_md": "# Build self-service AWS Health analytics to find actionable health insights with AI agents powered by Amazon Bedrock\n\nOn a typical Monday morning, an enterprise operations team receives multiple AWS Health notifications about Amazon Linux 2 end-of-life, RDS version deprecations, and EC2 instance retirements across 50+ accounts. Without self-service analytics, the team has no way to quickly identify the events that affect production systems, the events that require immediate action versus long-term planning, and the business impact of each event category.\n\nOperations teams also spend time waiting for Technical Account Managers (TAMs) to interpret health events, adding delays to critical operational decisions. The result is time spent on reactive firefighting rather than innovation.\n\nIn this post, we show you how to build **Chaplin** (Customer Health and Planned Lifecycle Intelligence Nexus), an open source solution that uses AI agents exposed through the Model Context Protocol (MCP) to provide self-service health event analytics. With Chaplin, teams can ask questions in natural language directly from MCP-compatible AI assistants and receive precise, contextualized answers without depending on AWS Support for routine analysis. 
Detailed deployment instructions are available in the [Chaplin AWS Health Agentic Assistant GitHub repository](https://github.com/aws-samples/sample-aws-health-agentic-assistant).\n\n## The challenge: Reactive health event management\n\nEnterprises running production workloads on AWS manage a constant stream of health events – service changes, maintenance windows, security patches, and operational notifications – across dozens or hundreds of accounts. AWS Health provides comprehensive event data through the AWS Health API and Amazon EventBridge, but reactive management approaches leave gaps.\n\n- Teams depend on TAMs for health event interpretation and impact analysis, creating bottlenecks in decision-making. Business intelligence dashboards with predefined schemas cannot adapt to dynamic questions or provide the contextual insights that operations teams need in the moment.\n- DevOps and cloud operations teams spend significant time manually categorizing and prioritizing thousands of health events scattered across multiple accounts and regions. Without a central location for analysis, it is difficult to assess overall impact, coordinate responses across teams, or identify proactive opportunities – such as planning migrations or scheduling maintenance before issues become critical.\n\nEligible Health events will soon be linked directly to AWS Transform templates, enabling customers to act on events directly. Chaplin can surface and prioritize these actionable events for your environment.\n\n## Solution overview: Self-service analytics with Chaplin\n\nChaplin implements self-service health event analytics using agentic AI powered by Amazon Bedrock, delivered through the Model Context Protocol (MCP). Instead of predefined dashboard schemas, Chaplin exposes AI-powered tools that MCP-compatible clients can consume. Teams interact with Chaplin directly from their AI assistant – such as Claude Code or Kiro CLI – and ask questions in natural language. 
For example, a team member might ask for upcoming RDS lifecycle events in the next 60 days, request a summary of open EC2 events prioritized by urgency, query security patches affecting production environments, or check which maintenance windows could affect high-priority applications.\n\nYour teams can continue to query until you have all the information required to make an informed decision and draw up a remediation plan. This approach enables DevOps, security, and operations teams to independently analyze health events, plan migrations, and assess operational impacts without creating bottlenecks. Because Chaplin uses MCP, teams can also combine it with other MCP-enabled tools (like JIRA, GitHub, or ServiceNow) in their workflow to perform actions with agentic experience.\n\nAdditionally, MCP enables direct association of AWS data and metadata with business or application-level context – such as resource tags, environment classifications, and ownership information – enriching health event analysis with organizational relevance.\n\n## How agentic AI unifies structured and unstructured data\n\nChaplin uses a multi-agent architecture that addresses a fundamental challenge in enterprise data analytics: effectively combining structured and unstructured data processing. Traditional Retrieval-Augmented Generation (RAG) systems and generative AI approaches face a critical limitation: they are inherently non-deterministic when handling numerical operations and aggregations. Vector similarity search, the foundation of RAG, retrieves semantically similar content but cannot guarantee mathematical accuracy. When asked to count, sum, or aggregate data, RAG-based systems may hallucinate results (for example, reporting 190 health events related to End-of-life when the actual count is 958). 
This non-determinism stems from the probabilistic nature of both the retrieval mechanism (which ranks documents by semantic similarity rather than exact matches) and the language model’s generation process (which predicts likely tokens rather than computing precise values).\n\nAWS Health events present this exact challenge. Each event contains structured metadata – event type, service name, affected resources, timestamps, severity levels, and account IDs – that requires precise filtering and aggregation. Each event also contains unstructured descriptions with natural language explanations of the issue, impact assessments, and recommended actions that require semantic understanding and contextual analysis.\n\n### Intelligent query processing\n\nWhen you ask Chaplin a question, three specialized components work together. The Natural Language to Structured Query Agent converts plain English questions into precise structured data queries against health event metadata. It understands the schema of your health events – which fields exist, such as event_type, affected_accounts, and start_time – and constructs filters that match your intent. A question like “Show me EC2 retirements in production accounts” becomes a structured query with exact field filters rather than keyword matching.\n\n- The Contextual Impact Analysis Agent handles unstructured health event descriptions by combining them with your customer metadata – production vs. non-production environments, business units, application tiers, and ownership information. This agent performs system-level reasoning, interpreting not just what the event says but what it means for your specific infrastructure and organizational context.\n- The Pattern-Based Classification Engine categorizes health events using rule-based pattern matching, which eliminates AI processing costs for routine categorization while maintaining high accuracy. 
This cost optimization layer makes the solution practical at scale.\n\n### Cost-optimized AI architecture\n\nChaplin implements intelligent cost optimization through selective AI enhancement. The system uses a pattern-first processing approach where rule-based classification handles most events without incurring AI costs. Pre-built summarized views for 30-day, 60-day, and 120-day windows with filters help teams quickly identify critical alerts. In the current implementation, Amazon Bedrock with Claude processes only unstructured data that requires contextual analysis. But the solution is also LLM-agnostic, supporting multiple model providers such as Amazon Bedrock, OpenAI, Anthropic, or local models like Ollama, providing flexibility based on your requirements and cost constraints. Intelligent caching reduces redundant AI processing, and structured query precision uses the AWS Health API schema for exact numerical analysis without AI inference costs.\n\n## Architecture overview\n\nThe following diagram illustrates the complete Chaplin architecture. It shows how health events flow from multiple AWS accounts through a centralized data pipeline, into an MCP server powered by AI agents built on Amazon Bedrock, and finally to MCP-compatible AI assistants where teams interact with the data through natural language. Each layer is described in detail after the diagram.\n\nThe architecture consists of three primary layers working together to deliver intelligent health event analytics.\n\n#### 1. Data tier – Collection layer (multi-account)\n\nThe data tier collects health events from across your AWS Organization and centralizes them for analysis. In each member account, AWS Health API serves as the source of health events. 
Amazon EventBridge provides event-driven triggers for real-time capture, and AWS Lambda collector functions retrieve events using cross-account IAM roles configured with least-privilege access.\n\nThese events flow to a centralized management account where an Amazon Simple Storage Service (Amazon S3) data lake stores collected health events with intelligent partitioning by account, date, and event type. When new events arrive, S3 event notifications trigger an AWS Lambda function that processes the JSON health events and loads them into Amazon DynamoDB for fast querying.\n\nThis multi-account architecture supports two deployment models:\n\n**Option 1**: AWS Organizations API for centralized, automated deployment across your accounts.\n\n**Option 2**: Individual account deployments for organizations with security restrictions.\n\n#### 2. Middle tier – MCP server and intelligence layer\n\nThe middle tier is where raw health event data is transformed into actionable intelligence and exposed through an MCP server. Amazon DynamoDB serves as the primary data store for structured health event metadata, optimized for fast queries with indexes on event type, severity, date, and account. This enables real-time access for both pattern-based classification and AI analysis.\n\nA pattern-based event classifier provides the first layer of intelligence. This rule-based categorization engine uses regex patterns on event types to map events to five business categories: Migration Requirements, Security & Compliance, Maintenance & Updates, Cost Impact Events, and Operational Notifications. Because most events follow predictable patterns, this approach processes the majority of events through efficient pattern matching without incurring AI costs.\n\nFor events requiring deeper analysis, the AI-powered analysis engine built on Amazon Bedrock takes over. 
This engine uses the Strands Agents framework, an open-source agentic framework developed by AWS, with Claude 4.5 Sonnet as the large language model. You can switch this to a preferred LLM of your choice. Three specialized agents handle different aspects of analysis: a SQL Query Agent converts natural language queries to structured DynamoDB queries for precise numerical analysis, an Impact Analysis Agent evaluates unstructured event descriptions against customer metadata such as environment, business unit, and ownership, and a DBQueryBuilder Agent generates optimized database queries for multi-dimensional aggregations. All these capabilities are exposed as MCP tools that compatible clients can invoke.\n\n#### 3. Presentation tier – MCP client – AI assistant integration\n\nThe presentation tier consists of an MCP-compatible AI assistant, such as Claude Code or Kiro CLI. Instead of a custom front end, Chaplin exposes its capabilities as MCP tools that these clients consume natively. Users interact through natural language in their existing development environment, and the AI assistant orchestrates calls to Chaplin’s MCP server to retrieve health event data, run AI-powered analysis, and present contextualized results – all within the same conversational interface they already use for development tasks.\n\nSecurity relies on AWS Identity and Access Management (AWS IAM) for authentication and authorization. The MCP client mounts AWS credentials as read-only, and access is controlled through IAM roles with least-privilege principles. Data is encrypted with TLS 1.2+ in transit and AES-256 at rest, and AWS CloudTrail provides audit logging for API calls.\n\n## Key capabilities\n\nChaplin provides three core capabilities that address gaps in how organizations manage AWS Health events today.\n\nChaplin offers dynamic conversational analytics. 
It generates actionable insights on demand based on your specific questions, providing precise breakdowns with exact counts, affected accounts, and contextual analysis – generated dynamically within your AI assistant without pre-built reports or dashboards. Chaplin delivers this through three integrated capabilities:\n\n### MCP-powered health intelligence tools\n\nChaplin exposes a comprehensive set of MCP tools organized into three categories. Summary tools query DynamoDB directly and return instantly, providing high-level counts by service, status, category, and region. Detail tools let you drill into specific event categories, event types, or filtered event lists. AI analysis tools use Strands Agents with Amazon Bedrock to interpret your natural language queries, fetch relevant data, and generate contextual insights.\n\n### Multi-account data pipeline\n\nChaplin collects health events from your AWS accounts and centralizes data in Amazon S3, supporting flexible deployment models based on your security posture. The data pipeline consists of AWS Lambda functions for automated health event ingestion, Amazon EventBridge schedulers with configurable collection frequency (daily or hourly), cross-account IAM roles for secure multi-account data collection with least-privilege principles, an Amazon S3 data lake with partitioning for efficient querying, and automated lifecycle management with configurable retention policies.\n\n### Precise analytical processing\n\nChaplin combines structured and unstructured data processing for comprehensive analysis. For structured data, it delivers exact numerical results including event counts and distributions, timeline analysis with trend detection, multi-dimensional aggregations across account, service, and severity dimensions, and categorical breakdowns with precise percentages. 
For unstructured data, it provides contextual insights such as impact assessment based on event descriptions, architectural deficiency identification, risk correlation across related events, and recommended actions based on event context.\n\n## Implementation walkthrough: EC2 instance lifecycle management\n\nTo illustrate the breadth of what Chaplin can do, the following walkthrough shows a series of natural language queries issued through Kiro CLI. These examples are organized by use case – from getting a quick operational overview to deep-diving into specific services and planning remediation.\n\nNote: The following examples use sample data to illustrate Chaplin’s capabilities. Your actual responses will vary based on the health events in your AWS environment.\n\n### Getting the operational picture\n\nA team starting their day can quickly assess the current state of health events across your accounts and identify what needs immediate attention:\n\n### Drilling into specific services and events\n\nOnce the team identifies areas of concern, they can drill down into specific services, event categories, or filtered views.\n\nFor example, asking about scheduled changes returns a detailed breakdown by service, status, and timeline:\n\n### AI-powered analysis and remediation planning\n\nEach of these queries demonstrates a different facet of Chaplin’s capabilities: instant structured lookups from DynamoDB for operational overviews, filtered drill-downs by service, region, or event type, and AI-powered contextual analysis that generates insights and recommends specific remediation steps. Because this runs in an MCP-compatible AI assistant, the conversation is fluid – each answer naturally leads to the next question, and teams can combine Chaplin with other MCP tools in the same session.\n\nThe following example demonstrates how Chaplin combines multiple MCP tools to answer a complex analytical question. 
When asked about recurring health events and architectural remediation, the AI agent orchestrates calls to `get_health_summary`, `get_event_categories`, `get_estimated_blast_radius`, and `get_event_type_stats` to produce a comprehensive response:\n\n## Deployment walkthrough\n\nChaplin offers two deployment options to match your team’s needs. Both are fully scripted – clone the repository and follow the guided setup in the README.\n\n**Option A**: Local install – runs the MCP server on your machine, connecting directly to DynamoDB and Bedrock using your local AWS credentials. Best for individual developers or quick evaluation. One-click install buttons are available in the repository for Kiro IDE, Cursor, and VS Code.\n\n**Option B**: Remote deploy (Lambda) – deploys the MCP server as a Lambda function in your AWS account. Team members connect via a lightweight local proxy – no local dependencies needed and a single instance of the server is hosted at a central location. Best for team-wide rollouts.\n\nBoth options deploy the backend infrastructure (DynamoDB table, S3-to-DynamoDB Lambda, and S3 event notifications) and configure your MCP client automatically.\n\nOnce deployed, open your MCP-compatible AI assistant and verify that the Chaplin health tools are available. Try a simple query like “What are the Scheduled Change events – planned maintenance and changes?” to confirm the connection is working.\n\n### Data pipeline\n\nChaplin requires AWS Health Events data. You can deploy Chaplin before or after setting up the data pipeline. 
The data pipeline supports two deployment models:\n\n**Option 1**: AWS Organizations – bulk deployment across multiple accounts (recommended)\n\n**Option 2**: Individual Accounts – manual deployment to specific accounts\n\nFor step-by-step deployment and data pipeline setup instructions, see the [Chaplin GitHub repository](https://github.com/aws-samples/sample-aws-health-agentic-assistant).\n\n## Benefits and impact\n\nOrganizations implementing Chaplin experience measurable improvements across three dimensions of AWS Health event management: operational efficiency, cost optimization, and risk mitigation.\n\nFrom an operational efficiency perspective, Chaplin enables proactive technology lifecycle management by identifying upcoming migrations and deprecations 60-90 days in advance, reducing emergency firefighting. Automated event categorization reduces the manual triage burden on operations teams. Self-service analytics removes dependencies on TAMs for routine analysis, enabling same-day remediation planning. Teams also benefit from early identification of deprecated services and configurations, preventing the accumulation of technical debt.\n\nCost optimization comes from multiple angles. Keeping up with lifecycle changes prevents costly emergency migrations and extended support fees. The pattern-first processing approach minimizes AI inference costs by routing the majority of events through rule-based classification rather than LLM calls. Self-service capabilities reduce TAM engagement for routine inquiries, and better visibility into cost-impacting events enables proactive identification of Reserved Instance expirations and capacity changes. Configurable Amazon S3 retention policies help manage storage costs over time.\n\nFor risk mitigation, Chaplin provides early security visibility through proactive identification of security patches and vulnerabilities before they are exploited. 
Automated monitoring of compliance-related health events with audit trails supports compliance tracking. Contextual analysis of event impact on production systems helps prevent outages, and detection of configuration issues and architectural deficiencies catches problems before they cause incidents.\n\n## Looking ahead: From self-service analytics to autonomous operations with AWS DevOps Agent\n\nWhile the current release focuses on conversational analytics and self-service capabilities, the long-term vision for Chaplin extends toward autonomous operations. Because Chaplin is built on MCP, it integrates naturally with AWS DevOps Agent – a frontier agent that autonomously investigates incidents, identifies root causes, and provides detailed mitigation plans. By registering Chaplin’s MCP server as a capability provider in an AWS DevOps Agent Space, operations teams gain health event intelligence directly within their incident response workflows. AWS DevOps Agent can correlate Chaplin’s health event data with application topology, telemetry, and deployment history to surface impact scope, prioritize remediation, and coordinate response through channels like Slack and ServiceNow.\n\nThis integration creates a powerful feedback loop. When AWS DevOps Agent investigates an incident, it can query Chaplin to determine whether a related health event – such as an upcoming instance retirement or service deprecation – is contributing to the issue. Chaplin’s impact scope analysis provides business context, showing which accounts and workloads are at risk and their associated spend, while AWS DevOps Agent maps that to specific application resources and their dependencies through its topology graph. Together, they enable automated triaging where health events are not just categorized but correlated with real-time infrastructure state, helping teams move from reactive firefighting to proactive incident prevention. 
As AWS Health introduces native prioritization capabilities, this pipeline will become even richer, allowing customers to define their own prioritization rules enriched by both health event metadata and operational telemetry.\n\nFuture enhancements will build on this foundation with predictive maintenance through event pattern analysis and guided remediation workflows with rollback capabilities – transforming operations teams from reactive responders to strategic orchestrators.\n\n## Conclusion\n\nIn this post, we showed how to build a self-service AWS Health Event analytics solution using agentic AI powered by Amazon Bedrock, delivered through the Model Context Protocol (MCP). **Chaplin** (Customer Health and Planned Lifecycle Intelligence Nexus) demonstrates a shift from static dashboard monitoring to proactive conversational analytics by combining the precision of structured data querying with the contextual understanding of AI-powered analysis – accessible directly from your AI assistant.\n\nTo get started, clone the Chaplin GitHub repository and deploy Option A (local install) for a quick evaluation with your own AWS Health data. Once running, try querying your upcoming lifecycle events or drilling into specific service categories. 
Share your experience and questions in the comments below.\n\n### Next steps\n\n- Deploy Chaplin in your AWS environment using the [GitHub repository](https://github.com/aws-samples/sample-aws-health-agentic-assistant)\n- Explore the [Strands Agents framework](https://github.com/awslabs/strands) for building custom AI agents\n- Read about [AI agents unifying structured and unstructured data](https://aws.amazon.com/blogs/machine-learning/ai-agents-unifying-structured-and-unstructured-data-transforming-support-analytics-and-beyond-with-amazon-q-plugins/)\n- Learn more about [Amazon Bedrock](https://aws.amazon.com/bedrock/) capabilities\n\n### Learn more\n\n- [Amazon Bedrock Documentation](https://docs.aws.amazon.com/bedrock/)\n- [AWS Health API Documentation](https://docs.aws.amazon.com/health/)\n- [AWS Organizations Multi-Account Strategy](https://docs.aws.amazon.com/organizations/)\n- [Model Context Protocol (MCP) Specification](https://modelcontextprotocol.io/specification/2025-11-25)\n\nFor questions and feedback, visit [AWS re:Post](https://repost.aws/) or contact [AWS Support](https://aws.amazon.com/support/).", "url": "https://wpnews.pro/news/build-self-service-aws-health-analytics-to-find-actionable-health-insights-with", "canonical_source": "https://aws.amazon.com/blogs/machine-learning/build-self-service-aws-health-analytics-to-find-actionable-health-insights-with-ai-agents-powered-by-amazon-bedrock/", "published_at": "2026-06-25 16:38:12+00:00", "updated_at": "2026-06-25 16:43:15.975861+00:00", "lang": "en", "topics": ["artificial-intelligence", "ai-agents", "ai-tools", "ai-infrastructure", "generative-ai"], "entities": ["AWS", "Amazon Bedrock", "Chaplin", "Model Context Protocol", "Amazon Linux 2", "RDS", "EC2", "Amazon EventBridge"], "alternates": {"html": "https://wpnews.pro/news/build-self-service-aws-health-analytics-to-find-actionable-health-insights-with", "markdown": 
"https://wpnews.pro/news/build-self-service-aws-health-analytics-to-find-actionable-health-insights-with.md", "text": "https://wpnews.pro/news/build-self-service-aws-health-analytics-to-find-actionable-health-insights-with.txt", "jsonld": "https://wpnews.pro/news/build-self-service-aws-health-analytics-to-find-actionable-health-insights-with.jsonld"}}