{"slug": "slack-outlines-four-phase-journey-to-a-multi-cloud-ai-serving-platform", "title": "Slack Outlines Four-Phase Journey to a Multi-Cloud AI Serving Platform", "summary": "Slack detailed its four-phase evolution to a multi-cloud AI serving platform, moving from self-managed SageMaker to AWS Bedrock and Google Cloud Vertex AI, which improved complex reasoning quality by 10% and reduced short-prompt latency by 67%. The shift addressed capacity forecasting, traffic variability, and single-provider dependency, enabling better resilience and access to diverse models.", "body_md": "Slack has [outlined](https://slack.engineering/slack-ai-the-path-to-multi-cloud/) how its AI serving infrastructure evolved through four distinct phases, moving from a self-managed Amazon SageMaker deployment to a multi-cloud architecture spanning AWS Bedrock and Google Cloud Vertex AI. According to the company, the final configuration improved quality on complex reasoning workloads by around 10% while reducing latency for short prompts by approximately 67%.\n\nThe publication comes as engineering teams increasingly evaluate multi-provider AI strategies to improve resilience, gain access to a broader range of models, and reduce dependence on individual cloud platforms.\n\nSlack’s initial AI serving platform ran on Amazon SageMaker inside an escrow VPC using cross-account IAM roles. While the approach provided strong isolation, it also required manual capacity forecasting, scheduled cluster expansion, and advance planning for scarce A100 and H100 GPU resources. With millions of daily users relying on AI-powered features, capacity shortfalls and infrastructure issues could quickly become customer-facing problems.\n\nTo reduce that operational burden, Slack migrated to Amazon Bedrock. The company said the move eliminated infrastructure management overhead, reduced feature lag, and provided faster access to newer Anthropic models. 
Engineers no longer needed to manage GPU reservations directly, allowing the team to focus more on model performance and product quality. Slack completed the migration through compliance reviews, load testing, and feature-flag-driven rollouts, reporting no customer-facing incidents during the transition.\n\nTraffic variability was another challenge. Slack reported that AI workloads can fluctuate by as much as 10× between peak and off-peak periods. To accommodate those swings, the team combined Bedrock’s Provisioned Throughput (PT) and On-Demand offerings, routing interactive traffic to lower-latency PT endpoints while allowing bursty background workloads to spill over into On-Demand capacity.\n\nThe hybrid capacity model addressed many of the scaling challenges associated with serving large AI workloads. However, Slack noted that an important limitation remained: its AI platform still depended on a single provider, which created resiliency concerns and limited access to models available through competing ecosystems.\n\nThose concerns led Slack to pursue a multi-cloud strategy.\n\nAdding Google Cloud Vertex AI required the company to build a provider-agnostic serving layer that could operate consistently across cloud environments. The platform introduced secretless authentication, API normalization, unified observability, and intelligent routing between providers. Endpoints are continuously evaluated using metrics such as time-to-first-token, p90 latency, and 5xx error rates, allowing traffic to be redirected away from degraded services. The same abstraction layer also supports A/B testing and controlled model rollouts.\n\nSlack said the resulting architecture improved both performance and flexibility. 
Beyond the reported gains in quality and latency, the company highlighted access to a broader range of foundation models, improved geographic failover capabilities, and reduced dependence on any single cloud AI platform.\n\nSimilar approaches are emerging elsewhere in AI infrastructure. Engineers at [Padiso](https://www.padiso.co/blog/multi-cloud-claude-failover-bedrock-vertex-direct-api/) have described routing Anthropic Claude traffic across Bedrock, Vertex AI, and Anthropic’s direct API to improve resiliency and control provider dependencies. [BentoML](https://bentoml.com/llm/infrastructure-and-operations/multi-cloud-and-cross-region-inference) likewise advocates multi-cloud and cross-region inference strategies that route traffic based on latency and availability. These examples reflect many of the same concerns around portability, failover, and operational flexibility highlighted by Slack.\n\nFor platform teams building AI-powered applications, Slack’s experience highlights how abstraction layers can help separate application logic from underlying model providers while balancing resilience, performance, and access to rapidly changing model ecosystems.", "url": "https://wpnews.pro/news/slack-outlines-four-phase-journey-to-a-multi-cloud-ai-serving-platform", "canonical_source": "https://www.infoq.com/news/2026/06/slack-multicloud/?utm_campaign=infoq_content&utm_source=infoq&utm_medium=feed&utm_term=global", "published_at": "2026-06-25 07:02:00+00:00", "updated_at": "2026-06-25 07:23:51.685793+00:00", "lang": "en", "topics": ["ai-infrastructure", "ai-tools", "ai-products", "ai-research", "ai-policy"], "entities": ["Slack", "Amazon SageMaker", "AWS Bedrock", "Google Cloud Vertex AI", "Anthropic", "BentoML", "Padiso", "A100"], "alternates": {"html": "https://wpnews.pro/news/slack-outlines-four-phase-journey-to-a-multi-cloud-ai-serving-platform", "markdown": "https://wpnews.pro/news/slack-outlines-four-phase-journey-to-a-multi-cloud-ai-serving-platform.md", "text": 
"https://wpnews.pro/news/slack-outlines-four-phase-journey-to-a-multi-cloud-ai-serving-platform.txt", "jsonld": "https://wpnews.pro/news/slack-outlines-four-phase-journey-to-a-multi-cloud-ai-serving-platform.jsonld"}}