Designing TikTok from Scratch — A System Design Deep Dive

A system design deep dive into TikTok's architecture reveals a platform handling over 1 billion monthly active users, 34 million daily video uploads, and 26 Tbps of peak egress bandwidth, with a target P99 feed latency of 167ms. The platform's infrastructure is divided into four domains—ingestion, serving, recommendation, and social graph—with approximately 70% of video traffic served directly from edge nodes across 150+ cities using Anycast routing. Key technical components include chunked multi-part uploads with SHA-256 deduplication, a two-tower neural network for recommendation, and asynchronous communication via Kafka for non-critical paths.

Who is this for? Mid-to-senior engineers preparing for system design interviews, or anyone curious how a short-video platform at billion-user scale actually works under the hood.

| Metric | Number |
|---|---|
| Monthly active users | 1B+ |
| Videos uploaded per day | ~34 million |
| Target feed latency P99 | ~167ms |
| Peak egress bandwidth | ~26 Tbps |

Before drawing a single box, nail down what the system must do, and what it doesn't need to do perfectly on day one.

Functional requirements:

Non-functional requirements:

The system splits into four major domains: ingestion (upload pipeline), serving (read path), recommendation (ML feed), and social graph.

    ┌───────────────────────────────────────────────────┐
    │               Mobile / Web Clients                │
    └──────────────────────┬────────────────────────────┘
                           │
    ┌──────────────────────▼────────────────────────────┐
    │              Global CDN / Edge PoPs               │
    │    Video delivery, static assets, geo-routing     │
    └──────────────────────┬────────────────────────────┘
                           │
    ┌──────────────────────▼────────────────────────────┐
    │            API Gateway + Load Balancer            │
    │   Auth, rate limiting, routing, TLS termination   │
    └─────┬────────────────┬───────────┬────────────────┘
          │                │           │
    ┌─────▼──────┐  ┌──────▼───────┐  ┌▼────────────────┐
    │   Upload   │  │ Feed Service │  │  Social Graph   │
    │  Service   │  │ pre-compute  │  │     Service     │
    │            │  │ + real-time  │  │                 │
    └─────┬──────┘  └──────┬───────┘  └┬────────────────┘
          │                │           │
    ┌─────▼──────┐  ┌──────▼───────┐  ┌▼────────────────┐
    │ Transcode  │  │Recommendation│  │  Notification   │
    │  Workers   │  │    Engine    │  │     Service     │
    └─────┬──────┘  └──────┬───────┘  └┬────────────────┘
          │                │           │
    ┌─────▼──────┐  ┌──────▼───────┐  ┌▼────────────────┐
    │   Object   │  │ Feature Store│  │ Search Service  │
    │  Storage   │  │ Redis+Cassie │  │  Elasticsearch  │
    └─────┬──────┘  └──────┬───────┘  └┬────────────────┘
          │                │           │
    ┌─────▼────────────────▼───────────▼────────────────┐
    │             Async Message Bus (Kafka)             │
    └────────┬─────────────────┬───────────────┬────────┘
             │                 │               │
      ┌──────▼─────┐    ┌──────▼────┐   ┌──────▼──────┐
      │MySQL/Vitess│    │   Redis   │   │  Cassandra  │
      │ user data, │    │ counters, │   │ timelines,  │
      │  metadata  │    │   cache   │   │   history   │
      └────────────┘    └───────────┘   └─────────────┘

All services communicate asynchronously via Kafka for non-critical paths.

The CDN is TikTok's secret weapon. ~70% of video traffic is served directly from edge nodes in 150+ cities, bypassing origin entirely. It uses Anycast routing to send users to the nearest PoP. Manifest files (playlist URLs) are invalidated within seconds of a video going viral.

Chunked multi-part upload (5 MB chunks) tolerates flaky mobile connections. Workers dedup via SHA-256 before writing. Transcode jobs run on GPU fleets; outputs include 360p, 720p, 1080p, and HEVC variants. Thumbnails and stills are extracted for ML feature generation.

The recommendation model is a two-tower neural network: one tower embeds the user, the other embeds the video, and the dot product of the two embeddings gives a relevance score. The model runs online for top-k retrieval, then a ranker applies real-time signals (trending, friend activity) before the feed is assembled.

This is where TikTok differs from Twitter/Instagram:

The feed service merges both lists, injects ML-recommended videos, and applies diversity rules to avoid repetition. The final feed is cached in Redis with a 300 s TTL.

All write events (upload complete, like, follow, watch-complete) are published to Kafka topics. Downstream consumers include:

Topics are partitioned by user id for ordered processing per user. This decouples services and allows independent scaling.

| Store | Use Case | Why |
|---|---|---|
| MySQL / Vitess | User profiles, video metadata, social graph | ACID, sharded by user id |
| Redis Cluster | Counters (likes, views), session tokens, feed cache | Sub-millisecond reads |
| Cassandra | Watch history, timelines, notification logs | Wide-row reads, high write throughput |

The classic dilemma in social feed systems. TikTok uses a hybrid approach (the "celebrity problem" split):

- Fan-out on write for regular users: each new post is pushed into followers' precomputed timelines at upload time.
- Fan-out on read for accounts with millions of followers: their posts are pulled and merged when a follower requests the feed, which avoids millions of timeline writes per upload.

Like/view counts can lag by a few seconds; nobody notices.
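That tolerance for lag is what makes cheap counter writes possible. The article does not say how the counters are actually implemented; the following is only a minimal sketch of one common pattern, write-behind batching, that is consistent with "can lag by a few seconds". The class name `LaggyCounter`, the `flush_interval_s` parameter, the key format, and the plain dict standing in for a shared store such as Redis are all illustrative assumptions.

```python
import time
from collections import Counter


class LaggyCounter:
    """Buffers increments in memory and applies them to the store in batches.

    Hypothetical sketch: `store` is a plain dict standing in for a shared
    counter store such as Redis.
    """

    def __init__(self, store, flush_interval_s=2.0, clock=time.monotonic):
        self.store = store
        self.flush_interval_s = flush_interval_s
        self.clock = clock
        self.pending = Counter()
        self.last_flush = clock()

    def incr(self, key, amount=1):
        # Cheap local write; the shared store is not touched yet.
        self.pending[key] += amount
        if self.clock() - self.last_flush >= self.flush_interval_s:
            self.flush()

    def flush(self):
        # One batched write per key instead of one write per event.
        for key, delta in self.pending.items():
            self.store[key] = self.store.get(key, 0) + delta
        self.pending.clear()
        self.last_flush = self.clock()

    def read(self, key):
        # Readers only see flushed values, so counts trail real activity.
        # A real system would also flush on a timer, not only on writes.
        return self.store.get(key, 0)


# Usage with a fake clock so the lag is visible without sleeping.
now = [0.0]
store = {}
likes = LaggyCounter(store, flush_interval_s=2.0, clock=lambda: now[0])

likes.incr("video:42:likes")
likes.incr("video:42:likes")
stale = likes.read("video:42:likes")   # 0: both likes are still buffered

now[0] = 2.5                           # 2.5 s later
likes.incr("video:42:likes")           # interval elapsed, triggers a flush
fresh = likes.read("video:42:likes")   # 3: all buffered likes are visible
```

The trade is explicit: the store absorbs one write per key per interval instead of one per like, and readers may see a value that is a couple of seconds old. Nothing here would be acceptable for data where a stale read is a correctness bug.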
Unlike like and view counts, user authentication tokens and billing events require strong consistency. TikTok segments these into separate storage tiers with different consistency guarantees, accepting complexity for throughput on hot paths.

Likes and comments use WebSocket push for real-time delivery. Less critical notifications (weekly summaries, suggested follows) use a pull-based batch pipeline that runs every few hours; there is no need to maintain a persistent connection for a weekly digest email.

Assumptions: 1B MAU, 500M DAU, the average user watches 45 min/day, and the average video is 30 s, about 8 MB at 720p. 34M uploads/day ≈ 400 uploads/sec on average.

- Storage: 34M uploads/day × 8 MB × 3 resolutions ≈ 816 TB/day of new video. With 3x replication over 5 years, that is ≈ 4.5 EB of total raw storage.
- Feed reads: 500M DAU × 20 feed refreshes/day / 86,400 s ≈ 115,000 feed reads/sec. With a 95% Redis cache hit rate, the recommendation backend sees ≈ 5,750 rps.
- Bandwidth: 500M users × 45 min × 60 s/min × 2 Mbps (720p) / 86,400 s ≈ 31 Tbps of egress averaged over the day, the same order as the ~26 Tbps quoted earlier.

This is why TikTok operates its own backbone in many regions and has deep-peering agreements with major ISPs.

Most social platforms optimize for social graph traversal: "show me what people I follow posted". TikTok inverted this: the algorithm is the product. The architecture is built around a recommendation pipeline that must be both blazing-fast and constantly learning from watch signals.

Three things stand out:

- Aggressive edge caching: they push video delivery as close to the user as physically possible. The CDN is not a performance optimization; it is the entire delivery strategy.
- Real-time ML feedback loops: a video's trajectory is decided in the first 30 minutes based on completion rate signals. A new creator can go viral without any followers.
- Microservice isolation: upload, serving, recommendation, and social graph are independently deployable and scalable, preventing any single bottleneck from cascading.

If you're using this for a system design interview:

Found this useful? Follow for more system design deep dives. Next up: designing YouTube's upload pipeline at scale.
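For anyone who wants to check the back-of-envelope estimates above, the arithmetic fits in a few lines of Python. This is only a sketch: the constant and variable names are made up for illustration, the inputs are the article's own stated assumptions, and decimal units are assumed (1 TB = 10^6 MB, 1 EB = 10^6 TB). Evaluated literally, the bandwidth product gives roughly 31 Tbps averaged over the day.

```python
# Inputs taken from the article's stated assumptions.
DAU = 500_000_000
UPLOADS_PER_DAY = 34_000_000
VIDEO_MB = 8                     # ~30 s clip at 720p
RESOLUTIONS = 3
REPLICATION = 3
YEARS = 5
FEED_REFRESHES_PER_DAY = 20
CACHE_HIT_RATE = 0.95
WATCH_MIN_PER_DAY = 45
BITRATE_MBPS = 2                 # 720p stream
SECONDS_PER_DAY = 86_400

# Uploads: a daily average, not a peak.
uploads_per_sec = UPLOADS_PER_DAY / SECONDS_PER_DAY

# Storage, in decimal units (1 TB = 1e6 MB, 1 EB = 1e6 TB).
new_video_tb_per_day = UPLOADS_PER_DAY * VIDEO_MB * RESOLUTIONS / 1e6
raw_storage_eb = new_video_tb_per_day * REPLICATION * 365 * YEARS / 1e6

# Feed reads, and the share that misses the cache and reaches the backend.
feed_reads_per_sec = DAU * FEED_REFRESHES_PER_DAY / SECONDS_PER_DAY
backend_rps = feed_reads_per_sec * (1 - CACHE_HIT_RATE)

# Egress: total bits streamed per day, spread evenly over the day.
bits_per_day = DAU * WATCH_MIN_PER_DAY * 60 * BITRATE_MBPS * 1e6
avg_egress_tbps = bits_per_day / SECONDS_PER_DAY / 1e12

print(f"uploads/sec (avg):      {uploads_per_sec:,.0f}")
print(f"new video per day:      {new_video_tb_per_day:,.0f} TB")
print(f"raw storage, {YEARS} years:   {raw_storage_eb:.2f} EB")
print(f"feed reads/sec:         {feed_reads_per_sec:,.0f}")
print(f"backend rps after cache:{backend_rps:,.0f}")
print(f"average egress:         {avg_egress_tbps:.2f} Tbps")
```

The cache-miss figure comes out near 5,790 rps rather than 5,750 because the article rounds 115,741 reads/sec down to 115,000 before applying the 5% miss rate. All of these are daily averages; real peaks depend on a peak-to-average ratio that the stated assumptions do not include.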