Who is this for? Mid-to-senior engineers preparing for system design interviews, or anyone curious how a short-video platform at billion-user scale actually works under the hood.
| Metric | Number |
|---|---|
| Monthly active users | 1B+ |
| Videos uploaded per day | ~34 million |
| Target feed latency (P99) | ~167ms |
| Peak egress bandwidth | ~26 Tbps |
Before drawing a single box, nail down what the system must do — and what it doesn't need to do perfectly on day one.
Functional requirements:
Non-functional requirements:
The system splits into four major domains: ingestion (upload pipeline), serving (read path), recommendation (ML feed), and social graph.
┌─────────────────────────────────────────────────┐
│              Mobile / Web Clients               │
└─────────────────────┬───────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────┐
│             Global CDN / Edge PoPs              │
│   Video delivery, static assets, geo-routing    │
└─────────────────────┬───────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────┐
│           API Gateway + Load Balancer           │
│  Auth, rate limiting, routing, TLS termination  │
└────────┬────────────┴────────────────┬──────────┘
         │                             │
   ┌─────▼──────┐  ┌──────────────┐   ┌▼────────────────┐
   │   Upload   │  │ Feed Service │   │  Social Graph   │
   │  Service   │  │(pre-compute  │   │     Service     │
   │            │  │ + real-time) │   │                 │
   └─────┬──────┘  └──────┬───────┘   └┬────────────────┘
         │                │            │
   ┌─────▼──────┐  ┌──────▼───────┐   ┌▼────────────────┐
   │ Transcode  │  │Recommendation│   │  Notification   │
   │  Workers   │  │    Engine    │   │     Service     │
   └─────┬──────┘  └──────┬───────┘   └┬────────────────┘
         │                │            │
   ┌─────▼──────┐  ┌──────▼───────┐   ┌▼────────────────┐
   │   Object   │  │ Feature Store│   │ Search Service  │
   │  Storage   │  │(Redis+Cassie)│   │ (Elasticsearch) │
   └─────┬──────┘  └──────┬───────┘   └┬────────────────┘
         │                │            │
┌────────▼────────────────▼────────────▼──────────────┐
│              Async Message Bus (Kafka)              │
└──────────┬──────────────┬──────────────┬────────────┘
           │              │              │
    ┌──────▼─────┐ ┌──────▼────┐  ┌──────▼──────┐
    │MySQL/Vitess│ │   Redis   │  │  Cassandra  │
    │(user data, │ │ (counters,│  │ (timelines, │
    │ metadata)  │ │  cache)   │  │  history)   │
    └────────────┘ └───────────┘  └─────────────┘
On non-critical paths, services communicate asynchronously via Kafka.
TikTok's secret weapon. ~70% of video traffic is served directly from edge nodes in 150+ cities, bypassing origin entirely. It uses Anycast routing to send users to the nearest PoP. Manifest files (playlist URLs) are invalidated within seconds of a video going viral.
Chunked multi-part upload (5 MB chunks) tolerates flaky mobile connections. Workers dedup via SHA-256 before writing. Transcode jobs run on GPU fleets; outputs include 360p, 720p, 1080p, and HEVC variants. Thumbnails and stills are extracted for ML feature generation.
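The chunk-then-dedup step can be sketched roughly as follows. This is a minimal illustration, not TikTok's pipeline: `split_into_chunks`, `UploadStore`, and the in-memory dictionary standing in for object storage are all hypothetical names, and the only details taken from the text are the 5 MB chunk size and the use of SHA-256.

```python
import hashlib

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB, the chunk size quoted in the text


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split an upload into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class UploadStore:
    """Hypothetical in-memory stand-in for object storage, keyed by content hash."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def assemble_and_store(self, chunks: list[bytes]) -> tuple[str, bool]:
        """Hash the reassembled upload and write it only if the digest is new.

        Returns (digest, was_written).
        """
        hasher = hashlib.sha256()
        for chunk in chunks:
            hasher.update(chunk)  # incremental hashing, one chunk at a time
        digest = hasher.hexdigest()
        if digest in self.blobs:
            return digest, False  # identical content already stored: skip the write
        self.blobs[digest] = b"".join(chunks)
        return digest, True


if __name__ == "__main__":
    video = bytes(12 * 1024 * 1024)  # 12 MB of zero bytes as a fake video
    chunks = split_into_chunks(video)
    store = UploadStore()
    first = store.assemble_and_store(chunks)
    second = store.assemble_and_store(chunks)
    print(len(chunks), first[1], second[1])
```

Because each chunk is independent, a client on a flaky connection only has to resend the chunk that failed, and hashing incrementally means the worker never needs a second pass over the assembled file.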
A two-tower neural network:
The dot product of the user and video embeddings gives a relevance score. The model runs online for top-k retrieval, then a ranker applies real-time signals (trending, friend activity) before the feed is assembled.
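A toy version of the retrieve-then-rank flow, under stated assumptions: the embeddings below are made-up three-dimensional vectors, the boosts are simple additive terms, and `retrieve_top_k` and `rerank` are hypothetical names. A production system would typically use an approximate nearest-neighbour index rather than scoring every video.

```python
import heapq

# Hypothetical precomputed embeddings; in a two-tower setup these would be
# the outputs of the video tower, with the user vector coming from the user tower.
VIDEO_EMBEDDINGS = {
    "v1": [0.9, 0.1, 0.0],
    "v2": [0.1, 0.8, 0.1],
    "v3": [0.7, 0.2, 0.1],
    "v4": [0.0, 0.1, 0.9],
}


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(user_vec, video_embeddings, k):
    """Stage 1: score every candidate by dot product and keep the k best.

    Returns a list of (score, video_id) tuples, best first.
    """
    scored = ((dot(user_vec, vec), vid) for vid, vec in video_embeddings.items())
    return heapq.nlargest(k, scored)


def rerank(candidates, trending_boost, friend_boost):
    """Stage 2: add real-time boosts to the retrieval score, then sort."""
    adjusted = [
        (score + trending_boost.get(vid, 0.0) + friend_boost.get(vid, 0.0), vid)
        for score, vid in candidates
    ]
    return [vid for _, vid in sorted(adjusted, reverse=True)]


if __name__ == "__main__":
    user_vec = [1.0, 0.2, 0.0]
    top = retrieve_top_k(user_vec, VIDEO_EMBEDDINGS, k=3)
    print(rerank(top, trending_boost={"v3": 0.3}, friend_boost={"v2": 0.1}))
```

Splitting the work this way keeps the expensive model offline-friendly (video embeddings can be precomputed) while the cheap second stage reacts to signals that change minute by minute.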
This is where TikTok differs from Twitter/Instagram:
The feed service merges both lists, injects ML-recommended videos, and applies diversity rules to avoid repetition. The final feed is cached in Redis with a 300s TTL.
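The merge, diversity, and cache steps might look something like this sketch. The two input lists are called `list_a` and `list_b` because their exact contents are not spelled out here, the per-creator cap is an assumed example of a diversity rule, and `TTLCache` is a hypothetical in-memory stand-in for a Redis key with a 300-second expiry.

```python
import time
from itertools import zip_longest

FEED_TTL_SECONDS = 300  # the TTL quoted in the text


def merge_feed(list_a, list_b, recommended, max_per_creator=2):
    """Interleave three candidate lists, dropping duplicate videos and capping
    how many videos one creator contributes (an assumed diversity rule).

    Each item is a (video_id, creator_id) tuple; returns a list of video ids.
    """
    feed, seen, per_creator = [], set(), {}
    for group in zip_longest(list_a, list_b, recommended):
        for item in group:
            if item is None:  # one of the lists ran out
                continue
            video_id, creator_id = item
            if video_id in seen or per_creator.get(creator_id, 0) >= max_per_creator:
                continue
            seen.add(video_id)
            per_creator[creator_id] = per_creator.get(creator_id, 0) + 1
            feed.append(video_id)
    return feed


class TTLCache:
    """In-memory stand-in for a Redis key with an expiry (hypothetical helper)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}

    def set(self, key, value):
        self.entries[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self.entries[key]  # expired: behave like a cache miss
            return None
        return value


if __name__ == "__main__":
    cache = TTLCache(FEED_TTL_SECONDS)
    feed = merge_feed(
        [("v1", "c1"), ("v2", "c1")],
        [("v3", "c2"), ("v1", "c1")],
        [("v4", "c1"), ("v5", "c3")],
    )
    cache.set("feed:user42", feed)
    print(cache.get("feed:user42"))
```

A short TTL is a trade-off: it bounds how stale a cached feed can get while still absorbing most repeat reads.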
All write events (upload complete, like, follow, watch-complete) are published to Kafka topics. Downstream consumers include:
Topics are partitioned by user_id for ordered processing per user. This decouples services and allows independent scaling.
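Why keying by user_id preserves per-user order can be shown with a small simulation. This does not use a Kafka client: `partition_for` and `route` are hypothetical helpers, the partition count is arbitrary, and `zlib.crc32` is only a stand-in for the murmur2 hash that Kafka's Java client applies to keyed records.

```python
import zlib

NUM_PARTITIONS = 12  # hypothetical partition count for an events topic


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a user_id to a partition with a stable hash.

    crc32 is used only because it is deterministic across processes;
    it is a stand-in, not the hash a real Kafka producer uses.
    """
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def route(events, num_partitions=NUM_PARTITIONS):
    """Append each (user_id, event_type) event to its partition's log, in arrival order."""
    partitions = {p: [] for p in range(num_partitions)}
    for user_id, event_type in events:
        partitions[partition_for(user_id, num_partitions)].append((user_id, event_type))
    return partitions


if __name__ == "__main__":
    events = [("u1", "follow"), ("u2", "like"), ("u1", "like"), ("u1", "watch-complete")]
    logs = route(events, num_partitions=4)
    print(logs[partition_for("u1", 4)])
```

Since every event for a given user lands in the same partition, and a partition is an ordered log, a consumer sees that user's events in the order they were produced. Events for different users carry no ordering guarantee relative to each other.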
| Store | Use Case | Why |
|---|---|---|
| MySQL / Vitess | User profiles, video metadata, social graph | ACID, sharded by user_id |
| Redis Cluster | Counters (likes, views), session tokens, feed cache | Sub-millisecond reads |
| Cassandra | Watch history, timelines, notification logs | Wide-row reads, high write throughput |
The classic dilemma in social feed systems. TikTok uses a hybrid approach (the "celebrity problem" split):
Fan-out on write (for regular users):
Fan-out on read (for accounts with millions of followers):
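A minimal sketch of the hybrid, assuming the conventional split: posts from accounts below a follower threshold are pushed into each follower's inbox at write time, while posts from very large accounts are left in place and pulled when a follower reads their timeline. The `FanoutFeed` class, its in-memory dictionaries, and the threshold value are all hypothetical.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 1_000_000  # hypothetical follower-count cutoff


class FanoutFeed:
    def __init__(self, threshold=CELEBRITY_THRESHOLD):
        self.threshold = threshold
        self.followers = defaultdict(set)  # creator -> follower ids
        self.following = defaultdict(set)  # follower -> creator ids
        self.inbox = defaultdict(list)     # follower -> pushed (ts, video_id)
        self.posts = defaultdict(list)     # creator -> own (ts, video_id)

    def follow(self, follower, creator):
        self.followers[creator].add(follower)
        self.following[follower].add(creator)

    def is_celebrity(self, creator):
        return len(self.followers[creator]) >= self.threshold

    def publish(self, creator, video_id, ts):
        self.posts[creator].append((ts, video_id))
        if not self.is_celebrity(creator):
            # Fan-out on write: push into every follower's inbox now.
            for follower in self.followers[creator]:
                self.inbox[follower].append((ts, video_id))

    def timeline(self, user, limit=10):
        # A set removes duplicates if a creator crossed the threshold
        # after some of their posts had already been pushed.
        items = set(self.inbox[user])
        # Fan-out on read: pull posts from followed celebrity accounts.
        for creator in self.following[user]:
            if self.is_celebrity(creator):
                items.update(self.posts[creator])
        return [vid for _, vid in sorted(items, reverse=True)[:limit]]


if __name__ == "__main__":
    feed = FanoutFeed(threshold=2)  # tiny threshold so the demo stays small
    feed.follow("a", "star")
    feed.follow("b", "star")
    feed.follow("a", "friend")
    feed.publish("friend", "f1", ts=1)
    feed.publish("star", "s1", ts=2)
    print(feed.timeline("a"))
```

The point of the split is cost: pushing one post to millions of inboxes is a write storm, whereas pulling from a handful of large accounts at read time is cheap and cacheable.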
Like/view counts can lag by a few seconds — nobody notices. But user authentication tokens and billing events require strong consistency. TikTok segments these into separate storage tiers with different consistency guarantees, accepting complexity for throughput on hot paths.
Likes and comments use WebSocket push for real-time delivery. Less critical notifications (weekly summaries, suggested follows) use a pull-based batch pipeline that runs every few hours — no need to maintain a persistent connection for a weekly digest email.
Assumptions: 1B MAU, 500M DAU, avg user watches 45 min/day, avg video = 30 sec ~= 8 MB (720p). 34M uploads/day ~= 400 uploads/sec on average.
Storage:
34M uploads/day x 8 MB x 3 resolutions = ~816 TB/day of new video
With 3x replication over 5 years: 816 TB/day x 3 x 365 x 5 = ~4.5 EB total raw storage
Feed reads:
500M DAU x 20 feed refreshes/day / 86,400 sec = ~115,000 feed reads/sec
With 95% Redis cache hit rate -> recommendation backend sees ~5,750 rps
Bandwidth:
500M users x 45 min x 60 s/min x 2 Mbps (720p) / 86,400 s = ~31 Tbps averaged over the day, the same order of magnitude as the ~26 Tbps headline figure
This is why TikTok operates its own backbone in many regions and has deep-peering agreements with major ISPs.
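The back-of-the-envelope numbers can be reproduced with a few lines of arithmetic. The inputs are the assumptions stated in this section; the variable names are made up for the sketch, decimal units are assumed (1 TB = 10^6 MB), and results come from unrounded inputs, so they can differ slightly from the rounded figures in the prose. The bandwidth line evaluates the formula literally and yields a daily average rather than a peak.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the assumptions in this section.
uploads_per_day = 34_000_000
video_size_mb = 8
resolutions = 3
replication = 3
years = 5
dau = 500_000_000
refreshes_per_day = 20
cache_hit_rate = 0.95
watch_minutes_per_day = 45
bitrate_mbps = 2

# Uploads: a daily average, not a peak.
avg_uploads_per_sec = uploads_per_day / SECONDS_PER_DAY

# Storage: MB -> TB -> EB, using decimal units.
new_video_tb_per_day = uploads_per_day * video_size_mb * resolutions / 1_000_000
raw_storage_eb = new_video_tb_per_day * replication * 365 * years / 1_000_000

# Feed reads, and what reaches the recommendation backend after the cache.
feed_reads_per_sec = dau * refreshes_per_day / SECONDS_PER_DAY
backend_rps = feed_reads_per_sec * (1 - cache_hit_rate)

# Egress: total megabits streamed per day, spread evenly over the day.
avg_egress_tbps = (
    dau * watch_minutes_per_day * 60 * bitrate_mbps / SECONDS_PER_DAY / 1_000_000
)

if __name__ == "__main__":
    print(f"uploads/sec (avg):   {avg_uploads_per_sec:,.0f}")
    print(f"new video TB/day:    {new_video_tb_per_day:,.0f}")
    print(f"raw storage EB (5y): {raw_storage_eb:.2f}")
    print(f"feed reads/sec:      {feed_reads_per_sec:,.0f}")
    print(f"backend rps:         {backend_rps:,.0f}")
    print(f"egress Tbps (avg):   {avg_egress_tbps:.2f}")
```

Keeping the inputs as named variables makes it easy to test how sensitive each result is, for example by changing the cache hit rate or the average bitrate.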
Most social platforms optimize for social graph traversal — show me what people I follow posted. TikTok inverted this: the algorithm is the product. The architecture is built around a recommendation pipeline that must be both blazing-fast and constantly learning from watch signals.
Three things stand out:
Aggressive edge caching — they push video delivery as close to the user as physically possible. The CDN is not a performance optimization; it is the entire delivery strategy.
Real-time ML feedback loops — a video's trajectory is decided in the first 30 minutes based on completion rate signals. A new creator can go viral without any followers.
Microservice isolation — upload, serving, recommendation, and social graph are independently deployable and scalable, preventing any single bottleneck from cascading.
If you're using this for a system design interview: