[ARTICLE · art-13909] src=dev.to pub= topic=machine-learning verified=true sentiment=· neutral

Designing TikTok from Scratch — A System Design Deep Dive

A system design deep dive into TikTok's architecture reveals a platform handling over 1 billion monthly active users, 34 million daily video uploads, and 26 Tbps of peak egress bandwidth, with a target P99 feed latency of 167ms. The platform's infrastructure is divided into four domains—ingestion, serving, recommendation, and social graph—with approximately 70% of video traffic served directly from edge nodes across 150+ cities using Anycast routing. Key technical components include chunked multi-part uploads with SHA-256 deduplication, a two-tower neural network for recommendation, and asynchronous communication via Kafka for non-critical paths.

5 min read · published May 25, 2026

Who is this for? Mid-to-senior engineers preparing for system design interviews, or anyone curious how a short-video platform at billion-user scale actually works under the hood.

Metric                    | Number
Monthly active users      | 1B+
Videos uploaded per day   | ~34 million
Target feed latency (P99) | ~167ms
Peak egress bandwidth     | ~26 Tbps

Before drawing a single box, nail down what the system must do — and what it doesn't need to do perfectly on day one.

Functional requirements:

Non-functional requirements:

The system splits into four major domains: ingestion (upload pipeline), serving (read path), recommendation (ML feed), and social graph.

┌─────────────────────────────────────────────────┐
│              Mobile / Web Clients                │
└─────────────────────┬───────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────┐
│         Global CDN / Edge PoPs                   │
│   Video delivery, static assets, geo-routing    │
└─────────────────────┬───────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────┐
│       API Gateway + Load Balancer                │
│   Auth, rate limiting, routing, TLS termination │
└────────┬────────────┴────────────────┬──────────┘
         │                             │
   ┌─────▼──────┐  ┌──────────────┐  ┌▼────────────────┐
   │  Upload    │  │ Feed Service │  │  Social Graph   │
   │  Service   │  │(pre-compute  │  │    Service      │
   │            │  │ + real-time) │  │                 │
   └─────┬──────┘  └──────┬───────┘  └┬────────────────┘
         │                │            │
   ┌─────▼──────┐  ┌──────▼───────┐  ┌▼────────────────┐
   │ Transcode  │  │Recommendation│  │  Notification   │
   │  Workers   │  │   Engine     │  │    Service      │
   └─────┬──────┘  └──────┬───────┘  └┬────────────────┘
         │                │            │
   ┌─────▼──────┐  ┌──────▼───────┐  ┌▼────────────────┐
   │  Object    │  │ Feature Store│  │  Search Service │
   │  Storage   │  │(Redis+Cassie)│  │ (Elasticsearch) │
   └─────┬──────┘  └──────┬───────┘  └┬────────────────┘
         │                │            │
┌────────▼────────────────▼────────────▼──────────────┐
│              Async Message Bus (Kafka)               │
└──────────┬──────────────┬──────────────┬────────────┘
           │              │              │
    ┌──────▼─────┐ ┌──────▼────┐ ┌──────▼──────┐
    │MySQL/Vitess│ │   Redis   │ │  Cassandra  │
    │(user data, │ │ (counters,│ │ (timelines, │
    │ metadata)  │ │  cache)   │ │  history)   │
    └────────────┘ └───────────┘ └─────────────┘

All services communicate asynchronously via Kafka for non-critical paths.

The CDN is TikTok's secret weapon. Roughly 70% of video traffic is served directly from edge nodes in 150+ cities, bypassing the origin entirely. Anycast routing sends each user to the nearest PoP. Manifest files (playlist URLs) are invalidated within seconds of a video going viral.

Chunked multi-part upload (5 MB chunks) tolerates flaky mobile connections. Workers dedup via SHA-256 before writing. Transcode jobs run on GPU fleets; outputs include 360p, 720p, 1080p, and HEVC variants. Thumbnails and stills are extracted for ML feature generation.
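The upload path above can be sketched roughly as follows. This is a minimal illustration, not TikTok's actual pipeline: `split_into_chunks` and `DedupStore` are hypothetical names, and an in-memory dict stands in for object storage. Only the 5 MB chunk size and the SHA-256 content hash come from the text.

```python
import hashlib

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB chunks, as described in the text


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split an upload into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class DedupStore:
    """Hypothetical in-memory stand-in for object storage keyed by content hash."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, chunks: list[bytes]) -> tuple[str, bool]:
        """Hash the chunks in order and store the content only if it is unseen.

        Returns (sha256_hex, was_new).
        """
        digest = hashlib.sha256()
        for chunk in chunks:
            digest.update(chunk)  # incremental hashing, one chunk at a time
        key = digest.hexdigest()
        if key in self._blobs:
            return key, False  # duplicate content: skip the write
        self._blobs[key] = b"".join(chunks)
        return key, True
```

Because each chunk is independent, a client on a flaky connection only needs to resend the chunks that failed rather than the whole file.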

A two-tower neural network:

Dot product gives a relevance score. The model runs online for top-k retrieval, then a ranker applies real-time signals (trending, friend activity) before the feed is assembled.
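A minimal sketch of the retrieve-then-rank step, assuming the user and video embeddings have already been produced by their respective towers (the towers themselves are not shown). The function names, the toy vectors, and the additive boost used for real-time signals are illustrative assumptions, not details from the source.

```python
import heapq


def dot(u: list[float], v: list[float]) -> float:
    """Relevance score: dot product of a user embedding and a video embedding."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(
    user_vec: list[float],
    video_vecs: dict[str, list[float]],
    k: int,
) -> list[tuple[str, float]]:
    """Score every candidate video and keep the k highest-scoring ones."""
    scored = ((vid, dot(user_vec, vec)) for vid, vec in video_vecs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


def rerank(
    candidates: list[tuple[str, float]],
    boosts: dict[str, float],
) -> list[tuple[str, float]]:
    """Apply real-time signals as an additive boost (an assumed scheme), then re-sort."""
    adjusted = [(vid, score + boosts.get(vid, 0.0)) for vid, score in candidates]
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)
```

At production scale the exhaustive scan in `retrieve_top_k` would typically be replaced by an approximate nearest-neighbor index; the brute-force version is kept here only to make the scoring explicit.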

This is where TikTok differs from Twitter/Instagram:

The feed service merges both lists, injects ML-recommended videos, and applies diversity rules to avoid repetition. The final feed is cached in Redis with a 300s TTL.
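The merge-and-cache step might look something like the sketch below. The round-robin merge, the per-creator cap used as the diversity rule, and the `TTLCache` class (an in-memory stand-in for a Redis key with an expiry) are all assumptions made for illustration; only the 300-second TTL comes from the text.

```python
import time

FEED_TTL_SECONDS = 300  # TTL quoted in the text


def assemble_feed(
    lists: list[list[tuple[str, str]]],
    max_per_creator: int = 2,
) -> list[str]:
    """Round-robin merge of (video_id, creator_id) lists.

    Drops duplicate videos and caps how many videos one creator contributes
    (a hypothetical diversity rule).
    """
    seen: set[str] = set()
    per_creator: dict[str, int] = {}
    feed: list[str] = []
    longest = max((len(lst) for lst in lists), default=0)
    for i in range(longest):
        for lst in lists:
            if i >= len(lst):
                continue
            video_id, creator_id = lst[i]
            if video_id in seen or per_creator.get(creator_id, 0) >= max_per_creator:
                continue
            seen.add(video_id)
            per_creator[creator_id] = per_creator.get(creator_id, 0) + 1
            feed.append(video_id)
    return feed


class TTLCache:
    """In-memory stand-in for a cache whose entries expire after a fixed TTL."""

    def __init__(self, ttl: float = FEED_TTL_SECONDS, clock=time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[str, tuple[float, list[str]]] = {}

    def set(self, key: str, value: list[str]) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: str):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: force the feed to be rebuilt
            return None
        return value
```

A short TTL like this trades a little staleness for a large reduction in load on the recommendation backend.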

All write events (upload complete, like, follow, watch-complete) are published to Kafka topics. Downstream consumers include:

Topics are partitioned by user_id for ordered processing per user. This decouples services and allows independent scaling.
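To see why keying by user_id gives per-user ordering, here is a small simulation of key-based partitioning. It does not use a Kafka client: CRC32 is a stand-in for the partitioner's hash (Kafka's default partitioner for keyed records uses murmur2), and `NUM_PARTITIONS`, `partition_for`, and `route` are hypothetical names.

```python
import zlib

NUM_PARTITIONS = 12  # hypothetical partition count


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a user_id to a partition with a stable hash.

    Python's built-in hash() is salted per process for strings, so a
    deterministic hash (CRC32) is used instead.
    """
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def route(events: list[dict], num_partitions: int = NUM_PARTITIONS) -> dict[int, list[dict]]:
    """Append each event to its partition, preserving arrival order within it."""
    partitions: dict[int, list[dict]] = {}
    for event in events:
        index = partition_for(event["user_id"], num_partitions)
        partitions.setdefault(index, []).append(event)
    return partitions
```

Since every event for a given user lands in the same partition, a single consumer sees that user's events in the order they were produced; ordering across different users is not guaranteed.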

Store          | Use Case                                            | Why
MySQL / Vitess | User profiles, video metadata, social graph         | ACID, sharded by user_id
Redis Cluster  | Counters (likes, views), session tokens, feed cache | Sub-millisecond reads
Cassandra      | Watch history, timelines, notification logs         | Wide-row reads, high write throughput

The classic dilemma in social feed systems. TikTok uses a hybrid approach (the "celebrity problem" split):

Fan-out on write (for regular users):

Fan-out on read (for accounts with millions of followers):
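A toy sketch of a hybrid fan-out, under the commonly used convention that posts from ordinary accounts are pushed into follower inboxes at write time, while posts from very large accounts are pulled and merged at read time (pushing to millions of inboxes per post is the expensive case). The `HybridFeed` class, its method names, and the follower threshold are hypothetical.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 1_000_000  # hypothetical follower cutoff


class HybridFeed:
    def __init__(self, threshold: int = CELEBRITY_THRESHOLD) -> None:
        self.threshold = threshold
        self.followers: dict[str, set[str]] = defaultdict(set)  # author -> followers
        self.following: dict[str, set[str]] = defaultdict(set)  # user -> authors
        self.inbox: dict[str, list[tuple[int, str]]] = defaultdict(list)  # pushed items
        self.posts: dict[str, list[tuple[int, str]]] = defaultdict(list)  # per-author log

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= self.threshold

    def publish(self, author: str, ts: int, video_id: str) -> None:
        """Always record the post; push to inboxes only for non-celebrity authors."""
        self.posts[author].append((ts, video_id))
        if not self.is_celebrity(author):
            for follower in self.followers[author]:
                self.inbox[follower].append((ts, video_id))

    def timeline(self, user: str, limit: int = 20) -> list[str]:
        """Merge the pushed inbox with posts pulled from followed celebrities."""
        # A set removes duplicates if an author crossed the threshold after
        # some of their posts had already been pushed.
        items = set(self.inbox[user])
        for author in self.following[user]:
            if self.is_celebrity(author):
                items.update(self.posts[author])
        newest_first = sorted(items, reverse=True)
        return [video_id for _, video_id in newest_first[:limit]]
```

The write cost for a large account stays constant regardless of follower count, at the price of a slightly more expensive read for anyone who follows such accounts.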

Like/view counts can lag by a few seconds — nobody notices. But user authentication tokens and billing events require strong consistency. TikTok segments these into separate storage tiers with different consistency guarantees, accepting complexity for throughput on hot paths.

Likes and comments use WebSocket push for real-time delivery. Less critical notifications (weekly summaries, suggested follows) use a pull-based batch pipeline that runs every few hours — no need to maintain a persistent connection for a weekly digest email.

Assumptions: 1B MAU, 500M DAU, avg user watches 45 min/day, avg video = 30 sec ~= 8 MB (720p). 34M uploads/day ~= 400 uploads/sec on average.

Storage:

34M uploads/day x 8 MB x 3 resolutions = ~816 TB/day of new video
With 3x replication over 5 years = ~4.5 EB total raw storage

Feed reads:

500M DAU x 20 feed refreshes/day / 86,400 sec = ~115,000 feed reads/sec
With 95% Redis cache hit rate -> recommendation backend sees ~5,750 rps

Bandwidth:

500M users x 45 min x 60 s/min x 2 Mbps (720p) / 86,400 s = ~31 Tbps averaged over the day, the same order of magnitude as the ~26 Tbps headline figure

This is why TikTok operates its own backbone in many regions and has deep-peering agreements with major ISPs.
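The back-of-envelope numbers can be reproduced with a few lines of arithmetic. The inputs below are the assumptions stated in the text; the variable names are illustrative. The script prints unrounded values, so they differ slightly from the rounded figures quoted, and with the seconds-per-minute factor written out, the bandwidth inputs work out to roughly 31 Tbps averaged over a day.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the assumptions stated in the text.
DAU = 500_000_000
UPLOADS_PER_DAY = 34_000_000
VIDEO_SIZE_BYTES = 8_000_000   # ~8 MB per 30-second 720p video
RESOLUTIONS = 3
REPLICATION = 3
YEARS = 5
FEED_REFRESHES_PER_DAY = 20
CACHE_HIT_RATE = 0.95
WATCH_MINUTES_PER_DAY = 45
BITRATE_BPS = 2_000_000        # 2 Mbps for 720p

# Upload rate and storage growth.
uploads_per_sec = UPLOADS_PER_DAY / SECONDS_PER_DAY
new_video_tb_per_day = UPLOADS_PER_DAY * VIDEO_SIZE_BYTES * RESOLUTIONS / 1e12
total_storage_eb = new_video_tb_per_day * REPLICATION * 365 * YEARS / 1e6

# Feed read load, before and after the cache.
feed_reads_per_sec = DAU * FEED_REFRESHES_PER_DAY / SECONDS_PER_DAY
backend_rps = feed_reads_per_sec * (1 - CACHE_HIT_RATE)

# Average egress: total bits streamed per day divided by seconds per day.
avg_egress_tbps = DAU * WATCH_MINUTES_PER_DAY * 60 * BITRATE_BPS / SECONDS_PER_DAY / 1e12

print(f"uploads/sec (avg):      {uploads_per_sec:,.0f}")
print(f"new video per day (TB): {new_video_tb_per_day:,.0f}")
print(f"5-year storage (EB):    {total_storage_eb:.2f}")
print(f"feed reads/sec:         {feed_reads_per_sec:,.0f}")
print(f"backend rps:            {backend_rps:,.0f}")
print(f"avg egress (Tbps):      {avg_egress_tbps:.2f}")
```

These are daily averages; real traffic is bursty, so provisioning for peak would need headroom on top of every figure here.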

Most social platforms optimize for social graph traversal — show me what people I follow posted. TikTok inverted this: the algorithm is the product. The architecture is built around a recommendation pipeline that must be both blazing-fast and constantly learning from watch signals.

Three things stand out:

Aggressive edge caching — they push video delivery as close to the user as physically possible. The CDN is not a performance optimization; it is the entire delivery strategy.

Real-time ML feedback loops — a video's trajectory is decided in the first 30 minutes based on completion rate signals. A new creator can go viral without any followers.

Microservice isolation — upload, serving, recommendation, and social graph are independently deployable and scalable, preventing any single bottleneck from cascading.

If you're using this for a system design interview:

Found this useful? Follow for more system design deep dives — next up: designing YouTube's upload pipeline at scale.
