# Designing TikTok from Scratch — A System Design Deep Dive

> Source: <https://dev.to/danikeya/designing-tiktok-from-scratch-a-system-design-deep-dive-57j8>
> Published: 2026-05-25 22:24:05+00:00

**Who is this for?** Mid-to-senior engineers preparing for system design interviews, or anyone curious how a short-video platform at billion-user scale actually works under the hood.

| Metric | Number |
|---|---|
| Monthly active users | 1B+ |
| Videos uploaded per day | ~34 million |
| Target feed latency (P99) | ~167ms |
| Average egress bandwidth | ~31 Tbps |

Before drawing a single box, nail down what the system must do — and what it doesn't need to do perfectly on day one.

**Functional requirements:**

**Non-functional requirements:**

The system splits into four major domains: **ingestion** (upload pipeline), **serving** (read path), **recommendation** (ML feed), and **social graph**.

```
┌─────────────────────────────────────────────────┐
│              Mobile / Web Clients                │
└─────────────────────┬───────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────┐
│         Global CDN / Edge PoPs                   │
│   Video delivery, static assets, geo-routing    │
└─────────────────────┬───────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────┐
│       API Gateway + Load Balancer                │
│   Auth, rate limiting, routing, TLS termination │
└────────┬────────────┴────────────────┬──────────┘
         │                             │
   ┌─────▼──────┐  ┌──────────────┐  ┌▼────────────────┐
   │  Upload    │  │ Feed Service │  │  Social Graph   │
   │  Service   │  │(pre-compute  │  │    Service      │
   │            │  │ + real-time) │  │                 │
   └─────┬──────┘  └──────┬───────┘  └┬────────────────┘
         │                │            │
   ┌─────▼──────┐  ┌──────▼───────┐  ┌▼────────────────┐
   │ Transcode  │  │Recommendation│  │  Notification   │
   │  Workers   │  │   Engine     │  │    Service      │
   └─────┬──────┘  └──────┬───────┘  └┬────────────────┘
         │                │            │
   ┌─────▼──────┐  ┌──────▼───────┐  ┌▼────────────────┐
   │  Object    │  │ Feature Store│  │  Search Service │
   │  Storage   │  │(Redis+Cassie)│  │ (Elasticsearch) │
   └─────┬──────┘  └──────┬───────┘  └┬────────────────┘
         │                │            │
┌────────▼────────────────▼────────────▼──────────────┐
│              Async Message Bus (Kafka)               │
└──────────┬──────────────┬──────────────┬────────────┘
           │              │              │
    ┌──────▼─────┐ ┌──────▼────┐ ┌──────▼──────┐
    │MySQL/Vitess│ │   Redis   │ │  Cassandra  │
    │(user data, │ │ (counters,│ │ (timelines, │
    │ metadata)  │ │  cache)   │ │  history)   │
    └────────────┘ └───────────┘ └─────────────┘
```

*All services communicate asynchronously via Kafka for non-critical paths.*

The CDN is TikTok's secret weapon. **~70% of video traffic** is served directly from edge nodes in 150+ cities, bypassing the origin entirely. Anycast routing sends each user to the nearest PoP. Manifest files (playlist URLs) are invalidated within seconds of a video going viral.

On the upload path, chunked multipart upload (5 MB chunks) tolerates flaky mobile connections. Workers deduplicate by `SHA-256` hash before writing. Transcode jobs run on GPU fleets; outputs include `360p`, `720p`, `1080p`, and HEVC variants. Thumbnails and stills are extracted for ML feature generation.
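A minimal sketch of the chunk-then-dedup step, assuming an in-memory dictionary stands in for object storage. `DedupStore` and `split_into_chunks` are hypothetical names used for illustration, not part of any real upload service.

```python
import hashlib

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB chunks, as described in the text


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split an upload into fixed-size chunks; the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class DedupStore:
    """Hypothetical in-memory stand-in for object storage keyed by content hash."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, chunks: list[bytes]) -> tuple[str, bool]:
        """Hash the chunks in order and write only if the hash is new.

        Returns (sha256_hex, was_written).
        """
        hasher = hashlib.sha256()
        for chunk in chunks:
            hasher.update(chunk)  # stream the hash chunk by chunk
        digest = hasher.hexdigest()
        if digest in self._blobs:
            return digest, False  # duplicate upload: skip the write
        self._blobs[digest] = b"".join(chunks)
        return digest, True


if __name__ == "__main__":
    video = b"\x00" * (12 * 1024 * 1024)  # a fake 12 MB upload
    chunks = split_into_chunks(video)
    store = DedupStore()
    print(len(chunks), store.put(chunks)[1], store.put(chunks)[1])
```

Because the hash is updated one chunk at a time, a worker never needs the whole file in memory to decide whether it has seen the content before. A client that loses its connection only has to resend the chunks that did not arrive.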

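The recommendation engine is a **two-tower neural network**: one tower embeds the user and the other embeds the video.

The dot product of the two embeddings gives a relevance score. The model runs online for top-k retrieval, then a ranker applies real-time signals (trending, friend activity) before the feed is assembled.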
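The retrieve-then-rank flow can be sketched in a few lines, assuming the two towers have already produced a user embedding and one embedding per video. The brute-force scan stands in for whatever nearest-neighbor index a production system would use, and `trending_boost` is a hypothetical real-time signal.

```python
import heapq


def dot(u: list[float], v: list[float]) -> float:
    """Relevance score: dot product of a user embedding and a video embedding."""
    return sum(a * b for a, b in zip(u, v))


def top_k(user_vec: list[float],
          video_vecs: dict[str, list[float]],
          k: int) -> list[tuple[str, float]]:
    """Retrieval: score every candidate and keep the k highest-scoring videos."""
    scored = ((vid, dot(user_vec, vec)) for vid, vec in video_vecs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


def rerank(candidates: list[tuple[str, float]],
           trending_boost: dict[str, float]) -> list[tuple[str, float]]:
    """Ranking: add a (hypothetical) real-time boost, then sort best-first."""
    boosted = [(vid, score + trending_boost.get(vid, 0.0))
               for vid, score in candidates]
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    videos = {"a": [0.9, 0.1], "b": [0.2, 0.9], "c": [0.7, 0.3]}
    candidates = top_k([1.0, 0.0], videos, k=2)
    print([vid for vid, _ in candidates])
    print([vid for vid, _ in rerank(candidates, {"c": 0.5})])
```

Splitting retrieval from ranking keeps the expensive real-time signals off the full catalog: only the k survivors of the cheap dot-product pass are rescored.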

This is where TikTok differs from Twitter/Instagram:

The feed service merges both lists, injects ML-recommended videos, and applies diversity rules to avoid repetition. The final feed is cached in Redis with a `300s` TTL.
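As a rough illustration of merge, diversity, and caching, the sketch below interleaves candidate lists, drops duplicates, caps videos per creator, and stores the result with a 300-second expiry. The per-creator cap is an assumed example of a diversity rule, and `TTLCache` is an in-memory stand-in for a Redis key with an expiry, not a Redis client.

```python
import time
from itertools import zip_longest

FEED_TTL_SECONDS = 300  # matches the 300s TTL in the text


def assemble_feed(list_a, list_b, recommended, max_per_creator=2):
    """Interleave candidate lists of (video_id, creator) pairs.

    Skips duplicate videos and caps how many videos one creator may
    contribute (an assumed diversity rule).
    """
    feed, seen, per_creator = [], set(), {}
    for group in zip_longest(list_a, list_b, recommended):
        for item in group:
            if item is None:  # one list ran out before the others
                continue
            video_id, creator = item
            if video_id in seen or per_creator.get(creator, 0) >= max_per_creator:
                continue
            seen.add(video_id)
            per_creator[creator] = per_creator.get(creator, 0) + 1
            feed.append(video_id)
    return feed


class TTLCache:
    """In-memory stand-in for a cache whose entries expire after ttl seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._data = {}

    def set(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: force a rebuild on the next read
            return None
        return value
```

A short TTL is a trade-off: the cache absorbs repeated refreshes, while a feed is never more than five minutes stale with respect to new recommendations.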

All write events (upload complete, like, follow, watch-complete) are published to Kafka topics. Downstream consumers include:

Topics are partitioned by `user_id` for ordered processing per user. This decouples services and allows independent scaling.
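To see why keying by `user_id` gives per-user ordering, here is a small simulation of key-based partitioning. It is only a model: Kafka's default partitioner hashes keys with murmur2, while this sketch uses CRC32 as a stand-in, and the partition count of 8 is an arbitrary assumption.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 8  # hypothetical partition count


def partition_for(user_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions


def route(events):
    """Append each (user_id, event) to its partition log in arrival order."""
    partitions = defaultdict(list)
    for user_id, event in events:
        partitions[partition_for(user_id)].append((user_id, event))
    return partitions


if __name__ == "__main__":
    events = [("u1", "like"), ("u2", "follow"),
              ("u1", "watch-complete"), ("u1", "upload-complete")]
    for partition, log in sorted(route(events).items()):
        print(partition, log)
```

All events for one user land in the same partition, and a partition is an ordered log, so a consumer sees that user's events in the order they were produced. Events from different users carry no ordering guarantee relative to each other.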

| Store | Use Case | Why |
|---|---|---|
| MySQL / Vitess | User profiles, video metadata, social graph | ACID, sharded by `user_id` |
| Redis Cluster | Counters (likes, views), session tokens, feed cache | Sub-millisecond reads |
| Cassandra | Watch history, timelines, notification logs | Wide-row reads, high write throughput |

The classic dilemma in social feed systems. TikTok uses a **hybrid approach** (the "celebrity problem" split):

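**Fan-out on write** (for regular users): each new post is pushed into followers' precomputed timelines at write time, which is cheap when the follower count is small.

**Fan-out on read** (for accounts with millions of followers): posts are fetched and merged when a follower loads the feed, which avoids millions of timeline writes per post.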
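A minimal in-memory sketch of the hybrid, assuming the commonly described form of the split: posts from ordinary accounts are pushed to follower inboxes at write time, and posts from very large accounts are pulled at read time. The class name, the threshold value, and the inbox/outbox layout are all illustrative assumptions.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 1_000_000  # hypothetical cutoff, not a figure from the article


class HybridTimeline:
    def __init__(self, threshold: int = CELEBRITY_THRESHOLD) -> None:
        self.threshold = threshold
        self.followers = defaultdict(set)  # creator -> follower ids
        self.following = defaultdict(set)  # user -> creator ids
        self.inbox = defaultdict(list)     # user -> pushed (ts, video_id)
        self.outbox = defaultdict(list)    # creator -> own (ts, video_id)

    def follow(self, user: str, creator: str) -> None:
        self.followers[creator].add(user)
        self.following[user].add(creator)

    def is_celebrity(self, creator: str) -> bool:
        return len(self.followers[creator]) >= self.threshold

    def post(self, creator: str, video_id: str, ts: int) -> None:
        """Always record the post; push to inboxes only for regular accounts."""
        self.outbox[creator].append((ts, video_id))
        if not self.is_celebrity(creator):
            for follower in self.followers[creator]:  # fan-out on write
                self.inbox[follower].append((ts, video_id))

    def timeline(self, user: str, limit: int = 10) -> list[str]:
        """Merge pushed posts with posts pulled from followed celebrities."""
        items = set(self.inbox[user])
        for creator in self.following[user]:
            if self.is_celebrity(creator):  # fan-out on read
                items.update(self.outbox[creator])
        newest_first = sorted(items, reverse=True)
        return [video_id for _, video_id in newest_first[:limit]]
```

The set in `timeline` also deduplicates posts from an account that crossed the threshold after some of its posts had already been pushed.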

Like/view counts can lag by a few seconds — nobody notices. But user authentication tokens and billing events require **strong consistency**. TikTok segments these into separate storage tiers with different consistency guarantees, accepting complexity for throughput on hot paths.

Likes and comments use **WebSocket push** for real-time delivery. Less critical notifications (weekly summaries, suggested follows) use a **pull-based batch pipeline** that runs every few hours — no need to maintain a persistent connection for a weekly digest email.

**Assumptions:** 1B MAU, 500M DAU, avg user watches 45 min/day, avg video = 30 sec ~= 8 MB (720p). 34M uploads/day ~= 400 uploads/sec on average; peaks run higher.

**Storage:**

```
34M uploads/day x 8 MB x 3 resolutions = ~816 TB/day of new video
With 3x replication over 5 years = 816 TB x 3 x 365 x 5 = ~4.5 EB total raw storage
```
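The storage estimate is easy to reproduce and vary. The sketch below uses decimal units (1 TB = 10^6 MB, 1 EB = 10^6 TB) and, like the estimate itself, assumes every resolution costs the same 8 MB, which is a simplification.

```python
UPLOADS_PER_DAY = 34_000_000
MB_PER_VIDEO = 8        # 30-second 720p video, per the stated assumptions
RESOLUTIONS = 3
REPLICATION = 3
YEARS = 5

MB_PER_TB = 1_000_000   # decimal units
TB_PER_EB = 1_000_000


def daily_new_video_tb() -> float:
    """New video written per day, before replication."""
    return UPLOADS_PER_DAY * MB_PER_VIDEO * RESOLUTIONS / MB_PER_TB


def total_raw_storage_eb() -> float:
    """Replicated storage accumulated over the whole retention window."""
    return daily_new_video_tb() * REPLICATION * 365 * YEARS / TB_PER_EB


if __name__ == "__main__":
    print(f"{daily_new_video_tb():.0f} TB/day, {total_raw_storage_eb():.2f} EB over {YEARS} years")
```

With these inputs the five-year total comes to about 4.47 EB. In practice lower resolutions are smaller than 8 MB, so the real figure would likely be below this upper bound.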

**Feed reads:**

```
500M DAU x 20 feed refreshes/day / 86,400 sec = ~115,000 feed reads/sec
With 95% Redis cache hit rate -> recommendation backend sees ~5,750 rps
```
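The same arithmetic as a small script, which also shows how sensitive the backend is to the cache hit rate. The 99% and 90% rows are hypothetical scenarios added for comparison.

```python
SECONDS_PER_DAY = 86_400


def feed_reads_per_sec(dau: int, refreshes_per_day: int) -> float:
    """Average feed requests per second across the day."""
    return dau * refreshes_per_day / SECONDS_PER_DAY


def backend_rps(total_rps: float, cache_hit_rate: float) -> float:
    """Requests that miss the cache and reach the recommendation backend."""
    return total_rps * (1.0 - cache_hit_rate)


if __name__ == "__main__":
    total = feed_reads_per_sec(500_000_000, 20)
    print(f"total: ~{total:,.0f} feed reads/sec")
    for hit_rate in (0.99, 0.95, 0.90):
        print(f"hit rate {hit_rate:.0%}: backend sees ~{backend_rps(total, hit_rate):,.0f} rps")
```

Dropping from a 95% to a 90% hit rate doubles the load on the recommendation backend, which is why the cache hit rate deserves its own alerting.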

**Bandwidth:**

```
500M users x 45 min x 60 s/min x 2 Mbps (720p) / 86,400 s = ~31 Tbps average egress
```

This is why TikTok operates its own backbone in many regions and has deep-peering agreements with major ISPs.
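For a unit-checked version of the bandwidth estimate: with the stated inputs, and watch time converted from minutes to seconds, the formula evaluates to about 31 Tbps averaged over the day. Real peaks would be higher because viewing is not spread evenly; the `peak_factor` below is a hypothetical multiplier, not a measured value.

```python
SECONDS_PER_DAY = 86_400
MBPS_PER_TBPS = 1_000_000


def average_egress_tbps(dau: int, watch_minutes: float, bitrate_mbps: float) -> float:
    """Total megabits streamed per day, averaged over 86,400 seconds."""
    watch_seconds = watch_minutes * 60
    total_megabits = dau * watch_seconds * bitrate_mbps
    return total_megabits / SECONDS_PER_DAY / MBPS_PER_TBPS


def peak_egress_tbps(average_tbps: float, peak_factor: float) -> float:
    """Scale the daily average by an assumed peak-to-average ratio."""
    return average_tbps * peak_factor


if __name__ == "__main__":
    avg = average_egress_tbps(500_000_000, 45, 2)
    print(f"average: {avg:.2f} Tbps")
    print(f"peak at 2x (assumed): {peak_egress_tbps(avg, 2.0):.2f} Tbps")
```

Note that this counts all egress, whether it is served by edge nodes or by the origin.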

Most social platforms optimize for social graph traversal — *show me what people I follow posted*. TikTok inverted this: **the algorithm is the product**. The architecture is built around a recommendation pipeline that must be both blazing-fast and constantly learning from watch signals.

Three things stand out:

**Aggressive edge caching** — they push video delivery as close to the user as physically possible. The CDN is not a performance optimization; it is the entire delivery strategy.

**Real-time ML feedback loops** — a video's trajectory is decided in the first 30 minutes based on completion rate signals. A new creator can go viral without any followers.

**Microservice isolation** — upload, serving, recommendation, and social graph are independently deployable and scalable, preventing any single bottleneck from cascading.

If you're using this for a system design interview:

