Training a Twitch chat toxicity classifier on real VOD data at scale

wpnews.pro

Quick answer: Twitch has no public API for VOD chat replay. To build a Twitch toxicity classifier dataset, you walk the internal VideoCommentsByOffsetOrCursor GraphQL endpoint at scale, the same one the web player uses. The Devil Scrapes Twitch VOD Chat Archive Actor does that for $0.001 per message (~$1.05 per 1,000, including the $0.05 start fee), returning the structured fields (message_fragments, badges, is_subscriber) that make classifier features actually useful.

If you maintain a mod-bot (StreamElements, Nightbot, Streamlabs, or custom), or if you are an ML engineer building a Twitch-native toxicity model, your training-data problem is the same: you need labelable chat messages at scale from real VODs, with enough context per row to build signal-rich features. This post walks through the full pipeline: pulling the data, loading it into pandas, training a baseline TF-IDF + logistic-regression classifier, and sketching the upgrade path to a transformer.

Does Twitch have a public API for VOD chat? Not in any useful sense. The Twitch Helix API exposes live IRC chat via EventSub and the Chat & Messaging endpoints, but it has no endpoint for VOD chat replay, the historical timestamped record of a past broadcast. That data exists (you can watch it in the VOD player), but the only programmatic surface for it is the internal VideoCommentsByOffsetOrCursor persisted GraphQL query.

Walking that endpoint reliably is a job in itself. Twitch inspects the TLS fingerprint of incoming requests: Python's requests or httpx produce a ClientHello that no real browser sends, and the server responds with a 403 before it reads the body. Past roughly 10,000 messages on a single IP, Twitch's rate limiting kicks in hard. The cursor-based pagination mode triggers an integrity-check challenge that needs a live browser to solve. Offset-based pagination avoids it, but only if you know to use it before you start coding.

We absorb all of that. The Actor rotates through Chrome, Firefox, and Safari TLS fingerprints via curl-cffi, routes requests through residential proxies with a fresh session ID on each block, retries with exponential backoff on 408 / 429 / 5xx, and pages exclusively by content offset to sidestep the integrity check. The result is a clean dataset of typed rows you can load straight into pandas.
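The retry-and-paging behavior described above can be sketched in plain Python. This is not the Actor's code: fetch_page, its (status, comments) return shape, and the "id"/"offset" keys on each comment are hypothetical stand-ins for your own HTTP client and response parsing. The jitter is a common variant of exponential backoff, not something the Actor documents. Injecting sleep keeps the loop testable without real waits.

```python
import random
import time
from typing import Callable, Iterator, Optional

# Hypothetical fetcher: offset in seconds -> (HTTP status, list of comment dicts or None)
FetchPage = Callable[[int], tuple[int, Optional[list[dict]]]]


def is_retryable(status: int) -> bool:
    """408 (timeout), 429 (rate limited) and any 5xx are worth retrying."""
    return status in (408, 429) or 500 <= status < 600


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def fetch_with_retry(fetch_page: FetchPage, offset: int, max_attempts: int = 5,
                     sleep: Callable[[float], None] = time.sleep) -> list[dict]:
    for attempt in range(max_attempts):
        status, payload = fetch_page(offset)
        if status == 200:
            return payload or []
        if not is_retryable(status):
            raise RuntimeError(f"non-retryable HTTP {status} at offset {offset}")
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))  # no pointless sleep after the final attempt
    raise RuntimeError(f"gave up at offset {offset} after {max_attempts} attempts")


def walk_by_offset(fetch_page: FetchPage, start_offset: int = 0,
                   sleep: Callable[[float], None] = time.sleep) -> Iterator[dict]:
    """Yield every comment once, paging by content offset instead of by cursor."""
    offset, seen = start_offset, set()
    while True:
        page = fetch_with_retry(fetch_page, offset, sleep=sleep)
        if not page:
            return  # past the end of the VOD
        new = [c for c in page if c["id"] not in seen]
        for comment in new:
            seen.add(comment["id"])
            yield comment
        # Restart from the last offset on this page (pages overlap; `seen` drops repeats).
        # If nothing new arrived, nudge forward one second so the walk always progresses.
        offset = max(c["offset"] for c in page) if new else offset + 1
```

Restarting from the last offset on each page means adjacent pages overlap, so de-duplicating by ID is what keeps the walk exact. A single second containing more messages than one page holds would still be truncated; that is a limitation of paging by whole seconds in this sketch.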

Not all chat APIs return the same structure. The fields the Actor returns were chosen with feature engineering in mind:

message_text: the plain-text body of the message, with emote shortcodes preserved as literal text (e.g. "PogChamp PogChamp OMEGALUL"). This is the text you label and your primary text feature.

message_fragments: a structured array of {type, text, emote_id} objects, where type is either "text" or "emote". This matters because emotes carry semantic weight that a TF-IDF tokenizer cannot capture from their shortcode text alone. An "emote" fragment with an emote_id lets you treat emotes as a distinct token type, deduplicate their representation, or embed them separately. Spam runs often consist almost entirely of emote fragments; that ratio is a cheap feature.

badges: an array of {set_id, version} objects representing the user's active chat badges. A user carrying a moderator, broadcaster, or vip badge is structurally different from a first-time chatter, and their messages should be weighted differently in your training set. A model that does not distinguish a moderator warning from a random user saying the same thing is a weaker model.

is_subscriber: a boolean convenience flag derived from the badges array. Subscribers are users who have paid for channel membership, and their base rate of toxic behavior differs from that of non-subscribers. This is a fast binary feature your model can use without parsing the full badges array.

message_offset_seconds: the message's position in the VOD timeline, in seconds. Toxic spikes correlate with in-stream events: a bad play, a controversial opinion, a raid. Including the offset in your labeling pass lets you sample across the full timeline rather than front-loading your training data from the first ten minutes.

commenter_id and commenter_login: the sender's stable user ID and login name. The ID is the key to group on for per-user features such as message volume within a VOD.
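One way to act on the badge point is per-row training weights. A minimal sketch follows; the BADGE_WEIGHTS values, and even whether privileged users should count for more or less, are hypothetical placeholders to tune for your channel, not numbers derived from this dataset.

```python
import numpy as np

# Hypothetical weights: tune per channel. Rows with none of these badges get `default`.
BADGE_WEIGHTS = {"broadcaster": 0.5, "moderator": 0.5, "vip": 0.75}


def badge_sample_weights(badge_lists, weights=BADGE_WEIGHTS, default=1.0):
    """Return one training weight per message from its badges array."""
    out = []
    for badges in badge_lists:
        # Missing/NaN badges are treated as "no badges"
        set_ids = {b["set_id"] for b in badges} if isinstance(badges, list) else set()
        matched = [weights[s] for s in set_ids if s in weights]
        # If a user holds several weighted badges, use the smallest weight
        out.append(min(matched) if matched else default)
    return np.asarray(out, dtype=float)
```

With scikit-learn, an array like this can be passed as sample_weight to LogisticRegression.fit, or as clf__sample_weight when fitting a Pipeline whose classifier step is named "clf".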

You need apify-client, pandas, and scikit-learn installed (pip install apify-client pandas scikit-learn). Get a free Apify API token at apify.com; no card is required, and every account starts with $5 of credit.

The call below targets three VODs by ID and caps at 5,000 messages per VOD. At $0.001 per message plus the $0.05 actor-start, 15,000 messages costs $15.05.

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("DevilScrapes/twitch-vod-chat-archive").call(
    run_input={
        "vodIds": [
            "2773625679",
            "2756421083",
            "2741897234"
        ],
        "maxMessagesPerVod": 5000,
        "startOffsetSeconds": 0,
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"]
        }
    }
)

items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
print(f"Pulled {len(items)} messages")

For a larger training corpus (say, 100 VODs from a mix of channels), switch to channelLogin mode and set maxRecentVods instead of listing IDs, with one run per channel:

run = client.actor("DevilScrapes/twitch-vod-chat-archive").call(
    run_input={
        "channelLogin": "shroud",
        "maxRecentVods": 50,
        "maxMessagesPerVod": 10000,
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"]
        }
    }
)

That gives you up to 500,000 messages per channel in a single run (50 VODs × 10,000 messages). At $0.001/message, the full 500k costs $500.05, but the free $5 trial credit covers 4,950 messages, enough to validate your pipeline before committing.
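The arithmetic above fits in a small estimator. A sketch, assuming the $0.05 start fee is charged once per Actor run (so splitting a pull across several runs adds $0.05 each); Decimal avoids float rounding on cent amounts.

```python
from decimal import Decimal

PRICE_PER_MESSAGE = Decimal("0.001")  # per-message price quoted in this post
START_FEE = Decimal("0.05")           # assumed: charged once per Actor run


def estimate_cost(messages: int, runs: int = 1) -> Decimal:
    """Dollar cost of pulling `messages` messages across `runs` Actor runs."""
    return messages * PRICE_PER_MESSAGE + runs * START_FEE


def messages_for_budget(budget: Decimal, runs: int = 1) -> int:
    """Largest message count a dollar budget covers after start fees."""
    return int((budget - runs * START_FEE) // PRICE_PER_MESSAGE)
```

For example, estimate_cost(15_000) gives the $15.05 from the three-VOD call, and messages_for_budget(Decimal("5")) gives the 4,950 messages the trial credit covers.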

import pandas as pd

df = pd.DataFrame(items)

def emote_ratio(fragments):
    # Share of fragments that are emotes; missing or NaN values count as 0.0
    if not isinstance(fragments, list) or not fragments:
        return 0.0
    emote_count = sum(1 for f in fragments if f.get("type") == "emote")
    return emote_count / len(fragments)

df["emote_ratio"] = df["message_fragments"].apply(emote_ratio)

def badge_set(badges):
    # Keep only set_id values (e.g. "moderator", "subscriber"); version is ignored
    return frozenset(b["set_id"] for b in badges) if isinstance(badges, list) else frozenset()

df["badge_set"] = df["badges"].apply(badge_set)

df["is_moderator"] = df["badge_set"].apply(lambda s: "moderator" in s)
df["is_broadcaster"] = df["badge_set"].apply(lambda s: "broadcaster" in s)

# Messages per user, written back onto every row; transform keeps df's index intact
df["user_msg_count"] = df.groupby("commenter_id")["message_id"].transform("count")

print(df[["message_text", "is_subscriber", "is_moderator", "emote_ratio", "user_msg_count"]].head())

Sample output row from a real VOD scrape (channel: shroud, toxic content masked):

{
  "vod_id": "2773625679",
  "vod_title": "never played forza but i definitely have a drivers license so it should be easy",
  "channel_login": "shroud",
  "message_id": "1292e052-0561-4db5-86c7-adfc4556d628",
  "message_offset_seconds": 12,
  "posted_at": "2026-05-16T18:42:35.297Z",
  "commenter_id": "142680597",
  "commenter_login": "tabrexs",
  "commenter_display_name": "tabrexs",
  "message_text": "PewPewPew",
  "message_fragments": [
    {
      "type": "emote",
      "text": "PewPewPew",
      "emote_id": "emotesv2_587405136a8147148c77df74baaa1bf4"
    }
  ],
  "user_color": "#DAA520",
  "badges": [],
  "is_subscriber": false,
  "scraped_at": "2026-05-16T19:00:00Z"
}

For a first iteration, label toxic/benign manually on a sample and train a TF-IDF + logistic-regression baseline. This is fast to iterate on and gives you a performance floor to beat with transformer fine-tuning later.

Important framing note for the labeling pass: toxic labels in mod-tool training are typically defined by the channel's own moderation rules, not a universal taxonomy. What a family-friendly channel flags as toxic differs from a gaming-focused one. Build your label schema per-channel or use a community standard like Perspective API categories for initial seeding.

Do not include known-slur text in your labeled examples file in plaintext; store it masked (e.g. [masked slur]) and apply transformations at load time. The mod community, and any team reviewing your training data, will thank you.
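To keep the labeling sample from clustering in the first few minutes of each stream, you can stratify it by message_offset_seconds. A minimal pandas sketch; the 10-minute bucket width and the per-bucket count are arbitrary defaults, not recommendations from the source.

```python
import pandas as pd


def sample_by_timeline(df: pd.DataFrame, per_bucket: int = 50,
                       bucket_seconds: int = 600, seed: int = 42) -> pd.DataFrame:
    """Draw up to `per_bucket` random messages from every `bucket_seconds` window of every VOD."""
    shuffled = df.sample(frac=1, random_state=seed)  # random order within each window
    bucket = (shuffled["message_offset_seconds"] // bucket_seconds).astype(int)
    return (
        shuffled.assign(_bucket=bucket)
        .groupby(["vod_id", "_bucket"])
        .head(per_bucket)  # windows with fewer messages contribute all of them
        .drop(columns="_bucket")
    )
```

One hypothetical hand-off: sample_by_timeline(df)[["message_id", "message_text"]].to_csv("to_label.csv", index=False), then record the results as a message_id to 0/1 mapping, the shape labels.json uses.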

import json
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.pipeline import Pipeline
import numpy as np

with open("labels.json") as f:
    labels = json.load(f)  # {"message_id_1": 0, "message_id_2": 1, ...}

labeled_df = df[df["message_id"].isin(labels)].copy()
labeled_df["label"] = labeled_df["message_id"].map(labels)

X_text = labeled_df["message_text"].fillna("")
y = labeled_df["label"]

X_train, X_test, y_train, y_test = train_test_split(
    X_text, y, test_size=0.2, random_state=42, stratify=y
)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(
        ngram_range=(1, 2),
        max_features=20000,
        sublinear_tf=True
    )),
    ("clf", LogisticRegression(
        C=1.0,
        class_weight="balanced",  # important: toxic is a minority class
        max_iter=1000
    )),
])

pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)

print(classification_report(y_test, y_pred, target_names=["benign", "toxic"]))

Adding structural features alongside TF-IDF:

The text pipeline above ignores emote_ratio, is_subscriber, is_moderator, and user_msg_count. To include them in the same model, combine the sparse TF-IDF matrix with a dense feature matrix:

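from scipy.sparse import csr_matrix, hstack
from sklearn.preprocessing import StandardScaler

dense_cols = ["emote_ratio", "is_subscriber", "is_moderator", "user_msg_count"]
# Cast booleans to float so the matrix is numeric rather than object dtype
dense_df = labeled_df[dense_cols].fillna(0).astype(float)

# Select dense rows by the split's own index so they line up with X_train / X_test
# (train_test_split shuffles, so a boolean isin() mask would misalign the rows).
# Scale so user_msg_count does not dwarf the 0-1 features.
scaler = StandardScaler()
X_train_dense = scaler.fit_transform(dense_df.loc[X_train.index])
X_test_dense = scaler.transform(dense_df.loc[X_test.index])

tfidf = TfidfVectorizer(ngram_range=(1, 2), max_features=20000, sublinear_tf=True)
X_train_sparse = tfidf.fit_transform(X_train)
X_test_sparse = tfidf.transform(X_test)

# Stack text features and dense features column-wise into one sparse matrix
X_train_combined = hstack([X_train_sparse, csr_matrix(X_train_dense)]).tocsr()
X_test_combined = hstack([X_test_sparse, csr_matrix(X_test_dense)]).tocsr()

clf = LogisticRegression(C=1.0, class_weight="balanced", max_iter=1000)
clf.fit(X_train_combined, y_train)

print(classification_report(y_test, clf.predict(X_test_combined), target_names=["benign", "toxic"]))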

In practice the emote_ratio column tends to lift spam precision noticeably: pure-emote spam messages produce a ratio near 1.0 and a short message_text, a combination TF-IDF alone does not capture well.

The baseline above will plateau around 75–82% F1 on a well-balanced Twitch dataset. The main failure modes are:

The upgrade path is to fine-tune a pre-trained model on your labeled data. cardiffnlp/twitter-roberta-base-offensive is a strong starting checkpoint for chat-style text: it is a RoBERTa model trained on tweets and fine-tuned for offensive-language detection, so it transfers better to Twitch chat than a generic BERT.

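from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
from datasets import Dataset

model_name = "cardiffnlp/twitter-roberta-base-offensive"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# The checkpoint already has a 2-way head (not-offensive / offensive), matching 0 = benign, 1 = toxic
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

hf_df = labeled_df[["message_text", "label"]].rename(columns={"message_text": "text"}).fillna({"text": ""})
hf_df["label"] = hf_df["label"].astype(int)
# preserve_index=False stops the pandas index from leaking in as an extra column
hf_dataset = Dataset.from_pandas(hf_df, preserve_index=False)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = hf_dataset.map(tokenize, batched=True).train_test_split(test_size=0.2, seed=42)

args = TrainingArguments(output_dir="twitch-toxicity-roberta", num_train_epochs=3, per_device_train_batch_size=16)
trainer = Trainer(model=model, args=args, train_dataset=tokenized["train"], eval_dataset=tokenized["test"])
trainer.train()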

The message_fragments field opens a further avenue: add emotes to the tokenizer vocabulary as special tokens (one token per emote_id), then let the model learn emote embeddings jointly with text. This is not a weekend project, but it is the difference between a model that sees OMEGALUL as a string of meaningless subword pieces and one that learns it signals laughter.
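A sketch of the emote-token idea. The <emote:...> token format and the helper names are hypothetical; add_tokens, resize_token_embeddings, and len(tokenizer) are standard Hugging Face transformers calls. The helpers only need objects exposing those methods, so the snippet itself does not import transformers.

```python
from typing import Iterable


def emote_token(emote_id: str) -> str:
    # One vocabulary entry per emote_id (the format is an arbitrary choice)
    return f"<emote:{emote_id}>"


def fragments_to_text(fragments) -> str:
    """Rebuild a message, swapping each emote fragment for its dedicated token."""
    if not isinstance(fragments, list):
        return ""
    parts = []
    for frag in fragments:
        if frag.get("type") == "emote" and frag.get("emote_id"):
            parts.append(emote_token(frag["emote_id"]))
        else:
            parts.append((frag.get("text") or "").strip())
    return " ".join(p for p in parts if p)


def emote_vocabulary(fragment_lists: Iterable) -> list[str]:
    """Unique emote tokens across a corpus, sorted for a reproducible vocabulary."""
    return sorted({
        emote_token(f["emote_id"])
        for frags in fragment_lists if isinstance(frags, list)
        for f in frags
        if f.get("type") == "emote" and f.get("emote_id")
    })


def register_emote_tokens(tokenizer, model, tokens: list[str]) -> int:
    """Add tokens to a Hugging Face tokenizer and grow the model's embedding matrix to match."""
    added = tokenizer.add_tokens(tokens)
    if added:
        model.resize_token_embeddings(len(tokenizer))
    return added
```

In the fine-tuning code, you would call register_emote_tokens(tokenizer, model, emote_vocabulary(labeled_df["message_fragments"])) before tokenizing, and build the text column with fragments_to_text instead of message_text. The new embedding rows start untrained, so they only become meaningful after fine-tuning has seen enough examples of each emote.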

How much does a training set cost? Pricing is per message, at $0.001/message plus the $0.05 actor start:

Pull size	Cost	Labeled examples (assuming 10% manual label rate)
10,000 messages	$10.05	~1,000 labeled rows
50,000 messages	$50.05	~5,000 labeled rows
100,000 messages	$100.05	~10,000 labeled rows

For a TF-IDF baseline, 1,000 to 5,000 labeled examples is workable if your class balance is reasonable. For transformer fine-tuning, 5,000+ labeled examples per class is the typical floor for stable results. The free trial's $5 credit covers 4,950 messages before you spend a cent, which is enough to validate your feature-extraction pipeline end-to-end before scaling up.

The full Twitch chat scraper guide covers the broader use-case landscape (esports analytics, post-broadcast review, channel back-catalog mode) if you want context beyond classifier training: Twitch Chat Scraper: export any VOD's full chat replay for $1.05/1K.

Can I use this for StreamElements / Nightbot rule testing?

Yes. Pull historical chat from VODs where you know toxic events occurred, then replay the message_text values through your bot's filter rules in a test harness. The badges and is_subscriber fields let you simulate the trust-level rules most bots implement (moderators and subscribers often get different thresholds).
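A minimal harness for that kind of replay. BannedPhraseRule is a hypothetical stand-in for a single filter rule, not the StreamElements or Nightbot configuration format; swap in your bot's actual matching logic. Rows are the Actor's output dicts.

```python
from dataclasses import dataclass, field


@dataclass
class BannedPhraseRule:
    """Hypothetical filter rule: block listed phrases unless the sender is exempt."""
    phrases: list[str]
    exempt_badges: set[str] = field(default_factory=lambda: {"broadcaster", "moderator"})
    exempt_subscribers: bool = False

    def blocks(self, row: dict) -> bool:
        badges = {b["set_id"] for b in row.get("badges") or []}
        if badges & self.exempt_badges:
            return False  # trusted badge holders bypass the rule
        if self.exempt_subscribers and row.get("is_subscriber"):
            return False
        text = (row.get("message_text") or "").lower()
        return any(p.lower() in text for p in self.phrases)


def replay(rows: list[dict], rule: BannedPhraseRule) -> list[str]:
    """Return the message_ids the rule would have blocked, in chat order."""
    return [r["message_id"] for r in rows if rule.blocks(r)]
```

Running the same rows through a strict and a subscriber-exempt variant of a rule shows exactly which messages a trust-level change lets through.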

Does the Actor return deleted or banned messages?

No. The public chat-replay endpoint does not expose moderator actions: bans, timeouts, or the content of deleted messages. Deleted messages may appear as a <message deleted> placeholder or may not appear at all, depending on when they were removed relative to the archive write. Your toxicity model should treat the absence of a message ID from a later snapshot as a soft toxic signal, not a hard one.
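A sketch of that snapshot comparison, assuming you scrape the same VOD twice some time apart. The placeholder string is the one quoted above; check what your own snapshots actually contain before relying on it.

```python
PLACEHOLDER = "<message deleted>"


def deletion_signals(earlier: list[dict], later: list[dict],
                     placeholder: str = PLACEHOLDER) -> dict[str, str]:
    """Map message_id -> reason the message looks removed in the later snapshot."""
    later_by_id = {row["message_id"]: row for row in later}
    signals = {}
    for row in earlier:
        mid = row["message_id"]
        if mid not in later_by_id:
            signals[mid] = "missing_later"
        elif (later_by_id[mid].get("message_text") == placeholder
              and row.get("message_text") != placeholder):
            signals[mid] = "placeholder_later"
    return signals
```

Keeping the signal soft means using it as a feature or to prioritize rows for human labeling, never as the label itself.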

How do I avoid training on bot messages?

Filter on user_msg_count: accounts that sent more than N messages in the same VOD are candidate spam bots. You can also filter out users whose message_text is identical across multiple rows in the same VOD (copy-paste spam). The Actor returns the stable commenter_id, so grouping is straightforward.
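Both heuristics fit in a few lines of pandas. A sketch; max_msgs and min_repeats are placeholder thresholds (the N above), and counts are taken per VOD rather than across the whole dataset, matching the description.

```python
import pandas as pd


def flag_candidate_bots(df: pd.DataFrame, max_msgs: int = 200, min_repeats: int = 3) -> pd.Series:
    """True for every row from a user who looks like a bot within that VOD."""
    per_user = [df["vod_id"], df["commenter_id"]]
    # Heuristic 1: unusually many messages from one user in one VOD
    too_many = df.groupby(per_user)["message_id"].transform("count") > max_msgs
    # Heuristic 2: the same text posted min_repeats or more times by one user in one VOD
    repeats = (
        df.groupby(per_user + [df["message_text"]])["message_id"].transform("count")
        >= min_repeats
    )
    # Spread the copy-paste flag to all of that user's rows in that VOD
    copy_paste_user = repeats.groupby(per_user).transform("any")
    return too_many | copy_paste_user
```

Usage: clean = df[~flag_candidate_bots(df)] before sampling rows for labeling.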

Is this legal / TOS-compliant?

Twitch's public VOD chat replay is presented to any logged-out visitor; this Actor retrieves only what the VOD player shows anonymously, at a paced rate. We are not affiliated with Twitch. Check your own jurisdiction and use case. The Twitch Terms of Service governs what you may do with the collected data — notably the prohibition on commercial use of data in ways that compete directly with Twitch.

The Actor is live at apify.com/DevilScrapes/twitch-vod-chat-archive. Free $5 trial credit, no credit card. Pull a few thousand messages from a channel you know, run them through the pipeline above, and you will have a working baseline before the end of the day. Leave a question in the comments if you hit a snag; the message_fragments / feature-engineering section in particular has sharp edges worth talking through.

Built by Devil Scrapes: we do the dirty work so your dataset stays clean. 😈

source & further reading

dev.to: original article
