Quicktok: A faster tokenizer


Source: discuss.huggingface.co · published 2026-06-16T02:14Z


Developer dmatth1 released quicktok, a fast exact BPE tokenizer written in C++ that runs 2–3.5× faster than bpe-openai and 4–11× faster than tiktoken on CPU. The tokenizer is byte-identical to tiktoken and available as a Python library and C++ source, targeting large-scale data processing and inference serving.


Been working on this a while! Should be useful for anyone trying to speed up their tokenization flow.

Introducing quicktok native (C++) and Python

quicktok is a fast, exact BPE tokenizer written in C++. Token ids are byte-identical to tiktoken, and encoding runs 2–3.5× faster than bpe-openai (the fastest alternative I know of) and 4–11× faster than tiktoken itself. I believe it’s the fastest exact CPU tokenizer available today for these encodings. It ships cl100k, o200k, GPT-OSS (o200k_harmony), Llama-3, and Qwen2.5/3, all byte-exact, plus bring-your-own Llama-4.

This is useful for anyone doing large amounts of CPU-bound data processing — search indexing, ingesting corpora, token counting/billing — and can significantly reduce the time and cost of data ingestion. It can also be used for online request serving, such as CPU-bound inference paths (token counting, embedding serving).

I’m releasing it as a Python library (pip install quicktok-v1) and it’s also available as C++ source. Repo: dmatth1/quicktok on GitHub.

The numbers below were measured on 3 public corpora on my Apple M1, single thread, in MB/s. Every encoder’s output was verified token-for-token identical against tiktoken before timing.
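The "verified token-for-token before timing" step can be pictured as a small gate like the one below. This is only a sketch of the idea, not the repo's actual harness: `first_mismatch`, `toy_reference`, and `toy_buggy` are hypothetical names, and the toy encoders just emit UTF-8 byte values so the example runs without any tokenizer installed.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Encoder = Callable[[str], List[int]]


def first_mismatch(
    reference: Encoder, candidate: Encoder, texts: Iterable[str]
) -> Optional[Tuple[int, int]]:
    """Return (text_index, token_index) of the first difference, or None if all match."""
    for text_index, text in enumerate(texts):
        expected = reference(text)
        actual = candidate(text)
        if expected == actual:
            continue
        # Locate the first position where the ids diverge.
        for token_index, (e, a) in enumerate(zip(expected, actual)):
            if e != a:
                return text_index, token_index
        # No differing id in the shared prefix: one output is simply shorter.
        return text_index, min(len(expected), len(actual))
    return None


# Toy stand-ins (assumptions for illustration only, not real tokenizers).
def toy_reference(text: str) -> List[int]:
    return list(text.encode("utf-8"))


def toy_buggy(text: str) -> List[int]:
    ids = list(text.encode("utf-8"))
    return ids[:-1] if len(ids) > 3 else ids  # drops the last id on longer inputs


if __name__ == "__main__":
    print(first_mismatch(toy_reference, toy_reference, ["abc", "hello"]))
    print(first_mismatch(toy_reference, toy_buggy, ["abc", "hello"]))
```

In a real check the reference would presumably be tiktoken, for example `tiktoken.get_encoding("cl100k_base").encode_ordinary`, and the candidate whatever encode function the tokenizer under test exposes; quicktok's Python API is not described in this post, so it is left out here.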

cl100k_base (GPT-3.5 / GPT-4)

encoder (MB/s)	The Pile	GitHub code	Common Crawl
quicktok	116.1	144.2	75.2
bpe-openai	36.5	41.6	29.2
tiktoken-rs	15.3	14.3	13.5
tiktoken (Python)	14.7	13.2	12.3
TokenDagger	11.5	12.0	11.2

o200k_base (GPT-4o)

encoder (MB/s)	The Pile	GitHub code	Common Crawl
quicktok	100.6	117.1	59.2
bpe-openai	36.1	40.1	29.9
tiktoken-rs	23.1	20.9	17.9
tiktoken (Python)	21.6	19.3	16.3
TokenDagger	11.0	11.7	10.2
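To see where the headline ranges come from, the ratios can be recomputed directly from the two tables. The snippet below only copies the MB/s figures shown above; the names `TABLES`, `CORPORA`, and `speedups` are made up for this illustration.

```python
# MB/s figures copied from the two tables above (Apple M1, single thread).
TABLES = {
    "cl100k_base": {
        "quicktok": (116.1, 144.2, 75.2),
        "bpe-openai": (36.5, 41.6, 29.2),
        "tiktoken (Python)": (14.7, 13.2, 12.3),
    },
    "o200k_base": {
        "quicktok": (100.6, 117.1, 59.2),
        "bpe-openai": (36.1, 40.1, 29.9),
        "tiktoken (Python)": (21.6, 19.3, 16.3),
    },
}
CORPORA = ("The Pile", "GitHub code", "Common Crawl")


def speedups(encoding: str, baseline: str) -> dict:
    """Ratio of quicktok throughput to the baseline's, per corpus, rounded to 2 places."""
    fast = TABLES[encoding]["quicktok"]
    slow = TABLES[encoding][baseline]
    return {corpus: round(f / s, 2) for corpus, f, s in zip(CORPORA, fast, slow)}


if __name__ == "__main__":
    for encoding in TABLES:
        for baseline in ("bpe-openai", "tiktoken (Python)"):
            print(encoding, "vs", baseline, speedups(encoding, baseline))
```

On these figures the ratios against bpe-openai run from roughly 2.0 to 3.5, and against Python tiktoken from roughly 3.6 to 10.9. The low end of each range is the Common Crawl column on o200k_base, which is consistent with multilingual text being the weakest case.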

quicktok also beats llama.cpp’s tokenizer on the Llama-3 vocab by ~14×. The parallel encode_batch reaches 706 MB/s native on 8 cores; from Python it sustains 550 MB/s, which is 24× tiktoken’s batch API. The speedups hold on other architectures like x86.

To keep the comparison fair, each encoder is called through the same raw API its own benchmark uses. (TokenDagger’s README claims 2–4× over tiktoken, but that’s on Llama-4/Mistral vocabs on AMD EPYC; on cl100k/o200k it lands around Python tiktoken’s level.) To reproduce, run make bench-compare in the repo.
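For a rough sense of how a single-threaded MB/s figure is obtained, here is a minimal timing sketch. It is not the repo's bench harness (make bench-compare is the way to reproduce the tables); `throughput_mb_per_s` is a hypothetical helper, and it assumes 1 MB = 1,000,000 bytes and best-of-N timing, neither of which the post specifies.

```python
import time
from typing import Callable, List, Sequence


def throughput_mb_per_s(
    encode: Callable[[str], List[int]], texts: Sequence[str], repeats: int = 3
) -> float:
    """Best-of-N single-threaded throughput in MB/s (assumes 1 MB = 1,000,000 bytes)."""
    total_bytes = sum(len(t.encode("utf-8")) for t in texts)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for t in texts:
            encode(t)  # result discarded; only the encoding time matters here
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on very small inputs.
    return total_bytes / max(best, 1e-9) / 1e6


if __name__ == "__main__":
    # Toy encoder (UTF-8 byte values) so the sketch runs without a tokenizer installed.
    sample = ["hello world " * 200] * 50
    print(f"{throughput_mb_per_s(lambda s: list(s.encode('utf-8')), sample):.1f} MB/s")
```

Any encode callable can be passed in, so the same loop can time two tokenizers on the same texts; the absolute numbers will still move with corpus and host.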

The fundamental algorithm is the same as bpe-openai (exact backtracking BPE); see their blog post. Much of the speedup over bpe-openai comes from data-structure engineering around reducing memory accesses.
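For context on what "exact" has to match, below is a sketch of the naive reference form of BPE: repeatedly merge the adjacent pair with the lowest rank. This is deliberately not the backtracking algorithm that bpe-openai and quicktok use, and it is far slower; it only shows the output that any exact implementation must reproduce. `TOY_RANKS` is a hypothetical six-entry vocabulary, and the input is assumed to be a single already pre-split piece.

```python
from typing import Dict, List


def bpe_encode(piece: bytes, ranks: Dict[bytes, int]) -> List[int]:
    """Naive reference BPE: repeatedly merge the adjacent pair with the lowest rank."""
    # Start from single bytes; every byte of the piece must itself be in `ranks`.
    parts = [bytes([b]) for b in piece]
    while len(parts) > 1:
        best_index, best_rank = -1, None
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            # Strict "<" keeps the leftmost pair when ranks tie.
            if rank is not None and (best_rank is None or rank < best_rank):
                best_index, best_rank = i, rank
        if best_rank is None:
            break  # no adjacent pair is in the vocabulary
        parts[best_index:best_index + 2] = [parts[best_index] + parts[best_index + 1]]
    return [ranks[p] for p in parts]


# Hypothetical toy vocabulary: single bytes first, then merges in priority order.
TOY_RANKS = {b"a": 0, b"b": 1, b"c": 2, b"ab": 3, b"bc": 4, b"abc": 5}

if __name__ == "__main__":
    print(bpe_encode(b"abc", TOY_RANKS))
    print(bpe_encode(b"abcab", TOY_RANKS))
```

Each merge step rescans the whole piece, so the loop is quadratic in the piece length. Faster exact tokenizers have to arrive at the same ids without doing that work.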

All comparisons are single-threaded by design: parallel/batch is available, but single-threaded is the fair basis for comparison. Multilingual text (Common Crawl) is definitely the weakest ratio. Numbers above are from an M1 and were cross-checked on an x86 Xeon; the ordering holds on both, but absolute MB/s moves with corpus and host. Table numbers are from the native C++ build (-march=native). The prebuilt PyPI wheels are portable-ABI and land lower, roughly 1.1–1.6× bpe-openai depending on corpus. Building from source recovers the table numbers.

The full methodology (corpus fetching, the exactness gate, raw-API rules) is in the bench README in the repo. If you find an input where quicktok’s ids differ from tiktoken’s, that’s definitely a bug, so please report it!
