tiktoken vs HuggingFace Tokenizers: Benchmark Guide
Benchmark tiktoken vs HuggingFace Tokenizers on speed, vocabulary, and encoding. Runnable Python code, migration guide, and decision framework for your LLM apps.
tiktoken runs 2-3x faster than HuggingFace Tokenizers. But each tool fits a different use case. This guide tests both on speed and vocab size, then shows you how to switch between them.
You’re building an LLM app. You need to count tokens before calling the API. You reach for a tokenizer — but which one?
OpenAI’s tiktoken and HuggingFace’s tokenizers are the two main Python tools for BPE text splitting. They use the same core method. But they differ in speed, vocab, and where they fit best. Pick the wrong one and you lose time, hit bugs, or both.
Prerequisites
- Python version: 3.9+
- Required libraries: tiktoken (0.7+), tokenizers (0.20+), transformers (4.40+)
- Install:
- Install: pip install tiktoken tokenizers transformers
- Time to complete: 20-25 minutes
- Pyodide support: Partial — tiktoken works in Pyodide, HuggingFace tokenizers does not
- Reviewed: March 2026
What Are tiktoken and HuggingFace Tokenizers?
Both tools turn text into numbers (token IDs) that models can read. That’s what they share. But they come from very different worlds.
tiktoken is OpenAI’s fast BPE tool. It comes with ready-made vocab files for OpenAI models: cl100k_base (GPT-4, GPT-3.5-Turbo), o200k_base (GPT-4o), and older ones like p50k_base. It’s built in Rust with a thin Python layer. The API is tiny — encode, decode, count. Done.
HuggingFace Tokenizers does much more. It handles BPE, WordPiece, Unigram, and more. It powers the transformers library and works with thousands of models on the Hub. It’s also Rust-based. But the API is much richer — it can train new vocab, add special tokens, pad, and trim.
Let’s load each one. We’ll encode the same text with both and see how the token IDs differ.
import tiktoken
from tokenizers import Tokenizer
# tiktoken: load by encoding name
enc = tiktoken.get_encoding("cl100k_base")
# HuggingFace: load from Hub (GPT-2 tokenizer)
hf_tok = Tokenizer.from_pretrained("gpt2")
text = "Machine learning is transforming how we build software."
tiktoken_ids = enc.encode(text)
hf_ids = hf_tok.encode(text).ids
print(f"tiktoken tokens: {len(tiktoken_ids)} -> {tiktoken_ids}")
print(f"HF tokens: {len(hf_ids)} -> {hf_ids}")
Output:
tiktoken tokens: 8 -> [22438, 6975, 374, 46890, 1268, 584, 1977, 3241]
HF tokens: 8 -> [37573, 4673, 318, 25549, 703, 356, 1382, 3788]
The IDs are nothing alike. tiktoken’s cl100k_base has ~100K tokens. GPT-2’s vocab has ~50K. Different word lists make different IDs — even for the same text.
How Does BPE Tokenization Work?
Before we test speed, you need a quick mental picture of what both tools do inside.
BPE starts with raw bytes. It finds the most common byte pair and merges them into one new token. It repeats this until the vocab hits a target size. The merge order is the “recipe” — saved in the vocab file.
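To make that loop concrete, here's a toy sketch in plain Python — not either library's actual implementation. It counts the most frequent adjacent pair in a byte sequence and merges it once.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent pairs and return the most common one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0]

def merge_pair(tokens, pair, new_id):
    """Replace every occurrence of `pair` with the new token ID."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

tokens = list(b"low lower lowest")      # BPE starts from raw bytes
pair = most_frequent_pair(tokens)       # ('l', 'o') as byte values: (108, 111)
tokens = merge_pair(tokens, pair, 256)  # 256 = first ID past the byte range
print(pair, tokens)
```

Real training repeats this loop tens of thousands of times, assigning each merge the next free vocab ID — that merge order is the recipe saved in the vocab file.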
Here’s the key point: both tools use this same method. The speed gap comes from how they’re built, not what they do.
Let’s see tiktoken split a word into parts. We’ll encode “tokenization” and decode each token to see the pieces.
enc = tiktoken.get_encoding("cl100k_base")
word = "tokenization"
tokens = enc.encode(word)
print(f"'{word}' -> {tokens}")
for tok_id in tokens:
    print(f" Token {tok_id} -> '{enc.decode([tok_id])}'")
Output:
'tokenization' -> [5765, 2065]
Token 5765 -> 'token'
Token 2065 -> 'ization'
BPE split “tokenization” into “token” + “ization”. Both parts show up often in English, so BPE merged them into single tokens early in training.
Quick Check: What if you encode a rare word like “defenestration”? Rare words have less common parts. BPE breaks them into more pieces, which costs more tokens.
Speed Benchmark — tiktoken vs HuggingFace Tokenizers
How much faster is tiktoken? I timed three cases that match real use: short prompts (one line), medium text (a few lines), and long text (many lines).
The setup is simple. We load tiktoken’s cl100k_base and HuggingFace’s GPT-2. Then we time each over 1000 runs.
import tiktoken
import time
from tokenizers import Tokenizer
enc = tiktoken.get_encoding("cl100k_base")
hf_tok = Tokenizer.from_pretrained("gpt2")
short_text = "What is gradient descent?"
medium_text = short_text * 20
long_text = short_text * 200
The benchmark function wraps a single encode call in a tight loop. It uses time.perf_counter() for microsecond precision.
def benchmark(func, text, n=1000):
    """Time a function over n runs, return avg in microseconds."""
    start = time.perf_counter()
    for _ in range(n):
        func(text)
    elapsed = time.perf_counter() - start
    return (elapsed / n) * 1_000_000
for label, text in [("Short", short_text),
                    ("Medium", medium_text),
                    ("Long", long_text)]:
    tk_us = benchmark(enc.encode, text)
    hf_us = benchmark(lambda t: hf_tok.encode(t).ids, text)
    ratio = hf_us / tk_us
    print(f"{label} ({len(text)} chars): "
          f"tiktoken={tk_us:.1f}us HF={hf_us:.1f}us "
          f"ratio={ratio:.1f}x")
Your exact numbers will vary by hardware. The pattern looks like this:
Short (25 chars): tiktoken=15.2us HF=42.8us ratio=2.8x
Medium (500 chars): tiktoken=68.4us HF=185.3us ratio=2.7x
Long (5000 chars): tiktoken=612.5us HF=1724.8us ratio=2.8x
tiktoken wins by 2-3x every time. The gap stays the same at all sizes. Both tools use Rust, but tiktoken takes a shorter path through the code.
Why Is tiktoken Faster?
I find it helpful to break this down into three layers:
- Less work per call. tiktoken runs BPE and stops. HuggingFace runs a full chain — normalize, pre-split, BPE, post-process. Each step costs time.
- Regex pre-split. tiktoken uses a fast regex to chunk text before BPE. This skips the heavier pre-split step HuggingFace uses.
- No extras. tiktoken doesn’t pad, trim, or build masks. It just encodes. Less work means faster calls.
Vocabulary and Encoding Differences
Speed matters. But getting the right tokens matters more. Do these tools give the same output for the same text?
Only if they share a vocab — and they don’t. Let’s look at three vocab files side by side.
enc_cl100k = tiktoken.get_encoding("cl100k_base")
enc_o200k = tiktoken.get_encoding("o200k_base")
hf_gpt2 = Tokenizer.from_pretrained("gpt2")
print(f"cl100k_base vocab size: {enc_cl100k.n_vocab}")
print(f"o200k_base vocab size: {enc_o200k.n_vocab}")
print(f"HF GPT-2 vocab size: {hf_gpt2.get_vocab_size()}")
Output:
cl100k_base vocab size: 100277
o200k_base vocab size: 200019
HF GPT-2 vocab size: 50257
That’s a 4x range. Here’s what it means for you:
| Encoding | Vocab Size | Models | Avg Tokens per Word |
|---|---|---|---|
| cl100k_base | ~100K | GPT-4, GPT-3.5-Turbo | ~1.3 |
| o200k_base | ~200K | GPT-4o, GPT-4o-mini | ~1.1 |
| GPT-2 (HF) | ~50K | GPT-2, older models | ~1.5 |
A bigger vocab means fewer tokens per word. Fewer tokens means lower API costs. o200k_base is the most token-efficient of the three.
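That efficiency gap translates directly into dollars. Here's a rough back-of-envelope sketch; the price and volume are placeholder assumptions, not current OpenAI rates, and the tokens-per-word ratios come from the table above.

```python
def monthly_cost(tokens_per_word, words_per_month, usd_per_1k_tokens):
    """Estimate input-token spend from an average tokens-per-word ratio."""
    tokens = tokens_per_word * words_per_month
    return tokens / 1000 * usd_per_1k_tokens

WORDS = 10_000_000  # hypothetical: 10M words/month through the API
PRICE = 0.01        # hypothetical $ per 1K input tokens, not a real rate
for name, ratio in [("cl100k_base", 1.3), ("o200k_base", 1.1)]:
    print(f"{name}: ${monthly_cost(ratio, WORDS, PRICE):,.2f}/month")
```

Same text, same price per token: the larger vocab trims the bill simply by emitting fewer tokens.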
Now let’s see how the same text splits. We’ll print the actual word pieces so you can see the gap.
text = "The transformer architecture uses self-attention mechanisms."
for name, tokenizer in [("cl100k", enc_cl100k), ("o200k", enc_o200k)]:
    tokens = tokenizer.encode(text)
    pieces = [tokenizer.decode([t]) for t in tokens]
    print(f"{name} ({len(tokens)} tokens): {pieces}")
hf_out = hf_gpt2.encode(text)
hf_pieces = [hf_gpt2.decode([t]) for t in hf_out.ids]
print(f"GPT-2 ({len(hf_out.ids)} tokens): {hf_pieces}")
Output:
cl100k (8 tokens): ['The', ' transformer', ' architecture', ' uses', ' self', '-attention', ' mechanisms', '.']
o200k (7 tokens): ['The', ' transformer', ' architecture', ' uses', ' self-attention', ' mechanisms', '.']
GPT-2 (10 tokens): ['The', ' transformer', ' architecture', ' uses', ' self', '-', 'att', 'ention', ' mechanisms', '.']
See how GPT-2’s smaller vocab splits “self-attention” into four pieces? The o200k_base handles it as one token. That’s why vocab size hits your API bill.
How Does Non-English Text Affect Token Counts?
Most guides skip this, but it matters if your app handles many languages. Chinese, Japanese, and Arabic text costs more tokens per letter. BPE vocab files are trained mostly on English. So non-English text gets split into more pieces.
enc = tiktoken.get_encoding("cl100k_base")
texts = [
    ("English", "Machine learning is powerful."),
    ("Chinese", "机器学习非常强大。"),
    ("Arabic", "التعلم الآلي قوي جداً."),
    ("Japanese", "機械学習は強力です。"),
]
for lang, text in texts:
    tokens = enc.encode(text)
    ratio = len(tokens) / len(text)
    print(f"{lang:>8}: {len(text)} chars -> "
          f"{len(tokens)} tokens ({ratio:.2f} tok/char)")
Output pattern (exact counts depend on encoding):
English: 28 chars -> 5 tokens (0.18 tok/char)
Chinese: 9 chars -> 11 tokens (1.22 tok/char)
Arabic: 22 chars -> 16 tokens (0.73 tok/char)
Japanese: 10 chars -> 10 tokens (1.00 tok/char)
Chinese text can cost 5-7x more tokens per letter than English. If you build a multi-language app, plan for this in your context window budget.
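One way to plan for it is a per-language multiplier. The ratios below are illustrative values in the spirit of the measurements above, not fixed constants; measure your own corpus before relying on them.

```python
# Illustrative tokens-per-character ratios; these vary by encoding and corpus
TOK_PER_CHAR = {"english": 0.25, "chinese": 1.2, "japanese": 1.0, "arabic": 0.75}

def estimated_tokens(char_count, language):
    """Rough token estimate for context-window planning."""
    return round(char_count * TOK_PER_CHAR[language])

# The same 2000-character document needs very different token budgets
for lang in TOK_PER_CHAR:
    print(f"{lang:>8}: ~{estimated_tokens(2000, lang)} tokens")
```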
Predict the output: If you tokenize the Python code print("hello world") with cl100k_base vs GPT-2, which produces more tokens? GPT-2 will — its smaller vocabulary splits code tokens into more pieces.
Encoding Edge Cases You Should Know
Every tokenizer has quirks. Knowing them now saves you hours of debugging later.
Special Characters and Unicode
How does tiktoken handle emoji, accented characters, and code? Let’s find out with five edge cases.
enc = tiktoken.get_encoding("cl100k_base")
edge_cases = [
    "Hello 👋 World",
    "café résumé naïve",
    "print('hello')",
    " indented text",
    "",
]
for text in edge_cases:
    tokens = enc.encode(text)
    print(f"'{text}' -> {len(tokens)} tokens")
Output:
'Hello 👋 World' -> 4 tokens
'café résumé naïve' -> 9 tokens
'print('hello')' -> 5 tokens
' indented text' -> 3 tokens
'' -> 0 tokens
Accents are costly. “cafe resume naive” (no accents) would use fewer tokens. Each accent adds extra bytes that BPE hasn’t merged as well.
The allowed_special Trap
This one trips people up. tiktoken sees strings like <|endoftext|> as special tokens. If your text has one, it throws an error.
The fix depends on what you want. Pass allowed_special="all" to keep them as one token. Pass disallowed_special=() to split them like normal text.
enc = tiktoken.get_encoding("cl100k_base")
text = "End token is <|endoftext|> in GPT models."
try:
    enc.encode(text)
except ValueError:
    print("Default: ValueError raised")
tokens_special = enc.encode(text, allowed_special="all")
tokens_regular = enc.encode(text, disallowed_special=())
print(f"allowed_special='all': {len(tokens_special)} tokens")
print(f"disallowed_special=(): {len(tokens_regular)} tokens")
Output:
Default: ValueError raised
allowed_special='all': 9 tokens
disallowed_special=(): 11 tokens
The count changes because the special form is one ID. As plain text, it splits into many small tokens.
Common Mistakes and How to Fix Them
Mistake 1: Hardcoding the Wrong Encoding
This is the one I see most. Someone locks in cl100k_base for all models. Then they wonder why GPT-4o counts don’t match the API bill.
import tiktoken
wrong_enc = tiktoken.get_encoding("p50k_base") # Codex!
correct_enc = tiktoken.encoding_for_model("gpt-4") # GPT-4
text = "What is the meaning of life?"
print(f"p50k_base: {len(wrong_enc.encode(text))} tokens")
print(f"GPT-4: {len(correct_enc.encode(text))} tokens")
Output:
p50k_base: 7 tokens
GPT-4: 7 tokens
For plain English, the counts look the same. But try code, other languages, or special chars and they’ll split apart. Always use encoding_for_model().
Mistake 2: Assuming Token Counts Are Model-Independent
4000 tokens in GPT-3.5-Turbo is NOT 4000 tokens in GPT-4o. I can’t say this enough — a different vocab means a different count for the same text.
text = "Attention is all you need. " * 100
for model in ["gpt-3.5-turbo", "gpt-4", "gpt-4o"]:
    enc = tiktoken.encoding_for_model(model)
    print(f"{model}: {len(enc.encode(text))} tokens")
Output:
gpt-3.5-turbo: 700 tokens
gpt-4: 700 tokens
gpt-4o: 600 tokens
GPT-4o has a bigger vocab. Same text, 14% fewer tokens, lower cost.
Mistake 3: Looping Instead of Batching
If you’re encoding thousands of strings with HuggingFace, don’t use a loop. The encode_batch() call runs Rust threads in parallel. The speed gain is huge.
from tokenizers import Tokenizer
import time
hf_tok = Tokenizer.from_pretrained("gpt2")
texts = ["Machine learning is great."] * 5000
start = time.perf_counter()
for t in texts:
    hf_tok.encode(t)
single_time = time.perf_counter() - start
start = time.perf_counter()
hf_tok.encode_batch(texts)
batch_time = time.perf_counter() - start
print(f"Single loop: {single_time:.3f}s")
print(f"Batch: {batch_time:.3f}s")
print(f"Speedup: {single_time / batch_time:.1f}x")
Approximate results:
Single loop: 0.245s
Batch: 0.038s
Speedup: 6.4x
Batch mode can match tiktoken’s speed — or beat it. For big jobs, always test it.
Exercise 1: Compare Token Counts Across Models (Beginner)
Write a function count_tokens(text, model_name) that uses tiktoken to return the token count for the given model. Test it with "Python is a versatile programming language." for "gpt-3.5-turbo" and "gpt-4o".
Hint: The encode() method returns a list of token IDs. Use len() to count them.
Solution:
import tiktoken

def count_tokens(text, model_name):
    enc = tiktoken.encoding_for_model(model_name)
    return len(enc.encode(text))

text = "Python is a versatile programming language."
print(count_tokens(text, "gpt-3.5-turbo"))
print(count_tokens(text, "gpt-4o"))
tiktoken.encoding_for_model() returns the correct encoding for any OpenAI model. The encode() method converts text to token IDs, and len() counts them.
Building a Token Budget Calculator
In real LLM apps, you need to fit a system prompt, user message, and context into a fixed token budget. This shows up in every RAG app I’ve built.
The function below takes each text part, counts its tokens, and tells you if the total fits. It also shows how much room is left for the reply.
import tiktoken
def token_budget_calculator(
    system_prompt, user_message, context="",
    model="gpt-4", max_response_tokens=1000
):
    """Calculate token usage and remaining budget."""
    enc = tiktoken.encoding_for_model(model)
    limits = {
        "gpt-3.5-turbo": 16_385, "gpt-4": 8_192,
        "gpt-4o": 128_000, "gpt-4o-mini": 128_000,
    }
    sys_tok = len(enc.encode(system_prompt))
    usr_tok = len(enc.encode(user_message))
    ctx_tok = len(enc.encode(context)) if context else 0
    overhead = 10
    total = sys_tok + usr_tok + ctx_tok + overhead
    limit = limits.get(model, 4096)
    remaining = limit - total - max_response_tokens
    return total, remaining, limit
Here’s how you’d use it with a realistic setup.
system = "You are a helpful coding assistant. Answer in Python."
user_msg = "Write a function to sort a list of dicts by key."
rag_ctx = "Python's sorted() accepts a key parameter." * 10
total, remaining, limit = token_budget_calculator(
    system, user_msg, rag_ctx, model="gpt-4"
)
print(f"Model: gpt-4 (context: {limit:,} tokens)")
print(f"Total input: {total:,} tokens")
print(f"Remaining: {remaining:,} tokens")
print(f"Status: {'OK' if remaining > 0 else 'OVER BUDGET'}")
Output:
Model: gpt-4 (context: 8,192 tokens)
Total input: 125 tokens
Remaining: 7,067 tokens
Status: OK
This pattern is a must for RAG apps. Run the check before each API call to set how much context to send.
Exercise 2: Truncate Context to Fit Budget (Intermediate)
Write a function truncate_to_budget(text, max_tokens, model="gpt-4") that truncates text to fit within max_tokens. Encode, slice the token list, decode back.
Hint: Encode the text, slice with tokens[:max_tokens], then decode back.
Solution:
import tiktoken

def truncate_to_budget(text, max_tokens, model="gpt-4"):
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(text)
    if len(tokens) > max_tokens:
        tokens = tokens[:max_tokens]
    return enc.decode(tokens)

long_text = "Hello world. " * 100
result = truncate_to_budget(long_text, 10)
print(result)
print(f"Token count: {len(tiktoken.encoding_for_model('gpt-4').encode(result))}")
Encode the full text, slice the token list to the budget limit, and decode back. The result may end mid-word at the token boundary, but it always fits within budget.
Migrating Between tiktoken and HuggingFace Tokenizers
Sometimes you need to swap formats. The most common case: you’ve counted tokens with tiktoken, but now you need a HuggingFace tokenizer for a model pipeline.
tiktoken to HuggingFace Format
HuggingFace’s transformers library does this for many models on its own. If a model on the Hub has a tiktoken-format file, AutoTokenizer.from_pretrained() converts it for you.
from transformers import AutoTokenizer
# Qwen2 uses tiktoken format internally
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
text = "How do transformers work?"
tokens = tokenizer.encode(text)
print(f"Tokens: {tokens}")
print(f"Decoded: {tokenizer.decode(tokens)}")
Output:
Tokens: [4340, 653, 63782, 975, 30]
Decoded: How do transformers work?
HuggingFace to tiktoken
Going the other way is rare. But it helps when you want tiktoken’s speed with a custom vocab. You can pull the vocab from any HuggingFace tokenizer.
from tokenizers import Tokenizer
hf_tok = Tokenizer.from_pretrained("gpt2")
vocab = hf_tok.get_vocab()
print(f"Vocab entries: {len(vocab)}")
print(f"First 5: {dict(list(vocab.items())[:5])}")
Output:
Vocab entries: 50257
First 5: {'!': 0, '"': 1, '#': 2, '$': 3, '%': 4}
When to Use Which — A Decision Framework
Here’s how I think about it. Match the tool to your model. Once you frame it that way, the answer is clear.
Use tiktoken when:
- You call OpenAI’s API and need fast token counting
- You need a lightweight dependency with minimal install
- You’re building prompt management for GPT-4 / GPT-4o
Use HuggingFace Tokenizers when:
- You work with open-source models (Llama, Mistral, Falcon)
- You need to train a custom tokenizer on your domain data
- You need padding, truncation, and attention masks for model input
Use both when:
- Your app routes prompts to OpenAI AND open-source models
- You compare token costs across different model families
| Feature | tiktoken | HuggingFace Tokenizers |
|---|---|---|
| Speed (single) | 2-3x faster | Baseline |
| Speed (batch) | encode_batch() threaded | encode_batch() parallelized |
| Vocabulary | OpenAI only | Any model on Hub |
| Custom training | Not supported | Full pipeline |
| Padding / masks | Manual | Built-in |
| Dependencies | Minimal | Medium |
| Pyodide | Yes | No |
| Best for | OpenAI token counting | Full model pipelines |
Skip tiktoken for non-OpenAI models — the IDs won’t match. Also skip it if you need to train a custom vocab.
Skip HuggingFace if you just need OpenAI token counts. tiktoken is simpler and 2-3x faster for that task.
Summary
tiktoken and HuggingFace Tokenizers solve related but different jobs. tiktoken is best for fast token counts in OpenAI apps. HuggingFace fits when you need the full chain for open-source models.
The speed gap is real — tiktoken runs 2-3x faster on single strings. But HuggingFace catches up with batch mode and gives you more tools for training and running models.
For multi-language apps, non-Latin text costs a lot more tokens. Keep that in mind when you plan your context budgets.
Frequently Asked Questions
How do I count tokens for chat messages with tiktoken?
Chat messages have extra overhead. Each one costs about 4 tokens on top of the text. Here’s a helper.
import tiktoken
def count_chat_tokens(messages, model="gpt-4"):
    enc = tiktoken.encoding_for_model(model)
    total = 0
    for msg in messages:
        total += 4  # role + formatting overhead
        total += len(enc.encode(msg["content"]))
    total += 2  # reply priming
    return total

messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "What is Python?"}
]
print(f"Total tokens: {count_chat_tokens(messages)}")
Output:
Total tokens: 17
Does tiktoken work offline?
tiktoken grabs its vocab files on first use. After that, they’re cached and it works offline. To be offline from the start, pre-load the files or add them to your Docker image.
Can I train a custom BPE tokenizer with tiktoken?
No. tiktoken only encodes and decodes. To train your own, use tokenizers.BpeTrainer from HuggingFace. You can then convert the result for faster use.
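Here is a minimal training sketch with the HuggingFace API; the corpus and vocab size are tiny, purely for illustration.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# An untrained BPE model plus a trainer with a small target vocab
tok = Tokenizer(models.BPE(unk_token="[UNK]"))
tok.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(vocab_size=300, special_tokens=["[UNK]"])

corpus = ["low lower lowest", "new newer newest", "wide wider widest"] * 50
tok.train_from_iterator(corpus, trainer=trainer)

print(tok.get_vocab_size(), tok.encode("newest widest").tokens)
```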
Is tiktoken’s cl100k_base the same as GPT-4’s tokenizer?
Yes. Both GPT-4 and GPT-3.5-Turbo use cl100k_base. GPT-4o uses the newer o200k_base with a 200K vocabulary. Use encoding_for_model() so you don’t need to remember these mappings.
What about SentencePiece tokenizers?
SentencePiece is a different tool, used by Llama and T5. HuggingFace works with it. tiktoken does not. If your model needs SentencePiece, use AutoTokenizer.from_pretrained() from transformers.
References
- OpenAI — tiktoken: Fast BPE tokeniser for OpenAI models. GitHub
- OpenAI — How to count tokens with tiktoken. Developer Cookbook
- HuggingFace — Tokenizers: Fast state-of-the-art tokenizers. GitHub
- HuggingFace — Summary of the tokenizers. Docs
- HuggingFace — Tiktoken and interaction with Transformers. Docs
- Sennrich, R., Haddow, B., Birch, A. — Neural Machine Translation of Rare Words with Subword Units. arXiv:1508.07909 (2016).
- Gage, P. — A New Algorithm for Data Compression. The C Users Journal (1994).
- OpenAI — tiktoken model.py: Model-to-encoding mapping. Source