OpenAI Batch API: Process 10K Prompts at 50% Cost
Master the OpenAI Batch API in Python: build a reusable pipeline for 10,000+ prompts at 50% cost with JSONL formatting, progress polling, and error handling.
The OpenAI Batch API lets you send up to 50,000 prompts in a single file, pay 50% less per token, and get every result back within 24 hours. Here’s how to build a full processing pipeline with error handling and cost tracking.
You have 10,000 product descriptions to generate. You fire up a loop, call the OpenAI API one by one, and watch your bill climb. Each request costs full price. Rate limits slow you down. If your script crashes at request 7,432? You start over.
There’s a better way. OpenAI’s Batch API bundles all 10,000 requests into one JSONL file. You upload it. Get every response within 24 hours — at half the price. No rate limit headaches. No babysitting a loop.
In this tutorial, you’ll build a reusable BatchProcessor class. It handles creating JSONL files, uploading them, polling for completion, downloading results, and retrying errors. All with raw HTTP requests — no SDK needed.
What Is the OpenAI Batch API?
Picture this. You need to classify 5,000 support tickets using GPT-4o. Calling the API one at a time means 5,000 individual requests. Each waits for a response before the next fires. Slow, expensive, fragile.
The Batch API flips this. You write all 5,000 requests into one file. Upload it. OpenAI processes them on its own time — using spare capacity — and hands back a file with all 5,000 responses.
The tradeoff? No instant responses. Results arrive within 24 hours. But in practice, small-to-medium batches finish in 1-2 hours. And you pay 50% less on every token — input and output.
In short: The OpenAI Batch API is an asynchronous file-processing service. You upload a JSONL file of API requests, OpenAI processes them within 24 hours at a 50% discount, and you download a JSONL file of responses. It supports chat completions, embeddings, and completions endpoints.
Here’s the comparison at a glance:
| Feature | Synchronous API | Batch API |
|---|---|---|
| Cost | Full price | 50% discount |
| Speed | Instant | Up to 24 hours |
| Rate limits | Standard | Separate, higher pool |
| Max requests | One at a time | 50,000 per batch |
| Max file size | N/A | 200 MB |
How Does the Batch API Work? A 5-Step Pipeline
Before we touch code, here’s the data flow. I’ll keep it brief because we build each step right after.
Step 1 — Build the JSONL file. Each line is one API request. You tag it with a custom_id for tracking.
Step 2 — Upload the file. Send JSONL to OpenAI’s Files API. You get a file_id back.
Step 3 — Create the batch. POST to /batches with your file_id. OpenAI validates and starts processing.
Step 4 — Poll for status. Check periodically: validating -> in_progress -> completed.
Step 5 — Download results. Grab the output file. Each line has the custom_id matched to its response.
# The 5-step Batch API flow
steps = [
"1. Build JSONL -> batch_input.jsonl",
"2. Upload file -> file_id",
"3. Create batch -> batch_id",
"4. Poll status -> completed",
"5. Download results -> {custom_id: response}"
]
for step in steps:
print(step)
1. Build JSONL -> batch_input.jsonl
2. Upload file -> file_id
3. Create batch -> batch_id
4. Poll status -> completed
5. Download results -> {custom_id: response}
That’s the whole pattern. Every batch workflow follows these five steps. Let’s build each piece.
Setting Up for Batch API Requests
Prerequisites
- Python version: 3.9+
- Required libraries: None beyond the standard library (`json`, `time`, `datetime`)
- API key: An OpenAI API key (create one in the OpenAI dashboard)
- Time to complete: 25 minutes
import json
import time
from datetime import datetime
# Configuration
API_KEY = "sk-your-api-key-here" # Replace with your key
BASE_URL = "https://api.openai.com/v1"
print("Setup complete!")
print(f"Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
Setup complete!
Timestamp: 2026-03-17 10:30:00
Step 1: Building the Batch API JSONL File
This is where most beginners trip up. The JSONL format is strict. One bad line rejects the whole batch.
Each line needs four fields:
- `custom_id` — Your tracking key. Use something descriptive like `product-SKU-1234`.
- `method` — Always `"POST"`.
- `url` — The endpoint path: `"/v1/chat/completions"`.
- `body` — The exact JSON body you’d send to the sync API.
The function below converts a list of prompts into batch-ready JSONL. It wraps each prompt in the required structure and assigns sequential IDs:
def create_batch_jsonl(prompts, model="gpt-4o-mini",
temperature=0.7, max_tokens=500):
"""Convert a list of prompts into JSONL batch format."""
lines = []
for i, prompt in enumerate(prompts):
request = {
"custom_id": f"request-{i+1}",
"method": "POST",
"url": "/v1/chat/completions",
"body": {
"model": model,
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": prompt}
],
"temperature": temperature,
"max_tokens": max_tokens
}
}
lines.append(json.dumps(request))
return "\n".join(lines)
Let’s try it with three ML prompts:
sample_prompts = [
"Summarize what gradient descent does in one sentence.",
"Explain the difference between L1 and L2 regularization.",
"What is the bias-variance tradeoff?"
]
jsonl_content = create_batch_jsonl(sample_prompts)
# Pretty-print the first request
first_line = json.loads(jsonl_content.split("\n")[0])
print("First request in the JSONL:")
print(json.dumps(first_line, indent=2))
print(f"\nTotal lines: {len(jsonl_content.split(chr(10)))}")
First request in the JSONL:
{
"custom_id": "request-1",
"method": "POST",
"url": "/v1/chat/completions",
"body": {
"model": "gpt-4o-mini",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Summarize what gradient descent does in one sentence."
}
],
"temperature": 0.7,
"max_tokens": 500
}
}
Total lines: 3
See the structure? The body field holds the exact payload you’d send to /v1/chat/completions synchronously. Converting existing code to batch format is a wrapping exercise.
Quick check: What happens if you forget the "method": "POST" field on one line? The whole batch fails validation. Every field is required on every line.
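One way to catch that before upload is a structural pre-check. The helper below is a sketch I'm adding for illustration (not part of the API), flagging any line that is missing one of the four required fields:

```python
import json

REQUIRED_FIELDS = ("custom_id", "method", "url", "body")

def check_required_fields(jsonl_content):
    """Return (line_number, missing_field) pairs for every problem found."""
    problems = []
    for i, line in enumerate(jsonl_content.strip().split("\n"), 1):
        record = json.loads(line)
        for field in REQUIRED_FIELDS:
            if field not in record:
                problems.append((i, field))
    return problems

# A line missing "method" is flagged before it can sink the whole batch
good = json.dumps({"custom_id": "r1", "method": "POST",
                   "url": "/v1/chat/completions", "body": {}})
bad = json.dumps({"custom_id": "r2",
                  "url": "/v1/chat/completions", "body": {}})
print(check_required_fields(good + "\n" + bad))  # [(2, 'method')]
```

Run this on your generated JSONL before uploading; an empty list means every line has the required shape.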
Step 2: Uploading the Batch File to OpenAI
With your JSONL ready, upload it to OpenAI’s Files API. You POST to /v1/files with the content and purpose: "batch". OpenAI hands back a file_id.
Here’s the upload function. The mock returns the exact response structure the real API produces:
MOCK_FILE_ID = "file-abc123def456"
def upload_batch_file(jsonl_content):
"""Upload JSONL to OpenAI Files API. Returns file response."""
# Production code (uncomment for real use):
# import requests
# resp = requests.post(f"{BASE_URL}/files",
# headers={"Authorization": f"Bearer {API_KEY}"},
# files={"file": ("batch.jsonl", jsonl_content, "application/jsonl")},
# data={"purpose": "batch"})
# return resp.json()
return {
"id": MOCK_FILE_ID,
"object": "file",
"bytes": len(jsonl_content.encode()),
"created_at": int(time.time()),
"filename": "batch_input.jsonl",
"purpose": "batch",
"status": "processed"
}
file_resp = upload_batch_file(jsonl_content)
print(f"File uploaded! ID: {file_resp['id']}")
print(f"Size: {file_resp['bytes']} bytes")
print(f"Status: {file_resp['status']}")
File uploaded! ID: file-abc123def456
Size: 687 bytes
Status: processed
A status of "processed" means OpenAI accepted the file. Malformed files show "error" instead.
Step 3: Creating a Batch API Job
Tell OpenAI to process your file. POST to /v1/batches with the input_file_id, the endpoint you’re targeting, and a completion_window.
One thing I want to flag. The completion_window only accepts "24h" right now. That’s not how long it takes — it’s a guarantee. If OpenAI can’t finish within 24 hours, it marks the batch as expired and refunds unprocessed requests. Most batches finish way sooner.
MOCK_BATCH_ID = "batch-789xyz"
def create_batch(file_id, endpoint="/v1/chat/completions",
description=""):
"""Create a batch job from an uploaded file."""
# Production code:
# resp = requests.post(f"{BASE_URL}/batches",
# headers={"Authorization": f"Bearer {API_KEY}",
# "Content-Type": "application/json"},
# json={"input_file_id": file_id,
# "endpoint": endpoint,
# "completion_window": "24h",
# "metadata": {"description": description}})
# return resp.json()
return {
"id": MOCK_BATCH_ID,
"object": "batch",
"endpoint": endpoint,
"input_file_id": file_id,
"completion_window": "24h",
"status": "validating",
"created_at": int(time.time()),
"request_counts": {"total": 3, "completed": 0, "failed": 0},
"metadata": {"description": description}
}
batch = create_batch(file_resp["id"], description="ML concept summaries")
print(f"Batch ID: {batch['id']}")
print(f"Status: {batch['status']}")
print(f"Requests: {batch['request_counts']['total']}")
Batch ID: batch-789xyz
Status: validating
Requests: 3
The initial status is "validating". OpenAI checks every JSONL line before it starts processing. The request_counts object tracks progress — watch the completed and failed fields.
Step 4: Polling Batch API Status with Backoff
After creating the batch, you poll its status. The batch moves through several states:
| Status | What It Means | Your Action |
|---|---|---|
| `validating` | Checking your JSONL | Wait |
| `in_progress` | Processing requests | Track progress |
| `finalizing` | Building output file | Almost done |
| `completed` | All done | Download results |
| `failed` | Validation error | Check errors |
| `expired` | Exceeded 24 hours | Get partial results |
| `cancelled` | You cancelled it | Get completed results |
I prefer exponential backoff for polling. Start at 10 seconds, double each time, cap at 2 minutes. You catch fast completions without hammering the API on long batches:
def poll_batch_status(batch_id, max_wait=86400, start_interval=10):
"""Poll batch status with exponential backoff."""
elapsed = 0
interval = start_interval
poll_count = 0
# Simulated status progression
statuses = ["validating", "in_progress", "in_progress", "completed"]
while elapsed < max_wait:
poll_count += 1
idx = min(poll_count - 1, len(statuses) - 1)
status = statuses[idx]
completed = {"validating": 0, "in_progress": 2,
"completed": 3}.get(status, 0)
batch_obj = {
"id": batch_id, "status": status,
"request_counts": {"total": 3, "completed": completed, "failed": 0},
"output_file_id": "file-output-999" if status == "completed" else None,
"error_file_id": None
}
pct = completed / 3 * 100
print(f" Poll #{poll_count} | {status} | {completed}/3 ({pct:.0f}%)")
if status in ("completed", "failed", "expired", "cancelled"):
return batch_obj
elapsed += interval
interval = min(interval * 2, 120)
return None
print("Polling batch status...")
final_batch = poll_batch_status(MOCK_BATCH_ID)
print(f"\nDone! Status: {final_batch['status']}")
Polling batch status...
Poll #1 | validating | 0/3 (0%)
Poll #2 | in_progress | 2/3 (67%)
Poll #3 | in_progress | 2/3 (67%)
Poll #4 | completed | 3/3 (100%)
Done! Status: completed
Why does backoff matter? Starting at 10s and doubling gives intervals of 10, 20, 40, 80, 120 (capped). For a 2-hour batch, that’s ~60 polls instead of 720 with a fixed 10-second wait.
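You can check that arithmetic with a short simulation of the same schedule (a sketch mirroring the loop above):

```python
def count_polls(duration_s, start=10, cap=120):
    """Count how many polls a batch of the given duration needs
    when the interval doubles each time, up to a cap."""
    polls, elapsed, interval = 0, 0, start
    while elapsed < duration_s:
        polls += 1
        elapsed += interval
        interval = min(interval * 2, cap)
    return polls

two_hours = 2 * 60 * 60
print(f"Backoff:   {count_polls(two_hours)} polls")  # 63
print(f"Fixed 10s: {two_hours // 10} polls")         # 720
```

Roughly a 10x reduction in API calls for the same batch, with no loss in responsiveness for fast batches.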
Step 5: Downloading and Parsing Batch Results
The completed batch gives you an output_file_id. Download that file and you get JSONL back — one line per request. Each includes your custom_id matched to the full response.
This function fetches results and tallies token usage for cost tracking:
def download_results(batch_obj):
"""Download and parse batch output file."""
# Mock: realistic response structure
mock_lines = [
{"custom_id": "request-1", "response": {
"status_code": 200, "body": {
"choices": [{"message": {"content":
"Gradient descent adjusts model parameters step by step, "
"moving toward lower loss each time."
}, "finish_reason": "stop"}],
"usage": {"prompt_tokens": 25, "completion_tokens": 18,
"total_tokens": 43}
}}},
{"custom_id": "request-2", "response": {
"status_code": 200, "body": {
"choices": [{"message": {"content":
"L1 adds absolute weight penalties (encourages sparsity). "
"L2 adds squared weight penalties (encourages small values)."
}, "finish_reason": "stop"}],
"usage": {"prompt_tokens": 24, "completion_tokens": 25,
"total_tokens": 49}
}}},
{"custom_id": "request-3", "response": {
"status_code": 200, "body": {
"choices": [{"message": {"content":
"Simple models underfit (high bias). Complex models overfit "
"(high variance). The goal is the sweet spot between them."
}, "finish_reason": "stop"}],
"usage": {"prompt_tokens": 22, "completion_tokens": 28,
"total_tokens": 50}
}}}
]
results = {}
total_tokens = 0
for item in mock_lines:
cid = item["custom_id"]
body = item["response"]["body"]
results[cid] = {
"content": body["choices"][0]["message"]["content"],
"tokens": body["usage"]["total_tokens"],
"status": item["response"]["status_code"]
}
total_tokens += body["usage"]["total_tokens"]
return results, total_tokens
results, tokens_used = download_results(final_batch)
print(f"Results: {len(results)} | Tokens: {tokens_used}\n")
for cid, r in results.items():
print(f"[{cid}] {r['content'][:70]}...")
Results: 3 | Tokens: 142
[request-1] Gradient descent adjusts model parameters step by step, moving tow...
[request-2] L1 adds absolute weight penalties (encourages sparsity). L2 adds s...
[request-3] Simple models underfit (high bias). Complex models overfit (high va...
Each result maps to its custom_id. The tokens field tells you what each request consumed — essential for cost tracking.
Exercise 1: Build a Batch Sentiment Classifier
You’ve seen all five steps. Time to practice. Your task: create a JSONL batch that classifies product reviews as “positive”, “negative”, or “neutral”.
Requirements:
– Model: gpt-4o-mini
– Temperature: 0 (we need consistent labels)
– Max tokens: 10 (one-word response)
– System prompt: tell the model to reply with exactly one word
– Custom IDs: review-1, review-2, etc.
def create_sentiment_batch(reviews):
"""Create JSONL for batch sentiment classification."""
lines = []
for i, review in enumerate(reviews):
request = {
"custom_id": f"review-{i+1}",
"method": "POST",
"url": "/v1/chat/completions",
"body": {
# YOUR CODE: model, messages, temperature, max_tokens
}
}
lines.append(json.dumps(request))
return "\n".join(lines)
# Test it
test_reviews = [
"This laptop is amazing! Best purchase all year.",
"Terrible quality. Broke after two days.",
"It's okay. Nothing special but gets the job done."
]
jsonl = create_sentiment_batch(test_reviews)
first = json.loads(jsonl.split("\n")[0])
print(f"Model: {first['body']['model']}")
print(f"Temperature: {first['body']['temperature']}")
print(f"Max tokens: {first['body']['max_tokens']}")
print(f"Custom ID: {first['custom_id']}")
print(f"Total requests: {len(jsonl.split(chr(10)))}")
Expected output:
Model: gpt-4o-mini
Temperature: 0
Max tokens: 10
Custom ID: review-1
Total requests: 3
Handling Batch API Errors: Two Levels to Watch
Real batches don’t always go perfectly. I’ve seen production batches with 10,000 requests where 30-50 fail for various reasons. You need to handle both levels.
Batch-level errors kill the whole job. Causes: malformed JSONL, wrong endpoint, invalid model name. The status goes to "failed".
Request-level errors are sneakier. The batch completes, but some lines have error responses instead of completions. You find these in the error file — or by checking status codes in the output.
This parser separates successes from failures and gives you a clear report:
def parse_results_with_errors():
"""Parse batch output, separating successes from failures."""
mock_results = [
{"custom_id": "req-001", "response": {
"status_code": 200, "body": {
"choices": [{"message": {"content": "Response 1"}}],
"usage": {"total_tokens": 45}}}},
{"custom_id": "req-002", "response": {
"status_code": 200, "body": {
"choices": [{"message": {"content": "Response 2"}}],
"usage": {"total_tokens": 52}}}},
{"custom_id": "req-003", "response": {
"status_code": 400, "body": {
"error": {"message": "Invalid 'messages': expected at least 1 message.",
"type": "invalid_request_error"}}}}
]
successes, failures = {}, {}
for item in mock_results:
cid = item["custom_id"]
code = item["response"]["status_code"]
if code == 200:
body = item["response"]["body"]
successes[cid] = body["choices"][0]["message"]["content"]
else:
err = item["response"]["body"]["error"]
failures[cid] = f"{err['type']}: {err['message']}"
print(f"Successes: {len(successes)} | Failures: {len(failures)}")
if failures:
print("\nFailed requests:")
for cid, msg in failures.items():
print(f" [{cid}] {msg}")
return successes, failures
ok, failed = parse_results_with_errors()
Successes: 2 | Failures: 1
Failed requests:
[req-003] invalid_request_error: Invalid 'messages': expected at least 1 message.
Exercise 2: Build a Batch Retry Function
After a batch completes, some requests may have failed. Write a function that creates a retry JSONL from just the failed requests.
def create_retry_batch(original_prompts, failures):
"""Create a retry JSONL from failed request IDs.
Args:
original_prompts: Dict of custom_id -> prompt text
failures: Dict of custom_id -> error info
Returns:
JSONL string with only failed requests
"""
# YOUR CODE HERE
pass
# Test data
originals = {
"req-00001": "Explain overfitting.",
"req-00002": "What is bagging?",
"req-00003": "Define boosting.",
"req-00004": "Explain dropout.",
"req-00005": "What is batch normalization?"
}
failed_reqs = {
"req-00002": {"error": "rate_limit_exceeded"},
"req-00004": {"error": "server_error"}
}
retry = create_retry_batch(originals, failed_reqs)
lines = retry.strip().split("\n")
print(f"Retry batch size: {len(lines)}")
for line in lines:
parsed = json.loads(line)
print(f" Retrying: {parsed['custom_id']}")
Expected output:
Retry batch size: 2
Retrying: req-00002
Retrying: req-00004
Building a Reusable BatchProcessor Class
We’ve built each step as a standalone function. In production, you want one class that ties everything together. I’ll walk you through a BatchProcessor with three main methods: prepare(), submit(), and wait_and_download().
The constructor sets up tracking variables for stats and cost estimation:
class BatchProcessor:
"""Reusable pipeline for OpenAI Batch API.
Usage:
bp = BatchProcessor(api_key="sk-...")
bp.prepare(prompts, model="gpt-4o-mini")
bp.submit(description="Product descriptions")
results = bp.wait_and_download()
bp.summary()
"""
def __init__(self, api_key="sk-demo"):
self.api_key = api_key
self.jsonl_content = None
self.file_id = None
self.batch_id = None
self.results = None
self.stats = {"total": 0, "completed": 0, "failed": 0,
"tokens": 0, "start": None, "end": None}
The prepare() method builds JSONL from your prompts. It mirrors our standalone function:
def prepare(self, prompts, model="gpt-4o-mini",
system_prompt="You are a helpful assistant.",
temperature=0.7, max_tokens=500):
"""Build JSONL from a list of prompts."""
lines = []
for i, prompt in enumerate(prompts):
req = {
"custom_id": f"req-{i+1:05d}",
"method": "POST",
"url": "/v1/chat/completions",
"body": {
"model": model,
"messages": [
{"role": "system", "content": system_prompt},
{"role": "user", "content": prompt}
],
"temperature": temperature,
"max_tokens": max_tokens
}
}
lines.append(json.dumps(req))
self.jsonl_content = "\n".join(lines)
self.stats["total"] = len(prompts)
print(f"Prepared {len(prompts)} requests ({len(self.jsonl_content):,} bytes)")
return self
The submit() and wait_and_download() methods handle upload, batch creation, polling, and result retrieval:
def submit(self, description=""):
"""Upload file and create batch."""
if not self.jsonl_content:
raise ValueError("Call prepare() first")
self.stats["start"] = datetime.now()
self.file_id = "file-batch-001" # Mock
self.batch_id = "batch-proc-001" # Mock
print(f"Uploaded: {self.file_id} | Batch: {self.batch_id}")
return self
def wait_and_download(self):
"""Poll for completion and download results."""
if not self.batch_id:
raise ValueError("Call submit() first")
print(f"Polling {self.batch_id}...")
for status in ["validating", "in_progress", "completed"]:
print(f" -> {status}")
self.stats["end"] = datetime.now()
self.results = {}
for i in range(self.stats["total"]):
cid = f"req-{i+1:05d}"
self.results[cid] = {
"content": f"Response for request {i+1}",
"tokens": 45 + i * 3, "status": 200
}
self.stats["completed"] = len(self.results)
self.stats["tokens"] = sum(r["tokens"] for r in self.results.values())
print(f"Done! {len(self.results)} results.")
return self.results
The summary() method calculates the 50% savings so you can see exactly what the batch saved:
def summary(self):
"""Print processing summary with cost estimate."""
t = self.stats["tokens"]
# gpt-4o-mini: ~$0.375/1M tokens (blended input+output)
sync_cost = (t / 1_000_000) * 0.375
batch_cost = sync_cost * 0.5
print("\n" + "=" * 45)
print("BATCH PROCESSING SUMMARY")
print("=" * 45)
print(f"Requests: {self.stats['total']}")
print(f"Completed: {self.stats['completed']}")
print(f"Failed: {self.stats['failed']}")
print(f"Tokens: {t:,}")
print(f"Sync cost: ${sync_cost:.4f}")
print(f"Batch cost: ${batch_cost:.4f}")
print(f"You saved: ${sync_cost - batch_cost:.4f} (50%)")
print("=" * 45)
Here’s the full pipeline in action:
bp = BatchProcessor()
prompts = [
"Explain overfitting in one sentence.",
"What does a confusion matrix show?",
"Define precision vs recall.",
"What is cross-validation?",
"Explain the purpose of a learning rate."
]
bp.prepare(prompts)
bp.submit(description="ML concept quiz")
bp.wait_and_download()
bp.summary()
Prepared 5 requests (2,345 bytes)
Uploaded: file-batch-001 | Batch: batch-proc-001
Polling batch-proc-001...
-> validating
-> in_progress
-> completed
Done! 5 results.
=============================================
BATCH PROCESSING SUMMARY
=============================================
Requests: 5
Completed: 5
Failed: 0
Tokens: 255
Sync cost: $0.0001
Batch cost: $0.0000
You saved: $0.0000 (50%)
=============================================
Small batch, tiny savings. But scale it up and the numbers change fast.
Batch API Cost Savings: The Math at Scale
At small volumes, the 50% discount barely registers. At 10,000+ prompts, it adds up.
Here’s the breakdown for gpt-4o-mini. I’m assuming 100 input tokens and 150 output tokens per request — typical for classification or short generation:
def cost_table(counts, input_tok=100, output_tok=150):
"""Show sync vs batch costs at different scales."""
# gpt-4o-mini pricing (March 2026)
inp_rate = 0.15 # $/1M input tokens
out_rate = 0.60 # $/1M output tokens
print(f"{'Prompts':>10} {'Sync':>10} {'Batch':>10} {'Saved':>10}")
print("-" * 44)
for n in counts:
inp_cost = (n * input_tok / 1e6) * inp_rate
out_cost = (n * output_tok / 1e6) * out_rate
sync = inp_cost + out_cost
batch = sync * 0.5
print(f"{n:>10,} ${sync:>9.4f} ${batch:>9.4f} ${sync-batch:>9.4f}")
cost_table([100, 1000, 5000, 10000, 50000])
Prompts Sync Batch Saved
--------------------------------------------
100 $ 0.0105 $ 0.0053 $ 0.0053
1,000 $ 0.1050 $ 0.0525 $ 0.0525
5,000 $ 0.5250 $ 0.2625 $ 0.2625
10,000 $ 1.0500 $ 0.5250 $ 0.5250
50,000 $ 5.2500 $ 2.6250 $ 2.6250
With gpt-4o-mini, 10,000 prompts saves about $0.53. Modest. But switch to gpt-4o and the savings jump:
def gpt4o_cost(n=10000, input_tok=100, output_tok=150):
"""Cost comparison for gpt-4o at scale."""
# gpt-4o: $2.50/1M input, $10.00/1M output
inp = (n * input_tok / 1e6) * 2.50
out = (n * output_tok / 1e6) * 10.00
sync = inp + out
batch = sync * 0.5
print(f"gpt-4o with {n:,} prompts:")
print(f" Sync cost: ${sync:.2f}")
print(f" Batch cost: ${batch:.2f}")
print(f" You save: ${sync - batch:.2f}")
gpt4o_cost(10000)
gpt-4o with 10,000 prompts:
Sync cost: $17.50
Batch cost: $8.75
You save: $8.75
$8.75 saved on a single batch run. Run that weekly and you save $455 per year.
Handling Large Batches: Chunking Strategy
What if you have 200,000 prompts? A single batch has a 50,000-request cap and a 200 MB file limit. You need to split them into chunks.
Here’s a simple chunking approach:
def chunk_prompts(prompts, chunk_size=50000):
"""Split a large prompt list into batch-sized chunks."""
chunks = []
for i in range(0, len(prompts), chunk_size):
chunk = prompts[i:i + chunk_size]
chunks.append(chunk)
print(f"Split {len(prompts):,} prompts into {len(chunks)} batches")
return chunks
# Demo: 120,000 prompts -> 3 batches
big_list = [f"Prompt {i}" for i in range(120000)]
batches = chunk_prompts(big_list)
for i, batch in enumerate(batches):
print(f" Batch {i+1}: {len(batch):,} prompts")
Split 120,000 prompts into 3 batches
Batch 1: 50,000 prompts
Batch 2: 50,000 prompts
Batch 3: 20,000 prompts
Submit each chunk as a separate batch. Merge the results using custom_id. I recommend adding the batch number to your IDs (batch1-req-00001) so you can trace any issues.
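A merge step along those lines might look like this (a hypothetical helper; the prefix convention is the one suggested above):

```python
def merge_chunk_results(chunk_results):
    """Merge per-batch result dicts into one, prefixing each
    custom_id with its batch number so keys never collide."""
    merged = {}
    for batch_num, results in enumerate(chunk_results, 1):
        for custom_id, response in results.items():
            merged[f"batch{batch_num}-{custom_id}"] = response
    return merged

# Two mock chunks whose custom_ids overlap
chunk1 = {"req-00001": "resp A", "req-00002": "resp B"}
chunk2 = {"req-00001": "resp C"}
merged = merge_chunk_results([chunk1, chunk2])
print(sorted(merged))
# ['batch1-req-00001', 'batch1-req-00002', 'batch2-req-00001']
```

Even if two chunks reuse `req-00001`, the prefixed keys keep every result addressable.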
When NOT to Use the Batch API
I want to be straight with you. The Batch API isn’t always the right call.
Real-time apps? Definitely not. Chatbots, autocomplete, interactive tools — anything where a user is waiting needs the sync API.
Fewer than 100 requests? Probably not worth it. The overhead of creating a file, uploading, polling, and downloading outweighs the savings.
Conversational chains? Can’t batch these. When request N depends on the response from request N-1, you need sync calls.
Time-sensitive data? If your prompts reference live data that changes within hours, responses might be stale when they arrive.
| Use Case | Batch? | Why |
|---|---|---|
| 10K product descriptions | Yes | Bulk work, no time pressure |
| 5K ticket classifications | Yes | Offline analysis |
| Chatbot responses | No | Must be instant |
| 500 document summaries | Yes | Offline job |
| Real-time moderation | No | Latency-critical |
| 2K essay grades | Yes | Not time-sensitive |
Common Batch API Mistakes and How to Fix Them
Mistake 1: Writing JSON arrays instead of JSONL
The number one failure. JSONL means one JSON object per line. Not a JSON array. If you use json.dumps(list_of_dicts), you get an array that OpenAI rejects.
requests_list = [{"custom_id": "r1"}, {"custom_id": "r2"}]
bad = json.dumps(requests_list)
print("WRONG (JSON array):")
print(bad)
good = "\n".join(json.dumps(r) for r in requests_list)
print("\nCORRECT (JSONL):")
print(good)
WRONG (JSON array):
[{"custom_id": "r1"}, {"custom_id": "r2"}]
CORRECT (JSONL):
{"custom_id": "r1"}
{"custom_id": "r2"}
Mistake 2: Mismatched endpoint URLs
The url field inside each JSONL line must match the endpoint you set when creating the batch. Using the legacy /v1/completions with a chat model? Silent validation failure.
wrong = {"url": "/v1/completions"} # Legacy endpoint
right = {"url": "/v1/chat/completions"} # Chat models
print(f"Wrong: {wrong['url']}")
print(f"Right: {right['url']}")
Wrong: /v1/completions
Right: /v1/chat/completions
Mistake 3: Ignoring the error file after completion
A "completed" status means the batch finished — not that every request succeeded. With 10,000 requests, 50 might have failed. If you skip the error file, you silently lose those results.
def check_for_errors(batch_status):
"""Always check request_counts.failed after completion."""
counts = batch_status.get("request_counts", {})
failed = counts.get("failed", 0)
total = counts.get("total", 0)
if failed > 0:
pct = failed / total * 100
print(f"WARNING: {failed}/{total} failed ({pct:.1f}%)")
print("Download the error file!")
else:
print(f"All {total} requests succeeded.")
check_for_errors({"request_counts": {"total": 10000, "completed": 9950, "failed": 50}})
WARNING: 50/10000 failed (0.5%)
Download the error file!
Batch API Error Troubleshooting
Error: Invalid 'messages' field
{"error": {"message": "Invalid 'messages': expected array of message objects"}}
Fix: Each message needs role and content keys. Make sure messages is a list of dicts.
Error: Unrecognized request argument: prompt
{"error": {"message": "Unrecognized request argument supplied: prompt"}}
Fix: Chat models use messages, not prompt. Update your JSONL body.
Error: Batch file validation failed: line N is not valid JSON
{"error": {"message": "Batch file validation failed: line 47 is not valid JSON"}}
Fix: Line 47 has a syntax error. Validate each line before uploading:
def validate_jsonl(content):
"""Check every line is valid JSON before uploading."""
errors = []
for i, line in enumerate(content.strip().split("\n"), 1):
try:
json.loads(line)
except json.JSONDecodeError as e:
errors.append(f"Line {i}: {e}")
if errors:
print(f"Found {len(errors)} invalid lines:")
for err in errors:
print(f" {err}")
else:
print("All lines valid!")
return len(errors) == 0
validate_jsonl(jsonl_content)
All lines valid!
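Syntax isn't the only thing worth validating locally. A body-level check in the same spirit (a sketch of my own) catches the first two errors above before you upload:

```python
import json

def check_bodies(jsonl_content):
    """Flag lines that use 'prompt' or have a malformed 'messages' list."""
    issues = []
    for i, line in enumerate(jsonl_content.strip().split("\n"), 1):
        body = json.loads(line).get("body", {})
        if "prompt" in body:
            issues.append(f"line {i}: chat models take 'messages', not 'prompt'")
        msgs = body.get("messages")
        if not isinstance(msgs, list) or not msgs or not all(
                isinstance(m, dict) and "role" in m and "content" in m
                for m in msgs):
            issues.append(f"line {i}: 'messages' must be a non-empty list "
                          "of dicts with 'role' and 'content'")
    return issues

bad_line = json.dumps({"custom_id": "r1", "method": "POST",
                       "url": "/v1/chat/completions",
                       "body": {"model": "gpt-4o-mini", "prompt": "Hi"}})
for issue in check_bodies(bad_line):
    print(issue)
```

Running both checks costs seconds; a rejected batch costs a resubmission cycle.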
Summary
You now have a full toolkit for the OpenAI Batch API:
- JSONL format — Each line is a self-contained request with `custom_id`, `method`, `url`, and `body`.
- 5-step workflow — Build JSONL, upload, create batch, poll, download.
- BatchProcessor class — Reusable pipeline with progress tracking and cost estimates.
- Error handling — Batch-level and request-level errors, plus retry logic.
- Cost savings — 50% on all tokens. Biggest impact at 10K+ requests with `gpt-4o`.
The pattern fits any non-real-time use case: bulk generation, data labeling, classification, summarization, translation, evaluation.
Practice challenge: Extend the BatchProcessor with auto-retry. When wait_and_download() finds failed requests, it creates a retry batch automatically and merges the results.
Frequently Asked Questions
Can I cancel a running batch?
Yes. POST to /v1/batches/{batch_id}/cancel. Status changes to "cancelling", then "cancelled". Completed results stay downloadable. You’re charged only for what finished.
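In the mock style used throughout this tutorial, a cancel helper looks like this (the commented production call uses the endpoint named above; the returned object is a simplified mock):

```python
API_KEY = "sk-your-api-key-here"  # Replace with your key
BASE_URL = "https://api.openai.com/v1"

def cancel_batch(batch_id):
    """Cancel a running batch. Returns the updated batch object."""
    # Production code (uncomment for real use):
    # import requests
    # resp = requests.post(f"{BASE_URL}/batches/{batch_id}/cancel",
    #                      headers={"Authorization": f"Bearer {API_KEY}"})
    # return resp.json()
    return {"id": batch_id, "object": "batch", "status": "cancelling"}

resp = cancel_batch("batch-789xyz")
print(f"{resp['id']} -> {resp['status']}")
# batch-789xyz -> cancelling
```

Poll the batch afterward; once it settles on `"cancelled"`, any completed results are still there to download.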
Which models support the Batch API?
All GPT-4o variants, GPT-4o-mini, GPT-4-turbo, and GPT-3.5-turbo. Embeddings and completions endpoints work too. The 50% discount applies across all supported models. Check OpenAI’s pricing page for the current list.
Does the Batch API work for embeddings?
Yes. Set the endpoint to /v1/embeddings. Each line’s body needs "model" and "input" fields instead of "messages". Same 50% discount.
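Following the same pattern as `create_batch_jsonl`, an embeddings variant only needs a different body (a hypothetical helper; `text-embedding-3-small` is one of OpenAI's embedding models):

```python
import json

def create_embedding_jsonl(texts, model="text-embedding-3-small"):
    """Convert a list of texts into embeddings batch JSONL."""
    lines = []
    for i, text in enumerate(texts):
        lines.append(json.dumps({
            "custom_id": f"embed-{i+1}",
            "method": "POST",
            "url": "/v1/embeddings",          # note: not /v1/chat/completions
            "body": {"model": model, "input": text}
        }))
    return "\n".join(lines)

jsonl = create_embedding_jsonl(["gradient descent", "regularization"])
first = json.loads(jsonl.split("\n")[0])
print(first["url"], "|", first["body"]["input"])
# /v1/embeddings | gradient descent
```

Remember to set `endpoint="/v1/embeddings"` when creating the batch, so it matches the `url` on every line.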
What happens when a batch expires?
OpenAI marks it "expired". Any results that finished are still downloadable. Unprocessed requests aren’t charged. Submit a new batch for the remainder.
Is there a limit on how many batches I can run at once?
No hard cap on batch count, but there’s a limit on total enqueued requests. The cap varies by account tier. Hit the limit and you get a 429 response. Check your limits at platform.openai.com.
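A simple way to cope is to retry batch creation with a wait whenever you see a 429. The sketch below is my own policy, demonstrated with a fake creator function rather than the live API:

```python
import time

def create_batch_with_retry(create_fn, file_id, max_retries=5, wait=60):
    """Retry batch creation while the enqueued-request limit returns 429."""
    for attempt in range(1, max_retries + 1):
        resp = create_fn(file_id)
        if resp.get("status_code") != 429:
            return resp
        print(f"Queue full (429), attempt {attempt}, waiting {wait}s")
        time.sleep(wait)
        wait *= 2  # back off between attempts
    raise RuntimeError("Batch queue still full after retries")

# Demo: a fake creator that returns 429 once, then succeeds
calls = {"n": 0}
def fake_create(file_id):
    calls["n"] += 1
    if calls["n"] == 1:
        return {"status_code": 429}
    return {"id": "batch-ok", "status": "validating"}

print(create_batch_with_retry(fake_create, "file-1", wait=0)["id"])
# batch-ok
```

In production, pass in a real `create_batch` wrapper that surfaces the HTTP status code alongside the response body.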
References
- OpenAI Batch API Documentation — Official guide.
- OpenAI API Pricing — Model pricing with batch discounts.
- OpenAI Files API Reference — File upload endpoints.
- OpenAI Batch API FAQ — Common questions.
- JSON Lines Format Specification.
- OpenAI Rate Limits Guide.
- OpenAI Cost Optimization Guide.