OpenAI Batch API: Process 10K Prompts at 50% Cost
Master the OpenAI Batch API in Python: build a reusable pipeline for 10,000+ prompts at 50% cost with JSONL formatting, progress polling, and error handling.
The OpenAI Batch API lets you send up to 50,000 prompts in a single file, pay 50% less per token, and get every result back within 24 hours. Here’s how to build a full processing pipeline with error handling and cost tracking.
You have 10,000 product descriptions to generate. You fire up a loop, call the OpenAI API one by one, and watch your bill climb. Each request costs full price. Rate limits slow you down. If your script crashes at request 7,432? You start over.
There’s a better way. OpenAI’s Batch API bundles all 10,000 requests into one JSONL file. You upload it. Get every response within 24 hours — at half the price. No rate limit headaches. No babysitting a loop.
In this tutorial, you’ll build a reusable BatchProcessor class. It handles creating JSONL files, uploading them, polling for completion, downloading results, and retrying errors. All with raw HTTP requests — no SDK needed.
What Is the OpenAI Batch API?
Picture this. You need to classify 5,000 support tickets using GPT-4o. Calling the API one at a time means 5,000 individual requests. Each waits for a response before the next fires. Slow, expensive, fragile.
The Batch API flips this. You write all 5,000 requests into one file. Upload it. OpenAI processes them on its own time — using spare capacity — and hands back a file with all 5,000 responses.
The tradeoff? No instant responses. Results arrive within 24 hours. But in practice, small-to-medium batches finish in 1-2 hours. And you pay 50% less on every token — input and output.
In short: The OpenAI Batch API is an asynchronous file-processing service. You upload a JSONL file of API requests, OpenAI processes them within 24 hours at a 50% discount, and you download a JSONL file of responses. It supports chat completions, embeddings, and completions endpoints.
Here’s the comparison at a glance:
| Feature | Synchronous API | Batch API |
|---|---|---|
| Cost | Full price | 50% discount |
| Speed | Instant | Up to 24 hours |
| Rate limits | Standard | Separate, higher pool |
| Max requests | One at a time | 50,000 per batch |
| Max file size | N/A | 200 MB |
How Does the Batch API Work? A 5-Step Pipeline
Before we touch code, here’s the data flow. I’ll keep it brief because we build each step right after.
Step 1 — Build the JSONL file. Each line is one API request. You tag it with a custom_id for tracking.
Step 2 — Upload the file. Send JSONL to OpenAI’s Files API. You get a file_id back.
Step 3 — Create the batch. POST to /batches with your file_id. OpenAI validates and starts processing.
Step 4 — Poll for status. Check periodically: validating -> in_progress -> completed.
Step 5 — Download results. Grab the output file. Each line has the custom_id matched to its response.
# The 5-step Batch API flow
steps = [
"1. Build JSONL -> batch_input.jsonl",
"2. Upload file -> file_id",
"3. Create batch -> batch_id",
"4. Poll status -> completed",
"5. Download results -> {custom_id: response}"
]
for step in steps:
print(step)
1. Build JSONL -> batch_input.jsonl
2. Upload file -> file_id
3. Create batch -> batch_id
4. Poll status -> completed
5. Download results -> {custom_id: response}
That’s the whole pattern. Every batch workflow follows these five steps. Let’s build each piece.
Setting Up for Batch API Requests
Prerequisites
- Python version: 3.9+
- Required libraries: None beyond the standard library (`json`, `time`, `datetime`)
- API key: An OpenAI API key (create one in the OpenAI dashboard)
- Time to complete: 25 minutes
import json
import time
from datetime import datetime
# Configuration
API_KEY = "sk-your-api-key-here" # Replace with your key
BASE_URL = "https://api.openai.com/v1"
print("Setup complete!")
print(f"Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
Setup complete!
Timestamp: 2026-03-17 10:30:00
Step 1: Building the Batch API JSONL File
This is where most beginners trip up. The JSONL format is strict. One bad line rejects the whole batch.
Each line needs four fields:
- `custom_id` — Your tracking key. Use something descriptive like `product-SKU-1234`.
- `method` — Always `"POST"`.
- `url` — The endpoint path: `"/v1/chat/completions"`.
- `body` — The exact JSON body you’d send to the sync API.
The function below converts a list of prompts into batch-ready JSONL. It wraps each prompt in the required structure and assigns sequential IDs:
def create_batch_jsonl(prompts, model="gpt-4o-mini",
temperature=0.7, max_tokens=500):
"""Convert a list of prompts into JSONL batch format."""
lines = []
for i, prompt in enumerate(prompts):
request = {
"custom_id": f"request-{i+1}",
"method": "POST",
"url": "/v1/chat/completions",
"body": {
"model": model,
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": prompt}
],
"temperature": temperature,
"max_tokens": max_tokens
}
}
lines.append(json.dumps(request))
return "\n".join(lines)
Let’s try it with three ML prompts:
sample_prompts = [
"Summarize what gradient descent does in one sentence.",
"Explain the difference between L1 and L2 regularization.",
"What is the bias-variance tradeoff?"
]
jsonl_content = create_batch_jsonl(sample_prompts)
# Pretty-print the first request
first_line = json.loads(jsonl_content.split("\n")[0])
print("First request in the JSONL:")
print(json.dumps(first_line, indent=2))
print(f"\nTotal lines: {len(jsonl_content.split(chr(10)))}")
First request in the JSONL:
{
"custom_id": "request-1",
"method": "POST",
"url": "/v1/chat/completions",
"body": {
"model": "gpt-4o-mini",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Summarize what gradient descent does in one sentence."
}
],
"temperature": 0.7,
"max_tokens": 500
}
}
Total lines: 3
See the structure? The body field holds the exact payload you’d send to /v1/chat/completions synchronously. Converting existing code to batch format is a wrapping exercise.
Quick check: What happens if you forget the "method": "POST" field on one line? The whole batch fails validation. Every field is required on every line.
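One way to catch that before upload is a structural pre-check. The helper below is a sketch I'm adding for illustration (not part of the API), flagging any line that is missing one of the four required fields:

```python
import json

REQUIRED_FIELDS = ("custom_id", "method", "url", "body")

def check_required_fields(jsonl_content):
    """Return (line_number, missing_field) pairs for every problem found."""
    problems = []
    for i, line in enumerate(jsonl_content.strip().split("\n"), 1):
        record = json.loads(line)
        for field in REQUIRED_FIELDS:
            if field not in record:
                problems.append((i, field))
    return problems

# A line missing "method" is flagged before it can sink the whole batch
good = json.dumps({"custom_id": "r1", "method": "POST",
                   "url": "/v1/chat/completions", "body": {}})
bad = json.dumps({"custom_id": "r2",
                  "url": "/v1/chat/completions", "body": {}})
print(check_required_fields(good + "\n" + bad))  # [(2, 'method')]
```

Run this on your generated JSONL before uploading; an empty list means every line has the required shape.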
Step 2: Uploading the Batch File to OpenAI
With your JSONL ready, upload it to OpenAI’s Files API. You POST to /v1/files with the content and purpose: "batch". OpenAI hands back a file_id.
Here’s the upload function. The mock returns the exact response structure the real API produces:
MOCK_FILE_ID = "file-abc123def456"
def upload_batch_file(jsonl_content):
"""Upload JSONL to OpenAI Files API. Returns file response."""
# Production code (uncomment for real use):
# import requests
# resp = requests.post(f"{BASE_URL}/files",
# headers={"Authorization": f"Bearer {API_KEY}"},
# files={"file": ("batch.jsonl", jsonl_content, "application/jsonl")},
# data={"purpose": "batch"})
# return resp.json()
return {
"id": MOCK_FILE_ID,
"object": "file",
"bytes": len(jsonl_content.encode()),
"created_at": int(time.time()),
"filename": "batch_input.jsonl",
"purpose": "batch",
"status": "processed"
}
file_resp = upload_batch_file(jsonl_content)
print(f"File uploaded! ID: {file_resp['id']}")
print(f"Size: {file_resp['bytes']} bytes")
print(f"Status: {file_resp['status']}")
File uploaded! ID: file-abc123def456
Size: 687 bytes
Status: processed
A status of "processed" means OpenAI accepted the file. Malformed files show "error" instead.
Step 3: Creating a Batch API Job
Tell OpenAI to process your file. POST to /v1/batches with the input_file_id, the endpoint you’re targeting, and a completion_window.
One thing I want to flag. The completion_window only accepts "24h" right now. That’s not how long it takes — it’s a guarantee. If OpenAI can’t finish within 24 hours, it marks the batch as expired and refunds unprocessed requests. Most batches finish way sooner.
MOCK_BATCH_ID = "batch-789xyz"
def create_batch(file_id, endpoint="/v1/chat/completions",
description=""):
"""Create a batch job from an uploaded file."""
# Production code:
# resp = requests.post(f"{BASE_URL}/batches",
# headers={"Authorization": f"Bearer {API_KEY}",
# "Content-Type": "application/json"},
# json={"input_file_id": file_id,
# "endpoint": endpoint,
# "completion_window": "24h",
# "metadata": {"description": description}})
# return resp.json()
return {
"id": MOCK_BATCH_ID,
"object": "batch",
"endpoint": endpoint,
"input_file_id": file_id,
"completion_window": "24h",
"status": "validating",
"created_at": int(time.time()),
"request_counts": {"total": 3, "completed": 0, "failed": 0},
"metadata": {"description": description}
}
batch = create_batch(file_resp["id"], description="ML concept summaries")
print(f"Batch ID: {batch['id']}")
print(f"Status: {batch['status']}")
print(f"Requests: {batch['request_counts']['total']}")
Batch ID: batch-789xyz
Status: validating
Requests: 3
The initial status is "validating". OpenAI checks every JSONL line before it starts processing. The request_counts object tracks progress — watch the completed and failed fields.
Step 4: Polling Batch API Status with Backoff
After creating the batch, you poll its status. The batch moves through several states:
| Status | What It Means | Your Action |
|---|---|---|
| `validating` | Checking your JSONL | Wait |
| `in_progress` | Processing requests | Track progress |
| `finalizing` | Building output file | Almost done |
| `completed` | All done | Download results |
| `failed` | Validation error | Check errors |
| `expired` | Exceeded 24 hours | Get partial results |
| `cancelled` | You cancelled it | Get completed results |
I prefer exponential backoff for polling. Start at 10 seconds, double each time, cap at 2 minutes. You catch fast completions without hammering the API on long batches:
def poll_batch_status(batch_id, max_wait=86400, start_interval=10):
"""Poll batch status with exponential backoff."""
elapsed = 0
interval = start_interval
poll_count = 0
# Simulated status progression
statuses = ["validating", "in_progress", "in_progress", "completed"]
while elapsed < max_wait:
poll_count += 1
idx = min(poll_count - 1, len(statuses) - 1)
status = statuses[idx]
completed = {"validating": 0, "in_progress": 2,
"completed": 3}.get(status, 0)
batch_obj = {
"id": batch_id, "status": status,
"request_counts": {"total": 3, "completed": completed, "failed": 0},
"output_file_id": "file-output-999" if status == "completed" else None,
"error_file_id": None
}
pct = completed / 3 * 100
print(f" Poll #{poll_count} | {status} | {completed}/3 ({pct:.0f}%)")
if status in ("completed", "failed", "expired", "cancelled"):
return batch_obj
elapsed += interval
interval = min(interval * 2, 120)
return None
print("Polling batch status...")
final_batch = poll_batch_status(MOCK_BATCH_ID)
print(f"\nDone! Status: {final_batch['status']}")
Polling batch status...
Poll #1 | validating | 0/3 (0%)
Poll #2 | in_progress | 2/3 (67%)
Poll #3 | in_progress | 2/3 (67%)
Poll #4 | completed | 3/3 (100%)
Done! Status: completed
Why does backoff matter? Starting at 10s and doubling gives intervals of 10, 20, 40, 80, 120 (capped). For a 2-hour batch, that’s ~60 polls instead of 720 with a fixed 10-second wait.
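You can check that arithmetic with a short simulation of the same schedule (a sketch mirroring the loop above):

```python
def count_polls(duration_s, start=10, cap=120):
    """Count how many polls a batch of the given duration needs
    when the interval doubles each time, up to a cap."""
    polls, elapsed, interval = 0, 0, start
    while elapsed < duration_s:
        polls += 1
        elapsed += interval
        interval = min(interval * 2, cap)
    return polls

two_hours = 2 * 60 * 60
print(f"Backoff:   {count_polls(two_hours)} polls")  # 63
print(f"Fixed 10s: {two_hours // 10} polls")         # 720
```

Roughly a 10x reduction in API calls for the same batch, with no loss in responsiveness for fast batches.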
Step 5: Downloading and Parsing Batch Results
The completed batch gives you an output_file_id. Download that file and you get JSONL back — one line per request. Each includes your custom_id matched to the full response.
This function fetches results and tallies token usage for cost tracking:
def download_results(batch_obj):
"""Download and parse batch output file."""
# Mock: realistic response structure
mock_lines = [
{"custom_id": "request-1", "response": {
"status_code": 200, "body": {
"choices": [{"message": {"content":
"Gradient descent adjusts model parameters step by step, "
"moving toward lower loss each time."
}, "finish_reason": "stop"}],
"usage": {"prompt_tokens": 25, "completion_tokens": 18,
"total_tokens": 43}
}}},
{"custom_id": "request-2", "response": {
"status_code": 200, "body": {
"choices": [{"message": {"content":
"L1 adds absolute weight penalties (encourages sparsity). "
"L2 adds squared weight penalties (encourages small values)."
}, "finish_reason": "stop"}],
"usage": {"prompt_tokens": 24, "completion_tokens": 25,
"total_tokens": 49}
}}},
{"custom_id": "request-3", "response": {
"status_code": 200, "body": {
"choices": [{"message": {"content":
"Simple models underfit (high bias). Complex models overfit "
"(high variance). The goal is the sweet spot between them."
}, "finish_reason": "stop"}],
"usage": {"prompt_tokens": 22, "completion_tokens": 28,
"total_tokens": 50}
}}}
]
results = {}
total_tokens = 0
for item in mock_lines:
cid = item["custom_id"]
body = item["response"]["body"]
results[cid] = {
"content": body["choices"][0]["message"]["content"],
"tokens": body["usage"]["total_tokens"],
"status": item["response"]["status_code"]
}
total_tokens += body["usage"]["total_tokens"]
return results, total_tokens
results, tokens_used = download_results(final_batch)
print(f"Results: {len(results)} | Tokens: {tokens_used}\n")
for cid, r in results.items():
print(f"[{cid}] {r['content'][:70]}...")
Results: 3 | Tokens: 142
[request-1] Gradient descent adjusts model parameters step by step, moving tow...
[request-2] L1 adds absolute weight penalties (encourages sparsity). L2 adds s...
[request-3] Simple models underfit (high bias). Complex models overfit (high va...
Each result maps to its custom_id. The tokens field tells you what each request consumed — essential for cost tracking.
Exercise 1: Build a Batch Sentiment Classifier
You’ve seen all five steps. Time to practice. Your task: create a JSONL batch that classifies product reviews as “positive”, “negative”, or “neutral”.
Requirements:
– Model: gpt-4o-mini
– Temperature: 0 (we need consistent labels)
– Max tokens: 10 (one-word response)
– System prompt: tell the model to reply with exactly one word
– Custom IDs: review-1, review-2, etc.
def create_sentiment_batch(reviews):
"""Create JSONL for batch sentiment classification."""
lines = []
for i, review in enumerate(reviews):
request = {
"custom_id": f"review-{i+1}",
"method": "POST",
"url": "/v1/chat/completions",
"body": {
# YOUR CODE: model, messages, temperature, max_tokens
}
}
lines.append(json.dumps(request))
return "\n".join(lines)
# Test it
test_reviews = [
"This laptop is amazing! Best purchase all year.",
"Terrible quality. Broke after two days.",
"It's okay. Nothing special but gets the job done."
]
jsonl = create_sentiment_batch(test_reviews)
first = json.loads(jsonl.split("\n")[0])
print(f"Model: {first['body']['model']}")
print(f"Temperature: {first['body']['temperature']}")
print(f"Max tokens: {first['body']['max_tokens']}")
print(f"Custom ID: {first['custom_id']}")
print(f"Total requests: {len(jsonl.split(chr(10)))}")
Expected output:
Model: gpt-4o-mini
Temperature: 0
Max tokens: 10
Custom ID: review-1
Total requests: 3
Handling Batch API Errors: Two Levels to Watch
Real batches don’t always go perfectly. I’ve seen production batches with 10,000 requests where 30-50 fail for various reasons. You need to handle both levels.
Batch-level errors kill the whole job. Causes: malformed JSONL, wrong endpoint, invalid model name. The status goes to "failed".
Request-level errors are sneakier. The batch completes, but some lines have error responses instead of completions. You find these in the error file — or by checking status codes in the output.
This parser separates successes from failures and gives you a clear report:
def parse_results_with_errors():
"""Parse batch output, separating successes from failures."""
mock_results = [
{"custom_id": "req-001", "response": {
"status_code": 200, "body": {
"choices": [{"message": {"content": "Response 1"}}],
"usage": {"total_tokens": 45}}}},
{"custom_id": "req-002", "response": {
"status_code": 200, "body": {
"choices": [{"message": {"content": "Response 2"}}],
"usage": {"total_tokens": 52}}}},
{"custom_id": "req-003", "response": {
"status_code": 400, "body": {
"error": {"message": "Invalid 'messages': expected at least 1 message.",
"type": "invalid_request_error"}}}}
]
successes, failures = {}, {}
for item in mock_results:
cid = item["custom_id"]
code = item["response"]["status_code"]
if code == 200:
body = item["response"]["body"]
successes[cid] = body["choices"][0]["message"]["content"]
else:
err = item["response"]["body"]["error"]
failures[cid] = f"{err['type']}: {err['message']}"
print(f"Successes: {len(successes)} | Failures: {len(failures)}")
if failures:
print("\nFailed requests:")
for cid, msg in failures.items():
print(f" [{cid}] {msg}")
return successes, failures
ok, failed = parse_results_with_errors()
Successes: 2 | Failures: 1
Failed requests:
[req-003] invalid_request_error: Invalid 'messages': expected at least 1 message.
Exercise 2: Build a Batch Retry Function
After a batch completes, some requests may have failed. Write a function that creates a retry JSONL from just the failed requests.
def create_retry_batch(original_prompts, failures):
"""Create a retry JSONL from failed request IDs.
Args:
original_prompts: Dict of custom_id -> prompt text
failures: Dict of custom_id -> error info
Returns:
JSONL string with only failed requests
"""
# YOUR CODE HERE
pass
# Test data
originals = {
"req-00001": "Explain overfitting.",
"req-00002": "What is bagging?",
"req-00003": "Define boosting.",
"req-00004": "Explain dropout.",
"req-00005": "What is batch normalization?"
}
failed_reqs = {
"req-00002": {"error": "rate_limit_exceeded"},
"req-00004": {"error": "server_error"}
}
retry = create_retry_batch(originals, failed_reqs)
lines = retry.strip().split("\n")
print(f"Retry batch size: {len(lines)}")
for line in lines:
parsed = json.loads(line)
print(f" Retrying: {parsed['custom_id']}")
Expected output:
Retry batch size: 2
Retrying: req-00002
Retrying: req-00004
Building a Reusable BatchProcessor Class
We’ve built each step as a standalone function. In production, you want one class that ties everything together. I’ll walk you through a BatchProcessor with three main methods: prepare(), submit(), and wait_and_download().
The constructor sets up tracking variables for stats and cost estimation:
class BatchProcessor:
"""Reusable pipeline for OpenAI Batch API.
Usage:
bp = BatchProcessor(api_key="sk-...")
bp.prepare(prompts, model="gpt-4o-mini")
bp.submit(description="Product descriptions")
results = bp.wait_and_download()
bp.summary()
"""
def __init__(self, api_key="sk-demo"):
self.api_key = api_key
self.jsonl_content = None
self.file_id = None
self.batch_id = None
self.results = None
self.stats = {"total": 0, "completed": 0, "failed": 0,
"tokens": 0, "start": None, "end": None}
The prepare() method builds JSONL from your prompts. It mirrors our standalone function:
def prepare(self, prompts, model="gpt-4o-mini",
system_prompt="You are a helpful assistant.",
temperature=0.7, max_tokens=500):
"""Build JSONL from a list of prompts."""
lines = []
for i, prompt in enumerate(prompts):
req = {
"custom_id": f"req-{i+1:05d}",
"method": "POST",
"url": "/v1/chat/completions",
"body": {
"model": model,
"messages": [
{"role": "system", "content": system_prompt},
{"role": "user", "content": prompt}
],
"temperature": temperature,
"max_tokens": max_tokens
}
}
lines.append(json.dumps(req))
self.jsonl_content = "\n".join(lines)
self.stats["total"] = len(prompts)
print(f"Prepared {len(prompts)} requests ({len(self.jsonl_content):,} bytes)")
return self
The submit() and wait_and_download() methods handle upload, batch creation, polling, and result retrieval:
def submit(self, description=""):
"""Upload file and create batch."""
if not self.jsonl_content:
raise ValueError("Call prepare() first")
self.stats["start"] = datetime.now()
self.file_id = "file-batch-001" # Mock
self.batch_id = "batch-proc-001" # Mock
print(f"Uploaded: {self.file_id} | Batch: {self.batch_id}")
return self
def wait_and_download(self):
"""Poll for completion and download results."""
if not self.batch_id:
raise ValueError("Call submit() first")
print(f"Polling {self.batch_id}...")
for status in ["validating", "in_progress", "completed"]:
print(f" -> {status}")
self.stats["end"] = datetime.now()
self.results = {}
for i in range(self.stats["total"]):
cid = f"req-{i+1:05d}"
self.results[cid] = {
"content": f"Response for request {i+1}",
"tokens": 45 + i * 3, "status": 200
}
self.stats["completed"] = len(self.results)
self.stats["tokens"] = sum(r["tokens"] for r in self.results.values())
print(f"Done! {len(self.results)} results.")
return self.results
The summary() method calculates the 50% savings so you can see exactly what the batch saved:
def summary(self):
"""Print processing summary with cost estimate."""
t = self.stats["tokens"]
# gpt-4o-mini: ~$0.375/1M tokens (blended input+output)
sync_cost = (t / 1_000_000) * 0.375
batch_cost = sync_cost * 0.5
print("\n" + "=" * 45)
print("BATCH PROCESSING SUMMARY")
print("=" * 45)
print(f"Requests: {self.stats['total']}")
print(f"Completed: {self.stats['completed']}")
print(f"Failed: {self.stats['failed']}")
print(f"Tokens: {t:,}")
print(f"Sync cost: ${sync_cost:.4f}")
print(f"Batch cost: ${batch_cost:.4f}")
print(f"You saved: ${sync_cost - batch_cost:.4f} (50%)")
print("=" * 45)
Here’s the full pipeline in action:
bp = BatchProcessor()
prompts = [
"Explain overfitting in one sentence.",
"What does a confusion matrix show?",
"Define precision vs recall.",
"What is cross-validation?",
"Explain the purpose of a learning rate."
]
bp.prepare(prompts)
bp.submit(description="ML concept quiz")
bp.wait_and_download()
bp.summary()
Prepared 5 requests (2,345 bytes)
Uploaded: file-batch-001 | Batch: batch-proc-001
Polling batch-proc-001...
-> validating
-> in_progress
-> completed
Done! 5 results.
=============================================
BATCH PROCESSING SUMMARY
=============================================
Requests: 5
Completed: 5
Failed: 0
Tokens: 255
Sync cost: $0.0001
Batch cost: $0.0000
You saved: $0.0000 (50%)
=============================================
Small batch, tiny savings. But scale it up and the numbers change fast.
Batch API Cost Savings: The Math at Scale
At small volumes, the 50% discount barely registers. At 10,000+ prompts, it adds up.
Here’s the breakdown for gpt-4o-mini. I’m assuming 100 input tokens and 150 output tokens per request — typical for classification or short generation:
def cost_table(counts, input_tok=100, output_tok=150):
"""Show sync vs batch costs at different scales."""
# gpt-4o-mini pricing (March 2026)
inp_rate = 0.15 # $/1M input tokens
out_rate = 0.60 # $/1M output tokens
print(f"{'Prompts':>10} {'Sync':>10} {'Batch':>10} {'Saved':>10}")
print("-" * 44)
for n in counts:
inp_cost = (n * input_tok / 1e6) * inp_rate
out_cost = (n * output_tok / 1e6) * out_rate
sync = inp_cost + out_cost
batch = sync * 0.5
print(f"{n:>10,} ${sync:>9.4f} ${batch:>9.4f} ${sync-batch:>9.4f}")
cost_table([100, 1000, 5000, 10000, 50000])
Prompts Sync Batch Saved
--------------------------------------------
100 $ 0.0105 $ 0.0053 $ 0.0053
1,000 $ 0.1050 $ 0.0525 $ 0.0525
5,000 $ 0.5250 $ 0.2625 $ 0.2625
10,000 $ 1.0500 $ 0.5250 $ 0.5250
50,000 $ 5.2500 $ 2.6250 $ 2.6250
With gpt-4o-mini, 10,000 prompts saves about $0.53. Modest. But switch to gpt-4o and the savings jump:
def gpt4o_cost(n=10000, input_tok=100, output_tok=150):
"""Cost comparison for gpt-4o at scale."""
# gpt-4o: $2.50/1M input, $10.00/1M output
inp = (n * input_tok / 1e6) * 2.50
out = (n * output_tok / 1e6) * 10.00
sync = inp + out
batch = sync * 0.5
print(f"gpt-4o with {n:,} prompts:")
print(f" Sync cost: ${sync:.2f}")
print(f" Batch cost: ${batch:.2f}")
print(f" You save: ${sync - batch:.2f}")
gpt4o_cost(10000)
gpt-4o with 10,000 prompts:
Sync cost: $17.50
Batch cost: $8.75
You save: $8.75
$8.75 saved on a single batch run. Run that weekly and you save $455 per year.
Handling Large Batches: Chunking Strategy
What if you have 200,000 prompts? A single batch has a 50,000-request cap and a 200 MB file limit. You need to split them into chunks.
Here’s a simple chunking approach:
def chunk_prompts(prompts, chunk_size=50000):
"""Split a large prompt list into batch-sized chunks."""
chunks = []
for i in range(0, len(prompts), chunk_size):
chunk = prompts[i:i + chunk_size]
chunks.append(chunk)
print(f"Split {len(prompts):,} prompts into {len(chunks)} batches")
return chunks
# Demo: 120,000 prompts -> 3 batches
big_list = [f"Prompt {i}" for i in range(120000)]
batches = chunk_prompts(big_list)
for i, batch in enumerate(batches):
print(f" Batch {i+1}: {len(batch):,} prompts")
Split 120,000 prompts into 3 batches
Batch 1: 50,000 prompts
Batch 2: 50,000 prompts
Batch 3: 20,000 prompts
Submit each chunk as a separate batch. Merge the results using custom_id. I recommend adding the batch number to your IDs (batch1-req-00001) so you can trace any issues.
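A merge step along those lines might look like this (a hypothetical helper; the prefix convention is the one suggested above):

```python
def merge_chunk_results(chunk_results):
    """Merge per-batch result dicts into one, prefixing each
    custom_id with its batch number so keys never collide."""
    merged = {}
    for batch_num, results in enumerate(chunk_results, 1):
        for custom_id, response in results.items():
            merged[f"batch{batch_num}-{custom_id}"] = response
    return merged

# Two mock chunks whose custom_ids overlap
chunk1 = {"req-00001": "resp A", "req-00002": "resp B"}
chunk2 = {"req-00001": "resp C"}
merged = merge_chunk_results([chunk1, chunk2])
print(sorted(merged))
# ['batch1-req-00001', 'batch1-req-00002', 'batch2-req-00001']
```

Even if two chunks reuse `req-00001`, the prefixed keys keep every result addressable.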
When NOT to Use the Batch API
I want to be straight with you. The Batch API isn’t always the right call.
Real-time apps? Definitely not. Chatbots, autocomplete, interactive tools — anything where a user is waiting needs the sync API.
Fewer than 100 requests? Probably not worth it. The overhead of creating a file, uploading, polling, and downloading outweighs the savings.
Conversational chains? Can’t batch these. When request N depends on the response from request N-1, you need sync calls.
Time-sensitive data? If your prompts reference live data that changes within hours, responses might be stale when they arrive.
| Use Case | Batch? | Why |
|---|---|---|
| 10K product descriptions | Yes | Bulk work, no time pressure |
| 5K ticket classifications | Yes | Offline analysis |
| Chatbot responses | No | Must be instant |
| 500 document summaries | Yes | Offline job |
| Real-time moderation | No | Latency-critical |
| 2K essay grades | Yes | Not time-sensitive |
Common Batch API Mistakes and How to Fix Them
Mistake 1: Writing JSON arrays instead of JSONL
The number one failure. JSONL means one JSON object per line. Not a JSON array. If you use json.dumps(list_of_dicts), you get an array that OpenAI rejects.
requests_list = [{"custom_id": "r1"}, {"custom_id": "r2"}]
bad = json.dumps(requests_list)
print("WRONG (JSON array):")
print(bad)
good = "\n".join(json.dumps(r) for r in requests_list)
print("\nCORRECT (JSONL):")
print(good)
WRONG (JSON array):
[{"custom_id": "r1"}, {"custom_id": "r2"}]
CORRECT (JSONL):
{"custom_id": "r1"}
{"custom_id": "r2"}
Mistake 2: Mismatched endpoint URLs
The url field inside each JSONL line must match the endpoint you set when creating the batch. Using the legacy /v1/completions with a chat model? Silent validation failure.
wrong = {"url": "/v1/completions"} # Legacy endpoint
right = {"url": "/v1/chat/completions"} # Chat models
print(f"Wrong: {wrong['url']}")
print(f"Right: {right['url']}")
Wrong: /v1/completions
Right: /v1/chat/completions
Mistake 3: Ignoring the error file after completion
A "completed" status means the batch finished — not that every request succeeded. With 10,000 requests, 50 might have failed. If you skip the error file, you silently lose those results.
def check_for_errors(batch_status):
"""Always check request_counts.failed after completion."""
counts = batch_status.get("request_counts", {})
failed = counts.get("failed", 0)
total = counts.get("total", 0)
if failed > 0:
pct = failed / total * 100
print(f"WARNING: {failed}/{total} failed ({pct:.1f}%)")
print("Download the error file!")
else:
print(f"All {total} requests succeeded.")
check_for_errors({"request_counts": {"total": 10000, "completed": 9950, "failed": 50}})
WARNING: 50/10000 failed (0.5%)
Download the error file!
Batch API Error Troubleshooting
Error: Invalid 'messages' field
{"error": {"message": "Invalid 'messages': expected array of message objects"}}
Fix: Each message needs role and content keys. Make sure messages is a list of dicts.
Error: Unrecognized request argument: prompt
{"error": {"message": "Unrecognized request argument supplied: prompt"}}
Fix: Chat models use messages, not prompt. Update your JSONL body.
Error: Batch file validation failed: line N is not valid JSON
{"error": {"message": "Batch file validation failed: line 47 is not valid JSON"}}
Fix: Line 47 has a syntax error. Validate each line before uploading:
def validate_jsonl(content):
"""Check every line is valid JSON before uploading."""
errors = []
for i, line in enumerate(content.strip().split("\n"), 1):
try:
json.loads(line)
except json.JSONDecodeError as e:
errors.append(f"Line {i}: {e}")
if errors:
print(f"Found {len(errors)} invalid lines:")
for err in errors:
print(f" {err}")
else:
print("All lines valid!")
return len(errors) == 0
validate_jsonl(jsonl_content)
All lines valid!
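Syntax isn't the only thing worth validating locally. A body-level check in the same spirit (a sketch of my own) catches the first two errors above before you upload:

```python
import json

def check_bodies(jsonl_content):
    """Flag lines that use 'prompt' or have a malformed 'messages' list."""
    issues = []
    for i, line in enumerate(jsonl_content.strip().split("\n"), 1):
        body = json.loads(line).get("body", {})
        if "prompt" in body:
            issues.append(f"line {i}: chat models take 'messages', not 'prompt'")
        msgs = body.get("messages")
        if not isinstance(msgs, list) or not msgs or not all(
                isinstance(m, dict) and "role" in m and "content" in m
                for m in msgs):
            issues.append(f"line {i}: 'messages' must be a non-empty list "
                          "of dicts with 'role' and 'content'")
    return issues

bad_line = json.dumps({"custom_id": "r1", "method": "POST",
                       "url": "/v1/chat/completions",
                       "body": {"model": "gpt-4o-mini", "prompt": "Hi"}})
for issue in check_bodies(bad_line):
    print(issue)
```

Running both checks costs seconds; a rejected batch costs a resubmission cycle.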
Summary
You now have a full toolkit for the OpenAI Batch API:
- JSONL format — Each line is a self-contained request with `custom_id`, `method`, `url`, and `body`.
- 5-step workflow — Build JSONL, upload, create batch, poll, download.
- BatchProcessor class — Reusable pipeline with progress tracking and cost estimates.
- Error handling — Batch-level and request-level errors, plus retry logic.
- Cost savings — 50% on all tokens. Biggest impact at 10K+ requests with `gpt-4o`.
The pattern fits any non-real-time use case: bulk generation, data labeling, classification, summarization, translation, evaluation.
Practice challenge: Extend the BatchProcessor with auto-retry. When wait_and_download() finds failed requests, it creates a retry batch automatically and merges the results.
Frequently Asked Questions
Can I cancel a running batch?
Yes. POST to /v1/batches/{batch_id}/cancel. Status changes to "cancelling", then "cancelled". Completed results stay downloadable. You’re charged only for what finished.
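In the mock style used throughout this tutorial, a cancel helper looks like this (the commented production call uses the endpoint named above; the returned object is a simplified mock):

```python
API_KEY = "sk-your-api-key-here"  # Replace with your key
BASE_URL = "https://api.openai.com/v1"

def cancel_batch(batch_id):
    """Cancel a running batch. Returns the updated batch object."""
    # Production code (uncomment for real use):
    # import requests
    # resp = requests.post(f"{BASE_URL}/batches/{batch_id}/cancel",
    #                      headers={"Authorization": f"Bearer {API_KEY}"})
    # return resp.json()
    return {"id": batch_id, "object": "batch", "status": "cancelling"}

resp = cancel_batch("batch-789xyz")
print(f"{resp['id']} -> {resp['status']}")
# batch-789xyz -> cancelling
```

Poll the batch afterward; once it settles on `"cancelled"`, any completed results are still there to download.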
Which models support the Batch API?
All GPT-4o variants, GPT-4o-mini, GPT-4-turbo, and GPT-3.5-turbo. Embeddings and completions endpoints work too. The 50% discount applies across all supported models. Check OpenAI’s pricing page for the current list.
Does the Batch API work for embeddings?
Yes. Set the endpoint to /v1/embeddings. Each line’s body needs "model" and "input" fields instead of "messages". Same 50% discount.
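Following the same pattern as `create_batch_jsonl`, an embeddings variant only needs a different body (a hypothetical helper; `text-embedding-3-small` is one of OpenAI's embedding models):

```python
import json

def create_embedding_jsonl(texts, model="text-embedding-3-small"):
    """Convert a list of texts into embeddings batch JSONL."""
    lines = []
    for i, text in enumerate(texts):
        lines.append(json.dumps({
            "custom_id": f"embed-{i+1}",
            "method": "POST",
            "url": "/v1/embeddings",          # note: not /v1/chat/completions
            "body": {"model": model, "input": text}
        }))
    return "\n".join(lines)

jsonl = create_embedding_jsonl(["gradient descent", "regularization"])
first = json.loads(jsonl.split("\n")[0])
print(first["url"], "|", first["body"]["input"])
# /v1/embeddings | gradient descent
```

Remember to set `endpoint="/v1/embeddings"` when creating the batch, so it matches the `url` on every line.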
What happens when a batch expires?
OpenAI marks it "expired". Any results that finished are still downloadable. Unprocessed requests aren’t charged. Submit a new batch for the remainder.
Is there a limit on how many batches I can run at once?
No hard cap on batch count, but there’s a limit on total enqueued requests. The cap varies by account tier. Hit the limit and you get a 429 response. Check your limits at platform.openai.com.
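A simple way to cope is to retry batch creation with a wait whenever you see a 429. The sketch below is my own policy, demonstrated with a fake creator function rather than the live API:

```python
import time

def create_batch_with_retry(create_fn, file_id, max_retries=5, wait=60):
    """Retry batch creation while the enqueued-request limit returns 429."""
    for attempt in range(1, max_retries + 1):
        resp = create_fn(file_id)
        if resp.get("status_code") != 429:
            return resp
        print(f"Queue full (429), attempt {attempt}, waiting {wait}s")
        time.sleep(wait)
        wait *= 2  # back off between attempts
    raise RuntimeError("Batch queue still full after retries")

# Demo: a fake creator that returns 429 once, then succeeds
calls = {"n": 0}
def fake_create(file_id):
    calls["n"] += 1
    if calls["n"] == 1:
        return {"status_code": 429}
    return {"id": "batch-ok", "status": "validating"}

print(create_batch_with_retry(fake_create, "file-1", wait=0)["id"])
# batch-ok
```

In production, pass in a real `create_batch` wrapper that surfaces the HTTP status code alongside the response body.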
References
- OpenAI Batch API Documentation — Official guide.
- OpenAI API Pricing — Model pricing with batch discounts.
- OpenAI Files API Reference — File upload endpoints.
- OpenAI Batch API FAQ — Common questions.
- JSON Lines Format Specification.
- OpenAI Rate Limits Guide.
- OpenAI Cost Optimization Guide.