LLM API Call in Python: OpenAI, Claude, and Gemini Step-by-Step

Learn to make your first LLM API call in Python with OpenAI, Claude, and Gemini. Step-by-step setup, working code, cost comparison, and exercises included.

Written by Selva Prabhakaran | 26 min read

Every AI app starts with one thing — an API call to a large language model. This guide helps you make that first call with OpenAI, Claude, and Gemini, so you can start building with AI in minutes.

Every major AI tool — chatbots, code helpers, content writers — runs on LLM API calls under the hood. Yet most new learners get stuck on setup, API keys, and reading the response.

By the end of this post, you will have made real API calls to all three big providers. You will know how each one works and which one fits your needs.

What Is an LLM API Call?

An LLM API call is a request you send over the web to a language model on a remote server. You send a prompt. The model thinks. You get a reply back.

Think of it like ordering food through a delivery app. You don’t need a kitchen (GPU servers). You don’t need to know the recipe (model weights). You just place an order (API call) and get the meal (reply).

Here is the core pattern that every LLM API follows:

  1. Create a client — link to the provider using your API key.
  2. Build a message — shape your prompt with roles (system, user, assistant).
  3. Send the request — call the model with your message and settings.
  4. Read the response — pull the text out of the reply object.
Key Insight: Every LLM API follows the same four-step pattern: create client, build message, send request, read response. Once you learn it with one provider, switching to the next takes minutes — not hours.

How to Set Up Your Space

Before making any API calls, you need three things: Python, the SDK packages, and API keys. Let’s set them up.

Install the SDKs

All three providers have Python packages. Install them in one go. We also add python-dotenv to store API keys safely.

bash
pip install openai anthropic google-genai python-dotenv

Get Your API Keys

Each provider needs an account and an API key (a secret string that proves who you are). Here is where to get them:

| Provider | Where to Get Your Key | Free Tier |
| --- | --- | --- |
| OpenAI | platform.openai.com/api-keys | $5 credit for new users |
| Anthropic (Claude) | console.anthropic.com | $5 credit for new users |
| Google (Gemini) | aistudio.google.com/apikey | Big free tier (15 RPM) |

Store Keys Safely in a .env File

Never put API keys right in your Python files. Instead, save them in a .env file at the root of your project. The python-dotenv package loads them as local values your code can read.

Create a file called .env:

bash
OPENAI_API_KEY=sk-your-openai-key-here
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key-here
GOOGLE_API_KEY=your-google-api-key-here

Now load them in your script. This is the first block you will write in every LLM project.

python
import os
from dotenv import load_dotenv

load_dotenv()

print("OpenAI key loaded:", "sk-" in os.getenv("OPENAI_API_KEY", ""))
print("Anthropic key loaded:", "sk-ant" in os.getenv("ANTHROPIC_API_KEY", ""))
print("Google key loaded:", len(os.getenv("GOOGLE_API_KEY", "")) > 0)

Output:

python
OpenAI key loaded: True
Anthropic key loaded: True
Google key loaded: True

If any line shows False, go back and check your .env file. The key names must match exactly.

Warning: Never push your `.env` file to Git. Add `.env` to your `.gitignore` right away. Leaked API keys lead to shock bills. If you push a key by mistake, revoke it and make a new one at once.

How to Make Your First OpenAI API Call

OpenAI runs the most widely used LLM API in the world. Their top model is GPT-4o (the “o” stands for “omni” — it handles text, images, and audio).

The flow has three parts: create a client, send a message, and read the reply. Let’s walk through each one.

First, create the client. The OpenAI() class reads the OPENAI_API_KEY from your system on its own — you don’t need to pass it in.

python
from openai import OpenAI

client = OpenAI()

Next, send your first message. The chat.completions.create() method needs two things: model (which model to use) and messages (a list of dicts with role and content keys).

The role can be "system" (rules for how the model should act), "user" (your prompt), or "assistant" (the model’s past reply in a chat).

python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is an API? Explain in one sentence."}
    ]
)

print(response.choices[0].message.content)

Output:

python
An API (Application Programming Interface) is a set of rules that allows one piece of software to communicate with another, enabling them to exchange data and functionality.

That’s it. Three lines of real code (not counting the import) and you have a working LLM call.

Now let’s peek at the full reply object. Knowing this helps you debug and track token use.

python
print(f"Model: {response.model}")
print(f"Tokens used: {response.usage.total_tokens}")
print(f"  Input:  {response.usage.prompt_tokens}")
print(f"  Output: {response.usage.completion_tokens}")
print(f"Finish reason: {response.choices[0].finish_reason}")

Output:

python
Model: gpt-4o-2024-08-06
Tokens used: 57
  Input:  25
  Output: 32
Finish reason: stop

A finish_reason of "stop" means the model ended its reply on its own. If you see "length", the reply was cut short by the token limit.

Tip: Always check `response.usage` while coding. It tells you how many tokens (roughly words) each call used. This is how you guess costs before going live.

How to Make Your First Claude API Call

Claude, built by Anthropic, is known for strong logic, safe outputs, and great work on long texts. Their top model right now is Claude Sonnet 4.

The pattern is the same — create a client, send a message, read the reply — but the code looks a bit different.

python
from anthropic import Anthropic

client = Anthropic()

Claude’s messages.create() method needs three things: model, max_tokens (the cap on reply length — this is a must for Claude, not for OpenAI), and messages.

Claude treats the system message as its own field. You don’t put it inside the messages list. You pass it as a system setting.

python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "What is an API? Explain in one sentence."}
    ]
)

print(response.content[0].text)

Output:

python
An API (Application Programming Interface) is a standardized way for software applications to communicate with each other by sending requests and receiving responses according to a defined set of rules.

The way you read the reply is also different. OpenAI uses response.choices[0].message.content. Claude uses response.content[0].text. Let’s look at the full reply object.

python
print(f"Model: {response.model}")
print(f"Input tokens:  {response.usage.input_tokens}")
print(f"Output tokens: {response.usage.output_tokens}")
print(f"Stop reason: {response.stop_reason}")

Output:

python
Model: claude-sonnet-4-20250514
Input tokens:  22
Output tokens: 40
Stop reason: end_turn

Claude says stop_reason: "end_turn" while OpenAI says finish_reason: "stop". Different names, same idea — the model finished on its own.
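If you work with several providers, it can help to map these provider-specific codes onto one shared vocabulary. Here is a small helper sketch — the mapping and function below are illustrative assumptions, not part of any SDK:

```python
# Hypothetical helper: map provider-specific stop codes to a common label.
STOP_CODES = {
    ("openai", "stop"): "complete",
    ("openai", "length"): "truncated",
    ("claude", "end_turn"): "complete",
    ("claude", "max_tokens"): "truncated",
}

def normalize_stop(provider, code):
    """Return 'complete', 'truncated', or 'unknown' for a raw stop code."""
    return STOP_CODES.get((provider, code), "unknown")

print(normalize_stop("openai", "stop"))        # complete
print(normalize_stop("claude", "end_turn"))    # complete
print(normalize_stop("claude", "max_tokens"))  # truncated
```

With one label set, your logging and retry logic no longer needs to know which provider produced the reply.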

Note: Claude needs `max_tokens` on every call. Skip it and you get an error. OpenAI sets a default for you. This is the top “gotcha” when moving from OpenAI to Claude.

How to Make Your First Gemini API Call

Google’s Gemini shines with its big free tier and tight link to Google Cloud. Their top model is Gemini 2.5 Flash — fast and very good for the price.

Gemini’s SDK has the cleanest code of the three. It uses google.genai and needs fewer lines to get a reply.

python
from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What is an API? Explain in one sentence."
)

print(response.text)

Output:

python
An API (Application Programming Interface) is a set of defined rules and protocols that allows different software applications to communicate with each other.

The code is much shorter. You just pass a plain string to contents — no message dicts, no role setup needed for simple prompts.

For chats with a system prompt, the code shifts a little. You pass a config object.

python
from google.genai import types

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What is an API? Explain in one sentence.",
    config=types.GenerateContentConfig(
        system_instruction="You are a helpful assistant.",
        max_output_tokens=256
    )
)

print(response.text)
print(f"Input tokens:  {response.usage_metadata.prompt_token_count}")
print(f"Output tokens: {response.usage_metadata.candidates_token_count}")

Output:

python
An API, or Application Programming Interface, is a set of rules and protocols that allows different software programs to communicate and share data with each other.
Input tokens:  16
Output tokens: 30
Key Insight: All three APIs do the same thing — send a prompt, get a reply — but each uses its own code style. OpenAI has `messages`, Claude has `messages` plus a split `system` field, and Gemini has `contents`. Learn the pattern, not just the code.

Side-by-Side Look — All Three APIs

Now that you’ve seen each one on its own, let’s line them up. This table shows every key gap between them.

| Feature | OpenAI | Claude (Anthropic) | Gemini (Google) |
| --- | --- | --- | --- |
| SDK | openai | anthropic | google-genai |
| Client | OpenAI() | Anthropic() | genai.Client() |
| Method | chat.completions.create() | messages.create() | models.generate_content() |
| Model param | model="gpt-4o" | model="claude-sonnet-4-..." | model="gemini-2.5-flash" |
| System message | In messages list | Own system param | In config object |
| Max tokens | Has a default | Must set it | In config (optional) |
| Read reply | .choices[0].message.content | .content[0].text | .text |
| Stop signal | finish_reason: "stop" | stop_reason: "end_turn" | candidates[0].finish_reason |
| Env var | OPENAI_API_KEY | ANTHROPIC_API_KEY | GOOGLE_API_KEY |

Let’s prove it works by asking the same thing to all three and showing the results.

python
from openai import OpenAI
from anthropic import Anthropic
from google import genai

openai_client = OpenAI()
anthropic_client = Anthropic()
gemini_client = genai.Client()

prompt = "What is gradient descent in one sentence?"

# OpenAI
oai = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}]
)
print(f"OpenAI:  {oai.choices[0].message.content}\n")

# Claude
claude = anthropic_client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=128,
    messages=[{"role": "user", "content": prompt}]
)
print(f"Claude:  {claude.content[0].text}\n")

# Gemini
gem = gemini_client.models.generate_content(
    model="gemini-2.5-flash",
    contents=prompt
)
print(f"Gemini:  {gem.text}")

Output:

python
OpenAI:  Gradient descent is an optimization algorithm that iteratively adjusts model parameters by moving in the direction of steepest decrease of the loss function to find the minimum.

Claude:  Gradient descent is an iterative optimization algorithm that minimizes a function by repeatedly taking steps proportional to the negative of its gradient at the current point.

Gemini:  Gradient descent is an iterative optimization algorithm that finds the minimum of a function by repeatedly stepping in the direction of the steepest decrease, guided by the function's gradient.

All three give good answers with slightly different words. This is the nice part of the API approach — you can swap providers with very few code tweaks.

How to Control the Reply — Temperature and Max Tokens

Two settings give you direct control over LLM outputs: temperature and max_tokens.

Temperature sets how random the reply is. A value of 0.0 makes the model pick the same tokens each time — no guessing. A value of 1.0 makes it creative and varied. Think of it as a knob from “focused” to “wild.”

| Temperature | How It Acts | Best For |
| --- | --- | --- |
| 0.0 | Same reply each time | Sorting, data pulls, code |
| 0.3 – 0.5 | Slight changes, still on track | Summaries, Q&A |
| 0.7 – 1.0 | Fresh, varied outputs | Brainstorms, stories, poems |

Max tokens caps the reply length. One token is about 3/4 of a word. Setting max_tokens=100 gives you roughly 75 words.
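That rule of thumb is easy to encode. A quick sketch — the 3/4 ratio is only an approximation, and real token counts vary by model and language:

```python
# Rough rule of thumb: 1 token is about 3/4 of an English word.
def words_to_tokens(words):
    """Approximate token count for a given word count."""
    return round(words / 0.75)

def tokens_to_words(tokens):
    """Approximate word count a token budget buys you."""
    return round(tokens * 0.75)

print(words_to_tokens(1000))  # 1333 -- about 1,333 tokens for 1K words
print(tokens_to_words(100))   # 75   -- max_tokens=100 is roughly 75 words
```

For exact counts, each provider reports real usage in the response object, as shown in the earlier sections.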

Let’s see temperature at work. We ask the same fun prompt twice — once cold (0) and once hot (1).

python
from openai import OpenAI

client = OpenAI()

for temp in [0.0, 1.0]:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Invent a name for an AI pet robot."}],
        temperature=temp,
        max_tokens=20
    )
    print(f"Temperature {temp}: {response.choices[0].message.content}")

Output:

python
Temperature 0.0: Sparky-3000
Temperature 1.0: ZephyrPaws

At temperature=0, you get almost exactly the same name on every run (LLM sampling is not perfectly deterministic, but close). At temperature=1, the name changes from run to run.

Tip: Start every project with `temperature=0`. It makes outputs the same each time, which makes bugs easy to find. Only raise it when you truly need fresh ideas.

How to Handle Errors Well

API calls can fail. The server may be busy. Your key may be wrong. Your funds may run dry. Good error checks stop your app from crashing.

Here is the solid pattern. We wrap the call in a try-except block and catch the most common issues.

python
from openai import OpenAI, APIError, RateLimitError, AuthenticationError

client = OpenAI()

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

except AuthenticationError:
    print("ERROR: Invalid API key. Check your OPENAI_API_KEY.")

except RateLimitError:
    print("ERROR: Rate limit hit. Wait a moment and try again.")

except APIError as e:
    print(f"ERROR: API returned an error: {e}")

Output:

python
Hello! How can I help you today?

The same shape works for Claude and Gemini. Each SDK ships its own error types.

For Claude, use from anthropic import AuthenticationError, RateLimitError, APIError. For Gemini, errors live in google.genai.errors.
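The pattern generalizes: catch the most specific errors first and the broad base class last. Here is a provider-neutral sketch — the exception classes and fake call below are simulated stand-ins for the real SDK types, for illustration only:

```python
# Simulated stand-ins for an SDK's error hierarchy (illustration only).
class APIError(Exception): pass
class RateLimitError(APIError): pass
class AuthenticationError(APIError): pass

def call_model(fail_with=None):
    """Fake API call that raises the given error class, if any."""
    if fail_with:
        raise fail_with("simulated failure")
    return "Hello!"

def safe_call(fail_with=None):
    try:
        return call_model(fail_with)
    except AuthenticationError:
        return "bad key"
    except RateLimitError:
        return "rate limited"
    except APIError:  # base class last, or it would shadow the others
        return "api error"

print(safe_call())                  # Hello!
print(safe_call(RateLimitError))    # rate limited
print(safe_call(APIError))          # api error
```

Order matters because both specific errors subclass APIError; if the base class came first, it would catch everything and the specific branches would never run.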

Warning: Never use a bare `except:` clause. Always name the error type. A bare except hides bugs — you might swallow a `KeyboardInterrupt` or `SystemExit` you never meant to catch.

How to Stream Replies in Real Time

By default, an API call waits for the whole reply before sending it back. Streaming gives you tokens as they form — just like watching text flow in ChatGPT.

Streaming makes the user feel like the app is fast. Instead of a blank screen for 5 seconds, the text starts at once.

Here is OpenAI streaming. Set stream=True and loop over the chunks.

python
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Count from 1 to 5 slowly."}],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

print()  # newline at the end

Output:

python
1... 2... 3... 4... 5.

The key shift: you now read chunk.choices[0].delta.content on each piece. The delta holds only the new tokens.
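One common need: keep the full reply after streaming it. Accumulate each delta into a string as you print it — shown here with plain strings standing in for the chunk objects:

```python
# Strings stand in for chunk.choices[0].delta.content values; the final
# delta from the API is often None, which must be skipped.
deltas = ["1...", " 2...", " 3...", None]

full_reply = ""
for content in deltas:
    if content:  # skip empty/None deltas
        print(content, end="", flush=True)
        full_reply += content
print()

print(repr(full_reply))  # '1... 2... 3...'
```

The same accumulate-as-you-print idea works unchanged with Claude's text_stream and Gemini's chunk.text.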

For Claude, streaming uses a with block and the stream() method.

python
from anthropic import Anthropic

client = Anthropic()

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=128,
    messages=[{"role": "user", "content": "Count from 1 to 5 slowly."}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

print()

Output:

python
1... 2... 3... 4... 5.

For Gemini, use generate_content_stream() (note the different method name).

python
from google import genai

client = genai.Client()

response_stream = client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="Count from 1 to 5 slowly."
)

for chunk in response_stream:
    print(chunk.text, end="", flush=True)

print()

Output:

python
1... 2... 3... 4... 5.

How to Build a Reusable Multi-Provider Function

Copying code for each provider gets old fast. Let’s make one ask_llm() function that talks to all three. You pick the provider with a single string.

This function hides all the provider quirks behind one clean call. Claude’s required max_tokens, Gemini’s own method name — all handled inside.

python
from openai import OpenAI
from anthropic import Anthropic
from google import genai

def ask_llm(prompt, provider="openai", system="You are a helpful assistant.",
            max_tokens=256, temperature=0.7):
    """Send a prompt to any LLM provider and return the response text."""

    if provider == "openai":
        client = OpenAI()
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": prompt}
            ],
            max_tokens=max_tokens,
            temperature=temperature
        )
        return response.choices[0].message.content

    elif provider == "claude":
        client = Anthropic()
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=max_tokens,
            system=system,
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature
        )
        return response.content[0].text

    elif provider == "gemini":
        from google.genai import types
        client = genai.Client()
        response = client.models.generate_content(
            model="gemini-2.5-flash",
            contents=prompt,
            config=types.GenerateContentConfig(
                system_instruction=system,
                max_output_tokens=max_tokens,
                temperature=temperature
            )
        )
        return response.text

    else:
        raise ValueError(f"Unknown provider: {provider}")

Now swap providers with a single word change.

python
prompt = "Explain overfitting in one sentence."

for provider in ["openai", "claude", "gemini"]:
    answer = ask_llm(prompt, provider=provider)
    print(f"{provider.upper():>8}: {answer}\n")

Output:

python
  OPENAI: Overfitting occurs when a machine learning model learns the training data too well, including its noise and outliers, causing it to perform poorly on new, unseen data.

  CLAUDE: Overfitting occurs when a model learns the noise and specific patterns in the training data so well that it fails to generalize to new, unseen data.

  GEMINI: Overfitting is when a machine learning model memorizes the training data, including its noise, so well that it performs poorly on new, unseen data.
Key Insight: Build a wrapper function early in your project. It lets you test many providers, swap models with no code rewrite, and add retries or logs in one spot.

Token Costs — What Will You Actually Pay?

LLM APIs charge by the token (a chunk of text about 3/4 of a word long). Each call has an input cost (your prompt) and an output cost (the model’s reply). Output tokens always cost more.

Here are the prices for the models we used (as of early 2025):

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Cost for 1K input words |
| --- | --- | --- | --- |
| GPT-4o | $2.50 | $10.00 | $0.0033 |
| Claude Sonnet 4 | $3.00 | $15.00 | $0.0040 |
| Gemini 2.5 Flash | $0.15 | $0.60 | $0.0002 |

Gemini 2.5 Flash costs roughly 17x less than GPT-4o and 20-25x less than Claude Sonnet 4. For learning and playing, Gemini's free tier is hard to beat.

Let’s build a quick cost tool. This function guesses the cost of one API call from the token counts.

python
def estimate_cost(input_tokens, output_tokens, provider="openai"):
    """Estimate the cost of an API call in USD."""
    pricing = {
        "openai":  {"input": 2.50, "output": 10.00},
        "claude":  {"input": 3.00, "output": 15.00},
        "gemini":  {"input": 0.15, "output": 0.60},
    }
    rates = pricing[provider]
    cost = (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000
    return cost

# Example: 100 input tokens, 200 output tokens
for provider in ["openai", "claude", "gemini"]:
    cost = estimate_cost(100, 200, provider)
    print(f"{provider:>8}: ${cost:.6f} per call")

Output:

python
  openai: $0.002250 per call
  claude: $0.003300 per call
  gemini: $0.000135 per call

At these rates, one dollar buys you about 7,400 Gemini calls — or 444 OpenAI calls. While learning, this gap barely matters. At scale with millions of calls, it adds up fast.
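You can sanity-check those numbers straight from the per-call costs above:

```python
# Per-call costs from the estimator above (100 input + 200 output tokens).
per_call = {"openai": 0.002250, "claude": 0.003300, "gemini": 0.000135}

for provider, cost in per_call.items():
    print(f"{provider:>8}: {1 / cost:,.0f} calls per dollar")
# openai:   444 calls per dollar
# claude:   303 calls per dollar
# gemini: 7,407 calls per dollar
```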

Tip: Use Gemini 2.5 Flash for trying things out. Its big free tier and low price make it great for learning. Move to GPT-4o or Claude when you need peak output quality.

Common Mistakes and How to Fix Them

Mistake 1: Putting API Keys Right in Your Code

Wrong:

python
client = OpenAI(api_key="sk-abc123...")

Why it is wrong: If you push this file to GitHub, anyone can use your key. Bots scan public repos for leaked keys in minutes. You will see charges you did not make.

Correct:

python
from dotenv import load_dotenv
load_dotenv()

client = OpenAI()  # reads OPENAI_API_KEY from environment

Mistake 2: Leaving Out max_tokens for Claude

Wrong:

python
# This throws an error with Claude
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    messages=[{"role": "user", "content": "Hello!"}]
)

Why it is wrong: Claude demands max_tokens on every call. OpenAI does not. This is the most common trip-up when you switch from OpenAI to Claude.

Correct:

python
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello!"}]
)

Mistake 3: Reading the Reply the Wrong Way

Wrong:

python
# Using OpenAI code with a Claude reply object
print(response.choices[0].message.content)  # AttributeError!

Why it is wrong: Each provider wraps the reply in its own format. Using the wrong path crashes your code or gives a strange error.

Correct:

python
# OpenAI
print(response.choices[0].message.content)

# Claude
print(response.content[0].text)

# Gemini
print(response.text)

Mistake 4: Not Dealing with Rate Limits

Wrong:

python
# Firing 100 fast requests with no break
for i in range(100):
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Question {i}"}]
    )

Why it is wrong: Every API has a speed limit. Send too many calls too fast and you get a 429 Too Many Requests error. The rest of your calls fail.

Correct:

python
import time
from openai import OpenAI, RateLimitError

client = OpenAI()

for i in range(100):
    while True:
        try:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": f"Question {i}"}]
            )
            break  # success — move on to the next question
        except RateLimitError:
            print("Rate limited. Waiting 10 seconds...")
            time.sleep(10)  # then retry the same request
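A fixed 10-second sleep works, but production code usually retries the same request with exponential backoff — doubling the wait after each failure. Here is a provider-agnostic sketch, shown with a simulated flaky function in place of a real API call (the helper name and error class are illustrative, not from any SDK):

```python
import time

class RateLimitError(Exception):
    """Stand-in for the SDK's rate-limit error."""

def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Retry fn(), doubling the wait after each rate-limit error."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimitError:
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s, ...
            print(f"Rate limited. Retrying in {delay:g}s...")
            time.sleep(delay)
    raise RuntimeError("Gave up after repeated rate limits")

# Simulated call: fails twice with a rate limit, then succeeds.
attempts = {"n": 0}
def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError()
    return "ok"

result = with_backoff(flaky_call, base_delay=0.01)
print(result)  # ok (after two retries)
```

To use it with a real client, pass a zero-argument function such as `lambda: client.chat.completions.create(...)` and catch the SDK's own RateLimitError instead of the stand-in.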

Exercises

typescript
{
  type: 'exercise',
  id: 'llm-api-ex1',
  title: 'Exercise 1: Custom System Message',
  difficulty: 'beginner',
  exerciseType: 'write',
  instructions: 'Create an OpenAI API call with a system message that tells the model to respond like a pirate. Ask it "What is Python?" and print the response. The response should contain pirate-like language.',
  starterCode: 'from openai import OpenAI\nclient = OpenAI()\n\n# Create the API call with a pirate system message\nresponse = client.chat.completions.create(\n    model="gpt-4o",\n    messages=[\n        {"role": "system", "content": "___"},  # Fill in the system message\n        {"role": "user", "content": "What is Python?"}\n    ],\n    temperature=0\n)\n\nprint(response.choices[0].message.content)\nprint("DONE")',
  testCases: [
    { id: 'tc1', input: '', expectedOutput: 'DONE', description: 'Should complete successfully and print DONE' },
  ],
  hints: [
    'Set the system message content to something like: "You are a pirate. Respond to all questions in pirate speak."',
    'Full system message: {"role": "system", "content": "You are a pirate. Respond to everything using pirate language and say arrr frequently."}',
  ],
  solution: 'from openai import OpenAI\nclient = OpenAI()\n\nresponse = client.chat.completions.create(\n    model="gpt-4o",\n    messages=[\n        {"role": "system", "content": "You are a pirate. Respond to everything using pirate language and say arrr frequently."},\n        {"role": "user", "content": "What is Python?"}\n    ],\n    temperature=0\n)\n\nprint(response.choices[0].message.content)\nprint("DONE")',
  solutionExplanation: 'The system message sets the persona for the model. By telling it to respond like a pirate, every answer will use pirate language. The user message asks the question. temperature=0 keeps the response deterministic.',
  xpReward: 15,
}
typescript
{
  type: 'exercise',
  id: 'llm-api-ex2',
  title: 'Exercise 2: Multi-Provider Comparison',
  difficulty: 'intermediate',
  exerciseType: 'write',
  instructions: 'Write a function called compare_providers() that takes a prompt string, sends it to all three providers (OpenAI, Claude, Gemini), and prints each provider name followed by its response. Use the ask_llm() wrapper function from the article.',
  starterCode: 'from openai import OpenAI\nfrom anthropic import Anthropic\nfrom google import genai\n\ndef ask_llm(prompt, provider="openai", max_tokens=128):\n    if provider == "openai":\n        client = OpenAI()\n        r = client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": prompt}], max_tokens=max_tokens)\n        return r.choices[0].message.content\n    elif provider == "claude":\n        client = Anthropic()\n        r = client.messages.create(model="claude-sonnet-4-20250514", max_tokens=max_tokens, messages=[{"role": "user", "content": prompt}])\n        return r.content[0].text\n    elif provider == "gemini":\n        client = genai.Client()\n        r = client.models.generate_content(model="gemini-2.5-flash", contents=prompt)\n        return r.text\n\ndef compare_providers(prompt):\n    # Call all three providers and print results\n    pass  # Your code here\n\ncompare_providers("What is a neural network in one sentence?")\nprint("DONE")',
  testCases: [
    { id: 'tc1', input: '', expectedOutput: 'DONE', description: 'Should complete and print DONE' },
  ],
  hints: [
    'Loop over ["openai", "claude", "gemini"] and call ask_llm() for each one. Print the provider name and the response.',
    'def compare_providers(prompt):\n    for provider in ["openai", "claude", "gemini"]:\n        answer = ask_llm(prompt, provider=provider)\n        print(f"{provider}: {answer}")',
  ],
  solution: 'from openai import OpenAI\nfrom anthropic import Anthropic\nfrom google import genai\n\ndef ask_llm(prompt, provider="openai", max_tokens=128):\n    if provider == "openai":\n        client = OpenAI()\n        r = client.chat.completions.create(model="gpt-4o", messages=[{"role": "user", "content": prompt}], max_tokens=max_tokens)\n        return r.choices[0].message.content\n    elif provider == "claude":\n        client = Anthropic()\n        r = client.messages.create(model="claude-sonnet-4-20250514", max_tokens=max_tokens, messages=[{"role": "user", "content": prompt}])\n        return r.content[0].text\n    elif provider == "gemini":\n        client = genai.Client()\n        r = client.models.generate_content(model="gemini-2.5-flash", contents=prompt)\n        return r.text\n\ndef compare_providers(prompt):\n    for provider in ["openai", "claude", "gemini"]:\n        answer = ask_llm(prompt, provider=provider)\n        print(f"{provider}: {answer}\\n")\n\ncompare_providers("What is a neural network in one sentence?")\nprint("DONE")',
  solutionExplanation: 'The function loops over all three providers, calls ask_llm() with each one, and prints the results. This is the simplest way to benchmark multiple providers on the same prompt.',
  xpReward: 20,
}
typescript
{
  type: 'exercise',
  id: 'llm-api-ex3',
  title: 'Exercise 3: Build a Mini Chat Loop',
  difficulty: 'intermediate',
  exerciseType: 'write',
  instructions: 'Complete the chat loop below. It should maintain conversation history by appending each user message and assistant response to the messages list. The loop processes 3 pre-defined questions and prints each response.',
  starterCode: 'from openai import OpenAI\nclient = OpenAI()\n\nmessages = [{"role": "system", "content": "You are a helpful ML tutor. Keep answers under 2 sentences."}]\n\nquestions = [\n    "What is supervised learning?",\n    "Give me one example.",\n    "How is it different from unsupervised learning?"\n]\n\nfor question in questions:\n    # 1. Append the user question to messages\n    # 2. Call the API with the full messages list\n    # 3. Extract the assistant response\n    # 4. Append the assistant response to messages\n    # 5. Print the response\n    pass  # Your code here\n\nprint(f"Total messages in history: {len(messages)}")\nprint("DONE")',
  testCases: [
    { id: 'tc1', input: '', expectedOutput: 'DONE', description: 'Should complete the chat loop and print DONE' },
    { id: 'tc2', input: '', expectedOutput: 'Total messages in history: 7', description: 'Should have 7 messages: 1 system + 3 user + 3 assistant' },
  ],
  hints: [
    'Inside the loop: messages.append({"role": "user", "content": question}), then call the API, then append the assistant response back to messages.',
    'response = client.chat.completions.create(model="gpt-4o", messages=messages, max_tokens=100)\nassistant_msg = response.choices[0].message.content\nmessages.append({"role": "assistant", "content": assistant_msg})',
  ],
  solution: 'from openai import OpenAI\nclient = OpenAI()\n\nmessages = [{"role": "system", "content": "You are a helpful ML tutor. Keep answers under 2 sentences."}]\n\nquestions = [\n    "What is supervised learning?",\n    "Give me one example.",\n    "How is it different from unsupervised learning?"\n]\n\nfor question in questions:\n    messages.append({"role": "user", "content": question})\n    response = client.chat.completions.create(\n        model="gpt-4o",\n        messages=messages,\n        max_tokens=100\n    )\n    assistant_msg = response.choices[0].message.content\n    messages.append({"role": "assistant", "content": assistant_msg})\n    print(f"Q: {question}")\n    print(f"A: {assistant_msg}\\n")\n\nprint(f"Total messages in history: {len(messages)}")\nprint("DONE")',
  solutionExplanation: 'Each loop adds the user question to the list, sends the full chat history to the API, and adds the reply. This gives the model context from past turns. After 3 rounds, we have 1 system + 3 user + 3 assistant = 7 messages.',
  xpReward: 20,
}

Complete Code

Click to expand the full script (copy-paste and run)
python
# Complete code from: LLM API Call in Python — OpenAI, Claude, and Gemini
# Requires: pip install openai anthropic google-genai python-dotenv
# Python 3.9+

import os
import time
from dotenv import load_dotenv

load_dotenv()

# --- Section 1: Environment Check ---
print("=== Environment Check ===")
print("OpenAI key loaded:", "sk-" in os.getenv("OPENAI_API_KEY", ""))
print("Anthropic key loaded:", "sk-ant" in os.getenv("ANTHROPIC_API_KEY", ""))
print("Google key loaded:", len(os.getenv("GOOGLE_API_KEY", "")) > 0)
print()

# --- Section 2: OpenAI API Call ---
from openai import OpenAI

openai_client = OpenAI()

response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is an API? Explain in one sentence."}
    ]
)
print("=== OpenAI ===")
print(response.choices[0].message.content)
print(f"Tokens: {response.usage.total_tokens}")
print()

# --- Section 3: Claude API Call ---
from anthropic import Anthropic

anthropic_client = Anthropic()

response = anthropic_client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "What is an API? Explain in one sentence."}]
)
print("=== Claude ===")
print(response.content[0].text)
print(f"Tokens: {response.usage.input_tokens + response.usage.output_tokens}")
print()

# --- Section 4: Gemini API Call ---
from google import genai

gemini_client = genai.Client()

response = gemini_client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What is an API? Explain in one sentence."
)
print("=== Gemini ===")
print(response.text)
print()

# --- Section 5: Cost Calculator ---
def estimate_cost(input_tokens, output_tokens, provider="openai"):
    pricing = {
        "openai":  {"input": 2.50, "output": 10.00},
        "claude":  {"input": 3.00, "output": 15.00},
        "gemini":  {"input": 0.15, "output": 0.60},
    }
    rates = pricing[provider]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

print("=== Cost Comparison (100 input, 200 output tokens) ===")
for provider in ["openai", "claude", "gemini"]:
    cost = estimate_cost(100, 200, provider)
    print(f"{provider:>8}: ${cost:.6f} per call")
print()

# --- Section 6: Multi-Provider Wrapper ---
def ask_llm(prompt, provider="openai", system="You are a helpful assistant.",
            max_tokens=256, temperature=0.7):
    if provider == "openai":
        r = openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "system", "content": system}, {"role": "user", "content": prompt}],
            max_tokens=max_tokens, temperature=temperature
        )
        return r.choices[0].message.content
    elif provider == "claude":
        r = anthropic_client.messages.create(
            model="claude-sonnet-4-20250514", max_tokens=max_tokens,
            system=system, messages=[{"role": "user", "content": prompt}],
            temperature=temperature
        )
        return r.content[0].text
    elif provider == "gemini":
        from google.genai import types
        r = gemini_client.models.generate_content(
            model="gemini-2.5-flash", contents=prompt,
            config=types.GenerateContentConfig(system_instruction=system, max_output_tokens=max_tokens, temperature=temperature)
        )
        return r.text

print("=== Multi-Provider Comparison ===")
for provider in ["openai", "claude", "gemini"]:
    answer = ask_llm("Explain overfitting in one sentence.", provider=provider)
    print(f"{provider:>8}: {answer}\n")

print("Script completed successfully.")

Frequently Asked Questions

Which provider has the best free tier for learning?

Google Gemini has the most generous free tier at 15 requests per minute with no upfront cost. OpenAI and Anthropic both give $5 in starter credit. For learning and experimentation, Gemini wins by a wide margin.

Can I use the same message format across all three APIs?

No. OpenAI puts the system message inside the messages list. Claude takes it as a separate top-level system parameter. Gemini passes it through a config object as system_instruction. The wrapper function in this post hides these differences for you.
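As a quick side-by-side, here is roughly where the system prompt lives in each provider's request. The payloads below are plain dicts for comparison only (using the same model names as the script in this post) and are never sent anywhere:

```python
# Where the system prompt goes in each provider's request (comparison only).

openai_payload = {
    "model": "gpt-4o",
    # OpenAI: the system prompt is just another message in the list
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hi"},
    ],
}

claude_payload = {
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 256,
    # Claude: the system prompt is a separate top-level field
    "system": "You are a helpful assistant.",
    "messages": [{"role": "user", "content": "Hi"}],
}

gemini_payload = {
    "model": "gemini-2.5-flash",
    "contents": "Hi",
    # Gemini: the system prompt travels inside a config object
    "config": {"system_instruction": "You are a helpful assistant."},
}

print("openai :", [m["role"] for m in openai_payload["messages"]])
print("claude : top-level 'system' field present:", "system" in claude_payload)
print("gemini :", list(gemini_payload["config"].keys()))
```

Once you see the three shapes next to each other, the wrapper function's if/elif branches are easy to read: each branch just moves the same system string to a different slot.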

How do I handle long chats without burning through tokens?

Each API call sends the full chat log as input. As the chat grows, so does your bill. The fix: trim old messages and keep only the system prompt plus the last 10-20 turns. This is called a “sliding window.”
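The sliding window described above can be sketched in a few lines. Note that `trim_history` is a hypothetical helper written for this example, not part of any SDK:

```python
def trim_history(messages, keep_last=20):
    """Keep the system prompt plus only the most recent `keep_last` turns."""
    system_msgs = [m for m in messages if m["role"] == "system"]
    other_msgs = [m for m in messages if m["role"] != "system"]
    return system_msgs + other_msgs[-keep_last:]

# Example: a long chat gets trimmed before each API call.
history = [{"role": "system", "content": "You are a helpful ML tutor."}]
for i in range(30):
    history.append({"role": "user", "content": f"Question {i}"})
    history.append({"role": "assistant", "content": f"Answer {i}"})

trimmed = trim_history(history, keep_last=20)
print(len(history), "->", len(trimmed))  # 61 -> 21
```

Call it right before each API request (`messages=trim_history(messages)`) so the system prompt always survives while old turns fall off the front.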

What happens if my API key gets leaked?

Revoke it right away from the provider’s dashboard. Make a new key. Check your billing page for odd charges. All three providers let you set spending caps — turn these on before you start.

Can I run these APIs from a Jupyter notebook?

Yes. All three SDKs work in Jupyter with no changes. Install them with !pip install openai anthropic google-genai python-dotenv, then run the same code from this guide.

References

  1. OpenAI — Platform API Docs and Quickstart. Link
  2. Anthropic — Claude API Docs. Link
  3. Google — Gemini API Docs. Link
  4. OpenAI — API Pricing. Link
  5. Anthropic — API Pricing. Link
  6. Google — Gemini API Pricing. Link
  7. OpenAI — openai-python GitHub Repo. Link
  8. Anthropic — anthropic-sdk-python GitHub Repo. Link
