Introduction to LLMs and the OpenAI API in Python

Start using the OpenAI API Python SDK today. Build your first LLM-powered app with runnable code for chat completions, streaming, and token management.

Written by Selva Prabhakaran | 24 min read


A large language model (LLM) predicts the next word in a sequence based on patterns it learned from massive amounts of text. The OpenAI API lets you tap into these models — like GPT-4o — directly from Python, so you can generate text, hold multi-turn chats, and stream replies in real time.

You’ve typed a question into ChatGPT and watched it write back something that sounds genuinely smart. But how does that actually work behind the scenes? And more to the point — how do you bring that same power into your own Python projects?

That’s what I’ll cover in this post. By the end, you’ll know what LLMs really do under the hood, and you’ll have working Python code that calls the OpenAI API to create text, manage back-and-forth chats, and stream output live.

What Is a Large Language Model?

Strip it down to one sentence: an LLM is a program that guesses what word comes next. Feed it “The capital of France is” and it bets on “Paris.”

What makes it “large”? Sheer size. GPT-4-class models have hundreds of billions of tunable weights (parameters), numbers adjusted during training on a massive sea of text. Books, blog posts, GitHub repos, forums: the model has absorbed patterns from all corners of the internet.

Here’s the part most people miss at first: the model doesn’t “know” facts the way you do. What it has are statistical habits — it learned which words tend to show up after which other words, and in what context. When it gives a correct answer, that’s because the pattern was strong enough in its training data.

KEY INSIGHT: An LLM doesn’t look up facts in a database. It predicts the most likely next token based on learned patterns. Keeping this in mind helps you see where the model is strong (producing fluent text) and where it stumbles (exact factual recall).

How Do LLMs Work: Tokens, Transformers, and Attention?

You don’t need to code a transformer from zero to use the API. But peeking under the hood will help you see why some calls cost more, why token caps exist, and why the model sometimes “forgets” earlier parts of a long chat.

What Are Tokens? The Model’s Alphabet

LLMs don’t process whole words — they work with tokens. A token is a bite-sized piece of text, roughly four characters of English on average. A common word like “cat” is a single token, while longer or rarer words are often split into two or more pieces.

Why does this matter to you? Two reasons: you pay per token, and every call has a cap on how many tokens fit. Longer prompts eat more tokens and run up your bill.
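To make the billing concrete, here’s a small sketch that turns token counts into a dollar estimate. The per-million-token rates below are illustrative placeholders, not quoted pricing:

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  input_rate=0.15, output_rate=0.60):
    """Estimate a call's cost in dollars. Rates are dollars per
    MILLION tokens -- illustrative numbers, check current pricing."""
    return (prompt_tokens * input_rate
            + completion_tokens * output_rate) / 1_000_000

# A 1,200-token prompt that gets a 300-token reply:
print(f"${estimate_cost(1_200, 300):.6f}")  # a small fraction of a cent
```

Individual calls are cheap; the arithmetic only starts to bite when you multiply by thousands of requests or very long prompts.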

The good news is you can count tokens before you send anything. OpenAI’s tiktoken library encodes text into the same tokens the model sees. Use encoding_for_model() to pick the right encoding, then .encode() to get a list of token IDs.

python
import tiktoken

encoder = tiktoken.encoding_for_model("gpt-4o")
text = "What is machine learning?"
tokens = encoder.encode(text)
print(f"Text: {text}")
print(f"Tokens: {tokens}")
print(f"Token count: {len(tokens)}")

Five tokens for four words plus a question mark — close to one-to-one this time. But that ratio won’t always hold. Unusual or long words often get chopped into two or more tokens.

The Transformer Architecture (30-Second Tour)

The transformer is the neural network behind every modern LLM. Here’s a bird’s-eye view of how it works, step by step:

  1. Embedding — Each token turns into a vector — a list of numbers that captures meaning. “King” and “queen” end up near each other in this number space; “king” and “bicycle” end up far apart.

  2. Attention — The big idea. The model scans all tokens at once and asks: “Which other tokens matter most for guessing the next one?” In “The cat sat on the mat because it was tired,” attention links “it” to “cat,” not “mat.”

  3. Feed-forward layers — Once attention has mapped the key links, small neural nets at each spot extract deeper patterns.

  4. Stack and repeat — Steps 2 and 3 run many times in a row. Each round adds more depth to the model’s grasp of the input.

  5. Pick the next token — A final layer ranks every token in the vocab. The model grabs the top pick (or samples from the best few, based on settings you control).

UNDER THE HOOD: Attention looks at every token at once, so compute scales roughly with the square of input length. A 4,000-token prompt costs much more than a 1,000-token one. Writing shorter prompts is a direct way to cut your bill.
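To make that quadratic cost concrete, here’s a tiny sketch counting the pairwise comparisons attention performs (a simplification that ignores constant factors):

```python
def attention_pairs(n_tokens: int) -> int:
    """Rough count of pairwise comparisons in full self-attention:
    every token attends to every token, so work grows as n^2."""
    return n_tokens * n_tokens

short_prompt = attention_pairs(1_000)   # 1,000,000 comparisons
long_prompt = attention_pairs(4_000)    # 16,000,000 comparisons
print(f"A 4x longer prompt does {long_prompt // short_prompt}x the attention work")
```

Four times the tokens, sixteen times the work — trimming prompts pays off more than linearly.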

Now that you have the mental model, let’s get your setup ready and make a live API call.

How Do You Set Up the OpenAI Python SDK?

You need three things before you write any code: Python on your machine, the OpenAI package, and an API key.

Prerequisites

  • Python version: 3.9+
  • Required library: openai (1.0+)
  • Install: pip install openai tiktoken
  • Time to complete: 15-20 minutes

How Do You Get Your API Key?

Head over to platform.openai.com and create a fresh key. Copy it on the spot — you won’t get a second look at it.

Store it as an environment variable. Hard-coding keys into scripts is a recipe for trouble.

bash
# On macOS/Linux
export OPENAI_API_KEY="sk-your-key-here"

# On Windows (Command Prompt)
set OPENAI_API_KEY=sk-your-key-here

# On Windows (PowerShell)
$env:OPENAI_API_KEY="sk-your-key-here"

Another option: put the key in a .env file inside your project folder and load it with python-dotenv:

python
# .env file (add this to .gitignore!)
# OPENAI_API_KEY=sk-your-key-here

from dotenv import load_dotenv
load_dotenv()  # loads .env into environment variables

WARNING: Never commit API keys to git. Add .env to .gitignore. A leaked key means anyone can run calls on your account — and you pay for every one.

The SDK reads OPENAI_API_KEY from the environment for you. No need to type it into your code:

python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from environment

Done. The client object is your gateway to every OpenAI model.

How Do You Make Your First API Call?

Here’s where it all comes together. You call client.chat.completions.create(), hand it a model name and a list of messages, and the model sends back a completion.

python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "What is Python used for?"}
    ]
)

print(response.choices[0].message.content)

You’ll get a slightly different reply each run — LLMs are not deterministic by default. Here’s what each part means:

  • model="gpt-4o-mini" — A quick, low-cost model. Ideal while you’re learning.
  • messages — A list of dicts, each carrying a role and content.
  • response.choices[0] — The API can return multiple options. We grab the first.
  • .message.content — The raw text the model wrote.

But there’s more inside the response object than just the reply text. Let’s dig in so you see what you get back — and what shows up on your bill.

python
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello in French."}]
)

print(f"Model used: {response.model}")
print(f"Response text: {response.choices[0].message.content}")
print(f"Finish reason: {response.choices[0].finish_reason}")
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")

| Field | What It Means |
| --- | --- |
| response.model | The exact model version that handled your request |
| choices[0].message.content | The text the model generated |
| choices[0].finish_reason | Why the model stopped: "stop" (natural end) or "length" (hit the token cap) |
| usage.prompt_tokens | Tokens in your input (you pay for these) |
| usage.completion_tokens | Tokens the model produced (you also pay for these) |
| usage.total_tokens | Both added together — this is your billing number |

KEY INSIGHT: You pay for tokens on both sides — input and output. A 500-token prompt that gets a 500-token reply costs the same total as a 900-token prompt that gets a 100-token reply. Keep prompts lean, and set max_tokens when you don’t need long answers.

Quick check: Before you read on, guess what finish_reason would say if you set max_tokens=5 on a question that needs a full paragraph. (Answer: "length" — the model was forced to stop before it could finish.)

Exercise 1: Make Your First API Call (beginner)

Task: Use the OpenAI client to ask the model "Explain what an API is in one sentence." Print just the response text. Use the model "gpt-4o-mini".

Starter code:

python
from openai import OpenAI

client = OpenAI()

# Make your API call here
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        # Add your message here
    ]
)

# Print the response content
print(___)

Hints:

  • The message should have role "user" and content "Explain what an API is in one sentence."
  • Print response.choices[0].message.content to get the text.

Solution:

python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Explain what an API is in one sentence."}
    ]
)

print(response.choices[0].message.content)

Why it works: the user message goes inside the messages list, the model returns a ChatCompletion object, and the generated text lives at response.choices[0].message.content.

What Are Messages, Roles, and How Do You Control the Model?

The real power of the API lives in the messages list. Each message has a role — a label that tells the model who said what. Three roles cover nearly every use case:

system — Hidden stage directions. You set the tone, guardrails, and persona here. The end user never sees this message, but the model obeys it throughout the chat.

user — That’s you (or your app). It holds the question, prompt, or task you want a reply to.

assistant — Earlier replies from the model. Include these when you want a multi-turn chat so the model keeps context across turns.

Let me show you what a system message does in practice. We’ll instruct the model to play the role of a Python tutor that keeps answers short.

python
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": "You are a Python tutor. Give short, clear answers in 2-3 sentences max."
        },
        {
            "role": "user",
            "content": "What's the difference between a list and a tuple?"
        }
    ]
)

print(response.choices[0].message.content)

Take that system message away and you’ll get a longer, less focused reply. With it, the model sticks to the rules you set.

TIP: System messages are the best lever you have for output quality. A sharp prompt like “You are a data analyst. Respond only with pandas code. No prose unless asked.” will crush a vague instruction every time.

What Do temperature, max_tokens, and top_p Control?

The create() call accepts a handful of knobs that change the way the model writes. I’ll focus on the three you’ll use most.

temperature is the randomness dial. It goes from 0 to 2. Low values produce tight, repeatable text. High values unlock more surprise and variety.

python
response_low = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Name a fruit."}],
    temperature=0
)
print(f"temp=0: {response_low.choices[0].message.content}")

response_high = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Name a fruit."}],
    temperature=1.5
)
print(f"temp=1.5: {response_high.choices[0].message.content}")

At zero, you’ll almost always see the same fruit. At 1.5, expect wild cards like “Rambutan” or “Persimmon.”

My rule of thumb: temperature=0 for factual jobs (code, data pulls) and 0.7–1.0 for creative jobs (brainstorming, copy).

max_tokens puts a hard limit on how long the reply can be. If the model runs out of room, it stops mid-thought. The telltale sign is finish_reason = "length" instead of "stop".

top_p (nucleus sampling) is a second knob for randomness. It tells the model to sample only from the smallest set of tokens whose combined probability reaches p — top_p=0.1 restricts sampling to the tokens making up the top 10% of probability mass. Pick one knob: either temperature or top_p. OpenAI’s advice is to tweak one and leave the other at its default.
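Under the hood, temperature divides the model’s raw scores (logits) before they’re converted to probabilities. A toy sketch with a made-up four-token vocabulary shows why low temperature means repeatable picks:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities. Lower temperature sharpens
    the distribution (top token dominates); higher flattens it."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 3.0, 2.0, 1.0]             # imaginary scores for four tokens
for t in (0.2, 1.0, 1.5):
    top = softmax_with_temperature(logits, t)[0]
    print(f"temperature={t}: top token gets {top:.0%} of the probability")
```

At temperature 0.2 the top token takes almost all the probability mass, so sampling nearly always returns it; at 1.5 the alternatives get real odds, which is where the “Rambutan” surprises come from.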

Exercise 2: Craft a System Message (beginner)

Task: Create a system message that instructs the model to act as a SQL expert who responds only with SQL queries (no explanations). Then ask it to write a query that selects all users older than 25 from a "users" table. Print the response.

Starter code:

python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": ___  # Your system message here
        },
        {
            "role": "user",
            "content": ___  # Your question here
        }
    ]
)

print(response.choices[0].message.content)

Hints:

  • Set the system message to something like "You are a SQL expert. Respond only with SQL queries, no explanations."
  • Set the user content to "Write a query to select all users older than 25 from a users table."

Solution:

python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": "You are a SQL expert. Respond only with SQL queries, no explanations."
        },
        {
            "role": "user",
            "content": "Write a query to select all users older than 25 from a users table."
        }
    ]
)

print(response.choices[0].message.content)

Why it works: the system message constrains the model to return only queries, so it responds with a SQL SELECT statement filtering by age > 25.

How Do You Build Multi-Turn Conversations?

Here’s something that surprises many people: every API call starts with a blank slate. The model has zero memory of what you asked before. To create a conversation, you must ship the entire chat log in messages on every call.

The recipe: after the model replies, tack its response onto the list, add the user’s next message, and send the whole thing again. Let me show you.

python
conversation = [
    {
        "role": "system",
        "content": "You are a helpful cooking assistant."
    },
    {
        "role": "user",
        "content": "How do I make scrambled eggs?"
    }
]

# First turn
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=conversation
)
assistant_reply = response.choices[0].message.content
print(f"Assistant: {assistant_reply}\n")

# Add the assistant's reply to history
conversation.append({"role": "assistant", "content": assistant_reply})

# Second turn — the model now has context
conversation.append({"role": "user", "content": "What cheese goes best with that?"})

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=conversation
)
print(f"Assistant: {response.choices[0].message.content}")

That second reply works because the model sees the whole thread above it. It can tell “that” refers to scrambled eggs. Strip the history and “that” means nothing. This is one of the trickiest parts of the API — you manage the chat state. Drop a message, and the model loses its place.

Quick thought experiment: What if you deleted the assistant entry before the second call? The model would have no clue which dish you mean, so you’d get a generic cheese tip instead of one matched to eggs.

WARNING: Every message in the history eats tokens. A 50-turn chat ships all 50 messages on each new call. For long chats, you’ll need to prune old entries or condense them into a summary so you don’t blow through the token cap.
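A minimal sketch of that pruning idea: keep the system message plus the newest turns that fit under a budget. It uses a rough 4-characters-per-token estimate; swap in tiktoken for exact counts:

```python
def prune_history(conversation, max_tokens=1_000):
    """Keep system messages plus as many of the most recent turns as fit
    under a rough token budget (about 4 characters per token)."""
    def rough_tokens(msg):
        return len(msg["content"]) // 4 + 4   # +4 for per-message overhead

    system = [m for m in conversation if m["role"] == "system"]
    rest = [m for m in conversation if m["role"] != "system"]

    budget = max_tokens - sum(rough_tokens(m) for m in system)
    kept = []
    for msg in reversed(rest):                # walk newest to oldest
        cost = rough_tokens(msg)
        if cost > budget:
            break                             # oldest turns get dropped
        budget -= cost
        kept.append(msg)
    return system + list(reversed(kept))      # restore chronological order
```

Call it right before each create() call. The trade-off: dropped turns are gone for good, so for very long chats you may prefer summarizing old turns instead of discarding them.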

How Do Streaming and Error Handling Work?

Before you ship anything to users, you need two more skills: streaming (so the UI feels fast) and error handling (so your app doesn’t blow up when the API hiccups).

How Does Streaming Work?

Without streaming, the API holds the full reply until the model is done writing. For long answers, that wait feels slow. Streaming fixes this — it sends tokens to you the moment they’re ready, just like the ChatGPT typing effect.

All you do is set stream=True and loop over the result. Each chunk carries a tiny slice of the output.

python
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Write a haiku about Python programming."}
    ],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end="", flush=True)

print()  # newline at the end

Notice: it’s delta.content here, not message.content. The delta only carries what’s new since the last chunk. Some chunks have None (just metadata), which is why we check before printing.

Does streaming save money? No — same tokens, same price. But the user’s feel is night and day. I turn on streaming for every user-facing feature. Text that flows in word by word feels alive; a three-second blank screen followed by a wall of text feels broken.
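One practical detail: you usually want the complete reply at the end as well as the live printout. The pattern is to accumulate deltas while printing. This sketch simulates the stream with plain strings so it runs without an API key; with the real API you’d feed it chunk.choices[0].delta.content values instead:

```python
def consume_stream(deltas):
    """Print each delta as it arrives and return the assembled reply.
    None entries stand in for metadata-only chunks."""
    parts = []
    for content in deltas:
        if content is not None:
            print(content, end="", flush=True)
            parts.append(content)
    print()                        # final newline
    return "".join(parts)

# Simulated chunks; a real loop would iterate over the stream object
full_text = consume_stream(["Code ", "flows ", None, "like ", "water."])
```

Keeping the assembled text matters when you also need to append the reply to a multi-turn conversation history.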

How Should You Handle API Errors?

The SDK raises typed exceptions, which makes catching errors clean. These are the four you’ll see most often:

python
from openai import (
    OpenAI,
    AuthenticationError,
    RateLimitError,
    APIConnectionError,
    BadRequestError
)

client = OpenAI()

try:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Hello"}]
    )
    print(response.choices[0].message.content)

except AuthenticationError:
    print("Invalid API key. Check your OPENAI_API_KEY.")

except RateLimitError:
    print("Rate limit hit. Wait a moment and retry.")

except APIConnectionError:
    print("Can't reach OpenAI servers. Check your internet.")

except BadRequestError as e:
    print(f"Bad request: {e}")

| Error | Common Cause | Fix |
| --- | --- | --- |
| AuthenticationError | Wrong or expired API key | Regenerate key at platform.openai.com |
| RateLimitError | Too many requests per minute | Add retry logic with exponential backoff |
| APIConnectionError | Network issue or OpenAI outage | Check connectivity, retry after a few seconds |
| BadRequestError | Invalid model name, too many tokens | Check your parameters against the docs |

In production, you want auto-retries. The tenacity library wraps exponential backoff in a single decorator:

python
# pip install tenacity
from tenacity import retry, wait_exponential, stop_after_attempt

@retry(wait=wait_exponential(min=1, max=60), stop=stop_after_attempt(3))
def call_openai(messages):
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages
    )

That decorator makes up to 3 attempts in total, waiting 1s after the first failure and 2s after the second (the wait doubles each round, capped at 60s). Most rate limit blips clear in seconds.
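If you’d rather skip the dependency, the same backoff is a few lines of plain Python. A sketch with a generic except clause so it runs standalone; in real code, catch RateLimitError and APIConnectionError specifically:

```python
import time

def with_backoff(fn, max_attempts=3, base_wait=1.0, max_wait=60.0):
    """Call fn(), retrying failed calls with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise                                    # out of retries
            wait = min(base_wait * 2 ** (attempt - 1), max_wait)
            time.sleep(wait)                             # 1s, 2s, 4s, ...

# Usage: with_backoff(lambda: client.chat.completions.create(...))
```

The lambda wrapper lets you retry any call without decorating each function, at the cost of slightly noisier call sites.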

How Do You Choose the Right Model?

OpenAI has a lineup of models. Which one fits depends on what you’re building, how much you want to spend, and how fast you need a reply.

| Model | Best For | Speed | Cost | Context Window |
| --- | --- | --- | --- | --- |
| gpt-4o-mini | Prototyping, simple tasks, high volume | Fast | Lowest | 128K tokens |
| gpt-4o | Complex reasoning, coding, analysis | Medium | Moderate | 128K tokens |
| gpt-4.1 | Production apps, nuanced tasks | Medium | Higher | 1M tokens |
| gpt-4.1-mini | Balanced performance and cost | Fast | Low | 1M tokens |

My advice for beginners: start on gpt-4o-mini. It’s cheap, it’s quick, and it handles most learning tasks with ease. Move to a bigger model once you need deeper reasoning or a wider context window.

OpenAI also offers embeddings (text → numeric vectors for search) and image generation (DALL-E). Those hit different endpoints but share the same SDK and auth you set up above. We’ll tackle them in future posts.

NOTE: OpenAI rolled out the Responses API in 2025 as a simpler option alongside Chat Completions. Here’s a side-by-side look:

| Feature | Chat Completions | Responses API |
| --- | --- | --- |
| Input format | messages array (roles required) | input string or messages |
| Output access | response.choices[0].message.content | response.output_text |
| Built-in tools | None | Web search, file search, code interpreter |
| Multi-turn state | Manual (pass full history) | Automatic with store=True |
| Best for | Learning, existing integrations | New projects, agentic workflows |

My take: learn Chat Completions first. It’s what most tutorials, LangChain libraries, and live codebases rely on right now. But keep an eye on the Responses API — that’s the direction OpenAI is moving.

WARNING: When NOT to use the OpenAI API. The API isn’t right for every job. Skip it when: (1) you need replies in under 100ms — API round trips add 500ms–2s at minimum; (2) your data can’t leave your network — look at self-hosted models like Llama or Mistral instead; (3) you need guaranteed identical output — even at temperature=0, replies can shift slightly between API versions; (4) cost matters at huge scale — millions of calls per day may make hosting your own model cheaper.

What Are the Most Common Mistakes (and How Do You Fix Them)?

Mistake 1: Forgetting to Set the API Key

python
# ❌ Wrong — no API key configured
from openai import OpenAI
client = OpenAI(api_key="")  # empty string

The call fails with: AuthenticationError: No API key provided.

python
# ✅ Correct — set the environment variable first
# export OPENAI_API_KEY="sk-your-actual-key"
from openai import OpenAI
client = OpenAI()  # reads from environment

Mistake 2: Using the Old SDK Syntax

The SDK got a ground-up rewrite at version 1.0. Any tutorial written before late 2023 uses the old API and won’t run with the current package.

python
# ❌ Old syntax (pre v1.0) — this will error
import openai
openai.api_key = "sk-..."
response = openai.ChatCompletion.create(...)

python
# ✅ Current syntax (v1.0+)
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(...)

If you see AttributeError: module 'openai' has no attribute 'ChatCompletion', you’re mixing old code with the new SDK.

Mistake 3: Ignoring Token Limits

Send a 200,000-token prompt to a model with a 128K window and you’ll get a BadRequestError. Check your input size before hitting send.

python
# ✅ Check token count before sending
import tiktoken

def count_tokens(messages, model="gpt-4o-mini"):
    """Approximate token count for a messages list."""
    encoder = tiktoken.encoding_for_model(model)
    total = 0
    for msg in messages:
        # +4 accounts for message formatting overhead
        total += len(encoder.encode(msg["content"])) + 4
    return total
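If tiktoken isn’t installed, or you just need a ballpark before a request, a character-based estimate is enough to catch gross overruns. A sketch using the rough four-characters-per-token rule:

```python
def rough_token_count(messages, chars_per_token=4, per_message_overhead=4):
    """Ballpark token count with no dependencies. Accurate enough to
    catch 'way past the context window' before you send the request."""
    return sum(
        len(msg["content"]) // chars_per_token + per_message_overhead
        for msg in messages
    )

msgs = [{"role": "user", "content": "What is Python used for?"}]
print(rough_token_count(msgs))  # 24 chars // 4 + 4 overhead = 10
```

Use the precise tiktoken version when you’re close to the limit; use this when you just need to know whether you’re wildly over.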

Mistake 4: Not Handling Streaming Correctly

python
# ❌ Wrong — treating stream like a regular response
stream = client.chat.completions.create(..., stream=True)
print(stream.choices[0].message.content)  # AttributeError!

python
# ✅ Correct — iterate over chunks
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

How Do You Put It All Together? A Code Review Assistant

Time to tie it all together in a hands-on project. We’ll build an assistant that takes a Python function, analyzes it, and flags improvements.

Below, review_code() sends a system message that casts the model as a senior Python dev. The user message carries the code. I set temperature=0.3 for sharp, focused feedback and cap output at 500 tokens to keep reviews concise.

python
from openai import OpenAI

client = OpenAI()

def review_code(code_snippet):
    """Send a code snippet to GPT for review."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a senior Python developer. "
                    "Review the code for bugs, style issues, "
                    "and performance. Be specific and concise."
                )
            },
            {
                "role": "user",
                "content": f"Review this Python code:\n\n{code_snippet}"
            }
        ],
        temperature=0.3,
        max_tokens=500
    )
    return response.choices[0].message.content

Now let’s feed it a function that has a few deliberate flaws — range(len()) instead of direct iteration, no empty-list guard, and a missed sum() builtin:

python
sample_code = """
def get_avg(numbers):
    total = 0
    for i in range(len(numbers)):
        total = total + numbers[i]
    avg = total / len(numbers)
    return avg
"""

review = review_code(sample_code)
print(review)

The model catches all three problems — the clunky loop, the missing empty-list check, and the ignored sum() builtin. Twenty lines of code and you have a working review tool. That’s the payoff of pairing a solid system message with the right temperature and a focused prompt.

Summary

Let me recap what you now know:

  • LLMs guess the next token by drawing on patterns from training data — they don’t “understand” the way a human does.
  • Tokens are the unit of both cost and capacity. You pay per token, and each model has a context window cap.
  • The OpenAI Python SDK gives you a single call — client.chat.completions.create() — to interact with any model.
  • Messages use three roles: system (rules), user (input), assistant (past replies).
  • Temperature tunes randomness. Set it to 0 for exact tasks, 0.7–1.0 for creative work.
  • Multi-turn chats work by shipping the full history on each call.
  • Streaming pushes tokens to the user as they arrive, making the app feel fast.
  • Always add error handling with retries before going to production.

Practice Exercise

Build a terminal-based chatbot that keeps context alive across turns. The user types, the bot replies, and the thread grows with each exchange.

Solution:
python
from openai import OpenAI

client = OpenAI()

conversation = [
    {"role": "system", "content": "You are a helpful assistant. Keep answers concise."}
]

print("Chatbot ready! Type 'quit' to exit.\n")

while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break

    conversation.append({"role": "user", "content": user_input})

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=conversation
    )

    reply = response.choices[0].message.content
    conversation.append({"role": "assistant", "content": reply})
    print(f"Bot: {reply}\n")

This exercise puts the multi-turn pattern into practice. The conversation list expands on every exchange, feeding the model the full thread each time.

Complete Code

Full script (copy-paste and run):
python
# Complete code from: Introduction to LLMs and the OpenAI API in Python
# Requires: pip install openai tiktoken
# Python 3.9+

from openai import OpenAI
import tiktoken

# --- Section 1: Token Counting ---
encoder = tiktoken.encoding_for_model("gpt-4o")
text = "What is machine learning?"
tokens = encoder.encode(text)
print(f"Text: {text}")
print(f"Tokens: {tokens}")
print(f"Token count: {len(tokens)}")

# --- Section 2: First API Call ---
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "What is Python used for?"}
    ]
)
print(f"\nFirst call: {response.choices[0].message.content}")

# --- Section 3: Response Object ---
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello in French."}]
)
print(f"\nModel: {response.model}")
print(f"Text: {response.choices[0].message.content}")
print(f"Finish reason: {response.choices[0].finish_reason}")
print(f"Total tokens: {response.usage.total_tokens}")

# --- Section 4: System Message ---
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "system",
            "content": "You are a Python tutor. Give short, clear answers in 2-3 sentences max."
        },
        {
            "role": "user",
            "content": "What's the difference between a list and a tuple?"
        }
    ]
)
print(f"\nWith system message: {response.choices[0].message.content}")

# --- Section 5: Temperature ---
response_low = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Name a fruit."}],
    temperature=0
)
print(f"\ntemp=0: {response_low.choices[0].message.content}")

# --- Section 6: Multi-Turn Conversation ---
conversation = [
    {"role": "system", "content": "You are a helpful cooking assistant."},
    {"role": "user", "content": "How do I make scrambled eggs?"}
]

response = client.chat.completions.create(
    model="gpt-4o-mini", messages=conversation
)
assistant_reply = response.choices[0].message.content
print(f"\nAssistant: {assistant_reply}")

conversation.append({"role": "assistant", "content": assistant_reply})
conversation.append({"role": "user", "content": "What cheese goes best with that?"})

response = client.chat.completions.create(
    model="gpt-4o-mini", messages=conversation
)
print(f"Assistant: {response.choices[0].message.content}")

# --- Section 7: Streaming ---
print("\nStreaming response:")
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Write a haiku about Python programming."}
    ],
    stream=True
)
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content is not None:
        print(content, end="", flush=True)
print()

# --- Section 8: Code Review Assistant ---
def review_code(code_snippet):
    """Send a code snippet to GPT for review."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a senior Python developer. "
                    "Review the code for bugs, style issues, "
                    "and performance. Be specific and concise."
                )
            },
            {
                "role": "user",
                "content": f"Review this Python code:\n\n{code_snippet}"
            }
        ],
        temperature=0.3,
        max_tokens=500
    )
    return response.choices[0].message.content

sample_code = """
def get_avg(numbers):
    total = 0
    for i in range(len(numbers)):
        total = total + numbers[i]
    avg = total / len(numbers)
    return avg
"""

print(f"\nCode Review:\n{review_code(sample_code)}")

print("\nScript completed successfully.")

Frequently Asked Questions

How much does the OpenAI API cost?

Pricing varies by model and token volume. In 2025, gpt-4o-mini costs roughly $0.15 per million input tokens and $0.60 per million output tokens. A hands-on learning session with a few dozen calls runs just a few cents. See openai.com/pricing for current numbers.

What’s the difference between ChatGPT and the OpenAI API?

ChatGPT is a consumer app with a web UI. The API opens up the same models to your code. You can set system messages, tweak parameters, manage the chat thread, and wire the model into your own software — things the ChatGPT interface doesn’t allow.

Do I need a GPU to use the OpenAI API?

Nope. The heavy math runs on OpenAI’s servers. Your machine just ships text out and gets text back. A laptop that can run Python and hit the web is all you need.

Can I use other LLM providers with the same code?

Yes — many providers (Anthropic, Google, Mistral, local models via Ollama) expose endpoints that match OpenAI’s format. Most of the time you just swap the base URL and key. Libraries like LiteLLM wrap all of them behind one unified call.
