Build an AI Chatbot with Memory in Python (Step-by-Step)
Learn to build a Python AI chatbot with conversation memory using the OpenAI API. Covers token management, streaming, memory strategies, and saving chat history.
Build a Python chatbot that truly recalls your chat — from first API call to a full assistant with smart memory.
You ask your chatbot a question. It gives a great answer. You ask a follow-up — and it has no clue what you just said. Sound familiar?
That’s because LLMs are stateless. Each API call starts fresh with no memory at all. Your chatbot doesn’t “forget” — it never knew in the first place.
In this guide, you’ll build a chatbot from scratch with Python and the OpenAI API. First, you’ll make a basic bot. Then you’ll see why it fails at real talks. After that, you’ll add true memory.
By the end, you’ll have a working bot that holds multi-turn chats. It will track token costs, stream replies, and save chat logs to disk.
What Is a Conversational AI Chatbot?
An AI chatbot is a program that uses a large language model (LLM) to give human-like replies in a back-and-forth chat. Old-school bots match keywords to scripted lines. AI chatbots read context and craft fresh replies on the fly.
The hard part? Keeping the thread going. One question and one answer is simple. But a clear chat across dozens of messages — that takes memory.
Here’s how every AI chatbot works at its core:
- User sends a message — your Python code captures the input
- Your code packages the message — along with conversation history and a system prompt
- The LLM processes everything — and generates a response
- Your code stores the exchange — adding both messages to memory
- Repeat — each new message includes the growing history
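The loop above can be sketched in a few lines of plain Python. The `fake_llm` function here is a stand-in for a real API call, just to show the flow:

```python
def fake_llm(messages):
    # Stand-in for a real LLM call: echoes the latest user message
    return f"You said: {messages[-1]['content']}"

history = [{"role": "system", "content": "You are a helpful assistant."}]

def handle_turn(user_text):
    history.append({"role": "user", "content": user_text})    # capture + package
    reply = fake_llm(history)                                 # model generates
    history.append({"role": "assistant", "content": reply})   # store the exchange
    return reply

print(handle_turn("Hello"))         # You said: Hello
print(handle_turn("How are you?"))  # You said: How are you?
print(len(history))                 # 5 messages: 1 system + 2 per exchange
```

Every real chatbot in this guide follows this same shape; only the `fake_llm` line gets swapped for a real API call.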
Key Insight: LLMs have no built-in memory. They are stateless — input goes in, output comes out, nothing stays. “Memory” in a chatbot is just your code sending old messages along with each new one.
Setting Up Your Environment
You need Python 3.9+ and an OpenAI API key. Let’s get the tools in place.
bash
pip install openai tiktoken python-dotenv
- `openai` — the OpenAI Python SDK
- `tiktoken` — counts tokens so you can track costs
- `python-dotenv` — loads API keys from a `.env` file (keeps secrets out of code)
Make a .env file in your project folder with your API key:
bash
# .env
OPENAI_API_KEY=sk-your-api-key-here
Warning: Never put API keys right in your Python files. If you push code to GitHub with a visible key, bots will find it fast and run up charges. Always use a `.env` file or shell variables.
Now load your key and make the OpenAI client. You’ll use this setup code for the rest of the guide:
python
import os
from dotenv import load_dotenv
from openai import OpenAI
load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
Output:
python
# No output — the client is ready to use
No errors? Good — your setup works. The client object talks to the API for you.
Build a Basic Chatbot (Without Memory)
Let’s start with the simplest chatbot — one that answers one question. This shows the core API pattern.
The Chat API takes a list of messages. Each message has a role and content. The role is "system", "user", or "assistant". The system message sets the bot’s tone. The user message is your question.
python
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a helpful Python tutor."},
{"role": "user", "content": "What is a list comprehension?"}
]
)
print(response.choices[0].message.content)
Output:
python
A list comprehension is a concise way to create lists in Python. Instead of
writing a for loop to build a list, you write the entire operation in a single
line. The syntax is: [expression for item in iterable if condition]. For
example, [x**2 for x in range(5)] produces [0, 1, 4, 9, 16].
That works great for a one-shot question. But watch what happens when we try a conversation.
The Memory Problem
Let’s ask two related questions — a question and then a follow-up:
python
# First question
response1 = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a helpful Python tutor."},
{"role": "user", "content": "What is a list comprehension?"}
]
)
print("Q1:", response1.choices[0].message.content[:100])
# Follow-up question
response2 = client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a helpful Python tutor."},
{"role": "user", "content": "Can you show me a more complex example of that?"}
]
)
print("Q2:", response2.choices[0].message.content[:100])
Output:
python
Q1: A list comprehension is a concise way to create lists in Python. Instead of writing a for loop...
Q2: Sure! Could you clarify what specific topic you'd like a more complex example of? I'd be happy...
The second reply has no idea what “that” means. Each API call stands alone. The model only sees what you send in that one request.
This is the core problem. The fix is simple: send the chat history with every request.
Add Conversation Memory
The simplest chatbot memory is a Python list. You store every message in this list — both user and bot replies. Then you send the whole list with each API call. The model sees the full chat.
Here’s a chatbot with basic memory. The chat() function adds user input to the list, sends it all to the API, and saves the reply:
python
def create_chatbot(system_prompt="You are a helpful assistant."):
messages = [{"role": "system", "content": system_prompt}]
def chat(user_input):
messages.append({"role": "user", "content": user_input})
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=messages
)
assistant_reply = response.choices[0].message.content
messages.append({"role": "assistant", "content": assistant_reply})
return assistant_reply
return chat, messages
Output:
python
# No output — function defined
This uses a closure — the messages list lives on between calls to chat(). Let’s test it with the same two questions:
python
chat, history = create_chatbot("You are a helpful Python tutor.")
print("Q1:", chat("What is a list comprehension?"))
print()
print("Q2:", chat("Can you show me a more complex example of that?"))
Output:
python
Q1: A list comprehension is a concise way to create lists in Python. The syntax
is [expression for item in iterable if condition]. For example, [x**2 for x in
range(5)] gives you [0, 1, 4, 9, 16].
Q2: Here is a more complex list comprehension that flattens a matrix (list of
lists) into a single list:
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [num for row in matrix for num in row]
# Result: [1, 2, 3, 4, 5, 6, 7, 8, 9]
This uses a nested comprehension — the outer loop iterates over rows, and the
inner loop iterates over numbers within each row.
Now the bot recalls the first question. The follow-up gets a clear answer because the full history rides along with each API call.
Key Insight: Chat memory is just a Python list. You add each message, send the full list each time, and the model acts like it “recalls” the chat. No magic here — your code holds the state that the API can’t.
How Memory Grows
Let’s look inside the messages list to see what the API gets:
python
for i, msg in enumerate(history):
role = msg["role"].upper()
preview = msg["content"][:60]
print(f"[{i}] {role}: {preview}...")
Output:
python
[0] SYSTEM: You are a helpful Python tutor....
[1] USER: What is a list comprehension?...
[2] ASSISTANT: A list comprehension is a concise way to create lists in...
[3] USER: Can you show me a more complex example of that?...
[4] ASSISTANT: Here is a more complex list comprehension that flattens a ...
Each exchange adds two messages — one from you, one from the bot. After 50 exchanges, you’ll have 101 messages. That’s a lot of tokens. And tokens cost real money.
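The arithmetic is easy to check: one system message, plus two messages per exchange.

```python
def message_count(exchanges):
    # 1 system message, plus a user and an assistant message per exchange
    return 1 + 2 * exchanges

for n in (10, 50, 100):
    print(f"{n} exchanges -> {message_count(n)} messages")
# 10 exchanges -> 21 messages
# 50 exchanges -> 101 messages
# 100 exchanges -> 201 messages
```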
Build an Interactive Chat Loop
Let’s wrap this into a proper chatbot you can run in the terminal:
python
def run_chatbot():
print("AI Chatbot (type 'quit' to exit)")
print("-" * 40)
chat, _ = create_chatbot(
"You are a friendly and helpful assistant. "
"Keep responses concise — under 3 sentences when possible."
)
while True:
user_input = input("\nYou: ").strip()
if user_input.lower() in ("quit", "exit", "q"):
print("Goodbye!")
break
if not user_input:
continue
response = chat(user_input)
print(f"\nAssistant: {response}")
run_chatbot()
Output:
python
AI Chatbot (type 'quit' to exit)
----------------------------------------
You: What is the capital of France?
Assistant: The capital of France is Paris.
You: How many people live there?
Assistant: About 2.1 million people live in Paris proper. The greater Paris
metropolitan area has around 12 million residents.
You: quit
Goodbye!
You now have a working chatbot with memory. But there’s a hidden trap — it’ll bite you after about 20 to 30 exchanges.
Handle the Token Limit Problem
Each message you send costs tokens. As your chat grows, you send more tokens per call. Soon you’ll hit two walls: the model’s context limit and your budget.
Let’s build a token counter. You’ll see how many tokens each call uses. The tiktoken library counts tokens the same way OpenAI does — exact counts, no guessing.
python
import tiktoken
def count_tokens(messages, model="gpt-4o-mini"):
"""Count the exact number of tokens in a message list."""
encoding = tiktoken.encoding_for_model(model)
token_count = 0
for message in messages:
token_count += 4 # every message has overhead tokens
for key, value in message.items():
token_count += len(encoding.encode(value))
token_count += 2 # reply priming tokens
return token_count
Output:
python
# No output — function defined
Now let’s watch how token use grows as chats get longer. We’ll fake a 10-turn chat and count tokens at each step:
python
sample_messages = [
{"role": "system", "content": "You are a helpful assistant."},
]
# Simulate 10 exchanges
for i in range(10):
sample_messages.append(
{"role": "user", "content": f"Tell me fact number {i+1} about Python."}
)
sample_messages.append(
{"role": "assistant", "content": f"Here is fact {i+1}: Python was "
f"created by Guido van Rossum and released in 1991. "
f"It emphasizes code readability and simplicity."}
)
tokens = count_tokens(sample_messages)
print(f"After exchange {i+1:2d}: {len(sample_messages):3d} messages, {tokens:5d} tokens")
Output:
python
After exchange 1: 3 messages, 68 tokens
After exchange 2: 5 messages, 115 tokens
After exchange 3: 7 messages, 162 tokens
After exchange 4: 9 messages, 209 tokens
After exchange 5: 11 messages, 256 tokens
After exchange 6: 13 messages, 303 tokens
After exchange 7: 15 messages, 350 tokens
After exchange 8: 17 messages, 397 tokens
After exchange 9: 19 messages, 444 tokens
After exchange 10: 21 messages, 491 tokens
Token use grows linearly. In a real chat with longer replies, you can hit thousands of tokens within 15-20 turns. GPT-4o-mini supports a context window of up to 128K tokens, but you pay for every token you send.
Tip: Always track token use in live chatbots. Add a counter that warns you when a chat nears your budget. This stops surprise bills from long chats.
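One way to follow that tip is a small tracker object that accumulates usage from each call and warns past a threshold. This is an illustrative sketch, not part of the tutorial's chatbot code, and the threshold value is arbitrary:

```python
class TokenBudgetTracker:
    """Warn when cumulative token usage crosses a threshold."""

    def __init__(self, warn_at=1000):
        self.used = 0
        self.warn_at = warn_at

    def add(self, tokens):
        # Call this after each API response, e.g. with response.usage.total_tokens
        self.used += tokens
        if self.used >= self.warn_at:
            print(f"WARNING: {self.used} tokens used (threshold: {self.warn_at})")

tracker = TokenBudgetTracker(warn_at=1000)
tracker.add(400)   # under the threshold, stays quiet
tracker.add(700)   # total hits 1100, prints a warning
```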
Add a Token Budget
Let’s update our chatbot to enforce a token budget. When the conversation gets too long, we’ll trim the oldest messages while keeping the system prompt:
python
def create_chatbot_with_budget(system_prompt, token_budget=4000):
messages = [{"role": "system", "content": system_prompt}]
def trim_history():
"""Remove oldest messages (except system) to stay within budget."""
while count_tokens(messages) > token_budget and len(messages) > 2:
messages.pop(1) # remove oldest non-system message
def chat(user_input):
messages.append({"role": "user", "content": user_input})
trim_history()
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=messages
)
assistant_reply = response.choices[0].message.content
messages.append({"role": "assistant", "content": assistant_reply})
return assistant_reply
return chat, messages
Output:
python
# No output — function defined
The trim_history() function drops the oldest non-system messages one at a time, stopping once the token count falls below the budget. Simple but solid — the bot keeps recent context and “forgets” old turns.
Warning: Popping old messages is a one-way door — that context is gone for good. If the user asks about something from 20 turns ago, the bot won’t know. For chats where old context matters, try the summary method up next.
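You can see the one-way behavior with a plain list and no API call at all. Once `pop(1)` runs, the early turns are gone:

```python
messages = [
    {"role": "system", "content": "You are a tutor."},
    {"role": "user", "content": "Q1"},
    {"role": "assistant", "content": "A1"},
    {"role": "user", "content": "Q2"},
    {"role": "assistant", "content": "A2"},
]

# Trim the two oldest non-system messages, as trim_history() would
messages.pop(1)  # drops Q1
messages.pop(1)  # drops A1

print([m["content"] for m in messages])
# ['You are a tutor.', 'Q2', 'A2'] - Q1 and A1 are unrecoverable
```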
Memory Strategies: Window, Summary, and Hybrid
The “send it all” method breaks down in long chats. Here are three smarter ways to handle memory. Each trades off context quality against token cost.
Strategy 1: Sliding Window Memory
Keep only the last N turns. Drop older ones. This is the simplest plan and works great for most bots.
python
def create_windowed_chatbot(system_prompt, window_size=10):
"""Keep only the last `window_size` exchanges in memory."""
full_history = [{"role": "system", "content": system_prompt}]
def chat(user_input):
full_history.append({"role": "user", "content": user_input})
# Build windowed context: system + last N exchanges
windowed = [full_history[0]] # always keep system prompt
recent = full_history[1:][-window_size * 2:] # last N exchanges
windowed.extend(recent)
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=windowed
)
assistant_reply = response.choices[0].message.content
full_history.append({"role": "assistant", "content": assistant_reply})
return assistant_reply
return chat, full_history
Output:
python
# No output — function defined
With window_size=10, the bot sends the system prompt plus the last 10 turns (20 messages). Token use stays flat no matter how long the chat runs.
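You can verify the flat cost without calling the API: the windowed slice never contains more than `1 + window_size * 2` messages, no matter how long the full history grows.

```python
window_size = 10
full_history = [{"role": "system", "content": "You are a helpful assistant."}]

for i in range(50):  # simulate 50 exchanges
    full_history.append({"role": "user", "content": f"Q{i}"})
    full_history.append({"role": "assistant", "content": f"A{i}"})
    windowed = [full_history[0]] + full_history[1:][-window_size * 2:]

print(len(full_history))  # 101: keeps growing
print(len(windowed))      # 21: capped at 1 system + 10 exchanges
```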
Strategy 2: Summary Memory
Don’t drop old messages — summarize them instead. The bot asks the LLM to compress the old chat into a short recap. That recap then stands in for all the older messages.
python
def summarize_messages(messages_to_summarize):
"""Use the LLM to summarize a list of messages."""
conversation_text = "\n".join(
f"{m['role']}: {m['content']}" for m in messages_to_summarize
)
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{
"role": "user",
"content": f"Summarize this conversation in 2-3 sentences. "
f"Focus on key facts and decisions:\n\n"
f"{conversation_text}"
}]
)
return response.choices[0].message.content
def create_summary_chatbot(system_prompt, max_messages=20):
"""Chatbot that summarizes old messages when history gets long."""
messages = [{"role": "system", "content": system_prompt}]
summary = ""
def chat(user_input):
nonlocal summary
messages.append({"role": "user", "content": user_input})
# Summarize when history gets too long
if len(messages) > max_messages:
old_messages = messages[1:max_messages - 4]
summary = summarize_messages(old_messages)
# Keep system prompt + summary + recent messages
recent = messages[max_messages - 4:]
messages.clear()
messages.append({"role": "system", "content": system_prompt})
messages.append({
"role": "system",
"content": f"Previous conversation summary: {summary}"
})
messages.extend(recent)
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=messages
)
assistant_reply = response.choices[0].message.content
messages.append({"role": "assistant", "content": assistant_reply})
return assistant_reply
return chat, messages
Output:
python
# No output — function defined
Summary memory saves the gist without the token cost. The catch? You lose fine details. The bot knows “the user asked about Python lists” but may forget the exact code shown.
Strategy 3: Hybrid Memory (Best of Both)
Mix window and summary memory. Keep recent messages in full, and summarize the rest. You get both sharp detail (recent turns) and broad context (old turns).
python
def create_hybrid_chatbot(system_prompt, window_size=8, summary_threshold=20):
"""Combine summary of old messages with a window of recent ones."""
all_messages = [{"role": "system", "content": system_prompt}]
conversation_summary = ""
def chat(user_input):
nonlocal conversation_summary
all_messages.append({"role": "user", "content": user_input})
if len(all_messages) > summary_threshold:
old = all_messages[1:-window_size * 2]
if old:
conversation_summary = summarize_messages(old)
keep = all_messages[-window_size * 2:]
all_messages.clear()
all_messages.append({"role": "system", "content": system_prompt})
all_messages.extend(keep)
# Build context with summary + recent window
context = [all_messages[0]] # system prompt
if conversation_summary:
context.append({
"role": "system",
"content": f"Earlier conversation summary: {conversation_summary}"
})
context.extend(all_messages[1:])
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=context
)
assistant_reply = response.choices[0].message.content
all_messages.append({"role": "assistant", "content": assistant_reply})
return assistant_reply
return chat, all_messages
Output:
python
# No output — function defined
Comparing Memory Strategies
Here’s when to use each strategy:
| Strategy | Token Usage | Context Quality | Best For |
|---|---|---|---|
| Full History | Grows linearly | Perfect recall | Short chats (under 20 exchanges) |
| Sliding Window | Constant | Recent only | Customer support, quick Q&A |
| Summary | Grows slowly | Approximate | Long research sessions |
| Hybrid | Moderate | Good balance | Production chatbots, complex tasks |
Key Insight: There’s no one “best” memory plan. The right pick depends on chat length and how much old context matters. For most bots, a sliding window of 10-15 turns hits the sweet spot.
typescript
{
type: 'exercise',
id: 'chatbot-memory-ex1',
title: 'Exercise 1: Build a Chatbot with Windowed Memory',
difficulty: 'beginner',
exerciseType: 'write',
instructions: 'Complete the `windowed_chat` function below. It should keep only the last 3 exchanges (6 messages) plus the system prompt. Use the provided `messages` list and `window_size` variable. After adding the user message, build a windowed list and print the number of messages being sent.',
starterCode: 'messages = [\n {"role": "system", "content": "You are a helpful assistant."}\n]\nwindow_size = 3\n\ndef windowed_chat(user_msg):\n messages.append({"role": "user", "content": user_msg})\n # Build windowed context: system + last window_size exchanges\n windowed = [messages[0]]\n recent = messages[1:] # FIX THIS LINE to keep only last window_size*2 messages\n windowed.extend(recent)\n # Simulate assistant reply\n messages.append({"role": "assistant", "content": f"Reply to: {user_msg}"})\n return len(windowed)\n\n# Simulate 5 exchanges\nfor i in range(5):\n count = windowed_chat(f"Question {i+1}")\n print(f"Exchange {i+1}: sending {count} messages")',
testCases: [
{ id: 'tc1', input: '', expectedOutput: 'Exchange 5: sending 7', description: 'After 5 exchanges with window=3, should send 7 messages (1 system + 6 recent)' }
],
hints: [
'Slice the non-system messages with [-window_size * 2:] to keep only the last N exchanges',
'Change the recent line to: recent = messages[1:][-window_size * 2:]'
],
solution: 'messages = [\n {"role": "system", "content": "You are a helpful assistant."}\n]\nwindow_size = 3\n\ndef windowed_chat(user_msg):\n messages.append({"role": "user", "content": user_msg})\n windowed = [messages[0]]\n recent = messages[1:][-window_size * 2:]\n windowed.extend(recent)\n messages.append({"role": "assistant", "content": f"Reply to: {user_msg}"})\n return len(windowed)\n\nfor i in range(5):\n count = windowed_chat(f"Question {i+1}")\n print(f"Exchange {i+1}: sending {count} messages")',
solutionExplanation: 'The key fix is slicing with [-window_size * 2:]. Since each exchange has 2 messages (user + assistant), window_size * 2 gives us the right number of individual messages to keep. The negative slice takes from the end of the list.',
xpReward: 15,
}
Add Streaming Responses
Without streaming, the user stares at a blank screen while the API builds the full reply. Streaming shows words as they come in — just like ChatGPT. It makes your bot feel way faster.
Turn on streaming with stream=True. You’ll get chunks instead of one big reply. Each chunk holds a small bit of text:
python
def chat_stream(messages):
"""Send messages and stream the response token by token."""
stream = client.chat.completions.create(
model="gpt-4o-mini",
messages=messages,
stream=True
)
full_response = ""
for chunk in stream:
delta = chunk.choices[0].delta
if delta.content:
print(delta.content, end="", flush=True)
full_response += delta.content
print() # newline after streaming completes
return full_response
Output:
python
# No output — function defined
Let’s test it. Watch how the response appears word by word instead of all at once:
python
test_messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain what an API is in 2 sentences."}
]
result = chat_stream(test_messages)
Output:
python
An API (Application Programming Interface) is a set of rules that lets
different software programs talk to each other. Think of it like a waiter
in a restaurant — you tell the waiter what you want, the waiter tells the
kitchen, and then brings back your food.
The text shows up word by word as the model writes it. Users get instant feedback — they see the reply forming live.
Tip: Always use streaming in live bots. Even if total reply time stays the same, streaming feels 3-5x faster. Users start reading right away. The wait drops from “whole reply” to “first word” — usually under 500ms.
Streaming Chatbot with Memory
Let’s combine streaming with our conversation memory system:
python
def create_streaming_chatbot(system_prompt, window_size=10):
messages = [{"role": "system", "content": system_prompt}]
def chat(user_input):
messages.append({"role": "user", "content": user_input})
# Apply window
context = [messages[0]]
context.extend(messages[1:][-window_size * 2:])
stream = client.chat.completions.create(
model="gpt-4o-mini",
messages=context,
stream=True
)
full_response = ""
for chunk in stream:
delta = chunk.choices[0].delta
if delta.content:
print(delta.content, end="", flush=True)
full_response += delta.content
print()
messages.append({"role": "assistant", "content": full_response})
return full_response
return chat, messages
Output:
python
# No output — function defined
This pairs window memory with streaming. The bot recalls context and replies in real time.
Customize Your Chatbot’s Personality
The system prompt sets your bot’s tone, style, and skills. A sharp prompt turns a bland helper into a focused tool. Let’s look at some good patterns.
Here are three sample roles that show how the system prompt shapes replies:
python
# Personality 1: Concise technical expert
tech_prompt = (
"You are a senior Python developer. Give precise, code-focused answers. "
"Skip pleasantries. Use code examples over explanations when possible. "
"If the user's approach has issues, point them out directly."
)
# Personality 2: Patient beginner tutor
tutor_prompt = (
"You are a patient programming tutor teaching someone who has never "
"coded before. Use simple analogies. Explain every term. Celebrate "
"small wins. Never make the student feel stupid for asking."
)
# Personality 3: Data science advisor
ds_prompt = (
"You are a senior data scientist at a tech company. Help users with "
"ML model selection, feature engineering, and experiment design. "
"Always ask clarifying questions about the dataset before recommending "
"an approach. Mention tradeoffs for every suggestion."
)
Output:
python
# No output — prompts defined
Note: The system prompt rides with every API call, so it eats your token budget. Keep it short — 2 to 4 lines is best. A 500-word prompt wastes tokens on every turn.
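A quick sanity check on prompt size is the rough rule of thumb that one token is about four characters of English text (use `tiktoken`, shown earlier, when you need exact counts):

```python
def rough_tokens(text):
    # Rough heuristic: ~4 characters per token for English text
    return len(text) // 4

short_prompt = (
    "You are a senior Python developer. Give precise, code-focused answers."
)
long_prompt = "word " * 500  # roughly a 500-word prompt

print(rough_tokens(short_prompt))  # a couple dozen tokens per call
print(rough_tokens(long_prompt))   # hundreds of tokens spent on every turn
```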
System Prompt Best Practices
Follow these tips for good system prompts:
- Name the role — “You are a senior Python dev” not “You are helpful”
- Set the format — “Answer in bullet points” or “Use code examples”
- Add limits — “Keep replies under 100 words” or “Ask one follow-up”
- Say what NOT to do — “Never give medical advice” or “Don’t use jargon”
python
# A well-structured system prompt
system_prompt = (
"You are a Python code reviewer. "
"When the user shares code, respond with:\n"
"1. A one-sentence summary of what the code does\n"
"2. Any bugs or issues (if none, say 'No bugs found')\n"
"3. One specific improvement suggestion with a code example\n"
"Keep your total response under 150 words."
)
chat, _ = create_chatbot(system_prompt)
print(chat("Review this: x = [i for i in range(100) if i % 2 == 0]"))
Output:
python
**Summary:** Creates a list of even numbers from 0 to 98 using a list
comprehension with a filter condition.
**Bugs:** No bugs found.
**Improvement:** Use `range(0, 100, 2)` instead of filtering — it generates
only even numbers directly, which is faster and more readable:
x = list(range(0, 100, 2))
A clear system prompt gives you steady, expected output. This matters for bots built to do one thing — like code review, help desks, or teaching.
Save and Load Conversations
A bot that wipes its memory when you close the terminal isn’t much use. Let’s fix that — we’ll save chats to JSON files and load them back.
The json module works great here since our messages are plain dicts:
python
import json
from datetime import datetime
def save_conversation(messages, filepath="chat_history.json"):
"""Save conversation history to a JSON file."""
data = {
"saved_at": datetime.now().isoformat(),
"message_count": len(messages),
"messages": messages
}
with open(filepath, "w") as f:
json.dump(data, f, indent=2)
print(f"Saved {len(messages)} messages to {filepath}")
def load_conversation(filepath="chat_history.json"):
"""Load conversation history from a JSON file."""
with open(filepath, "r") as f:
data = json.load(f)
print(f"Loaded {data['message_count']} messages (saved {data['saved_at']})")
return data["messages"]
Output:
python
# No output — functions defined
Let’s test the save and load cycle:
python
# Create a chatbot and have a conversation
chat, history = create_chatbot("You are a helpful Python tutor.")
chat("What are decorators in Python?")
chat("Show me a simple example.")
# Save the conversation
save_conversation(history, "my_chat.json")
# Load it back
loaded_messages = load_conversation("my_chat.json")
print(f"\nFirst message role: {loaded_messages[0]['role']}")
print(f"Total messages: {len(loaded_messages)}")
Output:
python
Saved 5 messages to my_chat.json
Loaded 5 messages (saved 2026-03-22T14:30:00.000000)
First message role: system
Total messages: 5
Now you can pick up chats across sessions. Load the saved messages, hand them to a new bot, and carry on.
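Resuming looks something like this sketch. The file name and the sample history here are illustrative:

```python
import json

# Pretend this history came from an earlier session
saved = [
    {"role": "system", "content": "You are a helpful Python tutor."},
    {"role": "user", "content": "What are decorators?"},
    {"role": "assistant", "content": "A decorator wraps a function to extend it."},
]
with open("resume_demo.json", "w") as f:
    json.dump({"messages": saved}, f)

# Later session: load the history and continue where you left off
with open("resume_demo.json") as f:
    messages = json.load(f)["messages"]

messages.append({"role": "user", "content": "Show me one with arguments."})
print(len(messages))  # 4: old context plus the new question, ready to send
```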
Manage Multiple Conversations
For a live bot, you’ll want to track many chats by session ID. Here’s a simple session manager:
python
import os
class SessionManager:
def __init__(self, storage_dir="chat_sessions"):
self.storage_dir = storage_dir
os.makedirs(storage_dir, exist_ok=True)
def save(self, session_id, messages):
filepath = os.path.join(self.storage_dir, f"{session_id}.json")
data = {
"session_id": session_id,
"saved_at": datetime.now().isoformat(),
"messages": messages
}
with open(filepath, "w") as f:
json.dump(data, f, indent=2)
def load(self, session_id):
filepath = os.path.join(self.storage_dir, f"{session_id}.json")
if not os.path.exists(filepath):
return None
with open(filepath, "r") as f:
return json.load(f)["messages"]
def list_sessions(self):
files = [f.replace(".json", "") for f in os.listdir(self.storage_dir)
if f.endswith(".json")]
return sorted(files)
sessions = SessionManager()
print(f"Storage directory: {sessions.storage_dir}")
print(f"Active sessions: {sessions.list_sessions()}")
Output:
python
Storage directory: chat_sessions
Active sessions: []
Each chat gets its own JSON file. You can list sessions, load any past chat, and keep going.
typescript
{
type: 'exercise',
id: 'chatbot-persistence-ex2',
title: 'Exercise 2: Add Conversation Metadata',
difficulty: 'beginner',
exerciseType: 'write',
instructions: 'Modify the `save_with_metadata` function to save conversation data with extra metadata: the total word count across all messages and the number of user messages. Print the metadata after saving.',
starterCode: 'import json\n\nmessages = [\n {"role": "system", "content": "You are a helpful assistant."},\n {"role": "user", "content": "What is machine learning?"},\n {"role": "assistant", "content": "Machine learning is a branch of AI where computers learn patterns from data."},\n {"role": "user", "content": "Give me an example."},\n {"role": "assistant", "content": "Email spam filters learn to identify spam by analyzing thousands of labeled emails."}\n]\n\ndef save_with_metadata(messages):\n word_count = 0 # FIX: count total words across all message contents\n user_count = 0 # FIX: count messages where role is "user"\n \n for msg in messages:\n pass # Replace this line with your logic\n \n print(f"Total words: {word_count}")\n print(f"User messages: {user_count}")\n\nsave_with_metadata(messages)',
testCases: [
{ id: 'tc1', input: '', expectedOutput: 'User messages: 2', description: 'Should count 2 user messages' }
],
hints: [
'For word count, use len(msg["content"].split()) and add it to the total for each message',
'For user count, check if msg["role"] == "user" inside the loop and increment the counter'
],
solution: 'import json\n\nmessages = [\n {"role": "system", "content": "You are a helpful assistant."},\n {"role": "user", "content": "What is machine learning?"},\n {"role": "assistant", "content": "Machine learning is a branch of AI where computers learn patterns from data."},\n {"role": "user", "content": "Give me an example."},\n {"role": "assistant", "content": "Email spam filters learn to identify spam by analyzing thousands of labeled emails."}\n]\n\ndef save_with_metadata(messages):\n word_count = 0\n user_count = 0\n \n for msg in messages:\n word_count += len(msg["content"].split())\n if msg["role"] == "user":\n user_count += 1\n \n print(f"Total words: {word_count}")\n print(f"User messages: {user_count}")\n\nsave_with_metadata(messages)',
solutionExplanation: 'We loop through each message, split the content into words using .split() and add the count. For user messages, we check if the role equals "user" and increment the counter.',
xpReward: 15,
}
Build a Complete Chatbot Class
Let’s pull it all into one clean Chatbot class. This class packs in memory, streaming, token tracking, saving, and custom prompts:
python
import os
import json
import tiktoken
from datetime import datetime
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
class Chatbot:
"""A production-ready chatbot with memory, streaming, and persistence."""
def __init__(
self,
system_prompt="You are a helpful assistant.",
model="gpt-4o-mini",
window_size=15,
token_budget=8000,
storage_dir="chat_sessions"
):
self.client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
self.model = model
self.window_size = window_size
self.token_budget = token_budget
self.storage_dir = storage_dir
self.messages = [{"role": "system", "content": system_prompt}]
self.total_tokens_used = 0
os.makedirs(storage_dir, exist_ok=True)
def _count_tokens(self, messages):
"""Count tokens in a message list."""
encoding = tiktoken.encoding_for_model(self.model)
count = 0
for msg in messages:
count += 4
for value in msg.values():
count += len(encoding.encode(str(value)))
return count + 2
def _build_context(self):
"""Build windowed context within token budget."""
context = [self.messages[0]] # system prompt
recent = self.messages[1:][-self.window_size * 2:]
context.extend(recent)
# Trim further if over budget
while self._count_tokens(context) > self.token_budget and len(context) > 2:
context.pop(1)
return context
def chat(self, user_input, stream=True):
"""Send a message and get a response."""
self.messages.append({"role": "user", "content": user_input})
context = self._build_context()
if stream:
response_text = self._stream_response(context)
else:
response = self.client.chat.completions.create(
model=self.model,
messages=context
)
response_text = response.choices[0].message.content
self.total_tokens_used += response.usage.total_tokens
self.messages.append({"role": "assistant", "content": response_text})
return response_text
    def _stream_response(self, context):
        """Stream the response token by token."""
        stream = self.client.chat.completions.create(
            model=self.model,
            messages=context,
            stream=True,
            stream_options={"include_usage": True}  # final chunk reports token usage
        )
        full_response = ""
        for chunk in stream:
            if chunk.choices:
                delta = chunk.choices[0].delta
                if delta.content:
                    print(delta.content, end="", flush=True)
                    full_response += delta.content
            elif chunk.usage:  # last chunk has empty choices and carries the usage totals
                self.total_tokens_used += chunk.usage.total_tokens
        print()
        return full_response
def save(self, session_id="default"):
"""Save conversation to disk."""
filepath = os.path.join(self.storage_dir, f"{session_id}.json")
data = {
"session_id": session_id,
"saved_at": datetime.now().isoformat(),
"model": self.model,
"total_tokens_used": self.total_tokens_used,
"messages": self.messages
}
with open(filepath, "w") as f:
json.dump(data, f, indent=2)
print(f"Saved to {filepath}")
def load(self, session_id="default"):
"""Load a previous conversation."""
filepath = os.path.join(self.storage_dir, f"{session_id}.json")
with open(filepath, "r") as f:
data = json.load(f)
self.messages = data["messages"]
self.total_tokens_used = data.get("total_tokens_used", 0)
print(f"Loaded {len(self.messages)} messages from {session_id}")
def stats(self):
"""Print conversation statistics."""
user_msgs = sum(1 for m in self.messages if m["role"] == "user")
tokens_now = self._count_tokens(self.messages)
print(f"Messages: {len(self.messages)} ({user_msgs} from user)")
print(f"Current tokens: {tokens_now}")
print(f"Total tokens used: {self.total_tokens_used}")
Output:
python
# No output — class defined
Here’s how to use the complete chatbot:
python
# Create and use the chatbot
bot = Chatbot(
system_prompt="You are a senior data scientist. Give concise, practical advice.",
window_size=10,
token_budget=4000
)
# Non-streaming mode for scripted usage
response = bot.chat("What is the best way to handle missing data?", stream=False)
print(response)
print()
# Check stats
bot.stats()
Output:
python
The best approach depends on your data and model. For small amounts of missing
data (under 5%), dropping rows is fine. For more, use imputation — median for
numeric columns, mode for categorical. Scikit-learn's SimpleImputer handles
both. For tree-based models, some implementations handle NaN natively (XGBoost,
LightGBM), so you may not need imputation at all.
Messages: 3 (1 from user)
Current tokens: 142
Total tokens used: 187
typescript
{
type: 'exercise',
id: 'chatbot-class-ex3',
title: 'Exercise 3: Add a Reset Method to the Chatbot',
difficulty: 'beginner',
exerciseType: 'write',
instructions: 'Add a `reset` method to the SimpleChatbot class below. The method should clear all messages except the system prompt, reset the exchange counter to 0, and print "Chat reset. System prompt preserved." Then test it by running the provided code.',
starterCode: 'class SimpleChatbot:\n def __init__(self, system_prompt):\n self.system_prompt = system_prompt\n self.messages = [{"role": "system", "content": system_prompt}]\n self.exchanges = 0\n\n def add_exchange(self, user_msg, bot_reply):\n self.messages.append({"role": "user", "content": user_msg})\n self.messages.append({"role": "assistant", "content": bot_reply})\n self.exchanges += 1\n\n def reset(self):\n pass # YOUR CODE HERE\n\n# Test\nbot = SimpleChatbot("You are a helpful assistant.")\nbot.add_exchange("Hello", "Hi there!")\nbot.add_exchange("How are you?", "I am doing well!")\nprint(f"Before reset: {bot.exchanges} exchanges, {len(bot.messages)} messages")\nbot.reset()\nprint(f"After reset: {bot.exchanges} exchanges, {len(bot.messages)} messages")',
testCases: [
{ id: 'tc1', input: '', expectedOutput: 'Chat reset. System prompt preserved.', description: 'Should print reset confirmation' },
{ id: 'tc2', input: '', expectedOutput: 'After reset: 0 exchanges, 1 messages', description: 'Should have 0 exchanges and 1 message after reset' }
],
hints: [
'Reset messages to a list containing only the system prompt: self.messages = [{"role": "system", "content": self.system_prompt}]',
'Set self.exchanges = 0 and print the confirmation message'
],
solution: 'class SimpleChatbot:\n def __init__(self, system_prompt):\n self.system_prompt = system_prompt\n self.messages = [{"role": "system", "content": system_prompt}]\n self.exchanges = 0\n\n def add_exchange(self, user_msg, bot_reply):\n self.messages.append({"role": "user", "content": user_msg})\n self.messages.append({"role": "assistant", "content": bot_reply})\n self.exchanges += 1\n\n def reset(self):\n self.messages = [{"role": "system", "content": self.system_prompt}]\n self.exchanges = 0\n print("Chat reset. System prompt preserved.")\n\nbot = SimpleChatbot("You are a helpful assistant.")\nbot.add_exchange("Hello", "Hi there!")\nbot.add_exchange("How are you?", "I am doing well!")\nprint(f"Before reset: {bot.exchanges} exchanges, {len(bot.messages)} messages")\nbot.reset()\nprint(f"After reset: {bot.exchanges} exchanges, {len(bot.messages)} messages")',
solutionExplanation: 'The reset method rebuilds the messages list with only the system prompt, sets exchanges to 0, and prints a confirmation. We stored the system_prompt separately in __init__ so we can recreate the initial state without losing the prompt text.',
xpReward: 15,
}
Common Mistakes and How to Fix Them
Mistake 1: Not Including Previous Messages in API Calls
This is the number one newbie mistake. Each API call must carry the full chat context.
Wrong:
python
# Each call is independent — no memory
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{"role": "user", "content": "What about the second point?"}]
)
Why it’s wrong: The model has no clue what “the second point” means. With no old messages, every call is a blank slate.
Correct:
python
# Include full history — model has context
messages.append({"role": "user", "content": "What about the second point?"})
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=messages # includes all previous exchanges
)
Output:
python
# The response now correctly references previous context
Mistake 2: Forgetting to Append the Assistant’s Reply
If you skip saving the bot’s reply, the model won’t know what it said last time.
Wrong:
python
def chat_broken(user_input, messages):
messages.append({"role": "user", "content": user_input})
response = client.chat.completions.create(
model="gpt-4o-mini", messages=messages
)
return response.choices[0].message.content
# BUG: assistant reply is never added to messages
Why it’s wrong: The next call shows two user messages in a row with no bot reply between them. The model gets lost in the broken flow.
Correct:
python
def chat_fixed(user_input, messages):
messages.append({"role": "user", "content": user_input})
response = client.chat.completions.create(
model="gpt-4o-mini", messages=messages
)
reply = response.choices[0].message.content
messages.append({"role": "assistant", "content": reply}) # store the reply
return reply
Mistake 3: Ignoring Token Limits Until the API Errors
With no token check, long chats crash with a context length error.
Wrong:
python
# No limit checking — will eventually fail
def chat_no_limit(user_input, messages):
messages.append({"role": "user", "content": user_input})
response = client.chat.completions.create(
model="gpt-4o-mini", messages=messages # messages grow forever
)
return response.choices[0].message.content
Why it’s wrong: After too many messages, you blow past the model’s context window. The API throws an error and the bot dies mid-chat.
Correct:
python
# Trim old messages before each call
def chat_with_limit(user_input, messages, max_tokens=4000):
messages.append({"role": "user", "content": user_input})
while count_tokens(messages) > max_tokens and len(messages) > 2:
messages.pop(1) # remove oldest non-system message
response = client.chat.completions.create(
model="gpt-4o-mini", messages=messages
)
reply = response.choices[0].message.content
messages.append({"role": "assistant", "content": reply})
return reply
Mistake 4: Hardcoding API Keys in Source Code
Wrong:
python
client = OpenAI(api_key="sk-abc123mykey456") # exposed in source code
Why it’s wrong: If this file hits GitHub — even for a minute — bots scrape the key fast. You’ll face rogue charges and a hacked account.
Correct:
python
from dotenv import load_dotenv
load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
Mistake 5: Not Handling API Errors
The API can fail for many reasons — rate limits, network drops, bad requests. With no error handling, your bot crashes.
Wrong:
python
# No error handling — any API failure crashes the chatbot
response = client.chat.completions.create(
model="gpt-4o-mini", messages=messages
)
Correct:
python
import openai
try:
response = client.chat.completions.create(
model="gpt-4o-mini", messages=messages
)
reply = response.choices[0].message.content
except openai.RateLimitError:
reply = "I'm getting too many requests right now. Please wait a moment."
except openai.APIConnectionError:
reply = "I can't reach the API. Please check your internet connection."
except openai.APIError as e:
reply = f"API error occurred: {e}"
Output:
python
# Chatbot gracefully handles errors instead of crashing
Frequently Asked Questions
How much does it cost to run an AI chatbot?
It depends on your model and chat length. GPT-4o-mini costs about $0.15 per million input tokens and $0.60 per million output tokens. A 20-message chat uses roughly 2,000 to 4,000 tokens. That’s under $0.01. For side projects, costs are near zero.
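The arithmetic above is easy to sketch in code. This is a rough estimator using the per-token rates quoted for GPT-4o-mini — check OpenAI's pricing page for current numbers, and note the input/output split (3,000 in, 1,000 out) is an illustrative assumption:

```python
# Rough cost estimate at the GPT-4o-mini rates quoted above
# ($0.15 per 1M input tokens, $0.60 per 1M output tokens)
INPUT_RATE = 0.15 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.60 / 1_000_000  # dollars per output token

def estimate_cost(input_tokens, output_tokens):
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# A 20-message chat: assume ~3,000 input tokens and ~1,000 output tokens
cost = estimate_cost(3000, 1000)
print(f"Estimated cost: ${cost:.4f}")
```

Even generous estimates land well under a cent per conversation, which is why hobby projects rarely need to worry about cost.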
Can I use a different LLM instead of OpenAI?
Yes. The message format (system/user/assistant roles) is nearly identical across providers — Claude, Gemini, and Llama all follow a similar pattern. Swap the client setup, adjust for minor API differences, and your memory logic carries over unchanged.
How do I deploy my chatbot as a web app?
Wrap your bot in a web framework like FastAPI or Flask. Make an endpoint that takes user messages and sends back replies. Use session IDs (cookies or tokens) to keep one chat history per user. For the frontend, a simple HTML page with fetch calls works. Or use Streamlit for a quick demo.
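The core idea behind any of these deployments is the same: keep one history per session ID. Here is a framework-free sketch of that pattern — `FakeBot` is a hypothetical stand-in for the Chatbot class (it echoes instead of calling the API), so the session logic can be shown without network access:

```python
# Per-session memory: the pattern behind any web deployment of the chatbot.
# FakeBot is a stand-in for the Chatbot class — it echoes rather than call the API.
class FakeBot:
    def __init__(self):
        self.messages = []

    def chat(self, text):
        self.messages.append({"role": "user", "content": text})
        reply = f"Echo: {text}"
        self.messages.append({"role": "assistant", "content": reply})
        return reply

sessions = {}  # session_id -> bot instance, one conversation history per user

def handle_request(session_id, message):
    # Reuse the caller's bot if it exists; create one on first contact
    bot = sessions.setdefault(session_id, FakeBot())
    return bot.chat(message)

print(handle_request("alice", "Hello"))
print(handle_request("bob", "Hi"))
print(len(sessions["alice"].messages))  # alice's history is separate from bob's
```

In a real FastAPI or Flask app, `handle_request` becomes your route handler, the `session_id` comes from a cookie or token, and `FakeBot` is replaced by the Chatbot class from this tutorial. An in-memory dict loses state on restart — use the `save`/`load` methods or a database for persistence.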
What is the difference between conversation memory and RAG?
Chat memory stores what was said in the current session. RAG (Retrieval-Augmented Generation) pulls in outside facts from docs or databases. They solve different problems. Memory answers “what did we just talk about?” RAG answers “what does our policy say?” Many live bots use both.
How do I prevent my chatbot from generating harmful content?
Set rules in the system prompt: “Never give medical, legal, or money advice.” Add input checks to catch abuse. The OpenAI API has built-in content filters too. For live systems, add OpenAI’s Moderation API as a safety layer before showing replies.
Complete Code
References
- OpenAI API Documentation — Chat Completions. Link
- OpenAI API Documentation — Streaming. Link
- OpenAI Cookbook — How to count tokens with tiktoken. Link
- OpenAI Documentation — Best practices for prompt engineering. Link
- LangChain Documentation — Conversational Memory. Link
- tiktoken — OpenAI’s token counting library. Link
- python-dotenv Documentation. Link
- Pinecone — Conversational Memory for LLMs with LangChain. Link