OpenAI Function Calling Tutorial in Python (2026)
Learn OpenAI function calling in Python with 3 working tools. Build the tool-use loop, handle parallel calls, and design schemas using raw HTTP requests.
This post has interactive code — click ‘Run’ or press Ctrl+Enter on any code block to execute it directly in your browser. The first run may take a few seconds to initialize.
Build a multi-tool assistant that runs your Python functions on command — using raw HTTP requests to the OpenAI API.
You ask ChatGPT “What’s 7 raised to the power of 12?” and it confidently answers… incorrectly. LLMs are terrible at math. They guess instead of computing. But what if the model could call a real calculator function, get the exact answer, and then reply?
That’s what function calling does. You give the model a menu of tools — Python functions you’ve written. It picks the right one and fills in the arguments. You run the function, feed the result back, and the model turns it into a natural response.
By the end of this article, you’ll build a multi-tool assistant with three tools: a calculator, a weather lookup, and a database query. You’ll understand tool schemas, the tool-use loop, and parallel tool calls.
All code uses raw HTTP requests. No SDK required. And every code block runs in the browser with Pyodide — we mock the API responses so you can practice the full pattern without an API key.
What Is OpenAI Function Calling?
Imagine you’re building a chatbot for a retail company. A customer asks: “What’s the shipping cost for order #4521?” The model doesn’t have access to your database. It can’t look up order #4521. Without function calling, it would either hallucinate an answer or say “I don’t know.”
Function calling solves this. You describe a get_shipping_cost function to the model — its name, what it does, and what parameters it takes.
When the user asks about shipping, the model doesn’t guess. It returns a structured request: “Call get_shipping_cost with order_id=4521.” You run that function, get the result, and send it back. The model then responds with the actual answer.
KEY INSIGHT: The model never executes your functions. It only decides which function to call and generates the arguments. You run the function in your own code and control what happens.
Here’s the mental model in three steps:
- You describe your tools — JSON schemas that tell the model what functions exist and what arguments they accept.
- The model picks a tool — based on the user’s message, it returns a `tool_calls` response instead of a text response.
- You execute and reply — you run the function, send the result back, and the model generates a final answer using that result.
That’s the entire pattern. Every function calling implementation follows these three steps. Let’s build it.
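To make step two concrete, here is a sketch of the shape of the assistant message the model returns when it decides to call a tool. The field names follow the Chat Completions response format; the ID, function name, and arguments are illustrative (they echo the shipping example above, not a real API response):

```python
import json

# Illustrative shape of an assistant message that requests a tool call.
# Note: "arguments" is always a JSON *string*, not a parsed object.
assistant_message = {
    "role": "assistant",
    "content": None,  # no text yet -- the model wants a tool run first
    "tool_calls": [
        {
            "id": "call_abc123",  # illustrative ID
            "type": "function",
            "function": {
                "name": "get_shipping_cost",
                "arguments": json.dumps({"order_id": 4521}),
            },
        }
    ],
}

# Your code parses the arguments string before running the function
args = json.loads(assistant_message["tool_calls"][0]["function"]["arguments"])
print(args["order_id"])
```

When `content` is `None` and `tool_calls` is present, the model is waiting on you: run the function, append the result, and call the API again.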
Setting Up for OpenAI Function Calling
Prerequisites
- Python version: 3.9+
- Required libraries: `requests` (a third-party library, preinstalled in many environments)
- Install: `pip install requests` (if not already available)
- API key: OpenAI API key (get one here)
- Time to complete: 25–30 minutes
Getting Your API Key
Go to platform.openai.com/api-keys and create a new key. Store it as an environment variable — never put API keys directly in your code.
# macOS/Linux
export OPENAI_API_KEY="sk-your-key-here"
# Windows (Command Prompt)
set OPENAI_API_KEY=sk-your-key-here
# Windows (PowerShell)
$env:OPENAI_API_KEY="sk-your-key-here"
The first code block sets up imports and a helper function. The chat() function wraps the HTTP POST to the OpenAI endpoint. It sends headers with the API key, builds the JSON body, and returns the parsed response.
We also define a MOCK_MODE flag. When True, the function returns fake API responses instead of calling OpenAI. This lets every code block run in the browser with Pyodide — no API key needed.
Set it to False when you’re ready to hit the real API.
import micropip
await micropip.install('requests')
import json
import os
import math
import uuid
MOCK_MODE = True # Set to False to use the real OpenAI API
API_KEY = os.environ.get("OPENAI_API_KEY", "your-key-here")
API_URL = "https://api.openai.com/v1/chat/completions"
MODEL = "gpt-4o"
Next, the chat() function. In mock mode, it reads the user’s message and picks the right tool by keyword. In real mode, it sends the HTTP request. Either way, your tool-use loop works the same.
def chat(messages, tools=None, tool_choice="auto"):
"""Send a chat completion request (or mock it)."""
if MOCK_MODE:
return _mock_chat(messages, tools, tool_choice)
import requests
headers = {"Content-Type": "application/json", "Authorization": f"Bearer {API_KEY}"}
payload = {"model": MODEL, "messages": messages}
if tools:
payload["tools"] = tools
payload["tool_choice"] = tool_choice
response = requests.post(API_URL, headers=headers, json=payload)
return response.json()
The mock function fakes what the OpenAI API returns. It reads the last user message and decides which tool to “call.” This is a teaching shortcut. The real model uses the tool schemas to make this choice.
def _mock_chat(messages, tools, tool_choice):
"""Simulate OpenAI API responses for browser execution."""
last_user = ""
has_tool_results = False
for m in messages:
if m.get("role") == "user":
last_user = m["content"].lower()
if m.get("role") == "tool":
has_tool_results = True
# If we already have tool results, return a text summary
if has_tool_results:
results = [m["content"] for m in messages if m.get("role") == "tool"]
summary = "Based on the tool results: " + " | ".join(results)
return {"choices": [{"message": {"role": "assistant", "content": summary}}]}
# Decide which tools to call based on keywords
tool_calls = []
if tools and tool_choice != "none":
if any(w in last_user for w in ["calculate", "math", "power", "percent", "%", "+"]):
expr = "7**12" if "power" in last_user else "0.15 * 89" if "15%" in last_user or "percent" in last_user else "2 + 2"
tool_calls.append({"id": f"call_{uuid.uuid4().hex[:8]}", "type": "function",
"function": {"name": "calculate", "arguments": json.dumps({"expression": expr})}})
if any(w in last_user for w in ["weather", "temperature", "forecast"]):
city = "Tokyo" if "tokyo" in last_user else "Mumbai" if "mumbai" in last_user else "London"
tool_calls.append({"id": f"call_{uuid.uuid4().hex[:8]}", "type": "function",
"function": {"name": "get_weather", "arguments": json.dumps({"city": city})}})
if any(w in last_user for w in ["order", "customer", "cust-"]):
cid = "CUST-1234" if "1234" in last_user else "CUST-5678"
tool_calls.append({"id": f"call_{uuid.uuid4().hex[:8]}", "type": "function",
"function": {"name": "lookup_orders", "arguments": json.dumps({"customer_id": cid})}})
if tool_calls:
return {"choices": [{"message": {"role": "assistant", "content": None, "tool_calls": tool_calls}}]}
return {"choices": [{"message": {"role": "assistant", "content": f"I can help with that! (Mock response for: {last_user})"}}]}
No output here — this is setup. We’ll call chat() in the next section.
Defining Function Calling Tool Schemas
Before the model can call your functions, you describe them in a format it understands. OpenAI uses JSON Schema for this.
Each tool definition has three parts: a name, a description, and a parameters object. The parameters list each argument the function accepts.
Why does the description matter so much? The model reads it to decide when to use the tool. A vague description like “does math” won’t help. A clear one like “evaluates a math expression and returns the exact result” tells the model exactly when to pick this tool.
Let’s define our first tool — a calculator. The schema tells the model: “This tool takes a math expression as a string and returns the result.”
Inside parameters, type specifies the data type, description explains what to pass, and required lists the mandatory fields.
calculator_tool = {
"type": "function",
"function": {
"name": "calculate",
"description": "Evaluate a mathematical expression and return the exact numerical result. Use this for any arithmetic, exponents, roots, or numeric computation.",
"parameters": {
"type": "object",
"properties": {
"expression": {
"type": "string",
"description": "A mathematical expression to evaluate, e.g. '7**12' or '(25 * 4) + 17'"
}
},
"required": ["expression"]
}
}
}
Next, a weather tool. It takes a city name and an optional unit. Notice the enum on unit — this limits the model to “celsius” or “fahrenheit”. It won’t invent other options.
weather_tool = {
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a city. Returns temperature, condition, and humidity.",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "The city name, e.g. 'London' or 'San Francisco'"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit. Defaults to celsius."
}
},
"required": ["city"]
}
}
}
And a database lookup tool. It takes a customer ID and returns their recent orders.
database_tool = {
"type": "function",
"function": {
"name": "lookup_orders",
"description": "Look up recent orders for a customer by their customer ID. Returns order IDs, dates, and totals.",
"parameters": {
"type": "object",
"properties": {
"customer_id": {
"type": "string",
"description": "The unique customer identifier, e.g. 'CUST-1234'"
}
},
"required": ["customer_id"]
}
}
}
tools = [calculator_tool, weather_tool, database_tool]
Three tools, three schemas. Notice a few design choices:
- `enum` for constrained values — the weather tool limits `unit` to “celsius” or “fahrenheit”. The model won’t invent other options.
- Clear descriptions with examples — “e.g. ‘CUST-1234’” helps the model format the argument correctly.
- Optional parameters — `unit` isn’t in `required`, so the model can skip it.
TIP: Write descriptions as if you’re explaining the function to a new coworker. The model reads them to decide when to call and how to format arguments. Vague descriptions lead to wrong picks.
Implementing the Tool Functions
The model generates arguments. Your code does the real work. Let’s write the three functions that match our schemas.
In production, these would call real APIs or databases. For this tutorial, we use mock data so the code runs without outside dependencies.
The calculator uses Python’s built-in eval() with a safe restriction. The weather function returns hardcoded data for a few cities. The database function returns sample orders for known customer IDs.
def calculate(expression):
"""Safely evaluate a math expression."""
allowed = {"__builtins__": {}, "math": math}
try:
result = eval(expression, allowed)
return json.dumps({"result": result})
except Exception as e:
return json.dumps({"error": str(e)})
WARNING: Using `eval()` on untrusted input is dangerous in production. We restrict it here with an empty `__builtins__` dict. That blocks file access and imports. For production, use a proper math parser like `numexpr` or `sympy`.
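If you'd rather avoid an extra dependency, a small evaluator built on Python's `ast` module is another option. This is a sketch, not a hardened parser: it walks the parsed expression and allows only numeric literals and arithmetic operators, so names, calls, and attribute access are rejected outright.

```python
import ast
import operator

# Map AST operator nodes to the corresponding arithmetic functions.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.Mod: operator.mod,
    ast.USub: operator.neg, ast.UAdd: operator.pos,
}

def safe_calculate(expression):
    """Evaluate a pure-arithmetic expression without eval()."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        # Anything else (names, calls, attributes) is refused
        raise ValueError("Disallowed expression element")
    return _eval(ast.parse(expression, mode="eval"))

print(safe_calculate("7**12"))          # 13841287201
print(safe_calculate("(25 * 4) + 17"))  # 117
```

An expression like `__import__('os')` parses to a call node, which falls through to the `ValueError`, so the model cannot trick the calculator into running arbitrary code.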
The weather function simulates an API response. In a real app, you’d call a weather API like OpenWeatherMap here.
def get_weather(city, unit="celsius"):
"""Mock weather data for demonstration."""
weather_data = {
"london": {"temp_c": 14, "condition": "Cloudy", "humidity": 78},
"san francisco": {"temp_c": 18, "condition": "Foggy", "humidity": 82},
"tokyo": {"temp_c": 26, "condition": "Sunny", "humidity": 60},
"mumbai": {"temp_c": 32, "condition": "Humid", "humidity": 88},
}
data = weather_data.get(city.lower(), {"temp_c": 20, "condition": "Unknown", "humidity": 50})
temp = data["temp_c"] if unit == "celsius" else round(data["temp_c"] * 9/5 + 32, 1)
unit_label = "°C" if unit == "celsius" else "°F"
return json.dumps({
"city": city,
"temperature": f"{temp}{unit_label}",
"condition": data["condition"],
"humidity": f"{data['humidity']}%"
})
And the database lookup — again, mock data that simulates what a real query would return.
def lookup_orders(customer_id):
"""Mock database lookup for customer orders."""
orders_db = {
"CUST-1234": [
{"order_id": "ORD-5001", "date": "2026-03-10", "total": "$149.99"},
{"order_id": "ORD-5023", "date": "2026-03-14", "total": "$29.50"},
],
"CUST-5678": [
{"order_id": "ORD-4999", "date": "2026-03-08", "total": "$89.00"},
],
}
results = orders_db.get(customer_id, [])
if not results:
return json.dumps({"error": f"No orders found for {customer_id}"})
return json.dumps({"customer_id": customer_id, "orders": results})
One more piece — a dispatcher function that maps tool names to Python functions. When the model says “call `calculate`”, the dispatcher finds the right function and runs it.
TOOL_FUNCTIONS = {
"calculate": calculate,
"get_weather": get_weather,
"lookup_orders": lookup_orders,
}
def execute_tool(name, arguments):
"""Run a tool function by name with parsed arguments."""
func = TOOL_FUNCTIONS.get(name)
if not func:
return json.dumps({"error": f"Unknown tool: {name}"})
args = json.loads(arguments)
return func(**args)
execute_tool takes the tool name and a JSON string of arguments. It parses the JSON and calls the right function with keyword unpacking (**args). This pattern scales well. Add a new tool by writing the function and adding one line to TOOL_FUNCTIONS.
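To see how little it takes to extend the registry, here is a stand-alone miniature of the same dispatcher pattern with a hypothetical `word_count` tool. The names (`DEMO_TOOLS`, `demo_execute`, `word_count`) are illustrative, chosen so running this block won't clobber the real `TOOL_FUNCTIONS` registry above:

```python
import json

def word_count(text):
    """Hypothetical tool: count the words in a string."""
    return json.dumps({"words": len(text.split())})

DEMO_TOOLS = {
    "word_count": word_count,  # one registry line per tool
}

def demo_execute(name, arguments):
    """Same shape as execute_tool: name + JSON-string arguments in, JSON string out."""
    func = DEMO_TOOLS.get(name)
    if not func:
        return json.dumps({"error": f"Unknown tool: {name}"})
    return func(**json.loads(arguments))

# Arguments arrive from the model as a JSON string, exactly like tool_calls
print(demo_execute("word_count", json.dumps({"text": "function calling is fun"})))
```

Adding a second tool really is just the function plus one dictionary entry; the dispatcher never changes.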
The Tool-Use Loop: How It All Fits Together
This is the heart of function calling — and honestly, the part that clicked for me only after I built it once. The tool-use loop is a back-and-forth between your code and the model:
- Send the user’s message + tool schemas to the API.
- Check if the model returned `tool_calls` or a regular text message.
- If tool calls — execute each one, append the results, and send everything back.
- The model now has the tool results. It generates a final text response.
Here’s a diagram of that flow:
User message → API call (with tools)
↓
Model returns tool_calls?
├── NO → Return text response (done)
└── YES → Execute each tool
↓
Append tool results to messages
↓
API call again (with updated messages)
↓
Model returns final text response (done)
Let’s build this as a function. run_assistant() takes a user message, sends it with our tool schemas, checks for tool calls, runs them, and gets the final response. Watch for the tool_calls field — it’s a list, because the model might call several tools at once.
def run_assistant(user_message):
"""Complete tool-use loop: send message, handle tool calls, return final response."""
messages = [
{"role": "system", "content": "You are a helpful assistant with access to tools. Use them when needed."},
{"role": "user", "content": user_message}
]
# Step 1: Send message with tools
response = chat(messages, tools=tools)
assistant_msg = response["choices"][0]["message"]
# Step 2: Check for tool calls
if not assistant_msg.get("tool_calls"):
return assistant_msg["content"]
# Step 3: Execute each tool call
messages.append(assistant_msg)
for tool_call in assistant_msg["tool_calls"]:
name = tool_call["function"]["name"]
args = tool_call["function"]["arguments"]
print(f" Calling tool: {name}({args})")
result = execute_tool(name, args)
messages.append({
"role": "tool",
"tool_call_id": tool_call["id"],
"content": result
})
# Step 4: Get final response with tool results
final_response = chat(messages)
return final_response["choices"][0]["message"]["content"]
A few things to notice in this code:
- The assistant’s message (with `tool_calls`) goes into the conversation before the tool results. The API requires this order.
- Each tool result has a `role` of `"tool"` and a `tool_call_id` that matches the ID from the model’s request. That’s how the API links results to requests.
- After all tool results are in, we make a second API call. The model now has what it needs to write a final answer.
KEY INSIGHT: Function calling is a conversation protocol, not a single request. The model says “I need this data,” you provide it, and then the model speaks to the user. Think of it as a two-round conversation with the model.
Let’s test it. We’ll ask a math question that the model would normally get wrong.
answer = run_assistant("What is 7 to the power of 12?")
print(answer)
The model called calculate with "7**12", got 13841287201, and replied with the correct answer. Without the tool, most models would guess — and get it wrong.
Let’s try the weather tool:
answer = run_assistant("What's the weather like in Tokyo right now?")
print(answer)
And the database tool:
answer = run_assistant("Show me recent orders for customer CUST-1234")
print(answer)
Three tools, one assistant, zero hardcoding of “if user asks about weather, call weather function.” The model reads the schemas and picks the right tool based on the user’s intent. That’s the power of function calling.
Parallel Tool Calls: Multiple Tools in One Turn
Sometimes a user asks something that needs two tools at once. “What’s 15% of $89, and what’s the weather in Mumbai?” That’s a calculator question and a weather question. Without parallel calls, the model would need two round trips. With them, it calls both tools in one shot.
The model returns multiple entries in the tool_calls array. Our run_assistant function already handles this — the for tool_call in assistant_msg["tool_calls"] loop processes every tool call, not just the first one.
Let’s test it:
answer = run_assistant(
"I need two things: calculate 15% of 89 dollars, "
"and tell me the current weather in Mumbai."
)
print(answer)
Check the printed output. You should see two “Calling tool” lines — one for calculate and one for get_weather. The model bundled both into one response. Your loop ran both, sent both results back, and the model wrote one answer covering both questions.
TIP: Parallel calls happen on their own when the model sees that it needs several tools. You don’t set any special flag. But if you want to prevent parallel calls, add `parallel_tool_calls: false` to your request body.
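In our raw-HTTP setup, that flag sits in the request payload alongside `tools`. A sketch of what the body would look like (the messages and empty tool list are placeholders; the field name follows the Chat Completions API):

```python
# Request body with parallel tool calls disabled.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "user", "content": "What's 15% of 89, and the weather in Mumbai?"}
    ],
    "tools": [],  # your tool schemas would go here
    "parallel_tool_calls": False,  # at most one tool call per model turn
}
```

With the flag set to `False`, a two-part question like the one above becomes two rounds of the tool-use loop instead of one.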
Here’s a more complex example — three tools in one turn:
answer = run_assistant(
"For customer CUST-5678: look up their orders, "
"calculate the total with 8% tax, "
"and check the weather in London for their delivery update."
)
print(answer)
Here’s what’s interesting. The model might call lookup_orders and get_weather in parallel — they don’t depend on each other. Then it uses the order total to call calculate in a second round. Or it might call all three at once. Either way, our loop handles it.
Tool Schema Design: Best Practices
Writing good schemas is the difference between a reliable assistant and one that calls the wrong tool half the time. Here are the rules I follow.
Rule 1: Descriptions are prompts. The description field isn’t just docs. The model reads it to decide when to use this tool. Write it like a prompt.
# Bad — too vague
{"description": "Gets data"}
# Good — tells the model exactly when to use this
{"description": "Look up recent orders for a customer by their customer ID. Returns order IDs, dates, and totals."}
Rule 2: Use enum for constrained choices. If a parameter only accepts specific values, list them. This prevents the model from inventing invalid arguments.
# Bad — model might pass "kelvin" or "K"
{"type": "string", "description": "Temperature unit"}
# Good — model can only pick these two
{"type": "string", "enum": ["celsius", "fahrenheit"]}
Rule 3: Add examples in descriptions. The model parses description text for formatting cues. Adding “e.g. ‘CUST-1234′” helps it format the argument correctly.
Rule 4: Keep schemas flat. Deeply nested parameters confuse the model. If you need complex input, flatten it into separate string parameters. Don’t nest objects three levels deep.
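For example, a hypothetical shipping-address parameter is much easier for the model to fill in as flat fields than as nested objects (both schemas below are illustrative):

```python
# Bad -- nested objects the model has to assemble correctly
nested_params = {
    "type": "object",
    "properties": {
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
                "geo": {
                    "type": "object",
                    "properties": {
                        "lat": {"type": "number"},
                        "lon": {"type": "number"},
                    },
                },
            },
        }
    },
}

# Good -- the same information as flat, clearly described fields
flat_params = {
    "type": "object",
    "properties": {
        "street": {"type": "string", "description": "Street address, e.g. '12 Main St'"},
        "city": {"type": "string", "description": "City name, e.g. 'London'"},
    },
    "required": ["street", "city"],
}
```

The flat version also makes `required` straightforward: every mandatory field is listed at one level, instead of being buried inside nested objects.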
| Practice | Why It Matters |
|---|---|
| Clear descriptions | Model picks the right tool more often |
| `enum` for fixed choices | Prevents invalid argument values |
| Examples in descriptions | Model formats arguments correctly |
| Flat parameter structure | Fewer parsing errors from the model |
| Specific required fields | Model always provides mandatory data |
COMMON MISTAKE: Naming two tools too alike — like `search_products` and `find_products` with similar descriptions. The model gets confused about which to call. Give each tool a distinct name and a description that clearly sets it apart.
Error Handling and Edge Cases
Production assistants need to handle failures gracefully. What happens when the model calls a tool that crashes? What if it passes bad arguments? I’ve seen both happen in real deployments — and without proper handling, the whole assistant freezes.
The improved version wraps tool execution in a try-except block. If a tool fails, the error goes back to the model. It can then retry with new arguments or explain the problem to the user. We also add a max loop count to stop infinite tool-calling cycles.
def run_assistant_safe(user_message, max_rounds=5):
"""Tool-use loop with error handling and loop protection."""
messages = [
{"role": "system", "content": "You are a helpful assistant. Use tools when needed. If a tool returns an error, explain it to the user."},
{"role": "user", "content": user_message}
]
for round_num in range(max_rounds):
response = chat(messages, tools=tools)
msg = response["choices"][0]["message"]
if not msg.get("tool_calls"):
return msg.get("content", "No response generated.")
messages.append(msg)
for tool_call in msg["tool_calls"]:
name = tool_call["function"]["name"]
args = tool_call["function"]["arguments"]
try:
result = execute_tool(name, args)
except Exception as e:
result = json.dumps({"error": f"Tool '{name}' failed: {str(e)}"})
messages.append({
"role": "tool",
"tool_call_id": tool_call["id"],
"content": result
})
return "Maximum tool-calling rounds reached. Please try a simpler query."
Three key improvements over the basic version:
- Try-except around execution — a crashing tool doesn’t crash your assistant. The error goes back to the model.
- Loop protection — `max_rounds=5` prevents the model from calling tools endlessly. Without this, a buggy tool could cause an infinite loop.
- Graceful fallback — if max rounds are hit, the user gets a clear message instead of a hang.
Let’s test with a bad calculation:
answer = run_assistant_safe("What is the square root of negative one?")
print(answer)
With the real API, the model calls `calculate`, receives the error payload back, and explains the problem instead of crashing — that’s the point of returning errors as tool results. (In mock mode you’ll see a generic text response here, since the keyword-based mock doesn’t route this question to the calculator.)
Controlling Tool Choice
By default, tool_choice is "auto" — the model decides whether to call a tool or respond directly. But you have more control than that.
| Value | Behavior |
|---|---|
| `"auto"` | Model decides — might call a tool, might not |
| `"none"` | Model never calls tools (responds with text only) |
| `"required"` | Model must call at least one tool |
| `{"type": "function", "function": {"name": "calculate"}}` | Force a specific tool |
When would you force a tool? Say you’re building a calculator app. Every message should trigger calculate — no exceptions:
response = chat(
messages=[
{"role": "system", "content": "You are a calculator. Always use the calculate tool."},
{"role": "user", "content": "What is 2 + 2?"}
],
tools=tools,
tool_choice={"type": "function", "function": {"name": "calculate"}}
)
print(json.dumps(response["choices"][0]["message"]["tool_calls"], indent=2))
And "none" is handy for follow-ups. Once the model has used tools and you’re in a “just chat” phase, tool_choice="none" stops needless tool calls.
Common Mistakes and How to Fix Them
Mistake 1: Forgetting to append the assistant message before tool results
❌ Wrong:
# Missing: messages.append(assistant_msg)
for tool_call in assistant_msg["tool_calls"]:
result = execute_tool(tool_call["function"]["name"], tool_call["function"]["arguments"])
messages.append({"role": "tool", "tool_call_id": tool_call["id"], "content": result})
Why it breaks: The API needs the assistant message (with tool_calls) before the tool results. Skip it, and you get a 400 Bad Request: “messages with role ‘tool’ must be preceded by a message with a tool_calls field.”
✅ Correct:
messages.append(assistant_msg) # Must come first
for tool_call in assistant_msg["tool_calls"]:
result = execute_tool(tool_call["function"]["name"], tool_call["function"]["arguments"])
messages.append({"role": "tool", "tool_call_id": tool_call["id"], "content": result})
Mistake 2: Returning non-string content from tool functions
❌ Wrong:
def calculate(expression):
return eval(expression) # Returns an int or float
Why it breaks: The content field in tool messages must be a string. Pass a number, and the API either errors or converts it wrong.
✅ Correct:
def calculate(expression):
result = eval(expression)
return json.dumps({"result": result}) # Always return a JSON string
Mistake 3: Mismatched tool_call_id
❌ Wrong:
messages.append({
"role": "tool",
"tool_call_id": "some-random-id", # Wrong ID
"content": result
})
Why it breaks: Each tool result must have the exact tool_call_id from the model’s request. A mismatch causes a 400 error. Always use tool_call["id"] from the response.
✅ Correct:
messages.append({
"role": "tool",
"tool_call_id": tool_call["id"], # Exact ID from the model's request
"content": result
})
Exercise 1: Add a New Tool
Difficulty: intermediate

Create a tool schema and implementation for a unit converter that converts between kilometers and miles. Define the `convert_distance` tool schema with parameters `value` (number), `from_unit` (enum: "km", "miles"), and `to_unit` (enum: "km", "miles"). Then implement the `convert_distance` function that performs the conversion. 1 mile = 1.60934 km.

Starter code:

# Step 1: Define the tool schema
convert_tool = {
    "type": "function",
    "function": {
        "name": "convert_distance",
        "description": "Convert a distance value between kilometers and miles.",
        "parameters": {
            "type": "object",
            "properties": {
                # YOUR CODE: define value, from_unit, to_unit
            },
            "required": ["value", "from_unit", "to_unit"]
        }
    }
}

# Step 2: Implement the function
def convert_distance(value, from_unit, to_unit):
    # YOUR CODE: perform the conversion
    pass

# Test it
result = convert_distance(10, "miles", "km")
print(result)

Check your work: `convert_distance(10, "miles", "km")` should give 16.0934, and `convert_distance(100, "km", "miles")` should give 62.1371.

Hints:
- Use "enum": ["km", "miles"] for both from_unit and to_unit in the schema properties.
- In the function: if from_unit == "miles" and to_unit == "km", multiply by 1.60934. For the reverse, divide by 1.60934. Round to 4 decimal places.

Solution:

convert_tool = {
    "type": "function",
    "function": {
        "name": "convert_distance",
        "description": "Convert a distance value between kilometers and miles.",
        "parameters": {
            "type": "object",
            "properties": {
                "value": {"type": "number", "description": "The numeric value to convert"},
                "from_unit": {"type": "string", "enum": ["km", "miles"], "description": "The source unit"},
                "to_unit": {"type": "string", "enum": ["km", "miles"], "description": "The target unit"}
            },
            "required": ["value", "from_unit", "to_unit"]
        }
    }
}

def convert_distance(value, from_unit, to_unit):
    if from_unit == to_unit:
        result = value
    elif from_unit == "miles" and to_unit == "km":
        result = round(value * 1.60934, 4)
    else:
        result = round(value / 1.60934, 4)
    return json.dumps({"result": result})

The schema uses "enum" to restrict unit values and the "number" type for the value. The function checks the direction of conversion and applies the 1.60934 factor. Results are returned as JSON strings, matching the pattern all our tools follow.
Exercise 2: Build a Multi-Round Tool Loop
Difficulty: intermediate

Write a function `run_with_retry` that implements the tool-use loop but handles up to 3 rounds of tool calls. The function should: (1) send the user message with tools, (2) if the model returns tool_calls, execute them and send results back, (3) repeat up to 3 rounds, (4) return the final text response. If 3 rounds pass with no text response, return "Max rounds reached."

Starter code:

def run_with_retry(user_message, max_rounds=3):
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_message}
    ]
    for i in range(max_rounds):
        response = chat(messages, tools=tools)
        msg = response["choices"][0]["message"]
        # YOUR CODE: check for tool_calls, execute, append results
        # If no tool_calls, return msg["content"]
        pass
    return "Max rounds reached."

# Test: this should work in one round
print(run_with_retry("What is 5 + 3?"))

Hints:
- Inside the loop, check if msg.get("tool_calls") exists. If not, return msg["content"]. If yes, append msg to messages, then loop through tool_calls and execute each one.
- After executing tools, append each result with role "tool" and the matching tool_call_id. Then let the for-loop continue to the next round — the next chat() call will include the tool results.

Solution:

def run_with_retry(user_message, max_rounds=3):
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_message}
    ]
    for i in range(max_rounds):
        response = chat(messages, tools=tools)
        msg = response["choices"][0]["message"]
        if not msg.get("tool_calls"):
            return msg.get("content", "No response.")
        messages.append(msg)
        for tc in msg["tool_calls"]:
            result = execute_tool(tc["function"]["name"], tc["function"]["arguments"])
            messages.append({"role": "tool", "tool_call_id": tc["id"], "content": result})
    return "Max rounds reached."

Each iteration sends the current conversation to the API. If the model returns tool_calls, we execute them and add results. The loop then repeats — the next API call includes the full conversation history with tool results. When the model finally has enough information, it returns a text response instead of tool_calls, and we exit.
When Not to Use Function Calling
Function calling isn’t always the right choice. Here’s when to skip it.
Simple Q&A without external data. If the model can answer from its training data — “What is gradient descent?” — adding tools just adds latency and cost. Every tool-enabled request is a bit slower because the model reads all the schemas.
High-throughput pipelines. Each tool call adds a round trip. For bulk work — thousands of requests per minute — that extra time adds up. Pre-compute results or use structured outputs instead.
When the model needs to be creative. Function calling is for retrieval and computation. If you want the model to write poetry or brainstorm ideas, tools won’t help. They’ll just get in the way.
| Use Function Calling When… | Don’t Use When… |
|---|---|
| You need real-time data (weather, stock prices) | The model knows the answer already |
| You need exact computation (math, unit conversion) | You need creative text generation |
| You need to query your own database or API | Latency is critical and data is static |
| You want structured, reliable argument extraction | You have fewer than 2 tools |
Function Calling vs Structured Outputs
You might wonder: “OpenAI also has Structured Outputs. How’s that different from function calling?”
They solve different problems. Function calling lets the model request actions — “call this function with these arguments.” Structured Outputs force the model to return data in a specific JSON format — no function execution involved.
| Feature | Function Calling | Structured Outputs |
|---|---|---|
| Purpose | Model triggers your code | Model returns formatted data |
| Execution | You run the function | No execution needed |
| Use case | “Look up order #4521” | “Return {name, age, email}” |
| Multi-step | Yes — tool-use loop | No — single response |
| Schema format | tools parameter | response_format parameter |
Use function calling when the model needs to do something — query a database, run math, call an API. Use structured outputs when you need the model’s own answer in a fixed shape — like pulling entities from text.
You can combine both. Use function calling to fetch data, then structured outputs to shape the final response. But start with function calling alone — it covers most assistant use cases.
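To make the difference concrete, here is a minimal sketch of the two request payloads side by side. These are local dictionaries, not live requests, and the field values (model name, schema fields) are placeholders — the point is only which parameter carries the schema.

```python
# Function calling: the schema goes under "tools"; the model may answer
# with tool_calls that your code then executes.
function_calling_payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Look up order #4521"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_shipping_cost",
            "description": "Look up shipping cost for an order.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "integer"}},
                "required": ["order_id"],
            },
        },
    }],
}

# Structured Outputs: the schema goes under "response_format"; the model's
# own reply is forced into this shape. Nothing gets executed.
structured_output_payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Extract the contact details."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "contact",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "email": {"type": "string"},
                },
                "required": ["name", "age", "email"],
            },
        },
    },
}

print("tools" in function_calling_payload)             # True
print("response_format" in structured_output_payload)  # True
```

Same JSON Schema vocabulary in both cases; only the parameter it lives under changes.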
Summary: What You Built
You built a multi-tool assistant from scratch using raw HTTP requests. Here’s what you learned:
- Tool schemas — JSON Schema definitions that describe your functions to the model (name, description, parameters).
- The tool-use loop — send message → model returns tool_calls → execute functions → send results back → model responds.
- Parallel tool calls — the model can call multiple tools in one turn. Your loop handles this naturally.
- Schema design — clear descriptions, enum for constraints, flat structures, examples in parameter descriptions.
- Error handling — try-except around tool execution, loop limits, graceful fallbacks.
Practice Exercise
Build a personal finance assistant with two tools: get_balance(account_id) that returns a mock balance, and transfer_money(from_account, to_account, amount) that simulates a transfer. The transfer should check if the balance is sufficient.
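If you want a starting point, here is a minimal sketch of the two tool functions with mocked balances. The account names and amounts are made up; wiring them into the tool-use loop from earlier is left to you.

```python
import json

# Mock account data — no real banking logic.
BALANCES = {"checking": 250.0, "savings": 1200.0}

def get_balance(account_id):
    """Return the mock balance for an account as a JSON string."""
    if account_id not in BALANCES:
        return json.dumps({"error": f"Unknown account: {account_id}"})
    return json.dumps({"account_id": account_id, "balance": BALANCES[account_id]})

def transfer_money(from_account, to_account, amount):
    """Simulate a transfer, refusing if the source balance is insufficient."""
    if BALANCES.get(from_account, 0) < amount:
        return json.dumps({"error": "Insufficient funds"})
    BALANCES[from_account] -= amount
    BALANCES[to_account] = BALANCES.get(to_account, 0) + amount
    return json.dumps({"status": "ok", "from": from_account,
                       "to": to_account, "amount": amount})

print(get_balance("checking"))
print(transfer_money("savings", "checking", 500))
print(get_balance("checking"))
```

Both functions return JSON strings because tool results go back to the model as text — returning an error object instead of raising lets the model explain the failure to the user.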
Complete Code
Frequently Asked Questions
What’s the difference between “function calling” and “tool use”?
Same concept, different name. OpenAI first called it “function calling” with a functions parameter. They later renamed it “tool use” with tools. New code should use tools — it’s the current standard.
Does the model actually run my Python functions?
No. The model only outputs the function name and arguments as JSON. Your code runs the function and sends the result back. The model never touches your runtime.
How many tools can I define in one request?
OpenAI doesn’t publish a hard limit. In practice, 10-20 tools work well. Past that, the model makes more mistakes picking the right one. If you have 50+ tools, group them. First let the model pick a category, then show only that category’s tools.
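A minimal sketch of that two-stage routing idea — the categories and tool names here are hypothetical, and the stage-1 category pick (which would be its own API call) is omitted:

```python
# Group tool schemas by category and expose only one group per request.
TOOL_GROUPS = {
    "orders":  [{"type": "function", "function": {"name": "get_shipping_cost"}}],
    "weather": [{"type": "function", "function": {"name": "get_weather"}}],
    "math":    [{"type": "function", "function": {"name": "calculate"}}],
}

def tools_for_category(category):
    """Return only the schemas for the category picked in stage 1."""
    return TOOL_GROUPS.get(category, [])

# Stage 1 would ask the model: "Which category fits this request:
# orders, weather, or math?" Stage 2 sends only that category's tools.
print([t["function"]["name"] for t in tools_for_category("weather")])  # ['get_weather']
```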
Can I use function calling with streaming responses?
Yes. With streaming on, tool calls arrive as deltas — partial JSON chunks you piece together. The finish_reason changes from "stop" to "tool_calls" when the model wants a tool. You collect the chunks, run the tool, and keep streaming.
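A sketch of the assembly step, using mocked deltas rather than a live stream. The shapes mirror the Chat Completions streaming format (each delta carries an index, plus fragments of the id, name, and arguments), but the values here are invented:

```python
# Mocked tool-call deltas as they might arrive over a stream.
chunks = [
    {"index": 0, "id": "call_abc",
     "function": {"name": "get_weather", "arguments": ""}},
    {"index": 0, "function": {"arguments": '{"city": '}},
    {"index": 0, "function": {"arguments": '"Paris"}'}},
]

calls = {}  # index -> accumulated tool call
for delta in chunks:
    call = calls.setdefault(delta["index"], {"id": None, "name": None, "arguments": ""})
    if delta.get("id"):
        call["id"] = delta["id"]
    fn = delta.get("function", {})
    if fn.get("name"):
        call["name"] = fn["name"]
    call["arguments"] += fn.get("arguments", "")  # concatenate JSON fragments

print(calls[0]["name"], calls[0]["arguments"])  # get_weather {"city": "Paris"}
```

The key detail: arguments arrive as raw string fragments, so you concatenate first and only json.loads the result once the stream signals the call is complete.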
Why does the model sometimes ignore my tools and respond directly?
With tool_choice="auto", the model decides on its own. If it can answer from training data, it skips the tool. That’s usually fine — you don’t need a calculator for “what is 2+2.” For guaranteed tool use, set tool_choice="required".
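For reference, a sketch of where tool_choice sits in the request body. The messages and tools lists are placeholders; "required" forces the model to call some tool, while naming a function forces that specific one:

```python
# Force the model to call at least one tool.
payload_required = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [],  # your tool schemas go here
    "tool_choice": "required",
}

# Force the model to call one specific tool by name.
payload_specific = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [],  # your tool schemas go here
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}

print(payload_required["tool_choice"])
print(payload_specific["tool_choice"]["function"]["name"])
```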
References
- OpenAI documentation — Function Calling.
- OpenAI API Reference — Chat Completions.
- OpenAI Cookbook — Function Calling examples.
- JSON Schema specification — Understanding JSON Schema.
- OpenAI documentation — Structured Outputs.
- DataCamp — OpenAI Function Calling Tutorial.
- OpenAI documentation — Models.
Reviewed: March 2026 | Model tested: gpt-4o | API version: v1/chat/completions