Prompt Engineering Fundamentals — Reliable LLM Outputs
Apply prompt engineering fundamentals — zero-shot, few-shot, chain-of-thought, and structured output — to get consistent, reliable results from any LLM.
Prompt engineering is the practice of writing clear, structured inputs so that an LLM gives you the right answer, in the right format, every time. In this post, I’ll walk you through the five core techniques — zero-shot, few-shot, role-based, chain-of-thought, and structured output — with runnable Python code you can try right now.
Here’s a scene you’ve likely lived through. You type a clear request to an LLM. What comes back isn’t broken — it’s just off. Wrong layout, wrong voice, missing half the info you asked for.
You fiddle with the prompt. Swap a word, toss in a line. Somehow it starts working. But you can’t say why. A week later the task changes, and you’re guessing all over again.
Prompt engineering closes that loop. It’s not about magic words or clever hacks. It’s about seeing how the model reads your input — and shaping that input so you get reliable results every time.
What Is Prompt Engineering?
In plain terms, prompt engineering is the art of writing inputs that guide the model toward the answer you need. No secret sauce — just clear, organized directions.
Here’s a useful way to think about it. Imagine handing a task to a brand-new intern. If you say “summarize this report,” the result is a coin flip. But tell them “write 3 bullets — focus on revenue trends” and you get exactly the output you had in mind.
LLMs behave the same way. What you get out tracks almost perfectly with how clearly you spelled out what you wanted.
python
from openai import OpenAI
import json
import os
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
def ask_llm(prompt, model="gpt-4o-mini", temperature=0.0):
    """Send a prompt to the LLM and return the response."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content
This tiny helper does one job: send a prompt and return what the model wrote. I’ll call it throughout the post. The temperature=0.0 setting pins the output down — feed in the same words, get the same reply.
Key Insight: Prompt engineering is about cutting ambiguity, not tricking the model. The sharper you describe what you want, the closer the model gets to giving it to you.
Prerequisites
- Python version: 3.9+
- Required library: openai (1.0+)
- Install: pip install openai
- API key: You need an OpenAI API key. Create one at platform.openai.com/api-keys. Set it as an environment variable: export OPENAI_API_KEY="your-key-here"
- Time to complete: 20-25 minutes
What Is Zero-Shot Prompting?
If you’ve ever typed a question into an LLM without showing it an example first, you’ve already done zero-shot prompting. You give a task, provide zero samples of what the right output looks like, and let the model figure it out from its training.
Here’s a zero-shot request for sentiment tagging:
python
result = ask_llm(
    "Classify the sentiment as positive, negative, or neutral: "
    "'The food was okay but the service was terrible.'"
)
print(result)
Output:
Negative
Clean and correct. The model picked up sentiment tagging from the text it was trained on — no help needed.
But zero-shot stumbles when you need the result in a rigid format. Take a look:
python
result = ask_llm(
    "Extract the product name and price from: "
    "'The new MacBook Pro 16-inch starts at $2,499'"
)
print(result)
Output:
Product Name: MacBook Pro 16-inch
Price: $2,499
The facts are correct. But the way they’re arranged changes from run to run — colons one time, dashes the next, bullets after that. If your code tries to parse this, the inconsistency will bite you.
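To see why that inconsistency bites, here is a small stand-alone sketch (no API call; both reply strings are made up for illustration). A naive parser written for the colon layout silently finds nothing when the reply switches to dashes:

```python
import re

def parse_colon_format(text):
    """Naive parser that expects exact 'Field: value' lines."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^([\w ]+):\s*(.+)$", line)
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields

run_a = "Product Name: MacBook Pro 16-inch\nPrice: $2,499"
run_b = "- Product Name - MacBook Pro 16-inch\n- Price - $2,499"

print(parse_colon_format(run_a))  # finds both fields
print(parse_colon_format(run_b))  # finds nothing: the layout changed
```

Same facts in both replies; only the second layout breaks the pipeline.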
Tip: Zero-shot shines for tasks the model already knows — sorting sentiment, translating text, writing summaries. The moment you need a locked-down format, reach for few-shot examples or structured output.
Two signs you need something stronger:
- The format matters. If the result flows into another system, you need it to be the same every time.
- The task calls for niche reasoning. Medical, legal, or domain-specific labels need extra guidance.
How Does Few-Shot Prompting Work?
Few-shot prompting tackles the number-one weakness of zero-shot: unreliable formatting. Instead of crossing your fingers that the model picks your preferred layout, you demonstrate the layout with real examples.
The recipe: place 2–5 input-output pairs above your actual question. The model reads the pattern and mirrors it.
Let me redo the sentiment task, this time with a locked format:
python
few_shot_prompt = """Classify the sentiment and confidence.
Use exactly this format:
Sentiment: [positive/negative/neutral]
Confidence: [high/medium/low]

Text: "This is the best phone I've ever owned!"
Sentiment: positive
Confidence: high

Text: "The battery life is decent but nothing special."
Sentiment: neutral
Confidence: medium

Text: "Broke after two days. Complete waste of money."
Sentiment: negative
Confidence: high

Text: "The camera quality is amazing but it overheats."
"""
result = ask_llm(few_shot_prompt)
print(result)
Output:
Sentiment: negative
Confidence: medium
The reply locks onto the template set by the examples — every run, without fail. That’s the whole point of few-shot: you draw the shape, the model colors it in.
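Because the format is now locked, downstream code can read it with a tiny helper. This is a hypothetical sketch (the function name is mine), but it shows the payoff: a predictable reply is machine-readable.

```python
import re

def parse_sentiment(reply):
    """Parse the 'Sentiment: ... / Confidence: ...' format the examples lock in."""
    sentiment = re.search(r"Sentiment:\s*(\w+)", reply)
    confidence = re.search(r"Confidence:\s*(\w+)", reply)
    if not (sentiment and confidence):
        raise ValueError(f"Unexpected reply format: {reply!r}")
    return sentiment.group(1), confidence.group(1)

print(parse_sentiment("Sentiment: negative\nConfidence: medium"))
# → ('negative', 'medium')
```

If the model ever drifts from the template, the helper raises instead of silently returning garbage.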
So how many samples should you provide? For most tasks, three is the magic number:
| Examples | Effect |
|---|---|
| 1 (one-shot) | Picks up the format but may miss edge cases |
| 2-3 | Strong pattern lock, handles tricky inputs |
| 4-5 | Diminishing returns, useful for complex tasks |
| 6+ | Wastes tokens, rarely boosts quality |
Warning: Your examples can plant bias. If every positive example is short and every negative one is long, the model may learn “short = positive.” Mix up the length and style of your samples.
How Do You Pick Good Examples?
How good your samples are matters more than how many you include. Three guidelines:
Guideline 1: Cover the tricky cases. Doing sentiment? Include one mixed-feeling example — not just clear positives and negatives.
Guideline 2: Make every example look the same. If one sample writes “Sentiment:” and the next writes “Sentiment -“, the model has no idea which format to follow.
Guideline 3: Use messy, real-life text. Skip the perfect toy sentences. Feed in inputs that look like what the model will see in the wild.
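Guideline 2 is easy to automate. Here is a sketch (a hypothetical helper, not part of any library) that flags examples missing the agreed field labels before they ever reach a prompt:

```python
def check_example_consistency(examples, required_labels=("Sentiment:", "Confidence:")):
    """Return (index, missing_labels) for every example that breaks the format."""
    problems = []
    for i, example in enumerate(examples):
        missing = [label for label in required_labels if label not in example]
        if missing:
            problems.append((i, missing))
    return problems

examples = [
    'Text: "Great!"\nSentiment: positive\nConfidence: high',
    'Text: "Meh."\nSentiment - neutral',  # wrong separator, missing Confidence
]
print(check_example_consistency(examples))
# → [(1, ['Sentiment:', 'Confidence:'])]
```

Run it on your example set whenever you edit a prompt; a non-empty result means the model is getting mixed signals.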
Quick Check: What would the model return for “It’s fine. Nothing amazing, nothing terrible.” using the few-shot prompt above? Think first. The answer should be Sentiment: neutral / Confidence: high — the text is clearly neutral with strong certainty.
With format issues solved by few-shot, let’s look at a technique that changes the quality and depth of what the model writes.
Exercise 1: Build a Few-Shot Classifier (beginner)
Create a few-shot prompt that classifies support tickets into one category: billing, technical, account, or general. Test it with the sample ticket provided. Hint: add three example tickets before the test ticket (one billing, one technical, one account).
python
few_shot_prompt = """Classify support tickets into one category: billing, technical, account, or general.

Ticket: "Why was I charged twice this month?"
Category: billing

Ticket: "The app crashes when I upload large files"
Category: technical

Ticket: "How do I update my email address?"
Category: account

Ticket: "I can't log into my account after changing my password."
Category:"""

result = ask_llm(few_shot_prompt)
print(result)
Three examples teach the model the classification pattern. Each maps a ticket to one category, and the login/password ticket correctly maps to "account."
How Does Role-Based Prompting Work?
You’ve probably seen the “You are a…” line at the top of prompts. That’s role-based prompting — and it does far more than people give it credit for.
By setting a role, you activate a specific pocket of the model’s knowledge. Tell it “You are a senior Python dev” and you get different code style than “You are a data scientist.” The role steers vocabulary, depth, and the assumptions baked into the reply.
python
def ask_with_role(system_prompt, user_prompt, temperature=0.0):
    """Send a prompt with a system role."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        temperature=temperature,
    )
    return response.choices[0].message.content
Here I’ve split the system message (the persona) from the user message (the task). OpenAI’s API reads the system message as standing instructions that color every reply.
Watch how switching the role changes the output. I’ll ask the same question with two different personas:
python
question = "How should I handle missing data in my dataset?"
generic = ask_with_role(
    "You are a helpful assistant.",
    question
)
expert = ask_with_role(
    "You are a senior data scientist at a Fortune 500 company. "
    "Give practical, opinionated advice. Be direct.",
    question
)
print("=== Generic ===")
print(generic[:200])
print("\n=== Expert ===")
print(expert[:200])
The generic reply treats every option the same. The expert takes a position, gives concrete advice, and warns about pitfalls. Exact same question — miles apart in usefulness.
Key Insight: A focused system prompt is like calling in a specialist instead of a generalist. The tighter the role, the sharper and more useful the reply.
Note: System messages differ across LLMs. OpenAI puts them in the `messages` array with `role: "system"`. Anthropic's Claude uses a separate `system` parameter. Google's Gemini uses `system_instruction`. The idea is the same — lasting context that shapes every reply — but the API call looks different. Check each provider's docs.
What Makes a Strong System Prompt?
Vague prompts produce vague results: “You are an expert” gives the model nothing to latch onto. Strong prompts add rules, scope, and a clear persona:
python
system_prompt = """You are a senior data engineer reviewing code
for a production ETL pipeline.
Rules:
- Flag code that won't scale past 1 million rows
- Suggest polars or duckdb when appropriate
- Use concrete numbers in performance estimates
- Be direct. Skip pleasantries."""
More constraints means sharper output. Constraints don’t shrink what the model can do — they point it at the right target.
Now that we’ve covered roles, let’s look at tasks where the model needs to think through several steps before giving an answer.
What Is Chain-of-Thought Prompting?
Not every problem can be solved in one mental leap. When you hand the model a multi-step math or logic puzzle, it sometimes blurts out the wrong answer because it tried to do everything in its head at once.
Chain-of-thought (CoT) prompting is the fix. You ask the model to lay out its reasoning — one step at a time. Think about the difference between shouting a guess across the room and working through the problem on a whiteboard.
python
cot_prompt = """A farmer has 3 fields.
Field A: 120 bushels/acre, 5 acres.
Field B: 95 bushels/acre, 8 acres.
Field C: 110 bushels/acre, 3 acres.
Sells at $4.50/bushel, transport costs $1.20/bushel.
What is the total profit?
Think step by step. Show your work."""
result = ask_llm(cot_prompt)
print(result)
The model lays out the work in stages: figure out each field’s harvest, total them up, compute the gross, and take away shipping costs. Each stage is easy to verify. If the final number is wrong, you can find exactly where the logic went off track.
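A nice property of chain-of-thought is that each intermediate step can be checked mechanically. Here is the farmer problem verified in plain Python, no LLM involved:

```python
# Per-field yields: (bushels per acre, acres)
fields = {"A": (120, 5), "B": (95, 8), "C": (110, 3)}

total_bushels = sum(rate * acres for rate, acres in fields.values())  # 600 + 760 + 330
profit_per_bushel = 4.50 - 1.20  # sale price minus transport cost
total_profit = total_bushels * profit_per_bushel

print(total_bushels)              # 1690
print(round(total_profit, 2))     # 5577.0
```

If the model's final number disagrees with this, comparing its intermediate lines against the per-field sums pinpoints exactly which step went wrong.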
What Is Zero-Shot CoT?
You don’t always have to spell out each step. Sometimes just tacking on “Let’s think step by step” does the trick. That’s zero-shot chain-of-thought.
python
simple = "What is 23 * 17 + 45 - 12 * 3?"
cot = simple + "\n\nLet's think step by step."
print(f"Direct: {ask_llm(simple)}")
print(f"CoT: {ask_llm(cot)}")
Kojima et al. (2022) showed that just appending “Let’s think step by step” lifts reasoning scores across arithmetic, common-sense, and symbolic tasks — no examples needed.
Tip: Reach for chain-of-thought any time the task has multiple steps. Calculations, multi-criteria choices, debugging, logic puzzles — anywhere the middle steps matter.
When Does CoT Hurt More Than Help?
Keep in mind: CoT isn’t free. It uses more tokens and takes longer. For simple labels or data pulls, the extra steps add overhead without lifting accuracy. A good rule: if a human could answer without jotting anything down, you probably don’t need CoT.
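The cost difference is easy to ballpark. The 4-characters-per-token ratio below is a rough rule of thumb for English prose, not an exact tokenizer:

```python
def estimate_tokens(text):
    """Very rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

direct_answer = "400"
cot_answer = ("First, 23 * 17 = 391. Then 391 + 45 = 436. "
              "Next, 12 * 3 = 36. Finally, 436 - 36 = 400.")

print(estimate_tokens(direct_answer), estimate_tokens(cot_answer))
```

For a simple arithmetic question, the reasoning trace costs an order of magnitude more output tokens than the bare answer. Worth it for hard problems, pure overhead for easy ones.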
Think ahead: You ask: “Is 97 a prime number? Let’s think step by step.” What will the model do? It will test divisibility by 2, 3, 5, 7, then conclude 97 is prime. The step-by-step layout makes the reasoning see-through and easy to verify.
Exercise 2: Chain-of-Thought for Multi-Step Reasoning (beginner)
Write a chain-of-thought prompt for this problem: "A store offers 20% off on orders over $100. Tax is 8%. Someone buys 3 items at $45 each. What is the final price?" Make the model show each step. Hint: add "Think step by step" or list numbered steps after the problem.
python
problem = ("A store offers 20% off on orders over $100. Tax is 8%. "
           "Someone buys 3 items at $45 each. What is the final price?")

cot_prompt = f"""{problem}

Solve step by step:
1. Calculate the subtotal
2. Check if the discount applies
3. Apply the discount if applicable
4. Calculate tax on the discounted price
5. Calculate the final price

Show your work."""

result = ask_llm(cot_prompt)
print(result)
Numbered steps force the model to break the problem apart: subtotal ($135), discount check (yes, over $100), discounted price ($108), tax ($8.64), final price ($116.64). Each step is verifiable.
How Does Structured Output Work — JSON Mode and Beyond?
So far, every method has given you free-form text. That works when a person reads it. But when your code needs to parse the reply, free text is fragile.
Structured output makes the model reply in a fixed shape — usually JSON. No fluff, no prose, just clean data your program can eat.
OpenAI gives you two ways to do this: JSON mode (simple) and Structured Outputs (strict schema lock).
What Is JSON Mode?
JSON mode makes sure the reply is valid JSON. Turn it on with response_format={"type": "json_object"} and tell the model what fields you want in the prompt.
python
json_prompt = """Extract product info from this review as JSON:
"I bought the Sony WH-1000XM5 headphones for $348.
The noise cancellation is the best I've tried.
Battery lasts about 30 hours."
Fields: product_name, brand, price, key_features (list),
rating_sentiment"""
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": json_prompt}],
    response_format={"type": "json_object"},
    temperature=0.0,
)
data = json.loads(response.choices[0].message.content)
print(json.dumps(data, indent=2))
Output:
{
  "product_name": "WH-1000XM5",
  "brand": "Sony",
  "price": 348,
  "key_features": [
    "Noise cancellation",
    "30-hour battery life"
  ],
  "rating_sentiment": "positive"
}
The output is always valid JSON. No preamble, no markdown wrapper, no “Here’s the JSON:” prefix. Just raw data you can hand to json.loads().
How Do Structured Outputs with Pydantic Work?
For real apps, JSON mode alone isn’t enough. You want the full package — locked field names, correct types, and a shape that never shifts.
That’s what Structured Outputs gives you. It works with GPT-4o and later models via the client.beta.chat.completions.parse call. (The .beta namespace signals that the SDK interface may still evolve in future versions, even though the feature itself is generally available.)
You set up a Pydantic model, and the API makes sure the reply fits it exactly:
python
from pydantic import BaseModel

class ProductReview(BaseModel):
    product_name: str
    brand: str
    price: float
    currency: str
    sentiment: str
    key_features: list[str]
    recommendation: bool

response = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": (
            "Extract product info from: 'The Dyson V15 at $749 "
            "is expensive but the laser dust detection is a "
            "game-changer. Absolutely worth it.'"
        )}
    ],
    response_format=ProductReview,
    temperature=0.0,
)
review = response.choices[0].message.parsed
print(f"Product: {review.product_name}")
print(f"Price: {review.currency}{review.price}")
print(f"Sentiment: {review.sentiment}")
print(f"Recommend: {review.recommendation}")
Output:
Product: Dyson V15
Price: $749.0
Sentiment: positive
Recommend: True
All fields present. All types correct. price arrives as a float, not a string. recommendation is a Python bool, not “yes.” This is what solid LLM integration looks like.
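To make "all fields present, all types correct" concrete, here is what that guarantee looks like as a plain-Python check. This is a hypothetical validator for illustration only; the real enforcement happens server-side in the Structured Outputs API:

```python
SCHEMA = {
    "product_name": str, "brand": str, "price": float, "currency": str,
    "sentiment": str, "key_features": list, "recommendation": bool,
}

def validate(data):
    """Return (missing_fields, wrong_type_fields) against the schema."""
    missing = [k for k in SCHEMA if k not in data]
    wrong = [k for k, t in SCHEMA.items()
             if k in data and not isinstance(data[k], t)]
    return missing, wrong

good = {"product_name": "Dyson V15", "brand": "Dyson", "price": 749.0,
        "currency": "$", "sentiment": "positive",
        "key_features": ["laser dust detection"], "recommendation": True}

print(validate(good))                 # ([], []) -> every check passes
print(validate({"price": "749"})[1])  # ['price'] (string, not float)
```

With JSON mode alone, every one of these checks is your responsibility; with Structured Outputs, a reply that would fail them never reaches your code.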
Warning: Always set `temperature=0.0` for data extraction. Higher temps add randomness that can shift field values between runs. For pulling data, you want the same answer every time.
How Do Prompt Templates Help?
Prompts pasted all over your codebase turn into a mess fast. Once a prompt works well, wrap it in a reusable template.
A template is just a string with blanks you fill at runtime. Here’s an example for data extraction:
python
def create_extraction_prompt(text, fields):
    """Build a reusable extraction prompt."""
    field_list = "\n".join(f"- {field}" for field in fields)
    return f"""Extract information from the text below.
Fields to extract:
{field_list}
Rules:
- If a field is not found, use null
- Return valid JSON only
- No explanations or extra text
Text: "{text}"
"""
prompt = create_extraction_prompt(
    text="John Smith, age 34, works at Google as a Senior "
         "Engineer since 2019.",
    fields=["name", "age", "company", "job_title", "start_year"]
)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    response_format={"type": "json_object"},
    temperature=0.0,
)
data = json.loads(response.choices[0].message.content)
print(json.dumps(data, indent=2))
Output:
{
  "name": "John Smith",
  "age": 34,
  "company": "Google",
  "job_title": "Senior Engineer",
  "start_year": 2019
}
Change the text and fields values and you’ve built a new extraction pipeline — zero prompt rewriting needed.
Tip: Version your prompt templates. Keep them in a config file or database, not sprinkled through your app code. That way you can A/B test prompts without shipping new code.
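One lightweight way to do that is a registry keyed by name and version. The names and structure below are illustrative, not a specific library:

```python
# Prompt templates live in data, keyed by (name, version).
PROMPTS = {
    ("extract_person", "v1"): 'Extract name and age from: "{text}"',
    ("extract_person", "v2"): (
        "Extract name and age from the text below.\n"
        "Return valid JSON only. Use null for missing fields.\n"
        'Text: "{text}"'
    ),
}

def get_prompt(name, version, **kwargs):
    """Fetch a versioned template and fill in its blanks."""
    return PROMPTS[(name, version)].format(**kwargs)

print(get_prompt("extract_person", "v1", text="John Smith, age 34"))
```

Routing a percentage of traffic to "v2" while the rest stays on "v1" gives you an A/B test with no code deploy, just a data change.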
How Do Temperature and Parameters Shape Responses?
Temperature sets how much randomness the model adds. It pairs with your prompt in ways that aren’t always clear at first glance.
At temperature=0.0, the model grabs the most likely next token each time. Same input, same output. At temperature=1.0, the model casts a wider net, pulling in less obvious choices.
Here’s what that looks like in practice:
python
creative_prompt = (
    "Write a one-sentence tagline for a coffee shop "
    "called 'Midnight Brew'."
)
for temp in [0.0, 0.5, 1.0]:
    result = ask_llm(creative_prompt, temperature=temp)
    print(f"Temperature {temp}: {result}")
At 0.0, you’ll see the same tagline every run. At 1.0, each call gives you a fresh spin. Neither is “right” — it depends on the job.
How Do You Pick the Right Temperature?
| Task Type | Temperature | Why |
|---|---|---|
| Data extraction | 0.0 | Consistency matters, creativity doesn’t |
| Classification | 0.0 – 0.3 | Small flex for gray areas |
| Summarization | 0.3 – 0.5 | Varied wording, but stay factual |
| Creative writing | 0.7 – 1.0 | You want novel phrasing |
| Brainstorming | 0.9 – 1.2 | Maximum spread of ideas |
Warning: Don’t change temperature AND top_p at the same time. OpenAI says to tweak one or the other. They both control randomness, and stacking them leads to weird results.
What Other Parameters Are Worth Knowing?
max_tokens caps how long the reply can be. Set it too low and you’ll get cut-off sentences. For extraction tasks, 500–1000 tokens is enough. For long-form writing, go 2000–4000.
top_p (nucleus sampling) is a different knob for randomness. Instead of rescaling probabilities, it restricts sampling to the smallest set of tokens whose combined probability reaches the threshold. top_p=0.1 means only the tokens covering the top 10% of probability mass get considered.
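Both knobs are easy to see on a toy distribution. This self-contained simulation (made-up logit values, not from a real model) shows temperature reshaping the probabilities and top_p trimming the candidate pool:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Divide logits by T, then softmax: low T sharpens, high T flattens."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability >= p."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept

logits = [2.0, 1.0, 0.5, 0.1]  # made-up scores for four candidate tokens
print([round(x, 3) for x in softmax_with_temperature(logits, 0.2)])  # near one-hot
print([round(x, 3) for x in softmax_with_temperature(logits, 1.0)])  # spread out
print(top_p_filter(softmax_with_temperature(logits, 1.0), 0.5))      # [0]
```

At T=0.2 nearly all the probability piles onto the top token (that is why temperature=0.0 is effectively deterministic), while at T=1.0 the tail tokens stay in play; top_p=0.5 then keeps only the head of the distribution.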
Here’s the hidden link: a sharp prompt lowers the need for low temperature. A solid few-shot prompt gives steady results even at 0.5. A vague prompt needs 0.0 just to keep the model on track.
How Do You Build a Real-World Review Analyzer?
Let’s pull everything together into a production-grade example. We’ll build a review analyzer that uses role-based prompting, structured output, and explicit rules — all at once.
The analyze_review function below calls the client.beta.chat.completions.parse method from the Structured Output section. It sends a system prompt (the role) and user prompt (the review), and gets back a typed Python object:
python
from pydantic import BaseModel
class ReviewAnalysis(BaseModel):
    sentiment: str
    confidence: float
    key_themes: list[str]
    pros: list[str]
    cons: list[str]
    summary: str
    action_items: list[str]

def analyze_review(review_text):
    """Analyze a product review with structured output."""
    system_prompt = """You are a product analyst at an e-commerce
company specializing in customer feedback analysis.
Rules:
- sentiment: exactly positive, negative, mixed, or neutral
- confidence: float between 0.0 and 1.0
- key_themes: 2-5 recurring themes
- Be specific in pros/cons — quote the review
- summary: one sentence, max 20 words
- action_items: what should the product team do?"""
    response = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": f"Analyze:\n\n{review_text}"},
        ],
        response_format=ReviewAnalysis,
        temperature=0.0,
    )
    return response.choices[0].message.parsed
Notice how three methods stack in a single function: a persona (system prompt), a locked data shape (Pydantic), and clear guardrails (the rules list).
Time to run it on a real review:
python
sample_review = """
I've been using this standing desk for 6 months now. The motor
is whisper-quiet and the height presets are fantastic. Build
quality feels solid — no wobble even at max height.
However, the cable management tray is too small for a full
setup, and the desktop surface scratches easily. Customer
support took 2 weeks to respond to a warranty question.
Overall worth the $599 price tag, but not perfect.
"""
analysis = analyze_review(sample_review)
print(f"Sentiment: {analysis.sentiment} ({analysis.confidence})")
print(f"Themes: {', '.join(analysis.key_themes)}")
print("\nPros:")
for pro in analysis.pros:
    print(f"  + {pro}")
print("\nCons:")
for con in analysis.cons:
    print(f"  - {con}")
print(f"\nSummary: {analysis.summary}")
print("\nAction items:")
for item in analysis.action_items:
    print(f"  -> {item}")
Every call gives back the exact same shape — same keys, same types, same nesting. You can push this into a database or dashboard with no glue code at all.
And that’s the line between dabbling in prompts and mastering them. The dabbler gets answers. The skilled engineer gets answers that are stable, parseable, and testable.
Which Technique Should You Pick?
With five approaches on the table, choosing matters. Here’s a quick decision tree based on what your task actually needs:
Start here: Can the model already nail this task with no help?
- Yes → Zero-shot. Don’t add layers you don’t need.
- No → Keep reading.
Does the format need to be exact?
- Yes → Few-shot (show samples) or Structured Output (enforce a schema)
- No, free text is fine → Move to the reasoning check
Does the task require multi-step logic?
- Yes → Chain-of-thought. Add “Think step by step” or list the steps out.
- No → Move to the domain check
Does the task call for special knowledge or a certain voice?
- Yes → Role-based prompting with a detailed system prompt
- No → Clear zero-shot instructions should do
Will code — not a human — read the output?
- Yes → Structured Output with Pydantic. No debate for production work.
- No → Any text-based method works
Most real prompts blend 2–3 approaches. The review analyzer above used role + structured output + rules. A complex analysis pipeline might layer role + CoT + few-shot. Start simple. Add methods when the simpler version misses your quality bar.
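The tree above can be written as a few lines of code. This is my own simplification for illustration (real prompts often combine techniques, as just noted):

```python
def pick_techniques(strict_format=False, multi_step=False,
                    needs_domain=False, machine_read=False):
    """Map the decision-tree questions to a list of techniques."""
    chosen = []
    if machine_read:
        chosen.append("structured output")   # schema wins when code reads the reply
    elif strict_format:
        chosen.append("few-shot")            # examples lock a human-readable format
    if multi_step:
        chosen.append("chain-of-thought")
    if needs_domain:
        chosen.append("role-based")
    return chosen or ["zero-shot"]           # nothing special needed

print(pick_techniques())                                    # ['zero-shot']
print(pick_techniques(machine_read=True, multi_step=True))  # ['structured output', 'chain-of-thought']
```

The second call mirrors the review analyzer: machine-readable output plus non-trivial reasoning means stacking two techniques.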
What Are the Most Common Prompt Engineering Mistakes?
These errors show up over and over in LLM-powered apps. They’re not just bad habits — they cause real failures in production.
Mistake 1: Leaving the Output Format Vague
❌ Wrong:
python
bad = "Tell me about the planets in our solar system."
result = ask_llm(bad)
print(result[:150])
What comes back is a block of unstructured text. Could be numbered, could be paragraphs, could be bullets. If your code expects a table, everything falls apart.
✅ Fixed:
python
good = """List the 8 planets in order from the Sun.
For each: name, type (rocky/gas/ice giant), moon count.
Format as a markdown table."""
result = ask_llm(good)
print(result)
Be explicit about the format you want. The model obeys instructions — but only the ones you actually give it.
Mistake 2: Cramming Too Many Tasks into One Prompt
❌ Wrong:
python
overloaded = """Analyze this review, extract the product name,
determine sentiment, identify issues, suggest a response,
and rate quality 1-10.
Review: 'The headphones sound great but the cushions wore
out after 3 months.'"""
Piling five tasks into one prompt drags down quality across the board. The model tries to juggle everything and ends up doing none of them well.
✅ Fixed: One task per prompt. Chain the results if needed.
python
# Step 1: Extract and classify
extract = """Extract from this review as JSON:
product_type, sentiment, issues (list).
Review: 'The headphones sound great but the cushions
wore out after 3 months.'"""
# Step 2 would use Step 1's output for the response
Mistake 3: Not Giving the Model an Escape Hatch
If the info isn’t in the provided text, the model will invent something. That’s hallucination — and it’s easy to prevent.
❌ Wrong:
python
no_escape = (
    "What is the CEO's favorite color? Context: "
    "'Acme Corp, founded in 2015, makes widgets.'"
)
✅ Fixed:
python
with_escape = """Answer ONLY from the context below.
If the answer isn't in the context, say "Not found."
Context: 'Acme Corp, founded in 2015, makes widgets.'
Question: What is the CEO's favorite color?"""
result = ask_llm(with_escape)
print(result)
Output:
Not found.
Always let the model say “I don’t know.” That one escape hatch cuts hallucination way down.
Mistake 4: Ignoring Token Limits and Position
Every model has a context window — GPT-4o and GPT-4o-mini cap out at 128K tokens. But longer prompts cost more, run slower, and make the model lose focus.
Research shows that models pay less attention to content buried in the middle of a long input. Material near the top and bottom gets stronger weight.
Key Insight: Put your most critical instructions at the top and bottom of long prompts. Those positions get the most attention from the model. Formatting rules belong up front, not buried in the middle.
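A small helper (hypothetical, just to make the placement advice concrete) that keeps critical rules at both ends of a long prompt:

```python
def assemble_prompt(critical_rules, middle_context, question):
    """Place critical instructions first AND last, where attention is strongest."""
    return "\n\n".join([
        critical_rules,
        middle_context,
        question,
        "Reminder of the rules:\n" + critical_rules,
    ])

prompt = assemble_prompt(
    "Return valid JSON only. Use null for missing fields.",
    "long document text goes here",
    "Extract name and age.",
)
print(prompt.startswith("Return valid JSON"))  # rules open the prompt
print("Reminder of the rules" in prompt)       # and close it
```

Repeating a one-line rule costs a handful of tokens; losing the formatting instruction in the middle of a 50K-token context costs a broken parse.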
Summary
Let’s wrap up. Everything in this post reduces to five core moves:
| Technique | When to Use | Key Benefit |
|---|---|---|
| Zero-shot | Simple, well-known tasks | No setup cost |
| Few-shot | Need a locked output format | Consistent results |
| Role-based | Need domain expertise or a certain voice | Deeper, targeted replies |
| Chain-of-thought | Multi-step reasoning | Checkable logic |
| Structured output | Output feeds into code | Schema-enforced data |
Start with zero-shot. If the output drifts, add few-shot samples. If the task needs reasoning, layer in CoT. If another program reads the output, enforce a schema. These methods layer on top of each other — and they’re strongest in combination.
Next up, we put these skills to work by building our first LLM-powered app with LangChain.
Frequently Asked Questions
Does prompt engineering transfer across different LLMs?
The core ideas — be clear, show examples, add structure — apply to every LLM out there. But the fine details shift. GPT-4o handles dense system prompts better than smaller models. Claude follows formatting rules very closely. Open-source options like Llama need more samples to lock in a pattern. Bottom line: always test on the model you plan to ship with.
How do I fix prompts that break on edge cases?
Quickest fix: turn the failing cases into new few-shot examples. If one pattern keeps tripping up, write a constraint that tackles it head-on. In production, validate every reply — confirm the JSON parses, key fields exist, and values fall in the expected range. Reject and re-run when any check fails.
python
# Example: Retry logic for structured output
import json
def safe_extract(prompt, max_retries=3):
    for attempt in range(max_retries):
        result = ask_llm(prompt)
        try:
            data = json.loads(result)
            if "name" in data and "age" in data:
                return data
        except json.JSONDecodeError:
            continue
    return None
What’s the difference between JSON mode and Structured Outputs?
JSON mode only promises valid JSON — it won’t enforce a schema. So the model might return {"answer": "yes"} one call and {"result": true} the next. Both are valid JSON, but your code expects one specific layout. Structured Outputs with Pydantic lock down both the format and the field definitions. Names and types match your model every time. For any real app, always choose Structured Outputs.
Will prompt engineering become obsolete as models get smarter?
Models are getting better at handling fuzzy input, yes. But structured prompts solve engineering problems, not just model limits. Schema enforcement, clear roles, and step-by-step traces bring a level of reliability that vague inputs will never match. Even the strongest 2026 models do better work when you hand them a well-built prompt.
References
- OpenAI — Prompt engineering guide. Link
- OpenAI — Structured Outputs documentation. Link
- Kojima, T., et al. — “Large Language Models are Zero-Shot Reasoners.” NeurIPS 2022. Link
- Wei, J., et al. — “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” NeurIPS 2022. Link
- Brown, T., et al. — “Language Models are Few-Shot Learners.” NeurIPS 2020. Link
- DAIR.AI — Prompt Engineering Guide. Link
- OpenAI — Chat Completions API reference. Link
- OpenAI — Best practices for prompt engineering. Link
- Anthropic — Prompt engineering documentation. Link
- Google — Gemini API prompting guide. Link