Gemini API Tutorial: Multimodal AI in Python
Build a multimodal document analyzer with the Google Gemini API in Python. Analyze images, PDFs, and text with structured JSON output — using raw HTTP requests.
You hand an AI model a scanned invoice, a product photo, and a paragraph of text. It reads all three and answers your question in valid JSON. No separate OCR step. No image-to-text pipeline. One API call handles it all.
That’s the Gemini API’s superpower — native multimodal input. In this article, you’ll build a document analyzer that uses it.
You’ll send text prompts, analyze images, and extract data from PDFs. You’ll configure safety filters, ground responses with live Google Search, and force JSON replies.
We’ll use raw HTTP requests to generativelanguage.googleapis.com. No SDK needed. Every code block runs in Pyodide or any standard Python environment.
What Is the Gemini API?
The Gemini API is Google’s way to access its Gemini family of large language models. What makes Gemini different from text-only models? It was trained on text, images, audio, and video from day one. Multimodal understanding isn’t an add-on. It’s part of the core design.
You talk to it through a single REST endpoint. Send a JSON payload with your content — text, base64 images, PDF data — and get JSON back.
https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent
You authenticate with an API key as a query parameter. No OAuth flows or service accounts needed for basic usage.
KEY INSIGHT: Unlike text-only APIs, Gemini's contents array can mix text and binary data in one request. Describe what you want in text, attach the file as base64, and the model reasons across both at once.
Setting Up: API Key and First Request
Before writing code, you need a Gemini API key. I recommend getting this done first — nothing worse than writing code and then waiting for key provisioning.
Prerequisites
- Python version: 3.9+
- Required libraries: None beyond the standard library (urllib.request and json)
- API key: Free at aistudio.google.com
- Time to complete: 25-30 minutes
Getting Your API Key
Go to Google AI Studio and click “Create API Key.” Copy it and store it safely. You’ll pass it as a query parameter in every request.
WARNING: Never hardcode API keys in scripts you share or commit. Use environment variables or a .env file. Here, we assign it to a variable at the top so you can swap it easily.
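If you want to follow that warning in this script, one minimal option (standard library only) is to read the key from an environment variable and fall back to a placeholder:

```python
import os

# Read the key from the environment instead of hardcoding it.
# Falls back to a placeholder so the script still imports; real
# requests will fail until you set GEMINI_API_KEY in your shell.
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY", "YOUR_API_KEY_HERE")
```

Set it before running, e.g. `export GEMINI_API_KEY="..."` in your terminal.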
Let’s make our first call. The code below builds a JSON payload with a text prompt. It sends an HTTP POST to the Gemini API and parses the response. We use urllib.request so it works without pip installs — even inside Pyodide.
import urllib.request
import json
import base64
# Replace with your actual API key
GEMINI_API_KEY = "YOUR_API_KEY_HERE"
MODEL = "gemini-2.5-flash"
BASE_URL = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}"
def gemini_request(endpoint, payload):
"""Send a POST request to the Gemini API and return parsed JSON."""
url = f"{BASE_URL}:{endpoint}?key={GEMINI_API_KEY}"
data = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(
url, data=data,
headers={"Content-Type": "application/json"}
)
with urllib.request.urlopen(req) as resp:
return json.loads(resp.read().decode("utf-8"))
# Simple text generation
payload = {
"contents": [{
"parts": [{"text": "What are the three main types of machine learning? One sentence each."}]
}]
}
result = gemini_request("generateContent", payload)
print(result["candidates"][0]["content"]["parts"][0]["text"])
The response has a candidates array. Each candidate has a content object with parts. For text, the first part’s text field holds the answer.
Notice the request body. The contents array holds conversation turns. Each turn has a parts array. A part can be text, an image, or a file. This structure powers everything we’ll build.
Here’s the full shape of a request — keep this mental map handy:
# Anatomy of a generateContent request (reference — not runnable)
request_shape = {
"contents": [{"role": "user", "parts": [...]}], # Conversation turns
"generationConfig": {"temperature": 0.7}, # Output controls
"safetySettings": [...], # Content filters
"tools": [...] # Search grounding, etc.
}
Four keys matter. contents holds the conversation. generationConfig controls temperature and token limits. safetySettings sets filter thresholds. tools enables search grounding. We’ll use all four.
TIP: Set temperature between 0.0 and 0.3 for factual extraction. Use 0.7-1.0 for creative tasks like brainstorming.
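As a sketch of how that advice maps onto the request body, here is a hypothetical build_payload helper (the function name is ours, not part of the API) that attaches a generationConfig:

```python
def build_payload(prompt, temperature=0.2, max_output_tokens=None):
    """Build a generateContent payload with output controls."""
    config = {"temperature": temperature}
    if max_output_tokens is not None:
        config["maxOutputTokens"] = max_output_tokens
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": config,
    }

# Low temperature for a factual extraction task
payload = build_payload("List every date in this text.", temperature=0.0)
```

You can pass the result straight to gemini_request("generateContent", payload).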
Multi-Turn Conversations with the Gemini API
So far we’ve sent single-turn requests. But what if you need the model to remember context? Maybe you upload a document, ask a question, then ask a follow-up.
The contents array supports multiple turns. Each turn has a role — either "user" or "model". Include the model’s previous response in the array. The model sees the full history and answers in context.
payload = {
"contents": [
{"role": "user", "parts": [{"text": "What is pandas in Python?"}]},
{"role": "model", "parts": [{"text": "Pandas is a data analysis library for Python."}]},
{"role": "user", "parts": [{"text": "What's its most important data structure?"}]}
]
}
result = gemini_request("generateContent", payload)
print(result["candidates"][0]["content"]["parts"][0]["text"])
The model builds on context. It knows “its” refers to pandas because it can see the full exchange. You can mix images into any turn too. Send an image in turn 1, then ask text questions in turns 2 and 3.
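One way to manage that history in your own code is a small append-only list. A sketch — add_turn is our helper name, not an API feature:

```python
history = []

def add_turn(role, text):
    """Append one conversation turn ('user' or 'model') to the history."""
    history.append({"role": role, "parts": [{"text": text}]})

add_turn("user", "What is pandas in Python?")
add_turn("model", "Pandas is a data analysis library for Python.")
add_turn("user", "What's its most important data structure?")

# The full history becomes the contents array of the next request
payload = {"contents": history}
```

After each API call, append the model's reply with add_turn("model", ...) so the next request carries the whole exchange.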
Handling Gemini API Errors
API calls fail. Rate limits, bad keys, server errors — you’ll hit all of them eventually. Here’s how to handle the common ones gracefully.
The Gemini API returns standard HTTP status codes. Common ones: 400 (bad request), 403 (invalid key), 429 (rate limit), and 500 (server error). Let’s wrap our request function with error handling using urllib.error.
from urllib.error import HTTPError, URLError
def safe_gemini_request(endpoint, payload):
"""Gemini API call with error handling."""
url = f"{BASE_URL}:{endpoint}?key={GEMINI_API_KEY}"
data = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(
url, data=data, headers={"Content-Type": "application/json"}
)
try:
with urllib.request.urlopen(req) as resp:
return json.loads(resp.read().decode("utf-8"))
except HTTPError as e:
body = e.read().decode("utf-8", errors="replace")
print(f"HTTP {e.code}: {body[:200]}")
return None
except URLError as e:
print(f"Connection error: {e.reason}")
return None
# Test with an intentionally bad key
result = safe_gemini_request("generateContent", payload)
if result:
print(result["candidates"][0]["content"]["parts"][0]["text"])
else:
print("Request failed — check the error above.")
TIP: For rate limit errors (429), wait and retry. Google’s free tier allows roughly 15 requests per minute. In production, add exponential backoff — wait 1 second, then 2, then 4.
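A minimal retry wrapper along those lines might look like this. with_backoff is our own sketch; it takes any zero-argument callable, so you can wrap gemini_request or safe_gemini_request in a lambda:

```python
import time
from urllib.error import HTTPError

def with_backoff(call, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Retry call() on HTTP 429, doubling the wait each attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except HTTPError as e:
            if e.code != 429 or attempt == max_retries - 1:
                raise  # not a rate limit, or out of retries
            sleep(base_delay * (2 ** attempt))  # waits 1s, 2s, 4s, ...
```

Usage would look like `with_backoff(lambda: gemini_request("generateContent", payload))`. The injectable `sleep` parameter also makes the wrapper easy to test without real delays.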
Analyzing Images with the Gemini API
Here’s where Gemini’s multimodal capability earns its name. Send an image with a text prompt, and the model reasons about both together. No separate vision API. No preprocessing.
How do you send an image? Encode it as base64 and add it as an inline_data part next to your text prompt. The mime_type tells Gemini what kind of file it is. Let’s download a sample image, encode it, and ask Gemini to describe it.
# Download a sample image for analysis
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/ARA_San_Juan_search_area.jpg/640px-ARA_San_Juan_search_area.jpg"
img_req = urllib.request.Request(image_url, headers={"User-Agent": "Mozilla/5.0"})
with urllib.request.urlopen(img_req) as resp:
image_bytes = resp.read()
image_base64 = base64.b64encode(image_bytes).decode("utf-8")
payload = {
"contents": [{
"parts": [
{"text": "Describe this image in 2-3 sentences."},
{"inline_data": {"mime_type": "image/jpeg", "data": image_base64}}
]
}]
}
result = gemini_request("generateContent", payload)
print(result["candidates"][0]["content"]["parts"][0]["text"])
Text and image sit side by side in the same parts array. Gemini sees them as one context. Ask “What color is the largest object?” or “How many people are visible?” and it answers from what it sees.
What about multiple images? Just add more inline_data parts. Both images go into the same array. Gemini handles them together.
# Compare two images in one request (using same image twice as demo)
payload = {
"contents": [{
"parts": [
{"text": "I'm sending two images. How do they differ?"},
{"inline_data": {"mime_type": "image/jpeg", "data": image_base64}},
{"inline_data": {"mime_type": "image/jpeg", "data": image_base64}}
]
}]
}
result = gemini_request("generateContent", payload)
print(result["candidates"][0]["content"]["parts"][0]["text"])
When I first tested this with invoice scans, the results surprised me. Gemini read handwritten totals, spotted logos, and matched line items — all in one call.
COMMON MISTAKE: Sending an image without the correct mime_type causes the API to reject the request or produce garbage. Always match it: image/jpeg for JPGs, image/png for PNGs, image/webp for WebP.
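If you're loading arbitrary files, the standard library's mimetypes module can guess the right value from the extension — a small sketch:

```python
import mimetypes

def guess_mime(filename):
    """Guess a MIME type from the file extension, with a safe fallback."""
    mime, _ = mimetypes.guess_type(filename)
    return mime or "application/octet-stream"
```

For example, guess_mime("scan.png") returns "image/png" and guess_mime("report.pdf") returns "application/pdf".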
Now you know how to analyze images. Let’s see if you can build on that.
Exercise 1: Build an Image Captioner
Write a function called caption_image that takes a base64-encoded image and returns a JSON object with two fields: caption (one-sentence description) and mood (emotional tone). Use gemini_request and structured output.
Analyzing PDFs with the Gemini API
PDFs are where a document analyzer earns its keep. Gemini accepts PDF data directly — no need to convert pages to images first. Encode the bytes as base64, set MIME type to application/pdf, and send it like an image.
The model reads text, sees charts, and gets page layouts. For a data team, this means you can pull tables from papers or summarize a 50-page spec — all in one API call.
Here’s how. We’ll download a public PDF, encode it, and ask Gemini to extract information. The inline_data approach is identical to images. Only the mime_type changes.
# Download a sample PDF
pdf_url = "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
pdf_req = urllib.request.Request(pdf_url, headers={"User-Agent": "Mozilla/5.0"})
with urllib.request.urlopen(pdf_req) as resp:
pdf_bytes = resp.read()
pdf_base64 = base64.b64encode(pdf_bytes).decode("utf-8")
payload = {
"contents": [{
"parts": [
{"text": "Read this PDF and list its contents in 2-3 bullet points."},
{"inline_data": {"mime_type": "application/pdf", "data": pdf_base64}}
]
}]
}
result = gemini_request("generateContent", payload)
print(result["candidates"][0]["content"]["parts"][0]["text"])
You can get very specific with your prompts. Try “Extract every date and dollar amount” or “List all section headings.” The model handles structured extraction well — especially with JSON mode (coming up next).
KEY INSIGHT: Gemini processes PDFs using vision — it “sees” each page as an image while also reading embedded text. Scanned documents work just as well as digital ones.
Gemini API Safety Settings
Every Gemini response passes through content safety filters. By default, the API blocks content it sees as harmful. Sometimes the filters are too strict for real work — like analyzing medical docs or security reports.
You control the threshold for each category. Here are the four categories:
| Category | What It Catches |
|---|---|
HARM_CATEGORY_HARASSMENT | Threats, bullying, targeted insults |
HARM_CATEGORY_HATE_SPEECH | Slurs, discrimination |
HARM_CATEGORY_SEXUALLY_EXPLICIT | Sexual content |
HARM_CATEGORY_DANGEROUS_CONTENT | Instructions for harm, weapons |
Each accepts a threshold: BLOCK_NONE, BLOCK_ONLY_HIGH, BLOCK_MEDIUM_AND_ABOVE, or BLOCK_LOW_AND_ABOVE. Default is BLOCK_MEDIUM_AND_ABOVE.
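Since all four categories usually get the same threshold, a small list comprehension saves repetition. A sketch (the helper name is ours):

```python
HARM_CATEGORIES = [
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
]

def safety_settings(threshold="BLOCK_MEDIUM_AND_ABOVE"):
    """Build a safetySettings list applying one threshold to every category."""
    return [{"category": c, "threshold": threshold} for c in HARM_CATEGORIES]
```

Then `"safetySettings": safety_settings("BLOCK_ONLY_HIGH")` replaces the four hand-written entries.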
Let’s lower the filter to block only the worst content. The safetySettings array goes at the top level, next to contents. This helps with security topics that would trip the default filters.
payload = {
"contents": [{
"parts": [{"text": "Explain common cybersecurity attack vectors and defenses."}]
}],
"safetySettings": [
{"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"},
{"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_ONLY_HIGH"},
{"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_ONLY_HIGH"},
{"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_ONLY_HIGH"}
]
}
result = gemini_request("generateContent", payload)
print(result["candidates"][0]["content"]["parts"][0]["text"][:500])
If a response gets blocked, the candidates array has a finishReason of "SAFETY" instead of "STOP". Always check for this in production.
WARNING: BLOCK_NONE removes all safety filtering. Only use it in controlled environments. Google may revoke API access if generated content violates their policies — even with filters disabled.
Gemini API Grounding with Google Search
Every LLM has a training data cutoff. Ask about yesterday’s stock price, and the model either refuses or makes something up. Sound familiar?
Grounding fixes this. Add a tools parameter, and Gemini searches Google in real time before answering. The response includes citations you can verify.
Why does this matter for document analysis? Say you’re reading a company’s annual report and want to check their revenue against the latest public figure. Grounding lets the model fetch live data for that comparison.
The setup is minimal. Add "tools": [{"google_search": {}}] to the request body. Gemini decides when to search based on your prompt.
payload = {
"contents": [{
"parts": [{"text": "What is Google's current stock price and market cap?"}]
}],
"tools": [{"google_search": {}}]
}
result = gemini_request("generateContent", payload)
text = result["candidates"][0]["content"]["parts"][0]["text"]
print(text[:500])
# Check for grounding metadata
if "groundingMetadata" in result["candidates"][0]:
sources = result["candidates"][0]["groundingMetadata"]
print("\n--- Grounding Sources ---")
print(json.dumps(sources, indent=2)[:500])
The response includes groundingMetadata — the search queries and web sources Gemini cited. Log these for transparency.
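To pull just the cited URLs out for logging, you can walk the metadata. Note that the field names below (groundingChunks, web, uri) reflect the current v1beta response shape and may change, so treat this as a sketch:

```python
def grounding_urls(result):
    """Extract cited source URLs from a grounded response, if any."""
    meta = result["candidates"][0].get("groundingMetadata", {})
    return [
        chunk["web"]["uri"]          # each web chunk carries a source URI
        for chunk in meta.get("groundingChunks", [])
        if "web" in chunk
    ]
```

The .get() calls with empty defaults mean the helper returns [] rather than raising when a response wasn't grounded.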
One thing to know: grounding adds latency. Each grounded request takes 2-5 extra seconds for the search step. For bulk processing, only turn it on when you need current data.
TIP: Combine grounding with multimodal input. Send a PDF of a quarterly report plus google_search, and ask Gemini to fact-check the numbers. It reads the PDF, searches the web, and gives you a comparison.
Gemini Structured Output: JSON Mode
Free-text responses are fine for chatbots. For a document analyzer, you want structured data — JSON you can feed into a database or dashboard.
Gemini handles this natively. Set responseMimeType to "application/json" in generationConfig. Add a schema if you want. The model returns valid JSON that matches your schema. No regex needed.
Here’s the real power: combine structured output with images or PDFs. Send an image and tell Gemini to fill a JSON template. The responseSchema defines field names and types. Gemini fills in the values from what it sees.
payload = {
"contents": [{
"parts": [
{"text": "Analyze this image and extract information."},
{"inline_data": {"mime_type": "image/jpeg", "data": image_base64}}
]
}],
"generationConfig": {
"responseMimeType": "application/json",
"responseSchema": {
"type": "object",
"properties": {
"description": {"type": "string"},
"objects_detected": {
"type": "array",
"items": {"type": "string"}
},
"dominant_colors": {
"type": "array",
"items": {"type": "string"}
},
"setting": {"type": "string"}
},
"required": ["description", "objects_detected"]
}
}
}
result = gemini_request("generateContent", payload)
data = json.loads(result["candidates"][0]["content"]["parts"][0]["text"])
print(json.dumps(data, indent=2))
The model returns a JSON string that fits your schema. Parse it with json.loads() to get a Python dict. Every field matches the type you asked for.
You can skip responseSchema and just describe the format in your prompt. But the schema is more reliable. I prefer it because the model can’t stray from your structure.
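Even with a schema, it's worth validating the parsed dict before it enters your pipeline. A defensive sketch (parse_structured is our own helper name):

```python
import json

def parse_structured(result, required=()):
    """Parse a JSON-mode response and verify required fields are present."""
    text = result["candidates"][0]["content"]["parts"][0]["text"]
    data = json.loads(text)
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"Response missing fields: {missing}")
    return data
```

Raising early on a missing field beats discovering a half-filled record in your database later.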
KEY INSIGHT: Structured output turns Gemini from a chatbot into a data extraction engine. With multimodal input, you build pipelines that take raw documents and output clean, typed records — no post-processing.
Exercise 2: Multi-Document Comparison
Write a function called compare_pdfs that takes two PDFs (base64 strings) and a comparison question. Send both in one request and return the model’s text analysis.
Building the Complete Document Analyzer
Let’s tie everything together. We’ll build a DocumentAnalyzer class that combines multimodal input, safety settings, optional grounding, and structured output.
The class wraps gemini_request with one method per document type. analyze_text handles plain text. analyze_image takes base64 image data. analyze_pdf takes base64 PDF data. Each method takes an optional JSON schema.
class DocumentAnalyzer:
"""Multimodal document analyzer powered by the Gemini API."""
def __init__(self, api_key, model="gemini-2.5-flash"):
self.api_key = api_key
self.model = model
self.base_url = (
f"https://generativelanguage.googleapis.com/v1beta/models/{model}"
)
self.safety_settings = [
{"category": c, "threshold": "BLOCK_MEDIUM_AND_ABOVE"}
for c in [
"HARM_CATEGORY_HARASSMENT",
"HARM_CATEGORY_HATE_SPEECH",
"HARM_CATEGORY_SEXUALLY_EXPLICIT",
"HARM_CATEGORY_DANGEROUS_CONTENT"
]
]
def _request(self, payload):
url = f"{self.base_url}:generateContent?key={self.api_key}"
data = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(
url, data=data,
headers={"Content-Type": "application/json"}
)
with urllib.request.urlopen(req) as resp:
return json.loads(resp.read().decode("utf-8"))
That’s the base — init and a private request method. Next come the payload builder and the three analysis methods. _build_payload puts together contents, safetySettings, and optional config.
def _build_payload(self, parts, schema=None, grounding=False):
payload = {
"contents": [{"parts": parts}],
"safetySettings": self.safety_settings
}
if schema:
payload["generationConfig"] = {
"responseMimeType": "application/json",
"responseSchema": schema
}
if grounding:
payload["tools"] = [{"google_search": {}}]
return payload
def analyze_text(self, prompt, schema=None, grounding=False):
parts = [{"text": prompt}]
result = self._request(self._build_payload(parts, schema, grounding))
return self._parse(result, schema)
def analyze_image(self, prompt, img_b64, mime="image/jpeg", schema=None):
parts = [
{"text": prompt},
{"inline_data": {"mime_type": mime, "data": img_b64}}
]
result = self._request(self._build_payload(parts, schema))
return self._parse(result, schema)
def analyze_pdf(self, prompt, pdf_b64, schema=None):
parts = [
{"text": prompt},
{"inline_data": {"mime_type": "application/pdf", "data": pdf_b64}}
]
result = self._request(self._build_payload(parts, schema))
return self._parse(result, schema)
def _parse(self, result, schema=None):
candidate = result["candidates"][0]
if candidate.get("finishReason") == "SAFETY":
return {"error": "Response blocked by safety filters"}
text = candidate["content"]["parts"][0]["text"]
return json.loads(text) if schema else text
Quick test to confirm it works:
analyzer = DocumentAnalyzer(GEMINI_API_KEY)
print(analyzer.analyze_text("What is the capital of France?"))
Now let’s use it for a real task. We’ll analyze the image we downloaded earlier with a structured schema. The schema tells Gemini what fields to return. analyze_image handles the rest.
image_schema = {
"type": "object",
"properties": {
"scene_description": {"type": "string"},
"key_elements": {"type": "array", "items": {"type": "string"}},
"use_case": {"type": "string"}
},
"required": ["scene_description", "key_elements"]
}
result = analyzer.analyze_image(
prompt="Analyze this image for a content management system.",
img_b64=image_base64,
schema=image_schema
)
print(json.dumps(result, indent=2))
That’s a working document analyzer in about 80 lines. It takes text, images, and PDFs. It gives back structured JSON. It runs anywhere Python runs — no SDK needed.
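To point the analyzer at your own files instead of downloaded samples, a small loader helps — a sketch using only the standard library:

```python
import base64

def load_file_b64(path):
    """Read a local file and return its contents as a base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")
```

Then something like `analyzer.analyze_pdf("Summarize this.", load_file_b64("report.pdf"))` works for any local document (the filename here is illustrative).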
Common Mistakes and How to Fix Them
Mistake 1: Forgetting to base64-encode binary data
❌ Wrong:
# Passing raw bytes — JSON can't serialize this
payload = {"contents": [{"parts": [
{"inline_data": {"mime_type": "image/png", "data": image_bytes}}
]}]}
# Result: TypeError or API error
Why it breaks: The data field needs a base64 string. Raw bytes can’t go into JSON.
✅ Correct:
image_base64 = base64.b64encode(image_bytes).decode("utf-8")
payload = {"contents": [{"parts": [
{"inline_data": {"mime_type": "image/png", "data": image_base64}}
]}]}
Mistake 2: Not checking the finish reason
❌ Wrong:
# Assumes the response always has text content
text = result["candidates"][0]["content"]["parts"][0]["text"]
Why it breaks: If safety filters block the response, content may be missing. Your code crashes with a KeyError.
✅ Correct:
candidate = result["candidates"][0]
if candidate.get("finishReason") == "SAFETY":
print("Response blocked by safety filters")
else:
print(candidate["content"]["parts"][0]["text"])
Mistake 3: Using v1 instead of v1beta
❌ Wrong:
# v1 endpoint — missing newer features
url = "https://generativelanguage.googleapis.com/v1/models/gemini-2.5-flash:generateContent"
Why it fails: Grounding, structured output schemas, and the latest models live only in v1beta.
✅ Correct:
url = "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent"
Gemini vs GPT-4 vs Claude: Multimodal API Comparison
How does Gemini stack up against other multimodal APIs? Here’s a quick comparison for document analysis tasks.
| Feature | Gemini 2.5 Flash | GPT-4o (OpenAI) | Claude 3.5 Sonnet |
|---|---|---|---|
| Text input | Yes | Yes | Yes |
| Image input | Yes (base64 + URI) | Yes (base64 + URL) | Yes (base64) |
| Native PDF input | Yes | No (convert to images) | Yes (beta) |
| Audio/video input | Yes | Audio only (GPT-4o) | No |
| JSON schema enforcement | Yes (responseSchema) | Yes (response_format) | Yes (tool use) |
| Search grounding | Built-in (google_search) | Requires plugins/tools | No built-in |
| Free tier | 500 req/day | No free tier | Limited free tier |
| Raw HTTP (no SDK) | Yes | Yes | Yes |
Gemini’s biggest edge: native PDF support and built-in search grounding. If your documents mix text and images, Gemini needs less prep work than the others.
GPT-4o has more third-party tools. Claude is great at long-context tasks. Pick based on what you need most.
When NOT to Use the Gemini API
Gemini is powerful, but it’s not always the right tool.
High-volume, low-latency processing. Need to process 10,000 images per minute? The API’s per-request latency (1-10 seconds) becomes a wall. Use dedicated models — YOLO for detection, Tesseract for OCR.
Need for exact same output every time. Responses vary between identical requests. If rules demand byte-identical results, use rule-based extraction or templates.
Data that can’t leave your network. Every request hits Google’s servers. For HIPAA or GDPR data, you need on-premise tools. Vertex AI offers data residency, but it’s a different (and pricier) product.
Simple documents with known layouts. If every invoice follows the same template, regex or PyMuPDF is faster, cheaper, and more reliable.
TIP: I use Gemini for the “messy middle” — documents too varied for rules but too few to justify a custom model. That’s where multimodal AI earns its cost.
Summary
You’ve built a multimodal document analyzer with the Gemini API’s raw HTTP endpoint. Here’s what you covered:
- Text generation — Sending prompts and parsing the candidates response.
- Multi-turn conversations — Using role fields to maintain context across turns.
- Error handling — Catching HTTP errors gracefully with status-specific responses.
- Image analysis — Base64-encoding images and mixing them with text in parts.
- PDF processing — Same as images, with mime_type: "application/pdf".
- Safety settings — Adjusting filter thresholds per harm category.
- Grounding — Adding tools: [{"google_search": {}}] for live web data.
- Structured output — Forcing JSON with responseMimeType and responseSchema.
- DocumentAnalyzer class — A reusable wrapper combining all capabilities.
Practice Exercise
Build a “Research Assistant” function. It takes a topic, uses grounding to search the web, and returns JSON with summary (2-3 sentences), key_facts (string array), and sources (URLs from grounding metadata).
Frequently Asked Questions
How much does the Gemini API cost?
Gemini 2.5 Flash has a free tier: 500 requests per day. Beyond that, it costs about $0.15 per million input tokens and $0.60 per million output tokens. One page image uses about 258 tokens. Check Google’s pricing page — rates change often.
Can Gemini process audio and video files?
Yes. Use the same inline_data approach with mime_type set to audio/mp3, audio/wav, video/mp4, or similar. For files over 20MB, upload through the File API first and reference the file URI instead.
What’s the maximum size for inline_data?
About 20MB of decoded data. For bigger files, use the File API. Upload the file, get a URI back, then use {"file_data": {"file_uri": "...", "mime_type": "..."}} in your request.
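A part that references an uploaded file has the shape described above — a sketch, with a placeholder URI and a rough size check (the 20MB figure is the approximate limit mentioned here):

```python
def file_part(file_uri, mime_type):
    """Build a file_data part referencing a file uploaded via the File API."""
    return {"file_data": {"file_uri": file_uri, "mime_type": mime_type}}

def inline_fits(data, limit_mb=20):
    """True if decoded bytes are small enough for inline_data (~20MB)."""
    return len(data) <= limit_mb * 1024 * 1024
```

Check inline_fits(file_bytes) first; if it fails, upload through the File API and send file_part(uri, mime) instead of inline_data.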
How does Gemini compare to GPT-4 Vision for documents?
Gemini has three edges: native PDF support, built-in Google Search grounding, and strict JSON schemas. GPT-4 Vision needs extra tools for grounding and can’t take PDFs directly.
Is the v1beta endpoint stable enough for production?
Google recommends v1beta for development and stands behind its stability. The v1 stable endpoint exists but lacks newer features. Check the Gemini docs for the latest.