
Ollama Tutorial: Your Guide to Running LLMs Locally

Written by Selva Prabhakaran | 5 min read

Ollama is a tool for running open-weights large language models locally. It takes only minutes to install it, pull a model, and start prompting in your terminal / command prompt.

This tutorial should serve as a good reference for anything you wish to do with Ollama, so bookmark it and let’s get started.

What is Ollama?

Ollama is an open-source tool that simplifies running LLMs like Llama 3.2, Mistral, or Gemma locally on your computer. It supports macOS, Linux, and Windows and provides a command-line interface, API, and integration with tools like LangChain. Running models locally ensures privacy, reduces reliance on cloud services, and allows customization.

It runs on macOS, Linux, or Windows without trouble, but since you will be downloading LLMs and loading them into memory, a generous amount of RAM helps. So, here’s what will be good to have:

  • Hardware: At least 8GB RAM for smaller models (e.g., 7B parameters); 16GB+ recommended for larger models. A GPU (NVIDIA/AMD) is optional but improves performance.
  • Disk Space: Models range from 1-50GB depending on size.
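For a quick sanity check before downloading, you can use a back-of-the-envelope estimate. This is a rule of thumb, not an official Ollama formula: a 4-bit quantized model needs roughly half a byte per parameter, plus some overhead for the runtime and context.

```python
# Back-of-the-envelope RAM estimate for a quantized LLM.
# Assumption: ~4 bits (0.5 bytes) per parameter at Q4 quantization,
# plus ~20% overhead for the KV cache and runtime buffers.

def estimate_ram_gb(params_billions, bits_per_param=4.0, overhead=1.2):
    bytes_needed = params_billions * 1e9 * (bits_per_param / 8) * overhead
    return round(bytes_needed / 1e9, 1)

print(estimate_ram_gb(7))   # a 7B model fits comfortably in 8GB RAM
print(estimate_ram_gb(70))  # a 70B model wants far more than 32GB
```

This matches the hardware guidance above: around 4GB for a 7B model leaves headroom on an 8GB machine, while 70B-class models need workstation-grade memory.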

Step 1: Install Ollama

  1. Download Ollama:
    • Visit the official Ollama website
    • Click the “Download” button for your operating system (macOS, Linux, or Windows).
    • For macOS/Windows, download the installer. For Linux, use the provided script:
      bash
      curl -fsSL https://ollama.com/install.sh | sh
      
  2. Install Ollama:
    • macOS/Windows: Run the downloaded installer and follow the on-screen instructions.
    • Linux: The script above installs Ollama automatically. If you need a specific version, set the OLLAMA_VERSION environment variable (e.g., OLLAMA_VERSION=0.1.15).
    • Verify installation by opening a terminal and running:
      bash
      ollama

      This displays available commands (e.g., serve, run, list).
  3. Check GPU Support (Optional):
    Ollama typically auto-detects NVIDIA/AMD GPUs when drivers are installed. For CUDA on Linux, make sure the drivers are set up (run nvidia-smi to verify). CPU-only mode works but is slower for larger models.

Step 2: Download and Run a Model

  1. Explore Available Models:

    Visit the Ollama model library to view the list of available LLM models.

    Popular models include:

    • llama3.2 (small, general-purpose, ~2GB).
    • mistral (good for text generation, ~4GB).
    • phi3 (lightweight, ~2.2GB, good for low-spec machines).
    • llava (multimodal, supports text + images).
  2. Pull a Model:

    To download a model without running it, use the pull command. For example:
    bash
    ollama pull llama3.2

    This downloads the model to your local storage (e.g., ~/.ollama/models on macOS/Linux or C:\Users\<YourUsername>\.ollama\models on Windows).

  3. Run a Model:

    • Use the run command to download (if not already pulled) and interact with the model:
      bash
      ollama run llama3.2
    • This starts an interactive REPL (Read-Eval-Print Loop) where you can type prompts and get responses. For example:
      >>> What is the capital of France?
      The capital of France is Paris.
      >>> /bye

      Use /bye to exit the REPL.
  4. Run with a Single Prompt:

    To run a model with a one-off prompt without entering the REPL:
    bash
    ollama run llama3.2 "Explain the basics of machine learning."

Step 3: Manage Models

  1. List Installed Models:

    • To see all models downloaded on your system:
      bash
      ollama list

      Output example:
      NAME               ID              SIZE      MODIFIED
      llama3.2:latest    1234567890ab    2.1 GB    5 minutes ago
      mistral:latest     0987654321cd    4.1 GB    1 day ago
  2. Remove a Model:
    • To free up space, remove a model:
      bash
      ollama rm llama3.2
      
  3. Check Model Details:
    • To view metadata about a model:
      bash
      ollama show llama3.2

Step 4: Customize and Use Models

  1. Customize with a Modelfile:
    • Create a custom model by defining a Modelfile. For example, to create a model based on llama3.2 that behaves like Mario from Super Mario Bros:
      bash
      echo 'FROM llama3.2
      PARAMETER temperature 1
      SYSTEM "You are Mario from Super Mario Bros. Answer as Mario, the assistant, only."' > MarioModelfile
    • Create the model:
      bash
      ollama create mario -f MarioModelfile
    • Run it:
      bash
      ollama run mario

      Now the model responds as Mario:
      >>> What's your favorite activity?
      It's-a me, Mario! I love jumpin’ on Goombas and savin’ Princess Peach! Wahoo!
  2. Use the Ollama API:
    • Ollama runs a local server on `http://localhost:11434`. The desktop app starts it automatically; otherwise, start it with:
      bash
      ollama serve
    • Test the API with a curl command:
      bash
      curl http://localhost:11434/api/chat -d '{
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "What is 25 * 25?"}],
        "stream": false
      }'
    • Integrate with Python using the ollama package:
      bash
      pip install ollama

      Example script:
      python
      import ollama

      response = ollama.chat(
          model="llama3.2",
          messages=[{"role": "user", "content": "Explain Newton's second law"}]
      )
      print(response["message"]["content"])
  3. Automate Tasks:
    • Create a bash script to automate model runs and save outputs:
      bash
      #!/bin/bash
      ollama run llama3.2 "What are AI trends in 2025?" > ai_trends.txt
    • Make it executable:
      bash
      chmod +x script.sh
    • Run it:
      bash
      ./script.sh
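The API calls in step 2 above all return (or stream) the same message shape. The sketch below parses a non-streaming /api/chat reply and then shows the streaming consumption pattern. The payloads here are illustrative stand-ins so the code can be run without a live server; real responses also carry timing fields.

```python
import json

# --- Non-streaming: the whole reply arrives as one JSON object. ---
# Illustrative stand-in for an /api/chat response with "stream": false.
raw = json.dumps({
    "model": "llama3.2",
    "message": {"role": "assistant", "content": "25 * 25 = 625"},
    "done": True,
})
reply = json.loads(raw)["message"]["content"]
print(reply)

# --- Streaming: the reply arrives as many small chunks. ---
# With a live server, ollama.chat(..., stream=True) yields chunks shaped
# like the stand-ins below; here we fake the iterator to show the loop.
chunks = [
    {"message": {"content": "Force equals "}},
    {"message": {"content": "mass times acceleration."}},
]
streamed = ""
for chunk in chunks:
    piece = chunk["message"]["content"]
    print(piece, end="", flush=True)  # show tokens as they arrive
    streamed += piece
print()
```

To use the streaming loop for real, replace the `chunks` list with the iterator returned by `ollama.chat(model="llama3.2", messages=..., stream=True)`.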

Step 5: Optional – Use a Web Interface

For a more user-friendly experience, use Open WebUI with Ollama:

  1. Install Docker: Ensure Docker is installed (https://www.docker.com/).
  2. Run Open WebUI:
    bash
    docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway \
    -v open-webui:/app/backend/data --name open-webui \
    --restart always ghcr.io/open-webui/open-webui:main
  3. Access the UI:
    • Open a browser and go to `http://localhost:3000`.
    • Connect to your Ollama instance (default: `http://host.docker.internal:11434`).
    • Select and run models via a ChatGPT-like interface.

Tips and Troubleshooting

Use smaller models (e.g., phi3, gemma:2b) on low-spec machines, and make sure you have sufficient RAM and disk space. If you hit a “model not found” error, or ollama list doesn’t show a model you expect, check the storage
path or re-pull the model.

Some more common issues are below:

  • API Issues: Ensure ollama serve is running before using the API. If the server isn’t responding, restart Ollama or check `http://localhost:11434`.
  • GPU Not Detected: Verify GPU drivers are installed (e.g., nvidia-smi for NVIDIA). On macOS, GPU support is limited.
  • Storage Path: Customize the model storage path using the OLLAMA_MODELS environment variable (e.g., export OLLAMA_MODELS=/path/to/models on Linux/macOS).
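For instance, the storage-path override can be tried in a single session before you persist it. The directory below is just an illustration; pick a path that exists on your machine.

```shell
# Redirect model storage to a larger drive (example path).
# Set this before running `ollama serve` so the server picks it up;
# add the export to ~/.bashrc or ~/.zshrc to make it permanent.
export OLLAMA_MODELS=/mnt/bigdisk/ollama-models
echo "Models will be stored in: $OLLAMA_MODELS"
```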

Next Steps

  • Explore Models: Try multimodal models like llava for image + text tasks (e.g., ollama run llava and drag-and-drop an image in the terminal).
  • Fine-Tuning: Learn to fine-tune models for specific tasks (see tutorials like https://t.co/h7A03KFYty).
  • Integrations: Use Ollama with LangChain or Chainlit to build apps (e.g., chatbots).
  • Community Resources: Check the Ollama GitHub (https://github.com/ollama/ollama) or Reddit (r/ollama) for advanced guides.

Resources

  • Official Website: https://ollama.com/
  • Model Library: https://ollama.com/library
  • GitHub: https://github.com/ollama/ollama
  • Open WebUI Docs: https://docs.openwebui.com/
  • CLI Reference: Run ollama help in the terminal.