Ollama Tutorial: Your Guide to Running LLMs Locally
Ollama is a tool for running open-weights large language models locally. It's quick to install, and within minutes you can pull a model and start prompting it from your terminal or command prompt.
This tutorial should serve as a good reference for anything you wish to do with Ollama, so bookmark it and let’s get started.
What is Ollama?
Ollama is an open-source tool that simplifies running LLMs like Llama 3.2, Mistral, or Gemma locally on your computer. It supports macOS, Linux, and Windows and provides a command-line interface, API, and integration with tools like LangChain. Running models locally ensures privacy, reduces reliance on cloud services, and allows customization.
It runs on macOS, Linux, or Windows, but since you will be downloading LLMs and loading them into memory, plenty of RAM helps. Here's what is good to have:
- Hardware: At least 8GB RAM for smaller models (e.g., 7B parameters); 16GB+ recommended for larger models. A GPU (NVIDIA/AMD) is optional but improves performance.
- Disk Space: Models range from 1-50GB depending on size.
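As a rough sanity check, you can estimate whether a model will fit in your RAM before pulling it. The figures below (about half a byte per parameter for a 4-bit quantized model, plus a 20% overhead factor) are rules of thumb, not official Ollama numbers:

```python
# Sketch: rough RAM estimate for a quantized model.
# bytes_per_param ~0.5 corresponds to common 4-bit quantization;
# the 1.2 overhead factor for context/runtime is an assumption.
def estimated_ram_gb(params_billions, bytes_per_param=0.5, overhead=1.2):
    """Return a rough RAM estimate in GB."""
    return params_billions * bytes_per_param * overhead

print(estimated_ram_gb(7))   # a 7B model at 4-bit fits in a few GB
print(estimated_ram_gb(70))  # a 70B model needs an order of magnitude more
```

By this estimate, a 7B model comfortably fits the 8GB minimum above, while 70B-class models need far more than 16GB.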
Step 1: Install Ollama
- Download Ollama:
- Visit the official Ollama website.
- Click the “Download” button for your operating system (macOS, Linux, or Windows).
- For macOS/Windows, download the installer. For Linux, use the provided script:

```bash
curl -fsSL https://ollama.com/install.sh | sh
```
- Install Ollama:
- macOS/Windows: Run the downloaded installer and follow the on-screen instructions.
- Linux: The script above installs Ollama automatically. If you need a specific version, set the `OLLAMA_VERSION` environment variable (e.g., `OLLAMA_VERSION=0.1.15`).
- Verify the installation by opening a terminal and running:

```bash
ollama
```

This displays the available commands (e.g., `serve`, `run`, `list`).
- Check GPU Support (Optional):
Ollama auto-detects NVIDIA/AMD GPUs if drivers are installed. For CUDA on Linux, ensure the drivers are set up (run `nvidia-smi` to verify). CPU-only mode works but is slower for larger models.
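If you want to script this check on an NVIDIA system, a small Python sketch (assuming `nvidia-smi` is the tool available, as on most CUDA setups) could look like:

```python
import shutil
import subprocess

def nvidia_gpu_available():
    """True if nvidia-smi is on PATH and exits cleanly (i.e., a driver is working)."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        return subprocess.run(["nvidia-smi"], capture_output=True).returncode == 0
    except OSError:
        return False

print(nvidia_gpu_available())
```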
Step 2: Download and Run a Model
- Explore Available Models:
Visit the Ollama model library to view the list of available models. Popular models include:
- `llama3.2` (small, general-purpose, ~2GB)
- `mistral` (good for text generation, ~4GB)
- `phi3` (lightweight, ~2.2GB, good for low-spec machines)
- `llava` (multimodal, supports text + images)
- Pull a Model:
To download a model without running it, use the `pull` command. For example:

```bash
ollama pull llama3.2
```

This downloads the model to your local storage (e.g., `~/.ollama/models` on macOS/Linux or `C:\Users\<YourUsername>\.ollama\models` on Windows).
- Run a Model:
Use the `run` command to download (if not already pulled) and interact with the model:

```bash
ollama run llama3.2
```

This starts an interactive REPL (Read-Eval-Print Loop) where you can type prompts and get responses. For example:

```
>>> What is the capital of France?
The capital of France is Paris.
>>> /bye
```

Use `/bye` to exit the REPL.
- Run with a Single Prompt:
To run a model with a one-off prompt without entering the REPL:

```bash
ollama run llama3.2 "Explain the basics of machine learning."
```
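A one-off prompt like this also maps onto Ollama's HTTP API (the `/api/generate` endpoint). As a sketch, the request body it corresponds to can be built like so:

```python
import json

def generate_payload(model, prompt, stream=False):
    """Build the JSON body for a one-off /api/generate request."""
    return json.dumps({"model": model, "prompt": prompt, "stream": stream})

body = generate_payload("llama3.2", "Explain the basics of machine learning.")
print(body)
# POST this to http://localhost:11434/api/generate while `ollama serve` is running.
```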
Step 3: Manage Models
- List Installed Models:
To see all models downloaded on your system:

```bash
ollama list
```

Output example:

```
NAME               ID              SIZE      MODIFIED
llama3.2:latest    1234567890ab    2.1 GB    5 minutes ago
mistral:latest     0987654321cd    4.1 GB    1 day ago
```
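If you want to use this listing in a script, a small parser sketch (assuming the NAME/ID/SIZE/MODIFIED column layout shown above) could be:

```python
def parse_ollama_list(text):
    """Parse `ollama list` output into dicts; assumes NAME/ID/SIZE/MODIFIED columns."""
    models = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        name, model_id, rest = line.split(maxsplit=2)
        size_value, size_unit, modified = rest.split(maxsplit=2)
        models.append({"name": name, "id": model_id,
                       "size": f"{size_value} {size_unit}", "modified": modified})
    return models

sample = """NAME               ID              SIZE      MODIFIED
llama3.2:latest    1234567890ab    2.1 GB    5 minutes ago
mistral:latest     0987654321cd    4.1 GB    1 day ago"""
print(parse_ollama_list(sample))
```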
- Remove a Model:
To free up space, remove a model:

```bash
ollama rm llama3.2
```
- Check Model Details:
To view metadata about a model:

```bash
ollama show llama3.2
```
Step 4: Customize and Use Models
- Customize with a Modelfile:
Create a custom model by defining a `Modelfile`. For example, to create a model based on `llama3.2` that behaves like Mario from Super Mario Bros:

```bash
echo 'FROM llama3.2
PARAMETER temperature 1
SYSTEM "You are Mario from Super Mario Bros. Answer as Mario, the assistant, only."' > MarioModelfile
```

Create the model:

```bash
ollama create mario -f MarioModelfile
```

Run it:

```bash
ollama run mario
```

Now the model responds as Mario:

```
>>> What's your favorite activity?
It's-a me, Mario! I love jumpin’ on Goombas and savin’ Princess Peach! Wahoo!
```
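The same Modelfile can also be written from Python, which is handy if you want to template several personas; the file content here just mirrors the echo example above:

```python
from pathlib import Path

# Write the Mario Modelfile programmatically instead of via echo.
modelfile = '''FROM llama3.2
PARAMETER temperature 1
SYSTEM "You are Mario from Super Mario Bros. Answer as Mario, the assistant, only."
'''
Path("MarioModelfile").write_text(modelfile)
print(Path("MarioModelfile").read_text().splitlines()[0])
```

Afterward, run `ollama create mario -f MarioModelfile` as before.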
- Use the Ollama API:
Ollama runs a local server on `http://localhost:11434`. Start it with:

```bash
ollama serve
```

Test the API with a `curl` command:

```bash
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [{"role": "user", "content": "What is 25 * 25?"}],
  "stream": false
}'
```

Integrate with Python using the `ollama` package:

```bash
pip install ollama
```

Example script:

```python
import ollama

response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Explain Newton's second law"}]
)
print(response["message"]["content"])
```
- Automate Tasks:
Create a bash script to automate model runs and save outputs:

```bash
#!/bin/bash
ollama run llama3.2 "What are AI trends in 2025?" > ai_trends.txt
```

Make it executable:

```bash
chmod +x script.sh
```

Run it:

```bash
./script.sh
```
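The same automation works from Python via `subprocess`; the model, prompt, and filename below are just the ones from the bash example:

```python
import subprocess

def build_command(model, prompt):
    """The argv equivalent of: ollama run <model> "<prompt>"."""
    return ["ollama", "run", model, prompt]

def run_prompt_to_file(model, prompt, outfile):
    """Run a one-off prompt and save the reply (requires ollama on PATH)."""
    with open(outfile, "w") as f:
        subprocess.run(build_command(model, prompt), stdout=f, check=True)

print(build_command("llama3.2", "What are AI trends in 2025?"))
```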
Step 5: Optional – Use a Web Interface
For a more user-friendly experience, use Open WebUI with Ollama:
- Install Docker: Ensure Docker is installed (https://www.docker.com/).
- Run Open WebUI:

```bash
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data --name open-webui \
  --restart always ghcr.io/open-webui/open-webui:main
```

- Access the UI:
- Open a browser and go to `http://localhost:3000`.
- Connect to your Ollama instance (default: `http://host.docker.internal:11434`).
- Select and run models via a ChatGPT-like interface.
Tips and Troubleshooting
Use smaller models (e.g., `phi3`, `gemma:2b`) on low-spec machines, and ensure you have sufficient RAM and disk space. If `ollama list` doesn’t show a model you expect, check the storage path or re-pull the model.
Some more common issues are below:
- API Issues: Ensure `ollama serve` is running before using the API. If the server isn’t responding, restart Ollama or check `http://localhost:11434`.
- GPU Not Detected: Verify GPU drivers are installed (e.g., `nvidia-smi` for NVIDIA). On macOS, GPU support is limited.
- Storage Path: Customize the model storage path using the `OLLAMA_MODELS` environment variable (e.g., `export OLLAMA_MODELS=/path/to/models` on Linux/macOS).
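If a script needs to know where models live, a small helper sketch (using the default path mentioned earlier and the `OLLAMA_MODELS` override) could be:

```python
import os
from pathlib import Path

def models_dir():
    """Ollama's model storage path: OLLAMA_MODELS if set, else ~/.ollama/models."""
    override = os.environ.get("OLLAMA_MODELS")
    return Path(override) if override else Path.home() / ".ollama" / "models"

print(models_dir())
```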
Next Steps
- Explore Models: Try multimodal models like `llava` for image + text tasks (e.g., `ollama run llava` and drag-and-drop an image in the terminal).
- Fine-Tuning: Learn to fine-tune models for specific tasks (see tutorials like https://t.co/h7A03KFYty).
- Integrations: Use Ollama with LangChain or Chainlit to build apps (e.g., chatbots).
- Community Resources: Check the Ollama GitHub (https://github.com/ollama/ollama) or Reddit (r/ollama) for advanced guides.
Resources
- Official Website: https://ollama.com/
- Model Library: https://ollama.com/library
- GitHub: https://github.com/ollama/ollama
- Open WebUI Docs: https://docs.openwebui.com/
- CLI Reference: Run `ollama help` in the terminal.