
Ollama Tutorial: Your Guide to Running LLMs Locally

Written by Selva Prabhakaran | 5 min read

Ollama is a tool for running open-weights large language models locally. It takes only minutes to install it, pull a model, and start prompting in your terminal / command prompt.

This tutorial should serve as a good reference for anything you wish to do with Ollama, so bookmark it and let’s get started.

What is Ollama?

Ollama is an open-source tool that simplifies running LLMs like Llama 3.2, Mistral, or Gemma locally on your computer. It supports macOS, Linux, and Windows and provides a command-line interface, API, and integration with tools like LangChain. Running models locally ensures privacy, reduces reliance on cloud services, and allows customization.

It runs on macOS, Linux, or Windows without trouble, but since you will be downloading LLMs and loading them into memory, a generous amount of RAM helps. So, here’s what will be good to have:

  • Hardware: At least 8GB RAM for smaller models (e.g., 7B parameters); 16GB+ recommended for larger models. A GPU (NVIDIA/AMD) is optional but improves performance.
  • Disk Space: Models range from 1-50GB depending on size.
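For a quick sanity check before downloading, you can use a back-of-the-envelope estimate. This is a rule of thumb, not an official Ollama formula: a 4-bit quantized model needs roughly half a byte per parameter, plus some overhead for the runtime and context.

```python
# Back-of-the-envelope RAM estimate for a quantized LLM.
# Assumption: ~4 bits (0.5 bytes) per parameter at Q4 quantization,
# plus ~20% overhead for the KV cache and runtime buffers.

def estimate_ram_gb(params_billions, bits_per_param=4.0, overhead=1.2):
    bytes_needed = params_billions * 1e9 * (bits_per_param / 8) * overhead
    return round(bytes_needed / 1e9, 1)

print(estimate_ram_gb(7))   # a 7B model fits comfortably in 8GB RAM
print(estimate_ram_gb(70))  # a 70B model wants far more than 32GB
```

This matches the hardware guidance above: around 4GB for a 7B model leaves headroom on an 8GB machine, while 70B-class models need workstation-grade memory.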

Step 1: Install Ollama

  1. Download Ollama:
    • Visit the official Ollama website
    • Click the “Download” button for your operating system (macOS, Linux, or Windows).
    • For macOS/Windows, download the installer. For Linux, use the provided script:
      bash
      curl -fsSL https://ollama.com/install.sh | sh
      
  2. Install Ollama:
    • macOS/Windows: Run the downloaded installer and follow the on-screen instructions.
    • Linux: The script above installs Ollama automatically. If you need a specific version, set the OLLAMA_VERSION environment variable (e.g., OLLAMA_VERSION=0.1.15).
    • Verify installation by opening a terminal and running:
      bash
      ollama

      This displays available commands (e.g., serve, run, list).
  3. Check GPU Support (Optional):
    Ollama typically auto-detects NVIDIA/AMD GPUs when drivers are installed. For CUDA on Linux, make sure the drivers are set up (run nvidia-smi to verify). CPU-only mode works but is slower for larger models.

Step 2: Download and Run a Model

  1. Explore Available Models:

    Visit the Ollama model library to view the list of available LLM models.

    Popular models include:

    • llama3.2 (small, general-purpose, ~2GB).
    • mistral (good for text generation, ~4GB).
    • phi3 (lightweight, ~2.2GB, good for low-spec machines).
    • llava (multimodal, supports text + images).
  2. Pull a Model:

    To download a model without running it, use the pull command. For example:
    bash
    ollama pull llama3.2

    This downloads the model to your local storage (e.g., ~/.ollama/models on macOS/Linux or C:\Users\<YourUsername>\.ollama\models on Windows).

  3. Run a Model:

    • Use the run command to download (if not already pulled) and interact with the model:
      bash
      ollama run llama3.2
    • This starts an interactive REPL (Read-Eval-Print Loop) where you can type prompts and get responses. For example:
      >>> What is the capital of France?
      The capital of France is Paris.
      >>> /bye

      Use /bye to exit the REPL.
  4. Run with a Single Prompt:

    To run a model with a one-off prompt without entering the REPL:
    bash
    ollama run llama3.2 "Explain the basics of machine learning."

Step 3: Manage Models

  1. List Installed Models:

    • To see all models downloaded on your system:
      bash
      ollama list

      Output example:
      NAME               ID              SIZE      MODIFIED
      llama3.2:latest    1234567890ab    2.1 GB    5 minutes ago
      mistral:latest     0987654321cd    4.1 GB    1 day ago
  2. Remove a Model:
    • To free up space, remove a model:
      bash
      ollama rm llama3.2
      
  3. Check Model Details:
    • To view metadata about a model:
      bash
      ollama show llama3.2

Step 4: Customize and Use Models

  1. Customize with a Modelfile:
    • Create a custom model by defining a Modelfile. For example, to create a model based on llama3.2 that behaves like Mario from Super Mario Bros:
      bash
      echo 'FROM llama3.2
      PARAMETER temperature 1
      SYSTEM "You are Mario from Super Mario Bros. Answer as Mario, the assistant, only."' > MarioModelfile
    • Create the model:
      bash
      ollama create mario -f MarioModelfile
    • Run it:
      bash
      ollama run mario

      Now the model responds as Mario:
      >>> What's your favorite activity?
      It's-a me, Mario! I love jumpin’ on Goombas and savin’ Princess Peach! Wahoo!
  2. Use the Ollama API:
    • Ollama runs a local server on `http://localhost:11434`. The desktop app starts it automatically; otherwise, start it with:
      bash
      ollama serve
    • Test the API with a curl command:
      bash
      curl http://localhost:11434/api/chat -d '{
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "What is 25 * 25?"}],
        "stream": false
      }'
    • Integrate with Python using the ollama package:
      bash
      pip install ollama

      Example script:
      python
      import ollama

      response = ollama.chat(
          model="llama3.2",
          messages=[{"role": "user", "content": "Explain Newton's second law"}]
      )
      print(response["message"]["content"])
  3. Automate Tasks:
    • Create a bash script to automate model runs and save outputs:
      bash
      #!/bin/bash
      ollama run llama3.2 "What are AI trends in 2025?" > ai_trends.txt
    • Make it executable:
      bash
      chmod +x script.sh
    • Run it:
      bash
      ./script.sh
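The API calls in step 2 above all return (or stream) the same message shape. The sketch below parses a non-streaming /api/chat reply and then shows the streaming consumption pattern. The payloads here are illustrative stand-ins so the code can be run without a live server; real responses also carry timing fields.

```python
import json

# --- Non-streaming: the whole reply arrives as one JSON object. ---
# Illustrative stand-in for an /api/chat response with "stream": false.
raw = json.dumps({
    "model": "llama3.2",
    "message": {"role": "assistant", "content": "25 * 25 = 625"},
    "done": True,
})
reply = json.loads(raw)["message"]["content"]
print(reply)

# --- Streaming: the reply arrives as many small chunks. ---
# With a live server, ollama.chat(..., stream=True) yields chunks shaped
# like the stand-ins below; here we fake the iterator to show the loop.
chunks = [
    {"message": {"content": "Force equals "}},
    {"message": {"content": "mass times acceleration."}},
]
streamed = ""
for chunk in chunks:
    piece = chunk["message"]["content"]
    print(piece, end="", flush=True)  # show tokens as they arrive
    streamed += piece
print()
```

To use the streaming loop for real, replace the `chunks` list with the iterator returned by `ollama.chat(model="llama3.2", messages=..., stream=True)`.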

Step 5: Optional – Use a Web Interface

For a more user-friendly experience, use Open WebUI with Ollama:

  1. Install Docker: Ensure Docker is installed (https://www.docker.com/).
  2. Run Open WebUI:
    bash
    docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway \
    -v open-webui:/app/backend/data --name open-webui \
    --restart always ghcr.io/open-webui/open-webui:main
  3. Access the UI:
    • Open a browser and go to `http://localhost:3000`.
    • Connect to your Ollama instance (default: `http://host.docker.internal:11434`).
    • Select and run models via a ChatGPT-like interface.

Tips and Troubleshooting

Use smaller models (e.g., phi3, gemma:2b) on low-spec machines, and make sure you have sufficient RAM and disk space. If you hit a “model not found” error, or ollama list doesn’t show a model you expect, check the storage
path or re-pull the model.

Some more common issues are below:

  • API Issues: Ensure ollama serve is running before using the API. If the server isn’t responding, restart Ollama or check `http://localhost:11434`.
  • GPU Not Detected: Verify GPU drivers are installed (e.g., nvidia-smi for NVIDIA). On macOS, GPU support is limited.
  • Storage Path: Customize the model storage path using the OLLAMA_MODELS environment variable (e.g., export OLLAMA_MODELS=/path/to/models on Linux/macOS).
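For instance, the storage-path override can be tried in a single session before you persist it. The directory below is just an illustration; pick a path that exists on your machine.

```shell
# Redirect model storage to a larger drive (example path).
# Set this before running `ollama serve` so the server picks it up;
# add the export to ~/.bashrc or ~/.zshrc to make it permanent.
export OLLAMA_MODELS=/mnt/bigdisk/ollama-models
echo "Models will be stored in: $OLLAMA_MODELS"
```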

Next Steps

  • Explore Models: Try multimodal models like llava for image + text tasks (e.g., ollama run llava and drag-and-drop an image in the terminal).
  • Fine-Tuning: Learn to fine-tune models for specific tasks (see tutorials like https://t.co/h7A03KFYty).
  • Integrations: Use Ollama with LangChain or Chainlit to build apps (e.g., chatbots).
  • Community Resources: Check the Ollama GitHub (https://github.com/ollama/ollama) or Reddit (r/ollama) for advanced guides.

Resources

  • Official Website: https://ollama.com/
  • Model Library: https://ollama.com/library
  • GitHub: https://github.com/ollama/ollama
  • Open WebUI Docs: https://docs.openwebui.com/
  • CLI Reference: Run ollama help in the terminal.