
Install PySpark on Mac – A Step-by-Step Guide with Code Examples

This step-by-step guide will cover prerequisites, installation, and example code to help you get started with PySpark on macOS.

Written by Jagdeesh | 3 min read

Introduction

PySpark is the Python API for Apache Spark, a powerful open-source engine for large-scale data processing and analytics.

In this blog post, we will walk through the process of setting up the environment and installing PySpark on macOS.

This step-by-step guide will cover prerequisites, installation, and example code to help you get started with PySpark.

Prerequisites

Before we begin the installation, ensure you have the following prerequisites:

  1. A Mac operating system (macOS High Sierra or later).

  2. Homebrew (a package manager for macOS) installed. If not, follow the installation instructions here: https://brew.sh/

  3. Python 3.x installed. If not, you can download it from https://www.python.org/downloads/mac-osx/
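You can quickly confirm the last two prerequisites from a terminal; the checks below are guarded so they simply report a notice rather than failing if a tool is missing:

```shell
# Check whether Homebrew and Python 3 are available on this machine
command -v brew >/dev/null && brew --version | head -n 1 || echo "Homebrew not found"
command -v python3 >/dev/null && python3 --version || echo "Python 3 not found"
```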

1. Install Java Development Kit (JDK)

PySpark requires Java 8 or later to run. To install a recent JDK, open your terminal and execute the following command (Homebrew may print a caveat asking you to symlink the JDK so macOS tools can find it; follow that note if it appears):

shell
brew install openjdk

To check if the installation was successful, run the following command:

shell
java -version

2. Set JAVA_HOME environment variable

Set the JAVA_HOME environment variable in your shell profile (~/.zshrc on modern macOS, or ~/.bashrc / ~/.bash_profile if you use bash) by adding the following line:

shell
export JAVA_HOME=$(/usr/libexec/java_home)

Then, run the following command to apply the changes, substituting the profile you edited:

shell
source ~/.zshrc
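To confirm the variable took effect, echo it in a new terminal; on a typical setup it prints a JDK path (the exact path depends on your installation):

```shell
# Print JAVA_HOME; falls back to a notice if it is unset
echo "${JAVA_HOME:-JAVA_HOME is not set}"
```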

3. Install Apache Spark

First, we need to install Apache Spark using Homebrew. Open the Terminal and run the following command:

shell
brew install apache-spark

This command will install the latest version of Apache Spark on your macOS system.

4. Set Environment Variables

Next, we need to set the environment variables for PySpark. Add the following lines to your shell profile (e.g., ~/.zshrc, ~/.bashrc, or ~/.bash_profile):

shell
export SPARK_HOME="$(brew --prefix apache-spark)/libexec"
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=python3

Using $(brew --prefix apache-spark) avoids hard-coding the Spark version and works on both Intel (/usr/local) and Apple Silicon (/opt/homebrew) installs. Save the file and run the following command to apply the changes, substituting the profile you edited:

shell
source ~/.zshrc

5. Install PySpark Python Package

To use PySpark in your Python projects, you need to install the PySpark package. Run the following command to install PySpark using pip:

shell
pip install pyspark

Verify the Installation

To verify that PySpark is successfully installed and properly configured, run the following command in the Terminal:

shell
pyspark --version
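As an additional check, you can confirm the package is importable from Python; the version string will vary with your install, and the command below prints a notice instead of failing if it is not:

```shell
# Report the installed pyspark version, or a notice if the package isn't importable
python3 -c "import pyspark; print(pyspark.__version__)" 2>/dev/null || echo "pyspark is not importable"
```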

6. Example PySpark Code

Now that PySpark is installed, let’s run a simple example. Create a Python script called “wordcount.py” with the following content:

python
from pyspark.sql import SparkSession

# Initialize a Spark session
spark = SparkSession.builder \
    .appName("Word Count Example") \
    .getOrCreate()

# Create an RDD from a text file
text_file = spark.sparkContext.textFile("example.txt")

# Perform a word count
word_counts = text_file.flatMap(lambda line: line.split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)

# Print the word count results
for word, count in word_counts.collect():
    print(f"{word}: {count}")

# Stop the Spark session
spark.stop()

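If you want to sanity-check the logic without Spark, the flatMap/map/reduceByKey chain above is equivalent to a plain-Python aggregation. Here is a minimal sketch of the same computation on an in-memory list of lines (the sample lines are made up for illustration):

```python
from collections import Counter

# Lines standing in for the contents of example.txt
lines = ["hello spark", "hello world"]

# flatMap(split) -> map((word, 1)) -> reduceByKey(+), done in plain Python
words = [word for line in lines for word in line.split(" ")]
word_counts = Counter(words)

for word, count in word_counts.items():
    print(f"{word}: {count}")
# "hello" appears twice; "spark" and "world" once each
```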
Save the file and create a sample text file called “example.txt” in the same directory with some text. Run the script using the following command:

shell
spark-submit wordcount.py

You should see the word count results in the Terminal.

Conclusion

In this blog post, we’ve guided you through the process of installing PySpark on macOS and provided an example of PySpark code to get you started. Remember to consider the prerequisites and environment variables during the installation process.

With PySpark now installed, you’re ready to dive into large-scale data processing and analytics using Apache Spark on your macOS system.
