Run Spark Cluster within Docker

This is an implementation of a Spark cluster on top of Hadoop (1 master node, 2 slave nodes) using Docker.

Follow these steps on Windows 10:

1. Clone the GitHub repo

# Step 1
git clone https://github.com/nghoanglong/spark-cluster-with-docker.git

# Step 2
cd spark-cluster-with-docker

2. Pull the Docker image

docker pull ghcr.io/nghoanglong/spark-cluster-with-docker/spark-cluster:1.0
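
After the pull completes, you can confirm the image is available locally; this uses only the standard Docker CLI and the image name from the command above.

# verify the image was pulled
docker images ghcr.io/nghoanglong/spark-cluster-with-docker/spark-cluster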

3. Start the cluster

docker-compose up
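
If you prefer to get your terminal back, the cluster can also run in the background. A minimal sketch using standard docker-compose commands:

# start all containers in detached mode
docker-compose up -d

# confirm the master node and both slave node containers are running
docker ps

# stop and remove the containers when you are done
docker-compose down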

4. Access the web UIs

  1. Hadoop cluster (NameNode UI): http://localhost:50070/
  2. Hadoop cluster - ResourceManager: http://localhost:8088/
  3. Spark cluster: http://localhost:8080/
  4. Jupyter Notebook: http://localhost:8888/
  5. Spark history server: http://localhost:18080/
  6. Spark job monitoring: http://localhost:4040/ (see the smoke test below)
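
To check that the cluster runs jobs end to end, one option is to submit the SparkPi example that ships with Spark from inside the master container: while the job runs it shows up at port 4040, and after it finishes it appears in the history server at port 18080. The container name and the examples jar path below are assumptions about this image, so check docker ps and the container's Spark installation for the real values.

# open a shell in the master container (container name is an assumption; see docker ps)
docker exec -it masternode bash

# inside the container: submit Spark's bundled SparkPi example
# ($SPARK_HOME and the jar path are assumptions about this image)
spark-submit --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/examples/jars/spark-examples_*.jar 100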
