In the following tutorial I will use a four-node Raspberry Pi cluster as the example.
(And I'll use my preferred choice among the many similar options.)
i.e. the memory allocation settings are tuned for a Raspberry Pi 3 with 1 GB of RAM.
After setting up the environment, I'll deploy some popular distributed computing ecosystems on it, write quick-start scripts for them, and maybe add some example demos.
Usage in Detail !! (Manual)
Important!! First check the user settings in configure.yaml (for deeper settings, check out the User Settings part of fabfile.py).
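For orientation, here is a hypothetical sketch of the kind of user settings configure.yaml might contain; the actual keys in this repo may differ, so treat it only as a guide and check the real file:

```yaml
# Hypothetical example -- check the real configure.yaml for the actual keys
NUM_NODES: 4
HOSTNAMES:            # hostname assigned to each node
  - rpi-master
  - rpi-worker1
  - rpi-worker2
  - rpi-worker3
HOSTS:                # IP address of each node, in node-index order
  - 192.168.0.101
  - 192.168.0.102
  - 192.168.0.103
  - 192.168.0.104
USER: pi              # default Raspbian login
PASSWORD: raspberry   # change this with `fab change-passwd`
```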
# Install local dependencies
python3 -m pip install -r requirements.txt
fab update-and-upgrade # Make apt-get up to date (this can be done using the first login GUI of Raspbian Buster)
fab env-setup # Quick-install the basic utilities
fab set-hostname # Set the hostname for each node (requires a reboot)
fab hosts-config # Set each other's hostname-to-IP mapping on each Raspberry Pi (otherwise they can't find each other by hostname)
fab ssh-config # Generate an ssh key and deploy it to all nodes
fab change-passwd # Change the password for more security (remember to also change it in fabfile.py if you change pi's password)
fab expand-swap # Expand the swap (default 1024 MB; use --size=MEMSIZE to match your needs; the system default is 100 MB)
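As a side note on how the swap expansion works: Raspbian manages swap with dphys-swapfile, whose default CONF_SWAPSIZE is 100 MB. A minimal Fabric 2.x task doing the equivalent might look like this (a sketch, not the repo's actual implementation):

```python
# Sketch only: Raspbian manages swap through dphys-swapfile,
# whose default CONF_SWAPSIZE is 100 (MB).
from fabric import task

@task
def expand_swap(c, size=1024):
    c.sudo("dphys-swapfile swapoff")                     # disable the current swap file
    c.sudo(f"sed -i 's/^CONF_SWAPSIZE=.*/CONF_SWAPSIZE={size}/' "
           "/etc/dphys-swapfile")                        # set the new size in MB
    c.sudo("dphys-swapfile setup")                       # re-allocate the swap file
    c.sudo("dphys-swapfile swapon")                      # enable it again
```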
Regularly used functions (make sure you've generated an ssh key, or move your own ssh key to ./connection/id_rsa)
fab ssh-connect NODE_NUM # Connect to any node by its index without a password (use the -h flag to connect as the hadoop user)
fab uploadfile file_or_dir -s -p # Upload a file or folder to the remote nodes (use the -n=NODE_NUM flag for a specific node)
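Under the hood this is plain Fabric file transfer. A minimal sketch of uploading a single file to one node by index (host IPs and paths here are placeholders; Fabric's put() handles single files, so folders are typically archived first):

```python
# Hypothetical sketch of uploading one file to one node by index,
# using Fabric 2.x directly (the IPs below are placeholders).
from fabric import Connection

HOSTS = ["192.168.0.101", "192.168.0.102",
         "192.168.0.103", "192.168.0.104"]

def upload_to_node(node_num, local_path, remote_path="/home/pi/"):
    conn = Connection(host=HOSTS[node_num], user="pi",
                      connect_kwargs={"key_filename": "./connection/id_rsa"})
    conn.put(local_path, remote=remote_path)  # scp-like transfer over SSH
```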
If you changed the default hostname in fabfile.py or configure.yaml, make sure you also change the Hadoop configuration files in ./Files.
(If you're using a cloud server, make sure you've opened the ports that Hadoop needs.)
fab install-hadoop # A one-button setup of the Hadoop environment on all nodes!!!
fab update-hadoop-conf # Whenever you update a configuration file locally, push it to all nodes at once
(the hadoop user's key is stored in ./connection/hadoopSSH)
Utility functions
fab start-hadoop
fab restart-hadoop
fab stop-hadoop
fab status-hadoop # Monitor Hadoop behavior
fab example-hadoop # Once everything is done, you can play around with some official Hadoop examples
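For reference, the official examples ship with Hadoop itself. Assuming an install under /opt/hadoop (adjust to your layout), running the classic pi estimator from the master could look like this sketch:

```python
# Sketch: run the official "pi" example on the master node.
# The install path is an assumption; adjust it to your layout,
# and make sure PATH/JAVA_HOME are set for the hadoop user.
from fabric import Connection

master = Connection(host="rpi-master", user="hadoop",
                    connect_kwargs={"key_filename": "./connection/hadoopSSH"})
master.run(
    "hadoop jar /opt/hadoop/share/hadoop/mapreduce/"
    "hadoop-mapreduce-examples-*.jar pi 10 100"   # 10 maps, 100 samples each
)
```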
If you changed the default hostname in fabfile.py or configure.yaml, make sure you also change the Spark configuration files in ./Files.
fab install-spark
There are lots of utility functions, just like the ones I wrote for Hadoop. Check them out with fab --list.
The following will be installed under the hadoop user:
fab install-jupyter
fab install-docker
fab install-codeserver
Subject | Ecosystem | Purpose |
---|---|---|
MapReduce Practice | Hadoop | MapReduce practice with Hadoop Streaming |
Spark Practice | Spark | |
Inverted Index | | Focus on multiple inverted index strategies for search |
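To give a flavor of the MapReduce practice, here is a minimal Hadoop Streaming word count; the mapper and reducer are plain Python scripts that read stdin and write tab-separated key/value pairs to stdout:

```python
#!/usr/bin/env python3
# mapper.py -- minimal Hadoop Streaming word-count mapper:
# reads raw lines from stdin, emits "word<TAB>1" for each word.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- sums the counts for each word; Hadoop Streaming delivers
# lines sorted by key, so equal words arrive consecutively.
import sys

current_word, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t", 1)
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{count}")
        current_word, count = word, 0
    count += int(value)
if current_word is not None:
    print(f"{current_word}\t{count}")
```

These would be submitted with the hadoop-streaming jar, passing -mapper, -reducer, -input, and -output.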
A step-by-step record of how I built this system.
- Preparation
  - Hardware purchase
  - Software packages and dependencies (PC/Laptop)
    - Python > 3.6
    - Fabric 2.X
- Assemble the hardware
- Follow the steps in Quick Setup
  - Make sure to
    - (set up the locale)
    - update and upgrade
    - set up the environment
      - git
      - Java (JDK)
    - set up hostnames (for each node and between each other)
    - set up ssh keys
    - expand the swap (if using a Raspberry Pi 3 or a small-RAM Raspberry Pi 4)
- Setup Fabric (brief notes): execute shell commands remotely over SSH on all hosts at once! (see the sketch after this list)
  - I built some utility functions first and then moved on to setting up Hadoop
  - whenever some general-purpose operation is needed, I'll add it
- Setup Docker Swarm - TODO
- Setup Kubernetes - TODO
- Setup Distributed TensorFlow - TODO
  - on Hadoop
  - on Kubernetes
- Setup VSCode code-server - TODO
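As referenced above, the core idea of the Fabric setup fits in a few lines (the hostnames below are placeholders):

```python
# Minimal Fabric 2.x sketch: run one command on every node over SSH.
from fabric import SerialGroup

nodes = SerialGroup("rpi-master", "rpi-worker1", "rpi-worker2", "rpi-worker3",
                    user="pi")
results = nodes.run("uname -a", hide=True)    # executes on each host in turn
for connection, result in results.items():
    print(f"{connection.host}: {result.stdout.strip()}")
```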
Algorithm
Links
- Chameleon Cloud Training
- fffaraz/awesome-selfhosted-aws: A curated list of awesome self-hosted alternatives to Amazon Web Services (AWS)
Distributed Tensorflow
High Performance Computing (HPC)
Resource Manager
Intel has updated their DevCloud system; it's currently called oneAPI.
- Deal with the PySpark and Jupyter Notebook problem
- More friendly documentation
- Introduction to the Hadoop utility functions
- Dynamic configuration based on different hardware, maybe with a GUI, and saving multiple settings
  - Set hardware details, e.g. RAM size
  - Read and write *.xml (see the sketch at the end of this list)
- List some alternative notes
  - pdsh == fab CMD
  - ssh-copy-id == ssh-config
- Hive, HBase, Pig, ...
- Maybe a Git server
- 14+ Raspberry Pi Server Projects
- Change apt-get to apt?!
- MPI
- Dask
  - Deploy Dask Clusters — Dask documentation
  - Dask-MPI
  - Cluster managers: PBS, SLURM, LSF, SGE
  - Configuring a Distributed Dask Cluster
- Fabric alternative
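For the "read and write *.xml" item above, a possible sketch using only the standard library (the file path and property name here are illustrative):

```python
# Sketch: update one property in a Hadoop-style configuration file in place.
# Note: ElementTree drops XML comments and processing instructions on parse.
import xml.etree.ElementTree as ET

def set_hadoop_property(xml_path, name, value):
    tree = ET.parse(xml_path)
    root = tree.getroot()                       # the <configuration> element
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            prop.find("value").text = value     # overwrite the existing property
            break
    else:                                       # property not present: append it
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    tree.write(xml_path)

# e.g. set_hadoop_property("./Files/core-site.xml",
#                          "fs.defaultFS", "hdfs://rpi-master:9000")
```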