Getting Started
This document explains all of the technology and data requirements to begin replicating and contributing to the project.
The Safe Water Project is currently being developed in Python. Some project contributors have utilized MySQL to import data into pandas in their Jupyter notebooks, and historically we have had contributions done in R.
In addition to technology requirements, there are some data requirements. If you already have all the technology requirements and know how to fork a repo, you can skip down to the Data section.
Download the latest version of Python here. Alternatively, you may be interested in downloading Anaconda, a distribution of Python that comes with many data science tools pre-installed (with a bare-bones installation of Python, you need to install these yourself). You can read more about different Python distributions and their pros and cons here. If you need additional help getting started in Python, check out this guide.
Most of the Python coding is done in Jupyter notebooks, which are documents that execute code in small chunks and visualize the outputs, all in one window. Jupyter comes with Anaconda by default; if you are using another distribution of Python, install Jupyter from the command line with:
pip install jupyter
To open Jupyter after it is installed, run the following in the command line:
jupyter notebook
Alternatively, on Windows, you can open any .ipynb file with jupyter-notebook.exe, located in your Python installation's Scripts folder.
If you want to learn more about Jupyter Notebook, check out this tutorial.
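As a quick sanity check after installing, the following minimal sketch reports which packages cannot be imported in the current Python environment. The package list is an assumption for illustration only (`notebook` is the module installed alongside Jupyter, and `pandas` is mentioned above); adjust it to whatever the notebooks you want to run actually import.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found.

    find_spec() only locates the module; it does not import it.
    """
    return [name for name in names if find_spec(name) is None]


# Hypothetical list for illustration; not an official project requirement list.
REQUIRED = ["notebook", "pandas"]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

Run it with the same Python interpreter you use to launch Jupyter; an empty result means every listed module was found.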
Download Git here. If you're new to GitHub, after you create an account, go through GitHub's Hello World tutorial. After that, you will need to learn how to fork the safe-water repo; you can learn how to do that by following GitHub's Fork a repo tutorial.
MySQL Server is not a prerequisite for making contributions to this project, but having it installed makes it easier to run many of the existing contributions. You can download it here.
R is not a prerequisite for making contributions to this project, although some contributors have used it in the past, and volunteers who are more comfortable working in R are welcome to contribute in it. You can download R here, and RStudio (an IDE for R) here.
- On GitHub, navigate to the repository. In the top-right corner of the page, click Fork.
- Clone your fork. Navigate to the folder where you would like to place this project, then type:
git clone https://github.com/<YOUR-USERNAME>/safe-water.git
cd safe-water
- Add the safe-water repository as a remote to your fork:
git remote add upstream https://github.com/codeforboston/safe-water.git
- Check out the master branch:
git checkout master
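To confirm that the remote from the steps above was added correctly, you can inspect the output of `git remote -v`. The sketch below is one possible way to do that from Python; the helper names (`parse_remotes`, `upstream_is_configured`, `read_remotes`) are hypothetical and not part of the project.

```python
import subprocess

# URL taken from the setup steps above.
UPSTREAM_URL = "https://github.com/codeforboston/safe-water.git"


def parse_remotes(remote_v_output):
    """Parse the text printed by `git remote -v` into {name: fetch_url}."""
    remotes = {}
    for line in remote_v_output.splitlines():
        parts = line.split()
        # Each line looks like: "<name> <url> (fetch)" or "<name> <url> (push)".
        if len(parts) == 3 and parts[2] == "(fetch)":
            remotes[parts[0]] = parts[1]
    return remotes


def upstream_is_configured(remote_v_output, expected=UPSTREAM_URL):
    """Return True if a remote named `upstream` fetches from `expected`."""
    return parse_remotes(remote_v_output).get("upstream") == expected


def read_remotes(repo_dir="."):
    """Run `git remote -v` in `repo_dir` and return its output (requires Git)."""
    result = subprocess.run(
        ["git", "remote", "-v"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout
```

From inside your clone, `upstream_is_configured(read_remotes())` should return `True` once the remote has been added.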
The easiest way to install the Python dependencies is using Pipenv. Ensure that you have Pipenv installed, then, with the repo as your working directory, run:
pipenv install
To add a new Python dependency, run:
pipenv install antigravity # Replace `antigravity` with desired package name
Be sure to commit Pipfile and Pipfile.lock to the repo.
If you are working in R, install the following packages:
install.packages(c("tidyverse", "noncensus",
"ggplot2", "choroplethr",
"choroplethrMaps", "lubridate"))
Make sure the symlink in the R directory points to the data directory.
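One way to verify the symlink is to resolve it and compare the result with the data directory. The sketch below assumes nothing about the symlink's actual name, which is not stated here; the paths in the usage note are placeholders.

```python
from pathlib import Path


def symlink_points_to(link, target):
    """Return True if `link` is a symlink that resolves to the same place as `target`."""
    link = Path(link)
    target = Path(target)
    # is_symlink() is False for regular files, directories, and missing paths.
    return link.is_symlink() and link.resolve() == target.resolve()
```

For example, `symlink_points_to("R/data", "data")` checks a link named `data` inside the R directory; substitute the real link name used in the repository.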
In order to run most of the Jupyter notebooks in this project, you will need to download the SDWIS data and place it into the data/ folder. (We do not have enough space in our GitHub repository to store this data.) You can find this data pinned in our Slack channel #water in a file called SDWIS.zip. Extract these CSV files into data/sdwis.
Alternatively, you can run the scraper in code/python/scraper to obtain this information, although it takes some time to run.
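If you downloaded SDWIS.zip, the extraction step can also be scripted. The following is a minimal sketch using Python's standard `zipfile` module; the function name `extract_sdwis` is hypothetical, and the internal layout of the archive is an assumption (the function simply reports whichever CSV files it finds after extraction).

```python
import zipfile
from pathlib import Path


def extract_sdwis(zip_path, dest="data/sdwis"):
    """Extract `zip_path` into `dest` and return the CSV files found there.

    Paths in the returned list are relative to `dest`, sorted alphabetically.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # rglob also finds CSV files if the archive contains subfolders.
    return sorted(p.relative_to(dest).as_posix() for p in dest.rglob("*.csv"))
```

Called from the repository root as `extract_sdwis("SDWIS.zip")`, it returns the list of CSV files now under data/sdwis, which is a quick way to confirm the data is in place before opening a notebook.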