Getting Started

This document explains the technology and data requirements for replicating and contributing to the project.

The Safe Water Project is currently being developed in Python. Some contributors have used MySQL to import data into pandas in their Jupyter notebooks, and historically some contributions have been written in R.

In addition to technology requirements, there are some data requirements. If you already have all the technology requirements and know how to fork a repo, you can skip down to the Data section.

Technology

Python

Download the latest version of Python here. Alternatively, you may be interested in downloading Anaconda, which is a distribution of Python that comes pre-installed with tons of data science tools (with a bare-bones installation of Python, you need to install these yourself). You can read more about different Python distributions and their pros and cons here. If you need additional help getting started in Python, check out this guide.

Jupyter Notebook

Most of the Python coding is being done in Jupyter notebooks, which are documents that execute code in small chunks and visualize the outputs, all in one window. Jupyter comes with Anaconda by default; if you are using another distribution of Python, install Jupyter from the command line with:

pip install jupyter

To open Jupyter after it is installed, run the following in the command line:

jupyter notebook

Alternatively, on Windows, you can open any .ipynb file with jupyter-notebook.exe, found in your Python installation's Scripts folder.

If you want to learn more about Jupyter Notebook, check out this tutorial.

Git / GitHub

Download Git here. If you're new to GitHub, after you create an account, go through GitHub's Hello World tutorial. After that, you will need to learn how to fork the safe-water repo; you can learn how to do that by following GitHub's Fork a repo tutorial.

MySQL Server

MySQL Server is not a prerequisite for making contributions to this project, but it will be easier to run many of the contributions by others in this project if you have it installed. You can download it here.

R

R is not a prerequisite for making contributions to this project, although some contributors have used R in the past, and volunteers who are more comfortable working in R are welcome to contribute in it. You can download R here, and RStudio (an IDE for R) here.
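
To confirm which of the tools above are already installed, a short Python check along these lines may help. This is only a sketch: the executable names (`python`, `git`, `jupyter`, `mysql`, `Rscript`) are assumptions about a typical installation, and the helper names `find_tools` and `report` are hypothetical, not part of the project.

```python
import shutil

# Executable names are assumptions about a typical installation; adjust them
# if your system uses different names (for example "python3" or "R").
REQUIRED = ["python", "git", "jupyter"]
OPTIONAL = ["mysql", "Rscript"]


def find_tools(names, which=shutil.which):
    """Map each executable name to its path on PATH, or None if it is missing."""
    return {name: which(name) for name in names}


def report(required=REQUIRED, optional=OPTIONAL, which=shutil.which):
    """Return human-readable status lines, required tools first."""
    lines = []
    for label, names in (("required", required), ("optional", optional)):
        for name, path in find_tools(names, which).items():
            status = path if path else "NOT FOUND"
            lines.append(f"{name} ({label}): {status}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

A tool reported as NOT FOUND may still be installed but missing from your PATH, so treat the output as a hint rather than a definitive answer.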

Forking the Project

On GitHub, navigate to the repository. In the top-right corner of the page, click Fork.
Clone your fork. Navigate to the folder where you would like to place this project, then type:

git clone https://github.com/<YOUR-USERNAME>/safe-water.git
cd safe-water

Add the safe-water repository as a remote to your fork:

git remote add upstream https://github.com/codeforboston/safe-water.git

Check out the master branch:

git checkout master

Dependencies

Python Dependencies

The easiest way to install the Python dependencies is using Pipenv. Ensure that you have Pipenv installed, then, with the repo as your working directory, run:

pipenv install

To add a new Python dependency, run:

pipenv install antigravity  # Replace `antigravity` with desired package name

Be sure to commit Pipfile and Pipfile.lock to the repo.

R Dependencies

Install the following packages:

install.packages(c("tidyverse", "noncensus",
                   "ggplot2", "choroplethr",
                   "choroplethrMaps", "lubridate"))

Make sure the symlink in the R directory points to the data directory.
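
One way to verify the symlink is a small Python check like the sketch below. The exact location of the symlink is not stated here, so the paths in the usage note are placeholders, and `symlink_points_to` is a hypothetical helper name.

```python
from pathlib import Path


def symlink_points_to(link_path, expected_target):
    """Return True only if link_path is a symlink that resolves to expected_target."""
    link = Path(link_path)
    if not link.is_symlink():
        # A regular file or directory (or a missing path) is not a valid symlink.
        return False
    # Compare fully resolved paths so relative and absolute links are treated alike.
    return link.resolve() == Path(expected_target).resolve()
```

For example, from the repo root you could call `symlink_points_to("<path-to-R-directory>/<symlink-name>", "data")`, substituting the actual paths in your checkout.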

Data

To run most of the Jupyter notebooks in this project, you will need to download the SDWIS data and place it into the data/ folder. (We do not have enough space in our GitHub repository to store this data.) You can find this data pinned in our Slack channel #water in a file called SDWIS.zip. Extract these CSV files into data/sdwis.

Alternatively, you can run the scraper in code/python/scraper to obtain this information, although it takes some time to run.
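
If you downloaded SDWIS.zip, the extraction step can be sketched in Python as follows. This assumes the archive was saved to the repo root and that you want every CSV file placed directly in data/sdwis regardless of any folders inside the archive; the internal layout of the archive is not documented here, and `extract_sdwis` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path


def extract_sdwis(zip_path, dest_dir="data/sdwis"):
    """Extract every .csv member of the archive into dest_dir, flattening folders."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            # Skip directories and anything that is not a CSV file.
            if member.is_dir() or not member.filename.lower().endswith(".csv"):
                continue
            target = dest / Path(member.filename).name
            target.write_bytes(archive.read(member))
            extracted.append(target)
    return sorted(extracted)


if __name__ == "__main__":
    zip_file = Path("SDWIS.zip")  # assumed download location (repo root)
    if zip_file.exists():
        for csv_path in extract_sdwis(zip_file):
            print(csv_path)
    else:
        print(f"{zip_file} not found; download it from the Slack channel first.")
```

Because folders inside the archive are flattened, two CSV files with the same name in different folders would overwrite each other; unzip manually if that turns out to be the case.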
