You can find the deployed project frontend at https://b.bridgestoprosperity.dev/
You can find the deployed data science API at http://bridges-to-presperity-08272020.eba-3nqy3zpc.us-east-1.elasticbeanstalk.com/
| Alex Kaiser | Jake Krafczyk | Ping Ao |
|---|---|---|
- Trello Board
- Web Backend
- Web Frontend
- Final Datasets in either CSV or XLSX
Our API provides various merged and integrated Bridges to Prosperity (B2P) bridge data endpoints, passing Rwandan bridge site data to the web backend/frontend application. The API is based on the FastAPI framework and hosted via AWS Elastic Beanstalk.
Detailed instructions on how to get started with FastAPI, Docker and AWS web deployment via Elastic Beanstalk can be found in this ds starter readme.
- AWS Elastic Beanstalk: Platform as a service, hosts your API.
- Docker: Containers, for reproducible environments.
- FastAPI: Web framework. Like Flask, but faster, with automatic interactive docs.
- Pandas: Open source data analysis and manipulation tool.
- Flake8: Linter, enforces PEP8 style guide.
- FuzzyWuzzy: Fuzzy string matching like a boss (a short sketch follows this list).
- Plotly: Visualization library, for Python & JavaScript.
- Pytest: Testing framework, runs your unit tests.
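FuzzyWuzzy's role is presumably in the data merging work, reconciling differently spelled place names across datasets. A minimal, hypothetical sketch of that kind of matching (the inputs are made up; the real merging lives in the notebooks):

```python
from fuzzywuzzy import process

# Hypothetical official village list to match a misspelled site name against
official_villages = ["Kagarama", "Ryabitana", "Karama", "Rugogwe", "Karehe"]

# extractOne returns the best match and its 0-100 similarity score
match, score = process.extractOne("Kagaramaa", official_villages)
print(match, score)  # -> Kagarama with a high score (exact value will vary)
```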
1. Create a new repository from this template.

2. Clone the repo:

```bash
git clone https://github.com/YOUR-GITHUB-USERNAME/YOUR-REPO-NAME.git
cd YOUR-REPO-NAME
```

3. Build the Docker image:

```bash
docker-compose build
```

4. Run the Docker image:

```bash
docker-compose up
```

5. Go to `localhost:8000` in your browser. There you'll see the API documentation as well as several distinct endpoints:
- `/raw`: initial GET endpoint returning the raw site assessment data as provided by B2P, for initial probing by the web backend.
- `/sites`: GET endpoint returning clean site assessment data.
- `/villages`: GET endpoint returning clean village and ID data as provided by the Government of Rwanda.
- `/final-data`: GET endpoint returning the merged data in the agreed-upon format:
```json
{
    "project_code": "1014328",
    "province": "Southern Province",
    "district": "Kamonyi",
    "sector": "Gacurabwenge",
    "cell": "Gihinga",
    "village": "Kagarama",
    "village_id": "28010101",
    "name": "Kagarama",
    "type": "Suspension",
    "stage": "Rejected",
    "sub_stage": "Technical",
    "Individuals_directly_served": 0,
    "span": 0,
    "lat": -1.984548,
    "long": 29.931428,
    "communities_served": "['Kagarama', 'Ryabitana', 'Karama', 'Rugogwe', 'Karehe']"
},
...
```
- `/final-data/extended`: GET endpoint similar to `/final-data`, but providing additional information on `district_id`, `sector_id`, `cell_id`, `form`, `case_safe_id`, `opportunity_id`, and `country`:
```json
{
    "project_code": "1014107",
    "province": "Western Province",
    "district": "Rusizi",
    "district_id": 36,
    "sector": "Giheke",
    "sector_id": "3605",
    "cell": "Gakomeye",
    "cell_id": "360502",
    "village": "Buzi",
    "village_id": "36050201",
    "name": "Buzi",
    "type": "Suspended",
    "stage": "Rejected",
    "sub_stage": "Technical",
    "Individuals_directly_served": 0,
    "span": 0,
    "lat": -2.42056,
    "long": 28.9662,
    "communities_served": "['Buzi', 'Kabuga', 'Kagarama', 'Gacyamo', 'Gasheke']",
    "form": "Project Assessment - 2018.10.29",
    "case_safe_id": "a1if1000002e51bAAA",
    "opportunity_id": "006f100000d1fk1",
    "country": "Rwanda"
},
```
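Either endpoint can be consumed directly over HTTP. A minimal sketch using `requests` against the deployed instance (assuming it is still live; the printed summary is just for illustration):

```python
import requests

BASE_URL = "http://bridges-to-presperity-08272020.eba-3nqy3zpc.us-east-1.elasticbeanstalk.com"

# Fetch the merged bridge site data in the agreed-upon format
response = requests.get(f"{BASE_URL}/final-data")
response.raise_for_status()
data = response.json()
print(f"{len(data)} bridge sites returned")
```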
Overall, the file structure should be intuitive and easy to follow:

- `data/` contains anything related to datasets or images.
- `notebooks/` is where any additional notebooks used for initial data exploration, data cleaning, and the extensive data merging procedures are stored.
- `project/requirements.txt` is where you add Python packages that your app requires. After editing it, run `docker-compose build` to rebuild your Docker image.
```
├── data
│   ├── edit
│   ├── final
│   ├── image
│   └── raw
├── notebooks
└── project
    ├── requirements.txt
    └── app
        ├── __init__.py
        ├── main.py
        ├── api
        │   ├── __init__.py
        │   ├── raw.py
        │   ├── sites.py
        │   ├── villages.py
        │   ├── final_data.py
        │   └── final_data_extended.py
        └── tests
            ├── __init__.py
            ├── test_main.py
            ├── test_predict.py
            └── test_viz.py
```
For the three endpoints serving unmerged data, `/raw`, `/sites`, and `/villages`, we used `pandas` to load the respective datasets found in `data/raw` and converted them into JSON objects using the standard `json` library.
An example of the simple endpoint setup is shown below (`./project/app/api/villages.py`):

```python
# Imports
from fastapi import APIRouter
import pandas as pd
import json

router = APIRouter()

# Load the cleaned village names and administrative codes once, at import time
names_codes = "https://raw.githubusercontent.com/Lambda-School-Labs/Labs25-Bridges_to_Prosperity-TeamB-ds/main/data/edit/Rwanda_Administrative_Levels_and_Codes_Province_through_Village_clean_2020-08-25.csv"
names_codes = pd.read_csv(names_codes)


# /villages endpoint
@router.get("/villages")
async def villages():
    # Convert the dataframe to a list of records and return it as JSON
    output = names_codes.to_json(orient="records")
    parsed = json.loads(output)
    return parsed
```
The two deployed production endpoints, `/final-data` and `/final-data/extended`, follow a slightly different approach, as the returned JSON data had to match a specific structure in order to integrate with the web backend application. The CSV dataset was loaded with the `requests` library:
```python
import csv
import io
import requests

# Fetch the merged CSV and wrap the text in a file-like buffer for csv.DictReader
request = requests.get(
    "https://raw.githubusercontent.com/Lambda-School-Labs/Labs25-Bridges_to_Prosperity-TeamB-ds/main/data/edit/B2P_Rwanda_Sites%2BIDs_full_2020-09-21.csv"
)
buff = io.StringIO(request.text)
directread = csv.DictReader(buff)
```
The data dictionaries were then assembled by looping over `directread`:
```python
# Collect one output dictionary per bridge site, keyed by project code
output = {}

# Loop over rows and return according to desired format
for row in directread:
    # Split "communities_served" into a list of strings on every iteration
    if len(row["communities_served"]) == 0:
        communities_served = ["unavailable"]
    else:
        communities_served = list(row["communities_served"].split(", "))

    # Set key for dictionary
    key = row["project_code"]

    # Set output format
    output[key] = {
        "project_code": row["project_code"],
        "province": row["province"],
        "district": row["district"],
        "sector": row["sector"],
        "cell": row["cell"],
        "village": row["village"],
        "village_id": row["village_id"],
        "name": row["name"],
        "type": row["type"],
        "stage": row["stage"],
        "sub_stage": row["sub_stage"],
        "Individuals_directly_served": int(row["Individuals_directly_served"]),
        "span": int(row["span"]),
        "lat": float(row["lat"]),
        "long": float(row["long"]),
        "communities_served": communities_served,
    }
```
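The loop populates `output`, which the route then returns. For orientation, here is a minimal sketch of the surrounding route; the per-request fetching and the exact return shape are assumptions, not a copy of the production file:

```python
from fastapi import APIRouter
import csv
import io
import requests

router = APIRouter()

DATA_URL = (
    "https://raw.githubusercontent.com/Lambda-School-Labs/"
    "Labs25-Bridges_to_Prosperity-TeamB-ds/main/data/edit/"
    "B2P_Rwanda_Sites%2BIDs_full_2020-09-21.csv"
)


@router.get("/final-data")
async def final_data():
    # Fetch and parse the merged CSV on each request (assumed behavior)
    request = requests.get(DATA_URL)
    directread = csv.DictReader(io.StringIO(request.text))

    output = {}
    for row in directread:
        # ...same row-to-dictionary body as the loop shown above...
        output[row["project_code"]] = {"project_code": row["project_code"]}

    return output
```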
`app/main.py` is where you edit your app's title and description, which are displayed at the top of your automatically generated documentation. This file also configures Cross-Origin Resource Sharing (CORS), which you shouldn't need to edit.
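A minimal sketch of that file, assuming the routers from the tree above; the title and description strings are placeholders, not the deployed values:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

from app.api import final_data, final_data_extended, raw, sites, villages

app = FastAPI(
    title="Bridges to Prosperity API",  # placeholder title
    description="Merged B2P bridge site data for Rwanda",  # placeholder description
)

# Register each endpoint router with the app
app.include_router(raw.router)
app.include_router(sites.router)
app.include_router(villages.router)
app.include_router(final_data.router)
app.include_router(final_data_extended.router)

# Cross-Origin Resource Sharing: allow the web frontend to call the API
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)
```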
`app/api/predict.py` defines the machine learning endpoint. `/predict` accepts POST requests and responds with random predictions. In a notebook, train your model and pickle it. Then, in this source code file, unpickle your model and edit the `predict` function to return real predictions.

When your API receives a POST request, FastAPI automatically parses and validates the request body JSON using the `Item` class attributes and functions. Edit this class so it's consistent with the column names and types from your training dataframe.
- FastAPI docs - Request Body
- FastAPI docs - Field additional arguments
- calmcode.io video - FastAPI - Json
- calmcode.io video - FastAPI - Type Validation
- pydantic docs - Validators
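A minimal sketch of such an `Item` class; the field names here are hypothetical stand-ins for your training dataframe's columns:

```python
from pydantic import BaseModel, Field, validator


class Item(BaseModel):
    """Parses and validates the request body of a POST to /predict."""

    # Hypothetical fields: replace with your dataframe's column names and types
    span: int = Field(..., example=85)
    lat: float = Field(..., example=-1.98)
    long: float = Field(..., example=29.93)

    @validator("span")
    def span_must_be_non_negative(cls, value):
        """Reject negative bridge spans."""
        assert value >= 0, f"span == {value}, must be >= 0"
        return value
```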
Web deployment of the API was done analogously to the procedure described in the ds starter readme.
We used Docker to build the image locally, test it, and then push it to Docker Hub:
```bash
docker build -f project/Dockerfile -t YOUR-DOCKER-HUB-ID/YOUR-IMAGE-NAME ./project
docker login
docker push YOUR-DOCKER-HUB-ID/YOUR-IMAGE-NAME
```
Then we used the EB CLI:

```bash
git add --all
git commit -m "Your commit message"
eb init -p docker YOUR-APP-NAME --region us-east-1
eb create YOUR-APP-NAME
eb open
```
To redeploy:

```bash
git commit ...
docker build ...
docker push ...
eb deploy
eb open
```
- API test interface: http://bridges-to-presperity-08272020.eba-3nqy3zpc.us-east-1.elasticbeanstalk.com/
- Data output in desired format: http://bridges-to-presperity-08272020.eba-3nqy3zpc.us-east-1.elasticbeanstalk.com/final-data
- Data output with some extended information: http://bridges-to-presperity-08272020.eba-3nqy3zpc.us-east-1.elasticbeanstalk.com/final-data/extended
We used FastAPI's built-in `TestClient` to test the endpoints.
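A minimal sketch of such a test, assuming the app module layout from the tree above; the assertions in the real suite may differ:

```python
from fastapi.testclient import TestClient

from app.main import app

client = TestClient(app)


def test_villages_returns_records():
    """/villages should respond 200 with a JSON list of records."""
    response = client.get("/villages")
    assert response.status_code == 200
    assert isinstance(response.json(), list)
```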