This repository contains code and data to reproduce the findings featured in our story, "Amazon Is Rolling Back COVID Protocols in Its Warehouses. Workers Say It’s Premature."
Jupyter notebooks used for data collection, preprocessing, and analysis are in the `notebooks` folder.
Make sure you have Python 3.6+ installed. We used virtualenv to create a Python 3.8 virtual environment.
Then install the Python packages:
`pip install -r requirements.txt`
To run `0-download-oregon-health-data.ipynb`, you must install Tika, which is used to convert the PDF files to XML. If you use Homebrew, you can simply run `brew install tika`. This notebook has been tested on OS X; installation may vary depending on your operating system.
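
As a rough illustration of the PDF-to-XML step, the sketch below shells out to the `tika` command-line wrapper that Homebrew installs. It is not taken from the notebook: the helper names, the `data/oha-pdfs` folder, and the choice to call the CLI through `subprocess` are all assumptions made for this example, and the notebook may invoke Tika differently.

```python
import subprocess
from pathlib import Path


def xml_path_for(pdf_path):
    """Return the path of an XML file that sits next to the given PDF."""
    return Path(pdf_path).with_suffix(".xml")


def tika_command(pdf_path):
    """Build the Tika CLI call; --xml asks for XHTML output on stdout."""
    return ["tika", "--xml", str(pdf_path)]


def convert_pdf_to_xml(pdf_path):
    """Convert one PDF with Tika and write the result beside it."""
    result = subprocess.run(
        tika_command(pdf_path),
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout as text (works on Python 3.6)
        check=True,               # raise if Tika exits with an error
    )
    out_path = xml_path_for(pdf_path)
    out_path.write_text(result.stdout, encoding="utf-8")
    return out_path


if __name__ == "__main__":
    # Hypothetical folder of downloaded reports; adjust to your own layout.
    for pdf in sorted(Path("data/oha-pdfs").glob("*.pdf")):
        print(convert_pdf_to_xml(pdf))
```

Keeping the command construction in its own function makes it easy to check or change the Tika flags without running a conversion.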
These notebooks have already been run and do not need to be run sequentially. For certain notebooks, publicly available datasets must be downloaded and placed into the `data` folder before running.
This notebook downloads historical outbreak data reports from the Oregon Health Authority and converts them from PDF files to XML files.
This notebook takes the XML files output by `0-download-oregon-health-data.ipynb` and extracts workplace outbreak data from them. It then writes that data to `output/oha-data-{latest_report}.csv`, where `{latest_report}` is the date of the latest report release, formatted as `%Y-%m-%d`.
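
To make the file-naming rule concrete, here is a minimal sketch of how such a path could be built from a set of report dates. The function name `oha_output_path` and the sample dates are hypothetical; only the `output/oha-data-{latest_report}.csv` pattern and the `%Y-%m-%d` format come from the description above.

```python
from datetime import date


def oha_output_path(report_dates):
    """Build the CSV path from the most recent report date."""
    latest_report = max(report_dates)
    # The format spec after the colon is passed to strftime.
    return f"output/oha-data-{latest_report:%Y-%m-%d}.csv"


# Example with two report dates; the later one names the file.
print(oha_output_path([date(2021, 12, 8), date(2021, 12, 15)]))
# prints: output/oha-data-2021-12-15.csv
```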
This notebook generates the following filtered CSVs:
output/amazon-covid-reports-all.csv
output/cumulative-cases-data.csv
output/longest-outbreaks.csv
To run this notebook in its entirety, `ITA Data CY 2020 - Sept.csv` must be downloaded from OSHA and placed in the `data` folder.
This notebook filters OSHA complaint data down to complaints related to Amazon facilities. To run it in its entirety, `Closed_Federal_State_Plan_Valid_COVID-19_Complaints_Through_1029_2021.xlsx` must be downloaded from OSHA and placed in the `data` folder. The filtered complaints are exported as `../output/osha-amazon-closed-complaints.csv`.
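
The filtering idea can be sketched in a few lines of plain Python. This is an illustration only, not the notebook's code: the `Establishment Name` column, the helper names, and the sample rows are made up for the example, and the real spreadsheet may use different column names or matching rules.

```python
def is_amazon(name):
    """Case-insensitive check for the term "amazon" in an establishment name."""
    return "amazon" in (name or "").lower()


def filter_amazon(rows, name_field="Establishment Name"):
    """Keep only rows whose establishment name mentions Amazon."""
    return [row for row in rows if is_amazon(row.get(name_field))]


# Made-up rows standing in for the OSHA spreadsheet.
complaints = [
    {"Establishment Name": "AMAZON.COM SERVICES LLC", "State": "OR"},
    {"Establishment Name": "Example Grocery", "State": "WA"},
    {"Establishment Name": None, "State": "CA"},
]
amazon_only = filter_amazon(complaints)
print(len(amazon_only))  # prints: 1
```

A plain substring match can also catch unrelated businesses with "Amazon" in their name, so a real filter would likely need a manual review step.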
File | Description |
---|---|
`The Markup - Amazon COVID-19 complaints Fed+State as of Oct 31 2021.xlsx` | Federal and state OSHA complaints where the establishment name includes the term "Amazon" as of Oct. 31, 2021. This data is a response to a FOIA request made by The Markup. |
`data/The Markup - Amazon COVID-19 inspections Fed+State as of Oct 31 2021.xlsx` | Federal and state OSHA inspections of Amazon warehouses as of Oct. 31, 2021. This data is a response to a FOIA request made by The Markup. |
`output/osha-amazon-closed-complaints.csv` | A filtered list of publicly available closed federal and state OSHA complaint data, which is available on OSHA's website. |
`output/oha-data-2021-12-15.csv` | Workplace outbreak counts scraped from Oregon Health Authority outbreak reports dated from March 10, 2021 to Dec. 15, 2021. |
`output/amazon-covid-reports-all.csv` | A filtered version of `output/oha-data-2021-12-15.csv` that only includes Amazon warehouse-related data. |
`output/cumulative-cases-data.csv` | Cumulative COVID-19 counts over time for PDX7 and PDX9, based on weekly Oregon Health Authority outbreak reports. Dates are based on when the data is considered "finalized," which is specified in the introduction of each report. |
`output/longest-outbreaks.csv` | A top ten list of Oregon workplaces with the longest COVID-19 outbreaks based on Oregon Health Authority reports as of Dec. 15, 2021. The outbreak length is calculated as the time between when the outbreak investigation began and the date of the most recent onset. OHA considers an outbreak "resolved" if more than 28 days have passed without additional cases. |
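
The outbreak-length and "resolved" rules described for `output/longest-outbreaks.csv` come down to simple date arithmetic. The sketch below is one way to express them, assuming each outbreak has an investigation start date and a most recent onset date. The function names and the example dates are hypothetical and are not drawn from the OHA data.

```python
from datetime import date, timedelta

# More than 28 days without new cases counts as resolved.
RESOLVED_AFTER = timedelta(days=28)


def outbreak_length_days(investigation_start, most_recent_onset):
    """Days between the start of the investigation and the latest onset."""
    return (most_recent_onset - investigation_start).days


def is_resolved(most_recent_onset, as_of):
    """True if more than 28 days have passed since the latest onset."""
    return as_of - most_recent_onset > RESOLVED_AFTER


# Made-up example dates.
start = date(2021, 5, 1)
latest_onset = date(2021, 6, 10)
print(outbreak_length_days(start, latest_onset))   # prints: 40
print(is_resolved(latest_onset, date(2021, 7, 8)))  # prints: False (exactly 28 days)
print(is_resolved(latest_onset, date(2021, 7, 9)))  # prints: True (29 days)
```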