This example attempts to replicate the findings of a FiveThirtyEight article that examines gender bias in the movie business using the Bechdel test: a movie passes the test if (1) it has at least two named women in it, (2) who talk to each other, (3) about something besides a man. It is based on an excellent blog post by Brian Keegan, who strongly advocates for reproducibility in data journalism. The code is available under the MIT license.
The original experiment consists of two steps:
- Data Collection: the datasets used by this example are collected from the Web. Four datasets are needed: movie revenue data, inflation data, Bechdel test data, and data from IMDb.
- Data Analysis: the datasets collected in the first step are joined and analysed, resulting in a number of different plots.
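The join performed in the second step can be sketched as follows. This is a minimal illustration and not the code in bechdel.py: the field names (`imdb_id`, `year`, `passes`, `revenue`), the price-index values, the sample records, and both helper functions are hypothetical, and only the standard library is used.

```python
# Hypothetical sample records; the real datasets are collected in step 1.
bechdel = [
    {"imdb_id": "tt0000001", "year": 1990, "passes": True},
    {"imdb_id": "tt0000002", "year": 2000, "passes": False},
]
revenue = [
    {"imdb_id": "tt0000001", "revenue": 100.0},
    {"imdb_id": "tt0000002", "revenue": 300.0},
    {"imdb_id": "tt0000003", "revenue": 50.0},  # no Bechdel record, so the join drops it
]
# Hypothetical price index per year, used to express revenue in 2010 terms.
cpi = {1990: 50.0, 2000: 75.0, 2010: 100.0}


def join_on_id(left, right, key="imdb_id"):
    """Inner join of two lists of dicts on a shared key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def adjust_for_inflation(amount, year, base_year, index):
    """Rescale an amount from `year` prices to `base_year` prices."""
    return amount * index[base_year] / index[year]


movies = join_on_id(bechdel, revenue)
for movie in movies:
    movie["revenue_adj"] = adjust_for_inflation(
        movie["revenue"], movie["year"], 2010, cpi
    )
```

Keying every dataset on a shared movie identifier keeps the join independent of title spelling; whether the actual scripts do this is an assumption of the sketch.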
To run this experiment without ReproZip, you will first need to install the following requirements:
Then, run each script with Python, in the following order:
$ python fetch.py ## step 1
$ python bechdel.py ## step 2
Alternatively, you can run the data analysis step (step 2) directly using the data we provide in this repository.
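If you prefer to drive both steps from a single script, a small wrapper along these lines would work. It is only a sketch: the function names and the `skip_fetch` flag are hypothetical and not part of this repository, and it assumes it is run from the directory containing fetch.py and bechdel.py.

```python
import subprocess
import sys


def build_commands(skip_fetch=False):
    """Return the commands to run, in order; optionally skip step 1."""
    steps = [] if skip_fetch else [[sys.executable, "fetch.py"]]
    steps.append([sys.executable, "bechdel.py"])
    return steps


def run_pipeline(skip_fetch=False):
    """Run each step with the current interpreter, stopping at the first failure."""
    for command in build_commands(skip_fetch):
        subprocess.run(command, check=True)
```

Calling `run_pipeline(skip_fetch=True)` corresponds to running only step 2 on the data provided in this repository.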
The ReproZip package is available here (142 MB).
The steps of the experiment can be reproduced as follows:
$ reprounzip vagrant setup bechdel-full.rpz bechdel/
$ reprounzip vagrant run bechdel/ collectdata ## step 1
$ reprounzip vagrant run bechdel/ plotresults ## step 2
Next, you can retrieve all the plots produced by the analysis as follows:
$ reprounzip vagrant download bechdel/ --all
If you are using our demo VM image, you can run the following:
$ vagrant ssh
$ workon bechdel-test
$ cd reprozip-examples/bechdel-test/
$ python bechdel.py