This analysis is based on publicly available astronaut data from Wikidata. We investigated aspects such as the time humans have spent in space and the age distribution of astronauts.
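To illustrate the age-distribution part of the analysis, the sketch below computes each astronaut's age at a reference date and counts astronauts per age bin. It assumes each record carries a hypothetical `birthdate` field holding a Wikidata-style ISO timestamp. The field name, the reference date (here the retrieval date of the data set) and the 10-year bins are illustrative choices, not necessarily what the analysis script does.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Mapping, Optional


def age_in_years(birth: date, reference: date) -> int:
    """Return the number of completed years between birth and reference."""
    had_birthday = (reference.month, reference.day) >= (birth.month, birth.day)
    return reference.year - birth.year - (0 if had_birthday else 1)


def age_histogram(
    records: Iterable[Mapping[str, Optional[str]]],
    reference: date,
    bin_width: int = 10,
) -> Dict[int, int]:
    """Count records per age bin; 'birthdate' is a hypothetical field name."""
    counts: Counter = Counter()
    for record in records:
        raw = record.get("birthdate")
        if not raw:
            continue  # skip records without a birth date
        # Keep only YYYY-MM-DD, e.g. from "1975-12-31T00:00:00Z".
        birth = date.fromisoformat(raw[:10])
        age = age_in_years(birth, reference)
        counts[(age // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


# Made-up sample records, not taken from the data set.
sample = [
    {"birthdate": "1960-01-15"},
    {"birthdate": "1975-12-31T00:00:00Z"},
    {"birthdate": None},
]
print(age_histogram(sample, reference=date(2018, 10, 25)))  # {40: 1, 50: 1}
```

Each key is the lower bound of an age bin, so `{40: 1, 50: 1}` means one astronaut aged 40 to 49 and one aged 50 to 59 at the reference date.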
The repository is organized as follows:
- data: Contains the astronaut data set retrieved from Wikidata
- code: Contains the astronaut analysis script
- results: Contains the resulting analysis plots
The data set was generated with a SPARQL query [1] (retrieval date: 2018-10-25).
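The query [1] itself is not reproduced here. The sketch below shows one way to send a SPARQL query to the public Wikidata Query Service endpoint using only the Python standard library. The example query (people whose occupation, P106, is astronaut, Q11631) and the User-Agent string are illustrative assumptions; they are not the query used to build data/astronauts.json.

```python
import json
import urllib.parse
import urllib.request

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query only; this is NOT the query [1] used for the data set.
EXAMPLE_QUERY = """
SELECT ?astronaut ?astronautLabel WHERE {
  ?astronaut wdt:P106 wd:Q11631 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""


def build_query_url(query: str, endpoint: str = WDQS_ENDPOINT) -> str:
    """Encode a SPARQL query as a GET request that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{endpoint}?{params}"


def run_query(query: str) -> dict:
    """Send the query over the network and return the parsed JSON results."""
    request = urllib.request.Request(
        build_query_url(query),
        # The service asks clients to identify themselves; this string is hypothetical.
        headers={"User-Agent": "astronaut-analysis-example/0.1"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling `run_query(EXAMPLE_QUERY)` performs a network request and returns the standard SPARQL JSON results structure, with one entry per result row under `results` → `bindings`. This raw format may differ from the JSON file offered for download in the query service's web interface, so check that the analysis script can read it before placing it in the data directory.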
You can also analyze a more recent version of the astronaut data by replacing the data set and re-running the analysis script:
- Run the SPARQL query [1]
- Download the resulting data formatted as JSON
- Replace the file data/astronauts.json with the downloaded file
- Run the analysis script
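The replacement step can be made safer by validating the download before overwriting the existing file. The following is a minimal sketch, not part of the repository; it assumes the downloaded export is a non-empty JSON array of result rows, which is an assumption about the download format rather than something this README specifies.

```python
import json
import shutil
from pathlib import Path
from typing import Optional


def replace_data_set(downloaded: Path, target: Path) -> Optional[Path]:
    """Copy a downloaded JSON export over the target data file.

    The export is parsed first so that a truncated or malformed download is
    rejected before the existing data set is touched. Returns the path of the
    backup of the previous data file, or None if there was no previous file.
    """
    with downloaded.open(encoding="utf-8") as handle:
        records = json.load(handle)  # raises json.JSONDecodeError (a ValueError)
    # Assumption: the export is a non-empty JSON array of result rows.
    if not isinstance(records, list) or not records:
        raise ValueError(f"{downloaded} does not contain a non-empty JSON array")

    backup: Optional[Path] = None
    if target.exists():
        backup = target.with_name(target.name + ".bak")
        shutil.copy2(target, backup)  # keep the previous data set
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(downloaded, target)
    return backup
```

Run from the repository root, `replace_data_set(Path("query.json"), Path("data/astronauts.json"))` would install the new data and keep the old file as data/astronauts.json.bak; "query.json" is a hypothetical name for the downloaded file.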
The script requires Python 3.8 or later and uses the pandas (BSD 3-Clause License) and matplotlib (Matplotlib License) libraries.
The script has been successfully tested on Windows 10 and Linux with Python 3.8.
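Before running the analysis, an optional pre-flight check like the following can confirm that the interpreter and libraries mentioned above are available. It is a sketch, not part of the repository, and it only tests whether pandas and matplotlib can be found, not which versions are installed.

```python
import importlib.util
import sys
from typing import Iterable, List, Sequence, Tuple

MINIMUM_PYTHON: Tuple[int, int] = (3, 8)
REQUIRED_PACKAGES = ("pandas", "matplotlib")


def python_is_supported(version: Sequence[int] = sys.version_info) -> bool:
    """Return True if the (major, minor) version meets MINIMUM_PYTHON."""
    return tuple(version[:2]) >= MINIMUM_PYTHON


def missing_packages(names: Iterable[str] = REQUIRED_PACKAGES) -> List[str]:
    """Return the names of top-level packages that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `python_is_supported()` and `missing_packages()` with their defaults checks the running interpreter; a non-empty list from `missing_packages()` means the dependencies still need to be installed.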
Please clone this repository and install the required dependencies as follows:

```shell
git clone ...
cd astronaut-analysis/code
pip install -r requirements.txt
```

You can run the script as follows:

```shell
python astronauts-analysis.py
```

The script processes the astronaut data set and stores the resulting plots in the same directory. Existing result plots are overwritten.
The test.sh script performs basic checks that help maintain the analysis script:
- It installs the required packages.
- It runs the flake8 linter to find programming mistakes and code style issues.
- It runs the analysis script and checks that the expected plots are produced.
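The third check can be sketched in Python as follows. This is an illustration, not the contents of test.sh: it runs a script with the current interpreter from the script's own directory (where the plots are stored) and reports expected files that are missing or empty. The plot file names are not listed in this README, so they are passed in as a parameter.

```python
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List


def missing_outputs(directory: Path, expected: Iterable[str]) -> List[str]:
    """Return the expected file names that are absent or empty in directory."""
    missing = []
    for name in expected:
        path = directory / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


def run_and_check(script: Path, expected: Iterable[str]) -> List[str]:
    """Run a script in its own directory and report missing output files.

    check=True makes a crashing script raise CalledProcessError instead of
    being silently ignored.
    """
    script = script.resolve()
    subprocess.run([sys.executable, script.name], cwd=script.parent, check=True)
    return missing_outputs(script.parent, expected)
```

For example, `run_and_check(Path("astronauts-analysis.py"), names)` returns an empty list when every file in `names` was written; `names` must be filled in with the actual plot file names.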
The test.sh script runs as part of the GitLab build pipeline to detect errors introduced by new commits.
Please see the file LICENSE.md for further information about how the content is licensed.