This workshop is aimed at Python users with no prior knowledge of Pandas. In this workshop, we will explore a small dataset and introduce you to the basics of data analysis workflows using the Pandas library and Jupyter Notebook. After the workshop, learners will know how to load data from a CSV file, do some basic exploratory data analysis and data cleaning, generate simple statistics, and create some basic data visualizations.
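As a rough preview of the cleaning and statistics steps, here is a minimal sketch on a tiny, hypothetical DataFrame. The column names (`patient_id`, `treatment`, `age`), the values, and the helper `summarize_treatments` are made up for illustration and are not the schema of the workshop dataset.

```python
import pandas as pd

def summarize_treatments(df: pd.DataFrame) -> pd.DataFrame:
    """Clean a hypothetical treatment table and compute simple per-treatment statistics."""
    cleaned = (
        df.dropna(subset=["treatment"])  # data cleaning: drop rows with no treatment recorded
          .assign(treatment=lambda d: d["treatment"].str.strip().str.lower())  # normalize labels
    )
    # Simple statistics: number of starts and mean age for each treatment
    return cleaned.groupby("treatment").agg(
        starts=("patient_id", "count"),
        mean_age=("age", "mean"),
    )

# Hypothetical toy data (NOT taken from the workshop CSV)
df = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5],
    "treatment": ["Chemo", "chemo ", None, "Radiation", "radiation"],
    "age": [54, 61, 47, 70, 66],
})

print(df.describe())             # quick exploratory statistics for the numeric columns
print(summarize_treatments(df))  # one row per cleaned treatment label
```

In a notebook, `summarize_treatments(df)["starts"].plot(kind="bar")` would then draw a simple bar chart, assuming matplotlib is installed.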
Materials by Sam Bail @spbail, based on a workshop by Alda Pontes.
Prerequisites for the workshop
We expect a working knowledge of Python in order to follow along with the workshop. If you are an absolute beginner and aren't familiar with Python syntax, this workshop might not be suitable for you.
Binder is a web-based hub for running Jupyter notebooks. If your local setup does not work, or if you prefer not to install anything locally, you can work in a notebook on Binder instead. Please note that Binder will delete your notebook instance after 12 hours, so download the notebook to your local machine at the end if you want to keep your own copy!
Click this icon to launch the notebook:
- Clone this git repo to your machine and move the notebook copy you downloaded from Binder into the repo directory
- Or start over with the default version of the notebook in the repo
- I'm using a Miniconda installation with Python 3.7
- Install the necessary libraries by running
pip install -r requirements.txt
in the repo directory
- If needed, do this in a new virtual environment (e.g. a new conda environment)
- Open a terminal window in the directory where you downloaded the notebook and run:
jupyter notebook
- This should open a browser window; if it doesn't, go to http://localhost:8888/notebooks/
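If the notebook fails to start or an import fails, a quick check of the environment can help narrow things down. This is a minimal sketch: the helper `missing_packages` is hypothetical, and the package names it checks (`pandas`, `notebook`) are assumptions about what requirements.txt contains, so adjust them to match the file.

```python
import importlib.util
import sys

def missing_packages(packages=("pandas", "notebook")):
    """Return the names in `packages` that cannot be found in the current environment.

    The default package list is an assumption, not read from requirements.txt.
    """
    return [name for name in packages if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    print("Python", sys.version.split()[0])  # the author used Python 3.7
    missing = missing_packages()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Running this with the Python interpreter from the environment you installed into should report `Missing: none` if the install succeeded.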
Download the mock_treatment_starts_2016.csv file from this repo. NOTE: The data is entirely made up and is not related in any way to real patient data.
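A first look at the file might go something like the sketch below. It assumes only that the CSV sits in the current working directory; since the column names aren't documented here, the hypothetical helper `first_look` sticks to generic inspection (`shape`, `columns`, `isna`).

```python
import pandas as pd

def first_look(source="mock_treatment_starts_2016.csv"):
    """Load a CSV and return a few generic facts about it.

    `source` can be a file path or any file-like object accepted by pd.read_csv.
    """
    df = pd.read_csv(source)
    return {
        "n_rows": df.shape[0],
        "n_cols": df.shape[1],
        "columns": list(df.columns),
        # Missing values per column are a natural starting point for data cleaning
        "missing_per_column": df.isna().sum().to_dict(),
    }
```

In the notebook you would call `first_look()` once the CSV is in the same directory, then look at actual rows with `pd.read_csv("mock_treatment_starts_2016.csv").head()`.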
Hi, I'm Sam! I am a data professional with experience working with healthcare data and building data infrastructure tools. I draw on a large toolkit, from various SQL flavors, Python, Pandas, Jupyter Notebook and R to statistical methods, data science and data visualization (Tableau, Superset...), as well as clinical terminologies and software engineering and automation tools: whatever gets the job done.
I completed a PhD in theoretical semantic web foundations at the School of Computer Science, The University of Manchester, UK. My thesis focused on exploring and exploiting the "justificatory structure" of OWL ontologies. While in the UK, I co-founded and led "Manchester Girl Geeks", a volunteer-based community organization that has been running STEM workshops for girls and women in the area since 2009.