An introduction to data analysis with Pandas & Jupyter notebook

This workshop is aimed at Python users with no prior knowledge of Pandas. In this workshop, we will explore a small dataset and introduce you to the basics of data analysis workflows using the Pandas library and Jupyter Notebook. After the workshop, learners will know how to load data from a CSV file, do some basic exploratory data analysis and data cleaning, generate simple statistics, and create some basic data visualizations.
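To give a feel for that workflow, here is a minimal Pandas sketch of the load, explore, clean and summarize steps. It is not the workshop notebook itself: it reads a tiny in-memory CSV, and the column names (`patient_id`, `treatment`, `start_date`, `age`) are assumptions chosen for illustration, not the actual columns of the workshop's data file.

```python
import io

import pandas as pd

# Hypothetical stand-in data: these columns are assumptions for illustration,
# not the real columns of mock_treatment_starts_2016.csv.
CSV_TEXT = """patient_id,treatment,start_date,age
1,A,2016-01-04,54
2,B,2016-01-11,61
2,B,2016-01-11,61
3,A,2016-02-01,
4,C,2016-02-15,47
"""

# 1. Load the data from CSV (an in-memory string here instead of a file),
#    parsing the date column into datetime values.
df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["start_date"])

# 2. Basic exploration: number of rows/columns and the column types.
print(df.shape)
print(df.dtypes)

# 3. Basic cleaning: drop exact duplicate rows, then rows with no age recorded.
clean = df.drop_duplicates().dropna(subset=["age"])

# 4. Simple statistics: number of treatment starts and mean age per treatment.
summary = clean.groupby("treatment")["age"].agg(["count", "mean"])
print(summary)
```

Plotting is left out to keep the sketch light; with matplotlib installed, something like `summary["count"].plot(kind="bar")` would chart the counts per treatment.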

Materials by Sam Bail @spbail, based on a workshop by Alda Pontes.

Prerequisites for the workshop

You will need a working knowledge of Python to follow along with the workshop. If you are an absolute beginner and aren't familiar with Python syntax, this workshop might not be suited for you.

OPTION 1: Binder link to run remote notebook

Binder is a web-based service for running Jupyter notebooks in your browser. If your local setup does not work, or if you prefer not to install anything locally, you can use the Binder link to work in a notebook on Binder. Please note that Binder will delete your notebook instance after 12 hours, so download the notebook to your local machine at the end to keep your own copy!

Click this icon to launch the notebook: Binder

OPTION 2: Setup to run the notebook locally

Step 0: Download the materials

  • Clone this git repo to your machine. If you worked on Binder, move the notebook copy you downloaded from Binder into the repo directory
  • Or start over with the default version of the notebook in the repo

Step 1: Make sure you are running a recent version of Python

  • I'm using a Miniconda installation with Python 3.7
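A quick way to confirm which interpreter you are running is to ask Python itself. This is a minimal sketch; treating 3.7 as the minimum is an assumption based only on the version mentioned above, and `python_is_recent_enough` is a hypothetical helper name.

```python
import sys

# Assumption: 3.7 is used as the minimum only because it is the version
# mentioned in this README, not because it is a documented requirement.
MIN_VERSION = (3, 7)


def python_is_recent_enough(minimum=MIN_VERSION) -> bool:
    """Return True if the running interpreter's (major, minor) is at least `minimum`."""
    return sys.version_info[:2] >= minimum


print(sys.version)
if python_is_recent_enough():
    print("OK")
else:
    print(f"Please upgrade to Python {MIN_VERSION[0]}.{MIN_VERSION[1]} or newer")
```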

Step 2: Install the necessary libraries for the workshop

  • Install the necessary libraries by running pip install -r requirements.txt in the repo directory
  • Do this in a new virtual environment (e.g. a new conda environment) if necessary
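After installing, you can check from inside the activated environment that the packages can be imported. This sketch uses only the standard library; the package list is a hypothetical example (the authoritative list is `requirements.txt` in the repo), and note that a package's import name can differ from its pip name.

```python
import importlib.util

# Hypothetical list for illustration; requirements.txt in the repo is the
# real source of truth and may list more or different packages.
REQUIRED = ["pandas", "matplotlib", "notebook"]


def missing_packages(names=REQUIRED):
    """Return the import names in `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
print("All set!" if not missing else f"Missing: {', '.join(missing)}")
```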

Step 3: Make sure Jupyter Notebook runs

  • Open a terminal window in the directory where you downloaded the notebook and run: jupyter notebook
  • This should open a browser window; if it doesn't, go to http://localhost:8888/notebooks/
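If no browser window opens, one way to check whether anything is listening on Jupyter's default port is a plain TCP connection test, run from a separate Python session. This is a minimal standard-library sketch; `port_is_open` is a hypothetical helper name. Port 8888 is only the default: if it is already taken, Jupyter may choose another port, so the URL printed in the terminal is the one to use.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8888, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) if nothing listens.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("Jupyter may be running" if port_is_open() else "Nothing is listening on port 8888")
```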

Step 4: Download the data file

Download the mock_treatment_starts_2016.csv file from this repo. NOTE: The data is entirely made up and is in no way related to any real patient data.
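A common stumbling block is running the notebook from a different directory than the one holding the CSV. The sketch below fails early with a clear message in that case; `load_data` is a hypothetical helper, and it assumes the file sits in the current working directory.

```python
from pathlib import Path

import pandas as pd

# Assumption: the CSV is in the directory the notebook is started from.
DATA_FILE = Path("mock_treatment_starts_2016.csv")


def load_data(path: Path = DATA_FILE) -> pd.DataFrame:
    """Load the workshop CSV, raising a helpful error if the file is missing."""
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found; download it from the repo into this directory"
        )
    return pd.read_csv(path)
```

In a notebook cell, `df = load_data()` followed by `df.head()` shows the first few rows.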

About Sam

Hi, I'm Sam! I am a data professional with experience working with healthcare data and building data infrastructure tools. I draw on a large toolkit: various SQL flavors, Python, Pandas, Jupyter Notebook and R, statistical methods, data science and data visualization (Tableau, Superset...), as well as clinical terminologies and software engineering and automation tools; whatever gets the job done.

I completed a PhD in theoretical semantic web foundations at the School of Computer Science, The University of Manchester, UK. My thesis focused on exploring and exploiting the "justificatory structure" of OWL ontologies. While in the UK, I co-founded and led "Manchester Girl Geeks", a volunteer-based community organization that has been running STEM workshops for girls and women in the area since 2009.

https://www.twitter.com/spbail

https://www.linkedin.com/in/spbail/