add book refs, bit more info
gbekes authored Sep 13, 2022
1 parent 132ff9c commit c785d0b
Showing 1 changed file with 12 additions and 12 deletions.
24 changes: 12 additions & 12 deletions README.md
@@ -11,38 +11,38 @@ To get a copy: [Inspection copy for instructors](https://www.cambridge.org/highe

## Acknowledgments

We'd like to thank [Ágoston Reguly](https://github.com/regulyagoston), who created the template for the initial coding supplement in R to the Data Analysis handbook. We followed his steps in writing the Python version of the teaching material.
We'd like to thank [Ágoston Reguly](https://github.com/regulyagoston), who created the template for the Coding for Data Analysis series. We followed his steps in writing the Python version of the teaching material.


## Status

This is version 0.1, as of August 29, 2022.
This is version 0.1, as of 13 September, 2022.

Comments are really welcome in email or as a GitHub issue.
Comments are really welcome -- just add a GitHub issue.


## Overview

The course is an introduction to the Python programming language, its software environment, and also to data exploration, data transformation, visualization, and more advanced data analysis.
The course is an introduction to the Python programming language, its software environment, and also to data exploration, data transformation, visualization, and more advanced data analysis. The idea is that readers learn to work with Python while learning to carry out data analysis.

The material primarily consists of `Jupyter notebooks`, and is sometimes supplemented with additional data. In most cases, however, we used the [textbook's datasets](https://gabors-data-analysis.com/datasets/) to bring the course as close to the original textbook as possible.

Lectures 0 to 6 are general introductions to Python and its concepts. These notebooks focus on coding principles, Python's main building blocks, and introduce the data analyst's most important data structure: Pandas dataframes.
Lectures 0 to 9 mostly complement [Part I: Data Exploration (Chapter 1-6)](https://gabors-data-analysis.com/chapters/#part-i-data-exploration). Lectures 0 to 6 are general introductions to Python and its concepts. These notebooks focus on coding principles, Python's main building blocks, and introduce the data analyst's most important data structure: Pandas dataframes. Lecture 7 gives insight into how to use Python for data exploration. Lectures 8 and 9 expand the toolkit with advanced data analytics techniques.

Lecture 7 gives insight into how to use Python for data exploration. Lectures 8 and 9 expand the toolkit with advanced data analytics techniques.

Lectures 10 to 16 cover everything you need to know about linear regression in Python on an introductory level. We start with simple linear regression on cross-sectional data, then we explore binary models and multiple linear regression. Finally, we discuss the basic time-series regression model and its intricacies.
Lectures 10 to 16 complement [PART II: Regression Analysis (Chapter 7-12)](https://gabors-data-analysis.com/chapters/#part-ii-regression-analysis) and cover everything you need to know about linear regression in Python on an introductory level. We start with simple linear regression on cross-sectional data, then we explore binary models and multiple linear regression. Finally, we discuss the basic time-series regression model and its intricacies.


## Philosophy and how to use

We tried to put together a benchmark course to supplement the Data Analysis textbook and to help anyone, students and instructors alike, follow the book's material. Anyone is free to use the notebooks in their current or in any modified form, with proper reference to the original material.

While we try to teach the basics of Python, this is not classical coding course material. The notebooks take the reader through the data analysis workflow of the first 12 chapters of the textbook, providing assistance in Python along the way. It is possible to learn the very basics of Python using these notebooks, but simply completing the exercises won't make anyone a programmer. Using the codebase _and_ the textbook together, however, does help in understanding statistical and data analytics concepts and seeing the theory in practice.
While we teach the basics of Python, this is not classical coding course material. The notebooks take the reader through the data analysis workflow of the first 12 chapters of the textbook, providing assistance in Python along the way. You will gradually learn what is needed to carry out analytical steps, from loading data to running regressions. We will suggest additional resources to learn more coding tools and enhance your skills.

It is possible to learn the very basics of Python using these notebooks, but simply completing the exercises won't make anyone a programmer. Using the codebase _and_ the textbook together, however, does help in understanding statistical and data analytics concepts and seeing the theory in practice.

The lectures are pre-written so that an educated reader can follow and understand them. Nevertheless, instructors may want to modify and tailor the code according to their own teaching habits and philosophy. Homework assignments are not part of the codebase, which leaves instructors the task of preparing them for the practical coding sessions of their data analytics courses.

The material's main focus is the manipulation and analysis of tabular data. Pandas dataframes provide most of the tools for these manipulation exercises, and we use the `statsmodels` package for running linear regressions. We added a basic matplotlib intro but we use `plotnine`, the Python implementation of _ggplot_, for visualization and graphical representation.
The material's main focus is the manipulation and analysis of tabular data. `Pandas` dataframes provide most of the tools for these manipulation exercises, and we use the `statsmodels` package for running linear regressions. For data visualization, we added a basic intro to the most popular package, `matplotlib`, but rely heavily on a new favorite: `plotnine`, the Python implementation of R's _ggplot_.
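
As a rough illustration of that workflow (not taken from the course notebooks), the sketch below builds a small `pandas` dataframe, filters it, and fits a simple linear regression with the `statsmodels` formula API. The data and the column names `distance` and `price` are made up for this example.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data: hotel price and distance from the city center.
hotels = pd.DataFrame(
    {
        "distance": [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 8.0],
        "price": [142, 128, 121, 109, 101, 89, 60],
    }
)

# Data transformation: keep only hotels within 3 units of the center.
close_hotels = hotels[hotels["distance"] <= 3]

# Simple linear regression of price on distance.
model = smf.ols("price ~ distance", data=close_hotels).fit()

# Estimated intercept and slope.
print(model.params)
```

The formula string keeps the regression specification readable, and `model.summary()` prints the full regression table when more detail is needed.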


## Course content
@@ -70,6 +70,6 @@ The material's main focus is the manipulation and analysis of tabular data. Pand



## Note
## Technical Note: environment

Most data science courses use the Anaconda environment for Python. We, however, use `pip` and `pipenv`, and run Jupyter notebooks from the course's environment. Anaconda is a great tool for data analysis and data science, but once someone goes beyond ad-hoc data analysis and needs to develop and deploy advanced data solutions in a production environment in Python, `pip` is going to be the way to go.
Most data science courses use the Anaconda environment for Python. We, however, use `pip` and `pipenv`, and run Jupyter notebooks from the course's environment. Anaconda is a great tool for data analysis and data science, but once someone goes beyond ad-hoc data analysis and needs to develop and deploy advanced data solutions in a production environment in Python, `pip` is going to be the way to go.
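
When running notebooks from a dedicated environment, it can help to confirm which interpreter the kernel actually uses. The snippet below is a minimal sketch using only the standard library; the helper name `in_virtual_environment` is ours, and it assumes the environment was created by `venv` or `virtualenv` (which `pipenv` relies on).

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


# Path of the running interpreter and whether it belongs to a virtual environment.
print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtual_environment())
```

If the second line prints `False` inside a notebook, the kernel is probably running from a system-wide Python rather than the course's environment.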
