README.md: 12 additions & 12 deletions
@@ -11,38 +11,38 @@ To get a copy: [Inspection copy for instructors](https://www.cambridge.org/highe
## Acknowledgments
-We'd like to thank [Ágoston Reguly](https://github.com/regulyagoston), who created the template for the initial coding supplement in R to the Data Analysis handbook. We followed his steps in writing the Python version of the teaching material.
+We'd like to thank [Ágoston Reguly](https://github.com/regulyagoston), who created the template for the Coding for Data Analysis series. We followed his steps in writing the Python version of the teaching material.
## Status
-This is version 0.1, as of August 29, 2022.
+This is version 0.1, as of 13 September, 2022.
-Comments are really welcome by email or as a GitHub issue.
+Comments are really welcome -- just add a GitHub issue.
## Overview
-The course is an introduction to the Python programming language, its software environment, and also to data exploration, data transformation, visualization, and more advanced data analysis.
+The course is an introduction to the Python programming language, its software environment, and also to data exploration, data transformation, visualization, and more advanced data analysis. The idea is that people will learn to work with Python while learning to carry out data analysis.
The material primarily consists of `Jupyter notebooks`, and is sometimes supplemented with additional data. In most cases, however, we used the [textbook's datasets](https://gabors-data-analysis.com/datasets/) to bring the course as close to the original textbook as possible.
-Lectures 0 to 6 are general introductions to Python and its concepts. These notebooks focus on coding principles and Python's main building blocks, and introduce the data analyst's most important data structure: Pandas dataframes.
+Lectures 0 to 9 mostly complement [Part I: Data Exploration (Chapter 1-6)](https://gabors-data-analysis.com/chapters/#part-i-data-exploration). Lectures 0 to 6 are general introductions to Python and its concepts. These notebooks focus on coding principles and Python's main building blocks, and introduce the data analyst's most important data structure: Pandas dataframes. Lecture 7 gives insight into how to use Python for data exploration. Lectures 8 and 9 expand the toolkit with advanced data analytics techniques.
-Lecture 7 gives insight into how to use Python for data exploration. Lectures 8 and 9 expand the toolkit with advanced data analytics techniques.
-
-Lectures 10 to 16 cover everything you need to know about linear regression in Python at an introductory level. We start with simple linear regression on cross-sectional data, then we explore binary models and multiple linear regression. Finally, we discuss the basic time-series regression model and its intricacies.
+Lectures 10 to 16 complement [PART II: Regression Analysis (Chapter 7-12)](https://gabors-data-analysis.com/chapters/#part-ii-regression-analysis) and cover everything you need to know about linear regression in Python at an introductory level. We start with simple linear regression on cross-sectional data, then we explore binary models and multiple linear regression. Finally, we discuss the basic time-series regression model and its intricacies.
## Philosophy and how to use

We tried to put together a benchmark course to supplement the Data Analysis textbook and to help anyone, students and instructors alike, follow the book's material. Anyone is free to use the notebooks in their current or in any modified form, with proper reference to the original material.
-While we try to teach the basics of Python, this is not a classical coding course. The notebooks take the reader through the data analysis workflow of the first 12 chapters of the textbook, providing assistance in Python along the way. It is possible to learn the very basics of Python using these notebooks, but simply completing the exercises won't make anyone a programmer. Using the codebase _and_ the textbook together, however, does help in understanding statistical and data analytics concepts and seeing the theory in practice.
+While we teach the basics of Python, this is not a classical coding course. The notebooks take the reader through the data analysis workflow of the first 12 chapters of the textbook, providing assistance in Python along the way. You will gradually learn what is needed to carry out analytical steps, from loading data to running regressions. We will suggest additional resources to learn more coding tools and enhance your skills.
+
+It is possible to learn the very basics of Python using these notebooks, but simply completing the exercises won't make anyone a programmer. Using the codebase _and_ the textbook together, however, does help in understanding statistical and data analytics concepts and seeing the theory in practice.
The lectures are pre-written so that an educated reader can follow and understand them. Nevertheless, instructors may want to modify and tailor the code according to their own teaching habits and philosophy. Homework is not part of the codebase, giving instructors another task in the practical coding sessions of their data analytics courses.
-The material's main focus is the manipulation and analysis of tabular data. Pandas dataframes provide most of the tools for these manipulation exercises, and we use the `statsmodels` package for running linear regressions. We added a basic matplotlib intro, but we use `plotnine`, the Python implementation of _ggplot_, for visualization and graphical representation.
+The material's main focus is the manipulation and analysis of tabular data. `Pandas` dataframes provide most of the tools for these manipulation exercises, and we use the `statsmodels` package for running linear regressions. As for data visualization, we added a basic intro to the most popular `matplotlib` package, but rely heavily on a new favorite: `plotnine`, the Python implementation of R's _ggplot_, for visualization and graphical representation.
## Course content
@@ -70,6 +70,6 @@ The material's main focus is the manipulation and analysis of tabular data. Pand
-## Note
+## Technical Note: environment
-Most data science courses use the Anaconda environment for Python. We, however, use `pip` and `pipenv`, and run Jupyter notebooks from the course's environment. Anaconda is a great tool for data analysis and data science, but once someone goes beyond ad-hoc data analysis and needs to develop and deploy advanced data solutions in a production environment in Python, `pip` is going to be the way to go.
+Most data science courses use the Anaconda environment for Python. We, however, use `pip` and `pipenv`, and run Jupyter notebooks from the course's environment. Anaconda is a great tool for data analysis and data science, but once someone goes beyond ad-hoc data analysis and needs to develop and deploy advanced data solutions in a production environment in Python, `pip` is going to be the way to go.
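
Since the notebooks are meant to run from the course's own environment, it can help to confirm that the packages named in the README are importable before opening a notebook. The snippet below is only a minimal sketch and not part of the repository: the `COURSE_PACKAGES` list and the `check_packages` helper are hypothetical names, and the list is assumed from the packages mentioned in the README text, not from the course's actual dependency file.

```python
from importlib.util import find_spec

# Assumption: these are the packages the README mentions by name.
# The real dependency list of the course may differ.
COURSE_PACKAGES = ["pandas", "statsmodels", "matplotlib", "plotnine"]


def check_packages(names):
    """Map each top-level package name to True if it is importable
    in the currently active Python environment, else False."""
    return {name: find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Print one status line per package for a quick manual check.
    for name, available in check_packages(COURSE_PACKAGES).items():
        print(f"{name}: {'ok' if available else 'missing'}")
```

Running this with the interpreter of the course environment (whether it was created with `pipenv` or otherwise) shows which packages still need to be installed with `pip`.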