
Commit

Fix broken image links.
mbjones committed Jan 22, 2024
1 parent 70ac206 commit 90ce8e2
Showing 1 changed file with 7 additions and 6 deletions.
materials/sections/data-modeling-socialsci.qmd: 13 changes (7 additions, 6 deletions)
@@ -61,38 +61,39 @@ Before we learn how to create a relational data model, let's look at how to reco

This is a screenshot of an actual dataset that came across NCEAS. We have all seen spreadsheets that look like this, and it is fairly obvious that whatever this is, it isn't very tidy. Let's dive deeper into exactly **why** we wouldn't consider it tidy.

![](images/excel-org-01.png)
![](images/tidy-data-images/tidy_data/excel-org-01.png)

#### Multiple tables {-}

Your human brain can see from the way this sheet is laid out that it has three tables within it. Although it is easy for us to see and interpret this, it is extremely difficult to get a computer to see it this way, which will create headaches down the road should you try to read this information into R or another programming language.

![](images/excel-org-02.png)
![](images/tidy-data-images/tidy_data/excel-org-02.png)
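
As a rough illustration of why stacked tables are awkward for a program, here is a minimal Python sketch that splits a sheet exported to CSV into separate tables. It assumes, hypothetically, that the tables are stacked vertically and separated by a fully blank row, and that each table's first row is its header; `split_blocks` and the sample data are made up for this example. Sheets laid out like the one above often need manual cleanup instead.

```python
import csv
import io


def split_blocks(text):
    """Split CSV text into separate tables wherever a fully blank row appears.

    Assumption: each block's first row is that block's header.
    """
    blocks, current = [], []
    for row in csv.reader(io.StringIO(text)):
        if not any(cell.strip() for cell in row):
            # A blank row ends the current table (if one is open)
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(row)
    if current:
        blocks.append(current)
    # Convert each block into a list of dicts keyed by its own header row
    return [[dict(zip(block[0], row)) for row in block[1:]] for block in blocks]


# Hypothetical sheet exported to CSV: two tables separated by a blank row
sheet = "site,temp\nA,10\nB,12\n,\nsite,species\nA,cod\n"
tables = split_blocks(sheet)
```

Note that the "blank row" rule is only a heuristic: a computer has no way to know where one table ends unless the layout follows a convention like this.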

#### Inconsistent observations {-}

Rows correspond to **observations**. If you look across a single row and notice that it clearly contains multiple observations, the data are likely not tidy.

![](images/excel-org-03.png)
![](images/tidy-data-images/tidy_data/excel-org-03.png)
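
A common fix for rows that hold several observations is to reshape the table from "wide" to "long" form, so each row carries exactly one observation. The sketch below does this in plain Python; the `wide_to_long` function and the site/year/count data are hypothetical. In R, `tidyr::pivot_longer()` performs this kind of reshaping.

```python
def wide_to_long(rows, id_col, key_name, value_name):
    """Reshape 'wide' rows so each output row holds exactly one observation."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue  # the identifier is repeated on every output row
            long_rows.append({id_col: row[id_col], key_name: key, value_name: value})
    return long_rows


# Hypothetical wide table: each row packs two yearly counts for one site
wide = [
    {"site": "A", "2019": 5, "2020": 7},
    {"site": "B", "2019": 3, "2020": 4},
]
tidy = wide_to_long(wide, id_col="site", key_name="year", value_name="count")
```

After reshaping, `year` becomes a variable in its own column instead of being hidden in the column headers.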

#### Inconsistent variables {-}

Columns correspond to **variables**. If you look down a column and see that it mixes multiple variables, the data are not tidy. A good test is to check whether the column holds values of only one unit type.

![](images/excel-org-04.png)
![](images/tidy-data-images/tidy_data/excel-org-04.png)
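
The one-unit test can be roughly automated. This sketch assumes a column of hypothetical "number unit" strings and collects the unit suffixes it finds; more than one distinct result suggests the column mixes variables. The regular expression and the `units_in_column` helper are illustrative assumptions, not a general unit parser.

```python
import re


def units_in_column(values):
    """Collect the unit suffixes found in a column of 'number unit' strings."""
    units = set()
    for v in values:
        match = re.fullmatch(r"\s*-?\d+(?:\.\d+)?\s*([A-Za-z%]*)\s*", str(v))
        # Non-numeric entries are recorded separately so they get flagged too
        units.add(match.group(1) if match else f"<non-numeric: {v}>")
    return units


# Hypothetical column mixing mass, length, and a missing-value code
column = ["12 kg", "15 kg", "30 cm", "n/a"]
found = units_in_column(column)
looks_tidy = len(found) == 1
```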

#### Marginal sums and statistics {-}

Marginal sums and statistics are also not considered tidy, because they are not the same type of observation as the other rows. Instead, they are combinations of observations.

![](images/excel-org-05.png)
![](images/tidy-data-images/tidy_data/excel-org-05.png)
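
Rather than storing totals as extra rows, keep only the observations and compute summaries when needed. The sketch below drops rows labeled as marginal statistics and then recomputes the total; the label list (`total`, `sum`, `mean`), the `drop_marginal_rows` helper, and the sample data are hypothetical and would need to match your own sheet.

```python
# Hypothetical table with a "Total" row appended at the bottom
rows = [
    {"site": "A", "count": 5},
    {"site": "B", "count": 3},
    {"site": "Total", "count": 8},
]


def drop_marginal_rows(rows, label_col, labels=("total", "sum", "mean")):
    """Remove summary rows whose label marks them as marginal statistics."""
    return [r for r in rows if str(r[label_col]).strip().lower() not in labels]


tidy = drop_marginal_rows(rows, "site")
# Recompute the summary from the remaining observations instead of storing it
total = sum(r["count"] for r in tidy)
```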

### Good enough data modeling

#### Denormalized data {-}

When data are "denormalized", observations about different entities are combined in the same table.

![](images/table-denorm-ss.png)

In the above example, each row has measurements about both the community in which observations occurred and observations of two individuals surveyed in that community. This is *not normalized* data.
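
Normalizing a table like this means splitting it so that each table describes one kind of entity, with a shared key linking them. The sketch below assumes a hypothetical layout (a `community` ID, a community-level `population`, and ages for two surveyed individuals in `ind1_age` and `ind2_age`) and produces one community table and one individual table keyed by `community`. The column names and the `normalize` helper are illustrative only.

```python
# Hypothetical denormalized rows: community facts mixed with individual facts
denormalized = [
    {"community": "C1", "population": 1200, "ind1_age": 34, "ind2_age": 51},
    {"community": "C2", "population": 800, "ind1_age": 27, "ind2_age": 45},
]


def normalize(rows):
    """Split rows into a community table and an individual table."""
    communities, individuals = [], []
    for row in rows:
        # Community-level values are stored once per community
        communities.append({"community": row["community"], "population": row["population"]})
        # Each surveyed individual becomes its own row, linked by the community key
        for n, col in enumerate(("ind1_age", "ind2_age"), start=1):
            individuals.append({"community": row["community"], "individual": n, "age": row[col]})
    return communities, individuals


communities, individuals = normalize(denormalized)
```

Because `community` appears in both tables, the original rows can be rebuilt with a join, while community-level values such as `population` are stored only once.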


