It's election day in the United States! To celebrate, the data this week comes from the MIT Election Data and Science Lab (MEDSL). Hat tip this week to the RStudio GitHub Copilot integration, which suggested the MEDSL.
From the MEDSL's report New Report: How We Voted in 2022:
The Survey of the Performance of American Elections (SPAE) provides information about how Americans experienced voting in the most recent federal election. The survey has been conducted after federal elections since 2008, and is the only public opinion project in the country that is dedicated explicitly to understanding how voters themselves experience the election process.
We're specifically providing data on House elections from 1976-2022. Check out the MEDSL website for additional datasets and tools.
Be sure to cite the MEDSL in your work:
@data{DVN/IG0UN2_2017,
author = {MIT Election Data and Science Lab},
publisher = {Harvard Dataverse},
title = {{U.S. House 1976–2022}},
UNF = {UNF:6:A6RSZvlhh8eRZ4+mvT/HRQ==},
year = {2017},
version = {V12},
doi = {10.7910/DVN/IG0UN2},
url = {https://doi.org/10.7910/DVN/IG0UN2}
}
# Option 1: tidytuesdayR package
## install.packages("tidytuesdayR")
tuesdata <- tidytuesdayR::tt_load('2023-11-07')
## OR
tuesdata <- tidytuesdayR::tt_load(2023, week = 45)
house <- tuesdata$house
# Option 2: Read directly from GitHub
house <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-11-07/house.csv')
- Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
- Create a visualization, a model, a shiny app, or some other piece of data-science-related output, using R or another programming language.
- Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.
variable | class | description |
---|---|---|
year | double | year in which election was held |
state | character | state name |
state_po | character | U.S. postal code state abbreviation |
state_fips | double | State FIPS code |
state_cen | double | U.S. Census state code |
state_ic | double | ICPSR state code |
office | character | U.S. House (constant) |
district | character | district number. At-large districts are coded as 0 (zero) |
stage | character | electoral stage (gen = general elections, pri = primary elections) |
runoff | logical | runoff election |
special | logical | special election |
candidate | character | name of the candidate as it appears in the House Clerk report |
party | character | party of the candidate (always entirely lowercase) (Parties are as they appear in the House Clerk report. In states that allow candidates to appear on multiple party lines, separate vote totals are indicated for each party. Therefore, for analysis that involves candidate totals, it will be necessary to aggregate across all party lines within a district. For analysis that focuses on two-party vote totals, it will be necessary to account for major party candidates who receive votes under multiple party labels. Minnesota party labels are given as they appear on the Minnesota ballots. Future versions of this file will include codes for candidates who are endorsed by major parties, regardless of the party label under which they receive votes.) |
writein | logical | vote totals associated with write-in candidates |
mode | character | mode of voting; states with data that doesn't break down returns by mode are marked as "total" |
candidatevotes | double | votes received by this candidate for this particular party |
totalvotes | double | total number of votes cast for this election |
unofficial | logical | TRUE/FALSE indicator for unofficial result (to be updated later); this appears only for 2018 data in some cases |
version | double | date when this dataset was finalized |
fusion_ticket | logical | A TRUE/FALSE indicator as to whether the given candidate is running on a fusion party ticket, which will in turn mean that a candidate will appear multiple times, but by different parties, for a given election. States with fusion tickets include Connecticut, New Jersey, New York, and South Carolina. |
Clean data and dictionary downloaded from the Harvard Dataverse