Skip to content

Latest commit

 

History

History
65 lines (43 loc) · 3.26 KB

readme.md

File metadata and controls

65 lines (43 loc) · 3.26 KB

Diwali Sales Data

This week is Diwali, the festival of lights! The data this week comes from sales data for a retail store during the Diwali festival period in India. The data is shared on Kaggle by Saad Haroon.

This week we're sharing Python data analysis examples! There's a few out there, but these ones from Brushan Shelke or Vikas Vachheta (see the Diwali_Sales_Analysis.ipynb file for the code) are some data exploration analyses.

The Data

# Option 1: tidytuesdayR package 
## install.packages("tidytuesdayR")

tuesdata <- tidytuesdayR::tt_load('2023-11-14')
## OR
tuesdata <- tidytuesdayR::tt_load(2023, week = 46)

house <- tuesdata$diwali_sales_data

# Option 2: Read directly from GitHub

house <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2023/2023-11-14/diwali_sales_data.csv')

How to Participate

  • Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
  • Create a visualization, a model, a shiny app, or some other piece of data-science-related output, using R or another programming language.
  • Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.

Data Dictionary

diwali_sales_data.csv

variable class description
User_ID double User identification number
Cust_name character Customer name
Product_ID character Product identification number
Gender character Gender of the customer (e.g. Male, Female)
Age Group character Age group of the customer
Age double Age of the customer
Marital_Status double Marital status of the customer (e.g. Married, Single)
State character State of the customer
Zone character Geographic zone of the customer
Occupation character Occupation of the customer
Product_Category character Category of the product
Orders double Number of orders made by the customer
Amount double Amount in Indian rupees spent by the customer

Cleaning Script

Data was downloaded from Kaggle, and the Status and unnamed1 columns removed.

library(tidyverse)

diwali_sales_data <- read_csv("DiwaliSalesData.csv")

diwali_sales_data <- diwali_sales_data %>% select(!(c(Status, unnamed1)))

write_csv(diwali_sales_data, "diwali_sales_data.csv")