This repository contains an R Script and R Markdown files, the housing dataset and HTML output showcasing a comprehensive analysis of property prices and furnishing status, ideal for real estate insights. The analysis encompasses data cleaning, exploratory data analysis, regression modeling for price prediction, and binary logistic regression for furnishing classification. Originally created as academic coursework for Quantitative Data Analysis, this project analyzes a housing dataset to predict property prices and determine furnishing status. Users can leverage the code and analysis for their own real estate data analytics and modeling.
The dataset used for this analysis consists of various features associated with property listings:
id
: Numerical identifier for each property.price
: Numerical variable indicating the price of the property.mq
: Numerical variable representing the size of the property in square meters.floor
: Discrete numerical variable indicating the floor on which the property is located.n_rooms
: Discrete numerical variable representing the number of rooms in the property.n_bathrooms
: Discrete numerical variable indicating the number of bathrooms in the property.has_terrace
: Binary variable indicating the presence (1) or absence (0) of a terrace.has_alarm
: Binary variable indicating the presence (1) or absence (0) of an alarm system.heating
: Binary variable indicating the presence (1) or absence (0) of a heating system.has_air_conditioning
: Binary variable indicating the presence (1) or absence (0) of air conditioning.has_parking
: Binary variable indicating the presence (1) or absence (0) of a parking space.is_furnished
: Binary variable indicating whether the property is furnished (1) or not (0).
The dataset underwent a rigorous quality check involving:
- Verification against provided metadata.
- Confirmation of data dimensions, variable names, and data types.
- Review of data summaries and unique values in each column.
- Cleaning the data to remove/impute incorrect values and outliers.
The EDA involved individual analysis of each variable, including numerical summaries, skewness checks, and data visualizations. Correlations between numerical variables were also evaluated.
A regression model was developed for house price prediction, critiqued using various diagnostics, and enhanced based on the evaluations.
A logistic regression model was built to predict the likelihood of a property being furnished, utilizing all variables to determine the model’s maximum likelihood estimates.
Ensure R and RStudio are installed. Install the required R packages with the following commands:
install.packages("ggplot2")
install.packages("dplyr")
install.packages("tidyr")
# Add other package installation commands here
- Clone this repository.
- Open
CS5801_2246041.rmd
in RStudio. - Install the required dependencies.
- Run the RMD file for analysis, or open
CS5801_2246041.html
to view the precompiled output.
This project is under the MIT License - see LICENSE.md for details.