A comparative study of Linear Regression and Gradient Boosting Regression
This project performs a comparative analysis of Linear Regression and Gradient Boosting Regression techniques for predicting house prices using the King County housing dataset. The goal is to evaluate the performance of both models using R² and RMSE metrics and demonstrate the benefits of advanced ensemble methods in real estate forecasting.
- Name: `kc_house_data.csv`
- Source: Kaggle - King County House Sales
- Size: 21,613 rows × 21 columns
- Target Variable: `price`
- Features: `sqft_living`, `bedrooms`, `bathrooms`, `floors`, `zipcode`, `waterfront`, `view`, etc.
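As a minimal sketch of how the dataset described above might be loaded, the snippet below reads the CSV with pandas and separates the listed features from the `price` target. The `load_housing` helper, the exact feature subset, and the three sample rows are assumptions made for illustration; the sample values are made up so the sketch runs without the real file.

```python
import io

import pandas as pd

# Column names come from the feature list above; using exactly this subset is an assumption.
FEATURES = ["sqft_living", "bedrooms", "bathrooms", "floors", "zipcode", "waterfront", "view"]
TARGET = "price"


def load_housing(source):
    """Read the CSV and return (X, y). `source` is a path or a file-like object."""
    df = pd.read_csv(source)
    # Drop rows that are missing any column used by the models.
    df = df.dropna(subset=FEATURES + [TARGET])
    return df[FEATURES], df[TARGET]


# Tiny in-memory stand-in with made-up values (not real King County records).
sample_csv = io.StringIO(
    "price,sqft_living,bedrooms,bathrooms,floors,zipcode,waterfront,view\n"
    "300000,1500,3,2.0,1.0,98001,0,0\n"
    "650000,2400,4,2.5,2.0,98052,0,2\n"
    "1200000,3100,4,3.5,2.0,98040,1,4\n"
)
X, y = load_housing(sample_csv)
print(X.shape, y.tolist())
```

With the real file, the call would presumably be `load_housing("kc_house_data.csv")`.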
Real estate pricing is inconsistent due to varying economic conditions and location preferences. This study aims to build models that accurately predict house prices and compare regression techniques in terms of accuracy, complexity, and interpretability.
- 🔧 Data Preprocessing (missing values, feature selection, normalization)
- 📈 EDA using visualizations (scatter plots, heatmaps)
- 🤖 Model Implementation: Linear Regression & Gradient Boosting
- ⚙️ Hyperparameter Tuning: Grid Search, Cross-validation
- 📊 Model Evaluation: MSE, RMSE, R² Score
- 🔁 Comparative Analysis of performance, interpretability, scalability
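The steps above can be sketched end to end with scikit-learn. This is an illustrative outline, not the project's actual code: the data is synthetic (a made-up non-linear price function of living area and a waterfront flag), and the split ratio, random seeds, and default model settings are all assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing data (assumption): price grows
# quadratically with living area, plus a waterfront premium for large homes.
rng = np.random.default_rng(0)
n = 2000
sqft = rng.uniform(500, 5000, n)
waterfront = rng.integers(0, 2, n)
price = (
    50_000
    + 0.03 * sqft**2
    + 200_000 * waterfront * (sqft > 2500)
    + rng.normal(0, 20_000, n)
)
X = np.column_stack([sqft, waterfront])

X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=0
)

models = {
    # Scaling is included for the linear model; tree ensembles do not need it.
    "Linear Regression": make_pipeline(StandardScaler(), LinearRegression()),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    results[name] = {
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        "r2": r2_score(y_test, pred),
    }
    print(f"{name}: RMSE={results[name]['rmse']:.0f}, R2={results[name]['r2']:.3f}")
```

On this synthetic data the boosted model should score higher because the target is non-linear by construction; the numbers are not comparable to the King County results.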

Limitations of Linear Regression:
- Assumes linear relationships
- Sensitive to outliers
- Multicollinearity issues
- Struggles with non-linear data

Limitations of Gradient Boosting:
- Computationally expensive
- Sensitive to hyperparameters
- Risk of overfitting
- Less interpretable
Feature | Linear Regression | Gradient Boosting Regressor |
---|---|---|
Interpretability | High | Low |
Complexity | Low | High |
Handles Non-Linearity | Poor | Excellent |
Speed | Fast | Slower |
Hyperparameter Tuning | Minimal | Extensive |
Overfitting Risk | Low (tends to underfit) | Higher without tuning |
Test Score (R²) | ~0.73 | ~0.91 |
Model | R² Score (Test) | RMSE (Approx) |
---|---|---|
Linear Regression | 0.73 | Varies |
Gradient Boosting | 0.91 | Lower RMSE |
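For reference, the metrics in the table can be computed by hand. The sketch below implements MSE, RMSE, and R² in plain Python using their standard definitions; the function names and the four sample prices (in thousands of dollars) are made up for illustration and are not results from this project.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up example values, in thousands of dollars.
actual = [300.0, 450.0, 600.0, 750.0]
predicted = [320.0, 430.0, 610.0, 740.0]

print(mse(actual, predicted), rmse(actual, predicted), r2_score(actual, predicted))
```

RMSE is reported in the units of `price`, which makes it easier to interpret than MSE, while R² is unitless and compares the model against always predicting the mean.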
- Price vs Square Feet
- Price vs Location (Latitude, Longitude)
- Bedrooms vs Price
- Zipcode vs Price
- Waterfront vs Price
- 🔧 Hyperparameter optimization
- 💡 Feature engineering (e.g., polynomial terms)
- 🧹 Regularization: L1 (Lasso), L2 (Ridge)
- 🌲 Compare with other models: XGBoost, LightGBM
- 🌐 Real-time deployment and streaming data support
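Several of these ideas can be combined in one scikit-learn pipeline. The sketch below is one possible approach, not something the project currently implements: it adds polynomial terms, applies L2 (Ridge) regularization, and tunes both with a cross-validated grid search. The synthetic data, the grid values, and the step names `poly`, `scale`, and `ridge` are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (assumption): the target is quadratic in the first feature.
rng = np.random.default_rng(42)
X = rng.uniform(-2, 2, size=(300, 2))
y = 1.5 * X[:, 0] ** 2 - X[:, 1] + rng.normal(0, 0.1, 300)

pipe = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),  # feature engineering
    ("scale", StandardScaler()),                       # put terms on one scale
    ("ridge", Ridge()),                                # L2-regularized regression
])

# Illustrative grid: polynomial degree and regularization strength.
param_grid = {
    "poly__degree": [1, 2, 3],
    "ridge__alpha": [0.01, 0.1, 1.0, 10.0],
}

# 5-fold cross-validated search, scored by R².
search = GridSearchCV(pipe, param_grid, cv=5, scoring="r2")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Swapping `Ridge` for `sklearn.linear_model.Lasso` would give the L1 variant with the same grid structure.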
- Clone the repository:

      git clone https://github.com/<your-username>/house-price-prediction.git
      cd house-price-prediction