This repository contains a Jupyter Notebook for analyzing and predicting house prices with regression models. Using the popular Ames Housing dataset, it performs data preprocessing, feature selection, and modeling to estimate property values. The notebook covers:
- Feature selection
- Handling missing values
- Feature scaling and encoding
- Simplifying categorical features
- Model training and evaluation:
  - Multiple Linear Regression (MLR)
  - Ridge Regression
  - Lasso Regression
- VIF (Variance Inflation Factor) analysis for multicollinearity
- Model comparison using RMSE, R², and Adjusted R²
- Streamlit price prediction app
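The preprocessing steps listed above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not the notebook's actual code: numeric gaps are filled with the column median, missing categories get a `"Missing"` label, levels seen fewer than `min_count` times are merged into `"Other"` (one way to simplify categorical features), categories are one-hot encoded with the first level dropped, and numeric columns are z-scored. The helper names (`impute_median`, `lump_rare`, `one_hot`, `standardize`) and the toy values are hypothetical; the notebook may well use pandas or scikit-learn for these steps instead.

```python
import numpy as np

def impute_median(column):
    """Fill NaNs in a numeric column with the median of the observed values."""
    x = np.asarray(column, dtype=float).copy()
    x[np.isnan(x)] = np.nanmedian(x)
    return x

def lump_rare(column, min_count=2, other="Other"):
    """Simplify a categorical column: None becomes 'Missing', and levels
    seen fewer than `min_count` times are merged into `other`."""
    values = ["Missing" if v is None else str(v) for v in column]
    levels, counts = np.unique(values, return_counts=True)
    keep = {str(lvl) for lvl, c in zip(levels, counts) if c >= min_count}
    return np.array([v if v in keep else other for v in values])

def one_hot(column):
    """One-hot encode, dropping the first (alphabetical) level as the baseline."""
    values = np.asarray(column)
    levels = [str(lvl) for lvl in np.unique(values)[1:]]
    cols = [(values == lvl).astype(float) for lvl in levels]
    dummies = np.column_stack(cols) if cols else np.empty((len(values), 0))
    return dummies, levels

def standardize(X, mean=None, std=None):
    """Z-score each column; pass the training mean and std together to transform test data."""
    X = np.asarray(X, dtype=float)
    if mean is None:
        mean, std = X.mean(axis=0), X.std(axis=0)
        std = np.where(std == 0, 1.0, std)  # avoid dividing by zero on constant columns
    return (X - mean) / std, mean, std

# Hypothetical toy rows, not taken from the notebook
frontage = impute_median([60.0, np.nan, 80.0, 70.0])
hood = lump_rare(["NAmes", "NAmes", "CollgCr", None])
dummies, dummy_names = one_hot(hood)
X, mu, sigma = standardize(np.column_stack([frontage, dummies]))
print(frontage, hood, dummy_names)
```

In a real train/test workflow, the medians, retained category levels, and scaling statistics would be computed on the training split only and reused on the test split, which is why `standardize` returns `mean` and `std`.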
Two modeling approaches were applied:
- A manual multiple linear regression (MLR) model using a small set of hand-picked features.
- A full-feature comparison of plain linear regression against regularized regressions (Ridge and Lasso).
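To make the difference between the two regularized models concrete, here is a minimal NumPy sketch of both fits on standardized-style synthetic data. It assumes the penalty scaling scikit-learn documents for its estimators: Ridge minimizes `||y - Xb||^2 + alpha * ||b||^2` (solved in closed form), and Lasso minimizes `(1/(2n)) * ||y - Xb||^2 + alpha * ||b||_1` (solved here by coordinate descent with soft-thresholding). The function names, `alpha` values, and data are hypothetical illustrations, not the notebook's settings. With `alpha=0`, `ridge_fit` reduces to ordinary least squares, i.e. the plain MLR fit.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge; the intercept is recovered from the means and is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = X.shape[1]
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_fit(X, y, alpha=0.1, n_iter=500, tol=1e-8):
    """Coordinate descent for (1/(2n))||y - Xb - b0||^2 + alpha * ||b||_1."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n, p = Xc.shape
    coef = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n
    resid = yc.copy()  # residual yc - Xc @ coef, kept up to date
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0:
                continue
            old = coef[j]
            # Correlation of column j with the residual that excludes its own contribution
            rho = Xc[:, j] @ resid / n + col_sq[j] * old
            coef[j] = soft_threshold(rho, alpha) / col_sq[j]
            resid -= Xc[:, j] * (coef[j] - old)
            max_change = max(max_change, abs(coef[j] - old))
        if max_change < tol:
            break
    return coef, y_mean - x_mean @ coef

# Synthetic data: only the first two of five features matter
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_coef = np.array([3.0, -2.0, 0.0, 0.0, 0.0])
y = X @ true_coef + 5.0 + 0.1 * rng.normal(size=200)

r_coef, r_b0 = ridge_fit(X, y, alpha=1.0)
l_coef, l_b0 = lasso_fit(X, y, alpha=0.1)
print("ridge:", np.round(r_coef, 3))
print("lasso:", np.round(l_coef, 3))
```

The design choice shows up in the output: Ridge keeps small nonzero weights on the irrelevant features, while Lasso's soft-thresholding sets them exactly to zero, which is why Lasso also acts as a form of feature selection.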
Key takeaways:
Lasso Regression achieved the best performance on the test data, with the lowest RMSE and highest R². This suggests it generalizes well: its L1 penalty can shrink uninformative coefficients to exactly zero, which reduces overfitting.
The manual MLR model, though simpler, performed reasonably well and is easy to interpret, making it a good baseline.
Plain linear regression on all features overfit the training data, while Ridge Regression improved stability by shrinking coefficient magnitudes through its L2 penalty.
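The comparison above rests on RMSE, R², and Adjusted R², and the feature list also mentions VIF analysis for multicollinearity. The following is a minimal NumPy sketch of those formulas, assuming the standard definitions: RMSE is the square root of the mean squared error, R² is `1 - SS_res / SS_tot`, Adjusted R² is `1 - (1 - R²)(n - 1)/(n - p - 1)` for `p` predictors, and the VIF of column `j` is `1 / (1 - R²_j)` from regressing that column on all the others. The function names are hypothetical; the notebook may instead use scikit-learn metrics or statsmodels' VIF helper.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (e.g. sale price)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def adjusted_r2(y_true, y_pred, n_features):
    """R² penalized for the number of predictors p: 1 - (1 - R²)(n - 1)/(n - p - 1)."""
    n = len(y_true)
    return 1.0 - (1.0 - r2(y_true, y_pred)) * (n - 1) / (n - n_features - 1)

def vif(X):
    """VIF_j = 1 / (1 - R²_j), regressing column j on the other columns plus an intercept."""
    X = np.asarray(X, float)
    n, p = X.shape
    scores = []
    for j in range(p):
        target = X[:, j]
        design = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        r2_j = r2(target, design @ coef)
        scores.append(np.inf if r2_j >= 1.0 else 1.0 / (1.0 - r2_j))
    return np.array(scores)

y_true, y_pred = [3, 5, 7, 9], [2.5, 5, 7.5, 9]
print(rmse(y_true, y_pred))            # ≈ 0.354
print(r2(y_true, y_pred))              # ≈ 0.975
print(adjusted_r2(y_true, y_pred, 1))  # ≈ 0.9625

# Synthetic collinearity: x3 is almost exactly x1 + x2, x4 is independent
rng = np.random.default_rng(0)
x1, x2, x4 = rng.normal(size=(3, 200))
x3 = x1 + x2 + 0.05 * rng.normal(size=200)
print(np.round(vif(np.column_stack([x1, x2, x3, x4])), 1))
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic multicollinearity; in the synthetic example, the first three columns score very high while the independent `x4` stays close to 1.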