This repository contains all the files used in the House Price dataset EDA.

joydityapaul/House-Price-EDA

Exploratory Data Analysis

CORRELATION WITH SALES PRICE

SalePrice Probability Density Function

dist_image

Histogram of Different Features

Hist_1

Scatter Plot of Every Feature with SalePrice

Scat_1 Scat_2 Scat_3 Scat_4 Scat_5 Scat_6 Scat_7

Strongly Correlated Features with SalePrice

KitchenAbvGr: -0.1392006921778576
HalfBath: -0.08439171127179902
MSSubClass: -0.08428413512659509
OverallCond: -0.07785589404867797
YrSold: -0.028922585168736813
BsmtHalfBath: -0.02883456718548182
PoolArea: -0.014091521506356765
BsmtFullBath: 0.011439163340408606
MoSold: 0.046432245223819446
3SsnPorch: 0.06393243256889088
OpenPorchSF: 0.08645298857147718
MiscVal: 0.08896338917298921
Fireplaces: 0.12166058421363891
BsmtUnfSF: 0.16926100049514173
BedroomAbvGr: 0.18093669310848806
WoodDeckSF: 0.1937060123752066
BsmtFinSF2: 0.19895609430836594
EnclosedPorch: 0.24127883630117497
ScreenPorch: 0.2554300795487841
LotArea: 0.2638433538714051
LowQualFinSF: 0.30007501655501323
BsmtFinSF1: 0.47169042652357296
YearRemodAdd: 0.5071009671113866
YearBuilt: 0.5228973328794967
TotRmsAbvGrd: 0.5337231555820284
FullBath: 0.5745626737760822
1stFlrSF: 0.6058521846919153
GarageArea: 0.6084052829168346
TotalBsmtSF: 0.6096808188074374
GarageCars: 0.6370954062078923
2ndFlrSF: 0.6733048324568376
GrLivArea: 0.7086244776126515
OverallQual: 0.7909816005838053

CONCLUSION

There are 11 features strongly correlated with SalePrice (each has a correlation above 0.5 in the list above): ['YearRemodAdd', 'YearBuilt', 'TotRmsAbvGrd', 'FullBath', '1stFlrSF', 'GarageArea', 'TotalBsmtSF', 'GarageCars', '2ndFlrSF', 'GrLivArea', 'OverallQual']
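A ranking like the one above can be produced by correlating every numeric column with SalePrice and keeping those beyond a cutoff. The following is a minimal sketch, not the repository's actual code: the function name `strongly_correlated`, the 0.5 threshold, and the toy data are assumptions made for illustration.

```python
import pandas as pd


def strongly_correlated(df: pd.DataFrame, target: str = "SalePrice",
                        threshold: float = 0.5) -> pd.Series:
    """Return numeric features whose |Pearson r| with `target` exceeds `threshold`."""
    # Correlate numeric columns only, then drop the target's self-correlation.
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    # Keep strong correlations (positive or negative), sorted ascending.
    return corr[corr.abs() > threshold].sort_values()


# Toy data, made up for illustration (not the real House Price dataset).
toy = pd.DataFrame({
    "OverallQual": [5, 6, 7, 8, 9],
    "MoSold": [1, 12, 3, 2, 7],
    "SalePrice": [100, 150, 210, 260, 330],
})

strong = strongly_correlated(toy)
print(strong)
```

In this toy frame only OverallQual passes the cutoff; MoSold is nearly uncorrelated with the price and is filtered out.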

FEATURE TO FEATURE RELATION

corr

Many features seem to be correlated with each other, but some pairs, such as YearBuilt/GarageYrBlt, may simply reflect price inflation over the years. As for 1stFlrSF/TotalBsmtSF, it is expected that the larger the 1st floor is (considering many houses have only one floor), the larger the total basement will be.

For the less obvious ones, we can see that:

There is a strong negative correlation between BsmtUnfSF (unfinished square feet of basement area) and BsmtFinSF2 (Type 2 finished square feet). A definition of unfinished square feet is given, but as for "Type 2", I can't tell what it really refers to.

HalfBath/2ndFlrSF is interesting and may indicate that people value not having to rush downstairs when they urgently need the bathroom.

CONCLUSION

We can conclude that some of these features could be combined to reduce the number of features (1stFlrSF/TotalBsmtSF, GarageCars/GarageArea), while other correlations indicate that people expect multiple features to be packaged together.
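One way to shortlist candidates for combining is to scan the feature-to-feature correlation matrix for pairs above a cutoff. This is a sketch under stated assumptions: the helper name `correlated_pairs`, the 0.8 cutoff, and the toy values are hypothetical and not taken from the repository.

```python
from itertools import combinations

import pandas as pd


def correlated_pairs(df: pd.DataFrame, threshold: float = 0.8,
                     exclude: tuple = ("SalePrice",)) -> list:
    """List (feature_a, feature_b, r) for numeric pairs with |r| >= threshold."""
    numeric = df.select_dtypes(include="number").drop(
        columns=list(exclude), errors="ignore")
    corr = numeric.corr()
    # Visit each unordered pair once and keep the strongly correlated ones.
    pairs = [(a, b, corr.loc[a, b])
             for a, b in combinations(corr.columns, 2)
             if abs(corr.loc[a, b]) >= threshold]
    # Strongest relationships first.
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


# Toy data, made up for illustration.
toy_pairs_df = pd.DataFrame({
    "GarageCars": [1, 2, 2, 3, 3],
    "GarageArea": [250, 480, 500, 720, 760],
    "LotArea": [9000, 7000, 12000, 8000, 10000],
})

pairs = correlated_pairs(toy_pairs_df)
for a, b, r in pairs:
    print(f"{a} / {b}: {r:.3f}")
```

A pair that shows up here is only a candidate: whether to drop one column, or merge both into a single feature, remains a modelling decision.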

QUANTITATIVE TO QUANTITATIVE RELATIONSHIP

reg1

CONCLUSION

We can see that features such as TotalBsmtSF, 1stFlrSF, and GrLivArea have a wide spread. As SalePrice increases, the total basement area and the 1st floor area increase too. These features are also strongly correlated with one another.

CATEGORICAL TO QUANTITATIVE RELATIONSHIP

There are 39 non-numerical features: ['MSZoning', 'Street', 'LotShape', 'LandContour', 'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood', 'Condition1', 'Condition2', 'BldgType', 'HouseStyle', 'RoofStyle', 'RoofMatl', 'Exterior1st', 'Exterior2nd', 'MasVnrType', 'ExterQual', 'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond', 'BsmtExposure', 'BsmtFinType1', 'BsmtFinType2', 'Heating', 'HeatingQC', 'CentralAir', 'Electrical', 'KitchenQual', 'Functional', 'GarageType', 'GarageYrBlt', 'GarageFinish', 'GarageQual', 'GarageCond', 'PavedDrive', 'SaleType', 'SaleCondition']
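A list like this can be obtained by selecting columns by data type. The sketch below assumes the data sits in a pandas DataFrame; the toy frame is made up. Note that a column which looks numeric, such as a year, is also picked up if it happens to be stored as text.

```python
import pandas as pd

# Toy data, made up for illustration.
toy_mixed = pd.DataFrame({
    "MSZoning": ["RL", "RM", "RL"],
    "GarageYrBlt": ["2003", "1976", "2001"],  # a year stored as text
    "LotArea": [8450, 9600, 11250],
    "SalePrice": [208500, 181500, 223500],
})

# Everything that is not a numeric dtype counts as non-numerical.
non_numeric_cols = toy_mixed.select_dtypes(exclude="number").columns.tolist()
print(f"There are {len(non_numeric_cols)} non-numerical features: {non_numeric_cols}")
```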

Box Plot of SalePrice by Basement Exposure to Find Outliers

box1

Box Plot of SalePrice by Sale Condition to Find Outliers

box2
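The points a box plot draws beyond its whiskers can also be listed numerically. This is a minimal sketch, assuming the common 1.5 x IQR whisker convention. The helper name `iqr_outliers` and the toy prices are hypothetical.

```python
import pandas as pd


def iqr_outliers(df: pd.DataFrame, category: str,
                 value: str = "SalePrice", k: float = 1.5) -> pd.DataFrame:
    """Return rows whose `value` lies outside the k * IQR fences of their category."""
    grouped = df.groupby(category)[value]
    # Per-category quartiles, broadcast back onto every row of the group.
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    mask = (df[value] < q1 - k * iqr) | (df[value] > q3 + k * iqr)
    return df[mask]


# Toy data, made up for illustration.
toy_box = pd.DataFrame({
    "BsmtExposure": ["No"] * 5 + ["Gd"] * 5,
    "SalePrice": [100, 110, 120, 130, 500, 200, 210, 220, 230, 240],
})

outliers = iqr_outliers(toy_box, "BsmtExposure")
print(outliers)
```

Here the price of 500 in the "No" group falls above that group's upper fence, while the "Gd" group has no outliers.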

Count Plot of Features

Count

CONCLUSION

We can see that a single category is predominant for some features, such as Utilities, Heating, GarageCond, and Functional. These features may not be relevant for our predictive model.
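The dominance visible in the count plots can be quantified as the share of rows taken by each feature's most frequent category. This is a sketch only: the helper name `dominant_share` and the 0.95 cutoff are assumptions, and the toy data is made up.

```python
import pandas as pd


def dominant_share(df: pd.DataFrame, columns=None) -> pd.Series:
    """Share of rows held by the most frequent category of each column."""
    if columns is None:
        columns = df.select_dtypes(exclude="number").columns
    shares = {
        # value_counts sorts descending, so the first entry is the top category.
        col: df[col].value_counts(normalize=True, dropna=False).iloc[0]
        for col in columns
    }
    return pd.Series(shares).sort_values(ascending=False)


# Toy data, made up for illustration.
toy_cat = pd.DataFrame({
    "Utilities": ["AllPub"] * 5,
    "Neighborhood": ["A", "B", "C", "A", "D"],
    "SalePrice": [100, 150, 210, 260, 330],
})

shares = dominant_share(toy_cat)
# Hypothetical cutoff: flag features where one category covers at least 95% of rows.
flagged = shares[shares >= 0.95].index.tolist()
print(shares)
print("Nearly constant features:", flagged)
```

A flagged feature carries little variation, which is why it may add little to a predictive model. It is still worth checking whether the rare categories coincide with unusual prices before dropping it.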
