Property Investment Assessment
In this project we process the Property Assessment Roll data for the city of Buffalo, NY. Using the cleaned dataset, we analyze the distribution of properties in Buffalo and build a model that can recommend properties for investment based on the features available in the data.
Data source
City of Buffalo 2021-2022 Property Assessment Roll
The raw data has about 94,000 data points. Missing values were handled in three steps:
- Columns with missing values in more than 80% of the rows: dropped the entire column.
- Columns with missing values in less than 20% of the rows: dropped the rows containing those missing values.
- All other columns:
  - Imputed numerical features with the median.
  - Imputed categorical features with the mode.
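The three steps above can be sketched in pandas as follows. This is a minimal illustration, not the project's actual code: the function name `handle_missing`, the toy column names, and the choice to compute the missing fractions once before any rows are dropped are all assumptions.

```python
import pandas as pd


def handle_missing(df, high=0.80, low=0.20):
    """Apply the three-step missing-value policy (thresholds from the text)."""
    frac = df.isna().mean()  # fraction of missing rows per column

    # Step 1: drop columns that are missing in more than `high` of the rows.
    df = df.drop(columns=frac[frac > high].index)

    # Step 2: for columns missing in less than `low` of the rows,
    # drop the rows that contain those missing values.
    sparse_cols = frac[(frac > 0) & (frac < low)].index
    df = df.dropna(subset=list(sparse_cols)).copy()

    # Step 3: impute the remaining columns (median for numeric, mode otherwise).
    for col in df.columns:
        if not df[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())
        else:
            df[col] = df[col].fillna(df[col].mode().iloc[0])
    return df


# Hypothetical toy data; these are not columns from the assessment roll.
raw = pd.DataFrame({
    "total_value": [100, 120, None, 90, 150, 110, 95, 130, 105, 115],
    "year_built": [1920, None, 1950, None, 1900, None, 1960, 1930, None, 1940],
    "property_class": ["A", "B", None, "A", None, "A", "B", None, "A", "A"],
    "mostly_empty": [None] * 9 + [1.0],
})
cleaned = handle_missing(raw)
```

In the toy data, `mostly_empty` is 90% missing and is dropped, the single row with a missing `total_value` is removed, and the gaps in `year_built` and `property_class` are imputed.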
Figure: Average value of property in the neighborhood
Feature Selection
- After imputing missing values, the dataset has 73 features, which raises the curse of dimensionality.
- Features were selected using Pearson correlation.
- The 18 features with the most impact were selected.
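One common way to select features with Pearson correlation is to rank each numeric feature by the absolute value of its correlation with the target and keep the top k. The sketch below assumes that approach; the source does not say which column was the target or how the cutoff was chosen, so the function name, the synthetic data, and the `target` column are illustrative only.

```python
import numpy as np
import pandas as pd


def top_correlated_features(df, target, k=18):
    """Return the k numeric features with the largest |Pearson r| to `target`."""
    corr = df.select_dtypes(include="number").corr(method="pearson")[target]
    corr = corr.drop(target)  # a column is always perfectly correlated with itself
    return corr.abs().sort_values(ascending=False).head(k).index.tolist()


# Synthetic demo data (assumption): "strong" drives the target, "noise" does not.
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "strong": rng.normal(size=n),
    "weak": rng.normal(size=n),
    "noise": rng.normal(size=n),
})
demo["target"] = 3 * demo["strong"] + 1.5 * demo["weak"] + rng.normal(scale=0.1, size=n)

selected = top_correlated_features(demo, "target", k=2)
```

Pearson correlation only captures linear relationships with the target, so a feature with a strong non-linear effect can rank low under this scheme.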
Modelling
Four regression models were developed:
- Multiple Linear Regression Model (MLR model).
- LASSO Regression Model.
- Random Forest Regression Model.
- Ridge Regression Model.
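A minimal scikit-learn sketch of fitting these four model types side by side is shown below. It runs on synthetic data from `make_regression`, not on the assessment roll, and the hyperparameters (`alpha=1.0`, `n_estimators=100`), the 80/20 split, and the random seeds are assumptions, since the source does not report them.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 18 selected features (assumption).
X, y = make_regression(n_samples=500, n_features=18, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Multiple Linear Regression": LinearRegression(),
    "LASSO Regression": Lasso(alpha=1.0),
    "Random Forest Regression": RandomForestRegressor(n_estimators=100, random_state=0),
    "Ridge Regression": Ridge(alpha=1.0),
}

# fit() returns the estimator itself, so the dict holds the trained models.
fitted = {name: model.fit(X_train, y_train) for name, model in models.items()}

# score() reports R-squared for scikit-learn regressors.
train_r2 = {name: m.score(X_train, y_train) for name, m in fitted.items()}
test_r2 = {name: m.score(X_test, y_test) for name, m in fitted.items()}
```

Comparing `train_r2` against `test_r2` for each model is what exposes overfitting: a large gap between the two suggests the model has memorized the training set.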
Model | Train R-squared value | Test R-squared value | Mean Squared Error | Maximum Error |
---|---|---|---|---|
Multiple Linear Regression | 0.70918 | 0.68964 | 3881123589.51 | 689658.64 |
LASSO Regression | 0.70831 | 0.68854 | 3894879623.56 | 690620.07 |
Random Forest Regression | 0.98474 | 0.89058 | 1368317257.65 | 596768.00 |
Ridge Regression | 0.70917 | 0.68960 | 3881670650.45 | 689261.19 |
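For reference, the three metrics in the table can be computed directly from predictions. The sketch below uses NumPy and the standard definitions; the function name `regression_report` and the four toy values are illustrative and are not taken from the project.

```python
import numpy as np


def regression_report(y_true, y_pred):
    """Compute R-squared, mean squared error, and maximum absolute error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)                 # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return {
        "r2": float(1.0 - ss_res / ss_tot),
        "mse": float(np.mean(residuals ** 2)),
        "max_error": float(np.max(np.abs(residuals))),
    }


# Toy values (assumption), chosen so the arithmetic is easy to check by hand.
report = regression_report([100, 200, 300, 400], [110, 190, 310, 360])

# MSE is in squared target units; its square root is easier to interpret.
rmse_mlr = float(np.sqrt(3881123589.51))  # MSE reported for multiple linear regression
rmse_rf = float(np.sqrt(1368317257.65))   # MSE reported for random forest
```

Taking the square root of the reported MSE values gives an RMSE of roughly 62,300 for multiple linear regression and roughly 36,990 for random forest, expressed in the units of the target variable. The random forest's gap between train R-squared (0.98474) and test R-squared (0.89058) is larger than for the linear models, which suggests some overfitting even though it scores best on the test set.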