In a previous post in this series, we did an exploratory data analysis of the diamonds dataset and found that carat, x, y, and z were strongly correlated with price. Clarity also appeared to offer some predictive ability.
In this post, we will build linear models and see how well they predict the price of diamonds.
Before we apply any transformations, feature engineering, or feature selection, let's see what kind of results we get from a baseline linear model that uses all the features to predict price:
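A baseline like this can be sketched as follows. This is a minimal illustration, not the post's actual code: it uses a small synthetic stand-in for the diamonds DataFrame (in the real post, the data loaded during the EDA would be used instead), one-hot encodes the categorical columns, and fits an ordinary least-squares model with scikit-learn:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the diamonds data; in the real analysis this
# would be the diamonds DataFrame from the earlier EDA post.
rng = np.random.default_rng(0)
n = 1000
carat = rng.uniform(0.2, 3.0, n)
df = pd.DataFrame({
    "carat": carat,
    "x": 6 * carat ** (1 / 3) + rng.normal(0, 0.1, n),   # length, roughly tracks carat
    "clarity": rng.choice(["I1", "SI2", "SI1", "VS2"], n),
    "price": 8000 * carat + rng.normal(0, 500, n),
})

# One-hot encode categorical columns so the linear model can use them.
X = pd.get_dummies(df.drop(columns="price"), drop_first=True)
y = df["price"]

# Hold out a test set, fit on all features, and report RMSE on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print(f"Baseline RMSE: {rmse:.0f}")
```

The point of a baseline like this is to set a reference error before any feature work: every later transformation or engineered feature should beat this number to justify its complexity.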