House Prices: Advanced Regression Techniques
Predict sales prices and practice feature engineering, RFs, and gradient boosting
- Description
- Evaluation
- Let's get started!
Start here if...
You have some experience with R or Python and machine learning basics. This is a perfect competition for data science students who have completed an online course in machine learning and are looking to expand their skill set before trying a featured competition.
Competition Description
Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.
With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.
Practice Skills
- Creative feature engineering
- Advanced regression techniques like random forest and gradient boosting
Acknowledgments
The Ames Housing dataset was compiled by Dean De Cock for use in data science education. It's an incredible alternative for data scientists looking for a modernized and expanded version of the often cited Boston Housing dataset.
Goal
It is your job to predict the sales price for each house. For each Id in the test set, you must predict the value of the SalePrice variable.
Metric
Submissions are evaluated on Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price. (Taking logs means that errors in predicting expensive houses and cheap houses will affect the result equally.)
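For reference, here is a minimal sketch of this metric; the names y_true and y_pred are just placeholders for arrays of observed and predicted sale prices (they are not defined elsewhere in this notebook):
import numpy as np

def rmsle(y_true, y_pred):
    # RMSE computed on the logarithm of the prices, as described above
    return np.sqrt(np.mean((np.log(np.asarray(y_pred)) - np.log(np.asarray(y_true))) ** 2))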
Submission File Format
The file should contain a header and have the following format:
Id,SalePrice
1461,169000.1
1462,187724.1233
1463,175221
etc.
You can download an example submission file (sample_submission.csv) on the Data page.
## load the library
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
%matplotlib inline
import missingno as msno
import scipy.stats as stats
url_train = "https://raw.githubusercontent.com/lucastiagooliveira/lucastiagooliveira/master/Kaggle/house-prices-advanced-regression-techniques/train.csv"
url_test = "https://raw.githubusercontent.com/lucastiagooliveira/lucastiagooliveira/master/Kaggle/house-prices-advanced-regression-techniques/test.csv"
df_train = pd.read_csv(url_train)
df_test = pd.read_csv(url_test)
combine = [df_train, df_test]
df_train.head()
df_test.head()
df_train.info()
print('_'*50)
df_test.info()
sns.set()
sns.distplot(df_train.SalePrice, color = 'b')
print('Skewness: %f' % df_train.SalePrice.skew())
print('Kurtosis: %f' % df_train.SalePrice.kurt())
quant = [i for i in df_train.columns if df_train[i].dtypes != object]
quali = [i for i in df_train.columns if df_train[i].dtypes == object]
quant.remove('Id')
quant.remove('SalePrice')
# quant = df_train[quant]
# quali = df_train[quali]
target = df_train.SalePrice
# Plot the quantitative variables
sns.set(style="darkgrid")
melted = pd.melt(df_train, value_vars= quant)
g = sns.FacetGrid(melted, col = 'variable', margin_titles=True, col_wrap = 3, sharex = False, sharey = False, height = 5)
g.map(sns.distplot, "value", color="steelblue")
# Plot the qualitative variables
def boxplot(x, y, **kwargs):
    sns.boxplot(x=x, y=y)
    x = plt.xticks(rotation = 90)
sns.set()
melted = pd.melt(df_train, value_vars= quali, id_vars = ['SalePrice'])
g = sns.FacetGrid(melted, col = 'variable', margin_titles=True, col_wrap = 2, sharex = False, sharey = False, height = 8)
g.map(boxplot, 'value', 'SalePrice')
# sns.heatmap(df_train.corr())
sns.barplot(x = df_train.corr().SalePrice.sort_values(ascending = False).index ,y = df_train.corr().SalePrice.sort_values(ascending = False))
plt.xticks(rotation = 90)
# The variables most positively correlated with SalePrice in the dataset
sns.heatmap(df_train[df_train.corr().SalePrice.sort_values(ascending = False).index[0:10]].corr(),
            annot = True,
            linewidths=.3)
Some observations about the resulting correlation heatmap:
- 'OverallCond': sounds right; the better the overall condition of the house, the higher the price tends to be;
- 'GrLivArea': above-grade living area, which makes sense;
- 'GarageCars' and 'GarageArea': these look like twins, with a correlation of about 0.88, so we'll keep just one of them for the analysis (a quick correlation check of these pairs is sketched right after this list);
- 'TotalBsmtSF': total square feet of basement area, which also makes sense;
- '1stFlrSF': first-floor square feet, reasonable, but it largely overlaps with 'TotalBsmtSF';
- 'FullBath': useful;
- 'TotRmsAbvGrd': a twin of 'GrLivArea';
- 'YearBuilt': good, but only slightly correlated with price.
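To double-check the "twin" pairs mentioned above, a quick look at their pairwise correlations on the already loaded df_train is enough:
# Sanity check of the "twin" pairs noted above
df_train[['GarageCars', 'GarageArea']].corr()
df_train[['TotRmsAbvGrd', 'GrLivArea']].corr()
df_train[['TotalBsmtSF', '1stFlrSF']].corr()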
var_used = ['SalePrice', 'OverallCond', 'GrLivArea','GarageCars', 'TotalBsmtSF','FullBath','YearBuilt']
sns.pairplot(df_train[var_used], height = 7)
train = df_train[var_used]
var_used_test = var_used.copy()
var_used_test.remove('SalePrice')
test = df_test[var_used_test]
test
drop_out = list(train.loc[train.GrLivArea > 4500].index)
train = train.drop(labels = drop_out, axis = 0)
sns.set()
ax = sns.scatterplot(x = 'GrLivArea', y = 'SalePrice', data = train)
sns.distplot(train.SalePrice)
print('Skewness of the SalePrice %f' % stats.skew(train.SalePrice))
train.SalePrice = np.log1p(train.SalePrice)
sns.distplot(train.SalePrice)
print('Skewness of the SalePrice %f' % stats.skew(train.SalePrice))
skewness = train.apply(lambda x:stats.skew(x))
skewness
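GrLivArea is still noticeably right-skewed, so it gets the same log treatment below. A more general, hedged sketch (illustration only, not part of this pipeline; the 0.75 cut-off is an arbitrary choice) would transform every feature whose skewness exceeds a chosen threshold:
# Sketch: log1p-transform every numeric feature with skewness above a threshold (illustration only)
train_sketch = train.copy()
skewed_cols = skewness[skewness.abs() > 0.75].index.drop('SalePrice', errors='ignore')
train_sketch[skewed_cols] = np.log1p(train_sketch[skewed_cols])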
sns.distplot(train.GrLivArea)
# Apply the log transformation to GrLivArea
train.GrLivArea = train.GrLivArea.apply(lambda x: np.log1p(x))
test.GrLivArea = test.GrLivArea.apply(lambda x: np.log1p(x))
sns.distplot(train.GrLivArea)
from sklearn.preprocessing import StandardScaler
X = train.drop('SalePrice', axis = 1)
y = train.SalePrice
# X = pd.get_dummies(X, columns = ['OverallCond', 'GarageCars', 'FullBath'])
X.reset_index(inplace = True, drop = True)
scaler = StandardScaler()
X
def diff(li1, li2):
    # Element-wise absolute difference between two lists, returned as a tuple
    li_dif = [abs(i - j) for i, j in zip(li1, li2)]
    return tuple(li_dif)

def get_dummy(data, col, dim):
    # One-hot encode each column in `col`, padding with zero columns so that every
    # encoded variable ends up with dim[i] levels (keeps train/test shapes aligned)
    dummy = pd.DataFrame()
    for i, column in enumerate(col):
        dummy = pd.get_dummies(data[column])
        # dummy = dummy.merge(pd.get_dummies(X[column]), how = 'inner', left_index= True, right_index=True)
        if pd.get_dummies(data[column]).shape[1] < dim[i]:
            lack = diff([0, dim[i]], list(pd.get_dummies(data[column]).shape))
            zeros = pd.DataFrame(np.zeros(lack))
            dummy = pd.concat([dummy, zeros], axis = 1, join = 'inner', ignore_index = True)
        data = pd.concat([data, dummy], axis = 1)
        data = data.drop(column, axis = 1)
        # data = data.merge(dummy, how = 'inner', left_index= True, right_index=True)
    return data
X = get_dummy(X, ['OverallCond', 'GarageCars', 'FullBath'], [9, 6, 5])
X.shape
X.columns
X = scaler.fit_transform(X)
X.shape
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3)
from sklearn.ensemble import GradientBoostingRegressor
reg = GradientBoostingRegressor(random_state=0)
reg.fit(X_train, y_train)
reg.score(X_test, y_test)
from sklearn.metrics import r2_score
yhat = reg.predict(X_test)
print("Mean absolute error: %.4f" % np.mean(np.absolute(yhat - y_test)))
print("Residual sum of squares (MSE): %.4f" % np.mean((yhat - y_test) ** 2))
print("R2-score: %.4f" % r2_score(yhat , y_test) )
sns.scatterplot(x = np.expm1(yhat), y = np.expm1(y_test))
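The score from a single 70/30 split can be noisy. A hedged sketch of a 5-fold cross-validated RMSE on the log target (the 'neg_root_mean_squared_error' scorer assumes a reasonably recent scikit-learn):
from sklearn.model_selection import cross_val_score
# 5-fold CV RMSE for the same gradient boosting model on the full training matrix
cv_rmse = -cross_val_score(GradientBoostingRegressor(random_state=0), X, y,
                           cv=5, scoring='neg_root_mean_squared_error')
print("CV RMSE: %.4f (+/- %.4f)" % (cv_rmse.mean(), cv_rmse.std()))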
from sklearn.linear_model import Ridge
clf = Ridge(alpha = 8)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)
yhat = clf.predict(X_test)
print("Mean absolute error: %.4f" % np.mean(np.absolute(yhat - y_test)))
print("Residual sum of squares (MSE): %.4f" % np.mean((yhat - y_test) ** 2))
print("R2-score: %.4f" % r2_score(yhat , y_test) )
from sklearn.linear_model import Lasso
lasso = Lasso(alpha = 0.000002, max_iter = 10000).fit(X_train, y_train)
lasso.score(X_test, y_test)
yhat = lasso.predict(X_test)
print("Mean absolute error: %.4f" % np.mean(np.absolute(yhat - y_test)))
print("Residual sum of squares (MSE): %.4f" % np.mean((yhat - y_test) ** 2))
print("R2-score: %.4f" % r2_score(yhat , y_test) )
print(train.isnull().sum())
print('_'*20)
print(test.isnull().sum())
sns.distplot(test.GarageCars)
sns.distplot(test.TotalBsmtSF)
test.fillna(value = 0, inplace = True)
print(test.isnull().sum())
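Filling with 0 is the simplest option. A slightly more careful, hedged alternative (shown for comparison only, since the zero-fill above has already run) would impute GarageCars with its most frequent value and TotalBsmtSF with its median:
# Alternative imputation, illustration only:
# test['GarageCars'] = test['GarageCars'].fillna(test['GarageCars'].mode()[0])
# test['TotalBsmtSF'] = test['TotalBsmtSF'].fillna(test['TotalBsmtSF'].median())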
train.groupby(by = 'GarageCars').count()
test.groupby(by = 'GarageCars').count()
train.groupby(by = 'FullBath').count()
test.groupby(by = 'FullBath').count()
# x_test = get_dummy(test, ['OverallCond', 'GarageCars', 'FullBath'], [9, 6, 5])
x_test_2 = pd.get_dummies(test, columns = ['OverallCond', 'GarageCars', 'FullBath'])
test
x_test_2
x_test_2 = scaler.fit_transform(x_test_2)
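Note that fit_transform here refits the scaler on the test features. The usual leakage-free pattern, sketched below with hypothetical train_matrix/test_matrix names, fits the scaler on the training features once and only transforms the test set; it assumes both frames share exactly the same columns in the same order:
# Leakage-free scaling pattern (hypothetical names, illustration only):
# scaler = StandardScaler().fit(train_matrix)
# train_scaled = scaler.transform(train_matrix)
# test_scaled = scaler.transform(test_matrix)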
from sklearn.linear_model import Lasso
lasso_2 = Lasso(alpha = 0.000002, max_iter = 10000).fit(X, y)
lasso_2.score(X, y)
yhat_final = lasso_2.predict(x_test_2)
sns.distplot(yhat_final)
dicto = {'Id': list(df_test.Id), 'SalePrice': np.expm1(yhat_final).tolist()}
len(dicto['Id'])
submission = pd.DataFrame(dicto)
submission
submission.to_csv('submission.csv',index=False)
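Before uploading, a quick sanity check that the written file matches the expected format:
# Re-read the file and confirm shape, columns, and that no predictions are missing
check = pd.read_csv('submission.csv')
print(check.shape)              # expected (1459, 2) for this competition's test set
print(check.columns.tolist())   # expected ['Id', 'SalePrice']
print(check.isnull().sum())     # expected: no missing values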