# Gradient Boosting Of Regression Trees in R

Gradient boosting is a machine learning technique for “boosting,” or improving, model performance. It works by first developing an initial model, called the base learner, using an algorithm of your choice (linear, tree, etc.).

Gradient boosting then looks at the error and develops a second model using what is called a loss function. The loss function measures the difference between the current predictions and the desired values, whether that is misclassification for classification or residual error for regression. Each additional model is fit to the errors left by the previous ones, and this process continues until the desired level of accuracy is reached.

Gradient boosting is also stochastic. This means that it randomly draws from the sample as it iterates over the data, which helps to improve accuracy and reduce error.
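The idea can be sketched by hand. Below is a minimal illustration of boosting for regression with squared-error loss, using the “rpart” package for the base learner and made-up toy data; this is only a sketch of the concept, not the internals of the “gbm” package.

```r
# A minimal sketch of gradient boosting for regression with squared-error loss.
# Each round fits a small tree to the current residuals on a random subsample
# (the "stochastic" part) and adds a shrunken copy of its predictions.
library(rpart)

set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- sin(x) + rnorm(n, sd = 0.3)           # toy data
dat <- data.frame(x = x, y = y)

shrinkage <- 0.1
n_trees   <- 100
pred <- rep(mean(dat$y), n)                # base learner: the mean

for (m in seq_len(n_trees)) {
  idx <- sample(n, size = 0.7 * n)         # stochastic subsample
  dat$resid <- dat$y - pred                # residuals = gradient of squared loss
  fit <- rpart(resid ~ x, data = dat[idx, ],
               control = rpart.control(maxdepth = 2))
  pred <- pred + shrinkage * predict(fit, dat)
}

mean((dat$y - pred)^2)                     # training MSE falls as trees are added
```

Each tree only has to explain what the ensemble so far got wrong, and the shrinkage keeps any single tree from dominating the final model.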

In this post, we will use gradient boosting for regression trees. In particular, we will use the “Sacramento” dataset from the “caret” package. Our goal is to predict a house’s price based on the available variables. Below is some initial code.

```r
library(caret); library(gbm); library(corrplot)
data("Sacramento")
str(Sacramento)
```

```
## 'data.frame':    932 obs. of  9 variables:
##  $ city     : Factor w/ 37 levels "ANTELOPE","AUBURN",..: 34 34 34 34 34 34 34 34 29 31 ...
##  $ zip      : Factor w/ 68 levels "z95603","z95608",..: 64 52 44 44 53 65 66 49 24 25 ...
##  $ beds     : int  2 3 2 2 2 3 3 3 2 3 ...
##  $ baths    : num  1 1 1 1 1 1 2 1 2 2 ...
##  $ sqft     : int  836 1167 796 852 797 1122 1104 1177 941 1146 ...
##  $ type     : Factor w/ 3 levels "Condo","Multi_Family",..: 3 3 3 3 3 1 3 3 1 3 ...
##  $ price    : int  59222 68212 68880 69307 81900 89921 90895 91002 94905 98937 ...
##  $ latitude : num  38.6 38.5 38.6 38.6 38.5 ...
##  $ longitude: num  -121 -121 -121 -121 -121 ...
```

## Data Preparation

There are already some actions that need to be taken. We need to remove the variables “city” and “zip” because they both have a large number of factor levels. Next, we need to remove “latitude” and “longitude” because these values are hard to interpret in a housing price model. Let’s run the correlations before removing this information.

```r
corrplot(cor(Sacramento[,c(-1,-2,-6)]), method = 'number')
```

There also appears to be a high correlation between “sqft” and both “beds” and “baths”. As such, we will remove “sqft” from the model. Below is the code for the revised set of variables remaining in the model.

```r
sacto.clean <- Sacramento
sacto.clean[,c(1,2,5)] <- NULL
sacto.clean[,c(5,6)] <- NULL
str(sacto.clean)
```

```
## 'data.frame':    932 obs. of  4 variables:
##  $ beds : int  2 3 2 2 2 3 3 3 2 3 ...
##  $ baths: num  1 1 1 1 1 1 2 1 2 2 ...
##  $ type : Factor w/ 3 levels "Condo","Multi_Family",..: 3 3 3 3 3 1 3 3 1 3 ...
##  $ price: int  59222 68212 68880 69307 81900 89921 90895 91002 94905 98937 ...
```

We will now develop our training and testing sets.

```r
set.seed(502)
ind <- sample(2, nrow(sacto.clean), replace = TRUE, prob = c(.7, .3))
train <- sacto.clean[ind == 1, ]
test <- sacto.clean[ind == 2, ]
```
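As an aside, caret also provides its own splitting function, “createDataPartition”, which produces a stratified split on the outcome rather than a purely random one. A quick sketch on made-up data (the data frame here is hypothetical, not the Sacramento data):

```r
library(caret)

# Hypothetical data frame for illustration
set.seed(502)
df <- data.frame(y = rnorm(100), x = runif(100))

# Indices for a roughly 70/30 split, stratified on the outcome y
idx <- createDataPartition(df$y, p = 0.7, list = FALSE)
train_df <- df[idx, ]
test_df  <- df[-idx, ]
nrow(train_df); nrow(test_df)
```

Either approach works; the sample-based split above is simply a common base-R idiom.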

We need to create a grid in order to develop the many different potential models available. We have to tune three different parameters for gradient boosting: the number of trees, the interaction depth, and the shrinkage. The number of trees is how many trees gradient boosting will make, the interaction depth is the number of splits in each tree, and the shrinkage controls the contribution of each tree to the final model. We also have to determine the type of cross-validation using the “trainControl” function. Below is the code for the grid.

```r
grid <- expand.grid(.n.trees = seq(100, 500, by = 200),
                    .interaction.depth = seq(1, 4, by = 1),
                    .shrinkage = c(.001, .01, .1),
                    .n.minobsinnode = 10)
control <- trainControl(method = "CV")
```
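Since “expand.grid” produces every combination of the tuning values, it is worth checking how many candidate models the grid implies before training. With 3 tree counts, 4 depths, 3 shrinkage rates, and a fixed minimum node size, that is 36 models, each evaluated by cross-validation:

```r
# Every combination of the tuning values: 3 * 4 * 3 * 1 = 36 candidate models
grid <- expand.grid(.n.trees = seq(100, 500, by = 200),
                    .interaction.depth = seq(1, 4, by = 1),
                    .shrinkage = c(.001, .01, .1),
                    .n.minobsinnode = 10)
nrow(grid)  # 36
```

Keeping an eye on this count matters because training time grows with the grid size multiplied by the number of cross-validation folds.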

## Model Training

We can now train our model.

```r
gbm.train <- train(price ~ ., data = train, method = 'gbm',
                   trControl = control, tuneGrid = grid)
gbm.train
```

```
Stochastic Gradient Boosting

685 samples
4 predictors

No pre-processing
Resampling: Cross-Validated (25 fold)
Summary of sample sizes: 659, 657, 658, 657, 657, 657, ...
Resampling results across tuning parameters:

shrinkage  interaction.depth  n.trees  RMSE       Rsquared
0.001      1                  100      128372.32  0.4850879
0.001      1                  300      120272.16  0.4965552
0.001      1                  500      113986.08  0.5064680
0.001      2                  100      127197.20  0.5463527
0.001      2                  300      117228.42  0.5524074
0.001      2                  500      109634.39  0.5566431
0.001      3                  100      126633.35  0.5646994
0.001      3                  300      115873.67  0.5707619
0.001      3                  500      107850.02  0.5732942
0.001      4                  100      126361.05  0.5740655
0.001      4                  300      115269.63  0.5767396
0.001      4                  500      107109.99  0.5799836
0.010      1                  100      103554.11  0.5286663
0.010      1                  300       90114.05  0.5728993
0.010      1                  500       88327.15  0.5838981
0.010      2                  100       97876.10  0.5675862
0.010      2                  300       88260.16  0.5864650
0.010      2                  500       86773.49  0.6007150
0.010      3                  100       96138.06  0.5778062
0.010      3                  300       87213.34  0.5975438
0.010      3                  500       86309.87  0.6072987
0.010      4                  100       95260.93  0.5861798
0.010      4                  300       86962.20  0.6011429
0.010      4                  500       86380.39  0.6082593
0.100      1                  100       86808.91  0.6022690
0.100      1                  300       86081.65  0.6100963
0.100      1                  500       86197.52  0.6081493
0.100      2                  100       86810.97  0.6036919
0.100      2                  300       87251.66  0.6042293
0.100      2                  500       88396.21  0.5945206
0.100      3                  100       86649.14  0.6088309
0.100      3                  300       88565.35  0.5942948
0.100      3                  500       89971.44  0.5849622
0.100      4                  100       86922.22  0.6037571
0.100      4                  300       88629.92  0.5894188
0.100      4                  500       91008.39  0.5718534

Tuning parameter 'n.minobsinnode' was held constant at a value of 10
RMSE was used to select the optimal model using the smallest value.
The final values used for the model were n.trees = 300, interaction.depth = 1, shrinkage = 0.1 and n.minobsinnode = 10.
```

The printout shows the RMSE and R-squared values for each candidate model. At the bottom of the printout are the recommended parameters, which we use to create our final model for the test data.

```r
gbm.price <- gbm(price ~ ., data = train, n.trees = 300,
                 interaction.depth = 1, shrinkage = .1,
                 distribution = 'gaussian')
```
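One useful feature of gbm models is that “summary” reports the relative influence of each predictor, which helps interpret what the ensemble relies on. A small self-contained sketch on made-up data (the data frame and variable names here are hypothetical):

```r
library(gbm)

# Hypothetical data where x1 drives the outcome and x2 is noise
set.seed(1)
df <- data.frame(x1 = runif(100), x2 = runif(100))
df$y <- 3 * df$x1 + rnorm(100, sd = 0.1)

fit <- gbm(y ~ ., data = df, distribution = "gaussian",
           n.trees = 100, interaction.depth = 1, shrinkage = 0.1)

# Relative influence of each predictor, sorted in decreasing order
s <- summary(fit, plotit = FALSE)
s  # x1 should dominate
```

Running the same call on the fitted price model would show how much “beds”, “baths”, and “type” each contribute.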

## Test Model

Now we use the test data. Below we make predictions, calculate the error, and create a plot.

```r
gbm.test <- predict(gbm.price, newdata = test, n.trees = 300)
gbm.resid <- gbm.test - test$price
mean(gbm.resid^2)
```

```
##  8721772767
```
```r
plot(gbm.test, test$price)
```

The actual value of the mean squared error is relative and means little by itself; it is only useful when comparing one model to another. The plot, however, looks good and indicates that our model may be doing well.
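One way to give the mean squared error some context is to compare it against a naive baseline, such as always predicting the mean price. A tiny sketch with hypothetical numbers (the prices and predictions below are made up for illustration):

```r
# Mean squared error helper, used to compare any model against a baseline
mse <- function(pred, actual) mean((pred - actual)^2)

actual   <- c(200000, 250000, 300000)       # hypothetical test prices
model    <- c(210000, 240000, 310000)       # hypothetical model predictions
baseline <- rep(mean(actual), 3)            # naive model: always predict the mean

mse(model, actual)                          # 1e8
mse(baseline, actual)                       # about 1.67e9
mse(model, actual) < mse(baseline, actual)  # TRUE: the model beats the baseline
```

Applying the same comparison to the boosted model, with the training-set mean of “price” as the baseline, would show whether the 8721772767 figure above actually represents an improvement.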