Linear Regression vs Bayesian Regression

In this post, we are going to look at Bayesian regression. In particular, we will compare the results of ordinary least squares regression with Bayesian regression.

Bayesian Statistics

Bayesian statistics involves the use of probabilities rather than frequencies when addressing uncertainty. This allows you to determine the distribution of the model parameters and not only the values. This is done through averaging over the model parameters through marginalizing the joint probability distribution.

Linear Regression

We will now develop our two models. The first model will be a normal regression and the second a Bayesian model. We will be looking at factors that affect the tax rate of homes in the “Hedonic” dataset in the “Ecdat” package. We will load our packages and partition our data. Below is some initial code

library(ISLR);library(caret);library(arm);library(Ecdat);library(gridExtra)
data("Hedonic")
inTrain<-createDataPartition(y=Hedonic$tax,p=0.7, list=FALSE)
trainingset <- Hedonic[inTrain, ]
testingset <- Hedonic[-inTrain, ]
str(Hedonic)
## 'data.frame':    506 obs. of  15 variables:
##  $ mv     : num  10.09 9.98 10.45 10.42 10.5 ...
##  $ crim   : num  0.00632 0.02731 0.0273 0.03237 0.06905 ...
##  $ zn     : num  18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
##  $ indus  : num  2.31 7.07 7.07 2.18 2.18 ...
##  $ chas   : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ nox    : num  28.9 22 22 21 21 ...
##  $ rm     : num  43.2 41.2 51.6 49 51.1 ...
##  $ age    : num  65.2 78.9 61.1 45.8 54.2 ...
##  $ dis    : num  1.41 1.6 1.6 1.8 1.8 ...
##  $ rad    : num  0 0.693 0.693 1.099 1.099 ...
##  $ tax    : int  296 242 242 222 222 222 311 311 311 311 ...
##  $ ptratio: num  15.3 17.8 17.8 18.7 18.7 ...
##  $ blacks : num  0.397 0.397 0.393 0.395 0.397 ...
##  $ lstat  : num  -3 -2.39 -3.21 -3.53 -2.93 ...
##  $ townid : int  1 2 2 3 3 3 4 4 4 4 ...

We will now create our regression model

ols.reg<-lm(tax~.,trainingset)
summary(ols.reg)
## 
## Call:
## lm(formula = tax ~ ., data = trainingset)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -180.898  -35.276    2.731   33.574  200.308 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 305.1928   192.3024   1.587  0.11343    
## mv          -41.8746    18.8490  -2.222  0.02697 *  
## crim          0.3068     0.6068   0.506  0.61339    
## zn            1.3278     0.2006   6.618 1.42e-10 ***
## indus         7.0685     0.8786   8.045 1.44e-14 ***
## chasyes     -17.0506    15.1883  -1.123  0.26239    
## nox           0.7005     0.4797   1.460  0.14518    
## rm           -0.1840     0.5875  -0.313  0.75431    
## age           0.3054     0.2265   1.349  0.17831    
## dis          -7.4484    14.4654  -0.515  0.60695    
## rad          98.9580     6.0964  16.232  < 2e-16 ***
## ptratio       6.8961     2.1657   3.184  0.00158 ** 
## blacks      -29.6389    45.0043  -0.659  0.51061    
## lstat       -18.6637    12.4674  -1.497  0.13532    
## townid        1.1142     0.1649   6.758 6.07e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 63.72 on 341 degrees of freedom
## Multiple R-squared:  0.8653, Adjusted R-squared:  0.8597 
## F-statistic: 156.4 on 14 and 341 DF,  p-value: < 2.2e-16

The model does a reasonable job. Next, we will do our prediction and compare the results with the test set using correlation, summary statistics, and the mean absolute error. In the code below, we use the “predict.lm” function and include the arguments “interval” for the prediction as well as “se.fit” for the standard error

ols.regTest<-predict.lm(ols.reg,testingset,interval = 'prediction',se.fit = T)

Below is the code for the correlation, summary stats, and mean absolute error. For MAE, we need to create a function.

cor(testingset$tax,ols.regTest$fit[,1])
## [1] 0.9313795
summary(ols.regTest$fit[,1])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   144.7   288.3   347.6   399.4   518.4   684.1
summary(trainingset$tax)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   188.0   279.0   330.0   410.4   666.0   711.0
MAE<-function(actual, predicted){
        mean(abs(actual-predicted))
}
MAE(ols.regTest$fit[,1], testingset$tax)
## [1] 41.07212

The correlation is excellent. The summary stats are similar and the error is not unreasonable. Below is a plot of the actual and predicted values

We now need to combine some data into one dataframe. In particular, we need the following actual dependent variable results predicted dependent variable results The upper confidence value of the prediction THe lower confidence value of the prediction

The code is below

yout.ols <- as.data.frame(cbind(testingset$tax,ols.regTest$fit))
ols.upr <- yout.ols$upr
ols.lwr <- yout.ols$lwr

We can now plot this

p.ols <- ggplot(data = yout.ols, aes(x = testingset$tax, y = ols.regTest$fit[,1])) + geom_point() + ggtitle("Ordinary Regression") + labs(x = "Actual", y = "Predicted")
p.ols + geom_errorbar(ymin = ols.lwr, ymax = ols.upr)

1.png

You can see the strong linear relationship. However, the confidence intervals are rather wide. Let’s see how Bayes does.

Bayes Regression

Bayes regression uses the “bayesglm” function from the “arm” package. We need to set the family to “gaussian” and the link to “identity”. In addition, we have to set the “prior.df” (prior degrees of freedom) to infinity as this indicates we want a normal distribution

bayes.reg<-bayesglm(tax~.,family=gaussian(link=identity),trainingset,prior.df = Inf)
bayes.regTest<-predict.glm(bayes.reg,newdata = testingset,se.fit = T)
cor(testingset$tax,bayes.regTest$fit)
## [1] 0.9313793
summary(bayes.regTest$fit)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   144.7   288.3   347.5   399.4   518.4   684.1
summary(trainingset$tax)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   188.0   279.0   330.0   410.4   666.0   711.0
MAE(bayes.regTest$fit, testingset$tax)
## [1] 41.07352

The numbers are essentially the same. This leads to the question of what is the benefit of Bayesian regression? The answer is in the confidence intervals. Next, we will calculate the confidence intervals for the Bayesian model.

yout.bayes <- as.data.frame(cbind(testingset$tax,bayes.regTest$fit))
names(yout.bayes) <- c("tax", "fit")
critval <- 1.96 #approx for 95% CI
bayes.upr <- bayes.regTest$fit + critval * bayes.regTest$se.fit
bayes.lwr <- bayes.regTest$fit - critval * bayes.regTest$se.fit

We now create our Bayesian regression plot.

p.bayes <- ggplot(data = yout.bayes, aes(x = yout.bayes$tax, y = yout.bayes$fit)) + geom_point() + ggtitle("Bayesian Regression Prediction") + labs(x = "Actual", y = "Predicted")

Lastly, we display both plots as a comparison.

ols.plot <-  p.ols + geom_errorbar(ymin = ols.lwr, ymax = ols.upr)
bayes.plot <-  p.bayes + geom_errorbar(ymin = bayes.lwr, ymax = bayes.upr)
grid.arrange(ols.plot,bayes.plot,ncol=2)

1

As you can see, the Bayesian approach gives much more compact confidence intervals. This is because the Bayesian approach a distribution of parameters is calculated from a posterior distribution. These values are then averaged to get the final prediction that appears on the plot. This reduces the variance and strengthens the confidence we can have in each individual example.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.