In this post, we are going to look at Bayesian regression. In particular, we will compare the results of ordinary least squares regression with Bayesian regression.
Bayesian Statistics
Bayesian statistics addresses uncertainty with probability distributions rather than long-run frequencies. This allows you to determine the full distribution of the model parameters and not only point estimates. Predictions are then made by averaging over the model parameters, that is, by marginalizing them out of the joint probability distribution.
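In symbols, the prediction for a new observation comes from the posterior predictive distribution, which averages the likelihood over the posterior of the parameters given the observed data (written here in generic form, not tied to the regression model below):

$$p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \theta)\, p(\theta \mid y)\, d\theta$$

Here \(\tilde{y}\) is the new observation, \(y\) is the observed data, and \(\theta\) collects the model parameters.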
Linear Regression
We will now develop our two models. The first model will be an ordinary least squares regression and the second a Bayesian model. We will be looking at factors that affect the tax rate of homes in the “Hedonic” dataset from the “Ecdat” package. We will load our packages and partition our data. Below is some initial code.
library(ISLR);library(caret);library(arm);library(Ecdat);library(gridExtra)
data("Hedonic")
inTrain<-createDataPartition(y=Hedonic$tax,p=0.7, list=FALSE)
trainingset <- Hedonic[inTrain, ]
testingset <- Hedonic[-inTrain, ]
str(Hedonic)
## 'data.frame': 506 obs. of 15 variables:
## $ mv : num 10.09 9.98 10.45 10.42 10.5 ...
## $ crim : num 0.00632 0.02731 0.0273 0.03237 0.06905 ...
## $ zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
## $ indus : num 2.31 7.07 7.07 2.18 2.18 ...
## $ chas : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ nox : num 28.9 22 22 21 21 ...
## $ rm : num 43.2 41.2 51.6 49 51.1 ...
## $ age : num 65.2 78.9 61.1 45.8 54.2 ...
## $ dis : num 1.41 1.6 1.6 1.8 1.8 ...
## $ rad : num 0 0.693 0.693 1.099 1.099 ...
## $ tax : int 296 242 242 222 222 222 311 311 311 311 ...
## $ ptratio: num 15.3 17.8 17.8 18.7 18.7 ...
## $ blacks : num 0.397 0.397 0.393 0.395 0.397 ...
## $ lstat : num -3 -2.39 -3.21 -3.53 -2.93 ...
## $ townid : int 1 2 2 3 3 3 4 4 4 4 ...
We will now create our regression model.
ols.reg<-lm(tax~.,trainingset)
summary(ols.reg)
##
## Call:
## lm(formula = tax ~ ., data = trainingset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -180.898 -35.276 2.731 33.574 200.308
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 305.1928 192.3024 1.587 0.11343
## mv -41.8746 18.8490 -2.222 0.02697 *
## crim 0.3068 0.6068 0.506 0.61339
## zn 1.3278 0.2006 6.618 1.42e-10 ***
## indus 7.0685 0.8786 8.045 1.44e-14 ***
## chasyes -17.0506 15.1883 -1.123 0.26239
## nox 0.7005 0.4797 1.460 0.14518
## rm -0.1840 0.5875 -0.313 0.75431
## age 0.3054 0.2265 1.349 0.17831
## dis -7.4484 14.4654 -0.515 0.60695
## rad 98.9580 6.0964 16.232 < 2e-16 ***
## ptratio 6.8961 2.1657 3.184 0.00158 **
## blacks -29.6389 45.0043 -0.659 0.51061
## lstat -18.6637 12.4674 -1.497 0.13532
## townid 1.1142 0.1649 6.758 6.07e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 63.72 on 341 degrees of freedom
## Multiple R-squared: 0.8653, Adjusted R-squared: 0.8597
## F-statistic: 156.4 on 14 and 341 DF, p-value: < 2.2e-16
The model does a reasonable job. Next, we will make predictions and compare the results with the test set using correlation, summary statistics, and the mean absolute error (MAE). In the code below, we use the “predict.lm” function with the “interval” argument set to obtain a prediction interval and “se.fit” set to return the standard errors.
ols.regTest<-predict.lm(ols.reg,testingset,interval = 'prediction',se.fit = T)
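Because we asked for a prediction interval and standard errors, the result is a list whose “fit” element is a matrix with columns fit, lwr, and upr, and whose “se.fit” element holds the standard errors. A quick way to confirm the structure (output not shown) is below.

str(ols.regTest, max.level = 1) # list with $fit (matrix: fit, lwr, upr), $se.fit, $df, $residual.scale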
Below is the code for the correlation, summary stats, and mean absolute error. For MAE, we need to create a function.
cor(testingset$tax,ols.regTest$fit[,1])
## [1] 0.9313795
summary(ols.regTest$fit[,1])
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 144.7 288.3 347.6 399.4 518.4 684.1
summary(trainingset$tax)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 188.0 279.0 330.0 410.4 666.0 711.0
MAE<-function(actual, predicted){
mean(abs(actual-predicted))
}
MAE(ols.regTest$fit[,1], testingset$tax)
## [1] 41.07212
The correlation is excellent, the summary statistics are similar, and the error is not unreasonable. Next, we plot the actual against the predicted values.
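A minimal base-R sketch is shown below; any scatterplot of actual against predicted values works equally well.

plot(testingset$tax, ols.regTest$fit[,1],
     xlab = "Actual tax", ylab = "Predicted tax",
     main = "OLS: actual vs. predicted")
abline(0, 1) # points on this line are predicted perfectly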
We now need to combine some data into one dataframe. In particular, we need the actual values of the dependent variable, the predicted values, and the upper and lower bounds of the prediction interval. The code is below.
yout.ols <- as.data.frame(cbind(testingset$tax,ols.regTest$fit))
ols.upr <- yout.ols$upr
ols.lwr <- yout.ols$lwr
We can now plot this.
p.ols <- ggplot(data = yout.ols, aes(x = testingset$tax, y = ols.regTest$fit[,1])) +
  geom_point() +
  ggtitle("Ordinary Regression") +
  labs(x = "Actual", y = "Predicted")
p.ols + geom_errorbar(ymin = ols.lwr, ymax = ols.upr)
You can see the strong linear relationship. However, the prediction intervals are rather wide. Let’s see how the Bayesian model does.
Bayes Regression
Bayesian regression uses the “bayesglm” function from the “arm” package. We need to set the family to “gaussian” and the link to “identity”. In addition, we set “prior.df” (the prior degrees of freedom) to infinity, which turns the default t-distributed prior on the coefficients into a normal prior.
bayes.reg<-bayesglm(tax~.,family=gaussian(link=identity),trainingset,prior.df = Inf)
bayes.regTest<-predict.glm(bayes.reg,newdata = testingset,se.fit = T)
cor(testingset$tax,bayes.regTest$fit)
## [1] 0.9313793
summary(bayes.regTest$fit)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 144.7 288.3 347.5 399.4 518.4 684.1
summary(trainingset$tax)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 188.0 279.0 330.0 410.4 666.0 711.0
MAE(bayes.regTest$fit, testingset$tax)
## [1] 41.07352
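Since “prior.df = Inf” gives a weakly informative normal prior, we would also expect the Bayesian coefficient estimates to sit very close to the OLS estimates. A quick side-by-side check (output not shown) is below.

round(cbind(OLS = coef(ols.reg), Bayes = coef(bayes.reg)), 4) # coefficient estimates side by side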
The numbers are essentially the same. This raises the question: what is the benefit of Bayesian regression? The answer lies in the confidence intervals. Next, we will calculate the confidence intervals for the Bayesian predictions.
yout.bayes <- as.data.frame(cbind(testingset$tax,bayes.regTest$fit))
names(yout.bayes) <- c("tax", "fit")
critval <- 1.96 #approx for 95% CI
bayes.upr <- bayes.regTest$fit + critval * bayes.regTest$se.fit
bayes.lwr <- bayes.regTest$fit - critval * bayes.regTest$se.fit
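If you prefer not to hard-code the 1.96, the critical value can be taken from the normal quantile function instead (substitute this line before computing the bounds above):

critval <- qnorm(0.975) # two-sided 95% critical value, approximately 1.96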
We now create our Bayesian regression plot.
p.bayes <- ggplot(data = yout.bayes, aes(x = tax, y = fit)) +
  geom_point() +
  ggtitle("Bayesian Regression Prediction") +
  labs(x = "Actual", y = "Predicted")
Lastly, we display both plots as a comparison.
ols.plot <- p.ols + geom_errorbar(ymin = ols.lwr, ymax = ols.upr)
bayes.plot <- p.bayes + geom_errorbar(ymin = bayes.lwr, ymax = bayes.upr)
grid.arrange(ols.plot,bayes.plot,ncol=2)
As you can see, the Bayesian approach gives much more compact confidence intervals. This is because, in the Bayesian approach, a distribution of parameter values is calculated from the posterior distribution. These values are then averaged to obtain the final prediction that appears on the plot. This reduces the variance and strengthens the confidence we can have in each individual prediction.
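To make this averaging concrete, below is a minimal sketch using the “sim” function from the “arm” package. It draws coefficient vectors from the approximate posterior, forms one prediction per draw, and then averages over the draws; the columns line up here because the design matrix and the model use the same formula.

sims <- sim(bayes.reg, n.sims = 1000) # draw coefficient vectors from the posterior
X.test <- model.matrix(tax ~ ., data = testingset) # test-set design matrix
pred.draws <- X.test %*% t(sims@coef) # one column of predicted tax values per draw
pred.mean <- rowMeans(pred.draws) # average over the draws
head(cbind(averaged = pred.mean, plugin = bayes.regTest$fit)) # compare to the point prediction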