Regression with Shrinkage Methods

One problem with least squares regression is determining what variables to keep in a model. One solution to this problem is the use of shrinkage methods. Shrinkage regression involves constraining or regularizing the coefficient estimates towards zero. The benefit of this is that it is an efficient way to either remove variables from a model or significantly reduce the influence of less important variables.

In this post, we will look at two common forms of regularization and these are.

Ridge

Ridge regression includes a tuning parameter called lambda that can be used to reduce to weak coefficients almost to zero. This shrinkage penalty helps with the bias-variance trade-off. Lambda can be set to any value from 0 to infinity. A lambda set to 0 is the same as least square regression while a lambda set to infinity will produce a null model. The technical term for lambda when ridge is used is the “l2 norm”

Finding the right value of lambda is the primary goal when using this algorithm,. Finding it involves running models with several values of lambda and seeing which returns the best results on predetermined metrics.

The primary problem with ridge regression is that it does not actually remove any variables from the model. As such, the prediction might be excellent but explanatory power is not improve if there are a large number of variables.

Lasso

Lasso regression has the same characteristics as Ridge with one exception. The one exception is the Lasso algorithm can actually remove variables by setting them to zero. This means that lasso regression models are usually superior in terms of the ability to interpret and explain them. The technical term for lambda when lasso is used is the “l1 norm.”

It is not clear when lasso or ridge is superior. Normally, if the goal is explanatory lasso is often stronger. However, if the goal is prediction, ridge may be an improvement but not always.

Conclusion

Shrinkage methods are not limited to regression. Many other forms of analysis can employ shrinkage such as artificial neural networks. Most machine learning models can accommodate shrinkage.

Generally, ridge and lasso regression is employed when you have a huge number of predictors as well as a larger dataset. The primary goal is the simplification of an overly complex model. Therefore, the shrinkage methods mentioned here are additional ways to use statistical models in regression.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s