One problem with least squares regression is deciding which variables to keep in a model. One solution is the use of shrinkage methods. Shrinkage regression involves constraining, or regularizing, the coefficient estimates toward zero. The benefit is an efficient way to either remove variables from a model or significantly reduce the influence of less important ones.
In this post, we will look at two common forms of regularization: ridge and lasso.
Ridge
Ridge regression includes a tuning parameter, called lambda, that can be used to shrink weak coefficients almost to zero. This shrinkage penalty helps with the bias-variance trade-off. Lambda can be set to any value from 0 to infinity. A lambda of 0 is the same as least squares regression, while a lambda of infinity produces a null model. The ridge penalty is based on the "l2 norm" of the coefficients.
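To make this concrete, here is a minimal sketch using scikit-learn, where the lambda tuning parameter is exposed as `alpha`; the synthetic dataset and the alpha values are illustrative assumptions, not results from any particular study.

```python
# Illustrative sketch: how ridge shrinks coefficients as lambda grows.
# scikit-learn calls the lambda tuning parameter `alpha`.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic data, purely for demonstration.
X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)

for alpha in [0.01, 1.0, 100.0, 10000.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    # As alpha grows, coefficients shrink toward (but never exactly to) zero;
    # as alpha approaches 0, the fit approaches ordinary least squares.
    print(f"alpha={alpha:>8}: mean |coefficient| = {np.abs(model.coef_).mean():.3f}")
```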
Finding the right value of lambda is the primary goal when using this algorithm. This involves running models with several values of lambda and seeing which returns the best results on predetermined metrics.
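One common way to run that search is cross-validation over a grid of candidate lambdas. A sketch using scikit-learn's RidgeCV, where the grid itself is an illustrative assumption:

```python
# Illustrative sketch: choosing lambda by cross-validation.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=100, n_features=10, noise=10.0, random_state=0)

# Candidate lambda (alpha) values from 0.001 to 1000; an assumed grid.
alphas = np.logspace(-3, 3, 13)
model = RidgeCV(alphas=alphas, cv=5).fit(X, y)
print("best lambda (alpha):", model.alpha_)
```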
The primary problem with ridge regression is that it does not actually remove any variables from the model. As such, the predictions might be excellent, but explanatory power is not improved when there are a large number of variables.
Lasso
Lasso regression has the same characteristics as ridge with one exception: the lasso algorithm can actually remove variables by setting their coefficients exactly to zero. This means lasso regression models are usually superior in terms of interpretability and explanation. The lasso penalty is based on the "l1 norm" of the coefficients.
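A minimal sketch of that selection behavior, again with scikit-learn and an assumed dataset in which only a few predictors carry signal:

```python
# Illustrative sketch: lasso sets some coefficients exactly to zero.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Only 3 of the 10 features are truly informative; the rest are noise.
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)

model = Lasso(alpha=5.0).fit(X, y)  # alpha (lambda) chosen for illustration
print("coefficients:", np.round(model.coef_, 2))
print("variables removed:", int(np.sum(model.coef_ == 0.0)))
```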
It is not always clear whether lasso or ridge is superior. Normally, if the goal is explanation, lasso is often stronger. However, if the goal is prediction, ridge may be an improvement, but not always.
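In practice, that question is usually settled empirically. One common approach, sketched below with assumed data and untuned alpha values, is to compare the two models on cross-validated error:

```python
# Illustrative sketch: comparing ridge and lasso by cross-validation.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)

for name, model in [("ridge", Ridge(alpha=1.0)), ("lasso", Lasso(alpha=1.0))]:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean cross-validated R^2 = {scores.mean():.3f}")
```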
Conclusion
Shrinkage methods are not limited to regression. Many other forms of analysis, such as artificial neural networks, can employ shrinkage, and most machine learning models can accommodate it.
Generally, ridge and lasso regression are employed when you have a large number of predictors as well as a large dataset. The primary goal is the simplification of an overly complex model. The shrinkage methods discussed here are therefore additional tools for building statistical models in regression.