Statistical Learning

Statistical learning is a discipline that focuses on understanding data. That understanding can come from predicting a category or a numeric value, which is called supervised learning, or from finding patterns in data, which is called unsupervised learning.

In this post, we will examine the following:

  • History of statistical learning
  • The purpose of statistical learning
  • Statistical learning vs Machine learning

History of Statistical Learning

The early pioneers of statistical learning focused primarily on supervised learning. Linear regression, in the form of the method of least squares, was developed in the 19th century by Legendre and Gauss. In the 1930s, Fisher created linear discriminant analysis. Logistic regression was proposed in the 1940s as an alternative to linear discriminant analysis.

The developments from the 19th century to the mid-20th century were limited by a lack of computational power. However, by the 1970s things began to change and new algorithms emerged, specifically ones that could handle non-linear relationships.

In the 1980s, Breiman, Friedman, Olshen, and Stone developed classification and regression trees. The term generalized additive models was first used by Hastie and Tibshirani for a class of non-linear extensions to generalized linear models.

Purpose of Statistical Learning

The primary goal of statistical learning is to build a model from the data you currently have in order to make decisions about the future. In supervised learning with a numeric dependent variable, a teacher may have data on their students and want to predict future academic performance. With a categorical dependent variable, a doctor may use patient data to predict whether someone has cancer or not. In both situations, the goal is to use what one knows to predict what one does not know.
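To make the numeric case concrete, here is a minimal sketch of the teacher example using simple linear regression fitted by least squares. The data (hours studied and exam scores) and the function name fit_line are made up for illustration and are not from any real dataset or library.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: what the teacher already knows about past students
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [52, 57, 61, 68, 72]

slope, intercept = fit_line(hours_studied, exam_scores)

# Use the fitted model to predict what the teacher does not know yet:
# the score of a student who studies 6 hours
predicted = intercept + slope * 6

print(f"score = {intercept:.1f} + {slope:.1f} * hours")
print(f"predicted score for 6 hours: {predicted:.1f}")
```

The slope is also what someone interested in explanation would look at, since it estimates how much the score changes for each additional hour studied.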

A distinctive characteristic of supervised learning is that the purpose can be either to predict future values or to explain the relationship between the dependent variable and one or more independent variables. Generally, data science is much more focused on prediction, while the social sciences seem more concerned with explanation.

For unsupervised learning, there is no dependent variable. As a practical example, a company may want to use its data to identify several distinct categories of customers. Understanding how large groups of customers behave can allow the company to adjust its marketing strategy to cater to the different needs of its clientele.
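The customer example can be sketched with k-means clustering, one common unsupervised method. This is only an illustration: the customer data (visits per month, average spend) is invented, the function name k_means is hypothetical, and the starting centroids are chosen by hand, whereas real implementations usually pick them randomly or with a smarter scheme.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_means(points, initial_centroids, max_iter=100):
    """Group points into clusters; no dependent variable is involved."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we are done
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return labels, centroids


# Hypothetical customers: (visits per month, average spend per visit)
customers = [(2, 15), (3, 18), (2, 20), (10, 80), (11, 85), (12, 78)]

# Assumption: start from the first and last customer as initial centroids
labels, centroids = k_means(customers, [customers[0], customers[-1]])

print(labels)
print(centroids)
```

Each label marks the customer group a point was assigned to, and each centroid describes the typical customer in that group, which is the kind of summary a marketing team could act on.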

Statistical Learning vs Machine Learning

The difference between statistical learning and machine learning is so small that it rarely matters to the average person. Although some may disagree, the two terms mean essentially the same thing. Often statisticians speak of statistical learning while computer scientists speak of machine learning.

Machine learning is the more popular term, perhaps because it is easier to picture a machine learning than statistics learning.

Conclusion

Statistical or machine learning is a major force in the world today. With so much data and so much computing power, the possibilities are endless in terms of what kind of beneficial information can be gleaned. However, all this began with people creating a simple linear model in the 19th century.
