Machine learning is used for a variety of tasks today, with a multitude of algorithms that can each do one or more of these tasks well. In this post, we will look at some of the most common tasks that machine learning algorithms perform. In particular, we will look at the following tasks:
- Regression
- Classification
- Forecasting
- Clustering
- Association rules
- Dimension reduction
Regression, classification, and forecasting are examples of supervised learning, which is learning that involves a dependent variable. Clustering, association rules, and dimension reduction are unsupervised, meaning the learning does not involve a clearly labeled dependent variable.
Regression involves understanding the relationship between a continuous dependent variable and one or more independent variables, which may be categorical or continuous. Understanding this relationship allows for numeric prediction of the dependent variable.
Example algorithms for regression include linear regression, random forests used for numeric prediction, support vector machines, and artificial neural networks.
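To make the idea of numeric prediction concrete, here is a minimal sketch of simple linear regression with a single predictor, fitted by ordinary least squares in plain Python. The data (hours studied and exam scores) and the function name are made up for illustration and are not from any particular library.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data: the scores follow 50 + 2 * hours exactly.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 54.0, 56.0, 58.0, 60.0]

intercept, slope = fit_simple_linear_regression(hours, scores)

# Numeric prediction of the continuous dependent variable for a new case.
predicted_score = intercept + slope * 6.0
print(intercept, slope, predicted_score)
```

Real data will not fall on a perfect line, and with several independent variables we would normally reach for a statistics or machine learning library rather than writing the arithmetic by hand.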
Classification involves a categorical dependent variable with continuous and/or categorical independent variables. The purpose is to assign each example to one of the groups in the dependent variable.
Examples of this are logistic regression as well as most of the algorithms mentioned under regression, such as random forests, support vector machines, and artificial neural networks. Many algorithms can do both regression and classification.
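As a rough illustration of classification, the sketch below fits a logistic regression with one predictor using batch gradient descent in plain Python. The data, the learning rate, the number of epochs, and the function names are all assumptions chosen for the example. In practice a library implementation would be used.

```python
import math
from statistics import mean, pstdev


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(x, y, learning_rate=0.5, epochs=1000):
    """Fit P(y = 1) = sigmoid(w * z + b), where z is the standardized predictor."""
    mu, sigma = mean(x), pstdev(x)
    z = [(xi - mu) / sigma for xi in x]  # standardizing helps gradient descent
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        # Difference between the predicted probability and the true label.
        errors = [sigmoid(w * zi + b) - yi for zi, yi in zip(z, y)]
        # Gradient of the mean log loss with respect to w and b.
        w -= learning_rate * sum(e * zi for e, zi in zip(errors, z)) / n
        b -= learning_rate * sum(errors) / n
    return {"w": w, "b": b, "mu": mu, "sigma": sigma}


def predict_class(model, x_new):
    """Classify a new example as 1 if its predicted probability is at least 0.5."""
    z = (x_new - model["mu"]) / model["sigma"]
    return 1 if sigmoid(model["w"] * z + model["b"]) >= 0.5 else 0


# Hypothetical data: hours studied and whether the exam was passed (1) or not (0).
study_hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

model = fit_logistic_regression(study_hours, passed)
print([predict_class(model, h) for h in study_hours])
```

The output of the model is a probability, and the 0.5 cutoff is what turns it into a group label. A different cutoff may be more suitable when one kind of error is more costly than the other.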
Forecasting is similar to regression. The difference is that the data is a time series, so the observations are ordered in time and usually depend on one another. The goal remains the same: predicting future outcomes based on currently available data. Because of the type of data involved, a slightly different approach is needed.
Common algorithms for forecasting include ARIMA and even artificial neural networks.
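A full ARIMA model is too much for a short example, so the sketch below uses a deliberately simplified stand-in: a first-order autoregressive model, AR(1), which corresponds to ARIMA(1, 0, 0). It is fitted by least squares on lagged values in plain Python. The series is generated artificially without noise, and the function names are hypothetical.

```python
def fit_ar1(series):
    """Fit y[t] = const + phi * y[t-1] by least squares on consecutive pairs."""
    previous, current = series[:-1], series[1:]
    n = len(previous)
    prev_bar = sum(previous) / n
    curr_bar = sum(current) / n
    sxy = sum((p - prev_bar) * (c - curr_bar) for p, c in zip(previous, current))
    sxx = sum((p - prev_bar) ** 2 for p in previous)
    phi = sxy / sxx
    const = curr_bar - phi * prev_bar
    return const, phi


def forecast_ar1(series, const, phi, steps):
    """Forecast future values by feeding each forecast back in as the next lag."""
    forecasts = []
    last = series[-1]
    for _ in range(steps):
        last = const + phi * last
        forecasts.append(last)
    return forecasts


# Artificial, noise-free series that follows y[t] = 2 + 0.8 * y[t-1].
series = [20.0]
for _ in range(7):
    series.append(2.0 + 0.8 * series[-1])

const, phi = fit_ar1(series)
forecasts = forecast_ar1(series, const, phi, steps=3)
print(const, phi, forecasts)
```

The predictor here is the series' own past value, which is the main way forecasting differs from ordinary regression. Shuffling the rows would destroy the information the model relies on.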
Clustering involves grouping together items that are similar in a dataset. This is done by detecting patterns in the data. The problem is that the number of clusters needed is usually not known in advance, which leads to a trial-and-error approach if there is no theoretical support for a particular number.
Common clustering algorithms include k-means and hierarchical clustering. Latent Dirichlet allocation is often used in text mining applications.
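The sketch below is a small k-means implementation in plain Python, run for several values of k to mimic the trial-and-error search for the number of clusters. The points are made up, and the starting centroids are chosen with a simple farthest-first heuristic, which is our assumption to keep the result reproducible. Library implementations typically use randomized initialization instead.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centroids(points, k):
    """Farthest-first start: begin with the first point, then repeatedly add
    the point that is farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def k_means(points, k, max_iter=100):
    """Return (labels, centroids, within-cluster sum of squares)."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    wcss = sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))
    return labels, centroids, wcss


# Hypothetical data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]

labels, centroids, _ = k_means(points, k=2)
wcss_by_k = {k: k_means(points, k)[2] for k in (1, 2, 3)}
print(labels)
print(wcss_by_k)
```

The within-cluster sum of squares always shrinks as k grows, so we look for the point where the improvement levels off rather than for the smallest value.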
Association rules find items that occur together in a dataset. A common application of association rules is market basket analysis.
Common algorithms include Apriori and FP-growth (frequent pattern growth).
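To show what an association rule looks like in a market basket setting, the sketch below counts item pairs in plain Python and reports the support and confidence of each rule. It borrows one idea from Apriori (only items that are frequent on their own are paired up), but it is limited to pairs and is not a full Apriori or FP-growth implementation. The baskets, thresholds, and function name are invented for illustration.

```python
from itertools import combinations


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Return rules (lhs, rhs, support, confidence) between pairs of items."""
    n = len(transactions)
    baskets = [set(t) for t in transactions]

    # Count how many baskets contain each single item.
    item_count = {}
    for basket in baskets:
        for item in basket:
            item_count[item] = item_count.get(item, 0) + 1

    # Pruning: a pair can only be frequent if both of its items are frequent.
    frequent_items = sorted(i for i, c in item_count.items() if c / n >= min_support)

    rules = []
    for a, b in combinations(frequent_items, 2):
        both = sum(1 for basket in baskets if a in basket and b in basket)
        support = both / n  # share of baskets that contain both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence: of the baskets with lhs, the share that also has rhs.
            confidence = both / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical shopping baskets.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "milk", "butter"],
]

rules = pair_rules(transactions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Note that a rule and its reverse can have different confidence values, because each is measured against the baskets containing its own left-hand item.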
Dimension reduction involves combining several redundant features into one or more components that capture the majority of the variance. Reducing the number of features can increase the speed of the computation as well as reduce the risk of overfitting.
In machine learning, principal component analysis is often used for dimension reduction. However, factor analysis is sometimes used as well.
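The sketch below illustrates principal component analysis on just two redundant features, using the closed-form eigenvalues of a 2 x 2 covariance matrix in plain Python. The restriction to two features, the data, and the function name are assumptions made to keep the example short. With more features, a linear algebra library would handle the eigen decomposition.

```python
import math


def pca_two_features(xs, ys):
    """Return (first component direction, component scores, share of variance explained)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    dx = [x - x_bar for x in xs]  # center each feature
    dy = [y - y_bar for y in ys]

    # Sample covariance matrix [[var_x, cov_xy], [cov_xy, var_y]].
    var_x = sum(d * d for d in dx) / (n - 1)
    var_y = sum(d * d for d in dy) / (n - 1)
    cov_xy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)

    # Eigenvalues of a symmetric 2 x 2 matrix in closed form.
    middle = (var_x + var_y) / 2
    half_gap = math.sqrt(((var_x - var_y) / 2) ** 2 + cov_xy ** 2)
    largest, smallest = middle + half_gap, middle - half_gap

    # Eigenvector for the largest eigenvalue (assumes cov_xy is not zero).
    vx, vy = cov_xy, largest - var_x
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm

    # Project the centered data onto the first principal component.
    scores = [a * vx + b * vy for a, b in zip(dx, dy)]
    explained = largest / (largest + smallest)
    return (vx, vy), scores, explained


# Hypothetical data: the second feature is roughly twice the first.
feature_a = [1.0, 2.0, 3.0, 4.0, 5.0]
feature_b = [2.1, 3.9, 6.2, 8.0, 9.8]

direction, scores, explained = pca_two_features(feature_a, feature_b)
print(direction, explained)
```

Because the two features carry almost the same information, a single component captures nearly all of the variance, and the five scores could replace the original two columns in a later model.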
In machine learning, there is usually an appropriate tool for the job. This post provided insight into the main tasks of machine learning as well as the algorithms commonly used for each.