Decision trees are used for classifying examples into distinct classes or categories. Such as pass/fail, win/lose, buy/sell/trade, etc. However, as we all know, categories are just one form of outcome in machine learning. Sometimes we want to make numeric predictions.
The use of trees in making predictions numeric involves the use of regression trees or model trees. In this post, we will look at each of these forms of numeric prediction with the use of trees.
Regression Trees and Modal Trees
Regression trees have been around since the 1980’s. They work by predicting the average value of specific examples that reach a given leaf in the tree. Despite their name, there is no regression involved with regression trees. Regression trees are straightforward to interpret but at the expense of accuracy.
Modal trees are similar to regression trees but employs multiple regression with the examples at each leaf in a tree. This leads to many different regression models being used to split the data throughout a tree. This makes model trees hard to interpret and understand in comparison to regression trees. However, they are normally much more accurate than regression trees.
Both types of trees have the goal of making groups that are as homogeneous as possible. For decision trees, entropy is used to measure the homogeneity of groups. For numeric decision trees, the standard deviation reduction (SDR) is used. The detail of SDR are somewhat complex and technical and will be avoided for that reason.
Strengths of Numeric Prediction Trees
Numeric prediction trees do not have the assumptions of linear regression. As such, they can be used to model non-normal and or non-linear data. In addition, if a dataset has a large number of feature variables, a numeric prediction tree can easily select the most appropriate ones automatically. Lastly, numeric prediction trees also do not need the the model to be specific in advance of the analysis.
Weaknesses of Numeric Prediction Trees
This form of analysis requires a large amounts of data in the training set in order to develop a testable model. It is also hard to tell which variables are most important in shaping the outcome. Lastly, sometimes numeric prediction trees are hard to interpret. This naturally limits there usefulness among people who lack statistical training.
Numeric prediction trees combine the strength of decision tress with the ability to digest la large amount of numerical variables. This form of machine learning is useful when trying to rate or measure something that is very difficult to rate or measure. However, when possible, it is usually wise to allows try to use simpler methods if permissible.