When evaluating a machine learning classification model, three types of data are commonly used.

- The actual classification values
- The predicted classification values
- The estimated probability of the prediction

The first two types of data (actual and predicted) are used to assess a model's performance through measures such as error rate, sensitivity, and specificity.
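As a rough illustration of how the actual and predicted values feed these measures, here is a minimal base R sketch. The two label vectors are hypothetical toy data, and "yes" is assumed to be the positive class; the sketch builds a confusion matrix with `table()` and derives the three measures from its cells.

```r
# Hypothetical actual and predicted labels for eight test examples
actual    <- factor(c("yes", "yes", "yes", "no", "no", "no", "yes", "no"),
                    levels = c("no", "yes"))
predicted <- factor(c("yes", "no", "yes", "no", "yes", "no", "yes", "no"),
                    levels = c("no", "yes"))

# Confusion matrix: rows are predicted classes, columns are actual classes
cm <- table(predicted, actual)

# Cell counts, assuming "yes" is the positive class
TP <- cm["yes", "yes"]; TN <- cm["no", "no"]
FP <- cm["yes", "no"];  FN <- cm["no", "yes"]

error_rate  <- (FP + FN) / sum(cm)  # share of examples misclassified
sensitivity <- TP / (TP + FN)       # true positive rate
specificity <- TN / (TN + FP)       # true negative rate

cm
c(error_rate = error_rate, sensitivity = sensitivity, specificity = specificity)
```

Here two of the eight examples are misclassified, so the error rate is 0.25, and sensitivity and specificity are both 0.75.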

The benefit of prediction probabilities is that they measure a model's confidence in each of its predictions. If you need to compare two models and one is more confident in its classification of the examples, that more confident model is generally the better learner, provided its confident predictions are also correct.
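One simple way to put this comparison into numbers is to average the probability each model assigned to the class that actually occurred. The sketch below is a hypothetical example in base R: the actual classes and the two sets of probabilities are made-up toy values, and `prob_actual()` is a helper introduced only for illustration.

```r
# Hypothetical actual classes for five test examples
actual <- factor(c("yes", "no", "yes", "yes", "no"), levels = c("no", "yes"))

# Hypothetical P(yes) from two models scored on the same examples
prob_yes_a <- c(0.90, 0.20, 0.80, 0.70, 0.10)
prob_yes_b <- c(0.60, 0.40, 0.55, 0.60, 0.45)

# Hypothetical helper: probability each model gave to the class that occurred
prob_actual <- function(prob_yes, actual) {
  ifelse(actual == "yes", prob_yes, 1 - prob_yes)
}

conf_a <- mean(prob_actual(prob_yes_a, actual))
conf_b <- mean(prob_actual(prob_yes_b, actual))
c(model_a = conf_a, model_b = conf_b)
```

With a 0.5 cutoff, both models classify all five examples correctly, so their error rates are identical. Model A, however, puts more probability on the correct class (0.82 on average versus 0.58), which is the sense in which it is the more confident learner.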

In this post, we will look at the prediction probabilities of two models that have been used on this blog in the past.

**Prediction Probabilities for Decision Trees**

Our first example comes from the decision tree we made using the C5.0 algorithm. Below is the code for calculating the class probabilities for each example in the test set, followed by the first six rows of the output.

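```
library(C50)  # provides the predict() method for C5.0 models
# Wage_model and Wage_test come from the earlier C5.0 decision tree post;
# type="prob" returns one column of probabilities per class level
Wage_pred_prob<-predict(Wage_model, Wage_test, type="prob")
```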

```
head(Wage_pred_prob)
        female      male
497  0.2853016 0.7146984
1323 0.2410568 0.7589432
1008 0.5770177 0.4229823
947  0.6834378 0.3165622
695  0.5871323 0.4128677
1368 0.4303364 0.5696636
```

The argument “type” is added to the “predict” function so that R returns the probability of each class (female and male) for every example rather than only the predicted class. The “head” function then displays the first six examples from the test set.

- For example 497, there is a 28.5% probability that this example is female and a 71.5% probability that this example is male. Therefore, the model predicts that this example is male.
- For example 1323, there is a 24.1% probability that this example is female and a 75.9% probability that this example is male. Therefore, the model predicts that this example is male.
- etc.
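The reasoning in the list above (pick the class with the larger probability) can be sketched in base R. To keep the sketch runnable without the original `Wage_model`, it types in the six rows printed by `head()` above as a matrix; the column-wise logic is an assumption about how one might post-process the output, not necessarily how the original post did it. In practice, calling `predict()` on a C5.0 model with its default `type="class"` returns the predicted classes directly.

```r
# The six rows printed by head(Wage_pred_prob) above, typed in by hand
Wage_pred_prob <- matrix(
  c(0.2853016, 0.7146984,
    0.2410568, 0.7589432,
    0.5770177, 0.4229823,
    0.6834378, 0.3165622,
    0.5871323, 0.4128677,
    0.4303364, 0.5696636),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("497", "1323", "1008", "947", "695", "1368"),
                  c("female", "male"))
)

# Predicted class: the column with the highest probability in each row
pred_class <- colnames(Wage_pred_prob)[apply(Wage_pred_prob, 1, which.max)]

# Confidence: the probability attached to the predicted class
confidence <- apply(Wage_pred_prob, 1, max)

data.frame(predicted = pred_class, confidence = round(confidence, 3))
```

Examples 497, 1323, and 1368 come out as male, and 1008, 947, and 695 as female. Example 1368 is the least confident prediction, at about 57%.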

**Prediction Probabilities for K-Nearest Neighbors (KNN)**

Below is the code for finding the probabilities for the KNN algorithm.

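```
library(class)  # provides knn()
# prob=TRUE stores the vote share of the winning class as the "prob" attribute
College_test_pred_prob<-knn(train=College_train, test=College_test,
                            cl=College_train_labels, k=27, prob=TRUE)
College_test_pred_prob
```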

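The printed output is rather long. It lists the predicted level for each test example, along with an attribute named “prob” that holds, for each example, the proportion of the k nearest neighbors that voted for the predicted class. By matching each predicted level with its value in “prob”, you can see how confident the model is in each prediction.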

- For example 1, there is a 77% probability that this example is yes and a 23% probability that this example is no. Therefore, the model predicts that this example is yes.
- For example 2, there is a 71% probability that this example is no and a 29% probability that this example is yes. Therefore, the model predicts that this example is no.
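Reading these values by eye gets tedious, so they can be pulled out of the “prob” attribute with `attr()`. The sketch below is runnable on its own because it swaps in R's built-in `iris` data (two species only, to mimic a yes/no problem) as a hypothetical stand-in for the College data; the object names `train`, `test`, and `pred` are illustrative. For a two-class problem, the other class receives the remaining share of the votes, which is how the complementary percentages above are obtained.

```r
library(class)  # provides knn(); ships with standard R installations

set.seed(1)
# Hypothetical stand-in for College_train/College_test: two iris species
two_species <- droplevels(subset(iris, Species != "setosa"))
idx   <- sample(nrow(two_species), 70)
train <- two_species[idx, 1:4]
test  <- two_species[-idx, 1:4]
train_labels <- two_species$Species[idx]

pred <- knn(train = train, test = test, cl = train_labels, k = 27, prob = TRUE)

# Share of the nearest neighbours that voted for the predicted class
win_prob <- attr(pred, "prob")

# With two classes, the losing class gets the remaining share of the votes
head(data.frame(predicted      = pred,
                prob_predicted = round(win_prob, 2),
                prob_other     = round(1 - win_prob, 2)))
```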

**Conclusion**

One of the primary uses of prediction probabilities is comparing different models built from the same data. Combined with other evaluation techniques, this information can help a researcher determine the most appropriate model for an analysis.
