Using Probability of the Prediction to Evaluate a Machine Learning Model

When evaluating a machine learning model, there are three common types of data used for evaluation.

  • The actual classification values
  • The predicted classification values
  • The estimated probability of the prediction

The first two types of data (actual and predicted classifications) are used to assess a model's accuracy in several different ways, such as error rate, sensitivity, and specificity.

The benefit of the prediction probabilities is that they measure a model's confidence in its predictions. If you need to compare two models and one is more confident in its classification of the examples, the more confident model is generally the better learner.
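As a quick illustration of both ideas, the sketch below uses a few made-up labels and probabilities (the vectors actual, predicted, and prob_correct are invented for this example) to show how the actual and predicted classifications yield accuracy-type measures while the prediction probabilities measure confidence.

# Made-up labels and probabilities for illustration only
actual       <- factor(c("male", "male", "female", "female", "male"))
predicted    <- factor(c("male", "female", "female", "female", "male"))
prob_correct <- c(0.71, 0.48, 0.58, 0.68, 0.57)   # probability assigned to the true class

table(actual, predicted)     # confusion matrix: the basis for error rate, sensitivity, specificity
mean(actual == predicted)    # accuracy from the actual and predicted classifications
mean(prob_correct)           # average confidence in the correct class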

In this post, we will look at examples of the probability predictions of several models that have been used in this blog in the past.

Prediction Probabilities for Decision Trees

Our first example comes from the decision tree we made using the C5.0 algorithm. Below is the code for calculating the class probabilities for each example in the test set, followed by the output for the first six examples.

Wage_pred_prob <- predict(Wage_model, Wage_test, type = "prob")
head(Wage_pred_prob)
        female      male
497  0.2853016 0.7146984
1323 0.2410568 0.7589432
1008 0.5770177 0.4229823
947  0.6834378 0.3165622
695  0.5871323 0.4128677
1368 0.4303364 0.5696636

The argument “type” is added to the “predict” function so that R returns the estimated probability of each class rather than the class label itself. The “head” function displays the results for the first 6 examples from the test set.

  • For example 497, there is a 28.5% probability that this example is female and a 71.5% probability that this example is male. Therefore, the model predicts that this example is male.
  • For example 1323, there is a 24% probability that this example is female and a 76% probability that this example is male. Therefore, the model predicts that this example is male.
  • etc.
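In other words, the class with the larger probability is the label the model predicts. A minimal sketch, using the Wage_model and Wage_test objects from above, confirms this by requesting the class labels directly and by reading them off the probability matrix.

Wage_pred_class <- predict(Wage_model, Wage_test, type = "class")   # predicted labels
head(Wage_pred_class)
# the same labels can be recovered from the probability matrix itself
head(colnames(Wage_pred_prob)[max.col(as.matrix(Wage_pred_prob))])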

Prediction Probabilities for K-Nearest Neighbors (KNN)

Below is the code for finding the probabilities for the KNN algorithm.

library(class)   # knn() comes from the class package
College_test_pred_prob <- knn(train = College_train, test = College_test,
                              cl = College_train_labels, k = 27, prob = TRUE)
College_test_pred_prob

The printout for this is rather long. However, you can match each predicted label with its probability by looking carefully at the output.
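With the knn() function from the class package, prob = TRUE stores the proportion of the k neighbors that voted for the winning class as an attribute named “prob”. A short sketch, using the object created above, pairs each predicted label with its vote share.

head(College_test_pred_prob)                  # predicted labels
head(attr(College_test_pred_prob, "prob"))    # share of the 27 neighbors that voted for that label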

  • For example 1, there is a 77% probability that this example is a yes and a 23% probability that this example is a no. Therefore, the model predicts that this example is a yes.
  • For example 2, there is a 71% probability that this example is a no and a 29% probability that this example is a yes. Therefore, the model predicts that this example is a no.

Conclusion 

One of the primary uses of the prediction probabilities is to compare different models derived from the same data. This information, combined with other techniques for evaluating models, can help a researcher determine the most appropriate model for analysis.
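As a rough sketch of such a comparison, the code below invents two probability matrices for the same three test examples (probs_model_A and probs_model_B are hypothetical) and computes the average probability each model assigns to the true class; both models classify every example correctly, but the first does so with greater confidence.

# Hypothetical probability matrices for illustration only
actual_labels <- c("male", "female", "male")
probs_model_A <- matrix(c(0.70, 0.30,
                          0.20, 0.80,
                          0.60, 0.40),
                        ncol = 2, byrow = TRUE,
                        dimnames = list(NULL, c("male", "female")))
probs_model_B <- matrix(c(0.55, 0.45,
                          0.45, 0.55,
                          0.52, 0.48),
                        ncol = 2, byrow = TRUE,
                        dimnames = list(NULL, c("male", "female")))

# average probability each model assigns to the correct class
mean_conf <- function(probs, actual) {
  mean(probs[cbind(seq_along(actual), match(actual, colnames(probs)))])
}
mean_conf(probs_model_A, actual_labels)   # more confident model
mean_conf(probs_model_B, actual_labels)   # less confident model (closer to 0.5)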
