For many people, especially beginners, building a machine learning model is difficult enough. Working out what to do and how to specify the model is confusing in itself. However, once a model has been developed, the next step is to look for ways to improve its performance.
This post serves as an introduction to improving model performance. In particular, we will look at the following:
- When it is necessary to improve performance
- Parameter tuning
When to Improve
It is not always necessary to try to improve the performance of a model. There are times when a model does well, and you know this from evaluating it. If the commonly used measures are adequate, there is no cause for concern.
However, there are times when improvement is necessary. Complex problems, noisy data, and the search for subtle or unclear relationships can all make improvement necessary. Real-world data normally has these problems, so model improvement is usually necessary.
Model improvement requires applying scientific methods in an artistic manner. At times it calls for intuition, and at other times for brute trial-and-error effort. The point is that there is no single agreed-upon way to improve a model. It is better to focus on being able to explain how you improved it, should that be necessary.
Parameter Tuning
Parameter tuning is the actual adjustment of a model's fit options. Different machine learning models have different options that can be adjusted. Often, this process can be automated in R through the use of the “caret” package.
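To make the idea of automated tuning concrete, here is a minimal sketch. It assumes the "caret" and "C50" packages are installed, and the built-in iris data, the 5-fold cross-validation, and the choice of kappa as the metric are illustrative assumptions rather than recommendations.

```r
# A minimal sketch, assuming the caret and C50 packages are installed.
library(caret)

set.seed(123)  # make the resampling reproducible

# 5-fold cross-validation as the resampling scheme (an illustrative choice)
ctrl <- trainControl(method = "cv", number = 5)

# With no grid supplied, caret tries a small default set of parameter
# combinations for the C5.0 model and keeps the best one by kappa
fit <- train(Species ~ ., data = iris,
             method = "C5.0",
             trControl = ctrl,
             metric = "Kappa")

fit$bestTune  # the parameter combination caret selected
fit$results   # accuracy and kappa for every combination tried
```

Here caret does the trial-and-error for us: it fits the model once per parameter combination and resample, then reports which combination scored best.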
When deciding how to tune parameters, it is important to keep the following in mind:
- What machine learning model and algorithm you are using for your data.
- Which parameters you can adjust.
- What criteria you are using to evaluate the model.
Naturally, you need to know what kind of model and algorithm you are using in order to improve the model. Broadly, there are three types of models in machine learning: those that classify, those that perform regression, and those that can do both. Knowing which type you have helps you decide what you are trying to achieve.
Next, you need to understand exactly what you or R are adjusting when tuning the model. For example, for C5.0 decision trees, “trials” (the number of boosting iterations) is one parameter you can adjust. If you don’t know this, you will not know how the model was improved.
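As a sketch of how you might check which parameters are adjustable and then vary “trials” yourself, consider the following. It assumes the "caret" and "C50" packages are installed. The iris data, the specific trial values, and fixing the other two parameters are assumptions made only for illustration.

```r
# A minimal sketch, assuming the caret and C50 packages are installed.
library(caret)

# List the parameters caret can tune for the C5.0 model
modelLookup("C5.0")

# Vary only "trials"; hold the other two C5.0 parameters fixed.
# The values below are arbitrary illustrative choices.
grid <- expand.grid(trials = c(1, 5, 10, 20),
                    model  = "tree",
                    winnow = FALSE)

set.seed(123)  # make the resampling reproducible
fit <- train(Species ~ ., data = iris,
             method = "C5.0",
             trControl = trainControl(method = "cv", number = 5),
             tuneGrid = grid)

# One row per value of trials, with its resampled accuracy and kappa
fit$results[, c("trials", "Accuracy", "Kappa")]
```

Supplying the grid yourself means you know exactly which values were compared, which makes it easier to explain afterwards how the model was improved.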
Lastly, it is important to know what criteria you are using to compare the various models. For classification models, you can look at kappa, the various statistics derived from the confusion matrix, or the ROC curve. For regression-based models, you may look at R-squared or the RMSE (root mean squared error).
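To show what these criteria measure, here is a small sketch in base R that computes kappa from a confusion matrix, along with RMSE and R-squared. The labels and numbers are made up purely for illustration, and the R-squared shown uses one common definition (1 minus the ratio of residual to total sum of squares); some tools report the squared correlation instead.

```r
# Made-up class labels, for illustration only
actual    <- factor(c("yes", "yes", "yes", "no", "no",
                      "no", "no", "yes", "no", "no"))
predicted <- factor(c("yes", "yes", "no", "no", "no",
                      "yes", "no", "yes", "no", "no"))

# Confusion matrix: rows are predictions, columns are actual classes
cm <- table(predicted, actual)

po <- sum(diag(cm)) / sum(cm)                     # observed agreement
pe <- sum(rowSums(cm) * colSums(cm)) / sum(cm)^2  # agreement expected by chance
kappa_stat <- (po - pe) / (1 - pe)                # Cohen's kappa

# Made-up numeric outcomes, for illustration only
obs  <- c(3.0, 5.0, 2.5, 7.0)
pred <- c(2.5, 5.0, 3.0, 8.0)

rmse <- sqrt(mean((obs - pred)^2))                # root mean squared error
r_squared <- 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

print(cm)
print(c(kappa = kappa_stat, rmse = rmse, r_squared = r_squared))
```

Kappa adjusts raw accuracy for the agreement you would expect by chance, which is why it is often preferred over accuracy alone when the classes are unbalanced.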
As you can perhaps tell, there are a great many choices and options when trying to improve a model. As such, model improvement requires a well-developed strategy that allows for clear decision-making.
In a future post, we will look at an example of model improvement.