Improve ML.NET model accuracy

Learn how to improve the accuracy of your model.

Define the problem

Sometimes, improving a model has nothing to do with the data or the techniques used to train it. Instead, you may simply be asking the wrong question. Look at the problem from different angles, and use the data to extract latent indicators and hidden relationships that help refine the question.

Provide more data samples

Like humans, algorithms tend to perform better the more they are trained. One way to improve model performance is to provide more training data samples to the algorithms. The more data an algorithm learns from, the more cases it is likely to identify correctly.

Add context to the data

A single data point can be difficult to interpret. Building context around data points helps algorithms, as well as subject matter experts, make better decisions. For example, the fact that a house has three bedrooms is not, by itself, a good indicator of its price. However, if you add context and know that it is in a suburban neighborhood outside a major metropolitan area, where the median age is 38, the median household income is $80,000, and the schools are in the top 20th percentile, then the algorithm has more information on which to base its decisions. All of this context can be added as features that serve as input to the machine learning model.

Use meaningful data and features

While more data samples and features can help improve the accuracy of the model, they can also introduce noise, because not all data and features are meaningful. It is therefore important to understand which features most heavily influence the algorithm's decisions. Techniques like Permutation Feature Importance (PFI) can help identify these salient features. PFI not only helps explain the model; its output can also serve as a feature selection method to reduce the number of noisy features that go into the training process.

For more information about using PFI, see Explain Model Prediction Using Permutation Feature Importance (PFI).
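
The idea behind PFI can be illustrated with a minimal, library-independent sketch. It is written in plain Python rather than with the ML.NET API, and the data, the stand-in `model` function, and the helper names are hypothetical. Each feature column is shuffled in turn, and the resulting increase in error is taken as that feature's importance.

```python
import random

def mean_absolute_error(model, rows, targets):
    """Average absolute difference between predictions and targets."""
    return sum(abs(model(r) - t) for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, n_repeats=10, seed=0):
    """Per feature, the average increase in error after shuffling that feature."""
    rng = random.Random(seed)
    baseline = mean_absolute_error(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other column untouched.
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mean_absolute_error(model, shuffled, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical data: the target depends on feature 0 only; feature 1 is noise.
data_rng = random.Random(42)
rows = [[data_rng.uniform(50, 250), data_rng.uniform(0, 1)] for _ in range(200)]
targets = [1000 * r[0] for r in rows]

def model(row):
    # Stand-in for a trained model that has learned the true relationship.
    return 1000 * row[0]

importances = permutation_importance(model, rows, targets)
print(importances)
```

In this sketch, a feature whose shuffling barely changes the error is a candidate for removal, while a large increase marks a feature the model relies on.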

Cross-validation

Cross-validation is a training and model evaluation technique that splits the data into several partitions and trains multiple models on those partitions. This technique improves the robustness of the model by holding data out of the training process. In addition to improving performance on unseen observations, it can be an effective tool in data-constrained environments for training models with a smaller dataset.

Visit the link below to learn how to use cross-validation in ML.NET.
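
As a rough illustration of the mechanics, the following plain-Python sketch (not ML.NET code) performs k-fold cross-validation on a hypothetical one-feature regression problem. The helper names, the synthetic data, and the choice of a least-squares line as the model are assumptions made for the example.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return a, mean_y - a * mean_x

def cross_validate(xs, ys, k=5):
    """Train on k-1 folds, score on the held-out fold, and return the k errors."""
    errors = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Mean absolute error on data the model never saw during training.
        mae = sum(abs(a * xs[i] + b - ys[i]) for i in held_out) / len(held_out)
        errors.append(mae)
    return errors

# Hypothetical noisy linear data.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [3 * x + 2 + rng.gauss(0, 1) for x in xs]

fold_errors = cross_validate(xs, ys, k=5)
print(fold_errors, sum(fold_errors) / len(fold_errors))
```

Every sample is held out exactly once, so the average of the fold errors uses all of the data for evaluation without ever scoring a model on samples it was trained on.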

Hyperparameter optimization

Training machine learning models is an iterative, exploratory process. For example, what is the optimal number of clusters when training a model with the K-Means algorithm? The answer depends on many factors, such as the structure of the data. To find that number, you would have to experiment with different values for k and then evaluate performance to determine which value is best. The practice of tuning these parameters to find an optimal model is called hyperparameter optimization.
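
The search over k described above might look like the following plain-Python sketch (not ML.NET code). It makes several assumptions that are not part of the original text: the data is a small one-dimensional toy set, the K-Means variant uses a deterministic quantile-based initialization, and the silhouette score is used as the evaluation metric. All helper names are hypothetical.

```python
def k_means_1d(points, k, iterations=20):
    """Simple 1-D K-Means; initial centers are taken at evenly spaced quantiles."""
    ordered = sorted(points)
    n = len(ordered)
    centers = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: abs(p - centers[c])) for p in points]
        # Move each center to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

def silhouette_score(points, labels):
    """Mean silhouette: near 1 for tight, well-separated clusters."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-point clusters
            continue
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        b = min(sum(abs(p - q) for q in other) / len(other)
                for other_label, other in clusters.items() if other_label != label)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical data: three tight groups around 0, 10, and 20.
points = [center + offset
          for center in (0, 10, 20)
          for offset in (-0.4, -0.2, 0.0, 0.2, 0.4)]

# Try each candidate value of k and keep the one with the best score.
scores = {k: silhouette_score(points, k_means_1d(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The same loop structure applies to any hyperparameter: enumerate candidate values, train with each, evaluate with a metric suited to the task, and keep the best performer.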

Choose a different algorithm

Machine learning tasks such as regression and classification each offer several algorithm implementations. The algorithm you are currently using may not be a good fit for the problem you are trying to solve or for the structure of your data. In that case, consider using a different algorithm for your task to see whether it learns better from your data.

The following link provides further instructions on how to choose the algorithm.
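
One simple way to compare algorithms is to train each candidate on the same training split and score it on the same held-out split. The plain-Python sketch below (not ML.NET code) does this for two hypothetical candidates, a straight-line fit and a nearest-neighbor predictor, on made-up data with a curved relationship. The data, helper names, and choice of candidates are assumptions made for illustration.

```python
def fit_linear(xs, ys):
    """Least-squares straight line; returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_nearest_neighbor(xs, ys):
    """Predict the target of the closest training sample."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda pair: abs(pair[0] - x))[1]

def mean_absolute_error(model, xs, ys):
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data with a curved (quadratic) relationship.
xs = [i / 10 for i in range(-30, 31)]
ys = [x * x for x in xs]

# Alternate samples between the training and validation splits.
train_x, train_y = xs[0::2], ys[0::2]
valid_x, valid_y = xs[1::2], ys[1::2]

candidates = {"linear": fit_linear, "nearest neighbor": fit_nearest_neighbor}
errors = {name: mean_absolute_error(fit(train_x, train_y), valid_x, valid_y)
          for name, fit in candidates.items()}
best = min(errors, key=errors.get)
print(errors, best)
```

Here the straight line cannot follow the curve, so the nearest-neighbor candidate scores better on the held-out data. On data with a different structure, the ranking could just as easily be reversed.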