New article: Evaluation and selection of variables for machine learning models

 

The new article "Evaluation and selection of variables for machine learning models" has been published on mql5.com:

This article focuses on the specifics of selecting, preprocessing and evaluating input variables for use in machine learning models. Several normalization methods and their features will be described. Key points of the process that strongly influence the final result of model training will also be covered. We will take a closer look at, and evaluate, new and little-known methods for determining the informativeness of the input data and for visualizing it.

With the "RandomUniformForests" package we will calculate and analyze the concept of variable importance at different levels and in various combinations, the correspondence between the predictors and the target, the interaction between predictors, and the selection of an optimal set of predictors that takes all aspects of importance into account.

With the "RoughSets" package we will look at the same problem of choosing predictors from a different angle, based on a different concept. We will show that not only the set of predictors can be optimized, but also the set of training examples.

All calculations and experiments will be carried out in the R language, specifically in Revolution R Open 3.2.1.

[Figure: OOB error]

Fig. 2. Training error depending on the number of trees

Author: Vladimir Perervenko