Discussing the article: "Cross-validation and basics of causal inference in CatBoost models, export to ONNX format"
However, there is a catch with this proposal! If the models are ranked by error rate and the best ones are picked from them, that is overfitting again.
1) I would like to see the performance of the model on a third sample, which was neither a train nor a test set and was not involved in any way in the creation and selection of the model.
2) Noise detection and label relabelling or meta-labelling were described by Vladimir in his 2017 article, where he used the NoiseFiltersR package for this purpose.
The bot is attached to the article
It describes a few of the tens or hundreds of similar methods, and there is no desire to delve into each of them, especially without verifying the results. I am more interested in my own designs and in testing them right away; converting to ONNX now allows this to be done even faster. The core approach is easy to add to or rewrite without changing the rest of the code, which is also very convenient. This example of finding errors via cross-validation has a flaw that does not allow speaking about causal inference fully, so this is an introduction. I'll try to explain it some other time.
The article is useful if only because it is a ready-made solution for experimenting with ML. The functions are optimised and work fast. More ML articles are only welcome :) I'm an amateur too.
int k = ArraySize(Periods) - 1;
for(int i = 0; i < ArraySize(Periods); i++) { f[i] = features[i]; k--; }

It should be

f[k] = features[i];

Why reverse the order at all?
Check out the new article: Cross-validation and basics of causal inference in CatBoost models, export to ONNX format.
The article proposes a method of creating bots using machine learning.
Just as our conclusions are often wrong and need to be verified, the results of predictions from machine learning models should be double-checked. If we turn the process of double-checking on ourselves, we get self-control. Self-control of a machine learning model comes down to checking its predictions for errors many times in different but similar situations. If the model makes few errors on average, it is not overtrained, but if it makes mistakes often, then there is something wrong with it.
If we train the model once on selected data, then it cannot perform self-control. If we train a model many times on random subsamples, and then check the quality of the prediction on each and add up all the errors, we get a relatively reliable picture of the cases where it actually turns out to be wrong and the cases it often gets right. These cases can be divided into two groups and separated from each other. This is similar to conducting walk-forward validation or cross-validation, but with additional elements. This is the only way to achieve self-control and obtain a more robust model.
Therefore, it is necessary to conduct cross-validation on the training dataset, compare the model’s predictions with training labels and average the results across all folds. Those examples that were predicted incorrectly on average should be removed from the final training set as erroneous. We should also train a second model on all the data, which distinguishes well-predictable cases from poorly predictable ones, allowing us to cover all possible outcomes more fully.
Author: Maxim Dmitrievsky
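To make the procedure from the excerpt more concrete, here is a minimal Python sketch of the same idea. It assumes CatBoost and scikit-learn are installed; the synthetic data from make_classification stands in for real features and labels, and names such as cv_error_rate and err_threshold are illustrative choices, not the article's actual code.

# Minimal sketch of the idea from the excerpt: repeated cross-validation to find
# poorly predictable examples, then a main model and a second (meta) model.
# Assumptions: CatBoost and scikit-learn installed; data and names are illustrative.
import numpy as np
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold

def cv_error_rate(X, y, n_splits=5, n_repeats=3):
    """Average out-of-fold error per example over repeated cross-validation."""
    errors = np.zeros(len(y))
    for repeat in range(n_repeats):
        skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=repeat)
        for train_idx, val_idx in skf.split(X, y):
            model = CatBoostClassifier(iterations=300, verbose=False, random_seed=repeat)
            model.fit(X[train_idx], y[train_idx])
            pred = np.ravel(model.predict(X[val_idx]))
            errors[val_idx] += (pred != y[val_idx]).astype(float)
    return errors / n_repeats  # 0 = always predicted correctly, 1 = always wrong

# Synthetic data stands in for the real feature matrix and labels
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

err = cv_error_rate(X, y)
err_threshold = 0.5            # illustrative cut-off for "predicted incorrectly on average"
good = err < err_threshold     # well-predictable examples

# Main model: trained only on the examples the cross-validation got right on average
main_model = CatBoostClassifier(iterations=500, verbose=False)
main_model.fit(X[good], y[good])

# Second (meta) model: trained on all data to separate well- from poorly predictable cases
meta_model = CatBoostClassifier(iterations=500, verbose=False)
meta_model.fit(X, (~good).astype(int))

In the meta-labelling spirit mentioned in the comments above, the second model can then act as a filter that decides when the first model's signal should be trusted.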