Machine learning in trading: theory, models, practice and algo-trading - page 1297

 
Aleksey Vyazmikin:

An interesting point. Still, subsequent trees are built to reduce the error of the existing tree composition, but then I don't understand why they don't use sampling. Tell me in more detail, maybe I'm missing something deeper...

Are you by any chance taking the leaves from boosting trees with index > 1 and feeding the input data to them? If so, the result should be random, because those trees were trained not on the input data but on the errors. That is, you are solving a problem that the tree and its leaves were not trained to solve.
If those trees were from a random forest, then they are all trained on the raw data and you could use them (though there is little point, since a single tree is far inferior to the whole forest in terms of error). In boosting, no: a single tree makes no sense without all the others.

 
elibrarius:

Are you by any chance taking the leaves from boosting trees with index > 1 and feeding the input data to them? If so, the result should be random, because those trees were trained not on the input data but on the errors. That is, you are solving a problem that the tree and its leaves were not trained to solve.
If those trees were from a random forest, then they are all trained on the raw data and you could use them (though there is little point, since a single tree is far inferior to the whole forest in terms of error). In boosting, no: a single tree makes no sense without all the others.

So this follows from the definition of boosting as a sequential improvement method, where each successive algorithm tries to compensate for the shortcomings of the composition of the previous ones.
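That sequential error correction is easy to show in miniature. Below is a minimal sketch of the textbook scheme under squared loss (not CatBoost's actual implementation): each new tree is fitted to the residuals of the composition built so far.

```python
# Minimal sketch of boosting as sequential error correction (squared loss).
# Illustrative only; real boosters (CatBoost, XGBoost) add regularization
# and their own tree builders on top of this idea.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

prediction = np.zeros_like(y)  # the composition starts empty
trees, lr = [], 0.1

for _ in range(50):
    residual = y - prediction            # error of the composition so far
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residual)
    prediction += lr * tree.predict(X)   # each tree compensates that error
    trees.append(tree)
```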
 
elibrarius:

Yes, to reduce the error they take the error itself as the target and then subtract it.

Here is the boosting algorithm; I'm just studying it myself: https://neurohive.io/ru/osnovy-data-science/gradientyj-busting/


As I understand it, this is classic boosting. Maybe CatBoost has something of its own...

I took a look at the article. Yes, as I understand it, it comes down to this: a tree is built and applied to the sample, the delta between the actual and predicted targets is calculated, and the next tree is built to reduce the error, i.e. to predict that delta. In fact, the new trees are built the same way on the same sample, forming more and more connections; effectively only the target changes. But this approach makes it possible to find new connections (leaves) that would not be obvious in a random forest. These connections depend on the first tree, which in turn depends on the sample (which is not news).

The subsequent trees in CatBoost are built either up to a set number of iterations or until a stopping criterion fires, and that criterion is evaluated on a test sample. The criterion can be any model-quality metric (there is a list of different metrics). I choose to stop training on the criterion, because I want improvement on both samples at once; if it shows up only on the training sample, that is clearly overfitting. Hence my question about sample size as directly affecting learning: even if the size of the training sample is held constant, the size of the test sample will affect training.
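For reference, stopping on a test-sample metric looks roughly like this in the CatBoost Python API. The data here is a synthetic stand-in, and `Logloss` stands in for whichever metric you pick from the documented list:

```python
import numpy as np
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; substitute your own training/test samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = (X[:, 0] + X[:, 1] * X[:, 2] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

model = CatBoostClassifier(
    iterations=10000,       # upper bound on the number of trees
    eval_metric='Logloss',  # the stopping criterion, computed on eval_set
)
model.fit(
    X_train, y_train,
    eval_set=(X_test, y_test),
    early_stopping_rounds=100,  # stop once the test metric stops improving
    verbose=False,
)
print(model.get_best_iteration())  # number of trees actually kept
```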


elibrarius:

Are you by any chance taking the leaves from boosting trees with index > 1 and feeding the input data to them? If so, the result should be random, because those trees were trained not on the input data but on the errors. That is, you are solving a problem that the tree and its leaves were not trained to solve.
If those trees were from a random forest, then they are all trained on the raw data and you could use them (though there is little point, since a single tree is far inferior to the whole forest in terms of error). In boosting, no: a single tree makes no sense without all the others.

Good point. I'm just preparing a platform to study leaves (in CatBoost they are more like binary trees). The effect of any single leaf may be negligible when there are very many trees, but there may well be decent connections. Even in theory, if the first tree had a large error in one of its leaves and the fourth tree corrected this error in its own leaf, then the new connection will in fact make logical sense and will correctly classify the sample on its own.

Technically, CatBoost keeps an array with the response of each binary tree (leaf), and these responses are summed; the trick is that for any one sample row only a small fraction of the binary trees (leaves) give an answer. So in theory it should be possible to prune (zero out) the binary trees (leaves) with very low predictive ability, because they are either trees carrying the initial error or small fits (overfitting, in fact), and keep only the leaves with meaningful values.

Another direction is to use these connections to assess predictor importance: where a connection has a large weight in the final array, that connection, and hence its predictors, is significant; the rest are sifted out by a threshold as less significant. After such sifting, the model can first be trained on the more significant predictors and then additionally on the less significant ones, which should improve the training result, since the less significant predictors will no longer prevent the construction of more stable relations and will only contribute to them where possible.

As a result, the garbage binary trees are really just fitting and are of no interest, while micro-ensembles (2-3 binary trees (leaves)) with a high combined weight, or single binary trees with high weight, are of great importance and can even be used separately for classification.
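A rough sketch of what zeroing out weak leaves could look like, assuming the `get_leaf_values`/`set_leaf_values` methods of the CatBoost Python API. Treating a leaf value's magnitude as a proxy for its predictive ability, and the 50% threshold, are assumptions of this illustration, and `model.cbm` is a hypothetical file name:

```python
# Hedged sketch of the "zero out weak leaves" idea. Whether a leaf value's
# magnitude is a good proxy for predictive ability is exactly the
# hypothesis to be tested.
import numpy as np
from catboost import CatBoostClassifier

model = CatBoostClassifier()
model.load_model('model.cbm')        # hypothetical path to a trained model

leaves = np.asarray(model.get_leaf_values())  # leaf responses, all trees
threshold = np.quantile(np.abs(leaves), 0.50)  # e.g. drop the weakest half
pruned = np.where(np.abs(leaves) < threshold, 0.0, leaves)
model.set_leaf_values(pruned)        # the sum now skips the zeroed leaves
model.save_model('model_pruned.cbm')
```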

Unfortunately, I have no mechanism for pulling out individual binary trees (leaves, in the usual sense) and converting them into a normal, readable rule, so it's all still only theory, but I am open to cooperation.

 
Aleksey Vyazmikin:

Unfortunately, I have no mechanism for pulling out individual binary trees (leaves, in the usual sense) and converting them into a normal, readable rule, so it's all still only theory, but I am open to cooperation.

And how do you get the model into MT5, to trade with it or run it in the tester? I've been thinking for a long time about how best to do it, but haven't figured it out yet: either switch completely to Python and connect it to MT5, or use the CatBoost binary.

 
Maxim Dmitrievsky:

And how do you get the model into MT5, to trade with it or run it in the tester? I've been thinking for a long time about how best to do it, but haven't figured it out yet: either switch completely to Python and connect it to MT5, or use the CatBoost binary.

I convert the model exported for C++ into MQL5; in fact, only the arrays are taken from it, and there is an interpreter for this model in MQL (the code is not mine). Accordingly, the models are loaded into the Expert Advisor, and now I can load hundreds of models through a file and look at them in the terminal at once, including running them through the optimizer.
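For context, the export step this workflow starts from is presumably CatBoost's documented C++ code export; `model.cbm` is a hypothetical file name here:

```python
from catboost import CatBoostClassifier

model = CatBoostClassifier()
model.load_model('model.cbm')                # a previously trained model
model.save_model('model.cpp', format='cpp')  # arrays + apply function as C++ source
```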

 
Aleksey Vyazmikin:

I convert the model exported for C++ into MQL5; in fact, only the arrays are taken from it, and there is an interpreter for this model in MQL (the code is not mine). Accordingly, the models are loaded into the Expert Advisor, and now I can load hundreds of models through a file and look at them in the terminal at once, including running them through the optimizer.

Well, you could write an article with some framework and an idea (the idea should be nothing less than cosmic), say what help is needed, or make it some kind of co-op.

I understand that the community is divided: some pull things out, while others, on the contrary, generalize. I, for one, don't agree with this approach; maybe I just haven't fully grasped the idea.
 
Maxim Dmitrievsky:

Well, you could write an article with some framework and an idea (the idea should be nothing less than cosmic), say what help is needed, or make it some kind of co-op.

I don't have deep enough theoretical knowledge for articles; I make up various concepts and change the interpretations of established phenomena - it's not an academic approach.

I think an interpreter of the model would be interesting, but I can't publish it because the code is not written by me.

And anything left purely in theory, with code that can't be applied (because of closed classes), won't be interesting, I think. As for the process of model creation and selection, I think everyone has already solved that for themselves, so there's no interest there.

 
Maxim Dmitrievsky:

I understand that the community is divided: some pull things out, while others, on the contrary, generalize. I, for one, don't agree with this approach; maybe I just haven't fully grasped the idea.

As for the community - I don't know, i.e., I don't know how people in other fields do it.

Pulling things out seems logical to me, because I'm using ML to look for a model of human (or algorithm) behavior; there can be many such behavior patterns, and they can be independent, so it makes sense to pull out as many as we can, since it's impossible to generalize them all together. For others, the market is a single whole, the result of a collective mind at work, some kind of voting body without rules. Apparently, in that view, people look for a single model that describes the market's behavior as a separate organism.

 
Aleksey Vyazmikin:

if the first tree had a large error in one of its leaves and the fourth tree corrected this error in its own leaf, then the new connection will in fact make logical sense and will correctly classify the sample on its own.

I'm not sure about this: the 4th tree corrects the errors of the first with its leaves. I think they only make sense in pairs. But I could be wrong, since I haven't experimented with such things.

 
elibrarius:

I'm not sure about this: the 4th tree corrects the errors of the first with its leaves. I think they only make sense in pairs. But I could be wrong, since I haven't experimented with such things.

To put it very crudely: the first tree has no response for a given sample row and returns zero, while the fourth tree does respond and gives a "probability" of 0.6. Technically we corrected the error of the first tree, but in fact we revealed a connection that previously did not exist at all. Even if we imagine that all trees cover the entire sample (which, by all appearances, is not the case), let the first tree give 0.1 instead of zero and the subsequent tree 0.5; the effect is the same. Though here "probability" is not exactly a probability: the values in the array are converted into something resembling a probability only after the values of all activated binary trees (leaves) have been summed.
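A tiny numeric illustration of that last point. The leaf values here are made up for the example; what is standard is that for binary classification CatBoost sums the raw leaf responses first and only then maps the sum through a sigmoid:

```python
import numpy as np

# Per sample row only some leaves fire; their raw values are summed first.
leaf_responses = [0.0, -0.2, 0.5, 0.6]  # e.g. tree 1 silent, tree 4 fires
raw_score = sum(leaf_responses)          # 0.9 - not yet a probability

# Only this sum is mapped to a probability-like value, which is why a
# single leaf's 0.6 is not itself "p = 0.6".
probability = 1.0 / (1.0 + np.exp(-raw_score))
print(round(probability, 3))             # ~0.711
```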