Machine learning in trading: theory, models, practice and algo-trading - page 3530

 
Maxim Dmitrievsky #:

I admit there is a kernel of sense in your research, but it is very hard to follow. Perhaps a glossary is needed. For example, what is "probability structure damage"? :)

Personally, I've given up and just skim through.
 
mytarmailS #:
Personally, I've given up and just skim through.
😀
 
Maxim Dmitrievsky #:

I admit there is a kernel of sense in your research, but it is very hard to follow. Perhaps a glossary is needed. For example, what is "probability structure damage"? :)

Look, in the figure below I have quickly sketched the phenomenon I am talking about.


We have three predictors: P0, P1 and P2. For each one we have evaluated how strongly its quantum segments shift the probability toward one of the two classes relative to the subsample. Segments with a strong shift toward class one are marked green, those with a strong shift toward class zero are marked blue, and red marks segments that did not pass the evaluation threshold, so at the current iteration it is not known which class they should be assigned to.
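The evaluation described above can be illustrated with a minimal sketch. It is only one reading of the post: the binning by fixed `edges`, the `threshold` value, and the function name `evaluate_segments` are assumptions made for illustration, not the author's actual procedure.

```python
from collections import defaultdict

def evaluate_segments(values, targets, edges, threshold=0.10):
    """Label each quantum segment by its probability shift vs. the subsample.

    values    - predictor values for the rows of the current subsample
    targets   - 0/1 class labels for the same rows
    edges     - ascending segment borders (hypothetical; any binning would do)
    threshold - hypothetical minimum |shift| for a segment to count as biased
    """
    base_rate = sum(targets) / len(targets)   # P(class 1) in the subsample
    rows_by_segment = defaultdict(list)
    for row, v in enumerate(values):
        seg = sum(v >= e for e in edges)      # index of the segment holding v
        rows_by_segment[seg].append(row)

    report = {}
    for seg, rows in sorted(rows_by_segment.items()):
        rate = sum(targets[r] for r in rows) / len(rows)
        shift = rate - base_rate
        if shift >= threshold:
            colour = "green"   # biased toward class one
        elif shift <= -threshold:
            colour = "blue"    # biased toward class zero
        else:
            colour = "red"     # undefined at this iteration
        report[seg] = (colour, round(shift, 3), rows)
    return report

# Toy data, invented for the example
values  = [0.1, 0.2, 0.3, 1.1, 1.2, 1.3, 2.1, 2.2, 2.3, 2.4]
targets = [1,   1,   1,   0,   1,   1,   0,   0,   0,   1]
report = evaluate_segments(values, targets, edges=[1.0, 2.0])
for seg, (colour, shift, rows) in report.items():
    print(seg, colour, shift, rows)
```

On this toy data the first segment comes out green, the middle one red and the last one blue, which mirrors the three colours used in the figure.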

After the evaluation we need to choose which quantum segment to exclude (that is, where to make a split). In the example I consider two hypothetical options, V1 and V2.

The quantum segment on which the split was made is shown in grey in the figure. A quantum segment that contains the same responses (row indices) as the grey one is crossed out with a grey line. The loss of examples in the undefined (red) region is not shown in the figure.

If we choose option V1, two quantum segments lose some of their examples and, when re-evaluated, show only a small probability shift relative to the subsample. I call this loss Damage. After Damage these quantum segments fall into the uncertainty zone and turn red.

If we choose option V2, only one quantum segment, on predictor P1, is damaged, while on predictor P2 a new blue quantum segment appears, which has taken some of the examples from the green quantum segment and from the red uncertainty region. This gradual opening up of the range is what I called a structure that is initially hidden but gradually reveals itself; simply splitting on the best metric can corrupt it so that it never reveals itself. In essence, this is the tree branching within a predictor over new iterations. Earlier I showed how such quantum splits disappear with each iteration; now I understand the cause and am trying to control the process.

Although the figure shows the process for whole quantum segments (two splits bounding a range of the predictor at once), a similar process happens in any tree construction; there, a split selects not a single quantum segment but a group of them, from one border to another. Choosing a split solely by the best value of a metric such as Gini, entropy or logloss can significantly worsen the options available among the split evaluation results at subsequent iterations of building the tree model.
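A rough sketch of how "Damage" could be measured for two candidate splits, under stated assumptions: a segment counts as damaged if it was biased before the split but drops below the threshold once the split rows are removed from the subsample. The segment names, row sets, threshold and the helper names `shift` and `damage_report` are all hypothetical, and the crossed-out segments from the figure are not modelled.

```python
def shift(rows, targets, subsample):
    """Class-1 rate of a segment minus the class-1 rate of the subsample."""
    rows = [r for r in rows if r in subsample]
    if not rows:
        return None, 0
    base = sum(targets[r] for r in subsample) / len(subsample)
    rate = sum(targets[r] for r in rows) / len(rows)
    return rate - base, len(rows)

def damage_report(segments, split_rows, targets, threshold=0.2):
    """Compare every segment before and after removing the split rows."""
    before = set(range(len(targets)))
    after = before - set(split_rows)
    report = {}
    for name, rows in segments.items():
        s0, n0 = shift(rows, targets, before)
        s1, n1 = shift(rows, targets, after)
        was_biased = s0 is not None and abs(s0) >= threshold
        still_biased = s1 is not None and abs(s1) >= threshold
        report[name] = {
            "lost_rows": n0 - n1,
            "damaged": was_biased and not still_biased,
        }
    return report

# Toy data, invented for the example: 12 rows, two segments on other predictors
targets = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
segments = {"P1_a": [0, 1, 2, 4], "P2_b": [5, 6, 7, 9]}

v1 = damage_report(segments, split_rows=[0, 1, 8], targets=targets)
v2 = damage_report(segments, split_rows=[3, 7], targets=targets)
print("V1:", v1)
print("V2:", v2)
```

Here the V1 split takes two rows away from `P1_a` and pushes it below the threshold, while the V2 split leaves both segments biased. A comparison like this is one possible way to look past the best single-step metric when choosing a split.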

 
Aleksey Vyazmikin #:

Look, in the figure below I have quickly sketched the phenomenon I am talking about.

So isn't clustering of features the same thing? You split the data into subsamples of similar examples and look at the probability in each. You don't have to go digging inside the tree.
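As a small illustration of this suggestion, the sketch below assumes the cluster labels have already been produced by some clustering algorithm (the labels and targets are invented) and simply reports the class-one rate per cluster against the rate in the whole sample.

```python
from collections import defaultdict

def cluster_class_rates(cluster_labels, targets):
    """Return {cluster: (size, class-1 rate, shift vs. the whole sample)}."""
    groups = defaultdict(list)
    for label, y in zip(cluster_labels, targets):
        groups[label].append(y)
    base = sum(targets) / len(targets)
    rates = {}
    for label, ys in sorted(groups.items()):
        rate = sum(ys) / len(ys)
        rates[label] = (len(ys), rate, rate - base)
    return rates

# Hypothetical labels from any clustering method, plus 0/1 targets
cluster_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
targets        = [1, 1, 1, 0, 1, 1, 0, 0, 0, 1]
rates = cluster_class_rates(cluster_labels, targets)
for label, (size, rate, delta) in rates.items():
    print(label, size, round(rate, 3), round(delta, 3))
```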

 
Maxim Dmitrievsky #:

So isn't clustering of features the same thing? You split the data into subsamples of similar examples and look at the probability in each. You don't have to go digging inside the tree.

Heh.

Here's what the predictor looks like by cluster, based on your own sample. See how much overlap there is between the predictor ranges? I don't think that's good.

Overall, these are different methods, although at first glance something similar is going on.

Anyway, in the last post I wanted to show what "damage" is. If it's not clear, then fine...

 
Aleksey Vyazmikin #:

If we choose option V2, only one quantum segment, on predictor P1, is damaged, while on predictor P2 a new blue quantum segment appears, which has taken some of the examples from the green quantum segment and from the red uncertainty region. This gradual opening up of the range is what I called a structure that is initially hidden but gradually reveals itself; simply splitting on the best metric can corrupt it so that it never reveals itself. In essence, this is the tree branching within a predictor over new iterations. Earlier I showed how such quantum splits disappear with each iteration; now I understand the cause and am trying to control the process.

Again, your ideas about splits/quanta are incorrect.
The quantum will not appear in predictor P2 as a run of consecutive examples (as you have drawn it in blue).
After sorting (which is done for both quantum and split detection), each column has a different row order, except for duplicate columns. For example:
P1: 9,6,4,7,1,8,5,0,3,2
P2: 0,2,4,6,8,1,3,5,7,9
So, if you remove the split/quantum with rows 4,7,1 from P1, those same row numbers are removed from P2, where they are not adjacent but scattered, so they cannot form a quantum.
P2: 0,2,(4),6,8,(1),3,5,(7),9 (removed rows in parentheses)

You should print out the rows and check your ideas, otherwise you are just wasting your time... It's obvious that the row order will differ, but apparently it only becomes obvious once you have looked through a lot of printouts.
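The row-order argument can be checked with a short printout. This sketch uses the two sorted row orders from the example above; the helper names `positions` and `is_contiguous` are made up for the illustration.

```python
# Row indices of the same 10 rows, listed in the sorted order of each predictor
p1_order = [9, 6, 4, 7, 1, 8, 5, 0, 3, 2]
p2_order = [0, 2, 4, 6, 8, 1, 3, 5, 7, 9]

removed = [4, 7, 1]  # rows of the quantum removed on P1

def positions(order, rows):
    """Sorted positions that the given rows occupy in a column's sort order."""
    return sorted(order.index(r) for r in rows)

def is_contiguous(pos):
    """True if the positions form one unbroken run."""
    return all(b - a == 1 for a, b in zip(pos, pos[1:]))

pos_p1 = positions(p1_order, removed)
pos_p2 = positions(p2_order, removed)
print("P1 positions:", pos_p1, "contiguous:", is_contiguous(pos_p1))
# P1 positions: [2, 3, 4] contiguous: True
print("P2 positions:", pos_p2, "contiguous:", is_contiguous(pos_p2))
# P2 positions: [2, 5, 8] contiguous: False
```

The same three rows sit side by side in P1 but are spread across P2, so removing them cannot carve out a single continuous range there.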

 
Aleksey Vyazmikin #:

Heh...

Here's what the predictor looks like by cluster, based on your own sample. See how much overlap there is between the predictor ranges? I don't think that's good.

Overall, these are different methods, although at first glance something similar is going on.

Anyway, in the last post I wanted to show what "damage" is. If it's not clear, then fine...

So the clustering was done on a different predictor, and these are just the values of other predictors corresponding to the clusters.

Accordingly, you can create various complicated schemes based on clustering.

That is, you have before your eyes a ready-made quantum table corresponding to some cluster.

It is spelled Cluster, not Klaster :)
 
Maxim Dmitrievsky #:
It is spelled Cluster, not Klaster :)
I chuckled too)))

Well, that's fine, he has his own terminology :)
 
Maxim Dmitrievsky #:
So the clustering was done on a different predictor, and these are just the values of other predictors corresponding to the clusters.

So the predictors that were used for clustering are not in the sample? I can't explain the strange results myself...

Maxim Dmitrievsky #:
Accordingly, you can create various complicated schemes based on clustering.

No argument, you can. It's just that these are two different approaches that do not exclude each other but, in my concept, complement each other...

Maxim Dmitrievsky #:
It is spelled Cluster, not Klaster :)

Yes, in English it probably is. The main thing is that you understood what we are talking about, and that I still remembered it after a while.

 
mytarmailS #:
I chuckled too)))

Well, that's fine, he has his own terminology :)

I think you'd have the same reaction to being shown a finger...