Machine learning in trading: theory, models, practice and algo-trading - page 3481

 
Aleksey Vyazmikin #:

I don't even know if there were already synonyms in his native language back then...

Synonyms belong somewhere in poetry, in novels. But in the exact sciences there are no synonyms, although there are plenty of amateurs who don't know the exact meaning of terms and start talking rubbish.

 
Aleksey Vyazmikin #:

So I thought about it and ordered the values by probability shift after quantisation. Training became 7 times faster: only 60 trees instead of 400, but the financial result on the other two samples became much worse. It turns out that, because of the chaos in the class-membership probability distribution, randomisation makes learning slightly better.

This is the result of your shuffling. In essence, you have introduced additional noise, just like the Feature Permutation Importance method of estimating predictors, which shuffles a column and thereby turns it into noise. You shuffled it too, only in blocks/quanta.
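For reference, a minimal sketch of the column-shuffling idea Forester is referring to. The function name, data layout and use of logloss are illustrative assumptions, not either poster's actual code:

```python
import numpy as np
from sklearn.metrics import log_loss

def permutation_importance(model, X, y, seed=0):
    """Score each column by how much logloss degrades when that column is shuffled."""
    rng = np.random.default_rng(seed)
    base = log_loss(y, model.predict_proba(X)[:, 1])
    scores = {}
    for col in range(X.shape[1]):
        X_noisy = X.copy()
        rng.shuffle(X_noisy[:, col])   # shuffling breaks the link between this column and its rows
        scores[col] = log_loss(y, model.predict_proba(X_noisy)[:, 1]) - base
    return scores                      # larger degradation = more important column
```

Any fitted binary classifier with a predict_proba method (CatBoost included) could be passed in as model here.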

 
СанСаныч Фоменко #:

Synonyms belong somewhere in poetry, in novels. But in the exact sciences there are no synonyms, although there are plenty of amateurs who don't know the exact meaning of terms and start talking rubbish.

Even when you're not talking rubbish but saying something clever, it's impossible to understand you.

 
Forester #:

This is the result of your shuffling. In essence, you have introduced additional noise, just like the Feature Permutation Importance method of estimating predictors, which shuffles a column and thereby turns it into noise. You shuffled it too, only in blocks/quanta.

What does noise have to do with it? I simplified the training for the algorithm. It now uses fewer splits/trees to arrive at the "same" result.

 
СанСаныч Фоменко #:

0.57/0.448 = 1.2723, i.e. a 27% difference. The model can be discarded.

I might agree if we were talking about stationary systems and had a representative sample. Otherwise, it is an empty heuristic that can be easily fitted.

 
Aleksey Vyazmikin #:

What does noise have to do with it? I simplified the training for the algorithm. It now uses fewer splits/trees to arrive at the "same" result.

Shuffling the column turns it into noise. You wrote yourself that the financial result on the other two samples is significantly worse.

And how is that the "same" result?

 
Forester #:
Shuffling the column turns it into noise. You wrote yourself that the financial result on the other two samples is significantly worse.

And how is that the "same" result?

The noise would come from a random change; here we are, in effect, just relabelling the information. As you know, CatBoost makes its splits on a quantisation table built once. That is, we get discrete polyhedral (one dimension per predictor) cubes, each holding a range of values. What the tree does is group them along one of the faces, or a set of faces, in the order they come.

The cubes are originally scattered with no particular order, and I simply grouped them up front.

As I showed earlier, the probability of picking the right cube is within 20% in this sample. So it turns out that with a complex arrangement of the cubes you need more iterations than with an ordered one, which incidentally lets the model find some complex dependencies but worsens the efficiency of learning, efficiency here meaning the improvement of the logloss metric from iteration to iteration.

As for the result: on the train sample everything is quite good, even in financial terms, but on the test and exam samples the model very rarely produces a probability greater than 0.5, so the output is mostly zeros.

I will try to reduce the learning rate, but the nature of this phenomenon is not quite clear yet, because the logloss results are comparable.
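As a rough illustration of the quantisation table being described (a sketch with made-up borders, not CatBoost's internal implementation), each raw value is replaced by the index of the quantum it falls into, and reordering those indices changes which quanta a single threshold split groups together:

```python
import numpy as np

x = np.array([0.05, 0.30, 0.55, 0.80, 0.95, 0.10, 0.62, 0.71])  # toy predictor values
borders = np.array([0.25, 0.50, 0.75])   # a fixed "quant table": 3 borders -> 4 quanta

bins = np.digitize(x, borders)           # quantum index per observation: [0 1 2 3 3 0 2 2]

# A bijective relabelling of the quanta: no information is lost per row,
# but any split of the form "quantum <= t" now collects a different group.
relabel = np.array([2, 0, 3, 1])
print(bins, relabel[bins])
```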

 
Aleksey Vyazmikin #:

The noise would come from a random change; here we are, in effect, just relabelling the information. As you know, CatBoost makes its splits on a quantisation table built once. That is, we get discrete polyhedral (one dimension per predictor) cubes, each holding a range of values. What the tree does is group them along one of the faces, or a set of faces, in the order they come.

The cubes/quanta are initially sorted relative to each other. You change their order, i.e. you shuffle them. The OOS shows you that clearly: no pattern is found. And on the train sample it will learn well on any rubbish.

 
Forester #:

The cubes/quanta are initially sorted relative to each other. You change their order, i.e. you shuffle them. The OOS shows you that clearly: no pattern is found. And on the train sample it will learn well on any rubbish.

Learning is the set of rules for selecting cubes. Their order is set by the predictor's algorithm. If I swap them around, no information is lost. My algorithm finds all the cubes, yet for the tree it is an important change, because the algorithm works not with a single cube but with a group of cubes at once, and for it the content of that group has changed. When splitting groups into subgroups it will choose different places to cut, since the group statistics have changed.

We need a more pattern-rich sample to test this hypothesis.
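A small sketch of the "group statistics have changed" point, with hypothetical bin indices and labels: a bijective reordering of the cubes keeps every row's information, yet the class rate on either side of the same threshold split comes out different:

```python
import numpy as np

bins   = np.array([0, 1, 2, 3, 3, 0, 2, 2])   # hypothetical quantum index per row
labels = np.array([0, 1, 1, 0, 0, 0, 1, 1])   # hypothetical binary target

def left_rate(bin_ids, threshold):
    """Mean label of the rows falling on the left side of a 'quantum <= threshold' split."""
    return labels[bin_ids <= threshold].mean()

print(left_rate(bins, 1))                     # statistics of the original ordering: ~0.33

relabel = np.array([3, 0, 1, 2])              # swap the cubes around
print(left_rate(relabel[bins], 1))            # same rows, same labels, different split statistics: 1.0
```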

 
Aleksey Vyazmikin #:

Learning is the set of rules for selecting cubes. Their order is set by the predictor's algorithm. If I swap them around, no information is lost. My algorithm finds all the cubes, yet for the tree it is an important change, because the algorithm works not with a single cube but with a group of cubes at once, and for it the content of that group has changed. When splitting groups into subgroups it will choose different places to cut, since the group statistics have changed.

We need a more pattern-rich sample to test this hypothesis.

The information is shuffled/randomised, just like in permutation, only there it is not groups that are shuffled but every single element of a column, which effectively switches the predictor off; then you compare how much the model result has changed, and that is the estimate of the column's importance.
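To make the contrast concrete (a hypothetical sketch, not either poster's code): element-wise permutation ties each row to a random value and destroys the column, whereas swapping whole quanta only relabels the groups each row already belongs to:

```python
import numpy as np

rng = np.random.default_rng(0)
col = np.array([0.1, 0.1, 0.4, 0.4, 0.8, 0.8])       # toy column already quantised into 3 levels

elementwise = rng.permutation(col)                    # permutation-importance style shuffle
levels = np.unique(col)
block_map = dict(zip(levels, rng.permutation(levels)))
blockwise = np.array([block_map[v] for v in col])     # whole quanta swapped; rows keep their grouping
print(elementwise, blockwise)
```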

It's up to you what to spend your time on. Nothing more to say on this topic.