Machine learning in trading: theory, models, practice and algo-trading - page 3335

 
mytarmailS #:
Yes, interesting.

I can report that on the separate test sample it is 7467, and on the exam sample 7177, but there is also a fair number of leaves with no activations at all - I did not count them right away.


This is the distribution of leaves that changed class by their value for the test sample.


and this is for the exam sample.

And this is the breakdown by class - there are three of them, the third being "-1", i.e. no activation.


For the train sample.


For the test sample.


For the exam sample.

In general, you can see that the leaf weights no longer correspond to the class logic - below is the graph for the test sample - there is no clear direction.


In general, this training method will approximate anything, but it does not guarantee the quality of the predictors.

In general, I suspect that the distinct "bars" on the graph above are leaves that are very similar in the place and frequency of their activation.
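For illustration, here is a minimal sketch of how such a leaf class-change count could be computed. It assumes an sklearn decision tree and synthetic stand-in data (all names here are illustrative, not the actual setup from the experiments above); a leaf gets class -1 when it has no activations on the given sample, as in the breakdown above:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: train on the first part, "test" on the rest.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
y = (X[:, 0] + rng.normal(scale=0.7, size=1000) > 0).astype(int)
X_train, y_train, X_test, y_test = X[:600], y[:600], X[600:], y[600:]

tree = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X_train, y_train)

def leaf_classes(model, X, y):
    """Majority class per leaf on the given sample; -1 = no activations."""
    node_of = model.apply(X)                              # leaf node id per row
    leaves = np.where(model.tree_.children_left == -1)[0] # leaf node ids
    cls = {}
    for leaf in leaves:
        mask = node_of == leaf
        cls[leaf] = int(np.bincount(y[mask]).argmax()) if mask.any() else -1
    return cls

train_cls = leaf_classes(tree, X_train, y_train)
test_cls = leaf_classes(tree, X_test, y_test)

changed = sum(1 for l in train_cls if test_cls[l] not in (-1, train_cls[l]))
dead = sum(1 for l in train_cls if test_cls[l] == -1)
print(f"leaves that changed class: {changed}, leaves with no activations: {dead}")
```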


mytarmailS #:

In fact, I found a way to find features that do not shift relative to the target either on the train or on the test set... But the problem is that such features are catastrophically few, the screening method is wildly expensive in terms of computing power, and in general the method is implemented via unsupervised learning - only that way did I manage to avoid overfitting.

It's hard to discuss what you don't know. Therefore, I can only be happy for your success. If I had such a method, I would use it :)

My method, so far, does not give such good results, but it parallelises well enough.

 
Maxim Dmitrievsky #:
And what role did quantisation play in this? On a 10-point scale

It's hard to completely isolate thought processes.

There are problems on various fronts, so I am looking at what can be improved with less effort and more result. I periodically jump from the "data" to the "learning process" and experiment.

The original idea is to estimate correlation, but I have not found any ready-made methods, so I am adapting my own. I think that if leaves are similar, they distort the estimate.
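A minimal sketch of one way such a leaf-similarity estimate could look, assuming an sklearn forest and synthetic data (all names are illustrative assumptions): each leaf is represented by a 0/1 activation vector over the samples, and highly correlated vectors mark leaves that are similar in place and frequency of activation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=10, max_depth=4, random_state=0)
forest.fit(X, y)

leaf_ids = forest.apply(X)        # (n_samples, n_trees): leaf index per tree
activations = []                  # one 0/1 row per (tree, leaf) pair
for t in range(leaf_ids.shape[1]):
    for leaf in np.unique(leaf_ids[:, t]):
        activations.append((leaf_ids[:, t] == leaf).astype(float))
A = np.array(activations)

# Correlation between activation patterns: values near 1 mean two leaves
# from different trees fire on nearly the same samples and would distort
# any estimate that treats the leaves as independent.
corr = np.corrcoef(A)
similar_pairs = np.argwhere(np.triu(corr, k=1) > 0.95)
print(f"highly similar leaf pairs: {len(similar_pairs)}")
```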

Maxim Dmitrievsky #:
I've finished Starfield, and shortly after that the singularity began there. I got into the multiverse and met a copy of myself. Now I'm running around different versions of the universe, and there's no way out of it. Now I have to find new meaning.

When the brain or neural network reaches the limits of reasonableness, the singularity begins.

That's an interesting idea. As for the game, maybe I'll play it sometime later. I treat games as a form of creativity, and games become outdated graphically much more slowly now.

I ran God of War (2018) on an old HD7950 graphics card (I put it into a separate computer that is used purely for calculations) under Windows 10, set the graphics to minimum, and was simply shocked by the picture. But the main interest is how the relationship between father and son is explored - it is difficult to find analogues in the games industry where this topic is raised.

Maxim Dmitrievsky #:
Divide the main train set into 5-10 sub-train sets, each of which is split into a train part and a validation part. On each one you train, CV-style, then predict on the whole main train set. You compare the original labels with the labels predicted by all the models. The samples that were not guessed are put on a blacklist. Then you remove all the bad examples when training the final model, by calculating the average "propensity" for each sample. Optionally, you can teach a second model to separate the white samples from the black ones, or do it via a third class.

3 lines of code, results on the level of... well, I have nothing to compare with... well, on some level.

The key thing here is CV, meaning you statistically determine which samples are bad and which are good, using multiple models, each trained on a different piece of history. This is called a propensity score, that is, the propensity of each sample to play a role in training.

Of course, the labels can be complete rubbish, and then this approach can remove almost everything. That is why, back at the beginning, I used random sampling of trades to add different markup variants - given that we don't want to, or don't know how to, think about how to mark up a chart.

This is roughly what an ML algorithm with causal elements that searches for trading systems on its own should look like.
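A minimal sketch of how this blacklist-via-CV idea could be wired up, assuming sklearn and synthetic data (the model, threshold, and the omission of the inner train/validation split are illustrative simplifications, not the exact setup described above):

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 6))
y = (X[:, 0] + rng.normal(scale=1.0, size=1000) > 0).astype(int)

def propensity_filter(X, y, n_splits=5, keep_threshold=0.5):
    """Train one model per chunk of history, predict the whole set,
    and blacklist the samples that most models mislabel."""
    hits = np.zeros(len(y))
    kf = KFold(n_splits=n_splits, shuffle=False)   # contiguous history chunks
    for _, chunk_idx in kf.split(X):
        model = GradientBoostingClassifier().fit(X[chunk_idx], y[chunk_idx])
        hits += (model.predict(X) == y)
    propensity = hits / n_splits                   # average "guessability" per sample
    return propensity >= keep_threshold, propensity

keep, propensity = propensity_filter(X, y)
print(f"kept {keep.sum()} of {len(y)} samples")

# The final model is trained only on the whitelisted samples.
final_model = GradientBoostingClassifier().fit(X[keep], y[keep])
```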

But here we also work with data through models. Or do you see any difference?

 
Aleksey Vyazmikin #:

But here we also work with data through models. Or do you see any difference?

It's kind of automatic - you don't have to think anything up or (importantly) do anything :)
 
Maxim Dmitrievsky #:
It's kind of automatic - you don't have to think anything up or (importantly) do anything :)

Taking into account the excessive randomness in CatBoost's training method, it is difficult to evaluate the approach itself. There, the rows are shuffled when building a tree, and the data is fed in batches - well, unless all of that is disabled...

It would be interesting to evaluate how many leaves change class on new data, by analogy with what I wrote above in the thread. This could be a metric of the quality of the approach/model.

 
Aleksey Vyazmikin #:

Taking into account the excessive randomness in CatBoost's training method, it is difficult to evaluate the approach itself. There, the rows are shuffled when building a tree, and the data is fed in batches - well, unless all of that is disabled...

It would be interesting to evaluate how many leaves change class on new data, by analogy with what I wrote above in the thread. This could be a metric of the quality of the approach/model.

Ideally, this randomness is not as bad as the randomness in the dataset
 
Maxim Dmitrievsky #:
Ideally, this randomness is not as bad as the randomness in the dataset

The trouble is that you can get a beautiful model from randomness purely by chance; if that were impossible, it wouldn't matter.

It is not a problem to train a model - the problem is to choose the one that has more potential to work correctly on new data.

That is why this approach, which allows increasing that potential, is interesting. And to evaluate the effectiveness of a model we need some kind of metric - not just classification accuracy statistics, but something else, for example an evaluation of the leaves individually. It is clear that the indicator values in the predictors change - that is why the leaves are so noisy, "changing" their actual classes. That is why it is a complex task - you need good labels and stable predictors, and their combinations should not create leaves with rare values in the model.

In production, it is already necessary to monitor changes in the distributions of the predictor values used in the model, and to halt the model if the changes are significant and widespread. However, this approach requires accumulating statistics, which for us equals accumulating losses, and that is not good. We need a faster way to exclude a model, but a reasoned one, not based purely on drawdown.
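As a sketch of such monitoring, here is one plain variant using a two-sample Kolmogorov-Smirnov test from scipy (the window sizes and threshold are illustrative assumptions, not a recommendation):

```python
import numpy as np
from scipy.stats import ks_2samp

def drift_alert(train_values, live_values, alpha=0.01):
    """Compare the training distribution of one predictor with a recent
    live window; a small p-value signals a significant shift."""
    result = ks_2samp(train_values, live_values)
    return result.pvalue < alpha, result.statistic

rng = np.random.default_rng(3)
train_feature = rng.normal(0.0, 1.0, size=5000)  # distribution seen in training
live_feature = rng.normal(0.4, 1.0, size=300)    # recent live window, shifted

alert, stat = drift_alert(train_feature, live_feature)
if alert:
    print(f"distribution shift detected (KS = {stat:.3f}) - consider halting the model")
```

The drawback mentioned above remains, of course: the live window still has to accumulate before the test has any power.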

There are a lot of problems, and until they are solved we do not want to give the model money to manage.

 
Aleksey Vyazmikin #:

Yes, with binary features it is more complicated. But I don't get how normalisation can help here.

A binary feature with 0 and 1 is already normalised, and the rest should be normalised too.

 
Forester #:

A binary feature with 0 and 1 is already normalised, and the rest should be normalised too.

I hope I understood your train of thought.

But with uniform quantisation into the same 32 segments, we can consider that segment "1" is 0 and segment "32" is 1. And the same goes for any other number of segments. That's why I don't understand what the fundamental difference is here.

 
Aleksey Vyazmikin #:

I hope I understood your train of thought.

But with uniform quantisation into the same 32 segments, we can consider that segment "1" is 0 and segment "32" is 1. And the same goes for any other number of segments. That's why I don't understand what the fundamental difference is here.


If you reduce everything to 32, then stretch the binary 0 and 1 to 0 and 32 (and the others too - for example, a feature with 5 quanta goes from 0...5 to 0...32), so that everything is proportional. Or, classically, compress everything into a single hypercube (as for neural networks, which require normalisation). The essence is the same - in both variants we get a common scale.
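A minimal sketch of both variants, with illustrative toy features:

```python
import numpy as np

def stretch(col, levels=32):
    """Variant 1: stretch every feature onto a common 0..32 scale,
    so binary 0/1 becomes 0/32 and a 5-quant 0..5 becomes 0..32."""
    lo, hi = col.min(), col.max()
    return (col - lo) / (hi - lo) * levels

def to_unit(col):
    """Variant 2: the classic unit hypercube (min-max to [0, 1]),
    as required for neural networks."""
    lo, hi = col.min(), col.max()
    return (col - lo) / (hi - lo)

binary = np.array([0.0, 1.0, 1.0, 0.0])
quant5 = np.array([0.0, 2.0, 5.0, 3.0])
print(stretch(binary), stretch(quant5))   # both now span 0..32
print(to_unit(binary), to_unit(quant5))   # both now span 0..1
```

Either way, every feature ends up on the same scale, which is the point being made above.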

 
СанСаныч Фоменко #:

Labels (teacher, target variable) can NOT be rubbish by definition.

Sanych, don't embarrass yourself

You haven't even started studying, yet you are already expressing your opinion.