Machine learning in trading: theory, models, practice and algo-trading - page 3334

 
So the problem is at both ends at once: from one end you do not know your target function, from the other you do not know what errors a particular model makes when approximating it. You need to find both the function and the errors, and you only have a subsample, often a biased one.

And you can do all this without multiple out-of-sample testing. But within the subsample itself there are no constraints.
 
Forester #:

You have plenty of binary predictors with 0 and 1. They won't split into 32 quanta. But if you normalise them, you may get something with uniform quantisation. With non-uniform quanta, all distances computed on the raw numbers will be distorted; you need to take the absolute values after normalisation.


Yes, with binary ones it is more complicated. But I don't see how normalisation would help here.

In general, I suppose, dimensionality has to be reduced. But then it is not exactly what the authors intended. So far I am far from an implementation.

Forester #:

There will be prediction errors if you can't get rid of the noise the way you did in training.

It's a different concept: the data is divided into two parts, roughly "can be predicted" and "cannot be predicted", and a separate model is responsible for that split. When new data comes in, it is first evaluated whether to make a prediction at all. Thus predictions are made only on data that was "easily" separable and tightly clustered during training, i.e. showed signs of being reliable.
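A minimal sketch of that two-model idea, assuming a generic binary dataset and scikit-learn; the model choice, the placeholder data and the 0.6 gate threshold are illustrative assumptions, not what was actually used:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Placeholder data: replace with real features/labels.
X, y = np.random.rand(2000, 10), np.random.randint(0, 2, 2000)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, shuffle=False)

# First model: the usual classifier.
base = GradientBoostingClassifier().fit(X_tr, y_tr)

# Second model: learns where the first one is right ("can predict") vs. wrong.
correct = (base.predict(X_val) == y_val).astype(int)
gate = GradientBoostingClassifier().fit(X_val, correct)

def predict_or_abstain(x_row, threshold=0.6):
    """Predict only when the gate believes the base prediction is reliable."""
    if gate.predict_proba(x_row.reshape(1, -1))[0, 1] < threshold:
        return None                          # abstain: "can't predict"
    return base.predict(x_row.reshape(1, -1))[0]
```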

Forester #:
It doesn't matter whether it's a tree, a forest or a bush. If the model's prediction is 50%, it means there will be 50% zeros and 50% ones in the prediction.

That's not the point at all. Random forest and boosting use forced tree construction, i.e. there is no mechanism to discard a tree if it turns out lousy; in either case the tree gets a weight. And it can be lousy because of the excessive randomness in the algorithm, both when selecting features and when selecting examples (subsamples).

 
Maxim Dmitrievsky #:
No, I haven't. I'll see what it is tonight.
These methods are model-dependent. The data itself is not split or separated. I don't know how to explain it. I tried it once and got into trouble with the optimisers again. It's in the books.
If you go left here, you lose a horse. If you go right, you lose the two-headed dragon.

That's right - it's a way of isolating examples that degrade learning - that's the theory.

The idea is to train 100 models and see which examples on average "hinder" reliable classification, and then try to detect them with another model.
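A hedged sketch of that idea, assuming plain scikit-learn trees trained on random half-subsamples; the 100 repetitions, the placeholder data and the 0.5 cut-off are my assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = np.random.rand(2000, 10), np.random.randint(0, 2, 2000)  # placeholder data

errors = np.zeros(len(y))
counts = np.zeros(len(y))
for _ in range(100):
    idx = rng.choice(len(y), size=len(y) // 2, replace=False)    # random subsample
    oob = np.setdiff1d(np.arange(len(y)), idx)                   # held-out part
    m = DecisionTreeClassifier(max_depth=4).fit(X[idx], y[idx])
    errors[oob] += (m.predict(X[oob]) != y[oob])                 # count misses per example
    counts[oob] += 1

avg_error = errors / np.maximum(counts, 1)
hard_examples = avg_error > 0.5        # examples that on average "hinder" classification
```

A second model can then be trained on X against hard_examples to try to detect such cases in new data.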

 

So I took the model and looked at the leaves. The sample is unbalanced, with only 12.2% of class "1", and the model has 17k leaves.

I labelled the leaves into classes: if the share of responses with target "1" in a leaf was above the base rate of 12.2%, the leaf's class is "1", otherwise "0". The point of a leaf class here is to carry useful information for improving classification.
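A sketch of that leaf labelling, assuming the per-leaf response counts are already available as plain arrays (the numbers are made up):

```python
import numpy as np

n_responses  = np.array([120,  40,  15, 300])   # responses falling into each leaf
n_target_one = np.array([ 30,   2,  10,  20])   # of which had target "1"

base_rate = 0.122                               # share of "1" in the whole sample
leaf_share_one = n_target_one / n_responses
leaf_class = (leaf_share_one > base_rate).astype(int)  # "1" if above the base rate, else "0"
```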

The histogram shows the leaf values of the model (X axis) and their percentage share among the leaves (Y axis), without splitting by class.


And here the same, but only for leaves of class "0".


And the same for leaves of class "1" only.

These leaf coefficients are summed and passed through the logistic transform, so a positive value increases the probability of class "1" and a negative one decreases it. Overall the breakdown by class looks valid, but there is a bias in the model.
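For reference, this is how summed leaf values typically turn into a class-"1" probability in a boosted model (the standard logistic link; the exact scaling and bias term may differ between libraries):

```python
import math

def prob_of_one(leaf_values):
    raw = sum(leaf_values)                # one value from each tree the example falls into
    return 1.0 / (1.0 + math.exp(-raw))   # sigmoid: positive sum pushes towards "1", negative towards "0"

print(prob_of_one([0.3, -0.1, 0.2]))      # ~0.60
```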

Now we can look at the distribution of those percentages themselves, i.e. the classification accuracy per leaf, separately for leaves of class "1" and class "0".


The histogram for "0" shows a huge number of leaves with accuracy near 100%.


And here, for class "1", there is a bigger cluster near the initial split value, i.e. there are a lot of low-informative leaves, but at the same time there are also some near 100%.

Looking at the recall, it becomes clear that these are all leaves with a small number of activations, less than 5% of their class.


Recall for class "0"


Recall for class "1".

Next we can look at the dependence of the weight in the leaf on its classification accuracy - also separately for each class.

00

For target "0"


For target "1".

The presence of a linear trend, albeit with such a wide spread, is noteworthy. But the "column" at 100% accuracy defies that logic, stretching very wide over the range of leaf values.

Maybe this ugliness should be removed?

Also, if we look at the leaf values as a function of recall, we see leaves with small weight (near 0) that sometimes have a very large number of responses. This indicates that the leaf is not good, yet a weight is still attached to it. So can such leaves also be considered noise and zeroed out? (A rough sketch of such filtering follows the figures below.)


For target "0."


For target "1."
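A rough sketch of that zeroing-out, assuming the leaf values and per-leaf precision/recall statistics are held as plain arrays; the thresholds are arbitrary assumptions:

```python
import numpy as np

leaf_values    = np.array([ 0.8, -0.02, 0.01, -0.9])   # made-up leaf weights
leaf_precision = np.array([0.95,  0.13, 0.14, 0.97])   # accuracy of the leaf's class
leaf_recall    = np.array([0.08,  0.30, 0.25, 0.06])   # share of its class it covers

# Treat leaves with near-zero weight or very low precision as noise.
noisy = (np.abs(leaf_values) < 0.05) | (leaf_precision < 0.15)
leaf_values[noisy] = 0.0      # such leaves no longer contribute to the summed score
```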

I wonder what percentage of leaves on the new sample (not train) will "change" their class?
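That question can be answered directly once the per-leaf share of target "1" is recomputed on the new sample; a sketch with made-up numbers:

```python
import numpy as np

base_rate = 0.122
share_one_train = np.array([0.30, 0.05, 0.20, 0.10])   # share of target "1" per leaf, train
share_one_test  = np.array([0.28, 0.15, 0.09, 0.11])   # same leaves, new sample

class_train = share_one_train > base_rate
class_test  = share_one_test  > base_rate
flipped = np.mean(class_train != class_test)            # fraction of leaves that changed class
print(f"{flipped:.1%} of leaves changed class")
```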

 

And in addition, a classic: the trade-off between recall (completeness) and precision (accuracy).


Class 0.


Class 1.

Anyway, I'm thinking about how to weight all of this....

 

And this is what the model looks like in terms of probabilities.


On the train sample - profit starts being made from as low as 35% - like in a fairy tale!


On the test sample, on the range from 0.2 to 0.25, we lose a fat chunk of profit - the points of the class maxima have shifted.


On the exam sample it is still earning, but the model is already eroding.

 
Aleksey Vyazmikin #:

I wonder what percentage of leaves on a new sample (not train) will "change" their class?

Yes, I wonder....

________________________

In fact, I found a way to find features that do not shift with respect to the target either on the train or on the test... But the problem is that such features are catastrophically few, the screening method itself is wildly expensive in terms of computing power, and the method is implemented via unsupervised learning - only this way did we manage to avoid overfitting.


 
And what role did quantisation play in this? On a scale of 10.
I finished Starfield and it's as if the singularity started. I went into a multiverse and met a copy of myself. Now I'm running around different versions of the universe, and there's no way out of it. Now I have to find new meanings.

When the brain or neural network reaches the limits of reasonableness, the singularity begins.
 
Aleksey Vyazmikin #:

That's right - it's a way of highlighting examples that degrade learning - that's in theory.

The idea is to train 100 models and see which examples on average "interfere" with reliable classification, and then try to detect them with a different model.

Divide the main train set into 5-10 subtrains, each of which is split into a train and a validation part. Train on each in CV fashion, then predict on the entire main train set. Compare the original labels with the labels predicted by all the models. The ones that were not guessed correctly go on a blacklist. Then you remove all the bad examples when training the final model, by calculating the average miss rate for each sample. Optionally, you can teach a second model to separate the white samples from the black ones, or do it via a third class.

3 lines of code, results on the level of... well, I don't have much to compare to... well, on some level.

The causal part here is in the CV: you statistically determine which examples are bad and which are good, using multiple models, each trained on different pieces of history. This is called a propensity score, that is, the propensity of each sample to play a role in training.

Of course, the labels can be complete rubbish, and then this approach may remove almost everything. That is why, back at the beginning, I used random sampling of trades to add different markup variants - given that we don't want to, or don't know how to, think about how to mark up the chart.

This is roughly what an ML algorithm with causal elements, one that searches for trading systems on its own, should look like.
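A rough sketch of that recipe in scikit-learn terms; the fold count, the model choice, the "misclassified by most models" rule and the placeholder data are all my assumptions, not the actual code referred to above:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import GradientBoostingClassifier

# Placeholder main train set.
X, y = np.random.rand(3000, 10), np.random.randint(0, 2, 3000)

wrong = np.zeros(len(y))
n_models = 0
for tr_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    m = GradientBoostingClassifier().fit(X[tr_idx], y[tr_idx])   # train on one subtrain
    wrong += (m.predict(X) != y)                                 # predict on the WHOLE main train set
    n_models += 1

blacklist = (wrong / n_models) > 0.5          # samples most models fail to guess
final_model = GradientBoostingClassifier().fit(X[~blacklist], y[~blacklist])

# Optional second model: learn to tell "white" samples from "black" ones.
gate = GradientBoostingClassifier().fit(X, blacklist.astype(int))
```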
 
Maxim Dmitrievsky #:
Divide the main train set into 5-10 subtrains, each of which is split into a train and a validation part. Train on each in CV fashion, then predict on the entire main train set. Compare the original labels with the labels predicted by all the models. The ones that were not guessed correctly go on a blacklist. Then you remove all the bad examples when training the final model, by calculating the average miss rate for each sample. Optionally, you can teach a second model to separate the white samples from the black ones, or do it via a third class.

3 lines of code, results on the level of... well, I have nothing to compare with... well, on some level.

The causal part here is in the CV: you statistically determine which examples are bad and which are good, using multiple models, each trained on different pieces of history. This is called a propensity score, that is, the propensity of each sample to play a role in training.

Of course, the labels can be complete rubbish, and then this approach may remove almost everything. That is why, back at the beginning, I used random sampling of trades to add different markup variants - given that we don't want to, or don't know how to, think about how to mark up the chart.

This is roughly what an ML algorithm with causal elements, one that searches for trading systems on its own, should look like.

Labels (the teacher, the target variable) can NOT be rubbish by definition. The quotes are marked up from considerations external to the predictors. Once the labels have been decided, there remains the problem of finding predictors relevant to that set of labels. It can easily happen that a set of labels is beautiful, but we cannot find predictors for it and have to look for another set of labels. For example, the labels are ZZ (ZigZag) reversals. Beautiful labels. And how do we find predictors for such labels?

As soon as we start filtering the labels by the predictors, that is super-fitting, and that is why everything you show here, including on the market, does not work on an external, new file in a natural step-by-step mode.