Is there a pattern to the chaos? Let's try to find it! Machine learning on the example of a specific sample. - page 22

 
Aleksey Vyazmikin #:
But this is the model I got.


There is no need to hope that the best model on the exam sample will be profitable in the future. The average, or the majority, should be profitable.

It's just like in the tester's optimiser - the best models will be losing on the forward 99% of the time.

 
elibrarius #:

Splits are made only down to the quantum. Everything inside a quantum is treated as the same value and is not split further.

You have misunderstood - the point is that each split reduces the sample available for the next split, which is chosen according to the quantum table, but the metric will change each time.

Well, there are algorithms that build a new quantum table after each split while training the model.

elibrarius #:

I don't understand why you are looking for something in the quanta; their primary purpose is to speed up calculations (the secondary purpose is to coarsen/generalise the model so that there is no further splitting, though you can also just limit the depth on float data). I don't use it - I just build models on float data. I tried quantisation into 65000 parts, and the result is exactly the same as the model without quantisation.

Apparently I do see a benefit, which is why I use it. 65000 parts is too many; I see the point of quantisation as generalising the data to create a categorical feature, so it is desirable that each quantum holds about 2%-5% of the whole sample. It is possible this does not hold for all predictors - the experiments are not finished.
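For illustration, here is a minimal sketch of the kind of quantisation discussed above - splitting a float predictor into bins ("quanta") so that each bin holds roughly 2%-5% of the sample, i.e. about 20-50 bins in total. The function name and bin count are my own; this is not CatBoost's actual quantisation code.

```python
import numpy as np

def quantize(values, bins=32):
    # Bin borders are taken at equally spaced quantiles of the data,
    # so every bin covers approximately the same share of observations
    # (with 32 bins, roughly 3% of the sample each).
    borders = np.quantile(values, np.linspace(0, 1, bins + 1)[1:-1])
    # Each value is replaced by the index of its bin - a categorical code.
    return np.searchsorted(borders, values), borders

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
codes, borders = quantize(x, bins=32)
print(codes.min(), codes.max())  # codes run from 0 to 31
print(len(borders))              # 31 borders define 32 bins
```

Whether equal-frequency borders are the right choice is exactly the open question above - for some predictors a different border placement may generalise better.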

elibrarius #:

There will be 1 split that divides the data into 2 sectors: one with all the 0's, the other with all the 1's. I don't know what exactly is called a quantum; I think the quanta are the sectors obtained after quantisation. Perhaps it is the number of splits, as you mean.

Yes, that's clear - you are right about splitting, I was rather just smiling. In any case, CatBoost has the concept of a quantum table, which contains exactly the splits, while I myself use segments - two coordinates - and perhaps those can be called quanta or quantum segments. I don't know the real terminology, but that is what I call them for myself.

 
elibrarius #:

There is no need to hope that the best model on the exam sample will be profitable in the future. The average, or the majority, should be profitable.

It's just like in the tester's optimiser - the best models will be losing on the forward 99% of the time.

The goal now is to understand the potential to which we can aspire. I will not trade on these models.

And I expect the number of models selected to increase due to reduced variability in split selection - we'll see later today.

 
Aleksey Vyazmikin #:

And I expect the number of models selected to increase due to reduced variability in split selection - we'll see later today.

Turns out I was wrong - there are only 79 models, and the average profit on exam is -1379.

 
elibrarius #:

There is no need to hope that the best model on the exam sample will be profitable in the future. The average, or the majority, should be profitable.

It's just like in the tester's optimiser - the best models will be losing on the forward 99% of the time.

By the way, I decided to look at another sample, one that was also not used in training - the one that was cut off earlier.

And here is what the same model looks like on this data (2014-2018).

Balance

I think it's not bad - at least not a 45-degree decline. So can we still expect a good model to continue being good?

 
Aleksey Vyazmikin #:

By the way, I decided to look at another sample, one that was also not used in training - the one that was cut off earlier.

And here is what the same model looks like on this data (2014-2018).

I think it's not bad - at least not a 45-degree decline. So can we still expect a good model to continue being good?

maybe)

 
elibrarius #:

maybe)

Alas, I checked all the models: there were 39 that earned more than 3000 on both the train and exam samples, and on the new-old sample only 18 of them (46%) showed a profitable result. That is certainly more than 1/3, but still not enough.

This is the difference in the balances of the selected models between the regular exam sample and the discarded one (2014-2018).
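The selection step described above can be sketched roughly like this (the model figures below are invented for illustration; only the filtering logic mirrors the text):

```python
# Keep models that earned over the threshold on both train and exam,
# then count how many of those stay profitable on the held-out
# 2014-2018 ("old") sample.
def survivors(results, threshold=3000):
    selected = [r for r in results if r["train"] > threshold and r["exam"] > threshold]
    profitable = [r for r in selected if r["old"] > 0]
    return len(selected), len(profitable)

# Illustrative data, not the thread's real model results.
models = [
    {"train": 5000, "exam": 4000, "old": 1200},
    {"train": 6000, "exam": 3500, "old": -800},
    {"train": 2000, "exam": 4500, "old": 900},   # fails the train filter
]
n_sel, n_prof = survivors(models)
print(n_sel, n_prof)  # 2 selected, 1 of them profitable on the old sample
```

In the thread's actual numbers this gives 39 selected and 18 profitable, i.e. 46%.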

 
Aleksey Vyazmikin #:

Alas, I checked all the models: there were 39 that earned more than 3000 on both the train and exam samples, and on the new-old sample only 18 of them (46%) showed a profitable result. That is certainly more than 1/3, but still not enough.

This is the difference in the balances of the selected models between the regular exam sample and the discarded one (2014-2018).

In general, it doesn't even come out to 50/50 yet (in terms of profit). If it is already difficult to come up with new features related to the target, maybe the target itself should be changed?
 
elibrarius #:
In general, it doesn't even come out to 50/50 yet (in terms of profit). If it is already difficult to come up with new features related to the target, maybe the target itself should be changed?

New predictors can be invented - there are still ideas - but I'm not sure training will actually pick them up, given the greedy principle of split selection.... Perhaps we need to change the approach to model training and make our own modifications of known algorithms.
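A toy illustration of the greedy principle mentioned above, assuming the usual greedy split selection in boosted trees: each node takes the split with the best immediate gain, so a predictor that only pays off in combination with another may never be chosen (names and gain values are made up):

```python
# gains: immediate gain of each predictor's best split at this node.
def best_split(gains):
    # The greedy choice looks one step ahead only.
    return max(gains, key=gains.get)

print(best_split({"a": 0.30, "b": 0.25, "c": 0.10}))  # 'a' wins,
# even if splitting on 'b' and then 'c' would be better overall.
```

This is why a newly invented predictor can be ignored by training even when it carries real information about the target.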

The target can be changed, but to what, any ideas?

 

I took the sample from the sixth step I described here and swapped exam and test.

In effect, training was carried out under the same rules and with the same seeds, but a different sample - later in chronology - was responsible for stopping the creation of new trees.
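A minimal sketch of what "responsible for stopping the creation of new trees" means in practice: trees are added while the metric on the stopping sample keeps improving, and the run halts once it stalls. The loss values and patience setting here are invented.

```python
# Return how many trees are kept when early stopping watches the
# per-tree loss on the stopping sample.
def stop_tree_count(eval_loss_per_tree, patience=2):
    best, best_i, waited = float("inf"), 0, 0
    for i, loss in enumerate(eval_loss_per_tree):
        if loss < best:
            best, best_i, waited = loss, i, 0
        else:
            waited += 1
            if waited >= patience:
                break  # the stopping sample's metric has stalled
    return best_i + 1  # trees up to the best iteration are kept

# Loss on the stopping sample improves, then stalls:
losses = [0.70, 0.65, 0.62, 0.63, 0.64, 0.61]
print(stop_tree_count(losses))  # 3 trees kept
```

Swapping which sample plays this role changes where training stops, which is exactly why the average tree count differs between the two runs.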

As a result, the average profit on the test (former exam) sample is -730.5 - recall that with chronological training the average on the test sample was 982.5. On the exam (former test) sample the average balance is 922.49 points, whereas in the initial variant it was -1114.27 points.

Figure 1. Histogram of the balance distribution on the original test sample when used as the exam sample.

Figure 2. Histogram of the balance distribution on the original exam sample when used as the test sample.

With the samples in chronological order, the average number of trees per model was 11.47; with the two samples swapped it was 9.11. One could say the patterns became less pronounced after the swap, so fewer trees were needed to describe them.

At the same time, because stopping was controlled on that actual sample, the patterns on it became higher quality and, as I noted above, on average more profitable.

On the one hand, the experiment confirms that the samples contain similar patterns that persist for years; at the same time, some of them become less pronounced or even shift their probability into the negative zone of the event outcome. It was shown earlier that not only the predictors themselves but also how they are used in the model influence the training result.

As a result, what we have:

1. An unrepresentative sample.

2. Random patterns that may "overshadow" the stable ones when building the model, or a model-building method that is not reliable enough by itself.

3. Dependence of the model result on the sample region (the former test sample showed good results in the role of exam).