Machine learning in trading: theory, models, practice and algo-trading - page 2114

 
Aleksey Vyazmikin:

Maxim, how do you set this thing up?

What is id_tl?

I don't know, I'd need the link.

Probably it's just the IDs of the transformed examples.
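This is a minimal sketch of what an index array like id_tl contains. It assumes id_tl is the index output of imbalanced-learn's TomekLinks. Older releases returned that output via return_indices=True, and as far as I know newer ones expose it as the sample_indices_ attribute. The function below is a hypothetical stdlib re-implementation, not the library's code. It finds Tomek links (pairs of rows that are each other's nearest neighbour but carry different labels), drops the majority-class member of each link, and returns the indices of the rows it keeps. Cleaning only the majority class is an assumption made for this binary example.

```python
import math
from collections import Counter

def tomek_kept_indices(X, y):
    """Return indices of rows kept after Tomek-links cleaning (an id_tl analogue).
    Assumption: binary labels, only the majority-class member of a link is dropped."""
    majority = Counter(y).most_common(1)[0][0]
    # Nearest other row for every row (brute force, Euclidean distance).
    nearest = [min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))
               for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nearest):
        # Tomek link: mutual nearest neighbours with different labels.
        if nearest[j] == i and y[i] != y[j]:
            drop.add(i if y[i] == majority else j)
    return [i for i in range(len(X)) if i not in drop]

X = [(-3,), (0,), (2,), (5,), (6,), (10,), (20,)]
y = [0, 0, 0, 0, 1, 1, 0]
kept = tomek_kept_indices(X, y)
print(kept)                          # [0, 1, 2, 4, 5, 6]
print(Counter(y[i] for i in kept))   # Counter({0: 4, 1: 2})
```

Only row 3 is removed here, so the classes stay imbalanced. Tomek links is a cleaning method that trims borderline majority rows, not a balancing one.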

 
Aleksey Vyazmikin:

Thanks! It all worked.

I think the right thing is to transform only the training sample, since the test sample serves as the control, so that's what I did. But the result is very strange: the logloss error on the test sample exceeds 1 and keeps growing. How can that be? I'm shocked.
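This short sketch of binary log loss in plain Python gives a sense of scale. The function name and the numbers are illustrative. A model that always predicts 0.5 scores ln 2 ≈ 0.693. A constant prediction equal to the positive-class share does at least as well. A test logloss above 1 is therefore worse than either uninformative baseline. Each term is -log of the probability given to the true class, so a few confident mistakes are enough to push the mean above 1. One possible contributor, which the thread does not confirm, is that a model trained on a resampled sample sees a different class ratio than the test set. Its probabilities may then be miscalibrated for the test data.

```python
import math

def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; p_pred is the predicted probability of
    class 1, clipped to [eps, 1 - eps] so log(0) never occurs."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

y_test = [0, 0, 0, 1]
print(round(binary_log_loss(y_test, [0.5] * 4), 3))              # 0.693
print(round(binary_log_loss(y_test, [0.25] * 4), 3))             # 0.562
print(round(binary_log_loss(y_test, [0.1, 0.1, 0.1, 0.01]), 3))  # 1.23
```

In the last line, a single overconfident miss (0.01 for a true 1) contributes about 4.6 on its own, which drags the mean above 1.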

You can try different approaches, just to see.

Here is a good notebook: https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets

You can copy it and experiment.
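The two simplest resampling strategies are random oversampling (duplicating minority rows) and random undersampling (discarding majority rows). This is a minimal stdlib sketch of both. It assumes a list of feature tuples X and a parallel label list y. The function names are hypothetical. imbalanced-learn's own versions are RandomOverSampler and RandomUnderSampler.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until every
    class has as many rows as the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out

def random_undersample(X, y, seed=0):
    """Keep a random subset of every class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out

X = [(float(i),) for i in range(10)]
y = [0] * 8 + [1] * 2
X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)
print(Counter(y_over))   # Counter({0: 8, 1: 8})
print(Counter(y_under))  # Counter({0: 2, 1: 2})
```

Oversampling keeps every original row but adds exact copies. Undersampling avoids copies but throws information away.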

 
Maxim Dmitrievsky:

I don't know, I'd need the link.

Probably it's just the IDs of the transformed examples.

It's the same article; nothing in it is clear.

 
Aleksey Vyazmikin:

It's the same article; nothing in it is clear.

That one is a copy-paste; I gave you a link to the original.

 
Maxim Dmitrievsky:

You can try different approaches, just to see.

Here is a good notebook: https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets

You can copy it and experiment.

So this is the original of the article I read in Russian.

 
Maxim Dmitrievsky:

That one is a copy-paste; I gave you a link to the original.

But what's the point? There's still no information there: the code has been stripped out.

 
Aleksey Vyazmikin:

But what's the point? There's still no information there: the code has been stripped out.

Everything is explained perfectly well there. I don't have imbalanced classes, but I created the imbalance artificially, just to take a look.
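Creating an imbalance artificially, as described above, can be as simple as keeping only a fraction of one class. This is a minimal sketch. make_imbalanced and its parameters are hypothetical names. It keeps a fixed random share of the rows with the chosen label and leaves every other row untouched. The original balanced data stays available for comparison.

```python
import random
from collections import Counter

def make_imbalanced(X, y, label, keep_fraction, seed=0):
    """Keep only a random share of the rows with the given label;
    all other rows are kept unchanged and in their original order."""
    rng = random.Random(seed)
    idx = [i for i, lab in enumerate(y) if lab == label]
    keep = set(rng.sample(idx, round(len(idx) * keep_fraction)))
    rows = [i for i, lab in enumerate(y) if lab != label or i in keep]
    return [X[i] for i in rows], [y[i] for i in rows]

X = [(float(i),) for i in range(20)]
y = [0, 1] * 10  # balanced: 10 rows per class
X_imb, y_imb = make_imbalanced(X, y, label=1, keep_fraction=0.2)
print(Counter(y_imb))  # Counter({0: 10, 1: 2})
```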

 
Maxim Dmitrievsky:

Everything is explained perfectly well there. I don't have imbalanced classes, but I created the imbalance artificially, just to take a look.

It turned out that the Tomek links method simply doesn't balance the sample: it reduced the number of class-0 rows from 4005 to 3402, which is why I thought it wasn't working.
 
Aleksey Vyazmikin:


It turned out that the Tomek links method simply doesn't balance the sample: it reduced the number of class-0 rows from 4005 to 3402, which is why I thought it wasn't working.
Uh-huh. You have to do oversampling first, then Tomek links.
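imbalanced-learn packages the "oversample first, then clean with Tomek links" combination as SMOTETomek. Below is a simplified stdlib version under stated assumptions. It uses binary labels. A SMOTE-style step interpolates between a minority row and one of its k nearest minority neighbours until the classes are equal. A second step then removes the majority-class member of every Tomek link. The function names, k=3 and the seeds are illustrative choices, not the library's defaults. The idea is that synthetic minority points create new cross-class nearest-neighbour pairs near the class boundary, which the cleaning step then trims.

```python
import math
import random
from collections import Counter

def smote_like(X, y, minority, k=3, seed=0):
    """SMOTE-style oversampling for two classes: add synthetic minority rows on
    the segment between a minority row and one of its k nearest minority
    neighbours until both classes have the same size."""
    rng = random.Random(seed)
    mino = [x for x, lab in zip(X, y) if lab == minority]
    n_new = (len(y) - len(mino)) - len(mino)
    X_out, y_out = list(X), list(y)
    for _ in range(max(n_new, 0)):
        i = rng.randrange(len(mino))
        # k nearest minority neighbours of row i (excluding itself)
        neighbours = sorted((j for j in range(len(mino)) if j != i),
                            key=lambda j: math.dist(mino[i], mino[j]))[:k]
        j = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        X_out.append(tuple(a + gap * (b - a) for a, b in zip(mino[i], mino[j])))
        y_out.append(minority)
    return X_out, y_out

def tomek_clean(X, y, majority):
    """Drop the majority-class member of every Tomek link, i.e. every pair of
    rows that are mutual nearest neighbours but have different labels."""
    nearest = [min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))
               for i in range(len(X))]
    drop = {i if y[i] == majority else j
            for i, j in enumerate(nearest)
            if nearest[j] == i and y[i] != y[j]}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy data: 40 rows of class 0 around (0, 0), 8 rows of class 1 around (1.5, 1.5).
rng = random.Random(1)
X = ([(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
     + [(rng.gauss(1.5, 1), rng.gauss(1.5, 1)) for _ in range(8)])
y = [0] * 40 + [1] * 8

X_os, y_os = smote_like(X, y, minority=1)                # 1) oversample
print(Counter(y_os))                                     # Counter({0: 40, 1: 40})
X_clean, y_clean = tomek_clean(X_os, y_os, majority=0)   # 2) Tomek cleaning
print(Counter(y_clean))  # class 1 keeps all 40 rows; class 0 loses only linked rows
```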
 
Maxim Dmitrievsky:
Uh-huh. You have to do oversampling first, then Tomek links.

Oversampling hasn't given anything so far, but Tomek links improved the result slightly, which means there is something in the data; the main thing is to dig properly.

[Image: histogram of models with different quantization settings on the sample]