How to reduce the number of major classes in models? - General

Maxim Dmitrievsky 2020.11.13 16:27 #21131

Aleksey Vyazmikin:

Maxim, how do you set this thing up?

What is id_tl ?

I don't know, I need a link.

Probably just the ids of the transformed examples.

Maxim Dmitrievsky 2020.11.13 16:27 #21132

Aleksey Vyazmikin:

Thank you! It all worked out.

I think the right approach is to resample only the train set, since the test set is just there for control. That's what I did, but the result is very strange: the logloss on the test sample exceeds 1 and keeps growing. How can that be? I'm shocked.

you can try different ways, just to see

here is a good notebook https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets

you can copy and check
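On the logloss question: log loss has no upper bound, so a value above 1 is possible whenever a model puts confident probabilities on the wrong class. One possible cause here, and it is only a guess, is that a model trained on a resampled train set outputs probabilities that no longer match the class proportions of the untouched test set. A minimal sketch in plain Python with made-up numbers (the 90/10 split and the predicted probabilities are assumptions for illustration):

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; p_pred is the predicted probability of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# Hypothetical test sample: 90 rows of class 0 and 10 rows of class 1.
y_test = [0] * 90 + [1] * 10

# A model that always answers 0.5 scores ln(2), about 0.693.
uninformed = log_loss(y_test, [0.5] * 100)

# A model that confidently predicts class 1 everywhere is punished hard
# on the 90 rows of class 0, so its log loss ends up well above 1.
overconfident = log_loss(y_test, [0.95] * 100)

print(round(uninformed, 3), round(overconfident, 3))
```

If the test logloss grows with every iteration while the train logloss falls, the model is probably becoming more and more confident in predictions that do not hold on the test sample.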

Aleksey Vyazmikin 2020.11.13 16:39 #21133

Maxim Dmitrievsky:

I don't know, I need a link.

Probably just the ids of the transformed examples.

It's the same article - nothing is clear there.

Maxim Dmitrievsky 2020.11.13 16:40 #21134

Aleksey Vyazmikin:

It's still the same article - nothing is clear there.

That one is a copy-paste; I gave you a link to the original.

Aleksey Vyazmikin 2020.11.13 16:41 #21135

Maxim Dmitrievsky:

You can try different things, just to see

here's a good notebook https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets

you can copy and check.

So this is the original of the article I read in Russian.

Aleksey Vyazmikin 2020.11.13 16:42 #21136

Maxim Dmitrievsky:

That one is a copy-paste; I gave you a link to the original.

But what's the point? There is no information there anyway: the code has been stripped out.

Maxim Dmitrievsky 2020.11.13 16:47 #21137

Aleksey Vyazmikin:

But what's the point? There is no information there anyway: the code has been stripped out.

Everything is explained perfectly there. I don't have imbalanced classes, but I created them artificially, just to have a look.

Aleksey Vyazmikin 2020.11.13 17:04 #21138

Maxim Dmitrievsky:

Everything is explained perfectly there. I don't have imbalanced classes, but I created them artificially, just to have a look.

It turned out that the "Tomek links" method simply doesn't balance the sample: it reduced the number of class-0 rows from 4005 to 3402, which is why I thought it wasn't working.
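This matches how Tomek links work: a Tomek link is a pair of mutual nearest neighbors with different labels, and under-sampling drops only the majority-class member of each pair. The number of rows removed therefore depends on how many such pairs exist (here 4005 - 3402 = 603), not on the class ratio. A rough plain-Python sketch on toy 1-D data; the function names and the data are made up for illustration and this is not the notebook's code:

```python
def nearest_neighbor(i, X):
    """Index of the point closest to X[i] (squared Euclidean distance)."""
    best, best_d = None, float("inf")
    for j, xj in enumerate(X):
        if j == i:
            continue
        d = sum((a - b) ** 2 for a, b in zip(X[i], xj))
        if d < best_d:
            best, best_d = j, d
    return best

def tomek_links_undersample(X, y, majority=0):
    """Drop majority-class points that belong to a Tomek link; return kept indices."""
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    drop = set()
    for i, j in enumerate(nn):
        # Tomek link: mutual nearest neighbors with different labels.
        if nn[j] == i and y[i] != y[j]:
            drop.add(i if y[i] == majority else j)
    return [i for i in range(len(X)) if i not in drop]

# Toy sample: six rows of class 0 and two rows of class 1.
X = [(0.0,), (1.0,), (2.1,), (3.3,), (4.6,), (5.0,), (5.2,), (9.0,)]
y = [0, 0, 0, 0, 0, 0, 1, 1]

kept = tomek_links_undersample(X, y)
y_kept = [y[i] for i in kept]

# Only the class-0 point at 5.0 forms a Tomek link (with 5.2), so a single
# row is removed and the sample stays imbalanced.
print(kept, y_kept.count(0), y_kept.count(1))
```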

Maxim Dmitrievsky 2020.11.13 17:31 #21139

Aleksey Vyazmikin:

It turned out that the "Tomek links" method simply doesn't balance the sample: it reduced the number of class-0 rows from 4005 to 3402, which is why I thought it wasn't working.

Uh-huh. You have to do oversampling first, then Tomek links.
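That order, oversampling first and Tomek links second, could be sketched roughly as below. This is plain Python for illustration only: the function names, the toy data, and the SMOTE-like interpolation step are assumptions, not the notebook's code. The imbalanced-learn library offers a ready-made combination of this kind as `SMOTETomek`.

```python
import random

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def nearest(i, X, candidates):
    """Index among candidates (excluding i) that is closest to X[i]."""
    return min((j for j in candidates if j != i), key=lambda j: sq_dist(X[i], X[j]))

def oversample_minority(X, y, minority=1, seed=0):
    """SMOTE-like step: add points interpolated between a minority point and
    its nearest minority neighbor until both classes have the same size."""
    rng = random.Random(seed)
    X, y = list(X), list(y)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    n_needed = (len(y) - len(minority_idx)) - len(minority_idx)
    for _ in range(n_needed):
        i = rng.choice(minority_idx)
        j = nearest(i, X, minority_idx)
        t = rng.random()
        X.append(tuple(a + t * (b - a) for a, b in zip(X[i], X[j])))
        y.append(minority)
    return X, y

def remove_tomek_links(X, y):
    """Cleaning step: drop both members of every Tomek link
    (mutual nearest neighbors with different labels)."""
    everyone = range(len(X))
    nn = [nearest(i, X, everyone) for i in everyone]
    drop = {i for i in everyone if nn[nn[i]] == i and y[i] != y[nn[i]]}
    keep = [i for i in everyone if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy sample: six rows of class 0 and two rows of class 1.
X = [(0.0,), (1.0,), (2.1,), (3.3,), (4.6,), (5.0,), (5.2,), (9.0,)]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_over, y_over = oversample_minority(X, y)           # balance the classes first
X_clean, y_clean = remove_tomek_links(X_over, y_over)  # then clean the boundary

print(y_over.count(0), y_over.count(1), y_clean.count(0), y_clean.count(1))
```

Dropping both members of each link is a choice made in this sketch; with two classes it removes one point per class for every link, so the balance obtained by oversampling is preserved. Removing only the majority member is another common option.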

Aleksey Vyazmikin 2020.11.13 19:29 #21140

Maxim Dmitrievsky:
Uh-huh. You have to do oversampling first, then Tomek links.

Oversampling hasn't given anything so far, but Tomek links slightly improved the result. That means there is something in the data; the main thing is to dig properly.

Histogram of models with different quantization settings on the sample.

Machine learning in trading: theory, models, practice and algo-trading - page 2114