Machine learning in trading: theory, models, practice and algo-trading - page 1236

 
Dear foresters. Is it necessary to do class balancing for trees and forests (equalize the number of examples of different classes)?
 
elibrarius:
Dear foresters. Is it necessary to do class balancing for trees and forests (equalize the number of examples of different classes)?

No

 
Dimitri:

No

Here's what I'm reading: Flach P., Machine Learning. The Science and Art of Building Algorithms that Extract Knowledge from Data, 2015

There are several pages devoted to this topic there. Here's the bottom line:

The marked point 1 says that balancing is useful.

But there is also point 2, from which we can conclude that with a large sample, when there are enough examples of the small class, the sample for that class becomes representative. Then balancing is not necessary.
How many examples can be considered representative for time series?

And then there's point 3. But it is hard to know whether the particular tree implementation in the software you choose applies such a correction.
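For reference, the balancing in question (equalizing the number of examples per class) can be sketched as random oversampling of the smaller class. This is only a minimal illustration, not the procedure from Flach's book; the function name `oversample_minority` and the toy data are made up for the example.

```python
import random
from collections import Counter

def oversample_minority(rows, labels, seed=0):
    """Duplicate randomly chosen examples of the smaller classes until
    every class has as many examples as the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original examples of this class form the pool to draw from.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels

# Toy data (hypothetical): 8 examples of class 0 and 2 of class 1.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
bal_rows, bal_labels = oversample_minority(rows, labels)
balanced_counts = Counter(bal_labels)
```

Note that duplication adds no new information about the small class, and it should only ever be applied to the training part of the data.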

 
elibrarius:

I'm reading here: Flach P. - Machine Learning. The Science and Art of Building Algorithms that Extract Knowledge from Data - 2015

There are several pages devoted to this topic there. Here's the bottom line:

The marked point 1 says that balancing is useful.

But there is also point 2, from which we can conclude that with a large sample, when there are enough examples of the small class, the sample for that class becomes representative. Then balancing is unnecessary.

And then there's point 3. But it is hard to know whether the particular tree implementation in the software you choose applies such a correction.

In my opinion, the author is just restating the law of large numbers for ML.

Clearly, if you have 10 observations in the first class and 6 in the second, adding 4 to the second will change the model (not necessarily improve it), but the sample still won't be representative.

 
Dimitri:

In my opinion, the author is just restating the law of large numbers for ML.

Clearly, if you have 10 observations in the first class and 6 in the second, adding 4 to the second will change the model (not necessarily improve it), but the sample still won't be representative.

No, not large numbers: he explained it on a small sample of 10 examples, 8:2 vs. 6:4. But we have a lot of data.


How many examples can be considered representative for time series? I usually don't use fewer than 10,000, and the small class should have at least 1,000.
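That rule of thumb (at least 10,000 examples in total, at least 1,000 in the small class) is easy to check before training. A minimal sketch; the thresholds are just the personal numbers above, and the function name and labels are made up for illustration.

```python
from collections import Counter

def sample_size_report(labels, min_total=10_000, min_per_class=1_000):
    """Compare class counts against rule-of-thumb minimums."""
    counts = Counter(labels)
    total = sum(counts.values())
    # Classes that fall short of the per-class minimum.
    too_small = {c: n for c, n in counts.items() if n < min_per_class}
    return {
        "total": total,
        "counts": dict(counts),
        "total_ok": total >= min_total,
        "classes_below_minimum": too_small,
    }

# Hypothetical label vector: 9,200 "up" bars and 800 "down" bars.
labels = ["up"] * 9_200 + ["down"] * 800
report = sample_size_report(labels)
```

Here the total passes, but the "down" class is flagged as too small.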

 
elibrarius:
Yes, he only looked at examples with 10 observations: 8:2 vs. 6:4. But we have a lot of data.


How many examples can be considered representative for time series?

No idea. I took the maximum available, but I was working with daily data; for trees and forests, at least 2 years.

Ask A_K: he determined the optimum using Chebyshev's inequality (if I remember correctly), but that only works for continuous variables.
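I don't know exactly how A_K did his calculation, so the following is only the textbook bound, not his method. For the mean of n independent observations with standard deviation sigma, Chebyshev's inequality gives P(|mean - mu| >= eps) <= sigma^2 / (n * eps^2). Requiring the right-hand side to be at most delta gives n >= sigma^2 / (eps^2 * delta). The function name and the numbers are assumptions for illustration.

```python
import math

def chebyshev_min_samples(sigma, eps, delta):
    """Smallest n for which Chebyshev's inequality guarantees that the
    sample mean is within eps of the true mean with probability of at
    least 1 - delta (independent observations, finite variance)."""
    if sigma <= 0 or eps <= 0 or not 0 < delta < 1:
        raise ValueError("need sigma > 0, eps > 0 and 0 < delta < 1")
    return math.ceil(sigma ** 2 / (eps ** 2 * delta))

# Hypothetical numbers: 1% standard deviation of daily returns,
# mean wanted within 0.1% with probability of at least 95%.
n_needed = chebyshev_min_samples(sigma=0.01, eps=0.001, delta=0.05)
```

The bound is very conservative, and market data are not independent, so treat the result as a rough order of magnitude at best.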

Try starting from the number of variables: at least 100 examples for each.

In general, if you are trying to find a "perpetual" pattern, the more the better. If the "pattern" is floating, you have to look for the optimal window.
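One way to look for the window when the pattern floats is a walk-forward comparison: for each candidate window length, forecast the next point using only the last `window` observations and compare the average errors. In this sketch a rolling mean stands in for the real model (a forest would be refit on each window instead); the function names and the toy series are assumptions.

```python
def walk_forward_error(series, window, start):
    """Mean absolute one-step-ahead error of a rolling-mean forecast,
    evaluated on points start .. len(series) - 1."""
    errors = []
    for t in range(start, len(series)):
        # Use only the `window` observations preceding point t.
        forecast = sum(series[t - window:t]) / window
        errors.append(abs(series[t] - forecast))
    return sum(errors) / len(errors)

def best_window(series, candidates):
    """Score every candidate window on the same evaluation range and
    return the one with the lowest error, plus all scores."""
    start = max(candidates)  # same test points for every window
    if start >= len(series):
        raise ValueError("series is too short for the largest window")
    scores = {w: walk_forward_error(series, w, start) for w in candidates}
    best = min(scores, key=scores.get)
    return best, scores

# Toy series (hypothetical) with a regime change halfway through.
series = [0.0] * 50 + [1.0] * 50
window, scores = best_window(series, [2, 10, 40])
```

On this toy series the shortest window wins because it adapts fastest to the shift; on a stable series with noise, a longer window would usually do better.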

 
elibrarius:
No, not large numbers: he explained it on a small sample of 10 examples, 8:2 vs. 6:4. But we have a lot of data.


How many examples can be considered representative for time series? I usually don't use fewer than 10,000, and the small class should have at least 1,000.

Although even when we add examples by the thousands, the model may still change.

And maybe that's right. The market, as they say, changes, so let the model change too.

 
elibrarius:
Although even when we add examples by the thousands, the model may still change.

And what do you use trees for?

 
Dimitri:

And what do you use trees for?

For time series analysis, in order to make money.
I don't use them yet, but I'm getting ready to. For now I'm reading the theory to understand the pros and cons. I am not satisfied with the results, so I decided to work with forests. It seems to me that they are better suited for time series.
 
elibrarius:
For time series analysis, in order to make money.
I don't use them yet, but I'm getting ready to. For now I'm reading the theory to understand the pros and cons. I am not satisfied with the results, so I decided to work with forests. It seems to me that they are better suited for time series.

Two years ago I wrote to Maximka here that a neural network is a toy like a nuclear bomb: if ANY other model gives at least satisfactory results, using a neural network is not recommended. They find things that don't exist, and there's nothing you can do about it.

Trees are a good thing, but it's better to use forests.
