Machine learning in trading: theory, models, practice and algo-trading - page 1376

 
Aleksey Vyazmikin:

Why train on less than 10% of the entire sample? Shouldn't increasing the sample lead to improvement?

And why would you need more than 5k? If you can't train on 5k, you can't train on more.

 
Aleksey Vyazmikin:

Why train on less than 10% of the entire sample? Shouldn't increasing the sample lead to improvement?

And what about an overtrained system? What do you think it will lead to?

 
Farkhat Guzairov:

And what about an overtrained system? What do you think it will lead to?

The larger the sample, the harder it is to fit the model: more leaves/trees are required.
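A quick illustration of this point, under the assumption of scikit-learn (no library is named in the thread): memorizing pure-noise labels takes roughly one leaf per example, so the tree needed to pin down a larger sample grows with it.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
for n in (1_000, 5_000, 20_000):
    X = rng.normal(size=(n, 10))           # 10 arbitrary features
    y = rng.integers(0, 2, size=n)         # pure-noise labels: nothing to generalize
    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    print(n, tree.get_n_leaves())          # leaf count grows roughly with n
```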

 
Yuriy Asaulenko:

x is the trade number, y is the cumulative profit in points.


Is this a 4-digit or a 5-digit instrument?

 
sibirqk:


Is this a 4-digit or a 5-digit instrument?

It's not about digits at all.)
 
Yuriy Asaulenko:
It's not about digits at all.)
Then what does "the cumulative profit in points" mean?
 
sibirqk:
Then what does "the cumulative profit in points" mean?
It is a stock instrument. The chart shows the possibility of profit with a simple system; the rest is not important.
 
Vladimir Perervenko:

That's not quite right. Say you have train[2000, ] and test[500, ]. You train on train with initial example weights = 1.0, then have the trained model predict test[]. Based on the quality of each test prediction, you assign that example a weight. Then you combine train and test into a new training sample, train the model, test it, and so on until the whole training sample has weights obtained this way. You could also apply a reduction factor to the weights of older bars, but I haven't checked that. All this is for classification, of course.

I checked this with ELM; it gives good results.

Good luck

I don't quite see how this can improve the model's performance on new data.

For example, if a class is predicted incorrectly, we assign it a decreasing weight, in the extreme case 0. In subsequent training that is equivalent to dropping those rows from the sample: on the training set everything will look fine, with close to 100% accuracy, and on the tests, which were weighted by the same circular procedure, everything will look fine too. But on completely new data we cannot discard rows, and there we will see what the model is really capable of.

Or did you instead increase the weights of the wrong examples?
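For readers following along, here is a minimal sketch of the walk-forward weighting loop Vladimir describes above. It is an interpretation, not his code: scikit-learn's RandomForestClassifier stands in for the ELM he used, and "quality of each prediction" is read as the predicted probability of the example's true class, so misclassified examples end up with low weight. Names and parameters are illustrative, not from the thread.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def walk_forward_weights(X, y, init_size=2000, step=500):
    """Assign a weight to every example past the first init_size rows,
    refitting on the already-weighted history before scoring each new chunk."""
    n = len(y)
    w = np.ones(n)                              # everything starts at weight 1.0
    start = init_size
    while start < n:
        end = min(start + step, n)
        model = RandomForestClassifier(n_estimators=200, random_state=0)
        model.fit(X[:start], y[:start], sample_weight=w[:start])
        proba = model.predict_proba(X[start:end])
        # weight of each new example = predicted probability of its true class
        col = np.searchsorted(model.classes_, y[start:end])
        w[start:end] = proba[np.arange(end - start), col]
        start = end                             # fold the scored chunk into history
    return w
```

The returned w would then be passed as sample_weight in a final fit; the reduction factor for older bars that Vladimir mentions but did not test could simply be multiplied into w afterwards.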

 
elibrarius:

I don't quite see how this can improve the model's performance on new data.

For example, if a class is predicted incorrectly, we assign it a decreasing weight, in the extreme case 0. In subsequent training that is equivalent to dropping those rows from the sample: on the training set everything will look fine, with close to 100% accuracy, and on the tests, which were weighted by the same circular procedure, everything will look fine too. But on completely new data we cannot discard rows, and there we will see what the model is really capable of.

Or did you instead increase the weights of the wrong examples?

Of course, the weights are lowered for the "bad" examples. If you raise them instead, that is classic boosting.

Just do the experiment and check.

I don't do it that way any more. I remove or flag noisy examples in preprocessing, before training.

Good luck
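The thread does not say how Vladimir removes or flags noisy examples in preprocessing. One common approach, shown purely as an illustration under the same scikit-learn assumption as above, is to flag examples whose own label receives a low out-of-fold predicted probability:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

def flag_noisy(X, y, threshold=0.5):
    """Return a boolean mask of examples whose own label receives a low
    out-of-fold predicted probability, i.e. likely noisy/mislabeled rows."""
    proba = cross_val_predict(
        RandomForestClassifier(n_estimators=200, random_state=0),
        X, y, cv=5, method="predict_proba")
    classes = np.unique(y)                      # columns follow sorted class order
    p_true = proba[np.arange(len(y)), np.searchsorted(classes, y)]
    return p_true < threshold                   # True = candidate for removal
```

Unlike the in-training reweighting debated above, the out-of-fold predictions here never score an example with a model that was trained on that same example, which addresses part of the circularity objection.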

 
Yuriy Asaulenko:

And why would you need more than 5k? If you can't train on 5k, you can't train on more.

This goes in the vault of silly remarks.

Aleksey Vyazmikin:

The larger the sample, the harder it is to fit the model: more leaves/trees are required.

Exactly right: the more the better (less than 100k is noise). But we must take into account that the market changes, and how to allow for that in training is the big secret.
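One simple, widely used way to let training reflect a changing market, essentially the "reduction factor for older bars" Vladimir mentioned earlier without having tested it, is exponential recency weighting of examples. A minimal sketch; the half-life value is purely hypothetical:

```python
import numpy as np

def recency_weights(n_samples: int, half_life: int = 1000) -> np.ndarray:
    """Sample weights that halve for every `half_life` bars of age.
    Index 0 is the oldest bar, index n_samples - 1 the newest."""
    age = np.arange(n_samples)[::-1]        # newest bar has age 0
    return 0.5 ** (age / half_life)

# Usage: pass as sample_weight to any weighted learner, e.g.
# model.fit(X, y, sample_weight=recency_weights(len(y)))
```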