Machine learning in trading: theory, models, practice and algo-trading
1) Do I correctly understand the point of dividing the dataset into training, validation, and test sets?
2) By the way, what error rates for training/validation/test should I aim for? 15/20/20%, or maybe 5/10/15%, or something else?
3) I don't quite understand why it is recommended to shuffle the training examples. We are going to process every example anyway.
By the way, what error rates for training/validation/test should I aim for? 15/20/20%, or maybe 5/10/15%?
The previous point - yes, something like that.
As for the error, it depends on the specifics. If, say, the ML model or NN only determines entry into a trade, then a 50% error may be enough. For example, a successful trade gives you an average of 2-3 points of profit, and an unsuccessful one a 1-point loss. In that case 0.5 is not a bad probability.
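Just to spell out the arithmetic behind that example (using the 2-3 points and 1 point quoted above, with 2.5 taken as the midpoint - these numbers are only illustrative):

```python
# Rough expectancy check for the example above: 50% hit rate,
# average win of 2-3 points (2.5 taken as the midpoint), average loss of 1 point.
p_win = 0.5
avg_win = 2.5   # points, midpoint of the 2-3 range quoted above
avg_loss = 1.0  # points

expectancy = p_win * avg_win - (1 - p_win) * avg_loss
print(f"expected points per trade: {expectancy:+.2f}")  # +0.75
```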
0.5 may be a little low... What values should I aim for? Which ones are actually achievable in practice (not for other NN tasks, but specifically for trading)?
I ask because I want to train down to a 10% error, but if that figure is unrealistic, I'll just be wasting my own time and CPU time. Say, what is the best error you have achieved, and at what level can you stop and not look for further improvement?
0.5 is not enough? Oh, come on.) I already gave this example: a poker player has a probability of winning of about 1/9-1/6, yet good players are consistently in profit.
All of my systems worked at ~0.5 probability and were always in the plus. As far as I know, many trading systems work with a probability close to 0.5 - this was mentioned at an autotrading conference, in particular.
"I want to train down to 10%, but if that figure is unrealistic" - whether it is realistic or not depends on the specific task. For example, I trained an NN on MA crossings - that is reliable almost 100% of the time)).
True, you can even do it without any forecast (50%), you just need to make the take-profit larger than the stop-loss. In fact, it is impossible to predict anything; nobody knows where the price will go in the Forex market - only the insiders, the puppeteers, can know that.
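For the record, the break-even relationship between the take/stop ratio and the required win probability, sketched with illustrative numbers (the 2:1 ratio below is an assumption, not from the post):

```python
# Break-even win probability for a given take-profit / stop-loss ratio,
# ignoring spread and commission: p* = SL / (TP + SL).
def breakeven_probability(take_profit: float, stop_loss: float) -> float:
    return stop_loss / (take_profit + stop_loss)

# With TP twice the SL, even a coin-flip entry (p = 0.5) is above break-even.
print(breakeven_probability(take_profit=2.0, stop_loss=1.0))  # ~0.333
```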
1) Do I correctly understand the point of dividing the dataset into training, validation, and test sets?
2) By the way, what error rates for training/validation/test should I aim for? 15/20/20%, or maybe 5/10/15%?
3) I don't quite understand why it is recommended to shuffle the training examples. We are going to process every example anyway.
1) Not quite, and this is fundamental.
We take one large file and divide it into two unequal parts.
We split the larger part the way you described. We get errors that should be approximately equal.
Then we check the model on the second part of the file. The error on this part, again, should not differ much from the others.
This is the most important evidence of the absence of overfitting.
The magnitude of the error? It is a kind of constant determined by the set of predictors; it can be reduced somewhat by tuning the type of model.
For example:
If all four errors are around 35%, then by selecting a model you might, with luck, bring the error down to 30%.
PS.
An error of less than 10% is a clear sign of overtraining (overfitting). If you get such an error, you should double-check everything.
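If I've understood the procedure correctly, a minimal sketch of that four-error check could look like this (the scikit-learn model, the synthetic data, and the 80% / 60-20-20 proportions are only placeholder assumptions, not anyone's actual setup):

```python
# Split the file into two unequal parts, build train/validation/test inside
# the larger part, and compare all four errors: roughly equal errors are the
# evidence against overfitting described above.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 10))                               # placeholder features
y = (X[:, 0] + 0.5 * rng.normal(size=5000) > 0).astype(int)   # placeholder target

def error(model, X_part, y_part):
    return 1.0 - model.score(X_part, y_part)                  # misclassification rate

n = len(X)
cut = int(n * 0.8)                                 # larger part vs held-back part
X_big, y_big = X[:cut], y[:cut]
X_hold, y_hold = X[cut:], y[cut:]                  # untouched until the very end

i1, i2 = int(cut * 0.6), int(cut * 0.8)            # 60/20/20 inside the larger part
model = LogisticRegression().fit(X_big[:i1], y_big[:i1])

print("train     :", error(model, X_big[:i1], y_big[:i1]))
print("validation:", error(model, X_big[i1:i2], y_big[i1:i2]))
print("test      :", error(model, X_big[i2:], y_big[i2:]))
print("held-back :", error(model, X_hold, y_hold))  # should not differ much
```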
I found an early-stopping training routine with a validation set in ALGLIB.
Judging by the code, it does not compare the error on the training and validation sections; it simply searches for the minimum error on the validation section. It stops if it does not find a better one within 30 iterations, or when all iterations have been used.
But I'm not sure whether this method is better or more accurate than the usual one... unless the number of training cycles is several times higher...
Here's what came out:
It feels like there was a fit to the validation section. The test section did turn out fine overall, but it was not used in training and was not compared against; apparently that is just a coincidence.
There is also an ensemble-training variant of this function, where the data is split 2/3 and everything is shuffled between both sections; I'll try to do the same...
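I don't know the ALGLIB internals beyond what is described above, but the stopping rule as described ("keep the weights with the lowest validation error, stop after 30 iterations without improvement") can be sketched roughly like this; `train_step`, `valid_error` and the `patience` of 30 are placeholders, not ALGLIB calls:

```python
# Generic early stopping with patience: track the best validation error and
# stop after `patience` iterations without improvement.
import copy

def train_with_early_stopping(model, train_step, valid_error,
                              max_iters=1000, patience=30):
    best_err = float("inf")
    best_model = copy.deepcopy(model)
    since_best = 0
    for _ in range(max_iters):
        train_step(model)                 # one training iteration/epoch
        err = valid_error(model)          # error on the validation section only
        if err < best_err:
            best_err, best_model = err, copy.deepcopy(model)
            since_best = 0
        else:
            since_best += 1
            if since_best >= patience:    # no improvement for `patience` iterations
                break
    return best_model, best_err
```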
Shuffled it:
Because of the shuffling, the errors on the training and validation sections even out.
Something still seems wrong to me, because in real trading the bars will arrive in their own order, not shuffled together with bars from an hour or a day ago.
And if the "nature" of the market changes, that means you have to retrain or look for new NN models.
Actually, it is possible. Being at 50/50 with a forecast and a take-profit larger than the stop is one thing, God willing)) - without a forecast we are flipping a completely different coin.))
How many training/validation cycles should one do? I haven't seen any information on that anywhere... Just 1 cycle in total? - and right after it we either accept the result or change something in the predictor set or in the network architecture? Or rather, out of N training cycles we are shown the single best one.
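On the "out of N training cycles we are shown the single best one" point, a minimal sketch of what restarting N times and keeping the best-by-validation model could look like (`train_once` and `valid_error` are placeholders, not ALGLIB calls):

```python
# Restart training N times (e.g. with different random initialisations) and
# keep the model with the lowest validation error.
def best_of_n_restarts(train_once, valid_error, n_restarts=10):
    best_model, best_err = None, float("inf")
    for seed in range(n_restarts):
        model = train_once(seed)     # one full training cycle
        err = valid_error(model)     # error on the validation section
        if err < best_err:
            best_model, best_err = model, err
    return best_model, best_err
```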