Machine learning in trading: theory, models, practice and algo-trading - page 30
2. See video:
Sorry, but it's the usual nonsense of an uneducated graduate student ...
As they say, I sell it for what I bought it for. I was asked a question, and I gave a video with a detailed answer. The lecturer there isn't trying to be clever; he is lecturing straight from statistical learning theory.
See: Vapnik V. N. Statistical Learning Theory. New York: John Wiley & Sons, 1998.
SanSanych Fomenko:
1. A noticeable deterioration in generalizability if even one informative predictor is removed from the sample.
Believe me, unfortunately this proves nothing. Moreover, if the set of predictors is bad (a lot of noise), the effect gets stronger the more noise there is. The explanation is very simple: the more noise, the easier it is for the algorithm to pick out a "convenient" value.
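To illustrate the point, here is a minimal sketch in base R (entirely synthetic data; all names and sizes are made up for illustration): with enough pure-noise predictors, a classifier can fit the training set far above chance while test accuracy stays near 50%.

set.seed(42)
n <- 200; p <- 50                                # 200 rows, 50 pure-noise predictors
X <- as.data.frame(matrix(rnorm(n * p), n, p))
y <- factor(sample(c(0, 1), n, replace = TRUE))  # target unrelated to X
train <- 1:100; test <- 101:200
fit <- glm(y ~ ., data = cbind(X, y = y)[train, ], family = binomial)
acc <- function(idx) {
  pred <- predict(fit, newdata = X[idx, ], type = "response") > 0.5
  mean(pred == (y[idx] == "1"))
}
acc(train)  # far above 0.5: the model has memorized noise
acc(test)   # near 0.5: nothing generalizes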
As they say, tastes differ ...
SanSanych Fomenko:
Regarding your file.
1. I could not build a usable model on your data: I tried 6 classification models, and each had an error above 50%. If you want, I can post the results here.
SanSanych Fomenko:
2. The reason for this result is that you have a very bad set of predictors: noise, i.e. predictors that have nothing to do with the target variable. Predictors 6, 7, and 8 have some predictive power, but very little, and I do not work with such predictors. The others are just noise. Any fool can classify correctly without noise. Here the noise is present in decent quantities, but there is also useful information. For example, here are the results of the old libVMR 3.01:
/**
* The quality of modeling in out of sample:
*
* TruePositives: 245
* TrueNegatives: 113
* FalsePositives: 191
* FalseNegatives: 73
* Total patterns in out of samples with statistics: 622
* Total errors in out of sample: 264
* Sensitivity of generalization ability: 56.19266055045872%
* Specificity of generalization ability: 60.752688172043015%
* Generalization ability: 16.94534872250173%
* Indicator by Reshetov: 0.1075044213677977
*/
That is, the generalization ability is almost 17 percentage points above random guessing (sensitivity + specificity - 100% = 56.19% + 60.75% - 100% ≈ 16.95%).
The newer version's performance is noticeably better.
Use cross-validation to pick the number of components: take the best value on cross-validation, then check it on the validation set.
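A minimal sketch of that advice in base R, assuming a numeric predictor matrix X (with at least 20 columns) and a two-level factor y; logistic regression stands in for whatever model is actually used, and PCA is refit inside each fold to avoid leakage:

set.seed(1)
k <- 5
folds <- sample(rep(1:k, length.out = nrow(X)))
cv_acc <- sapply(1:20, function(ncomp) {            # candidate component counts
  mean(sapply(1:k, function(f) {
    tr  <- folds != f
    pca <- prcomp(X[tr, ], scale. = TRUE)           # PCA on the training fold only
    d_tr <- data.frame(pca$x[, 1:ncomp, drop = FALSE], y = y[tr])
    d_te <- data.frame(predict(pca, X[!tr, ])[, 1:ncomp, drop = FALSE])
    m <- glm(y ~ ., data = d_tr, family = binomial)
    p <- predict(m, newdata = d_te, type = "response") > 0.5
    mean(p == (y[!tr] == levels(y)[2]))             # held-out fold accuracy
  }))
})
which.max(cv_acc)   # number of components with the best cross-validated accuracy

The winner on cross-validation is then checked once on the separate validation set.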
It did not come out well again. I would have chosen 20 components, because min(trainwinrate, validate1winrate, validate2winrate) is largest there compared to other numbers of components, and I would have gotten a forward-test result of ~55%, even worse than before. A strange model came out: the winning percentage is a little over 50% (not suitable for forex), cross-validation does not work, and the importance of the predictors cannot be extracted. All I can do is print it out and hang it on the wall :)
These are the results of my big experiment. Due to a Windows error the code was interrupted, and I did not finish training on all the symbols, but it is enough for me for now. Good results on EURUSD.
So far I have only run it with default settings, without tuning the parameters, and the results are already good; the finer points of GBM tuning should help further.
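The post does not show the code, but a typical tuning setup with the R gbm package looks roughly like this (a data frame df with a 0/1 numeric target column y is assumed; the parameter values are only illustrative):

library(gbm)
set.seed(7)
model <- gbm(y ~ ., data = df,
             distribution = "bernoulli",
             n.trees = 2000,            # many trees...
             shrinkage = 0.01,          # ...with a small learning rate
             interaction.depth = 3,     # modest tree depth
             cv.folds = 5)              # built-in cross-validation
best_iter <- gbm.perf(model, method = "cv")   # tree count at minimal CV deviance
pred <- predict(model, newdata = df, n.trees = best_iter, type = "response")
summary(model, n.trees = best_iter)           # relative influence of predictors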
By predicting EURUSD 512 minutes ahead, you can earn 1.5 pips per trade, am I correct? And is the spread taken into account? It is also important to know the maximal drawdown over that time: it makes no sense to trade for even 10 pips if the drawdown over that time was 200 pips. To evaluate the trading it would be good to use the Sharpe ratio, but I haven't seen it in R, so to start we can limit ourselves to this factor: (final profit) / (maximal drawdown of equity over all time); a small code sketch follows after this post.
For example, take signal 1: the trader earned 1000% during the year, but his maximal drawdown was 50%. Meanwhile, the author of signal 2 earned only 600% over the year, but with a maximal drawdown of 25%. It may seem that trader 1 is better (in terms of profit), but in fact he is not; he simply risks twice as much. The first trader's factor is 1000 / 50 = 20, the second's is 600 / 25 = 24. So it is better to subscribe to the second signal and double the risk if you are willing to risk 50% of the deposit.
It is also important to assess the risks in your experiment. Trading on a small interval can be much more profitable, because the model can react to price jumps in time and earn on them, instead of sitting through a huge drawdown at the risk of catching a stop loss.
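For the record, R does have a Sharpe ratio: the PerformanceAnalytics package provides SharpeRatio() and maxDrawdown(). But the factor proposed above is easy to compute in base R alone; here is a sketch, where equity is assumed to be a numeric vector of cumulative account equity per trade or per bar:

recovery_factor <- function(equity) {
  profit   <- equity[length(equity)] - equity[1]
  drawdown <- max(cummax(equity) - equity)   # deepest drop from a running peak
  profit / drawdown
}

# The numbers from the example above:
1000 / 50   # signal 1 -> 20
600 / 25    # signal 2 -> 24: better risk-adjusted despite the lower profit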
Colleagues, if you have time, could you ask me some questions under the article? https://habrahabr.ru/company/aligntechnology/blog/303750/
Because Habr is completely silent!
Too much text to read.
So you have a 57.6% winning percentage on test.csv, right? I will try my method of sifting out the predictors and training the neural network, and I will report the results tomorrow.
Not the gain, but the number of correct predictions of the future price direction. On the test sample the classifier produces one of two values: Positive (a future price increase is expected) or Negative (a future price decrease is expected). If it predicts a test example correctly, it is counted as True; if it makes a mistake, as False.
Sensitivity of generalization ability: 56.19%, the share of correctly predicted future price increases: 100% * TP / (TP + FP) = 100% * 245 / (245 + 191) = 100% * 245 / 436 ≈ 56.19%.
Specificity of generalization ability: 60.75%, the share of correctly predicted future price decreases: 100% * TN / (TN + FN) = 100% * 113 / (113 + 73) = 100% * 113 / 186 ≈ 60.75%.
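The same metrics reproduced in base R from the confusion matrix quoted above. Note that these are the definitions used in the libVMR output (sensitivity = TP / (TP + FP), specificity = TN / (TN + FN)), not the textbook ones (TP / (TP + FN) and TN / (TN + FP)); the "generalization ability" line matches sensitivity + specificity - 100:

TP <- 245; TN <- 113; FP <- 191; FN <- 73
sensitivity <- 100 * TP / (TP + FP)                 # 56.19%: correct "up" calls
specificity <- 100 * TN / (TN + FN)                 # 60.75%: correct "down" calls
generalization <- sensitivity + specificity - 100   # 16.95%, as in the log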
Yuri, here is a first pass on your data:
Two different sets of parameter values for training. Notably, on cross-validation the AUC is rock-bottom (a quick way to compute AUC is sketched at the end of this post).
All in all, 51.5% accuracy on the test is the best we got.
I don't even know how you get about 60%.
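Since AUC keeps coming up, here is a minimal base-R sketch of computing it via the Wilcoxon rank formulation (scores are classifier outputs, labels a 0/1 vector; both names are assumptions for illustration):

auc <- function(scores, labels) {
  r <- rank(scores)                  # ranks of all scores, ties averaged
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# A random classifier lands near 0.5, the "rock-bottom" reading above:
set.seed(3)
auc(runif(1000), sample(0:1, 1000, replace = TRUE))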