Machine learning in trading: theory, models, practice and algo-trading - page 102

 
Alexey Burnakov:

There is a point to what you are doing.

But you should also try a deferred (hold-out) sample. This is the classic: train, test, validation.

And make the procedure even more thorough. For each model that seems to work well in terms of training and testing (call this model X), run validation on the deferred data. That way you get an idea of whether you are choosing models correctly using only training and testing. Build many models with different parameters, pick the best ones (10, 100, 1000), and validate them on the deferred data. You will see whether your "best" metric carries over to future data or not. Only after that can you go to war.
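
A minimal sketch of that kind of split in R (a sketch only; 'dat' is a hypothetical data.frame of features and target, ordered by time):

# chronological train / test / validation split
n     <- nrow(dat)
i1    <- floor(0.6 * n)
i2    <- floor(0.8 * n)
train <- dat[1:i1, ]          # fit the models here
test  <- dat[(i1 + 1):i2, ]   # pick the "best" models here
valid <- dat[(i2 + 1):n, ]    # deferred sample: look at it once, at the very end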

If there are many values of the same random variable, you can compute confidence intervals, and then instead of "close values" operate with "intersection/overlap of confidence intervals".
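
In R that could look roughly like this (a sketch; the metric vectors are made up for illustration):

# two hypothetical sets of repeated measurements of the same metric
metric_cv  <- c(0.55, 0.58, 0.54, 0.57, 0.56, 0.59)
metric_val <- c(0.51, 0.53, 0.50, 0.55, 0.52, 0.54)

ci <- function(x, level = 0.95) t.test(x, conf.level = level)$conf.int  # t-based CI for the mean

ci_cv  <- ci(metric_cv)
ci_val <- ci(metric_val)

# instead of "the values are close", check whether the intervals intersect
overlap <- ci_cv[1] <= ci_val[2] && ci_val[1] <= ci_cv[2]
overlap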
 
mytarmailS:

Can I see the results of yesterday's trading?

And this is for today. Not without mistakes, of course, but in the end quite even....

 
Mihail Marchukajtes:

And this is for today. Not without mistakes, of course, but in the end quite even....

Not bad, what are these green circles and what do the arrows mean?

 
SanSanych Fomenko:
If there are many values of the same random variable, you can compute confidence intervals, and then instead of "close values" operate with "intersection/overlap of confidence intervals".

SanSan, let me explain again. And I think it will be clearer to everyone.

Below is a table, the log of the experiment; each experiment is a row. The columns up to column J are the variables: the model, the training loss function, the instrument, the forecast horizon, the model parameters (GBM), and the parameters that are not optimized inside caret but which I also optimize in a loop: the number of cross-validation folds, the number of predictors selected for training, the randomization of the trees, and the cut-off that discards forecasts falling into the grey zone of uncertainty.

Then come my quality metrics: on training (the whole 10-year array), on the cross-validation test folds, and on the deferred samples. I marked the most interesting columns in red.
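
Roughly, such a log can be assembled like this (a sketch only, not Alexey's actual setup; the data objects train_x, train_y, valid_x, valid_y and the parameter values are assumptions):

# loop over GBM parameter combinations, record the metric on training,
# on cross-validation and on the deferred sample
library(caret)

grid <- expand.grid(n.trees = c(100, 300),
                    interaction.depth = c(2, 4),
                    shrinkage = 0.05,
                    n.minobsinnode = 10)

log_rows <- list()
for (i in seq_len(nrow(grid))) {
  fit <- train(x = train_x, y = train_y, method = "gbm",
               trControl = trainControl(method = "cv", number = 5),
               tuneGrid = grid[i, , drop = FALSE], verbose = FALSE)
  log_rows[[i]] <- data.frame(grid[i, ],
                              acc_train = mean(predict(fit, train_x) == train_y),
                              acc_cv    = max(fit$results$Accuracy),
                              acc_def   = mean(predict(fit, valid_x) == valid_y))
}
experiment_log <- do.call(rbind, log_rows)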

Next. I can show you the best models, the ones that did spectacularly well on the deferred samples. But that would be a blatant fit!

With consistent data and the right training method, I would simply expect to see a relationship between the metric on the deferred samples and the metric on cross-validation (test). Let's see what I actually got:

Objectively, the quality of the selected models on the deferred sample (which emulates a period of real trading) is almost completely unrelated to the quality metric on the test (the cross-validation test folds).
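
That "almost completely unrelated" can be checked directly on such a log, for example (column names as in the sketch above, so again an assumption):

# how strongly does the cross-validation metric predict the deferred-sample metric?
cor(experiment_log$acc_cv, experiment_log$acc_def, method = "spearman")
plot(experiment_log$acc_cv, experiment_log$acc_def,
     xlab = "metric on cross-validation folds",
     ylab = "metric on deferred sample")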

Conclusions, friends: if I pick the best model by the heuristic "the model must do better on the test", I get zero certainty about how that model will perform in the future.

The same applies to this scenario: if I pick a model by the heuristic "the best model will show good quality on the deferred sample", that choice, friends, also leads to uncertainty. Everything is probabilistic, of course; you can get lucky, but you cannot cheat statistics.

That, and only that, is the benefit of the deferred sample: checking the performance of the model and checking the heuristic for selecting the best model.

PS: I'm thinking about how to improve the results. A good outcome would be an elliptical, slanted cloud of points: from its right edge you could take committees, etc., and on average they would work.

 

You have developed a good toolkit for evaluating heuristics, solid work. You have shown that the way you developed for training the model (the committee) is not suitable for forex, but what next?

You need to work out a way of building the model so that there is a correlation between the results on the training data itself, the results on the test, and the results on the deferred sample.

I have a similar situation: for example, I am trying different ways of preprocessing the data, different training/prediction packages, different functions for evaluating prediction quality. It is all important, and there are endless combinations of it all. I try to stick to the Occam's razor rule: the fewer predictors you need and the fewer parameters a model has, the better.

 

It is also my subjective opinion that your predictors cannot be used to predict your target values. At least, working with your file dat_train_final_experimental1.csv, I cannot get a positive result from my fitness function when selecting gbm parameters. That is, whatever model I build, with whatever parameters, I am not satisfied with the results on cross-validation. I cannot prove it, it is just a personal opinion; I would advise taking more predictors and trying to reduce their number while building the model.

For example, in my training table I have 150 predictors for each bar and 100 bars in total, which gives 15000 predictors. Then I use a genetic search over predictors and model parameters, looking for the best value of the fitness function. This way I select the predictors that really have some relation to the target values and on the basis of which the model can predict something. At the end of the selection I am left with only 10-20 predictors. The fitness value is deliberately lowered a little for every predictor used. I posted approximate R code of the fitness function on the forum yesterday; it makes this much clearer.
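
In spirit, that kind of search looks something like this (a sketch with the GA package, not Dr.Trader's actual code; the penalty of 0.001 per predictor and the data objects are assumptions):

# binary genetic search over predictors: 1 = predictor is used, 0 = not used
library(GA)
library(caret)

fitness_fn <- function(bits) {
  cols <- which(bits == 1)
  if (length(cols) == 0) return(-Inf)
  fit <- train(x = train_x[, cols, drop = FALSE], y = train_y, method = "gbm",
               trControl = trainControl(method = "cv", number = 5), verbose = FALSE)
  # cross-validation accuracy minus a small penalty for every predictor used
  max(fit$results$Accuracy) - 0.001 * length(cols)
}

res <- ga(type = "binary", fitness = fitness_fn,
          nBits = ncol(train_x), popSize = 50, maxiter = 100)
selected <- which(res@solution[1, ] == 1)   # the surviving 10-20 predictors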

 
mytarmailS:

Not bad, but what are these green circles and what do the arrows mean?

The green dots indicate a signal in the making; each series of green dots ends with either a blue or a red dot, which is the Sequenta signal to buy or sell respectively. The arrows are the work of Reshetov's classifier, which says whether the signal is true or false....

By the way, Sequenta is in the attachment, use it in good health....

Files:
 
Dr.Trader:

It is also my subjective opinion that your predictors cannot be used to predict your target values.

I think I can articulate it better:

Prediction results on the training samples themselves correlate poorly, on average, with results on the test samples.

There is the ForeCA package, and it has the Omega function, which estimates the "predictability" (forecastability) of a signal. A score of 100% means the signal meets certain requirements and is easy to predict; a score of 0% means the signal is pure noise and impossible to predict.
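
To get a feel for the scale, something like this (a sketch; assumes the ForeCA package is installed, the series are synthetic):

library(ForeCA)

set.seed(1)
noise    <- rnorm(500)                                         # white noise
periodic <- sin(2 * pi * (1:500) / 24) + rnorm(500, sd = 0.1)  # strongly periodic signal

Omega(noise)     # close to 0%: nothing to forecast
Omega(periodic)  # much higher: the series has exploitable structure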

I still have your table dat_test_features_experimental.RData, where the last column is the price increment. For eurusd, for example, the estimate is 0.83% (not 83%, exactly 0.83%, less than one percent). According to ForeCA, this time series is impossible to predict. Not that I fully trust this package, but its author clearly understands something, so I would listen to him.

library(ForeCA)  # for Omega()
Omega(dat_test_features[dat_test_features[, 109] == "eurusd", 110])

I don't remember what timeframe you are working with, but if it is M1, there is good reason to try something longer, H1 for example.

 
Dr.Trader,

I hear you. I work with a horizon of a few hours.

On the minutes the regression is good, but the expected payoff (MO) per trade is not there. On the hourly timeframe the absolute price difference is about 8 pips. What the heck.... You see what I mean? There you need 65-70% guessing accuracy. And on a horizon of about 9 hours, roughly 53% is enough to overcome the spread.
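
The arithmetic behind that can be sketched as follows (the 2-pip spread and the 30-pip multi-hour move are my assumptions; only the ~8-pip hourly move comes from the post):

# break-even accuracy for a symmetric win/loss of 'move' pips and a fixed spread:
# expected pips per trade = p*move - (1 - p)*move - spread = (2p - 1)*move - spread,
# so the break-even accuracy is p = (1 + spread/move) / 2
break_even <- function(move_pips, spread_pips) (1 + spread_pips / move_pips) / 2

break_even(move_pips = 8,  spread_pips = 2)   # ~0.62 for the ~8-pip hourly move
break_even(move_pips = 30, spread_pips = 2)   # ~0.53 for a larger multi-hour move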
 
Dr.Trader:

...

There is the ForeCA package, and it has the Omega function, which evaluates the "predictability" of a signal. A score of 100% means the signal meets certain requirements and is easy to predict; a score of 0% means the signal is pure noise and impossible to predict.

...

What does "predictability" mean in this package? I thought it meant the ability to extrapolate (that's the word) previous (previous) values. If we take increments, it is a widely used tool, very well worked out with a lot of nuances: ARIMA, if this model does not pass, then all sorts of ARCH. And you have to compare the ForeCA package with these models.

In general it seems to me that the original idea has been lost. To me, that original idea was that we need methods that do NOT depend on the model for determining the ability of each of the predictors used to predict the target variable. It was essential that the predictor(s) and the target variable be linked. And once we have screened out the noise, we use the models or their committees, but only after the noise has been sifted out. And the absence of noise is indicated by the approximate invariance of the model's performance across samples: not the absolute value of the prediction error, but the fact of approximate equality of the performance indicators, which (the equality) is to be interpreted as evidence that the model is not overtrained. Absence of overtraining is our everything. If the model is overtrained on a given set of predictors, then everything else is a numbers game. Only models that are not overtrained are of interest.
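
A crude illustration of such model-independent screening (a sketch only; Spearman correlation as the relevance measure is my choice, any model-free measure of dependence would do; 'X' and 'y' are hypothetical):

# score each predictor against the target without fitting any model,
# then keep only the columns whose score clears a threshold
relevance <- sapply(seq_len(ncol(X)),
                    function(j) abs(cor(X[, j], y, method = "spearman",
                                        use = "complete.obs")))
keep    <- which(relevance > quantile(relevance, 0.9))   # e.g. keep the top 10%
X_clean <- X[, keep, drop = FALSE]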