Retraining - page 4

 
Aliaksandr Hryshyn:
Who uses what techniques to minimise retraining of the model/advisor?
We must start from the premise that analysis of historical data is unavoidable here, while the notion of "fitting to history" must be ruled out entirely. What should come first is the idea behind the TS itself, what it is built on. Every TS rests on the readings of one or more indicators, and an indicator has a parameter such as the sample size N: it delivers its verdict only after analyzing that whole sample. It is natural to assume that no indicator handles all historical data equally well; this is where parameters such as SL and TP come to the rescue. These are the three basic parameters that need to be optimized: we must find values at which the TS survives the entire range of historical data on the chosen timeframe. Having obtained the equity curve for the whole period, we must analyze it and accept that in the worst stretches the original idea behind the TS simply does not work, because the market has many faces. Trying to carve the whole plot into "bad" and "good" sections, or into "training" and "forward" periods, is a waste of time. The TS has to be evaluated on the full volume of available historical data.
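The approach above — optimizing the three basic parameters (N, SL, TP) jointly over the full span of history, with no train/forward split — can be sketched as a plain grid search. Everything here is hypothetical: `evaluate_ts` is a toy stand-in for a real bar-by-bar back-test, and the parameter grids are arbitrary.

```python
from itertools import product

def evaluate_ts(history, n, sl, tp):
    # Toy stand-in for a real back-test that would run the TS bar by bar
    # over the WHOLE history; the scoring rule below is purely illustrative.
    return -abs(len(history) % n - n // 2) - abs(tp - 2 * sl)

def optimize_full_history(history, n_grid, sl_grid, tp_grid):
    # Exhaustive search over the three basic parameters, scored on the
    # full data set -- no "training"/"forward" split, per the post.
    return max(product(n_grid, sl_grid, tp_grid),
               key=lambda p: evaluate_ts(history, *p))

history = list(range(1000))
print(optimize_full_history(history, [10, 20, 50], [20, 40], [30, 40, 90]))
```

A real evaluation function would of course return a trading statistic (profit, drawdown, etc.) computed over the full history, but the search loop stays the same.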
 
Youri Tarshecki:

So tell me, how do you decide whether a forward test shows over-training or under-training? Does over-training show up in the degradation differently from under-training?

The only way to determine the quality of training is to look at the corresponding quality of the out-of-sample test. Only by comparing the two results can you tell whether the system is over-optimized or under-optimized. Yet I don't see any thread anywhere dealing with under-optimization. For some reason everyone sees the root of the problem in mythical over-optimization instead of in code quality.

Forward degradation assesses a system's ability to generalize (i.e. its stability, its suitability for unseen data). When that ability is low, then from a practical standpoint it makes no difference whether we over-trained or under-trained: either way we throw the system out. But if an Expert Advisor is trained classically — via a parameter set and the MT tester — then, technically speaking, it cannot be under-trained. And if training is done with an algorithm of one's own, well-known methods (e.g. early stopping) can find the optimal point where the error is minimal (the best result on OOS): at that point under-optimization has ended while over-optimization has not yet set in. Unfortunately, the standard tester provides no such automatic facility for comparing system performance on the optimization period and on OOS. In MT, the OOS is the forward test, so the part highlighted in bold is actually the same as what I am saying. The topic-starter asked about over-training, so the answers here address that. As for under-training — if by that we mean not the degree of optimization but some abstract "code quality" that includes the choice of predictors, the methods of input-data preparation, the depth of the training sample and other meta-knowledge — then under-training is detected very simply: by the absence of positive optimization results.
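The early-stopping idea mentioned above can be sketched outside of any tester. This is a generic sketch, not MT functionality: `train_step` and `oos_error` are hypothetical callbacks, and the toy error curve is chosen to fall and then rise, as an OOS error typically does around the optimal point.

```python
def early_stopping(train_step, oos_error, max_iters=100, patience=5):
    # Keep optimizing while the out-of-sample (forward) error improves;
    # stop after `patience` rounds without improvement and report the
    # best iteration -- the point where "under" ends and "over" begins.
    best_err, best_iter, stale = float("inf"), 0, 0
    for i in range(1, max_iters + 1):
        train_step(i)          # one more optimization pass
        err = oos_error()      # error on data NOT used for fitting
        if err < best_err:
            best_err, best_iter, stale = err, i, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_iter, best_err

# Toy OOS error curve: falls while under-fitted, rises once over-fitted.
state = {"i": 0}
def train_step(i): state["i"] = i
def oos_error(): return (state["i"] - 7) ** 2

print(early_stopping(train_step, oos_error))  # best near iteration 7
```

In a real setup `train_step` would run one more optimization pass and `oos_error` would re-test the current parameter set on the forward period.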

 

I don't agree about under-training — technically it is not difficult to achieve. That is, the fact of degradation by itself does not tell you whether the system is over-trained or under-trained.

If the author steps back from the term and accepts that the optimum sits somewhere on a Gaussian-like curve, he will pose the question a little differently: how to determine the optimal amount and density of training.

Then we can be more specific.

Personally, I define it based on the results of the test.

 

Stanislav Korotky:
This term is not silly — it is long-established and "approved by the best dog breeders".

In fact, the phenomenon is not really "over-training" — it is over-fitting.

 

Let's try to look at the question from a slightly different perspective.

When trained (if it has such a capability), the model/TS is able to remember certain regularities from historical data; because of its imperfection, it also remembers "noise". We will treat as noise whatever the TS can "learn" that does not yield good results on new data; what counts as noise may differ from one strategy to another.

The interesting question is: how can we, at least partially, constrain the TS/model so that it cannot "remember" the noise? Or how can we "cleanse" the historical data of noise — again, noise relative to the given TS?

Here's a simple example in the picture, let's assume lines 1 and 2 are the same model.

Here's another example: in the area between the two vertical lines, the model (red line) is very wrong.
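As a minimal illustration of the point about noise (not tied to the pictures themselves), here is a sketch contrasting a model that memorizes every training point — noise included — with one constrained to a single slope, which cannot store the noise. The synthetic data and all names are invented for the example.

```python
import random

random.seed(0)
# Underlying regularity: y = 2*x, plus uniform noise.
train = [(x, 2 * x + random.uniform(-1, 1)) for x in range(20)]
test  = [(x, 2 * x + random.uniform(-1, 1)) for x in range(20, 30)]

# Model 1: memorizes the training data exactly (learns the noise too).
table = dict(train)
def memorizer(x):
    return table.get(x, 0.0)   # no idea what to do off the training set

# Model 2: restricted to a single slope -- too small to store the noise.
slope = sum(x * y for x, y in train) / sum(x * x for x, y in train)
def restricted(x):
    return slope * x

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

print(mse(memorizer, train) <= mse(restricted, train))  # True: perfect fit in-sample
print(mse(memorizer, test) > mse(restricted, test))     # True: but worse on new data
```

Constraining the model's capacity is one practical answer to the question above: a model that physically cannot store every training point is forced to keep only the regularity.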

 
What are these pictures about, anyway? What is on the axes? The number of optimizations?
 
Комбинатор:

In fact, the phenomenon is not really "over-training" — it is over-fitting.

Exactly right. Fitting, or re-fitting, is the same thing as over-fitting.
 
Youri Tarshecki:
Exactly right. Fitting, or re-fitting, is the same thing as over-fitting.
If the fit or re-fit is carried out on the full amount of historical data, it only improves the quality of the TS.
 
Youri Tarshecki:
What are these pictures about, anyway? What is on the axes? The number of optimizations?
They are not specific to trading. They show simple examples of how a model (lines 1 and 2 in the first picture, and the red line in the second) can be "wrong" on the data (small squares and, below, dots).

Let's assume that the horizontal axis shows the indicator value, and the vertical axis — for the points — shows how the TS should predict a certain property of the price from that indicator value in order to come out ahead; for the lines, how the TS actually predicts. Based on this prediction, the TS can trade. This is a conditional example; the predicted property could be the presence of a trend or a flat, volatility, etc.
 
Aliaksandr Hryshyn:
They are not specific to trading. They show simple examples of how a model (lines 1 and 2 in the first figure, and the red line in the second) can "err" on data (small squares and, below, dots).

Let the horizontal axis be the indicator value, and the vertical axis — for the points — how the TS should predict a certain price property from that indicator value in order to be profitable; for the lines, how the TS actually predicts. Based on this prediction, the TS can trade. This is a conditional example; the predicted property could be the presence of a trend or a flat, volatility, etc.

The horizontal axis should show the number of optimization passes over the indicator values (or group of indicators) for the same stretch of history, and the vertical axis should show the result of testing the whole system. Then we can talk about over-training and under-training and what they are. And the curve will be quite different: nearly a normal distribution, with the one difference that at some point adding more passes (or shrinking the step of the indicator parameter) stops yielding anything new.

How can the number of passes be adjusted?

1. By simply changing the number of training runs, or by changing the step between indicator parameter values.

2. By changing the depth of history used for training relative to the "step" of the OOS check. Then the same stretch of history will be optimized a different number of times, each time with different "neighbours".

3. By tuning the genetic optimization algorithm, if we use one — for example, the number of generations, the amount of randomness, etc.

These, perhaps, are all the tools available for fighting over- and under-training.
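Item 2 above — varying the depth of training history relative to the step of the OOS check — is essentially walk-forward analysis. A minimal sketch, with a toy "model" (the training-window mean) standing in for real optimization; all names are illustrative.

```python
def walk_forward(data, train_len, oos_len, fit, score):
    # Repeatedly fit on `train_len` points, then score on the next
    # `oos_len` unseen points, stepping forward by the OOS window.
    # A smaller `oos_len` relative to `train_len` means each stretch of
    # history falls into more training windows, i.e. is optimized more
    # times with different "neighbours".
    results, start = [], 0
    while start + train_len + oos_len <= len(data):
        model = fit(data[start:start + train_len])
        oos = data[start + train_len:start + train_len + oos_len]
        results.append(score(model, oos))
        start += oos_len
    return results

# Toy example: the "model" is just the mean of the training window,
# and the score is the absolute error of that mean on the OOS window.
data = list(range(100))
fit = lambda seg: sum(seg) / len(seg)
score = lambda m, seg: abs(m - sum(seg) / len(seg))
print(walk_forward(data, 60, 10, fit, score))  # four OOS windows
```

Changing `train_len` or `oos_len` and comparing the resulting OOS scores is exactly the kind of comparison the standard tester does not automate.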

I will also note that if we focus on the out-of-sample check rather than on the result during optimization (the fit), the shape of the curve will not depend on whether the system itself is profitable. That is, if the system loses, over-training will simply make it lose even more, that's all. The task of training is to find the optimum of the parameters; the task of coding is to find workable variants — the training optimum creates nothing new by itself.