Optimization, statistics, and avoidance of curve fitting.

 

I have been curious about this subject, and I'm interested in hearing some educated opinions on the matter:

Curve fitting is the major pitfall of optimization - anyone who has tried to move on to forward-testing knows this.

The way I try to avoid curve fitting is by optimizing on one period of time and back-testing on another. Some people call this a "forward walk" (the more common name is walk-forward testing). Of course, if you do this 10,000 times and pick the best result, you are no better off than with the automatic optimizer.
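To make the "optimize on one period, test on another" idea concrete, here is a minimal Python sketch of how the rolling windows could be laid out. The function name walk_forward_windows and the window lengths are made up for illustration; this is just one plausible way to split the data, not a standard API.

```python
from typing import Iterator, Tuple


def walk_forward_windows(
    n_bars: int, train_len: int, test_len: int
) -> Iterator[Tuple[range, range]]:
    """Yield (in_sample, out_of_sample) index ranges that roll forward in time.

    Hypothetical helper: the optimizer would only ever see in_sample,
    and the chosen parameters would then be scored on out_of_sample.
    """
    start = 0
    while start + train_len + test_len <= n_bars:
        in_sample = range(start, start + train_len)
        out_of_sample = range(start + train_len, start + train_len + test_len)
        yield in_sample, out_of_sample
        # Step forward by one out-of-sample block so test periods never overlap.
        start += test_len


# Assumed sizes: 1000 bars, optimize on 500, test on the next 100.
windows = list(walk_forward_windows(n_bars=1000, train_len=500, test_len=100))
for in_sample, out_of_sample in windows:
    print(f"optimize on bars {in_sample.start}-{in_sample.stop - 1}, "
          f"test on bars {out_of_sample.start}-{out_of_sample.stop - 1}")
```

The important part is that the out-of-sample bars always come after the bars used for optimization, and that the out-of-sample results are looked at once, not used to choose between many reruns.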

I try to find value ranges for my variables that, when tested with a "forward walk," produce a roughly normal distribution of results with a positive mean profit.
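One simple way to put a number on "a distribution of results with positive profit" is a one-sample t-statistic on the out-of-sample profits. The sketch below is only an illustration of that idea, not the method described above; the function name t_statistic is made up, the profit figures are invented toy values, and the calculation assumes the runs are independent, which overlapping test periods would violate.

```python
import math
import statistics


def t_statistic(profits):
    """Mean profit divided by its standard error (one-sample t-statistic).

    Hypothetical helper: profits is a list of out-of-sample results,
    one per walk-forward run.
    """
    n = len(profits)
    if n < 2:
        raise ValueError("need at least two out-of-sample results")
    standard_error = statistics.stdev(profits) / math.sqrt(n)
    if standard_error == 0:
        raise ValueError("all results are identical; t-statistic is undefined")
    return statistics.mean(profits) / standard_error


# Toy numbers for illustration only, not real trading results.
example_profits = [-1.0, 3.0, 1.0, 2.0, 0.0]
t_value = t_statistic(example_profits)
print(f"t = {t_value:.2f} on {len(example_profits)} out-of-sample runs")
```

As a rough rule of thumb, a value well above 2 suggests the mean profit is distinguishable from zero, and with only a handful of runs the bar is higher than that.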

I am sure that much more sophisticated methods than this exist; what are they? What statistical tests / optimization practices are relevant to the avoidance of curve fitting?

 

Try to reduce the number of optimizable parameters, i.e. the degrees of freedom (DoF). For example, if you backtest 200 trades with 5 optimized parameters, the result is completely meaningless: 5 parameters leave enough room to fit the model to any random noise in a sample this short. If you backtest 2000 trades with only 2 parameters, you will already come much closer to something meaningful. I'm not a statistician, so I cannot say anything definite, but I know that methods exist to calculate the number of trials needed for a given DoF so that the result can be considered significant enough (and *how* significant you need it to be also depends on many factors). This is not an easy thing, and there are many hidden pitfalls.
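The point about 200 trades with 5 parameters versus 2000 trades with 2 parameters can be illustrated with a small simulation on pure noise. This is a rough Python sketch under loud assumptions: every parameter combination is treated as producing independent, zero-mean random trade results (so no combination has any real edge), and each parameter is assumed to have 5 candidate values. The function name best_of_noise is made up.

```python
import random


def best_of_noise(n_trades, n_params, values_per_param=5, seed=0):
    """Mimic an optimizer that picks the best parameter combination on noise.

    Assumption: each combination yields independent N(0, 1) trade results,
    so any in-sample "profit" is luck by construction.
    Returns (best in-sample mean, out-of-sample mean of that same combination).
    """
    rng = random.Random(seed)
    n_combos = values_per_param ** n_params
    best_in, best_out = float("-inf"), 0.0
    for _ in range(n_combos):
        in_sample = sum(rng.gauss(0.0, 1.0) for _ in range(n_trades)) / n_trades
        out_sample = sum(rng.gauss(0.0, 1.0) for _ in range(n_trades)) / n_trades
        if in_sample > best_in:
            best_in, best_out = in_sample, out_sample
    return best_in, best_out


# 5 parameters on 200 trades: 5**5 = 3125 combinations to choose from.
many_in, many_out = best_of_noise(n_trades=200, n_params=5)
# 2 parameters on 2000 trades: 5**2 = 25 combinations to choose from.
few_in, few_out = best_of_noise(n_trades=2000, n_params=2)

print(f"5 params,  200 trades: in-sample {many_in:+.3f}, out-of-sample {many_out:+.3f}")
print(f"2 params, 2000 trades: in-sample {few_in:+.3f}, out-of-sample {few_out:+.3f}")
```

The winning combination in the first case should show a much larger in-sample profit per trade than in the second, even though neither has any edge, while the out-of-sample figures hover around zero. More combinations and fewer trades both make the lucky winner look better.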

You should try to find answers in a statistics forum. This question is not strictly trading-specific; it is a problem that has been plaguing statisticians for ages. The answers also depend on how you ask the question or formulate the problem, how much noise is in your data, and how suitable the type of model you are trying to fit is.

For an example, just think about how many scientists are still arguing about whether we have a model that can predict a trend in the global climate (or even a man-made influence on it) well enough to make *any* assumptions about it, or whether there are simply too few measurements, too much noise, and too many arbitrarily guessed parameters, so that the models fit only random noise, none of it can ever be statistically significant, and further debate would be just a waste of time.