Machine learning in trading: theory, models, practice and algo-trading - page 185

 
Yury Reshetov:
Don't talk nonsense. In jPrediction I implemented an algorithm that reduces the dimensionality of the inputs, so that the resulting model is not trained on noisy or unimportant predictors. That is, a choice is made among a set of models with different combinations of predictors, and only the one with the best generalization ability remains.

A hypothetical situation...

We have 100 potential predictors; for simplicity, let them be indicators.

Let's assume we know in advance that among all these predictors there is only one profitable situation: the RSI has crossed 90 and the stochastic has just dropped below zero (a situation at the top, of course). This situation is followed by a price fall with 90% probability. All other predictors are pure noise, and all other situations in the RSI and stochastic predictors are pure noise as well, and there are hundreds and hundreds of such situations...

In other words, we have roughly 0.01% useful signal to 99.9% noise.

Suppose that, by some miracle, your ML model rejects 98 of the predictors and keeps only two: RSI and stochastic.

But these two alone contain hundreds of situations: RSI>0, RSI>13, RSI<85, RSI=0, RSI<145, and so on. Since you train the ML model to recognize all price movements, it will build models that take into account every situation present in the RSI and stochastic, even though the probability that those situations work is almost zero. The model still has to account for them and build some rules on them, even though they are real noise, and the one working situation simply gets lost among hundreds of other solutions. That is overfitting...

So how do you deal with that in the end?
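In statistical terms, the concern above is a multiple-comparisons problem. The sketch below is a minimal illustration under stated assumptions: indicators and targets are pure random numbers (not real RSI or stochastic values), and the rule form `noise > 50`, the sample sizes and the seed are arbitrary choices. It shows that when hundreds of meaningless rules are tried, the luckiest one still scores well above 50% in-sample.

```python
import random

def best_noise_rule(n_samples=200, n_rules=500, seed=1):
    """Return the in-sample hit rate of the luckiest of `n_rules` rules of the
    form 'noise_indicator > 50', where indicators and targets are all random."""
    rng = random.Random(seed)
    # Random "price fell" labels: nothing here is actually predictable.
    target = [rng.random() < 0.5 for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_rules):
        # A fresh pure-noise "indicator" for every rule (hypothetical data).
        indicator = [rng.uniform(0, 100) for _ in range(n_samples)]
        hits = sum((x > 50) == t for x, t in zip(indicator, target))
        best = max(best, hits / n_samples)
    return best

print(best_noise_rule())  # well above 0.5, although every rule is noise
```

On fresh data such a "winning" rule would fall back to about 50%, which is exactly the overfitting being described.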

 

You've mixed it all up.

These are different, independent problems. BUT THEY ARE LIKE BRICKS FOR A HOUSE: ONLY TOGETHER DO THEY MAKE A TRADING SYSTEM.

1. Preparation of predictors. This stage has quite a large number of goals and corresponding tools. I have deliberately skewed the whole problem of this stage towards getting rid of noise, i.e. searching for predictors that have predictive ability for this particular target variable. Let me describe the ideal. It is taken from an article on genetics, but I will use my own example.

For clarity, let's take the target variable "gender" among Muslims and the predictor "clothing", which has two values: "pants" and "skirt". The part of the predictor "clothing" with the value "pants" unambiguously predicts the class "male", and the other part ("skirt") predicts "female". We also have predictors such as RSI for the target variable "buy/sell". We all know that this indicator often lies, but part of its values predicts one class and part predicts the other. Therefore we need to look for predictors whose values split into parts, some predicting one class and some the other. The smaller the overlap (false positives), the higher the quality of the predictor. The ideal is "pants/skirts", where the predictor can be divided into two parts without any intersection. But this only works for Muslims; for Europeans...
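The "pants/skirt" idea can be made concrete with a small sketch. It assumes a categorical predictor and a class label, and scores the predictor by the share of samples whose class disagrees with the majority class of their predictor value (the "overlap"). The function name `overlap_share`, the scoring rule and the toy data are illustrative assumptions, not the method from the genetics article mentioned in the post.

```python
from collections import Counter, defaultdict

def overlap_share(values, labels):
    """Share of samples that disagree with the majority class of their
    predictor value: 0.0 means a perfect split (hypothetical scoring rule)."""
    by_value = defaultdict(Counter)
    for v, y in zip(values, labels):
        by_value[v][y] += 1
    # For each predictor value, count the samples of the minority class(es).
    minority = sum(sum(c.values()) - max(c.values()) for c in by_value.values())
    return minority / len(labels)

# Hypothetical toy data: clothing vs. gender in two populations.
clothing_a = ["pants", "pants", "skirt", "skirt"]
gender_a = ["male", "male", "female", "female"]
clothing_b = ["pants", "pants", "pants", "skirt"]
gender_b = ["male", "female", "male", "female"]

print(overlap_share(clothing_a, gender_a))  # 0.0
print(overlap_share(clothing_b, gender_b))  # 0.25
```

A smaller value means less overlap, i.e. fewer false positives. A numeric predictor such as RSI would first have to be binned, which this sketch does not do.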

There are algorithmic methods (PCA, for example, though not the classical version but a refined one), but you have to start with the content of the predictors. On content grounds, one must first discard Saturn's rings, coffee grounds and the like... What matters is that these predictors are NOT correlated with each other. For example, we take some quantities derived from the quotes; they all come from the price, so we also take open interest, volumes... Then, for some reason, other currency pairs, macroeconomics...

2. Model fitting. This is a separate problem, and the first problem cannot be solved by the model itself. The confusion here arises because many modelling algorithms have a built-in predictor-selection algorithm. I personally don't know of any built-in algorithm that can solve the first problem.

Reshetov claims to have such a built-in algorithm, but he has never provided evidence that his algorithm does not overtrain.

The first step is mandatory. It does not exclude, and may even imply, the use of built-in predictor-selection algorithms, but the "coffee grounds" must be removed in the first step, before those algorithms ever see the data.

3. Binary-alternative classifier. Reshetov, as always, has confused the issue with his understanding of the ternary classifier. A ternary classifier is one whose target variable has three values; more generally, a target variable can have any number of qualitative (nominal, categorical) values. Reshetov uses two binary classifiers from which he derives a working signal, and on forex it is VERY desirable to have a ternary buy/flat/sell signal. I use a binary target variable for classification, and for trading I get three signals from the results of the two binary classifications, exactly like Reshetov.
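A minimal sketch of deriving a three-valued buy/flat/sell signal from two binary classifiers. The combination rule is an assumption for illustration (trade only when both classifiers agree on the direction, otherwise stay flat); the post does not spell out the exact rule that either author uses, and the names `Signal` and `ternary_signal` are hypothetical.

```python
from enum import Enum

class Signal(Enum):
    BUY = 1
    FLAT = 0
    SELL = -1

def ternary_signal(first_says_up: bool, second_says_up: bool) -> Signal:
    """Combine two binary direction forecasts into three trading signals.
    Assumed rule: agreement gives a trade, disagreement gives FLAT."""
    if first_says_up and second_says_up:
        return Signal.BUY
    if not first_says_up and not second_says_up:
        return Signal.SELL
    return Signal.FLAT  # the two classifiers disagree

print(ternary_signal(True, True))   # Signal.BUY
print(ternary_signal(True, False))  # Signal.FLAT
```

The point of the design is that each model stays a plain binary classifier; the third state appears only at the trading layer.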

4. Combining the results of several models into a trading signal is a separate problem. Reshetov has proposed one solution, but other solutions were suggested earlier in this thread. Dik suggested above taking into account the values from which the class is obtained. This problem can be explored too, especially if you remember that classification algorithms output the PROBABILITY of the class, from which the class itself is derived. When we combine the results of several models into one, taking these probabilities into account is self-evident. There are algorithms that split these probabilities not at one half but at another point, which reduces the classification error.
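One simple way to combine several models' probabilities and cut them somewhere other than one half is sketched below, under explicit assumptions: the models are combined by a plain average, and the cut-off is chosen on held-out data by minimizing the number of misclassifications over a grid. Both choices, and all the data, are illustrative; they are not the specific algorithms the post alludes to.

```python
def average_probability(model_probs):
    """Mean of several models' class-1 probabilities for a single sample."""
    return sum(model_probs) / len(model_probs)

def best_threshold(probs, labels, candidates=None):
    """Return the cut-off with the fewest misclassifications on held-out data.
    The default grid 0.01..0.99 is an assumption; 0.5 is only one candidate."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]

    def errors(t):
        return sum((p >= t) != bool(y) for p, y in zip(probs, labels))

    return min(candidates, key=errors)  # first candidate with the fewest errors

# Hypothetical validation data: three models' probabilities per sample.
samples = [
    [0.62, 0.64, 0.66],  # true class 0, yet all models lean towards 1
    [0.80, 0.85, 0.90],  # true class 1
    [0.30, 0.40, 0.50],  # true class 0
    [0.70, 0.72, 0.74],  # true class 1
]
labels = [0, 1, 0, 1]
probs = [average_probability(s) for s in samples]
t = best_threshold(probs, labels)
print(t)  # not 0.5 for this toy data
```

With so few samples the chosen threshold is itself overfitted; in practice it would be picked on one validation set and checked on another.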

5. Final model evaluation. This is the point on which I could not reach an understanding with Burnakov. We take the model and run it "out of sample", where "out" means outside the time interval in which we did the training, testing and cross-validation... This step is not constructive, because it doesn't tell us what to do. At this step a verdict is passed: keep or throw away. The reason to discard is not a large error as such but its VARIABILITY compared to the previous steps. We discard because such a model is overtrained, which makes it hopeless and dangerous. If the model passes this step, we move on to the tester, which gives the same keep/discard verdict.
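The keep/discard verdict can be sketched as a check on how much the error changes out of sample relative to the earlier steps, rather than on how large it is. The use of the mean of the earlier errors as a baseline and the 30% relative tolerance are assumptions made for this sketch; the post gives no numeric rule.

```python
from statistics import mean

def keep_model(earlier_errors, out_of_sample_error, max_relative_change=0.3):
    """Return True ("keep") if the out-of-sample error stays close to the
    errors seen in training/testing/cross-validation, False ("discard")
    if it drifts too far. `max_relative_change` is a hypothetical tolerance."""
    baseline = mean(earlier_errors)
    relative_change = abs(out_of_sample_error - baseline) / baseline
    return relative_change <= max_relative_change

print(keep_model([0.30, 0.32, 0.31], 0.33))  # True: error roughly stable
print(keep_model([0.10, 0.11, 0.12], 0.30))  # False: error jumped, likely overtrained
```

Note that the first model, with the larger error, is kept and the second, with the smaller in-sample error, is discarded: the verdict depends on stability, as the post argues.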

 
mytarmailS:

A hypothetical situation...

...

then ML will build models taking into account all possible situations ...

Since you train the ML model to recognize all price movements, the ML ...

...

but the ML model has to take them into account and build some models on them, even though this is real noise, and the one working situation will simply get lost among hundreds of other solutions; that's overtraining ...

jPrediction doesn't have to account for every possible situation. It works much more simply than you imagine.

The principle of sequential selection of predictors (not a full enumeration of combinations, as you are trying to make out) is described in my post on p. 109.

If you have amnesia, let me remind you that you already asked about the order in which predictors are selected, on p. 110.
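For readers unfamiliar with the idea, here is a generic greedy forward (sequential) selection sketch: predictors are added one at a time, and each addition is kept only if it improves a score, instead of enumerating all combinations. The `score` function is a hypothetical stand-in for whatever generalization estimate a tool uses, and the toy scores are invented; this is not jPrediction's actual code, which is described in the post on p. 109.

```python
from typing import Callable, FrozenSet, List, Sequence

def forward_selection(candidates: Sequence[str],
                      score: Callable[[FrozenSet[str]], float]) -> List[str]:
    """Greedy sequential selection: at each round add the single predictor
    that raises the score most; stop when no addition helps."""
    selected: List[str] = []
    best_score = score(frozenset())
    remaining = list(candidates)
    while remaining:
        # Evaluate every one-predictor extension of the current set.
        trial_scores = {p: score(frozenset(selected + [p])) for p in remaining}
        best_p = max(trial_scores, key=trial_scores.get)
        if trial_scores[best_p] <= best_score:
            break  # no remaining predictor improves the estimate
        selected.append(best_p)
        best_score = trial_scores[best_p]
        remaining.remove(best_p)
    return selected

# Hypothetical score: only "rsi" and "stoch" carry signal; others add noise.
def toy_score(subset: FrozenSet[str]) -> float:
    useful = {"rsi": 0.10, "stoch": 0.05}
    return 0.5 + sum(useful.get(p, -0.01) for p in subset)

print(forward_selection(["macd", "rsi", "cci", "stoch"], toy_score))  # ['rsi', 'stoch']
```

With n candidates this needs at most about n²/2 score evaluations instead of the 2ⁿ subsets of a full enumeration, at the cost of possibly missing predictors that only help in combination.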

 

SanSanych Fomenko:

Reshetov, as always, confused the issue with his understanding of the ternary classifier.

...

I use a binary target variable for classification, and for trading I get three signals from the results of the two binary classifications, exactly like Reshetov.


Isn't this Reshetov a scoundrel?

He has muddled the issue so badly that now even Fomenko is forced to do exactly what Reshetov does.

Shurik Shurikovich, take a pie from the shelf. After all, you have honestly earned it in the field of criticizing that "radish" and bad man, Reshetov.

 
Yury Reshetov:

jPrediction doesn't have to take every possible situation into account. It works much more simply than you imagine.

The principle of sequential selection of predictors (not a full enumeration of combinations, as you are trying to make out) is described in my post on p. 109.

If you have amnesia, let me remind you that you already asked about the order in which predictors are selected, on p. 110.

I'm telling you why ML (any ML) can't select features properly, and you're telling me about something else entirely...

 
mytarmailS:

I'm telling you why ML (any ML) can't select features properly, and you're telling me about something else entirely...

jPrediction handles culling predictors fine. It is probably not the ideal method, but it is OK for applied tasks. Most likely the limit of perfection has not yet been reached and there is potential for further research. The most important thing is that there is a positive result to build on.

The point is: don't project your own biases onto machine learning methods in general (and not only in ML).

If something does not work for you but works for others, it does not mean that there are no sound methods. It just means that you are not using those sound methods, or are using them incorrectly because of some personal prejudice.

 
Yury Reshetov:

Isn't this Reshetov a scoundrel?

He has muddled the issue so badly that now even Fomenko is forced to do exactly what Reshetov does.

Shurik Shurikovich, take a pie from the shelf. After all, you have honestly earned it in the field of criticizing that "radish" and bad man, Reshetov.

Calm down.

I have NEVER even thought of offending you personally, because you and I are of the same blood.

But your "on the fence" is of undeniable interest to me.

Here's the thing, using a binary classifier as an example.

Suppose the probability of one class is 0.49 and, accordingly, the probability of the other is 0.51. Is that two classes or "on the fence"?

 
Yury Reshetov:

The dumbest and least promising ternary "bicycle" (a reinvented wheel), though the most primitive to implement, is an ANN with three outputs. If each output has its own classification threshold, the outputs produce not three but eight potentially possible states, of which only three are unambiguous (a value above the threshold on exactly one of the three outputs), while it is unclear how to interpret the other five (values above the threshold on more than one output, or below the threshold on all three).
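The arithmetic in the quoted post is easy to check: three outputs, each either above or below its own threshold, give 2³ = 8 combinations. This short enumeration just counts them; the representation (True = above threshold) is an assumption of the sketch.

```python
from itertools import product

# Each of the three outputs is either above (True) or below (False) its threshold.
states = list(product([False, True], repeat=3))
unambiguous = [s for s in states if sum(s) == 1]  # exactly one output fires
ambiguous = [s for s in states if sum(s) != 1]    # none fire, or two or more fire

print(len(states), len(unambiguous), len(ambiguous))  # 8 3 5
```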

For classification everything is much simpler: it is common to take the output with the highest value. If the three outputs are (0.1; 0.3; 0.2), then the largest value is 0.3 and output number 2 is active.
The trading model can have this logic:

  • Highest value on the first output -> long position,
  • Highest value on the second output -> exit all trades and do not trade,
  • Highest value on the third output -> short position.

That's all: no thresholds, states, etc.

This is not a bicycle at all, but a method often used with neural networks for classification when more than two classes are needed; in image classification, for example, there can be dozens of classes.
One output with a threshold in the middle is enough for two classes.
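A minimal sketch of the "highest output wins" rule described above. The names are illustrative, and ties go to the lower-numbered output here, which is an arbitrary choice the post does not address.

```python
SIGNALS = ("long", "flat", "short")  # outputs 1, 2, 3 as in the post

def argmax_signal(outputs):
    """Map three network outputs to a trading action by taking the largest."""
    if len(outputs) != len(SIGNALS):
        raise ValueError("expected exactly three outputs")
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return SIGNALS[best]

print(argmax_signal((0.1, 0.3, 0.2)))  # flat: output 2 has the largest value, 0.3
```

No threshold is involved: whatever the absolute values, exactly one action is always selected, which is both the simplicity and the weakness discussed in the reply.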
 

Dr.Trader:

The trading model can have this logic:


  • Highest value on the first output -> long position,
  • Highest value on the second output -> exit all trades and do not trade,
  • Highest value on the third output -> short position.


That's all, no thresholds, states, etc.

That's also an option, although it's not certain that such a trivial approach will give decent generalization ability. Sometimes simplicity is worse than theft, as the saying goes. In other words, it has to be checked empirically: the autopsy will show.
 
SanSanych Fomenko:

Suppose the probability of one class is 0.49 and the probability of the second class is 0.51. Is that two classes or "on the fence"?

Because ice cream.

I apologize, but as the question, so the answer.

That is, I didn't get the joke: to make a decision, you need to compare the classifier's output with something, for example with a threshold. Since in your formulation the values to compare against are, for some reason, unknown, and only the values that classification does not need are given, it would be good to clarify.
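To illustrate why the answer depends on what the output is compared with: the same 0.51 is class 1 with a single cut-off at 0.5, but "on the fence" if a neutral band is placed around the cut-off. The function `decide`, the `fence` parameter and the band half-width of 0.05 are hypothetical choices for this sketch.

```python
def decide(p_class1, threshold=0.5, fence=0.0):
    """Return 1, 0, or None ("on the fence") for a class-1 probability.
    `fence` is a hypothetical half-width of a neutral band around `threshold`."""
    if p_class1 >= threshold + fence:
        return 1
    if p_class1 < threshold - fence:
        return 0
    return None  # inside the neutral band

print(decide(0.51))              # 1: a single cut-off at 0.5
print(decide(0.51, fence=0.05))  # None: inside the band [0.45, 0.55)
```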