Machine learning in trading: theory, models, practice and algo-trading - page 419

 
Okay, I'll share some information here. I have an idea of what data should be used to predict the market, but unfortunately I cannot collect it in full and in the right form. If someone helped me organize the collection, I would share my optimizer, and the strategy as a whole, with them. The data is already good enough, but to make it really strong, something needs to be added. Who is strong in programming and able to pull data online from several sites into a CSV file?
 
Mihail Marchukajtes:
Okay, I'll share some information here. I have an idea of what data should be used to predict the market, but unfortunately I cannot collect it in full and in the right form. If someone helped me organize the collection, I would share my optimizer, and the strategy as a whole, with them. The data is already good enough, but to make it really strong, something needs to be added. Who is strong in programming and able to pull data online from several sites into a CSV file?
My model is built on multi-stream, diversified data. I have experience parsing data and would be happy to participate.
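Not the poster's actual pipeline, but a minimal Python sketch of the kind of collector being asked for: pull quotes from several sites and merge them into one CSV. The URLs, column names and symbol below are placeholders, not real endpoints.

# Minimal sketch: download CSV quotes from several sources and merge them
# into one file. The URLs, column names and symbol are placeholders,
# not the sources the poster has in mind.
import io

import pandas as pd
import requests

SOURCES = {
    # name -> hypothetical CSV endpoint returning columns: datetime, close
    "site_a": "https://example-a.test/quotes/EURUSD.csv",
    "site_b": "https://example-b.test/export?symbol=EURUSD&format=csv",
}

def fetch_csv(url: str) -> pd.DataFrame:
    """Download one CSV and parse it with pandas."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    df = pd.read_csv(io.StringIO(resp.text), parse_dates=["datetime"])
    return df.set_index("datetime").sort_index()

def collect(out_path: str = "merged_quotes.csv") -> pd.DataFrame:
    frames = []
    for name, url in SOURCES.items():
        # keep the source name in the column so the series stay distinguishable
        frames.append(fetch_csv(url).add_prefix(f"{name}_"))
    merged = pd.concat(frames, axis=1).ffill()  # align on timestamps
    merged.to_csv(out_path)
    return merged

if __name__ == "__main__":
    collect()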
 

I wrote in my blog about choosing neural network parameters - SELECTING NEUROSET CONFIGURATION.

At least at the initial stage it is better to do it this way; later, if necessary, the NS can be simplified.

The example of the choice in the blog is abstract, but it was from similar considerations that I chose the parameters of my NS. The training results are, on the whole, not bad.

I am a little worried about the size of the NS: in the example with 3 MAs it is already more than 100 neurons, and that is not yet a TS, only a template for one.
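The blog's actual configuration is not reproduced in the thread, so the numbers below are only an assumed illustration of how quickly the neuron count grows once each MA is fed in as a window of lags; the lag depth and hidden-layer sizes are made up.

# Rough illustration only (assumed numbers, not the blog's configuration):
# how 3 MAs fed in as lag windows push a small network past 100 neurons.
n_ma = 3                              # three moving averages as inputs
lags = 16                             # assumed window of past values per MA
n_inputs = n_ma * lags                # 48 input neurons
hidden = [n_inputs, n_inputs // 2]    # assumed hidden layers: 48 and 24 neurons
n_outputs = 1                         # single output (e.g. buy/sell score)

print(n_inputs + sum(hidden) + n_outputs)  # 48 + 48 + 24 + 1 = 121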

 
Yuriy Asaulenko:

I wrote in my blog about choosing neural network parameters - SELECTING NEUROSET CONFIGURATION.

At least at the initial stage it is better to do it this way; later, if necessary, the NS can be simplified.

The example of the choice in the blog is abstract, but it was from similar considerations that I chose the parameters of my NS. The training results are, on the whole, not bad.

I am a little worried about the size of the NS: in the example with 3 MAs it is already more than 100 neurons, and that is not yet a TS, only a template for one.

Let's try it this way: over the weekend or next week I will post some interesting predictors here, and you tell me your opinion of them. Just predictors in the form of MT5 indicators, 4 of them.

I will try to trade using only these predictors, and it will be possible to organize a Challenge - who can teach an NS to make money with these predictors :) I don't have much experience in this field.

 
Maxim Dmitrievsky:

Let's try it this way: over the weekend or next week I will post some interesting predictors here, and you tell me your opinion of them. Just predictors in the form of MT5 indicators, 4 of them.

It will be possible to organize a Challenge - who can teach an NS to make money with these predictors :) In Reshetov's RNN optimizer they do quite well, but I have not yet managed to teach MP to trade profitably with them.

I still have a lot of grinding to do before I start real trading). I have been at this for more than a month now (admittedly, only sporadically).

It would be interesting to look at your indicators, of course, but I'm sorry, I will not post mine. The basis (the initial 2008 version) can be seen here - Butterworth Moving Average - indicator for MetaTrader 4. Of course, everything is done differently now.
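The linked MT4 indicator is not reproduced here; below is only a minimal sketch of the general idea behind a Butterworth-style moving average - smoothing the close series with a low-order Butterworth low-pass filter (here via SciPy, applied causally). The period and order are arbitrary.

# Minimal sketch of Butterworth-style smoothing of a price series.
# This shows the general idea only, not the linked MT4 indicator.
import numpy as np
from scipy.signal import butter, lfilter

def butterworth_ma(close: np.ndarray, period: int = 20, order: int = 2) -> np.ndarray:
    """Causal low-pass Butterworth filter; `period` plays the role of an MA length."""
    cutoff = 1.0 / period            # cutoff as a fraction of the Nyquist frequency
    b, a = butter(order, cutoff)     # default btype='low', normalized cutoff
    return lfilter(b, a, close)      # causal filtering (no look into the future)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    close = 100 + np.cumsum(rng.normal(0, 0.1, 1000))  # synthetic random walk
    print(butterworth_ma(close)[-5:])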

 
Yuriy Asaulenko:

Well, I still have a lot of grinding to do). I have been at this for more than a month now.

It would be interesting to look at your indicators, of course, but I'm sorry, I will not post mine. The basis (the initial 2008 version) can be seen here - Butterworth Moving Average - indicator for MetaTrader 4. Of course, everything is done differently now.


I will post them anyway, if only because sometimes my brain gets twisted into a knot and I need an outside opinion :)
 

I don't want to upset anyone, but alas, most of you don't know how to prepare targets correctly. All these inspiring results (75-80% accuracy) on slow candles (>10 min) are, in reality, fake. An accuracy of 55% is enough to get a Sharpe Ratio above 2, and an accuracy of 60% on slow data is that very grail of legend, a Sharpe Ratio of 3-4; nobody trades like that on a real account, only the HFT people do, and they have a completely different scale of trading costs - for them anything with SR < 2 is unprofitable.

In short...

DO NOT PEEK INTO THE TARGET!

That is, when calculating the target you cannot use data that is ALSO used in the calculation of the features, otherwise the result will contain look-ahead. For obvious reasons, "sleight of hand" like ZZ goes straight to hell: it interpolates between extrema far back into the area where the features are calculated, the result is exorbitant - at least 90% accuracy without any problem - but it is fake. This is the basis for the obscurantist discussions about how "the forecast is not important", that you still have to develop the TS, and so on. So in fact these "90%" are the same "favourite" 50%.
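A minimal sketch of this effect on synthetic data: the same lagged-return features are scored against a clean target (direction of the next bar only) and against a "peeking" target built from a centered moving average, a crude stand-in for ZZ-style interpolation into the feature window. On a pure random walk the clean target stays near 50%, the peeking one does not.

# Sketch: a target that peeks into the feature window inflates accuracy
# even on a pure random walk. The "leaky" label below is a crude stand-in
# for ZZ-style interpolation, not anyone's actual labelling scheme.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
price = pd.Series(np.cumsum(rng.normal(size=20_000)))          # random walk

# Features: recent returns, nothing later than bar t.
X = pd.concat({f"ret_lag{k}": price.diff().shift(k) for k in range(5)}, axis=1)

# Clean target: direction of the NEXT bar only.
y_clean = (price.shift(-1) > price).astype(int)

# Leaky target: direction of a CENTERED moving average, which mixes bars
# both before and after t into the label at bar t.
centered = price.rolling(11, center=True).mean()
y_leaky = (centered > price).astype(int)

data = pd.concat([X, y_clean.rename("clean"), y_leaky.rename("leaky")],
                 axis=1).iloc[10:-10]                          # drop incomplete edges

for name in ("clean", "leaky"):
    X_tr, X_te, y_tr, y_te = train_test_split(
        data[X.columns], data[name], test_size=0.3, shuffle=False)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    print(name, round(accuracy_score(y_te, clf.predict(X_te)), 3))
# Expected: "clean" hovers around 0.50, "leaky" lands well above it;
# the entire excess is look-ahead, not forecasting skill.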


Be reasonable :)

 
Alesha:


In short...

DO NOT PEEK INTO THE TARGET!

That is, when calculating the target you cannot use data that is ALSO used in the calculation of the features, otherwise the result will contain look-ahead. For obvious reasons, "sleight of hand" like ZZ goes straight to hell: it interpolates between extrema far back into the area where the features are calculated, the result is exorbitant - at least 90% accuracy without any problem - but it is fake. This is the basis for the obscurantist discussions about how "the forecast is not important", that you still have to develop the TS, and so on. So in fact these "90%" are the same "favourite" 50%.


Be reasonable :)

I cannot agree with your conclusions about ZZ, or with your conclusions in general.

Take RSI, for example. What exactly does ZZ interpolate into this particular predictor, or vice versa? Meanwhile, I can show that RSI as a predictor for ZZ has decent predictive ability. An ordinary moving average, on the other hand, has no predictive ability for ZZ and is 100% noise with respect to ZZ - completely useless as a predictor. On the basis of an MA you can get a model for ZZ with an error of less than 10%, but if this trained model is run on a new file, unrelated to the training file, you get an arbitrary error.

In addition to the problem you mentioned - that among the predictors for a ZZ there may be ones from which that very ZZ is derived - there is another problem that is fundamental and independent of the target variable: the problem of a predictor that is NOT related to the target, i.e. is noise for a particular target variable (ZZ is no exception). Noise is a very convenient predictor: among the noise values you can always find ones that reduce the prediction error. Back when I did not understand this, I very often got a prediction error of around 5%.

But once you learn how to clean the initial set of predictors of the ones that are noise for a particular target variable, it becomes extremely difficult to reduce the error below 30%, at least for me.

To conclude: predictors that are noise for a particular target variable lead to overfitting, and ZZ is no exception here.
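A minimal sketch of that point on synthetic data: predictors that are pure noise with respect to the target can drive the in-sample error close to zero, while on a new sample the error returns to roughly 50%. The sizes and model are arbitrary.

# Sketch: pure-noise predictors produce a near-zero in-sample error,
# but on a new file the error goes back to ~50%.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n, p = 2_000, 50
X_train, X_new = rng.normal(size=(n, p)), rng.normal(size=(n, p))
y_train, y_new = rng.integers(0, 2, n), rng.integers(0, 2, n)   # unrelated to X

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("in-sample error:", round(1 - model.score(X_train, y_train), 3))  # close to 0
print("new-file error:", round(1 - model.score(X_new, y_new), 3))       # around 0.5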

 
SanSanych Fomenko:

I cannot agree with your conclusions about ZZ, or with your conclusions in general.

Take RSI, for example. What exactly does ZZ interpolate into this particular predictor, or vice versa? Meanwhile, I can show that RSI as a predictor for ZZ has decent predictive ability. An ordinary moving average, on the other hand, has no predictive ability for ZZ and is 100% noise with respect to ZZ - completely useless as a predictor. On the basis of an MA you can get a model for ZZ with an error of less than 10%, but if this trained model is run on a new file, unrelated to the training file, you get an arbitrary error.

In addition to the problem you mentioned - that among the predictors for a ZZ there may be ones from which that very ZZ is derived - there is another problem that is fundamental and independent of the target variable: the problem of a predictor that is NOT related to the target, i.e. is noise for a particular target variable (ZZ is no exception). Noise is a very convenient predictor: among the noise values you can always find ones that reduce the prediction error. Back when I did not understand this, I very often got a prediction error of around 5%.

But once you learn how to clean the initial set of predictors of the ones that are noise for a particular target variable, it becomes extremely difficult to reduce the error below 30%, at least for me.

To conclude: predictors that are noise for a particular target variable lead to overfitting, and ZZ is no exception here.


Fine! Let's have a discussion on this extremely important topic. I propose running a series of experiments to figure out what is what.

So, I argue:

1) Correct synthesis of features and classification of a random set of time series into 2 classes gives 50% accuracy (like a coin flip), given enough samples (from 5-10k). If there is a statistically significant shift in accuracy (>51%), then there are errors in the feature-synthesis and/or classification process (a minimal sketch of such a check is given after point 2 below).

2) If we use targets that rely on data also used in the calculation of the features, we get a significant upward bias in accuracy (55, 60, 90%) ON A RANDOM SERIES, which a priori cannot be predicted (50%). Which means such a grail is false.
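A minimal sketch of check (1) under the stated assumptions: clean lagged-return features, a clean next-bar target, one out-of-sample split on a simulated random walk, and a binomial test of whether accuracy differs from 50%. A small p-value here would indicate leakage somewhere in the pipeline.

# Sketch of check (1): on a random walk with clean features and a clean
# next-bar target, out-of-sample accuracy should be indistinguishable from 50%.
import numpy as np
import pandas as pd
from scipy.stats import binomtest
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
price = pd.Series(np.cumsum(rng.normal(size=10_000)))           # random walk

X = pd.concat({f"ret_lag{k}": price.diff().shift(k) for k in range(5)}, axis=1)
y = (price.shift(-1) > price).astype(int)                       # next-bar direction
data = pd.concat([X, y.rename("y")], axis=1).iloc[5:-1]         # drop incomplete rows

split = len(data) // 2                                          # simple OOS split
model = LogisticRegression(max_iter=1000).fit(data[X.columns][:split], data["y"][:split])
hits = int((model.predict(data[X.columns][split:]) == data["y"][split:]).sum())
n_test = len(data) - split

print("accuracy:", round(hits / n_test, 3))
# If this p-value is small, something in the pipeline leaks, per point (1).
print("p-value vs 50%:", round(binomtest(hits, n_test, 0.5).pvalue, 3))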

 
Alesha:


2) If we use targets that rely on data also used in the calculation of the features, we get a significant upward bias in accuracy (55, 60, 90%) ON A RANDOM SERIES, which a priori cannot be predicted (50%). Which means such a grail is false.

And why check anything? It's obvious to me.

I gave the example of RSI and ZZ - they have nothing in common, and yet you can build a model with an error below 50%.

Another example: MA and ZZ - easily less than 10% error. When tested on a new file, a completely arbitrary result.