Machine learning in trading: theory, models, practice and algo-trading

 
Combinator:
What about the opinion that going Sanych's way noticeably reduces the already low probability of hitting the coveted 1%?

Each indicator carries some additional information, and all of it is useful, not just some one percent. For RSI there is no such strategy as "buy at >0.99, sell at <0.01"; that was an unfortunate example.

Let us say you can take an indicator, build an Expert Advisor on it and optimize its parameters for better results. But such an EA will always fail in the future. For the EA not to fail, it needs dozens of indicators (maybe fewer, but that does not make things easier for me), whose values are combined by complex logic with different conditions. For example: if MA(20) > MA(16), then buy if RSI > 0.3; and if MA(20) < MA(16), then do not look at RSI at all, look at Stochastic instead. The logic should be something like this, only even more complex and ornate. The random forest model can build such logic itself, which is very good.

All indicator values are important for model building. The model itself will determine the thresholds of those values for the buy/sell decision, and the conditions under which they apply, from the values of the other indicators.
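A minimal sketch of this point, with synthetic stand-ins for the indicator columns (no real MA/RSI/Stochastic is computed here): a random forest recovers exactly this kind of branching rule from the data, without anyone writing it out.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 5000

# Hypothetical indicator columns; in practice these come from price data.
ma20  = rng.normal(size=n)
ma16  = rng.normal(size=n)
rsi   = rng.uniform(0, 1, size=n)
stoch = rng.uniform(0, 1, size=n)

# Target that follows exactly the branching logic described above:
# if MA(20) > MA(16) the answer depends on RSI, otherwise on Stochastic.
y = np.where(ma20 > ma16, rsi > 0.3, stoch > 0.5).astype(int)

X = np.column_stack([ma20, ma16, rsi, stoch])
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X[:4000], y[:4000])
print("accuracy on unseen rows:", model.score(X[4000:], y[4000:]))  # well above chance
```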

 
Dr.Trader:

Each indicator carries some additional information, and all of it is useful, not just some one percent.

Are you familiar with Occam's razor principle?

 

If an indicator has a certain range of values, then any value in this range says something and carries its own additional meaning. I do not recommend simply taking 1% of the upper and lower limits of the indicator and deciding to trade only there. Of course, you can try it, but it will turn out unprofitable, and you will need a lot of other indicators to bolt a bunch of conditions onto the strategy. That is, you can either trade in the full range of RSI values with a bunch of other indicators, or trade only in a narrow range of RSI values, with a bunch more indicators. I do not see how the second way gives me any advantage.

But when there are initially dozens of indicators, each with a hundred variants of lags or parameters, then some of them must be eliminated, and here Occam's razor is in full force. That is why I keep only about one hundred out of almost 9000 predictors (a dozen indicators with different lags (shifts)). And these remaining predictors are at least 60% accurate.
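A sketch of how thousands of lagged predictors arise and then get pruned down to about a hundred. The selection algorithm used in the thread is not published, so an ordinary random-forest importance filter stands in for it here; the indicator set and lag counts are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
price = pd.Series(rng.normal(size=3000).cumsum())  # stand-in for a price series

# A dozen indicators x a hundred lags gives thousands of predictors;
# scaled down here to 4 moving averages x 50 lags = 200 columns.
base = {f"sma{w}": price.rolling(w).mean() for w in (5, 10, 20, 50)}
X = pd.DataFrame({f"{name}_lag{k}": s.shift(k)
                  for name, s in base.items() for k in range(50)}).dropna()
X = X.iloc[:-1]
y = (price.shift(-1) > price).loc[X.index].astype(int)  # next bar up or down

rf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)
keep = pd.Series(rf.feature_importances_, index=X.columns).nlargest(100).index
print(f"kept {len(keep)} of {X.shape[1]} predictors")
```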

 
Dr.Trader:

The more noise predictors you have, the more likely it is that among them there will be one that looks like useful data.

That is overtraining a priori, the absence of which Sanych brags about so much.

 
Combinator:

The more noise predictors you have, the more likely it is that among them there will be one that looks like useful data.

That is overtraining a priori, the absence of which Sanych brags about so much.

I was rather inaccurate about the noise predictors.

I'm bragging here that I have an algorithm that sifts out the noise predictors. But that's not entirely accurate, in the sense that for me there are no 100% noisy and 100% non-noisy predictors. All the predictors I've seen (several hundred, across more than 10 sets from different people) are partly noisy and partly non-noisy. Always. I haven't seen any others. I'll explain below with numbers.

Now, what are we fighting for?

According to my algorithm, if we take purely noise predictors, we get a probability of correct class prediction of about 50%: a coin flip. And the tricky part is that you almost always get very good results when you train on purely noise predictors, and then on out-of-sample data you get that 50%.
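To illustrate the trap (a toy sketch, not the author's algorithm): fit a model on purely random predictors and random labels, and it looks excellent in-sample while guessing at about 50% out of sample.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 50))    # 50 pure-noise predictors
y = rng.integers(0, 2, size=2000)  # random class labels

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X[:1000], y[:1000])
print("in-sample accuracy:", model.score(X[:1000], y[:1000]))  # close to 1.0
print("out-of-sample:     ", model.score(X[1000:], y[1000:]))  # about 0.5
```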

I have an abstract "noisiness" value for each predictor. If that value is between 0 and 1, the predictor is noise and completely hopeless. From 1 to 2 it can be used, but better not to. You should take predictors with my measure above 3. I have never seen anything above 6.
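How this measure is computed is not revealed in the thread. As a rough stand-in with a similar scale, here is a z-score of each predictor's importance against a null distribution obtained by refitting on a shuffled target (the idea behind methods like Boruta): noise predictors land around 0-1, genuinely informative ones several units higher.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def usefulness(X, y, n_shuffles=20, seed=0):
    """Importance z-score vs. a shuffled-target null; NOT SanSanych's measure."""
    rng = np.random.default_rng(seed)
    real = (RandomForestClassifier(n_estimators=100, random_state=seed)
            .fit(X, y).feature_importances_)
    null = np.array([(RandomForestClassifier(n_estimators=100, random_state=seed)
                      .fit(X, rng.permutation(y)).feature_importances_)
                     for _ in range(n_shuffles)])
    return (real - null.mean(axis=0)) / (null.std(axis=0) + 1e-12)
```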

So, suppose the predictors with my "noisiness" measure above 3 have been selected. Building a model on them, I got an error of 25 to 35% for different sets of predictors, roughly equal on all types of samples (training/testing/validation, all with random shuffling, and out-of-sample, strictly in the order the bars arrive), e.g. 32-30-33-35%. There is no way to improve the error by, say, half on the same particular set of predictors. That is, the magnitude of the model error is determined by the particular set of predictors. If you do not like the magnitude of the error, you need some other set of predictors, which may give a different error.
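The consistency check described here can be sketched as follows (the split proportions are my assumption): measure the error on shuffled train/test/validation splits and on a strictly chronological out-of-sample block, and compare.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def error_report(X, y):
    # Chronological out-of-sample block: the last 20% of bars, never shuffled.
    cut = int(len(y) * 0.8)
    X_in, y_in, X_oos, y_oos = X[:cut], y[:cut], X[cut:], y[cut:]
    # Shuffled train / test / validation splits from the in-sample part.
    X_tr, X_rest, y_tr, y_rest = train_test_split(X_in, y_in, test_size=0.4,
                                                  random_state=0)
    X_te, X_va, y_te, y_va = train_test_split(X_rest, y_rest, test_size=0.5,
                                              random_state=0)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    for name, Xs, ys in [("train", X_tr, y_tr), ("test", X_te, y_te),
                         ("validation", X_va, y_va), ("out-of-sample", X_oos, y_oos)]:
        print(f"{name:>14} error: {1 - model.score(Xs, ys):.2f}")
```

If the four numbers come out of the same order, as in the 32-30-33-35% example above, the predictor set is at least not producing an overtrained model.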

The error I was getting is of course large, but for me it is important that the out-of-sample error is about equal to the training and testing error. From this I draw the most important conclusion for myself: this set of predictors does not cause overtraining of the model, and in the future I will have about the same prediction error... This has been tested on different variants of random forests, ada, and SVM. No other models have been tried.

 
Combinator:

The more noise predictors you have, the more likely it is that among them there will be one that looks like useful data.

That is overtraining a priori, the absence of which Sanych brags about so much.

Andrew, this is clearly understood. All conclusions are drawn on validation. The chance that a purely noise predictor will generate thousands of correctly guessed and independent observations is very small, negligible. The conclusions are also verified by the proper statistical tests I have.
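The "negligible" here is easy to quantify (the numbers are illustrative, matching the ~60% accuracy mentioned earlier): the probability that a coin-flip predictor scores at least 60% on 1000 independent observations is tiny.

```python
from scipy.stats import binom

# P(a fair coin gets >= 600 of 1000 independent observations right)
print(binom.sf(599, 1000, 0.5))  # on the order of 1e-10
```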

Selecting predictors from noise works.
 
Vladimir Perervenko:

2. There is a function in rminer called lforecast - it performs multi-step forecasts by iteratively using 1-ahead predictions as inputs.

Speaking about multi-step forecasts, you mean regression, of course?

I don't know)) I need a tool that would make a multi-step prediction using a matrix of predictors. How it makes the prediction is not that important. Plain regression takes only a time series as input, which does not suit me; I need a matrix of predictors...

I looked at the lforecast function; it takes a time series for regression as input, which is not the same thing. Or what am I misunderstanding?
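For what it's worth, the recursive scheme itself is only a few lines: predict one step ahead, push the prediction into the lag features, repeat. The sketch below assumes the predictor matrix consists purely of lags of the forecast series itself (newest first), and that is precisely the limitation being discussed: any exogenous predictor columns would have to be known or forecast in advance.

```python
import numpy as np

def multi_step_forecast(model, last_lags, horizon):
    """model: any fitted regressor; last_lags: [y_t, y_{t-1}, ...], newest first."""
    lags = list(last_lags)
    preds = []
    for _ in range(horizon):
        y_hat = model.predict(np.array(lags).reshape(1, -1))[0]
        preds.append(y_hat)
        lags = [y_hat] + lags[:-1]  # the new forecast becomes lag 1
    return np.array(preds)
```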

 
Dr.Trader:

Each indicator carries some additional information, and all of it is useful, not just some one percent. For RSI there is no "buy at >0.99, sell at <0.01" strategy; that was an unfortunate example.

You've got to be kidding me. I wrote that I was exaggerating (simplifying things to the limit), and I wrote it twice :) Or would it be better if I gave a real example with 135 rules for the system? Even though one rule is more than enough to explain what I wanted.
 

To continue the topic about selection

I have this question: we have a predictor (one of many) with some range of values; let us split that range into 10 intervals.

divide the predictor into these ranges X1, X2, ..., X10

calculate the importance of each range within the predictor by some means (how exactly does not matter right now)

and get a table of importances (let me remind you that this is all one predictor, divided into sub-predictors):

X1 = 0.5%
X2 = 0.01%
X3 = 0.003%
X4 = 0.0033%
X5 = 0.0013%
X6 = 0.0039%
X7 = 0.0030%
X8 = -0.0000%
X9 = -0.0001%
X10 = -0.00002%

We see that only one range, X1, really has a strong influence; the influence of the rest is either negative or half a step from negative, and it is very doubtful that ranges X2...X7 will show themselves any better on new data...

Question:

which is better: to keep the whole positive range X1...X7, or to keep only the range about which there is no doubt, i.e. only X1?

And once again, a reminder: this is selection within just one predictor. Now what if 200 predictors are cleaned this way? With which variant will the algorithm recognize new data better?

Has anyone thought about this?

 
mytarmailS:

To continue the topic about selection: which is better, to keep the whole positive range X1...X7, or only the range about which there is no doubt, i.e. only X1? And if 200 predictors are cleaned this way, with which variant will the algorithm recognize new data better?

You can try it. Removing the tails of the distributions sometimes helps.
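A sketch of the experiment from the question above (the quantile bin edges and the use of permutation importance are my choices; the poster's measure is not specified): split one predictor into 10 ranges, turn each range into its own 0/1 sub-predictor, and score every range separately. Permutation importance can go negative, like X8-X10 in the table.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

def range_importance(x, y, n_bins=10):
    # One categorical bin per quantile range of the predictor.
    bins = pd.qcut(x, q=n_bins, labels=[f"X{i+1}" for i in range(n_bins)])
    dummies = pd.get_dummies(bins)  # one 0/1 sub-predictor per range
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(dummies, y)
    r = permutation_importance(rf, dummies, y, n_repeats=10, random_state=0)
    return pd.Series(r.importances_mean, index=dummies.columns)
```

Keeping only the undoubted bin (X1) versus the whole positive span (X1...X7) can then be compared directly on held-out data, which is the only honest way to answer the question.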