Machine learning in trading: theory, models, practice and algo-trading - page 453

 

The thread has descended into reading coffee grounds; you could at least have involved a science, the one called astrology.


Why do this nonsense and then blame everything on the model and its inputs? Almost a hundred pages of this thread were spent on the point that you should take only those predictors that actually affect the target variable. I always do data mining first, and I never get models with errors above 40%. Admittedly, I struggle to get below 30%. But I have never had anything as outrageous as 50%.

 
SanSanych Fomenko:

The thread has descended into reading coffee grounds; you could at least have involved a science, the one called astrology.


Why do this nonsense and then blame everything on the model and its inputs? Almost a hundred pages of this thread were spent on the point that you should take only those predictors that actually affect the target variable. I always do data mining first, and I never get models with errors above 40%. Admittedly, I struggle to get below 30%. But I have never had anything as outrageous as 50%.

Because with you it's "horses and people all mixed together": features, targets, ZZ... Had you predicted candle color or returns at such frequencies (>5 min), you would have got about the same.

 
Dr. Trader:

An experiment: what if we take gbpusd, usdchf, usdrub and other popular symbols and use them to predict eurusd?

Attached are two tables, train.csv and test.csv. The target is the eurusd m5 price change over the next bar, and the predictors are audusdOpen[0]-audusdOpen[1], audusdOpen[2]-audusdOpen[3], audusdOpen[3]-audusdOpen[4], eurusdOpen[0]-eurusdOpen[1], eurusdOpen[1]-eurusdOpen[2], etc. There are 12 symbols in total, and for each of them the increments over the previous 3 bars of history are taken. The column names should make the rest clear.
The training table has 10,000 rows, which is about 7 weeks.
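A minimal Python sketch of how a table like this could be assembled, assuming the open prices of all symbols are already aligned on the same m5 bars and stored oldest first. The names `build_dataset` and `opens` are hypothetical, and so is the exact target definition, open[t+1] - open[t], since the post only says "change over the next bar". The sketch uses three consecutive increments per symbol, [0]-[1] through [2]-[3], with MetaTrader-style indexing where [0] is the current bar.

```python
from typing import Dict, List, Tuple


def build_dataset(
    opens: Dict[str, List[float]],  # hypothetical input: symbol -> m5 opens, oldest first
    target_symbol: str = "eurusd",
    n_increments: int = 3,
) -> Tuple[List[str], List[List[float]], List[float]]:
    """Return (column names, feature rows, targets).

    Feature "<sym>Open[k]-<sym>Open[k+1]" at bar t is opens[sym][t-k] - opens[sym][t-k-1].
    Target (an assumption) is opens[target_symbol][t+1] - opens[target_symbol][t].
    """
    symbols = sorted(opens)
    length = len(opens[target_symbol])
    if any(len(opens[s]) != length for s in symbols):
        raise ValueError("all symbols must be aligned to the same bars")

    columns = [f"{s}Open[{k}]-{s}Open[{k + 1}]" for s in symbols for k in range(n_increments)]
    rows: List[List[float]] = []
    targets: List[float] = []
    # Start late enough to have n_increments of history, stop one bar early for the target.
    for t in range(n_increments, length - 1):
        rows.append([opens[s][t - k] - opens[s][t - k - 1] for s in symbols for k in range(n_increments)])
        targets.append(opens[target_symbol][t + 1] - opens[target_symbol][t])
    return columns, rows, targets
```

Because neighbouring rows share history, shuffling before the split would leak future bars into training; taking the first rows as the training table and the later ones as the test table, as a chronological split, avoids that.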

I trained one model and got r^2 = 0.0006164161 on the training data, and if I round the target and the predictions to the classes -1 and 1, the accuracy is 0.5052. That's pretty bad. But taking dozens of bars for each training example, and dozens of symbols on top of that, is simply unrealistic: with hundreds of columns my model would take weeks to train.
On the test set the results dropped: r^2 = -0.003390913 and accuracy 0.4907. Random it was, and random it remains.
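For reference, here is a small Python sketch of the two numbers quoted above: r^2 and the accuracy after mapping values to the classes -1 and 1. The function names are hypothetical, and the handling of an exact zero is an assumption, since the post does not say how ties were rounded. The last helper is a rough yardstick for how far an accuracy sits from a coin flip, assuming independent bars and a 50/50 base rate.

```python
import math
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination 1 - SS_res / SS_tot.

    0 means no better than always predicting the mean of y_true;
    negative values (as on the test set above) mean worse than that.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def to_class(value: float) -> int:
    # Assumption: an exact 0 is mapped to -1; the post does not specify tie handling.
    return 1 if value > 0 else -1


def sign_accuracy(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Share of bars where predicted and actual direction agree after mapping to -1/+1."""
    hits = sum(to_class(t) == to_class(p) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def coin_flip_z(accuracy: float, n: int) -> float:
    """Distance of an accuracy from 0.5 in standard errors of sqrt(0.25 / n).

    Rough yardstick only: assumes independent bars and a balanced target.
    """
    return (accuracy - 0.5) / math.sqrt(0.25 / n)
```

With n = 10,000 one standard error is 0.005, so `coin_flip_z(0.5052, 10000)` is about 1.04: an accuracy that close to 0.5 is hard to tell apart from a coin flip, and measured in-sample it says even less.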

But this is all boring and inconclusive.
What was interesting was the weight the model gave to each predictor (the higher the weight, the better):


Conclusion: to predict the direction of eurusd on the next m5 bar, it is better to rely first of all on audusd, usdrub and usdsgd.

Yes, the result is poor, but it is honest: the tester will show equity consistent with it, not a 30% error on the forward test alongside a Sharpe ratio of ±0.5, when with that error rate it ought to be around 10)))

Your features are no good at all. At a minimum, use for each instrument several past returns over exponentially increasing windows (1, 2, 5, 10, 30, 60, ...), and it would be better to use minute bars.
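The suggestion above can be sketched in Python as follows, assuming minute closes per instrument that are already aligned on the same timestamps and stored oldest first. The function name, the choice of log returns and the `<symbol>_ret<w>` column naming are illustrative assumptions; the windows are the ones listed in the post, and the open-ended "..." is left to the `windows` parameter.

```python
import math
from typing import Dict, List, Sequence

WINDOWS = (1, 2, 5, 10, 30, 60)  # window lengths in bars, as listed in the post


def multi_window_returns(
    closes: Dict[str, Sequence[float]],  # hypothetical input: symbol -> minute closes, oldest first
    windows: Sequence[int] = WINDOWS,
) -> Dict[str, List[float]]:
    """Feature "<sym>_ret<w>" at bar t is log(close[t] / close[t - w]).

    Rows start at bar max(windows) so that every window has enough history,
    which keeps all feature columns the same length.
    """
    lengths = {len(series) for series in closes.values()}
    if len(lengths) != 1:
        raise ValueError("all instruments must be aligned to the same bars")
    start = max(windows)
    features: Dict[str, List[float]] = {}
    for sym, series in sorted(closes.items()):
        for w in windows:
            features[f"{sym}_ret{w}"] = [
                math.log(series[t] / series[t - w]) for t in range(start, len(series))
            ]
    return features
```

Growing the window roughly geometrically covers an hour of history with only six columns per instrument (72 columns for 12 symbols), instead of dozens of raw bars per symbol.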

 

To be honest, I started thinking the same about Yuri Reshetov a long time ago. He once said, "I'm leaving here," and I was so surprised that at first I thought he might have joined some secret organization, you never know... Then his site stopped working, and so on. It's a pity if that's the case; may he rest in peace...

In fact, the seriousness of his work is undeniable..... But it seems to me that he stopped just short of finishing it..... I'm thinking of taking his method apart in more detail and bolting something onto it... well, we'll see.....

 
toxic:

Because with you it's "horses and people all mixed together": features, targets, ZZ... Had you predicted candle color or returns at such frequencies (>5 min), you would have got about the same.


Nothing is mixed up on my side: this is the main problem in data mining and where most of the labor goes.... Whereas for you it's just intellectual amusement.

 
SanSanych Fomenko:

Nothing is mixed up on my side: this is the main problem in data mining and where most of the labor goes.... Whereas for you it's just intellectual amusement.

It's just that with HFT predictors everything is fine for me, and I posted the dataset; but at 10 min and above there is nothing at all in the prices themselves. You need other data: macro, news, etc. The price alone carries nothing, the proverbial market efficiency.

 
toxic:

It's just that with HFT predictors everything is fine for me, and I posted the dataset; but at 10 min and above there is nothing at all in the prices themselves. You need other data: macro, news, etc. The price alone carries nothing, the proverbial market efficiency.

I'm rather inclined to agree with you. But what about people who open trades on signals such as TA and assure me that they win regularly and with pleasure?

There are two possibilities: 1. it's wishful thinking on their part and all just idle talk, or 2. above 10 min there is still something with predictive value.

 
I'm not sure:

It's just that with HFT predictors everything is fine for me, and I posted the dataset; but at 10 minutes and above there is nothing at all in the prices themselves. You need other data: macro, news, etc. The price alone carries nothing, the proverbial market efficiency.

And do you actually manage to trade HFT? If it's not a secret, of course, and "honestly"...

 
Vizard_:

Don't get spooked, I've been laughing at him and Vova for a long time, although Mishka has outdone everyone)))
Don't reveal your inputs and don't discuss them, let them figure it out themselves)))


So... there we go again..... You can't get anyone to talk, you can't get anyone to say anything useful.... I can see you just laughing from around the corner.... What use are you?

Probably as much as I am: none.... But at least I'm funny.)

 
Vizard_:

Sorry, Teacher))))))


Okay... I'm not mad at you..... I'm just curious, you know, purely theoretically..... Just for the sake of an experiment: I'll post my dataset again; it covers 3 futures and almost 9 months of data. You build a model and give your verdict. Ideally I'd like to run your model on my own computer, but I won't insist..... Just curious....

So, what do you say? Shall I post it?