Machine learning in trading: theory, models, practice and algo-trading - page 92

 
Vizard_:
Okay)))) But read the conditions carefully:
"post results in % (successfully predicted cases) for both samples (train = xx%, test = xx%). Methods and models don't need to be announced, only numbers".
We are waiting for more results. It will be interesting to see what conclusions Mihail Marchukajtes comes to.

My result (if you want, I'll describe the method as well):

# predict with best models
# (column 1 = predicted class, column 2 = observed class)
glm_predict_train <- as.data.frame(predict(glm_obj,
                                           newx = training,
                                           type = "class",
                                           s = best_models$bestTune$lambda))
glm_predict_train$observed <- train_y

# confusion matrix: rows = observed, columns = predicted
table(glm_predict_train[, 2], glm_predict_train[, 1])
# the same matrix as a share of all training rows
table(glm_predict_train[, 2], glm_predict_train[, 1]) / nrow(training)

# validate with best models
glm_predict_validate <- as.data.frame(predict(glm_obj,
                                              newx = validating,
                                              type = "class",
                                              s = best_models$bestTune$lambda))
glm_predict_validate$observed <- validate_y

table(glm_predict_validate[, 2], glm_predict_validate[, 1])
table(glm_predict_validate[, 2], glm_predict_validate[, 1]) / nrow(validating)

56% on training:

> table(glm_predict_train[, 2], glm_predict_train[, 1])

       down  up
  down  333 181
  up    256 230

> table(glm_predict_train[, 2], glm_predict_train[, 1]) / nrow(training)

        down    up
  down 0.333 0.181
  up   0.256 0.230

52% on the test:

> table(glm_predict_validate[, 2], glm_predict_validate[, 1])

       down  up
  down  332 173
  up    309 186

> table(glm_predict_validate[, 2], glm_predict_validate[, 1]) / nrow(validating)

        down    up
  down 0.332 0.173
  up   0.309 0.186
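
The two headline percentages can be reproduced from the confusion matrices: accuracy is the share of cases on the diagonal, where the observed class equals the predicted class. A minimal sketch in R, with the counts retyped from the output above; the names cm_train, cm_test and accuracy are introduced here only for illustration:

```r
# Confusion matrices retyped from the output above
# (rows = observed, columns = predicted).
labels <- list(observed = c("down", "up"), predicted = c("down", "up"))
cm_train <- matrix(c(333, 181,
                     256, 230), nrow = 2, byrow = TRUE, dimnames = labels)
cm_test  <- matrix(c(332, 173,
                     309, 186), nrow = 2, byrow = TRUE, dimnames = labels)

# Accuracy = correctly predicted cases / all cases.
accuracy <- function(cm) sum(diag(cm)) / sum(cm)

accuracy(cm_train)  # 0.563, i.e. the "56% on training"
accuracy(cm_test)   # 0.518, i.e. the "52% on the test"
```

Both matrices lean towards predicting "down" (589 of 1000 predictions on training, 641 of 1000 on the test), so the raw accuracy alone says little about how well "up" is recognised.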

 
Well, in general, the maximum level of generalization on this data is about 53%, so..... Occasionally "Garbage in, Garbage Out" appears, i.e. garbage.... I have no way to check it on the test sample, because Excel does not support long formulas. I don't want to code it in MQL: if the model were of decent quality I'd try, but with such a generalization percentage I think the result on the test will not be very good...
 
mytarmailS:
I don't understand how this "predictability" is calculated, or whether it makes any sense if the target is not taken into account.

It uses some formulas to estimate how noisy a signal is or, conversely, how structured it is. How and what those formulas calculate only the author knows; we can only trust that he knows what he is doing.
The point is quite simple: if the predictors themselves are not "noise", it is easier to predict something from them. And if they are processed in some way, you can get an even more stable signal. A stable signal is a good basis for a forecast.

You can even quickly evaluate predictors yourself with the package's Omega() function: pass it the values of one particular predictor (one column of the training table). A result of 0% means noise and the predictor is useless. 100% means everything is fine and the predictor can be used.
I suppose we should feed the function not the raw indicator values but their increments, e.g. for a moving average: c(MA[0]-MA[1], MA[1]-MA[2], MA[2]-MA[3], etc.).
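
A minimal sketch of that increment transformation in base R. The price series is synthetic and the helper sma is a name made up for this example; the call to Omega() itself is left out because its exact signature is not given in this thread:

```r
# Synthetic price series, for illustration only.
set.seed(1)
price <- cumsum(rnorm(200)) + 100

# Trailing simple moving average over n bars, base R only.
# The first n - 1 values are NA because the window is not yet full.
sma <- function(x, n) as.numeric(stats::filter(x, rep(1 / n, n), sides = 1))

ma <- sma(price, 10)

# Increments MA[t] - MA[t-1] in chronological order.
# In MQL-style indexing (0 = newest bar) these are MA[0]-MA[1], MA[1]-MA[2], ...
ma_increments <- diff(ma)
ma_increments <- ma_increments[!is.na(ma_increments)]

head(ma_increments)
```

The resulting vector, rather than ma itself, is what would be handed to the scoring function as one predictor column.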

As for the target values: yes, the package does not use them. This package can't predict anything. It only determines, in some way, which predictors can be trusted and which can't, and builds some new ones from them. Choosing the target variable and training the predictive model has to be done separately. It is logical that some target variables will be predicted better and some worse.
The target variable is a problem for any package. There is no guarantee that the chosen target can be predicted at all from the available predictors. For example, I can use either "price increase/decrease on the next bar" or "zigzag increase/decrease" as the target variable. I would like to learn how to create new target variables so that they best fit the available predictors. Who knows, maybe my predictors could predict a flat perfectly, but I'll never know because I haven't tried.
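
As an illustration of the first of those targets, here is a small sketch that labels each bar by whether the price rises on the next bar. The closing prices are invented, and counting an unchanged price as "down" is an arbitrary choice made for this example:

```r
# Invented closing prices in chronological order.
close <- c(1.10, 1.12, 1.11, 1.11, 1.15, 1.14)

# Change from bar t to bar t + 1.
next_change <- diff(close)

# Target for bar t: "up" if the next bar closes higher, otherwise "down"
# (assumption of this sketch: no change counts as "down").
target <- factor(ifelse(next_change > 0, "up", "down"),
                 levels = c("down", "up"))

# The last bar has no known future, so it is dropped from the training rows.
training_rows <- data.frame(close = head(close, -1), target = target)
training_rows
```

Swapping in a different definition of target (for example one based on zigzag legs) changes only the labelling step; the predictors stay the same, which makes it cheap to test several targets against one predictor set.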

 
As I understand it, no one reads the conditions ("any data manipulation is allowed"), so I won't torture you. In fact, everything is simple.
You just need to take lags of A6, apply the simple formula "seventh less than fifth" and get 100% on both samples. Thank you all. Good luck...
 
Vizard_:
As I understand it, no one reads the conditions ("any data manipulation is allowed"), so I won't torture you. In fact, everything is simple.
You just need to take lags of A6, apply the simple formula "seventh less than fifth" and get 100% on both samples. Thank you all. Good luck...
So what's the trick? I can also encode an output variable in a pile of input garbage like that. You'd never guess it. I still don't see what the point of this was.
 
Mihail Marchukajtes:
So what's the trick? I can also encode an output variable in a pile of input garbage like that. You'd never guess it. I still don't see what the point of this was.
He was drawing our attention to the fact that the amount of data used is not always sufficient for prediction. For example, by limiting ourselves to 9 bars, we might miss important information from more distant bars. Also, you can't evaluate a predictor without taking into account its interactions with other predictors.
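
The lookback point can be sketched on synthetic data. Everything here is an assumption made for illustration: a6 is random noise standing in for column A6, and the rule "lag 7 is less than lag 5" is only one possible reading of the formula mentioned above, not the confirmed construction of the task:

```r
set.seed(42)
n <- 1000
a6 <- rnorm(n)  # hypothetical stand-in for column A6 (pure noise)

# lag_k(x, k)[t] = x[t - k]; the first k values are NA.
lag_k <- function(x, k) c(rep(NA, k), head(x, -k))

# ASSUMED rule: the target is TRUE when lag 7 is smaller than lag 5.
target <- lag_k(a6, 7) < lag_k(a6, 5)
ok <- !is.na(target)

# Lookback limited to 4 bars: a crude sign rule on each single lag.
# These lags are independent of the target, so accuracy stays near chance.
acc_short <- sapply(1:4, function(k) mean((lag_k(a6, k) > 0)[ok] == target[ok]))

# Lookback that reaches lags 5 and 7: the rule is recovered exactly
# (by construction, since the target was built from these two lags).
acc_full <- mean((lag_k(a6, 7) < lag_k(a6, 5))[ok] == target[ok])

round(acc_short, 3)
acc_full
```

A model restricted to the first four lags has nothing to learn here, while one that sees lags 5 and 7 together can be perfect, even though the input is noise.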
 
Mihail Marchukajtes:
So what's the trick? I can also encode an output variable in a pile of input garbage like that. You'd never guess it. I still don't see what the point of this was.
The point is that you can't see the gopher, but it's there.))
Come on, let's try it. Just something simple.
 
Vizard_:
The point is that you can't see the gopher, but it's there.))
Come on, let's try it. Just something simple.
Okay, I'll think of something....
 
Well, for example, this file. It is purely a training set, no need to do any test. Reshetov's optimizer shows garbage, or 56%, but the gopher is there too. Who can find it....??? I really don't see the point of these games: when the output is made by transforming the input, you don't even need a network.... so that's that....
 
Mihail Marchukajtes:
Well, for example, this file. It is purely a training set, no need to do any test. Reshetov's optimizer shows garbage, or 56%, but the gopher is there too. Who can find it....??? I really don't see the point of these games: when the output is made by transforming the input, you don't even need a network.... so that's that....
Damn, it's really annoying that this forum won't let you attach a simple rar archive.... Pathetic, in a word....