Discussion of article "Evaluation and selection of variables for machine learning models" - page 2


JulInParis:

Hi Vladimir,

Forgive me for this silly question, but I'm currently trying to build my own (very simple) model starting from your nice example, and I'm wondering why you're shifting the ZZ's differences forward in the ZZ function:

 dz <- zz %>% diff %>% c(0,.)

...I mean, after all, we want to train a model to predict the FUTURE value of the Zigzag, so what's the point of training a model on predictors (technical indicators) that summarize the market quotes at the end of day N, with a target that is the sign of the difference between the Zigzag value on day N and its value on day N-1 (this is what you get after shifting)? Shouldn't we use the sign of the difference between the Zigzag's value on day N+1 and its value on day N instead (i.e. we wouldn't have to shift)?

I know I must have missed something obvious in your methodology, but if you could take five minutes to make this clear to me, I'd be very pleased.

Best regards.


You are right: there is a typo in the article. It should be as follows:

1. Calculate the inputs:

 x <- In(p = 16)

2. Calculate the target:

 out1 <- ZZ(ch = 25)

 > head(out1)
          zz sig
 [1,] 84.213   0
 [2,] 84.199  -1
 [3,] 84.185  -1
 [4,] 84.171  -1
 [5,] 84.157  -1
 [6,] 84.143  -1
 > tail(out1)
              zz sig
 [4995,] 89.3965   0
 [4996,] 89.3965   0
 [4997,] 89.3965   0
 [4998,] 89.3965   0
 [4999,] 89.3965   0
 [5000,] 89.3965   0
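The snippet quoted in the question (dz <- zz %>% diff %>% c(0,.)) pads the differences with a leading 0, so row N holds the move from bar N-1 to bar N. A minimal base-R sketch using the six zz values printed by head(out1) shows that sign(dz) reproduces the sig column, and that the forward-looking target asked about (N to N+1) is simply the same series moved up one row. It assumes sig is computed as sign(dz); that is consistent with the printed output, but the body of ZZ is not shown in this thread.

```r
# zz values copied from head(out1) above
zz <- c(84.213, 84.199, 84.185, 84.171, 84.157, 84.143)

# Backward difference: row N holds zz[N] - zz[N-1]; row 1 is padded with 0
dz  <- c(0, diff(zz))
sig <- sign(dz)            # 0 -1 -1 -1 -1 -1, matching the sig column

# Forward-looking direction for row N (zz[N+1] - zz[N]): the same series
# moved up one row; the last row has no "next" bar, so it gets NA
sig_next <- c(sig[-1], NA)
sig_next                   # -1 -1 -1 -1 -1 NA
```

This is why the corrected step 3 keeps the backward sig and shifts it with lead(): the predictors of bar N are then paired with the direction of the following bar.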

3. Combine x and out1 into data, as follows:

  • Delete the examples where sig == 0
  • Create a new variable Class (factor)
  • Shift the Class variable 1 bar into the "future"
  • Remove the variable sig from the set

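 library(dplyr)  # %>%, as_tibble(), filter(), mutate(), lead(), select()
 data <- cbind(x, sig = out1[ , 2]) %>% as_tibble() %>%
   dplyr::filter(sig != 0) %>%                      # delete the bars where sig == 0
   mutate(Class = factor(sig, ordered = FALSE) %>%  # Class as a factor ...
            dplyr::lead()) %>%                      # ... shifted 1 row into the "future"
   dplyr::select(-sig) %>%                          # sig is no longer needed
   na.omit()                                        # drop rows with NA (incl. the last row, where lead() gives NA)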

 > data %>% str()
 Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 4944 obs. of 18 variables:
  $ DX    : num  0.355 0.541 6.324 3.026 9.511 ...
  $ ADX   : num  12 11.3 11 10.5 10.4 ...
  $ oscDX : num  0.303 0.427 5.012 2.459 -8.641 ...
  $ ar    : num  -18.8 -18.8 -18.8 -18.8 -12.5 ...
  $ tr    : num  0.032 0.051 0.037 0.004 0.011 ...
  $ atr   : num  0.0422 0.0432 0.0425 0.038 0.0348 ...
  $ cci   : num  -14.75 20.6 27.23 6.22 -33.27 ...
  $ chv   : num  0.0422 0.03 -0.0439 -0.0456 -0.1172 ...
  $ cmo   : num  -16.3 -20.1 -26.5 -39.2 -40.7 ...
  $ sign  : num  -0.0137 -0.013 -0.0117 -0.0107 -0.0108 ...
  $ vsig  : num  -0.00352 0.00655 0.0132 0.01059 -0.00103 ...
  $ rsi   : num  45.7 49.8 50 46.8 42.4 ...
  $ slowD : num  0.408 0.438 0.447 0.43 0.405 ...
  $ oscK  : num  0.0137 0.039 -0.0116 -0.0427 -0.0322 ...
  $ SMI   : num  -18.2 -16.6 -15.8 -16.2 -17.1 ...
  $ signal: num  -12.8 -13.6 -14 -14.5 -15 ...
  $ vol   : num  0.01005 0.01004 0.00985 0.00975 0.00946 ...
  $ Class : Factor w/ 2 levels "-1","1": 1 1 1 1 1 1 1 1 1 1 ...
  - attr(*, "na.action")=Class 'omit'  Named int [1:34] 1 2 3 4 5 6 7 8 9 10 ...
   .. ..- attr(*, "names")= chr [1:34] "1" "2" "3" "4" ...
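To see what step 3 does to the alignment, here is a base-R sketch on hypothetical toy values (the predictor ind and the sig vector are made up, not taken from the article). It follows the same order as the pipeline: filter sig != 0 first, then shift Class up one row, using c(x[-1], NA) as a stand-in for dplyr::lead(). Because the filter runs before the shift, each row's Class is the sig of the next remaining bar, which is not always the next calendar bar.

```r
# Hypothetical toy data: one predictor (NA in row 1 mimics an indicator
# warm-up period) and a sig column taking values -1, 0, 1
x   <- cbind(ind = c(NA, 0.5, 0.7, 0.2, 0.9, 0.4))
sig <- c(-1, -1, -1, 0, 1, 1)
d   <- data.frame(x, sig = sig)

d   <- d[d$sig != 0, ]                   # delete the examples where sig == 0
cls <- factor(d$sig, ordered = FALSE)    # Class as an unordered factor
# Shift Class one row into the "future": each row gets the sig of the next
# remaining row; the last row has no successor and gets NA
d$Class <- factor(c(as.character(cls[-1]), NA), levels = levels(cls))
d$sig   <- NULL                          # remove sig from the set
d       <- na.omit(d)                    # drop rows containing any NA

# Rows 2, 3 and 5 remain; row 3's Class (1) comes from row 5, because
# row 4 (sig == 0) was filtered out before the shift
d
```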

The rest is as in the article text.

Good luck

Vladimir Perervenko:
I answered you in the next thread.

Hi Vladimir,

I could not find your answer to this question. I am also not sure what the value of Dig should be. Could you please specify? Thank you!

hzmarrou :

Dear all, 

Can someone tell me what the Dig variable defined in the ZZ function means? Is it a constant? If so, what should its value be?

Dig is the number of digits after the decimal point in the quotes. It can be 5 or 3.
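Since Dig is just the number of decimal places in the quotes, it can be read off a quote directly. A small base-R sketch with two hypothetical helpers (digits_after_point() and point_size() are illustrative names, not from the article) counts the decimals in a quote given as a string (a numeric value would lose trailing zeros, e.g. 84.210 prints as 84.21) and converts that to the size of one point, 10^-Dig. How the article's ZZ function uses Dig internally is not shown in this thread, so this only illustrates the definition.

```r
# Hypothetical helper: digits after the decimal point in a quote string
digits_after_point <- function(quote) {
  frac <- sub("^[^.]*\\.?", "", quote)  # strip everything up to and including "."
  nchar(frac)
}

# Hypothetical helper: size of one point (smallest price step) for Dig digits
point_size <- function(dig) 10^(-dig)

dig3 <- digits_after_point("84.213")   # a 3-digit quote: 3
dig5 <- digits_after_point("1.12345")  # a 5-digit quote: 5
point_size(dig3)                       # 0.001
point_size(dig5)                       # 1e-05
```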

Sorry for the late reply; I did not see the question. The discussion is scattered across many threads, and I do not have time to track them all.

My apologies.