Machine learning in trading: theory, models, practice and algo-trading - page 20

 
An interesting article, "Прогнозирование финансовых временных рядов" (Forecasting financial time series): https://geektimes.ru/post/144405/ - maybe someone will figure out how to reproduce this in R.
 

I started experimenting with clusters, following the motives I voiced earlier. I ran into a problem: when I tried to glue the price back together from the pieces that correspond to one cluster, there were gaps in the price at the glue points (which is logical, but it hadn't occurred to me). The question is how to eliminate these gaps.

#  some kind of simulated price
dat <- cumsum(rnorm(1000))+1000
plot(dat,t="l")


#  Hankel-matrix function to imitate a sliding window
hankel<- function(data, r=10) {
  do.call(cbind,
          lapply(0:(r-1),function(i) { data[(i+1):(length(data)-(r-1-i))]}))}
#  build the analogue of a sliding window with depth 50
glubina <- 50
D <- hankel(dat,r = glubina)


#  scale and centre each row, in other words normalise the data
DC <- t(  apply(D, 1,    function(x) {  scale(x,T,T)  }    ))


library(SOMbrero)
#  train a Kohonen network (SOM) on the data to get the clusters
TS <- trainSOM(DC,  dimension=c(3,3))

#  the last column of the matrix is our price vector without the first values
dt <- D[,glubina] 
#  the resulting clusters
cl <- TS$clustering

#  plot of the price and plot of the clusters of this price
par(mfrow=c(2,1))
plot(dt,t="l")
plot(cl,t="l")
par(mfrow=c(1,1))


#  try to look at the glued chart of just one cluster
one_clust <- dt[cl==3]
#  the plot has gaps at the glue points
plot(one_clust,t="l")
 
Dr.Trader:

I have this problem too. Usually it's enough to run attributes(KZP) to get a list of the available variables and then just go through them, for example KZP$window, and find the numbers you need. But here these numbers are generated inside the summary function itself and aren't stored anywhere.

Here is the source code: https://cran.r-project.org/web/packages/kza/index.html - you need to do something similar.

By the way, it may also be useful for you if you use indicators: this function detects periodic components in very noisy data - a dominant period, in short - and this period is constantly changing in the market.

The idea is to constantly identify this dominant period and adjust the indicators to it, rather than just using fixed parameters like everybody else does. I tested this approach very superficially on random data, and the result was positive compared to the usual approach: I took the RSI indicator - with fixed parameters it lost, with adaptive ones it earned. I think that if you are interested you can use it; I'll be very interested to read the results of your research.
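
Roughly, the adaptive-parameter idea could look like this (just a sketch of my own, not the exact code: the dominant period is taken here from base R's smoothed periodogram spec.pgram() instead of the kza functions discussed above, RSI comes from the TTR package, and all the numbers are arbitrary):

library(TTR)
#  simulated price, as in the examples above
price <- cumsum(rnorm(500)) + 100
rets  <- diff(log(price))
#  smoothed periodogram; the strongest spectral component gives the dominant period
sp        <- spec.pgram(rets, spans = c(5, 5), plot = FALSE)
domPeriod <- round(1 / sp$freq[which.max(sp$spec)])
domPeriod <- max(2, min(domPeriod, 100))      #  keep the period in a sane range
#  RSI with the adaptive period vs the usual fixed-parameter RSI
rsiAdaptive <- RSI(price, n = domPeriod)
rsiFixed    <- RSI(price, n = 14)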

 
mytarmailS:

I started experimenting with clusters, following the motives I voiced earlier. I ran into a problem: when I tried to glue the price back together from the pieces that correspond to one cluster, there were gaps in the price at the glue points (which is logical, but it hadn't occurred to me). The question is how to fix these gaps.

You take the logarithm of the price series, then convert it to a series of differences. For the gluing, you keep in the resulting series only those intervals that correspond to the found condition. Then you build a new series from it by summation. And, if you want, you exponentiate it back afterwards.
 
Anton Zverev:
You take the logarithm of the price series, then convert it to a series of differences. For the gluing, you keep in the resulting series only those intervals that correspond to the found condition. Then you build a new series from it by summation. And, if you want, you exponentiate it back afterwards.
Thanks, that's roughly how I imagined it; I'll give it a try.
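
For example, continuing the SOM code above, the gluing could look like this (just a sketch; it assumes the objects dt and cl from that example and cluster number 3):

#  work in log-returns, keep only the returns of one cluster, then sum them back up
dldt   <- diff(log(dt))          #  log-returns of the price vector dt
ret3   <- dldt[cl[-1] == 3]      #  returns whose ending bar falls into cluster 3
glued3 <- cumsum(ret3)           #  continuous "glued" series, no jumps at the joins
plot(glued3, t = "l")
#  if needed, exp(glued3) brings it back to a price-like scale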
 

I found out a couple more interesting things:

The predictor sifting function posted earlier (designTreatmentsN$scoreFrame) clearly doesn't give a final set of predictors. It doesn't even remove 100%-correlated predictors, plus it may remove something you want to keep, plus it may leave garbage in. I've complicated their sifting method a bit: first I screen out predictors the old way, through designTreatmentsN$scoreFrame (I doubled the threshold to 2/N, to sift out fewer potentially good predictors). Then I remove all predictors that correlate with each other above 0.99 (I have a lot of those after randomly generating deltas and sums; here I look at the correlation of the predictors with each other, not with the target variable). Then I use a genetic algorithm to find the best set among the remaining predictors.
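
The correlation-filter step could look roughly like this (a sketch; `predictors` here stands for a data.frame of candidates that already passed the scoreFrame screening, the name is mine):

library(caret)
#  drop predictors that correlate with each other above 0.99
corMat   <- cor(predictors)
tooHigh  <- findCorrelation(corMat, cutoff = 0.99)     #  indices of columns to remove
filtered <- if (length(tooHigh) > 0) predictors[ , -tooHigh, drop = FALSE] else predictors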

I'm also trying something else from the principal components article. The function below builds a principal component model and returns R^2 for the case when all the found principal components are used. It doesn't do the "Y-scale" like the other example from the article, but this way is faster. I'm using it now to evaluate the predictor set (probably for nothing, I don't know yet :) ). The last column of the srcTable parameter is the target variable. There may be errors if a predictor has too few values; the function may not cope with some data.

library('caret')
GetPCrsquared <- function(srcTable){
        targetName <- tail(colnames(srcTable),1)
        origVars <- setdiff(colnames(srcTable), targetName)
        # one could try variations here, e.g. adding/removing non-linear steps such as "YeoJohnson"
        prep <- preProcess(srcTable[,origVars], method = c("zv", "nzv", "center", "scale", "pca"))
        prepared <- predict(prep,newdata=srcTable[,origVars])
        newVars <- colnames(prepared)
        prepared[[targetName]] <- srcTable[[targetName]]   # attach the target column under its own name
        modelB <- lm(paste(targetName, paste(newVars,collapse=' + '),sep=' ~ '),data=prepared)
        return(summary(modelB)$r.squared)
}

While my R^2 used to be about 0.1, I have now reached 0.3. That is still not enough; they recommend at least 0.95. There is also a strange thing: with R^2 = 0.1 my error in the fronttest was 37%, and with R^2 = 0.3 this error grew to 45%. Maybe the problem is that I added more bars and more indicators to the set of predictors. One step forward and two steps back; now I need to somehow analyze the whole set of indicators and throw out the unnecessary ones. Or maybe the principal components model is simply not applicable to forex (it is hard to verify: first I need to reach R^2 > 0.95 and see what the result in the fronttest will be; with an under-fitted model it is too early to draw conclusions).

I also compared the GA package (genetic algorithm) and GenSA (generalized simulated annealing, from Alexei's example). Both packages reached the same result. GA can work multi-threaded, so it won on time, but GenSA seems to win on a single thread. There is also a trick for caching results, which I think Alexei will appreciate:

fitness_GenSA_bin <- function(selectionForInputs){
        testPredictorNames <- predictorNames[ which(selectionForInputs == TRUE) ]
        # the actual fitness calculation for this set of predictors goes here
}

library(memoise)
# memoised wrapper: results for predictor sets that were already evaluated are taken from the cache
fitness_GenSA_bin_Memoise <- memoise(fitness_GenSA_bin)

fitness_GenSA <- function(selectionForInputs){
        # GenSA searches over continuous values in [0;1], so round them to a binary selection first
        selectionForInputs[selectionForInputs>=0.5] <- TRUE
        selectionForInputs[selectionForInputs<0.5] <- FALSE
        return(fitness_GenSA_bin_Memoise(selectionForInputs))
}

library(GenSA, quietly=TRUE)
GENSA <- GenSA(fn = fitness_GenSA,
                                lower = rep(0, length(predictorNames)),
                                upper = rep(1, length(predictorNames)),
                                control=list(smooth=FALSE, verbose=TRUE)
                                ) 

The point is that the intermediate function fitness_GenSA_bin_Memoise takes the result from the cache if such a set of predictors has already been encountered at least once. fitness_GenSA_bin should contain the fitness calculations themselves and, ideally, will be called only once for each unique set.
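
For comparison, the same binary selection with the GA package could look roughly like this (a sketch only; ga() maximises, so the fitness is negated, and popSize/maxiter are arbitrary):

library(GA)
#  the population is evaluated in parallel, which is what makes the GA package faster here
gaResult <- ga(type     = "binary",
               fitness  = function(bits) -fitness_GenSA_bin_Memoise(bits),   #  ga() maximises
               nBits    = length(predictorNames),
               popSize  = 50, maxiter = 100,
               parallel = TRUE)
bestSet <- predictorNames[gaResult@solution[1, ] == 1]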

mytarmailS:

By the way, it may also be useful for you if you use indicators: this function identifies periodic components in very noisy data - a dominant period, in short - and this period is constantly changing in the market.

The idea is to constantly identify this dominant period and adjust the indicators to it, rather than just using fixed parameters like everybody else does. I tested this approach very superficially on random data, and the result was positive compared to the usual approach: I took the RSI indicator - with fixed parameters it lost, with adaptive ones it earned. If you are interested you can use it; I'll be very interested to read the results of your research.

For now I'm just using standard parameters for the indicators, though I think they should be adjusted - in which case your function will be very useful. My method is not very good at transferring results from D1 to H1: the more indicators I use, the more I get stuck on D1. It turns out that indicator parameters need to be changed depending on the timeframe, and on the time as well, yes.

 
Dr.Trader:

I found out a couple more interesting things:

The predictor sifting function posted earlier (designTreatmentsN$scoreFrame) clearly doesn't give a final set of predictors. It doesn't even remove 100%-correlated predictors, plus it may remove something you want to keep, plus it may leave garbage in. I've complicated their sifting method a bit: first I screen out predictors the old way, through designTreatmentsN$scoreFrame (I doubled the threshold to 2/N, to sift out fewer potentially good predictors). Then I remove all predictors that correlate with each other above 0.99 (I have a lot of those after randomly generating deltas and sums; here I look at the correlation of the predictors with each other, not with the target variable). Then I use a genetic algorithm to find the best set among the remaining predictors.

I'm also trying something else from the principal components article. The function below builds a principal component model and returns R^2 for the case when all the found principal components are used. It doesn't do the "Y-scale" like the other example from the article, but this way is faster. I'm using it now to evaluate the predictor set (probably for nothing, I don't know yet :) ). The last column of the srcTable parameter is the target variable. There may be errors if a predictor has too few values; the function may not cope with some data.

While my R^2 used to be about 0.1, I have now reached 0.3. That is still not enough; they recommend at least 0.95. There is also a strange thing: with R^2 = 0.1 my error in the fronttest was 37%, and with R^2 = 0.3 this error grew to 45%. Maybe the problem is that I added more bars and more indicators to the set of predictors. One step forward and two steps back; now I need to somehow analyze the whole set of indicators and throw out the unnecessary ones. Or maybe the principal components model is simply not applicable to forex (it is hard to verify: first I need to reach R^2 > 0.95 and see what the result in the fronttest will be; with an under-fitted model it is too early to draw conclusions).

I also compared the GA package (genetic algorithm) and GenSA (generalized simulated annealing, from Alexei's example). Both packages reached the same result. GA can work multi-threaded, so it won on time, but GenSA seems to win on a single thread. There is also a trick for caching results, which I think Alexei will appreciate:

The point is that the intermediate function fitness_GenSA_bin_Memoise takes the result from the cache if such a set of predictors has already been encountered at least once. fitness_GenSA_bin should contain the fitness calculations themselves and, ideally, will be called only once for each unique set.

So far I just use the standard parameters for the indicators. I think they should be adjusted; I agree with that, and it seems to me that the indicators were developed mainly for D1 stock trading, so only on that timeframe will they work well with standard parameters. My method is not very good at transferring results from D1 to H1: the more indicators I use, the more I get stuck on D1. It turns out that indicator parameters should be changed depending on the timeframe, and on the time as well, yes.

The thing about caching is cool, I tried to write something like that myself. So there is a ready-made solution? Great, thank you.

And what fitness function do you use in the brute-force search? I missed that. A linear dependence, or some kind of statistic?
 
Dr.Trader:

There's no "Y-scale" like another example from the article,

I think that's what the article is all about. I reread it, and it says so explicitly.

If we go back to the general scheme of preparing the list of predictors:

Screening out noise is only part of the problem; it does not address the other problems and recommendations in this area.

If all of that is done, then the working algorithm runs in a loop, namely:

1. Take the selected and processed set of predictors. This list is constant.

2. For the current window, select predictors with one of the algorithms. For example, there are two such algorithms in caret (a sketch follows after this post).

3. Fit the model.

4. Do some trading.

5. Shift the window and go to step 2.

From those initial sets of predictors that I got, the number of noise predictors was more than 80!

Then the standard algorithm selects about half of the remaining predictors as ones it does not classify as noise. As the window moves, the composition of this half changes constantly. But the error on the training set and the error out of sample are always about equal. From this I conclude that my model is not overtrained, and that this is a consequence of the predictor selection algorithms described above.
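
A rough sketch of how such a loop might look in R (my illustration only; caret's rfe is used for step 2, and the data.frame name `dataset`, the window size and the step are assumptions - the last column `target` is taken to be the target variable):

library(caret)
windowSize <- 500
step       <- 100
for (s in seq(1, nrow(dataset) - windowSize, by = step)) {
        win <- dataset[s:(s + windowSize - 1), ]
        #  step 2: select predictors for the current window (recursive feature elimination)
        rfeCtrl <- rfeControl(functions = rfFuncs, method = "cv", number = 5)
        sel  <- rfe(win[ , -ncol(win)], win$target,
                    sizes = c(5, 10, 20), rfeControl = rfeCtrl)
        keep <- predictors(sel)
        #  step 3: fit the model on the selected predictors
        fit  <- train(win[ , keep, drop = FALSE], win$target, method = "rf")
        #  step 4: trade / predict on the bars that follow the window (omitted here)
        #  step 5: the loop itself shifts the window and repeats
}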

 

What fitness function do you use for brute-force?

Previously I trained a forest and returned the error on a validation sample. In principle it worked: if the forest overfits even a little, the error immediately tends towards 50%.
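
A minimal sketch of such a fitness function (it just fills in the fitness_GenSA_bin placeholder from the code above; trainData/validData with a factor column `target` are assumed names):

library(randomForest)
fitness_GenSA_bin <- function(selectionForInputs){
        testPredictorNames <- predictorNames[ which(selectionForInputs == TRUE) ]
        if (length(testPredictorNames) == 0) return(1)     #  empty set -> worst possible score
        #  train a forest on the selected predictors and score it on the validation sample
        model <- randomForest(x = trainData[ , testPredictorNames, drop = FALSE],
                              y = trainData$target, ntree = 100)
        pred  <- predict(model, validData[ , testPredictorNames, drop = FALSE])
        mean(pred != validData$target)                      #  validation error, to be minimised
}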

Now I use GetPCrsquared(), the same code as above. I also have your example from feature_selector_modeller.txt, but I still need to figure it out and extract the piece of code I need, so I haven't tested it on my data yet.

 
SanSanych Fomenko:

I think that's what the article is all about. I reread it, it says it directly.

I tried Y-scale too, R^2 in both cases (with and without Y-scale) came out the same (even though they use different packages!).

I understood that Y-scale may give the same good result with fewer principal components. But if the result is still unsatisfactory even when all the components are used (as it is for me now), then there is no difference, and this way works faster, which is more important for me right now. However, I haven't shown, either in theory or in practice, whether this method is suitable for selecting predictors at all... At first I had the idea of building a principal component model on all the predictors and picking predictors by looking at the component coefficients. But then I noticed that as garbage is added, the R^2 of the model drops. It would be logical to try different sets of predictors and look for those with a higher R^2, but that's still just a theory.
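
If I understood the "Y-scale" idea correctly, it is roughly this (a sketch of my understanding, not the article's code; the names srcTable, origVars, targetName are reused from the function above): each predictor is centred and rescaled by its univariate regression slope against the target before PCA, so that the components are built in "units of Y".

#  y-aware scaling before PCA (sketch)
yAwareScale <- function(x, y){
        b <- coef(lm(y ~ x))[2]          #  univariate slope of the target on this predictor
        (x - mean(x)) * b                #  centred and rescaled into "units of y"
}
scaledPredictors <- as.data.frame(lapply(srcTable[ , origVars], yAwareScale, y = srcTable[[targetName]]))
pcaModel <- prcomp(scaledPredictors, center = FALSE, scale. = FALSE)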
