Machine learning in trading: theory, models, practice and algo-trading - page 9

 
Dr.Trader:

A little addition to the previous post. No, there are no deltas. I'll have to try.

Anyway, I tried to look for a correlation in your data. More likely no than yes. I did find something on one half of randomly selected observations from the training set, but when I checked the other half, the same correlation wasn't there. So either it is absent altogether, or the data are constructed in such a way that no good dependence can be found in them.

Let me try again, though. I will give my feedback.

Alexey

 

I suspect that there is something missing in this data. It's like in your assignment from your first post - if you remove even one of those 6 entries from the sample, the result becomes unpredictable. Forex clearly depends on its own past price, time of day, etc., and this "etc." is exactly what is missing in my data, so the models simply cannot find adequate regularities and describe the logic. The forest apparently cannot try different combinations of "subtract high from low" and take the best of them during optimization, so such things need to be added to the sample itself. I am currently revising my script which saves forex data to csv; I will add a lot of deltas there, plus distances to past tops of the zigzag, as advised. Then I will post a new file with data for experiments.
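
As a rough illustration of the kind of extra columns I have in mind (the data frame and column names below are my own placeholders, not the final script), deltas and lagged differences could be added in R like this:

# Sketch: add simple deltas and lagged close-to-close differences to an OHLC table.
# `prices` is assumed to be a data.frame with columns Open, High, Low, Close.
add_deltas <- function(prices, lags = 1:5) {
  out <- prices
  out$range <- prices$High - prices$Low     # "subtract high from low"
  out$body  <- prices$Close - prices$Open   # candle body
  for (k in lags) {
    # difference between the current close and the close k bars ago
    out[[paste0("dClose_", k)]] <- c(rep(NA, k), diff(prices$Close, lag = k))
  }
  out
}

The distances to past zigzag tops would need the zigzag values exported alongside the prices, so they are not shown in this sketch.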

 
SanSanych Fomenko:

Attached are a number of articles that supposedly solve the problem of clearing the original set of predictors of noise, and with much higher quality. Unfortunately, at the moment I do not have time to try them. Maybe someone will try and post the result?

I was able to repeat the described process. Judging by the results, my set of predictors describes the outcome with something like 0.1% confidence... There's a lot of theory beyond my knowledge; I didn't understand everything.

I attached 3 files. You just need to change the path to the csv file and you can run them. The target should be in the last column of the csv; all the rest are predictors. Do not normalize anything beforehand, feed the data in as is.

1) Principal Components Regression 01. Some code from the introductory part of the article; I think something is missing from it, because I get errors when running it. It should give each input a score and draw a graph, but unfortunately I do not understand how to interpret or apply it.

2) Principal Components Regression 03. Principal Components Regression, Pt. 2: Y-Aware Methods. I skipped the first part of the article, because it says the first part describes another, weaker algorithm.
The code is divided into several parts; you have to run them one after another and look at the graphs drawn in R after each part.

First run: copy and run everything from the file up to the beginning of the second step (the beginning of the second step is marked in bold as STEP 2). The R console will show a table where the smaller an input's value, the better it is; a value of 1 means garbage. There is also a graph where the longer the line, the worse, mirroring the table.

Next, run the code from step two. At the end there will be a chart where the longer the line for an input, the more reliable it is (the opposite of step 1). There is also an examplePruneSig variable in the code that controls screening of inputs by the psig value from the table in step 1. You can set the variable to 1 if you want to see the transformed plausibility of all inputs, because it may happen that an input was rated badly in the first step but improved in the second. It is recommended to take either some threshold value or examplePruneSig = 1/number_of_inputs, but there are no exact instructions.
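
For reference, a minimal sketch of this kind of significance screening with the vtreat package (the data frame `d` and the target column name "y" are placeholders; the article's own code may differ in the details):

library(vtreat)

vars <- setdiff(colnames(d), "y")                 # all columns except the target
plan <- designTreatmentsN(d, vars, "y")           # treatment plan for a numeric target
print(plan$scoreFrame[, c("varName", "sig")])     # the table: smaller sig = better, 1 = noise

threshold <- 1 / length(vars)                     # the 1/(number of inputs) rule of thumb
treated <- prepare(plan, d, pruneSig = threshold) # keeps only inputs passing the threshold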

The third step is Principal Component Analysis (prcomp). This is new to me again, but the point is that this function produces a number of "principal components" (PCs) - something like internal variables on which the desired result depends. Each of these internal variables relies on a different mix of the input data. The task is then to collect the minimal set of such PCs that can reliably determine the result; the resulting sample of predictors consists of the predictors of the PCs that fall into that minimal set.
The article itself doesn't solve this problem; it just takes the first 2 PCs and checks whether that worked or not. But I could have misread or missed something, so you'd better read it yourself if you understand prcomp.
At the end of the step, a diagram of the first 5 PCs and the inputs they use is drawn. The longer the line on the diagram, the more important the input.
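
To make step three a little less of a black box, here is a generic prcomp sketch (the matrix `X` of already prepared numeric predictors is a placeholder) showing where the loadings behind that diagram come from:

princ <- prcomp(X, center = TRUE, scale. = TRUE)

# one row per input, one column per PC; the larger the absolute value,
# the more that input contributes to the PC (the bar length on the diagram)
round(princ$rotation[, 1:5], 4)   # first 5 PCs, assuming there are at least 5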

Steps four, five, six - evaluation of the result on training and test data.


3) Principal Components Regression 04. A function from another package that does the same thing as Principal Components Regression 03, with the bonus that it sifts the PCs and keeps the minimal set of those that can describe the result with 95% accuracy.
But there are no adequate examples or graphs; I guess you have to reuse something from Principal Components Regression 03.


tl;dr:

1) Sifting out noisy inputs. Take the file "Principal Components Regression 03" and run the code only up to the second step (not including it). A table will appear in R; only inputs with a psig value below the threshold should be kept. A value of "1" means noise and randomness, "0" is good. Tentatively, a threshold of (1/number of inputs) is recommended. This method gives no guarantee of a correct selection of inputs; it merely allows truly random and noisy inputs to be removed.

2) A more complex approach. A clever algorithm creates a number of principal components which can be used to calculate the result. A PC is a kind of function that describes some internal process in the model being simulated, and the result of the model itself is a set of interacting PCs. We then take the minimal set of PCs that describe the result with high accuracy and look at the inputs used by these PCs; those inputs are the set we need, without the garbage. The file "Principal Components Regression 04" allows us to get the minimal set of such PCs, but it is not clear what to do with it afterwards - we still need to pull out the predictors they use.
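
On the "pull out the predictors they use" point, one possible (hand-rolled, not from the article) approach is to rank the inputs by their absolute loadings in the retained PCs:

# princ: result of prcomp(); nPC: how many components were kept (both placeholders)
nPC <- 2
loadings <- abs(princ$rotation[, 1:nPC, drop = FALSE])

# importance of each original input = its largest absolute loading across the kept PCs
importance <- sort(apply(loadings, 1, max), decreasing = TRUE)
print(importance)

# keep inputs above some cutoff; the cutoff itself is an arbitrary choice
names(importance)[importance > 0.3]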

Here is the article itself again, and the code for it.

http://www.r-bloggers.com/principal-components-regression-pt-2-y-aware-methods/

https://github.com/WinVector/Examples/blob/master/PCR/YAwarePCA.Rmd


Principal Components Regression, Pt. 2: Y-Aware Methods | R-bloggers (Nina Zumel, www.r-bloggers.com): "In our previous note, we discussed some problems that can arise when using standard principal components analysis (specifically, principal components regression) to model the relationship between independent (x) and dependent (y) variables. In this note, we present some dimensionality reduction techniques that alleviate some of those problems..."
 
Dr.Trader:

I suspect that there is something missing in this data. It's like in your assignment from your first post - if you remove even one of those 6 entries from the sample, the result becomes unpredictable. Forex clearly depends on its own past price, time of day, etc., and this "etc." is exactly what is missing in my data, so the models simply cannot find adequate regularities and describe the logic. The forest apparently cannot try different combinations of "subtract high from low" and take the best of them during optimization, so such things need to be added to the sample itself. I am currently revising my script which saves forex data to csv; I will add a lot of deltas there, plus distances to past tops of the zigzag, as advised. I will post a new data file for experiments later.

I tried to search some more and ran a validation check, but the dependence I found was not confirmed. In general, I think there is not enough information in the data. Try to expand the list of inputs, yes.

And here is my forex data: https://drive.google.com/drive/folders/0B_Au3ANgcG7CYnhVNWxvbmFjd3c

dat_train_final - file for model training. There's a history of 5 currency pairs for 10 years and all my predictors.

Many_samples - needs to be loaded in R via load(). It is a list; each element contains a validation sample, 49 in total. You can validate on any or all of them.

 

I do not see your files, the link is just an empty folder.

So here is my new file for training on EURUSD (H1, 5 bars, target is the increase/decrease of price on the next bar). I analyzed it following the above-mentioned article, principal-components-regression-pt-2-y-aware-methods, and it turned out that the data reliably describes less than 1% of the results
(SanSanych's RData has this number above 10% for Rat_DF1), so apparently I have garbage again. I don't think I could train a model on this file; it's more suitable for practising how to sift out predictors.

The archive contains 2 files. The procedure for working with them is as follows: train the model on the first file (it is more convenient to split it into several pieces for testing and validation; Rattle splits it 70%/15%/15% by default), then, once the inputs are selected and the model is trained, do a real out-of-sample test on the second file. If the error is less than 45%, there is a chance to trade such a model in forex; there might not be any profit, but you can collect broker bonuses for the number of trades, and rebates. If the classification error on the second file is less than 40%, it is already a profitable strategy.
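
A minimal sketch of that protocol in R (the file names, the target column name "target", and the logistic-regression stand-in model are all my assumptions; substitute whatever model you actually train):

train_raw <- read.csv("file1.csv")   # first file: training/validation
test_raw  <- read.csv("file2.csv")   # second file: final out-of-sample check

set.seed(1)
idx <- sample(nrow(train_raw))
trn <- train_raw[idx[1:round(0.7 * nrow(train_raw))], ]   # ~70% for training, the rest for validation

fit  <- glm(target ~ ., data = trn, family = binomial)    # stand-in classifier, target assumed 0/1
prob <- predict(fit, newdata = test_raw, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)

mean(pred != test_raw$target)   # classification error on the second file (the 45% / 40% thresholds above)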

Files:
 
Dr.Trader:

2) Principal Components Regression, Pt. 2: Y-Aware Methods. I skipped the first part of the article, because it says the first part describes another, weaker algorithm.

As it seems to me (maybe I'm wrong), the uncertainty of your result comes from not understanding the essence of the principal components method. The essence is as follows.

From the existing predictors, new predictors are created that have certain useful properties.

The result is presented as a table with PC1, PC2, ... in the header; the row names are the names of your predictors, and in the column under each PC are numbers - the coefficients by which you multiply your initial predictors to form that PC's value. That is, for a particular bar we take the values of the initial predictors, multiply them by the coefficients and get the PC value; then the next bar, and so on. As a result, in addition to your initial vectors, such as Ask, we get one more vector for each PC.

All PCs in the table are ordered. The first is the PC that explains the largest share of the variability in the original set; the second explains the largest share of the variability left over after the first PC, and so on. For example, if PC1 = 0.6 and PC2 = 0.2, then PC1+PC2 together explain 0.8 of the variability. Usually, for large sets of predictors, 5-6 of these "principal components" are enough to explain over 95% of the variability - provided that most of the predictors are noise and there really are "principal" ones among them!
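
In R terms, that coefficient table and those explained-variance shares come straight out of prcomp (a generic sketch; `X` is a placeholder matrix of predictors):

princ <- prcomp(X, center = TRUE, scale. = TRUE)

princ$rotation     # rows = your predictors, columns = PC1, PC2, ...; entries = the coefficients
scores <- princ$x  # the new vectors: one PC value per bar, computed exactly as described above

summary(princ)$importance["Proportion of Variance", ]   # share explained by each PC (the 0.6, 0.2, ...)
summary(princ)$importance["Cumulative Proportion", ]    # running total (the 0.8, ...)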

I have described the classic "principal components". What makes the article interesting for us is that, unlike the classics, it computes the variability with respect to the target variable. Thresholds, on the other hand, are needed to pick something out of a completely hopeless set of predictors. As I see it, this is not relevant for us; it is relevant, for example, for statistics in sociology, where it is very difficult to collect anything additional. In our case, even on a single currency pair it is possible to create an enormous number of predictors.

Maybe you can make another pass over these principal components?

PS.

1. Let's not forget that the principal components require prior normalization of the raw data

2. The resulting principal components have the remarkable property of being independent of each other (a quick check is sketched after this list).

3. The principal components can be predicted.
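
Point 2 is easy to verify on any data set: the PC scores produced by prcomp are pairwise uncorrelated (a quick sketch; `X` is again a placeholder predictor matrix):

princ  <- prcomp(X, center = TRUE, scale. = TRUE)
scores <- princ$x      # PC values for every observation

round(cor(scores), 6)  # off-diagonal entries are ~0: the components do not correlate with each other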

 

Now I understand, thank you for the clarification. I only learned about this model from the article. I thought that PC1, PC2, PC3, ... correspond to different sets of predictors rather than to coefficients. I had seen the table of coefficients, and now I have quickly found where to get what.

The code from PrincipalComponentRegression04.txt seems too complicated. Besides, it doesn't seem to compute the variability with respect to the target variable, so I went back to PrincipalComponentRegression03.txt from the archive I attached this morning.

You have to do the first 5 steps.

Next,

> model$coefficients

(Intercept)         PC1         PC2
 0.02075519  0.40407635 -0.42250678

The result of the model is intercept + coef1 * PC1 + coef2 * PC2 + ... for however many PCs there are.

The values of PC1 and PC2 are obtained from the proj table:

> proj

                     PC1             PC2
X_clean    0.00516309881   0.00477076325
X1_clean   0.00142866076   0.00149863842
X2_clean  -0.00008292268   0.00001010802
.....

PC1 = X_clean * 0.00516309881 + X1_clean * 0.00142866076 + ...

It's still a mystery to me whether the "_clean" suffix means that the original input values X, X1, X2, ... should be taken before normalization and transformation, or not.

I'll take a simpler example later and calculate all the values manually to check whether I've got the formulas right. For now it's just a guess)
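
A sketch of that manual check, using the proj matrix from the output above (the name `treated` for the data frame of *_clean columns is my assumption):

# proj: loadings with rows X_clean, X1_clean, ... and columns PC1, PC2 (as printed above)
# treated: data.frame containing the corresponding *_clean columns (assumed name)
manual_PC1 <- as.matrix(treated[, rownames(proj)]) %*% proj[, "PC1"]

head(manual_PC1)   # should match the PC1 values the script itself produces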

But then it turns out that this method is not designed to sift out predictors; rather, it is designed to train a model that itself ignores as many predictors as possible. All we can do is calculate an average coefficient for each predictor and exclude whatever falls below some threshold.
By the way, this model itself is very similar to a neural network, only without an activation function and without a bias on the hidden-layer neurons. But the essence is the same.

One more problem - how many PC components to take. If less than 95% of the variability is explained, you need to go back to step 3 and change proj <- extractProjection(2, princ) from two to three, then perform steps 3, 4, 5, calculate the error, and if it is still less than 95%, go back to step 3 and increase the number of components again.
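
Instead of re-running step 3 by hand each time, the number of components can be read off the cumulative explained variance directly (generic prcomp code, not the article's):

princ  <- prcomp(X, center = TRUE, scale. = TRUE)   # X: placeholder predictor matrix
cumvar <- summary(princ)$importance["Cumulative Proportion", ]

nPC <- which(cumvar >= 0.95)[1]   # smallest number of PCs explaining at least 95%
nPC
# then, in the article's notation, something like proj <- extractProjection(nPC, princ)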

If we had direct access to R from MT5, we would have a finished model ready to trade. As I understand it, this model doesn't suffer from the overfitting problem, which is very good if true. I.e. you achieved an accuracy of 10% and that's good.

Almost everything about the model is clear. It would be very good to implement it in MT5, implementing only the decision-making logic based on the coefficients. What is not clear is how to connect R with MT5. I can export all the data from MT5 to csv, process it in R, train the model and write the coefficients to another csv, and then read the csv with the coefficients from the Expert Advisor. But then it gets awkward, because R applies many functions that normalize the data before calculating the PCs from it, and it is hardly possible to reproduce that normalization code in MT5. We will have to think about it.
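
On the normalization worry: if the preprocessing boils down to centering and scaling, the parameters can be exported next to the coefficients so the same transform can be rebuilt outside R. A hedged sketch (file names are placeholders):

princ <- prcomp(X, center = TRUE, scale. = TRUE)    # X: placeholder predictor matrix

# everything needed to rebuild a PC value for a new bar:
# PC = sum over inputs of ((x - center) / scale) * loading
write.csv(data.frame(input = colnames(X), center = princ$center, scale = princ$scale),
          "normalization.csv", row.names = FALSE)
write.csv(princ$rotation, "pc_coefficients.csv")

If the article's script applies more elaborate transforms (vtreat treatments, y-aware scaling), those steps would have to be exported in the same way, which is exactly the hard part.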

 
Dr. Trader:

Now I understand, thank you for the clarification. I only learned about this model from the article. I thought that PC1, PC2, PC3, ... correspond to different sets of predictors rather than to coefficients. I had seen the table of coefficients, and now I have quickly found where to get what.

The code from PrincipalComponentRegression04.txt seems too complicated. Besides, it doesn't seem to compute the variability with respect to the target variable, so I went back to PrincipalComponentRegression03.txt from the archive I attached this morning.

You have to do the first 5 steps.

Next,

(Intercept)         PC1         PC2
 0.02075519  0.40407635 -0.42250678

The result of the model is intercept + coef1 * PC1 + coef2 * PC2 + ... for however many PCs there are.

The values of PC1 and PC2 are obtained from the proj table:

> proj

                     PC1             PC2
X_clean    0.00516309881   0.00477076325
X1_clean   0.00142866076   0.00149863842
X2_clean  -0.00008292268   0.00001010802
.....

PC1 = X_clean * 0.00516309881 + X1_clean * 0.00142866076 + ...

It's still a mystery to me whether the "_clean" suffix means that the original input values X, X1, X2, ... should be taken before normalization and transformation, or not.

I'll take a simpler example later and calculate all the values manually to check whether I've got the formulas right. For now it's just a guess)

But then it turns out that this method is not designed to sift out predictors; rather, it is designed to train a model that itself ignores as many predictors as possible. All we can do is calculate an average coefficient for each predictor and exclude whatever falls below some threshold.
By the way, this model itself is very similar to a neural network, only without an activation function and without a bias on the hidden-layer neurons. But the essence is the same.

One more problem - how many PC components to take. If less than 95% of the variability is explained, you need to go back to step 3 and change proj <- extractProjection(2, princ) from two to three, then perform steps 3, 4, 5, calculate the error, and if it is still less than 95%, go back to step 3 and increase the number of components again.

If we had direct access to R from MT5, we would have a finished model ready to trade. As I understand it, this model doesn't suffer from the overfitting problem, which is very good if true. I.e. you achieved an accuracy of 10% and that's good.

Almost everything about the model is clear. It would be very good to implement it in MT5, implementing only the decision-making logic based on the coefficients. What is not clear is how to connect R with MT5. I can export all the data from MT5 to csv, process it in R, train the model and write the coefficients to another csv, and then read the csv with the coefficients from the Expert Advisor. But then it gets awkward, because R applies many functions that normalize the data before calculating the PCs from it, and it is hardly possible to reproduce that normalization code in MT5. I will have to think about it.

As far as I see, you have to use Principal Components Regression, Pt. 2: Y-Aware Methods

As I understood from a cursory look at the text, the main idea is that the scaling is performed with the target function taken into account. In conventional PCA the target function is not taken into account at all. Because of this, the first components are supposedly the ones most important for explaining the target function, rather than for explaining the variability of the entire set of predictors!
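
As far as I can tell from the article, the "y-aware" part amounts to rescaling each predictor by its single-variable regression slope against the target before running ordinary PCA. A rough sketch of that idea (my own reading, not the article's exact code):

# d: data.frame of numeric predictors, y: numeric target (placeholders)
y_aware_scale <- function(d, y) {
  as.data.frame(lapply(d, function(x) {
    b <- coef(lm(y ~ x))[2]    # slope of y on this single predictor
    b * (x - mean(x))          # a unit change in the scaled x now means the same expected change in y
  }))
}

princ <- prcomp(y_aware_scale(d, y), center = FALSE, scale. = FALSE)   # already centered and scaled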


Another problem is how many PC components to take.

This is what all the fuss is about. Intuitively, without any thresholds: if the first 5 components do not explain more than 95% of the variability, then we should look for a new set of predictors. Although I may be wrong.

Almost everything about the model seems clear. It would be very good to implement it in MT5.

Everything works fine in MT4. There is a library in Pascal with source code. I have not tried it myself, but in my opinion, if MT4 can call a library written in Pascal, MT5 should be able to do so all the more.

Calling R looks like this.

1. OnInit establishes the connection to R. If there is specially prepared data, the workspace is loaded. In addition, the R code, organized into one or more functions, is loaded. The number of lines in each function is of course arbitrary - determined by the logic.

2. These functions are called in the body of the Expert Advisor or the indicator.

Given that R has very rich graphics that are not tied to the terminal window, there are great opportunities for visualizing the data in parallel with the terminal.

 
SanSanych Fomenko:
I keep reading and reading... and I cannot understand, first of all, what the target variable is from a formal point of view: a real number (regression) or a nominal value (classification)? Also, if we are discussing the degree to which the predictors influence the target variable, it would be nice to know the meaning of the target variable itself.

Paradoxically, classification is the same as regression.

Only for regression the output is a real number, while for classification it is a probability.

And the target for regression is a continuous curve, while the target for classification consists of impulses (0, 1) or (-1, +1).

This output is then translated into the appropriate class: ifelse(y > 0.5, 1, 0).
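
For example, with a model whose output is a probability of the "up" class (a generic sketch, e.g. a glm; the fitted model `fit` and the data frame `test` are placeholders):

prob  <- predict(fit, newdata = test, type = "response")  # real number in [0, 1]
class <- ifelse(prob > 0.5, 1, 0)                         # translate the regression output into a class
table(class, test$target)                                 # confusion table against the true labels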

 
Dr.Trader:

I don't see your files, the link is just an empty folder.


this is training: https://drive.google.com/file/d/0B_Au3ANgcG7CN2tTUHBkdEdpVEU/view?usp=sharing

this is validation: https://drive.google.com/file/d/0B_Au3ANgcG7CZmFWclFOd0RqNFk/view?usp=sharing

the validation file should be loaded like this: load(validation_file)

each list item contains a unique validation sample consisting of independent observations. The validation samples have almost no overlap with each other, because the observations in them are taken from random points in time. Each validation sample can be viewed as a point estimate of trading.

This is done so that the trades are not modeled at every minute. Trades are modeled approximately every 12 hours.
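
A sketch of loading and looping over those validation samples (I assume the list inside the .RData is called Many_samples, as in the earlier post, and that each element has a "target" column; the fitted model `fit` is also a placeholder):

load("validation_file.RData")   # brings Many_samples into the workspace (placeholder file name)

errors <- sapply(Many_samples, function(s) {
  pred <- ifelse(predict(fit, newdata = s, type = "response") > 0.5, 1, 0)
  mean(pred != s$target)        # classification error on this one validation sample
})
summary(errors)                  # spread of the 49 point estimates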
