Machine learning in trading: theory, models, practice and algo-trading - page 95

 
Vizard_:
I couldn't find the "gopher", I messed up))) And what are all these exercises for... Those who were "hippies" will understand.
Honestly, I don't get it.........
 
mytarmailS:

Well, yes, but that's not right. A quality predictor is one that explains the target, not one that explains itself. I don't see how you can judge the quality of a predictor without comparing it to the target. I don't get it...

It depends on how you approach the problem. Initially we know neither the right predictors nor the right target variable, and successful trading requires both. Both cannot be found at the same time, so we either select predictors to fit a chosen target variable, or collect high-quality predictors first and then see what can be predicted from them.

First approach. Say I have a target variable: price rise/decline on the next bar. I chose it not because it is a good target variable, but because I had to start somewhere, so I took something simple :) Then I took a bunch of indicators, and now I'm trying to train a model on that bunch of indicators to recognize rise/decline. I'm glad some of it worked, because there is no guarantee that an arbitrarily chosen target variable is predictable, and no guarantee that the predictors carry enough information to predict it. ForeCA in this case only preprocesses the data, bringing it into a form that is easier for the neural network to learn. Alternatively, the data can be normalized, or deltas of neighboring values can be taken, or PCA components can be built from them, etc. All of this simplifies the neural network's work: it is easier to train on preprocessed data. What ForeCA is supposed to do here is somehow group similar classes together. But it is not certain that ForeCA will help with this; it depends on luck, on the predictors, and on the target variable.
In short, in this case a model and predictors are fitted to the target variable, and together they miraculously have at least some predictive power.
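
A minimal sketch of the preprocessing described in the first approach, in plain Python rather than the R used in the thread. The data and the function names (next_bar_direction, deltas, zscore) are made up for illustration; it only shows how a "next bar up/down" target and a delta-plus-normalization feature could be lined up row by row, and says nothing about whether such a target is predictable.

```python
from statistics import mean, pstdev


def next_bar_direction(closes):
    """1 if the next bar closes higher than the current one, else 0.

    The last bar has no "next bar", so the result is one element shorter.
    """
    return [1 if closes[i + 1] > closes[i] else 0 for i in range(len(closes) - 1)]


def deltas(values):
    """Differences between neighboring values (one element shorter)."""
    return [values[i] - values[i - 1] for i in range(1, len(values))]


def zscore(values):
    """Normalize to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical toy data: closing prices and one indicator, bar by bar.
closes = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]
indicator = [1.0, 2.0, 4.0, 3.0, 5.0, 6.0]

target = next_bar_direction(closes)   # defined for bars 0..4
feature = zscore(deltas(indicator))   # defined for bars 1..5

# Keep only bars where both exist (bars 1..4), so that each row pairs
# the feature known at bar t with the direction of bar t+1.
rows = list(zip(feature[:-1], target[1:]))
```

The slicing in the last line is the part that is easy to get wrong: the feature at bar t must never be paired with a direction that was already known at bar t.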

Another approach. We take a bunch of predictors, rank them by their "reliability", and remove the unnecessary and irrelevant ones. For example, if an indicator has a constant value all the time, it is obviously useless. If a predictor was generated with random(), it is also useless. It is important to keep the predictors that actually carry some information. I don't know much about this; the only thing I remember is how to find a predictor's importance for a PCA component (the pruneSig parameter in caret), which seems to be an adequate measure. ForeCA in this case should separate the wheat from the chaff and help find reliable predictors. How all these packages know what is important and what is not is a mystery, but they do find noisy and random predictors and reject them. A quality predictor is then one that is neither random nor noisy.
Next, with a set of quality predictors, you try to predict something with them. For example, you can build a Kohonen map and find how market behavior depends on the class in the map, decide which classes to trade on and which not, and thus construct a new target variable for yourself. That target variable will be highly predictable from the predictors, since it is built on them. It all sounds good, but I think this approach will have plenty of problems and pitfalls of its own.
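
The simplest filter mentioned in this approach, dropping predictors that never change, can be sketched like this. It is plain Python with a hypothetical table layout (a dict of column name to values) and a hypothetical function name; it does not detect random() predictors, which need a different test.

```python
from statistics import pvariance


def drop_constant_columns(table, min_variance=1e-12):
    """Split a table into informative columns and (near-)constant ones.

    table: dict mapping column name -> list of numbers.
    Returns (kept_table, dropped_names).
    """
    kept, dropped = {}, []
    for name, values in table.items():
        if pvariance(values) <= min_variance:
            dropped.append(name)   # no variation, so no information
        else:
            kept[name] = values
    return kept, dropped


# Hypothetical predictors: "flat" never changes and should be rejected.
predictors = {
    "rsi": [30.0, 45.0, 60.0, 55.0],
    "flat": [1.0, 1.0, 1.0, 1.0],
    "macd": [-0.2, 0.1, 0.3, 0.0],
}
kept, dropped = drop_constant_columns(predictors)
```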

 
Dr.Trader:

It depends on how you approach the problem. Initially we know neither the right predictors nor the right target variable, and successful trading requires both. Both cannot be found at the same time, so we either select predictors to fit a chosen target variable, or collect high-quality predictors first and then see what can be predicted from them.

First approach.....

Another approach....

The approaches vary, but are we going to trade?

What are we going to trade?

The trend?

Deviation?

The level?

Keep in mind that we have only two orders: BUY and SELL. Or is there more variety: enter BUY / exit BUY / enter SELL / exit SELL?

These are the variations of the target variable.

Next.

Can we form a target variable that matches what we're trading?

Or is there a gap between the target variable and the idea of the trading system, a gap that is itself an error?

Etc.

But we have to start with WHAT we're trading.

 
Mihail Marchukajtes:
You need to multiply the variable v2 by its lag and divide by v3.
Show 10 lines with the following columns (v2; lag v2 (Lag1_v2); v3; v11 (target) and a column with the formula v2*Lag1_v2/v3)

v2;Lag1_v2;v3;v11(target);v2*Lag1_v2/v3

There is no need to download the archive with the file.
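
For reference, the requested table could be produced along these lines. This is a sketch in plain Python with made-up values for v2, v3 and v11; only the column names and the formula v2*Lag1_v2/v3 come from the posts above. Rows where the lag does not exist yet, or where v3 is zero, get an empty formula cell.

```python
def lag(values, k=1):
    """Shift a series k rows down; the first k rows have no lagged value."""
    return [None] * k + values[:-k]


def build_rows(v2, v3, v11, k=1):
    """Rows of (v2, Lag_v2, v3, v11, v2*Lag_v2/v3); None where undefined."""
    rows = []
    for a, b, c, t in zip(v2, lag(v2, k), v3, v11):
        value = None if b is None or c == 0 else a * b / c
        rows.append((a, b, c, t, value))
    return rows


# Hypothetical data, only to show the layout of the table.
v2 = [2.0, 4.0, 6.0, 8.0]
v3 = [1.0, 2.0, 4.0, 0.0]
v11 = [0, 1, 1, 0]

rows = build_rows(v2, v3, v11)
print("v2;Lag1_v2;v3;v11(target);v2*Lag1_v2/v3")
for row in rows:
    print(";".join("" if x is None else str(x) for x in row))
```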
 
Mihail Marchukajtes:
The network does not see the lag.
You cut the empty lines and stick them wherever you want...
 

SanSanych Fomenko:

It would be nice to know exactly what to trade. There are many options, but you can pick something poorly predictable and spend a lot of time only to discover that it is poorly predictable. We need to choose something that is easier to predict with the available predictors, and working out how to do that takes a lot of thought.

I wrote some test code for prediction with ForeCA. It needs a trainData table with the target variable in the last column. The table is split by rows into two parts for training/validation (strictly in the middle). For some reason it is always used with PCA in examples; I think it will work here too. You can replace the lm(...) function in the code with another model.
ForeCA requires a data matrix that has full rank after cov(). I have attached code that checks this in advance and removes columns with a low eigenvalue. It all runs in a loop, takes a long time to execute, and could probably be made simpler. Some of my predictors were eliminated this way; I don't know whether that is good or bad.
So far I have no result: the package ate 5 GB of memory and ran for a long time. Try it, maybe it will give someone a good prediction.
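
The attached R code is not shown in the thread, so the following is only a rough plain-Python illustration of the two preparatory steps described: splitting the rows strictly in the middle, and dropping columns that would make the covariance matrix rank-deficient. Note the swap: instead of the eigenvalue check from the post, this sketch uses Gram-Schmidt orthogonalization on centered columns, which rejects a column when it is (numerically) a linear combination of the columns already kept. All names and data are hypothetical.

```python
import math


def split_in_half(rows):
    """Split rows strictly in the middle: first half trains, the rest validates."""
    mid = len(rows) // 2
    return rows[:mid], rows[mid:]


def _centered(column):
    m = sum(column) / len(column)
    return [x - m for x in column]


def _norm(vec):
    return math.sqrt(sum(x * x for x in vec))


def independent_columns(columns, tol=1e-9):
    """Indices of columns to keep so that their covariance matrix has full rank."""
    basis, kept = [], []
    for idx, column in enumerate(columns):
        residual = _centered(column)
        scale = _norm(residual)
        if scale <= tol:
            continue  # constant column: zero variance
        # Remove the part already explained by the kept columns.
        for q in basis:
            dot = sum(r * b for r, b in zip(residual, q))
            residual = [r - dot * b for r, b in zip(residual, q)]
        norm = _norm(residual)
        if norm / scale > tol:  # something new is left, keep the column
            basis.append([r / norm for r in residual])
            kept.append(idx)
    return kept


# Hypothetical predictors: c is exactly a + b, d is constant.
a = [1.0, 2.0, 3.0, 4.0, 6.0]
b = [2.0, 1.0, 4.0, 3.0, 5.0]
c = [x + y for x, y in zip(a, b)]
d = [5.0] * 5

kept = independent_columns([a, b, c, d])
train, valid = split_in_half(list(range(6)))
```

Which of several dependent columns survives depends on column order here, so the result is one valid full-rank subset, not necessarily the same one the eigenvalue loop would leave.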

Files:
 
Vizard_:
Show 10 rows with these columns (v2; lag v2 (Lag1_v2); v3; v11 (target) and a column with the formula v2*Lag1_v2/v3)

v2;Lag1_v2;v3;v11(target);v2*Lag1_v2/v3

The archive with the file does not need to be downloaded.
To be honest, I don't see the point of this game... There is no point in constructing the output variable from a simple transformation of the input data. If the trading system (TS) works well, there is no need for a network. I thought of something else: what if the optimizer looked at the training file not line by line, but as a whole, searching for relationships not only between columns but also with past values of any column? The result could be quite interesting. In addition to data normalization there would also be data preprocessing, producing not only a normalization formula but also a preprocessing formula, such as: before normalization, multiply v2 by lag1 of v5, etc. Then the search would be much more interesting, I think. Or rather, the preprocessing step outputs transformation formulas, acting as a kind of pre-optimizer. We then apply these formulas to the data, get another data set, and feed that into the optimizer to build the model... How do you like this scenario? A kind of multivariate search for dependencies, not only between columns but also with their past values. Can anyone tell me how to organize this in MQL? I would try it :-)
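
One way to picture such a "pre-optimizer" is a brute-force search over products of one column with a lagged copy of another, scored against the target. The sketch below is plain Python, not MQL, and everything in it is an assumption for illustration: the scoring by absolute Pearson correlation, the restriction to products, the function names and the toy data (where the target is built as v2 times lag1 of v5 so that the search has something to find).

```python
import math


def pearson(x, y):
    """Pearson correlation; 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def search_lagged_products(columns, target, max_lag=2):
    """Score every feature a[t] * b[t-k] by |correlation| with target[t].

    Returns a list of (score, name_a, name_b, lag), best first.
    """
    results = []
    for name_a, a in columns.items():
        for name_b, b in columns.items():
            for k in range(1, max_lag + 1):
                feature = [a[t] * b[t - k] for t in range(k, len(target))]
                score = abs(pearson(feature, target[k:]))
                results.append((score, name_a, name_b, k))
    results.sort(reverse=True)
    return results


# Hypothetical columns; the target is deliberately v2[t] * v5[t-1].
columns = {
    "v2": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0],
    "v5": [2.0, 1.0, 3.0, 1.0, 2.0, 1.0, 3.0, 2.0],
}
target = [0.0] + [columns["v2"][t] * columns["v5"][t - 1] for t in range(1, 8)]

ranked = search_lagged_products(columns, target)
best = ranked[0]  # (score, first column, lagged column, lag)
```

A high score found this way is measured on the same interval that was searched, so it says little about generalization until the formula is checked on data that took no part in the search.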
 
For example, before building a model I optimize each of my inputs with the MQL optimizer. That is, if I use a moving average, then before building the model I select its parameters on the training interval so that it gives as many profitable signals as possible, and I do the same for each input; then I feed it all into the optimizer... and the model becomes more stable. It is stable, but generalization is about 50% or a little higher, because each predictor optimized in MQL only rarely gives more than 50% profitable trades. But I do this for each input individually, mostly selecting the averaging parameter and the lag on a given interval, with no relation to the other inputs. Suppose we could use the optimizer to search for such relations, as I wrote above. Then we would get the parameters of the relations between the data, apply them to the data, and give it all to the optimizer. Then I think it would make sense, and the resulting model would have an adequate level of generalization. But I don't know how to organize this yet... I'm thinking... I'll listen to any ideas... I think a simple neural network in MQL could help: one that builds a curve describing all the signals on the test set as correctly as possible, and thereby helps build a good model in the Reshetov basic optimizer (to each their own tool). Are there any simple networks in MQL?
 
Dr.Trader:

Made a test code for a forecast with ForeCA.....

Me too... I decided to do a quick test: took the iris data set and added 10 noise predictors

and simply trained a random forest; then I took the same data and, using ForeCA, reduced it to 4 predictors and trained the forest again

result:

plain forest on the new data

Prediction   setosa versicolor virginica
  setosa         16          0         0
  versicolor      0         15         1
  virginica       0          0        18

Overall Statistics

Accuracy : 0.98    

the forest didn't even notice the noise in the data ...

and with ForeCA it is worse

Prediction   setosa versicolor virginica
  setosa          6          6         4
  versicolor      5         10         1
  virginica       8          5         5

Overall Statistics
                                          
               Accuracy : 0.42 
Files:
code.txt  1 kb
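
As a quick check of the two accuracy figures, accuracy is the sum of the diagonal of the confusion matrix divided by the total number of test cases. The matrices below are copied from the post; the function name is just for illustration.

```python
def accuracy(confusion):
    """Share of correct predictions: diagonal sum over the grand total."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Rows are predictions, columns are setosa / versicolor / virginica.
plain_forest = [
    [16, 0, 0],
    [0, 15, 1],
    [0, 0, 18],
]
foreca_forest = [
    [6, 6, 4],
    [5, 10, 1],
    [8, 5, 5],
]

acc_plain = accuracy(plain_forest)    # 49 correct out of 50
acc_foreca = accuracy(foreca_forest)  # 21 correct out of 50
```

Both matrices sum to 50 test cases, so the reported 0.98 and 0.42 correspond to 49 and 21 correct predictions.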
 
mytarmailS:

Me too... I decided to do a quick test: took the iris data set and added 10 noise predictors

and simply trained a random forest; then I took the same noisy data and, using ForeCA, reduced the dimension to 4 predictors and trained the forest on them again

result:

plain forest on the new data

the forest didn't even notice the noise in the data ...

and with ForeCA it is worse.

Truth is good, but happiness is better!

Truth = 42%: oh so positive for the deposit.

Although one can live happily, just with a zero deposit.