Machine learning in trading: theory, models, practice and algo-trading - page 129

 
Andrey Dik:

Here I have clearly spelled out what I do:

In detail: on the current bar there is a buy signal, so we notionally buy, count the minimum number of bars forward into the future and check whether the trade would be profitable. If it would, we notionally close it; if not, we count one more bar forward and check again, and so on until the maximum number of bars is reached, at which point we finally close it. This is the learning mechanism.
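Purely to illustrate that look-ahead rule, here is a minimal sketch in R; the function and its parameters are my own naming, not Andrey's actual code, and it assumes `close` is a vector of close prices with enough bars after index `i`:

label_buy <- function(close, i, min_bars = 1, max_bars = 20) {
  # After a buy at bar i, look forward from min_bars to max_bars and
  # close at the first bar where the trade would be profitable;
  # otherwise force the close at the last bar checked.
  entry <- close[i]
  last  <- min(i + max_bars, length(close))
  for (j in (i + min_bars):last) {
    if (close[j] > entry)
      return(list(exit_bar = j, profit = close[j] - entry))
  }
  list(exit_bar = last, profit = close[last] - entry)
}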

What is unclear about that? It is not a fantasy; that is exactly what I do now. The target function is maximizing profit with minimal drawdown. I train it with my own genetic algorithm.

What exactly are we teaching? And it can be implemented that way, right?
 

SanSanych Fomenko:
1. What do we teach?

2. Isn't it possible to just implement it that way?

1. The target function is to maximize profit with minimal drawdown. I train with my own genetic algorithm.

2. Yes, very simple.

 
Does anyone know how to find out what language an R package is written in?
 
mytarmailS:
Does anyone know how you can find out what language an R package is written in?

The documentation, opened from Help in R:

  • Writing R Extensions
  • R Internals

Additionally, there is a detailed description of how to work with C and C++ code.
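As a quick practical check (the package name below is just an example), a package that contains compiled code installs a libs directory, so one rough test from R itself is:

pkg <- "randomForest"                           # example; substitute the package of interest
libs_dir <- system.file("libs", package = pkg)  # compiled packages install shared libraries here
if (nzchar(libs_dir) && length(list.files(libs_dir)) > 0) {
  cat(pkg, "contains compiled C/C++/Fortran code\n")
} else {
  cat(pkg, "appears to be written in pure R\n")
}

For the exact language, look at the src/ directory of the source package on CRAN.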

 

Gentlemen, a new task from me:

Here is a dataset in .R format: https://drive.google.com/open?id=0B_Au3ANgcG7CcjZVRU9fbUZyUkE

The set has about 40,000 rows and 101 columns. The rightmost column is the target variable; the 100 columns to its left are the inputs.

I suggest you try to build a regression model that predicts the value of the 101st column from the remaining 100 columns, using the first 20,000 observations.

On the remaining 20,000+ observations, the constructed model should show an R^2 of at least 0.5.

After that I will reveal how the data was generated and give my solution.

The clue: it is time-series data. The input is 100 samples and the prediction is 1 step ahead. It is not prices, quotes, or derivatives thereof.

Alexey
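For anyone who wants to try the task, here is a minimal baseline sketch in R. The file and object names are assumptions (the .RData file is assumed to contain a single data frame called dat), and randomForest is just one possible model:

library(randomForest)

load("alexey_task.RData")                 # hypothetical file name; assume it creates a data frame `dat`
train <- dat[1:20000, ]
test  <- dat[20001:nrow(dat), ]

target <- colnames(dat)[ncol(dat)]        # the rightmost column is the target
fml    <- as.formula(paste(target, "~ ."))

model <- randomForest(fml, data = train, ntree = 200)
pred  <- predict(model, test)

# Out-of-sample R^2 on the held-out 20,000+ rows (the task asks for >= 0.5)
ss_res <- sum((test[[target]] - pred)^2)
ss_tot <- sum((test[[target]] - mean(test[[target]]))^2)
cat("test R^2 =", 1 - ss_res / ss_tot, "\n")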

 
I can also post the data in CSV. It would be interesting to hear the experts' opinion on the importance of the predictors.

Again, the data is purely synthetic and purely for fun.
 

I tried to determine predictor importance with the vtreat package. But the package cannot search for relationships between predictors; it only considers direct dependencies between each predictor and the target, so it is not very well suited to this task.

library(vtreat)
treatments <- designTreatmentsN(dat_ready[1:20000, ], colnames(dat_ready)[1:100], tail(colnames(dat_ready), 1))
treatments$scoreFrame                                      # predictor importance is read from the "sig" column
treatments$scoreFrame[order(treatments$scoreFrame$sig), ]  # predictors sorted by importance

Judging by vtreat's importance scores, lag_diff_51 and lag_diff_52 are the most useful. Little by little I added other predictors from the list it produced and watched how R^2 grew on the forest's training data. In the end I settled on these predictors: 51, 52, 53, 54, 55, 17, 68; most likely they are the ones used to calculate the target. With them, R^2 on the training data is > 0.9, but on the test and validation sets everything is bad. Now I would need to try different mathematical operations on these predictors, pick out formulas and so on, so that R^2 also grows under cross-validation. I won't look into it any further :)

Added later:
I experimented a bit more and made a bunch of new predictors from the existing ones using different mathematical operations. Both vtreat and the forest really like two combinations: sum(51, 52) and average(51, 52). But I was never able to recover a formula for the target value, and models trained on these predictors cannot predict anything adequately either.
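For reference, a small sketch of the kind of feature construction described above, assuming the data frame dat and the target name from the earlier sketch, and assuming the columns really are named lag_diff_51 and so on:

library(randomForest)

picked <- paste0("lag_diff_", c(51, 52, 53, 54, 55, 17, 68))   # predictor set chosen in the post
dat$sum_51_52  <- dat$lag_diff_51 + dat$lag_diff_52            # sum(51, 52)
dat$mean_51_52 <- (dat$lag_diff_51 + dat$lag_diff_52) / 2      # average(51, 52)

feats <- c(picked, "sum_51_52", "mean_51_52")
train <- dat[1:20000, ]
test  <- dat[20001:nrow(dat), ]

rf   <- randomForest(x = train[, feats], y = train[[target]], ntree = 200)
pred <- predict(rf, test[, feats])
cat("test R^2 =", 1 - sum((test[[target]] - pred)^2) /
                  sum((test[[target]] - mean(test[[target]]))^2), "\n")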

 

100 inputs? Impressive.

Why not a thousand?

You guys don't understand at all what a neural network is.

 
Dr.Trader:

I tried to determine predictor importance with the vtreat package. But the package cannot search for relationships between predictors; it only considers direct dependencies between each predictor and the target, so it is not very well suited to this task.

Judging by vtreat's importance scores, lag_diff_51 and lag_diff_52 are the most useful. Little by little I added other predictors from the list it produced and watched how R^2 grew on the forest's training data. In the end I settled on these predictors: 51, 52, 53, 54, 55, 17, 68; most likely they are the ones used to calculate the target. With them, R^2 on the training data is > 0.9, but on the test and validation sets everything is bad. Now I would need to try different mathematical operations on these predictors, pick out formulas and so on, so that R^2 also grows under cross-validation. I won't look into it any further :)

Added later:
I experimented a bit more and made a bunch of new predictors from the existing ones using different mathematical operations. Both vtreat and the forest really like two combinations: sum(51, 52) and average(51, 52). But I was never able to recover a formula for the target value, and models trained on these predictors cannot predict anything adequately either.

You are walking close, but missing. Not everything has been identified. There are linear correlations between the output and the inputs, but they are not very helpful.

Your forest is overtrained. It is better to look at the CV result. I'll tell you the secret of the data later. The idea is simple: there is a lot of redundancy in the inputs.
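A small sketch of what checking the CV could look like in practice: k-fold cross-validated R^2 computed on the 20,000 training rows only (names follow the sketches above; for time-series data, ordered or blocked folds would be more honest than random ones):

library(randomForest)

set.seed(1)
k     <- 5
folds <- sample(rep(1:k, length.out = nrow(train)))
cv_r2 <- numeric(k)

for (f in 1:k) {
  fit  <- randomForest(x = train[folds != f, feats],
                       y = train[[target]][folds != f], ntree = 100)
  pr   <- predict(fit, train[folds == f, feats])
  y    <- train[[target]][folds == f]
  cv_r2[f] <- 1 - sum((y - pr)^2) / sum((y - mean(y))^2)
}
mean(cv_r2)   # compare this honest estimate with the in-sample 0.9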
 

Also, 0.9 means overtraining. An R^2 above 0.6 on the training set is a realistic ceiling.

One more thing: remember about interactions. Individual pairwise relationships can point in the wrong direction.

I am also trying to solve my own problem. I have applied a single-layer neural network; test R^2 does not exceed 0.148. Not good enough...