Machine learning in trading: theory, models, practice and algo-trading - page 19

 
Alexey Burnakov:
We have not lost anything. At the first measurement we identify the cluster. At the cluster-to-cluster transitions you can build a square matrix and compute the trade's expected payoff (MO): enter a buy on cluster n, close the position on cluster m. Then the same matrix for sell. All the variants are simply enumerated. And you can vary the clustering parameters in a loop and look at the result.
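A rough sketch of this idea in R (my own illustration, not Alexey's code; the toy price series, the lag features, and kmeans with 4 clusters are all assumptions): cluster the bars, then for every (enter, exit) cluster pair compute the mean outcome of buying on the first and closing on the next occurrence of the second, which gives the square matrix to enumerate.

```r
# Sketch only: build a square matrix of mean buy-trade outcomes ("MO")
# for enter-on-cluster-n / exit-on-cluster-m rules.
set.seed(1)
price <- cumsum(rnorm(300))             # toy price series (assumption)
feat  <- embed(price, 5)                # last 5 prices as features per bar
cl    <- kmeans(feat, centers = 4)$cluster
p     <- price[5:length(price)]         # prices aligned with cluster labels

K  <- 4
mo <- matrix(NA_real_, K, K)            # mo[n, m]: mean P/L, enter n, exit m
for (n in 1:K) {
  for (m in 1:K) {
    outs <- c()
    for (i in which(cl == n)) {
      j <- which(cl == m & seq_along(cl) > i)[1]  # next bar in cluster m
      if (!is.na(j)) outs <- c(outs, p[j] - p[i])
    }
    mo[n, m] <- if (length(outs)) mean(outs) else NA_real_
  }
}
print(round(mo, 3))                     # enumerate: pick the best (n, m) pair
```

The sell matrix is the same computation with the sign of `p[j] - p[i]` flipped.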
This is interesting.
 
Dr.Trader:

You probably have different parameters for building the forests in R and in rattle, so the results differ. In rattle itself you can also change the number of trees and variables.

And you have an error of 34% in rattle on the training data, and 3% on the test data? Something is wrong with the test data: either it somehow leaked into the training data, or your dataset is very small and it just happened that way.

No, in rattle the error is small at all stages,

and in R it is large at all stages )

The parameters are the same, and with identical parameters such a gap should not be possible...

 
Damn, how can you attach a file here? It either does not attach or freezes completely.
 
mytarmailS:

And another question for the R connoisseurs

library(kza)

DAT <- rnorm(1000)

KZP <- kzp(DAT,m=100,k=3)

summary(KZP,digits=2,top=3)


how can I get those numbers out of summary() (http://prntscr.com/bhtlo9) so that I can work with them?

I have this problem too. Usually you just run attributes(KZP) to get a list of the available components and then go through them, e.g. KZP$window, until you find the right numbers. But here these numbers are generated inside summary() itself and are not stored anywhere.

Here is the source code: https://cran.r-project.org/web/packages/kza/index.html; you have to do something like this:

summary.kzp <- function(object, digits = getOption("digits"), top = 1, ...)
{
    cat(" Call:\n ")
    dput(object$call, control = NULL)

    M <- object$window
    if (is.null(object$smooth_periodogram)) {
        d <- object$periodogram
    } else {
        d <- object$smooth_periodogram
    }

    mlist <- rep(0, top)
    for (i in 1:top) {
        mlist[i] <- which.max(d)
        d[which.max(d)] <- NA
    }

    cat("\n Frequencies of interest:\n")
    print((mlist - 1) / M, digits = digits, ...)

    cat("\n Periods of interest:\n")
    print(M / (mlist - 1), digits = digits, ...)
    invisible(object)
}
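Based on that source, a small helper can recompute the same numbers and return them instead of printing them, so they can be stored in variables (a sketch that mirrors the body of summary.kzp above; the name get_kzp_peaks is my own, not part of the kza package):

```r
# Recompute the frequencies/periods that summary.kzp only prints,
# and return them as a list you can work with.
get_kzp_peaks <- function(object, top = 3) {
  M <- object$window
  d <- if (is.null(object$smooth_periodogram)) {
    object$periodogram
  } else {
    object$smooth_periodogram
  }
  mlist <- rep(0, top)
  for (i in 1:top) {
    mlist[i] <- which.max(d)   # index of the i-th largest peak
    d[which.max(d)] <- NA      # mask it so the next pass finds the next one
  }
  list(frequencies = (mlist - 1) / M,
       periods     = M / (mlist - 1))
}

# usage: peaks <- get_kzp_peaks(KZP, top = 3); peaks$periods
```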
 

Thanks Dr.Trader, it's nice to have someone to ask about this fiddly R )

About rattle, I understand what the problem is: rattle also samples the data with sample(). Is this step necessary? With sampling I got the same results in R as well, but the catch is that new data will arrive one candle at a time, and it will not be possible to sample it.

And it turns out that if the whole sample is shuffled, the results are amazing for all periods, including out-of-sample, but as soon as you feed in real data, it is the same as always.

So the question is whether this sampling is necessary at all?

 

Yes, you do. With sample(), rattle divides the input data into several groups by rows (rows are distributed randomly into three tables, in a 70%/15%/15% ratio). It produces three tables from one input file. The columns are not affected by this; they are the same in all tables.

The train table is used for teaching the model.

The validate and test tables are needed to control the training.

Suppose you take the data for the last year and want to train a model on it to trade during the following months. Training will take place only on the train table. After that, you can check the model on the second and third tables by computing their errors. If the model is trained correctly, the errors on all three tables will be approximately equal, despite the fact that it was trained using only the first table.

This is easy to check with a random forest. On virtually any dataset it can achieve a 0% error on the train table. But having checked the same model on the test and validate tables you will most likely see a 50% error there. That means the model is overfitted, and after transferring it to mt5 it will gradually drain your deposit.
But if we take the previously posted RData file with SanSanych's example, the forest gives an error of about 30% on the train table. Remarkably, the error on the validate and test tables stays approximately the same, even though the model never saw data from those tables during training. Such a model can safely be transferred to mt5 and traded.

If you simply take all the available data without sampling, train the model, see a 0% error and are happy, it will do very badly in real trading.
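What rattle does with sample() can be reproduced in a few lines of base R (a minimal sketch, assuming a 70/15/15 split; the data frame here is a stand-in, not rattle's internal code):

```r
# Minimal sketch of rattle-style row sampling: shuffle the row indices once,
# then cut them into train / validate / test in a 70/15/15 ratio.
set.seed(42)
df  <- data.frame(x = rnorm(200), y = rnorm(200))  # stand-in dataset
n   <- nrow(df)
idx <- sample(seq_len(n))                          # random row order

n_train <- floor(0.70 * n)
n_valid <- floor(0.15 * n)

train    <- df[idx[1:n_train], ]
validate <- df[idx[(n_train + 1):(n_train + n_valid)], ]
test     <- df[idx[(n_train + n_valid + 1):n], ]

# the three tables cover every row exactly once
stopifnot(nrow(train) + nrow(validate) + nrow(test) == n)
```

The columns are untouched; only the rows are distributed among the three tables.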

 
Dr.Trader:

Yes, you do. With sample(), rattle divides the input data by rows into three tables (train, validate, test)...

I see what you mean, but the irony is that precisely on the sampled data the error is small on all three samples, while without sampling it is large.

Send me your email in a private message and I'll send you the data, see for yourself. Or teach me how to attach a file, because it either does not attach or the forum hangs completely when I try.

 

In short, I trained my model (a random forest); I am not satisfied with the result, but it does not seem to be overfitted.

1) the target is a zigzag with a 0.5% knee

2) the predictors are levels and candlesticks, 100 in total (without oscillators and other indicators)

3) I did not optimize the model itself; I just set the split to 3 and the number of trees to 200

4) By the way, after running feature selection (PrunePredictors), only 3 of the 100 predictors were removed, leaving 97 in total

The sample is divided into two parts, training and test; the test set is one fifth the size of the training set.

On the training part: the model error is 33%


On the test part: the model error is 32%


The data is not sampled, because I do not understand that beast yet

Now we can think about clusters

 

Forum: how to insert a picture

This is the only way to keep a picture on the forum permanently, so that all users, even a year later, can see it and understand what it was about.

Third-party services, besides carrying advertising, are not safe and also delete pictures after a while.

 
mytarmailS:

Question one: why are there different results in R and rattle on the same data with the same model?

Question two: what is the point of testing a model "out of sample" in rattle if it shows nonsense?

You misunderstood my post.
Reason: