Machine learning in trading: theory, models, practice and algo-trading - page 884

 
SanSanych Fomenko:

I understand the difference between trees and forests (or I think I do): forests are better when there is more uncertainty in the data, i.e. a less stable pattern, because a forest decides by a vote among random trees (independent because of the shortening). Or am I wrong?

I don't know, I'm judging by the results.

And I don't have the "adad" option, it is not in the screenshot. There is "Forest", isn't that it?

In order:


Tree

The 'rpart' package provides the 'rpart' function.


Boost

# Extreme Boost

# The 'xgboost' package implements the extreme gradient boost algorithm.


SVM

# Support vector machine.

# The 'kernlab' package provides the 'ksvm' function.


Linear

# Regression model

# Build a Regression model.


Neural Net

# Neural Network

# Build a neural network model using the nnet package.

library(nnet, quietly=TRUE)


By the way, I did this work for you; you can look it all up yourself on the Log tab. If you have a different version of rattle, the list may be different.

Thanks for the transcript. I have version 5.1.0, probably the latest. All these packages are installed automatically when they are called, and in addition there is "Forest".

SanSanych Fomenko:

The library was modified to my order: I needed a tester from MT5. I just counted, too lazy to search; maybe it was cleaned up.

Have a look at the articles by Vladimir Perervenko.

If you are interested in neural networks, he covers the latest developments in that area, with R and Expert Advisors, and he can be reached on the site.

Thanks, I'll have a look.

 
SanSanych Fomenko:


So how do you split a file in R? Do you need a special algorithm? It would be interesting to see what comes out in the end.

By index, for example: [1:2000,], [2001:4000,]. It is important not to break the natural time sequence in the second file.
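A minimal sketch of the split by index described above, assuming the data is already in a data frame sorted by time. The data frame `dat` and its columns are made up for illustration.

```r
# Hypothetical data: 4000 rows already in time order (column names are invented).
set.seed(1)
dat <- data.frame(
  time   = seq_len(4000),
  x1     = rnorm(4000),
  target = sample(c(0, 1), 4000, replace = TRUE)
)

# Split by row index. Nothing is shuffled, so each part keeps its time order.
first_part  <- dat[1:2000, ]
second_part <- dat[2001:4000, ]

# Sanity checks: part sizes, and the second part starts after the first ends.
nrow(first_part)
nrow(second_part)
max(first_part$time) < min(second_part$time)
```

The key point is what is absent: there is no `sample()` call on the rows, so the second file stays a contiguous, later stretch of history.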

In other words, you can cut it in Excel, right?

 
Maxim Dmitrievsky:

The main thing is to remember to read the theory, so as not to do something stupid. Downloading a package is not difficult, there are plenty of them, and some are even online, so you do not need to install anything. There is a data science boom, "it" is everywhere.

I have no time to analyze archives, I'm working on my own stuff

The point is that different variants give different results, and if so, how do you draw a conclusion about the quality of the predictors? Does it simply mean taking the average: if the results are not bad everywhere, then fine, and after that tune the network / tree / forest?

And still, I want to know how to transfer logic of the tree to Expert Advisor...

 
Aleksey Vyazmikin:

So you can cut it in Excel, right?

There's no need for Excel. One line, that's it.

Read Robert I. Kabacoff, R in Action: Data Analysis and Visualization in the R Language. It can be found on the Internet.

 
Yuriy Asaulenko:

There's no need for Excel. One line and that's it.

Read Robert I. Kabacoff, R in Action: Data Analysis and Visualization in the R Language. It's on the web.

My goal is not to learn to program in R; I need to be able to check predictors and convert a rule set to MT5. Anyway, if it's one line, why not just write it out? For now I have managed with the means at hand.

 
SanSanych Fomenko:


Your rattle picture is incomplete. At the very least, you should go to the adjacent evaluate tab and see the results there.

But the most important thing is to divide the source file into two parts with different names (most likely you will have to do it in R).

On the first file you build all six models and look at their test and validate estimates. Then you enter the name of the second file in the R Dataset field and get the estimates again on it. All the estimates must be approximately the same!

If the estimates do not coincide, and the models' results on the second file are much worse, then your models are overtrained (overfitted), and the cause of the overtraining is the presence of noise predictors, i.e. ones unrelated to the target variable.
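The check described above can be sketched outside rattle as well. This is only an illustration under assumptions: the data is synthetic, the column names are invented, and a single 'rpart' tree stands in for the six models.

```r
library(rpart)  # recommended package shipped with standard R installations

# Synthetic data: one predictor related to the target and two pure-noise predictors.
set.seed(42)
n <- 4000
dat <- data.frame(signal = rnorm(n), noise1 = rnorm(n), noise2 = rnorm(n))
dat$target <- factor(ifelse(dat$signal + rnorm(n, sd = 0.5) > 0, "up", "down"))

# First "file" for building the model, second "file" held back, both in original order.
first_part  <- dat[1:2000, ]
second_part <- dat[2001:4000, ]

fit <- rpart(target ~ ., data = first_part, method = "class")

# Share of correctly classified rows for a model on a given data frame.
accuracy <- function(model, data) {
  pred <- predict(model, newdata = data, type = "class")
  mean(as.character(pred) == as.character(data$target))
}

acc_first  <- accuracy(fit, first_part)
acc_second <- accuracy(fit, second_part)
cat("first file: ", round(acc_first, 3), "\n")
cat("second file:", round(acc_second, 3), "\n")
cat("gap:        ", round(acc_first - acc_second, 3), "\n")
```

A small gap is what the post calls "approximately the same"; a large drop on the second file is the overtraining signal.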


This is the moment of truth: either you have a set of predictors relevant to a particular target variable or you don't, and no model can fix that sad circumstance. Then begins the dull work of selecting a "target-predictors" pair; the models are not interesting at all. Once you find a pair, the models are child's play: in R you will find a dozen of them in a day and build ensembles from them.

Please explain in more detail.

1. On the "Evaluate" tab, which option should I choose in the "Type" set?

2. What do I need to do to be able to enter the name of a second file? For me this field is active, but I cannot choose a file! And with "CSV File", why can I choose one there?


 
Aleksey Vyazmikin:

That's the thing: different variants give different results, and if so, how do you draw a conclusion about the quality of the predictors? Does it simply mean taking the average variant, i.e. if it's not bad everywhere, then it's okay, and after that tune the network/tree/forest?

And still, I want to know how to transfer the logic of the tree to the Expert Advisor...

There is a random forest in MT5, and you can adapt it to your needs. Watch the errors, feed it test examples, and look at the results. The only thing is that it does not report variable importance.

But since the algorithms are the same everywhere, you can visualize and experiment in R, then train the model and use it in MT.
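On moving a tree's logic out of R: one possible route, sketched here under assumptions, is to print each leaf of an 'rpart' tree as a chain of conditions and then retype those conditions by hand as `if` statements in the Expert Advisor. The data and the "buy"/"wait" labels are invented; `path.rpart` is part of the 'rpart' package.

```r
library(rpart)

# Invented example data with a simple rule hidden in it.
set.seed(7)
n <- 1000
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$target <- factor(ifelse(dat$x1 > 0.5 & dat$x2 > 0.3, "buy", "wait"))

fit <- rpart(target ~ x1 + x2, data = dat, method = "class")

# Leaves are the rows of fit$frame whose split variable is "<leaf>";
# the row names are the node numbers that path.rpart() expects.
is_leaf    <- fit$frame$var == "<leaf>"
leaf_nodes <- as.numeric(rownames(fit$frame)[is_leaf])
leaf_class <- attr(fit, "ylevels")[fit$frame$yval[is_leaf]]

# One character vector of conditions per leaf, starting with "root".
paths <- path.rpart(fit, nodes = leaf_nodes, print.it = FALSE)

rules <- vapply(seq_along(paths), function(i) {
  conds <- paths[[i]][-1]  # drop the leading "root" entry
  sprintf("if (%s) -> %s", paste(conds, collapse = " && "), leaf_class[i])
}, character(1))

cat(rules, sep = "\n")
```

Each printed line is one complete path from the root to a leaf, so the set of lines covers every case the tree can produce.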

 
Aleksey Vyazmikin:

Anyway, if it's one line, why not just write it out? Well, in the meantime, I've managed with the means at hand.

Well, I didn't ask, you did.) I don't need it.)

And if you work with R, you'll have to.

 
Aleksey Vyazmikin:

Please explain in more detail.

1. On the "Evaluate" tab, which option should I choose in the "Type" set?

2. What do I need to do to be able to enter the name of the second file? For me this field is active, but I cannot choose a file! And with "CSV File", why can I choose one there?


1. Any. These are different evaluations of the model, and each has its own meaning. rattle is good precisely because it gives a beginner systematic knowledge of machine learning: preparing the initial data, modeling, evaluating the model. Only after mastering all THREE parts, at least at a primitive level, does it make sense to move on from playing with numbers to more meaningful things.

2. R Dataset is a data frame from an R file. That is, on the Data tab the raw data was loaded as an RData File, which is a workspace in R terms. In this workspace two data frames were prepared: one for training and testing the model, and the other to serve as this very R Dataset.

The easiest way for you to do this is to load a ready-made Excel file, download the log, exit to R, and split the data frame you get there into two.

You can do it in another way.

Open R itself and load the Excel file there; this is one line. Then split the data frame into two.

But you MUST run the trained model on the second file.
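A rough sketch of that second route, assuming the Excel sheet has first been exported as CSV. The file here is a temporary stand-in and the names `train_df` and `check_df` are invented; how rattle then picks the objects up is as described in the post and is not shown here.

```r
# Stand-in for the CSV exported from Excel (hypothetical columns).
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(time = 1:4000, x1 = rnorm(4000), target = rbinom(4000, 1, 0.5)),
  csv_path, row.names = FALSE
)

dat <- read.csv(csv_path)      # loading the file is the "one line"

train_df <- dat[1:2000, ]      # for training and testing the models
check_df <- dat[2001:4000, ]   # held back for the final run of the trained model

# Save both data frames into one workspace file.
rdata_path <- tempfile(fileext = ".RData")
save(train_df, check_df, file = rdata_path)

# Loading the file back restores both objects under their names.
restored <- new.env()
load(rdata_path, envir = restored)
ls(restored)
```

With a real file, `csv_path` and `rdata_path` would simply be ordinary paths on disk.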

 
Maxim Dmitrievsky:

Why? mt5 has a random forest, you can adapt it to your needs

Or, if you consider that the algorithms are the same everywhere, you can visualize and experiment in R, and then train and use in MT

I can see everything visually in the program "Deductor Studio"; there is no such thing in R, and there never will be in MT5 (that is, you can do everything, but you will have to pay a lot for it...). So it turns out that I should use the ALGLIB library to include random forests in MT5?

I found a piece of code for the C4.5 algorithm in R at http://datascientist.one/algorithm-c4-5/. Is it very difficult to implement in MT5?

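# C4.5 outline: recursively builds a decision tree.
# The helper functions (IsEmpty, CreateNode, GetGainRatio, ...) are not defined in this excerpt.
C45 <- function(data, x) {
  result.tree <- NULL

  # No data left: return a "Failure" leaf.
  if (IsEmpty(data)) {
    node.value <- "Failure"
    result.tree <- CreateNode(node.value)
    return(result.tree)
  }

  # x is empty: return a leaf holding the majority class.
  if (IsEmpty(x)) {
    node.value <- GetMajorityClassValue(data, x)
    result.tree <- CreateNode(node.value)
    return(result.tree)
  }

  # x holds a single element: return a leaf with the value from GetClassValue.
  if (1 == GetCount(x)) {
    node.value <- GetClassValue(x)
    result.tree <- CreateNode(node.value)
    return(result.tree)
  }

  # Choose the split by gain ratio.
  gain.ratio <- GetGainRatio(data, x)
  best.split <- GetBestSplit(data, x, gain.ratio)

  data.subsets <- SplitData(data, best.split)
  values <- GetAttributeValues(data.subsets, best.split)
  values.count <- GetCount(values)

  # The chosen split becomes the current node.
  node.value <- best.split
  result.tree <- CreateNode(node.value)

  # Build one child subtree per value of the chosen attribute (indices 1..values.count).
  idx <- 0
  while (idx < values.count) {
    idx <- idx + 1
    newdata <- GetAt(data.subsets, idx)
    value <- GetAt(values, idx)
    new.x <- RemoveAttribute(x, best.split)
    new.child <- C45(newdata, new.x)
    AddChildNode(result.tree, new.child, value)
  }

  result.tree
}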
The C4.5 algorithm
  • 2016.05.06
  • datascientist.one
The C4.5 algorithm builds a classifier in the form of a decision tree. To do this, it has to be given a set of already classified data. And what is a classifier? A classifier is a tool used in data mining that takes classified data and, based on it, tries to predict which class new...