Machine learning in trading: theory, models, practice and algo-trading - page 967

 
Ivan Negreshniy:

IMHO, you have to work in python first, where it is more or less debugged, otherwise there is a lot of uncertainty

Yeah, it looks like the DLL is 32-bit and MT5 is 64-bit.

I'll come back to this later; for now, Python it is.

 
Maxim Dmitrievsky:

I guess I can't do it myself, I'll have to use python after all :D

You should be awarded the title of "Outstanding Specialist in Creating Difficulties for Oneself"! Straight out of the movie: normal heroes always take the roundabout way.


Here is the much-discussed Rattle: just a few clicks for the xgboost model you mentioned.

For the training sample we get:

Error matrix for the Extreme Boost model on Df1.num [train] (counts):

      Predicted
Actual    0    1 Error
     0 1930   90   4.5
     1   42 2152   1.9

Error matrix for the Extreme Boost model on Df1.num [train] (proportions):

      Predicted
Actual    0    1 Error
     0 45.8  2.1   4.5
     1  1.0 51.1   1.9

Overall error: 3.1%, Averaged class error: 3.2%

Rattle timestamp: 2018-05-31 11:21:20 user

For the validation sample:

Error matrix for the Extreme Boost model on Df1.num [validate] (counts):

      Predicted
Actual   0   1 Error
     0 306 119  28.0
     1 111 367  23.2

Error matrix for the Extreme Boost model on Df1.num [validate] (proportions):

      Predicted
Actual    0    1 Error
     0 33.9 13.2  28.0
     1 12.3 40.6  23.2

Overall error: 25.5%, Averaged class error: 25.6%

Rattle timestamp: 2018-05-31 11:22:15 user

For the test sample:

Error matrix for the Extreme Boost model on Df1.num [test] (counts):

      Predicted
Actual   0   1 Error
     0 314 118  27.3
     1 112 360  23.7

Error matrix for the Extreme Boost model on Df1.num [test] (proportions):

      Predicted
Actual    0    1 Error
     0 34.7 13.1  27.3
     1 12.4 39.8  23.7

Overall error: 25.5%, Averaged class error: 25.5%

Rattle timestamp: 2018-05-31 11:22:50 user
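For anyone who wants to check the "Error" columns by hand, here is a minimal sketch of the arithmetic, assuming rows are actual classes and columns are predicted classes as in the reports above. The function name `error_rates` is made up for this illustration and is not part of Rattle.

```python
# Reproduce the error columns of a Rattle-style error matrix from raw counts.
# Assumption: rows = actual class, columns = predicted class.

def error_rates(counts):
    """Return (per-class error %, overall error %, averaged class error %)."""
    total = sum(sum(row) for row in counts)
    per_class = []
    wrong = 0
    for actual, row in enumerate(counts):
        misses = sum(row) - row[actual]  # off-diagonal cells in this row
        per_class.append(100.0 * misses / sum(row))
        wrong += misses
    overall = 100.0 * wrong / total
    averaged = sum(per_class) / len(per_class)
    return per_class, overall, averaged

# Counts copied from the train and validate reports above.
train = [[1930, 90], [42, 2152]]
validate = [[306, 119], [111, 367]]

for name, matrix in (("train", train), ("validate", validate)):
    per_class, overall, averaged = error_rates(matrix)
    print(name, [round(e, 1) for e in per_class],
          round(overall, 1), round(averaged, 1))
```

For the train and validate counts this gives the same figures as the reports. For the test counts, the exact overall error is 230/904, i.e. about 25.4%, while the report prints 25.5%. Presumably the report sums the rounded proportions (13.1 + 12.4), but that is a guess.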


If you are satisfied with the result, you can look at the code in R. Here is the call:

crs$ada <- xgboost(Long_Short ~ .,
  data              = crs$dataset[crs$train,c(crs$input, crs$target)],
  max_depth         = 6,
  eta               = 0.3, 
  num_parallel_tree = 1, 
  nthread           = 2, 
  nround            = 50,
  metrics           = 'error',
  objective         = 'binary:logistic')


In fact, you can take all the R code from Rattle, wrap it in a function, call that function from an MT4/5 EA, and see the results in the tester. The DLL is primitive, has been working reliably for a long time, and plenty of people use it...

All of this takes less than an hour! THE MODELS ARE NOT THE PROBLEM!

The problems lie with the target and the predictors that correspond to it, or vice versa. But to solve them you need a toolkit that makes testing variants extremely cheap in labor.


PS.

If you go to Rattle, you can build in one click a tree (rpart), randomForest, SVM, logistic regression (glm), and your favorite neural network, though only the simplest one, nnet. And for dessert, a survival model (time to reaching exorbitant profits or to blowing the deposit), if you are able to formulate the target and its predictors.


PPS.

Since a love for xgboost has awakened in you, and you can feed that love with acceptable preliminary calculations in Rattle, here is the documentation:

Description of the package - https://cran.r-project.org/web/packages/xgboost/xgboost.pdf

Understand your dataset with Xgboost - https://cran.r-project.org/web/packages/xgboost/vignettes/discoverYourData.html

xgboost: eXtreme Gradient Boosting - https://cran.r-project.org/web/packages/xgboost/vignettes/xgboost.pdf

And for starters, the Xgboost presentation https://cran.r-project.org/web/packages/xgboost/vignettes/xgboostPresentation.html


Given the level you show here, you should have no trouble with R at all.


Good luck with it.

 
SanSanych Fomenko:

Thanks, SanSanych... I just didn't realize: does Rattle have xgb? Cool.

In any case, I just need to plug this model in instead of the ALGLIB forest, for RL tasks.

I don't need to research anything, I just need something better, with regularization and cross-validation... I don't know about R, but in Python, for example, cross-validation is easy to bolt onto xgb, and dropout too.
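To make the cross-validation idea concrete without depending on any particular library, here is a rough standard-library sketch. Everything in it is illustrative: the "model" is a deliberately trivial majority-class placeholder standing in for xgb, and the labels are made-up data, not anything from this thread.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def majority_class(labels):
    """Placeholder 'model': always predicts the most common training label."""
    return max(set(labels), key=labels.count)

def cross_val_error(labels, k=5, seed=0):
    """Mean held-out error (%) of the placeholder model over k folds."""
    errors = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [labels[i] for i in range(len(labels)) if i not in held]
        pred = majority_class(train)  # "fit" on the training part only
        wrong = sum(1 for i in held_out if labels[i] != pred)
        errors.append(100.0 * wrong / len(held_out))
    return sum(errors) / len(errors)

# Hypothetical labels: 60 longs (1) and 40 shorts (0).
labels = [1] * 60 + [0] * 40
print(round(cross_val_error(labels, k=5), 1))
```

With a real model, only the two lines that fit and predict would change. Note that shuffled folds ignore bar order, so for time-ordered data an ordered split may be the more honest choice.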

And then, R has no decent libraries for RL; they are all in Python, so I would have to put R on top of Python again... so I still haven't decided what I need :)

 

Speaking of trees...

I don't know about one-bar targets, but when the targets are events that play out over the next N bars (I mean specifically trend trading, or other cases where a position is closed by SL/TP, after N bars, or after a fixed time period), and the outcome is what gets recorded and classified, it is very important to pay attention not to the contingency table, but to:

1. The frequency of changes in the classification results within the window of N bars

2. The grouping of rules over N bars (rule density)

In the first case we need an indicator that estimates how often the predicted target changes. If that frequency is high, the model is unstable, even though it may score a high percentage of correct decisions.

In the second case we need to count, per window of N bars, how often one and the same rule fires, in order to estimate its reinforcement, and extend that estimate across the model.

Thus, the quality estimates used when training a model need to change, including for forests and other models that require analysis of their own results in order to make corrections.
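One possible reading of these two indicators, sketched in Python. The definitions are my assumptions, not an established metric: "flip rate" is the share of adjacent bars in a window where the predicted class changes, and "rule density" is the share of bars in a window on which the most frequent rule fired. The rule identifiers are hypothetical, e.g. the id of the tree leaf that produced the prediction.

```python
from collections import Counter

def windows(seq, n):
    """Yield every full window of n consecutive bars."""
    for start in range(len(seq) - n + 1):
        yield seq[start:start + n]

def flip_rate(preds, n):
    """Per window: fraction of adjacent bar pairs where the prediction changes."""
    if n < 2:
        raise ValueError("window must contain at least 2 bars")
    return [sum(a != b for a, b in zip(w, w[1:])) / (n - 1)
            for w in windows(preds, n)]

def rule_density(rule_ids, n):
    """Per window: fraction of bars covered by the single most frequent rule."""
    if n < 1:
        raise ValueError("window must contain at least 1 bar")
    return [Counter(w).most_common(1)[0][1] / n
            for w in windows(rule_ids, n)]

# Made-up predictions and rule ids for 8 bars, window N = 4.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
rules = ["r1", "r1", "r2", "r1", "r3", "r3", "r3", "r3"]
print(flip_rate(preds, 4))
print(rule_density(rules, 4))
```

A model with a good contingency table but a flip rate close to 1 would be changing its mind on almost every bar, which is the instability described above.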

What do you think of these thoughts?

 

Once again I'm convinced that R is not my thing :) The syntax is barely highlighted, the code is unreadable, and errors are barely highlighted. The code itself and the language are not aesthetically pleasing.

(your counterarguments could go here)

Yes, you can train an algorithm in 3 lines instead of 5 in Python, and that's all. Readability would be better in Python. I don't see any advantage in the ML packages; it's all the same.

 
Maxim Dmitrievsky:

Once again I'm convinced that R is not my thing :) The syntax is barely highlighted, the code is unreadable, and errors are barely highlighted. The code itself and the language are not aesthetically pleasing.

(your counterarguments could go here)

Yes, you can train an algorithm in 3 lines instead of 5 in Python, and that's all. Readability would be better in Python. I don't see any advantage in the ML packages; it's all the same.

I'm in the middle of watching a Russian-language video course on programming in R :) For example, the ability to declare a global variable inside a function can really mess up the code, especially if the function is called more than once: you'll stumble around looking for bugs that the interpreter never reports.
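The same trap is easy to show in Python, where `global` plays roughly the role of R's `<<-` superassignment. This is only an illustrative sketch with made-up function names: the first version hides its state, so repeated calls return different results, while the second passes the state explicitly.

```python
counter = 0  # module-level (global) state

def next_id_hidden():
    """Mutates the global: the result depends on how many calls came before."""
    global counter
    counter += 1
    return counter

def next_id_explicit(current):
    """Same logic with the state passed in and returned; nothing hidden."""
    return current + 1

a = next_id_hidden()
b = next_id_hidden()          # same call, different result
c = next_id_explicit(5)
d = next_id_explicit(5)       # same call, same result
print(a, b, c, d)
```

No interpreter will flag the first version as an error, which is exactly why such bugs take time to find.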

What upsets me a lot in R is memory consumption: at the moment a 187 MB CSV file takes up 1.5 GB in memory for a tree (and 7.5 GB in Rattle for a random forest). Multithreading is implemented as separate processes that cannot work with shared memory, so instead of loading 6 processor cores I can load only 4, limited by the 8 GB of memory available.

How does python work in this case?

 
Aleksey Vyazmikin:

I'm in the middle of watching a Russian-language video course on programming in R :) For example, the ability to declare a global variable inside a function can really mess up the code, especially if the function is called more than once: you'll stumble around looking for bugs that the interpreter never reports.

What upsets me a lot in R is memory consumption: at the moment a 187 MB CSV file takes up 1.5 GB in memory for a tree (and 7.5 GB in Rattle for a random forest). Multithreading is implemented as separate processes that cannot work with shared memory, so instead of loading 6 processor cores I can load only 4, limited by the 8 GB of memory available.

How is python doing in this case?

I haven't looked at memory, since I've never used files that big :) But I have heard that early versions of R had problems with memory and with freeing it.

Python is naturally the more developed language in every respect, since it is used for a wide range of tasks.

I also don't get the fuss about plotting in R: it looks poor compared with Python. The RStudio IDE is also some kind of backwoods monster; how can anyone put up with that in 2018?

I wrote 100 lines of code and got lost: everything blurred into one unreadable mess with unhighlighted syntax :) In short, if you want to enjoy the process, use Python in VS Code or Jupyter notebooks.

Upd: trees and forests themselves take up a lot of memory, depending on the size of the dataset, the number of trees, and their depth. For example, my committee of 20 forests of 50 trees each, on a set of about 1000 examples, takes ~40 MB.
 

You're writing nonsense about R: you don't know it, you don't know how to use it, and you don't want to.

1. The speed and ease of debugging code in R is amazing compared with compiled languages: that is the advantage of an interpreter. There is very little highlighting for the simple reason that there is nothing to highlight - the code works almost immediately. The code is extremely compact. Writing walls of code is most likely the result of not knowing R itself and the functionality of its packages. And if it really has come to that, good style calls for splitting the code into functions; OOP is available as well.

2. Global variables should be used with care in any language. In R the need for them is very doubtful, because function arguments and return values can be an "object", and in R that means anything at all. Besides that, you can control the environment to which variable names are bound.

3. The graphics are among the best in the world - there is everything from the simplest plot to animations, at several levels: from primitives to specialized templates for statistics.

4. A comparison with Python is impossible: the two are roughly equal in popularity, but Python has a lot of "outside" users, mainly web developers, while R is a statistics system, our home turf, doubly so because ML is part of statistics. If we are talking about packages for us, R should be compared with other specialized packages (SAS...), but those are paid.

5. R is the algorithmic standard in statistics. Almost all modern publications include R code.


Last. R is backed by Microsoft, while Python's provenance is murky: in a neighboring thread here, people very skilled in programming could not agree on where to get the distribution. To me that is a verdict.


In programming, people very often choose what is more convenient over what is more useful and functional, but slinging mud to justify an extremely questionable choice - don't.

 
SanSanych Fomenko:

SanSanych, the only source of the distribution is the Python site :)

For statistics and machine learning there are IPython and Anaconda. Go to the Russian-language OpenDataScience community or watch the videos from Yandex. They have never even heard of R there. So what should be considered the standard? Try Python to form your own opinion and compare. Plus, knowing Python will, as you said yourself, let you do not only statistics but other things too, if necessary.

It's also an interpreted language, but it is highlighted perfectly and its syntax is checked on the fly, not only after the script is run, plus there is code folding, notebooks, and a bunch of other goodies.

 
Maxim Dmitrievsky:

I haven't looked at memory, since I've never used files that big :) But I have heard that early versions of R had problems with memory and with freeing it.

Python is naturally the more developed language in every respect, since it is used for a wide range of tasks.

I also don't get the fuss about plotting in R: it looks poor compared with Python. The RStudio IDE is also some kind of backwoods monster; how can anyone put up with that in 2018?

I wrote 100 lines of code and got lost: everything blurred into one unreadable mess with unhighlighted syntax :) In short, if you want to enjoy the process, use Python in VS Code or Jupyter notebooks.

Upd: trees and forests themselves take up a lot of memory, depending on the size of the dataset, the number of trees, and their depth. For example, my committee of 20 forests of 50 trees each, on a set of about 1000 examples, takes ~40 MB.

My impression so far is that R is a cool calculator. What really kills me is the lack of Russian-language help for the core functions; that matters a lot to me because languages are my weak point.

Visualization has its difficulties: for me, large trees don't display properly, and only conversion to PDF helps, which is something at least.