Machine learning in trading: theory, models, practice and algo-trading - page 2803

 
Vladimir Perervenko #:

The error says that undefined values (NA) have appeared in the correlation matrix and the findCorrelation function cannot work with it. Open the package and read the function description.

The scripts are messy and contain a lot of unnecessary intermediate results. The corrected script is below.

Explanation in order:

1. You don't need to load the "caret" package into the global scope. It is very heavy, pulling a lot of dependencies and data. You only need one function of it. You import it directly into the get.findCor function.

The tidyft package is a very fast dataframe manipulation package. Use it.
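To illustrate point 1: a minimal sketch of calling a single caret function without attaching the whole package. The original get.findCor is not shown in the thread, so the body below is a hypothetical reconstruction; the only caret call used is findCorrelation, reached through the `::` operator.

```r
# Hypothetical reconstruction of get.findCor (the thread does not show the original).
# caret is never attached with library(); only one function is called via `::`.
get.findCor <- function(x, cutoff = 0.9) {
  if (!requireNamespace("caret", quietly = TRUE)) {
    stop("package 'caret' must be installed")
  }
  # Pairwise-complete correlations, so missing cells do not poison whole columns
  cm <- cor(x, use = "pairwise.complete.obs")
  # Returns the names of the columns suggested for removal
  caret::findCorrelation(cm, cutoff = cutoff, names = TRUE)
}
```

Defining the function does not load caret; the namespace is loaded only when the function is actually called.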

Thanks!

Strange, where could NA come from? Those would be missing values in this case, wouldn't they?
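One common source of NA in a correlation matrix, even when the data has no missing cells, is a constant column: its standard deviation is zero, so the correlation is undefined. Whether that is what happened with this particular sample is an assumption; the toy data frame below is made up purely for illustration.

```r
# Toy data (hypothetical): 'const' never changes, so its correlations are undefined
df <- data.frame(a = c(1, 2, 3, 4), b = c(2, 4, 6, 9), const = c(5, 5, 5, 5))

# cor() warns that the standard deviation is zero and returns NA for that column
cm <- suppressWarnings(cor(df))
has_na <- anyNA(cm)

# Find zero-variance columns and drop them before building the matrix
zero_var <- vapply(df, function(col) sd(col, na.rm = TRUE) == 0, logical(1))
cm_clean <- cor(df[, !zero_var, drop = FALSE])
```

If the data itself contains missing cells, `cor(df, use = "pairwise.complete.obs")` is another option worth checking.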

I can't say anything about the script code - it's not mine, I'm just a user here.

I couldn't work out where to get the "tidyft" package: it is not in the list. I gather it has to be downloaded from GitHub, but I couldn't tell what exactly to download there.

 
Aleksey Vyazmikin #:

Well, I attached the logs in that post - I couldn't make sense of the error either.

I have never seen such errors in all the time I have been working with R, so the questions are for you and whatever you have installed there.

Aleksey Vyazmikin #:

Because that's how R works: one script needs an older version, another needs a newer one. It is very inconvenient; there isn't even proper backward compatibility.

I've worked with R for a long time and have never seen such problems, so the questions are for you and whatever you have installed there.

Aleksey Vyazmikin #:

Thank you. But where to specify the path with files?

Replace file.choose() with the path, but I find the file dialog more convenient.
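A small sketch of both options in one helper. The function name read_train and the temporary file are hypothetical; the read.csv arguments are the ones that appear in the log further down the thread.

```r
# Hypothetical helper: use an explicit path when given, otherwise open the file dialog
read_train <- function(path = NULL) {
  if (is.null(path)) path <- file.choose()  # interactive picker
  read.csv(path, header = TRUE, sep = ";", dec = ".")
}

# Demo with a throwaway file so the example runs without any dialog
tmp <- file.path(tempdir(), "train.csv")
write.table(data.frame(a = 1:3, b = c(0.5, 1.5, 2.5)),
            tmp, sep = ";", dec = ".", row.names = FALSE)
df1 <- read_train(tmp)
```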

Aleksey Vyazmikin #:

It's not clear here; besides, the whole point of it was the loop.

I don't understand the point of the loop. Do you need to drop correlated features from the set, or what? If so, the script already does that.

Aleksey Vyazmikin #:

I couldn't work out where to get the "tidyft" package: it is not in the list. I gather it has to be downloaded from GitHub, but I couldn't tell what exactly to download there.

Oops...fuck...

....

...

Well, read at least the first 50 lines of the R manual, why are you so stupid? And you blame all your failures on R, pseudo-compatibility, etc... It's annoying: after all this time you still don't know the basic things.

What list? Where did you look? Did you look at all?

or

install.packages("tidyft")

Well, elementary things...

 
mytarmailS #:

I have never seen such errors in all the time I have been working with R, so the questions are for you and whatever you have installed there.

I've never seen such problems, so the questions are for you and whatever you have installed there.

Replace file.choose() with the path, but I think the file dialog is more convenient.

I don't understand the point of the loop. Do you need to drop correlated features from the set, or what? If so, the script already does that.

So other packages can affect the packages that are specified in the script?

Everything I've tried is what has been published here.

What's unclear about the loop? You have to filter with different coefficients; the number of features dropped depends directly on the coefficient.
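The kind of sweep described here can be sketched in base R. The filter below is a simplified greedy stand-in, not caret's findCorrelation algorithm, and the function name, the demo data and the list of cutoffs are all assumptions made for illustration.

```r
# Simplified greedy filter (NOT caret's exact algorithm): while any pair exceeds
# the cutoff, drop the member of the worst pair that is more correlated overall.
drop_correlated <- function(df, cutoff) {
  cm <- abs(cor(df, use = "pairwise.complete.obs"))
  cm[is.na(cm)] <- 0   # treat undefined correlations as "not correlated"
  diag(cm) <- 0
  keep <- colnames(cm)
  repeat {
    sub <- cm[keep, keep, drop = FALSE]
    if (length(keep) < 2 || max(sub) <= cutoff) break
    idx <- which(sub == max(sub), arr.ind = TRUE)[1, ]  # most correlated pair
    pair <- keep[idx]
    worse <- pair[which.max(rowMeans(sub[pair, , drop = FALSE]))]
    keep <- setdiff(keep, worse)
  }
  keep
}

# Hypothetical demo data: x2 is almost a copy of x1, x3 is loosely related to x1
set.seed(42)
n <- 200
x1 <- rnorm(n)
demo <- data.frame(x1 = x1,
                   x2 = x1 + rnorm(n, sd = 0.1),
                   x3 = x1 + rnorm(n, sd = 1.0),
                   x4 = rnorm(n))

# One unattended pass per cutoff; each element holds the surviving column names
cutoffs <- c(0.9, 0.8, 0.7, 0.6)
kept <- lapply(cutoffs, function(cf) drop_correlated(demo, cf))
names(kept) <- cutoffs
```

Each element of `kept` can then be used to subset the sample and train a separate model, which is the automated version of changing the coefficient by hand.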

mytarmailS #:

Oops...yikes...

....

...

Well, read at least the first 50 lines of the R manual, why are you so stupid? And you blame all your failures on R, pseudo-compatibility, etc... It's annoying: after all this time you still don't know the basics.

What list? Where did you look? Did you look at all?

or

Well, it's elementary, isn't it?

Obviously, I tried to find and install the package. It is not in the list, and this is what the log shows:

Warning in install.packages :
  unable to access index for repository https://cran.rstudio.com/src/contrib:
  cannot open URL 'https://cran.rstudio.com/src/contrib/PACKAGES'
Warning in install.packages :
  package ‘tidyft’ is not available (for R version 4.0.5)
Warning in install.packages :
  unable to access index for repository https://cran.rstudio.com/bin/windows/contrib/4.0:
  cannot open URL 'https://cran.rstudio.com/bin/windows/contrib/4.0/PACKAGES'
> 

Apparently I just need to switch to yet another version again. I already have 4 of them installed, which is inconvenient.

 
In the end it turned out that I had to change the repository source (the CRAN mirror): I selected China and the installation started. Apparently it is sanctions from other countries...
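The same mirror switch can be done from code rather than through the menu. This is a sketch that assumes the Tsinghua mirror seen in the install log in this thread; any reachable CRAN mirror URL can be substituted.

```r
# Point this session at a specific CRAN mirror
# (URL taken from the install log in this thread; swap in any reachable mirror)
options(repos = c(CRAN = "https://mirrors.tuna.tsinghua.edu.cn/CRAN/"))

# Then install as usual (commented out here because it needs network access):
# install.packages("tidyft")
```

The setting lasts for the current session only; putting the `options()` line into `.Rprofile` would make it persistent.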
 
Aleksey Vyazmikin #:

What's unclear about the loop? You have to filter with different coefficients; the number of features dropped depends directly on the coefficient.

There's a coefficient you can set.

Let someone else answer all the other questions; I don't know, maybe I'm missing something... I'll tell you one thing: I have used R every day for many hours, for several years. I am on 3.6.3 and in the last year I have never (!) switched between versions. You run R three times a year and have 4 versions, and you are uncomfortable, there are incompatibilities, and so on... I don't know what's wrong, but I think the problem is not with R...

 
Vladimir Perervenko #:

As a check, I tested it on my own dataset using this script. Result:

I got some errors again :(

R version 4.0.5 (2021-03-31) -- "Shake and Throw"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Workspace loaded from F:/FX/R/.RData]

Loading required package: Matrix
Error: package or namespace load failed for ‘Matrix’ in .getGeneric(f, where, package):
 reached elapsed time limit
> source('~/.active-rstudio-document', encoding = 'UTF-8', echo=TRUE)

> #=====================================================================
> install.packages(c("tidyft"), dependencies=TRUE)
Installing package into ‘C:/Users/S_V_A/Documents/R/win-library/4.0’
(as ‘lib’ is unspecified)
trying URL 'https://mirrors.tuna.tsinghua.edu.cn/CRAN/bin/windows/contrib/4.0/tidyft_0.4.5.zip'
Content type 'application/zip' length 304623 bytes (297 KB)
downloaded 297 KB

package ‘tidyft’ successfully unpacked and MD5 sums checked

The downloaded binary packages are in
        C:\Users\S_V_A\AppData\Local\Temp\RtmpYZ5ExE\downloaded_packages

> require(tidyft)
Loading required package: tidyft
Error: package or namespace load failed for ‘tidyft’:
 .onLoad failed in loadNamespace() for 'fstcore', details:
  call: setnrofthreads(logical_cores)
  error: function 'Rcpp_precious_remove' not provided by package 'Rcpp'

> #--get df1------------------------------------------------------------
> way <-         "D:\\FX\\MT5_CB\\MQL5\\Files\\Po_Vektoru_TP_0_SL_0\\EURUSD_0 ..." ... [TRUNCATED] 

> df1 = read.csv(paste0(way, "train.csv"), header = TRUE, sep = ";",dec = ".")

> #df1 = fread(paste0(way, "train1.csv"))
> #fst::write_fst(df1, "train1.fst")
> #-----archiv--------------------------------
> ft <- as_fst(df1) #
Error in as_fst(df1) : could not find function "as_fst"
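The 'Rcpp_precious_remove' message usually points to an installed Rcpp that is older than what the freshly downloaded binary package was built against. A hedged sketch of a version check follows; the helper name needs_update is hypothetical, and the 1.0.7 threshold is an assumption about when that symbol was introduced.

```r
# Hypothetical helper: TRUE if a package is missing or older than min_version
needs_update <- function(pkg, min_version) {
  !requireNamespace(pkg, quietly = TRUE) ||
    utils::packageVersion(pkg) < min_version
}

# Assumption: the missing symbol first appeared in Rcpp 1.0.7
if (needs_update("Rcpp", "1.0.7")) {
  message("Rcpp is missing or too old; try install.packages(\"Rcpp\") in a fresh session")
}
```

The later "could not find function as_fst" error is simply a consequence of tidyft having failed to load.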
 
mytarmailS #:

There's a coefficient you can set.

Let someone else answer all the other questions; I don't know, maybe I'm missing something... I'll tell you one thing: I have used R every day for many hours, for several years. I am on 3.6.3 and in the last year I have never (!) switched between versions. You run R three times a year and have 4 versions, and you are uncomfortable, there are incompatibilities, and so on... I don't know what's wrong, but I think the problem is not with R...

You can, but I am in favour of automating the process: start it and go do something else, rather than start it and wait for each step to finish, which adds a manual factor to tweak.

I have version 3.5 for my working script; all the other versions were installed just for the code posted by people here. New versions do not work with old code (packages).

 
Aleksey Vyazmikin #:

You can, but I'm in favour of automating the process: start it and go do something else, rather than start it and wait for each stage to finish, which adds a manual factor to tweak.

I have version 3.5 for my working script; all the others were installed just for the code posted by people here. New versions do not work with old code (packages).

So what's the point of this sweep?

Filter out features with a correlation greater than 0.9.

Filter out features with a correlation greater than 0.8.

Filter out features with a correlation greater than 0.7.

Filter out features with a correlation greater than 0.6.

....

..

I don't understand the point of this; just filter once and that's it.

========================================

Besides, it's well known that tree-based models don't care about correlated features.

Just train the model, select the important features from the model and don't worry...

Don't do pointless things, and don't waste your own time and other people's.

 
Vladimir Perervenko #:


I had to install more packages

#install.packages(c("tidyft"),  dependencies=TRUE)
#install.packages(c("Rcpp"),  dependencies=TRUE)
#install.packages(c("import"),  dependencies=TRUE)

The script went off to think for a long time and took 6 gigabytes of memory, while the sample itself is under a gigabyte; that seems like excessive consumption.

Still waiting.

 
mytarmailS #:

So what's the point of this sweep?

Filter out features with a correlation greater than 0.9.

Filter out features with a correlation greater than 0.8.

Filter out features with a correlation greater than 0.7.

Filter out features with a correlation greater than 0.6.

....

..

I don't see what the point is; you just filter once and you're done.

What do you mean, "once and done"? There are a lot of samples, so a systematic approach is required. If it proves useful, I will implement it in MQL5 so that it works out of the box, and hopefully faster.

mytarmailS #:

========================================

Besides, it is known that tree-based models don't care about correlated features.

Just train the model, select the important features from the model and don't worry...

Don't do pointless things, and don't waste your own time and other people's.

CatBoost randomly selects a subset of predictors at each split or tree-building iteration, depending on the settings. This means that strongly correlated predictors have a higher chance of ending up in the random subset; or rather, not the predictors themselves but the information they carry.
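A back-of-the-envelope check of that argument, under the simplifying assumption that k predictors are drawn uniformly without replacement from n (this is a toy model, not a description of CatBoost's internals). The function name and the numbers 10, 5 and 3 are made up for illustration.

```r
# Probability that a random subset of k predictors (out of n_features) contains
# at least one member of a group of group_size mutually correlated predictors
p_group_hit <- function(n_features, group_size, k) {
  1 - choose(n_features - group_size, k) / choose(n_features, k)
}

p_corr_group <- p_group_hit(10, 5, 3)  # information carried by 5 correlated copies
p_single     <- p_group_hit(10, 1, 3)  # information carried by one predictor only
```

Under this toy model, the information duplicated across five predictors is far more likely to be available at any given split than information carried by a single predictor.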

I'm running it now, partly for this forum thread, to see whether it makes sense for that sample.

At the very least, I expect this approach to make the models more diverse, which should let them describe more situations in the sample (higher Recall) and then be used together as a set of models.