Machine learning in trading: theory, models, practice and algo-trading - page 2960

 
mytarmailS #:

just like with normal ML: we train, then watch the train set with one eye and the test set with the other.

Well, we get a random markup, which is then tested on new data. The best markup variants have to be selected through the same genetic search, but that means hundreds and thousands of model retrainings.

And in my last article, about the metamodel, I reduced the number of iterations to a minimum. After about 10 iterations you can get correct trades, but there are usually few of them; the rest are filtered out. You can pick intermediate variants with more trades, but the result is worse.
 
mytarmailS #:

Here's the code for training a Random Forest with optimization-algorithm (AO) tools.

The fitness function (OUR OBJECTIVE) is to find smooth, stable profit growth, namely the maximum correlation between the balance curve and a straight upward line.


Here is the code of the profit and FF calculation functions



Here is the result: the AO found a target for the ML model (AMO) such that, trading its signals, we get smooth profit growth
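For readers who want to see the idea in Python rather than R, here is a minimal sketch of such a fitness function (my own illustration, not the poster's code, and the per-trade profit vectors are invented): it scores a strategy by the Pearson correlation between its cumulative balance curve and a straight upward line.

```python
import numpy as np

def fitness(trade_profits):
    """Fitness = correlation between the balance curve and a straight line.

    trade_profits: per-trade profit/loss values produced by the model's signals.
    A value close to 1.0 means smooth, steady equity growth.
    """
    balance = np.cumsum(trade_profits)  # equity curve
    if np.std(balance) == 0 or balance[0] == balance[-1]:
        return 0.0  # flat curve: correlation is undefined, score it zero
    # Straight reference line from the first to the last balance value
    line = np.linspace(balance[0], balance[-1], len(balance))
    return float(np.corrcoef(balance, line)[0, 1])

# A steadily profitable series scores near 1; a noisy one scores lower.
steady = np.full(100, 1.0)
noisy = np.random.default_rng(0).normal(0, 1, 100)
print(fitness(steady), fitness(noisy))
```

A genetic or other optimizer can then maximize this score over candidate markups.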


If it's not too much trouble, could you give me this example in Python?

 
Does anyone know what an ML library usually does with NaN values?
Does it reject the whole dataset, or skip those rows and exclude them from the calculations?

And the same question for +/- INFINITY values?
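For the Python side (an illustrative check of my own, not from the thread): scikit-learn, for one, neither rejects the dataset silently nor skips rows; most estimators raise a ValueError on NaN or infinite inputs, leaving the handling to the user.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [np.nan], [4.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

# Most scikit-learn estimators validate inputs and refuse NaN/inf outright.
try:
    LinearRegression().fit(X, y)
except ValueError as e:
    print("scikit-learn refused the data:", e)

# The usual remedy: drop (or impute) the offending rows yourself.
mask = np.isfinite(X).all(axis=1)
model = LinearRegression().fit(X[mask], y[mask])
print("coef after dropping bad rows:", model.coef_)
```

Other libraries differ (some gradient-boosting implementations handle NaN natively), so it really does depend on the specific implementation.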
 
Maxim Dmitrievsky #:
There is no way to trade arbitrage there if the instruments are not cointegrated to begin with. You can only find stretches of time where such moments occur. And what is the point of distorting symbol quotes when you can't open trades on them?

I'm just showing by example that not everything can be expressed through a ready-made target.

Maxim Dmitrievsky #:
Well, we get a random markup, which is then tested on new data. The best markup variants have to be selected through the same genetic search, but that means hundreds and thousands of model retrainings.

That is exactly what is being done; I just didn't want to complicate the example. I want a minimum of code and maximum clarity.

Elvin Nasirov #:

If it's not too much trouble, could you give me this example in Python?

As soon as I learn to write in Python, I'll translate it right away ))))

Forester #:
Does anyone know what an ML library usually does with NaN values?
Does it reject the whole dataset, or skip those rows and exclude them from the calculations?

And the same question for +/- INFINITY values?

That's a strange question; it all depends on the specific implementation of the ML algorithm (AMO)

 
Elvin Nasirov #:

If it's not too much trouble, could you give me this example in Python?

Pick a good Python library with an optimization algorithm (AO), learn how to work with it, and you'll understand at once what needs to be done.
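As one possible starting point (the library choice is mine, not the poster's): SciPy ships differential evolution, a genetic-style optimizer that can drive a fitness function like the one discussed earlier. A toy run on an invented objective:

```python
import numpy as np
from scipy.optimize import differential_evolution

# differential_evolution minimizes, so return the negative of what we maximize.
def neg_fitness(params):
    x, y = params
    return (x - 3.0) ** 2 + (y + 1.0) ** 2  # known optimum at x=3, y=-1

result = differential_evolution(
    neg_fitness,
    bounds=[(-10, 10), (-10, 10)],  # search box for each parameter
    seed=42,                        # reproducible run
)
print(result.x)  # should land close to [3, -1]
```

In the real task the parameter vector would encode the markup/target, and `neg_fitness` would train the model and score its balance curve.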

 
Forester #:
Does anyone know what an ML library usually does with NaN values?
Does it reject the whole dataset, or skip those rows and exclude them from the calculations?

And the same question for +/- INFINITY values?

In R there is usually an na.action argument that defines what to do with them. I have always tried to avoid needing it (by preparing the data accordingly), so I don't really know the correct way to use it.

 
Aleksey Nikolayev #:

In R there is usually an na.action argument that defines what to do with them. I have always tried to avoid needing it (by preparing the data accordingly), so I don't really know the correct way to use it.

Thanks!!! I've read up and taken into account other people's experience with this.
I think it's better to drop a column if it contains NaNs.
In my case, just one column contained several hundred NaNs and INFs; something went wrong when that feature was built.

Throwing out rows, I think, is wrong, because those rows can still contribute to the overall result through the other features.
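In pandas, dropping such a column is a one-liner (an illustrative sketch with invented column names); converting infinities to NaN first lets a single rule catch both:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "good": [0.1, 0.2, 0.3, 0.4],
    "broken": [1.0, np.nan, np.inf, 2.0],  # the faulty feature
})

# Treat +/-inf like NaN, then drop any column that contains either.
clean = df.replace([np.inf, -np.inf], np.nan).dropna(axis=1)
print(list(clean.columns))  # only the healthy feature survives
```

Using `dropna(axis=0)` instead would discard rows, which is exactly what the post argues against here.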

 
Forester #:

Thanks!!! I've read up and taken into account other people's experience with this.
I think it's better to drop a column if it contains NaNs.
In my case, just one column contained several hundred NaNs and INFs; something went wrong when that feature was built.

Throwing out rows, I think, is wrong, because those rows can still contribute to the overall result through the other features.

You can interpolate, or replace them with the mean.
Are you writing in R?
 
mytarmailS #:
You can interpolate, or replace them with the mean

Substitution by the mean comes from statistics, for cases where data were simply missing: the gap had to be flagged somehow, so NaN was chosen as the marker, with subsequent replacement by the mean.

In my case a NaN means an error in data preparation; I get it, for example, after /0 (though sometimes I get +/- INF instead). I don't want to treat erroneous data as normal, let alone as the average.
Errors should be corrected (I print a message that the column contains NaNs and is being skipped). Although who reads those printouts...? )))

 
Forester #:

Substitution by the mean comes from statistics, for cases where data were simply missing: the gap had to be flagged somehow, so NaN was chosen as the marker, with subsequent replacement by the mean.

In my case a NaN means an error in data preparation; I get it, for example, after /0 (though sometimes I get +/- INF instead). I don't want to treat erroneous data as normal, let alone as the average.
Errors should be corrected (I print a message that the column contains NaNs and is being skipped). Although who reads those printouts...? )))

Well, then there's nothing to ask; what can you do but throw it away?


Just in case, here is an example of replacing NaNs, since I had already written one.

m <- round(matrix(rnorm(100), ncol = 5, nrow = 10), 2)
m[sample(1:nrow(m), 5, replace = TRUE), sample(1:ncol(m), 5, replace = TRUE)] <- NaN
m

       [,1]  [,2]  [,3]  [,4]  [,5]
 [1,] -1.17 -0.10 -0.22 -1.49 -1.23
 [2,]   NaN   NaN  0.85   NaN -2.13
 [3,]  0.60  0.06  1.50 -0.31  0.05
 [4,]   NaN   NaN -0.41   NaN -0.43
 [5,]  1.17  0.86 -0.51  1.43 -0.07
 [6,] -0.44  0.79 -0.61  0.68  0.11
 [7,]  0.85  0.74  0.31 -1.16 -0.38
 [8,]   NaN   NaN  1.09   NaN -0.36
 [9,]   NaN   NaN -0.58   NaN -1.27
[10,] -0.19 -0.42  0.07  0.31  1.92

and the solution:

library(imputeTS)                   # provides na_ma(): moving-average imputation
m2 <- round(apply(m, 2, na_ma), 2)  # impute each column, then round
m2

       [,1]  [,2]  [,3]  [,4]  [,5]
 [1,] -1.17 -0.10 -0.22 -1.49 -1.23
 [2,] -0.14  0.12  0.85 -0.57 -2.13
 [3,]  0.60  0.06  1.50 -0.31  0.05
 [4,]  0.49  0.49 -0.41  0.27 -0.43
 [5,]  1.17  0.86 -0.51  1.43 -0.07
 [6,] -0.44  0.79 -0.61  0.68  0.11
 [7,]  0.85  0.74  0.31 -1.16 -0.38
 [8,]  0.37  0.51  1.09 -0.14 -0.36
 [9,]  0.14  0.14 -0.58  0.04 -1.27
[10,] -0.19 -0.42  0.07  0.31  1.92
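Since Python was requested earlier in the thread, here is a rough pandas equivalent of the R example above (linear interpolation instead of imputeTS's moving average, so the filled values will differ; the seed and layout are mine):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
m = pd.DataFrame(rng.standard_normal((10, 5)).round(2))
m.iloc[[1, 3, 7, 8], [0, 1, 3]] = np.nan  # knock out a block, as in the R code

# Linear interpolation per column; limit_direction="both" also fills
# any leading or trailing gaps from the nearest valid value.
m2 = m.interpolate(limit_direction="both").round(2)
print(m2.isna().sum().sum())  # 0: no NaNs remain
```

`m.fillna(m.mean())` would give the mean-substitution variant instead.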