Machine learning in trading: theory, models, practice and algo-trading - page 2642

 
mytarmailS #:
What's up?
I haven't done it yet. My logic is complicated, so I need to figure out where to fit it in.
 

This is the kind of feature I got. The curves are correlated because they are all built from increments of similar order.


Example formula: price - MA(n) * std(n) * coef, where MA and std are the moving average and the standard deviation over a window of arbitrary length n, and coef is a levelling coefficient: the larger it is, the more stationary the series. In this case it is 50000.
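A minimal R sketch of the formula above, taken literally as written. The function name `stationarize`, the window `n = 20` and the synthetic price are assumptions for illustration only; they are not from the post.

```r
# Feature sketch: price - MA(n) * std(n) * coef, read literally.
# `stationarize`, `n` and the synthetic price below are illustrative assumptions.
stationarize <- function(price, n = 20, coef = 20) {
  # Trailing moving average over the last n points (NA for the first n - 1)
  ma <- as.numeric(stats::filter(price, rep(1 / n, n), sides = 1))
  # Trailing standard deviation over the same window
  win <- embed(price, n)                       # each row holds n consecutive prices
  sdv <- c(rep(NA, n - 1), apply(win, 1, sd))
  price - ma * sdv * coef
}

set.seed(1)
price <- 100 + cumsum(rnorm(500))              # synthetic random-walk price
feat  <- stationarize(price, n = 20, coef = 20)
head(na.omit(feat))
```

The first n - 1 values are NA because the window is not yet full, so they have to be dropped before training.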

For some reason my ML model is more stable on this than on plain increments.

with coef 20.

It turns out to be something like fractional differencing, but it is computed instantly.

Maybe someone can think of other options

 
Maxim Dmitrievsky #:

This is the kind of feature I got. The curves are correlated because they are all built from increments of similar order.

What are these curves, anyway?

Maxim Dmitrievsky #:

maybe someone will come up with other options

Here we go, symbolic regression to the rescue

 
mytarmailS #:

What are these curves, anyway?

Well, symbolic regression to the rescue.

The formula is written above.
Suggest your own way of bringing the quotes closer to a stationary series using a coefficient.
 
Maxim Dmitrievsky #:
The formula is written above.
Suggest your own way of bringing the quotes closer to a stationary series using a coefficient.

I'll throw something together and show you a simpler example without SR.

 
mytarmailS #:

I'll just throw something together and show you a simpler example without SR.

One variant isn't enough here; you need to construct features, train on them and check the results.

And the features shouldn't be arbitrary but at least somewhat meaningful, otherwise you can go on forever.
 
Maxim Dmitrievsky #:

With SR it takes more time to code and plan, so for simplicity, speed and clarity I made it simple...

Instead of creating a formula in real time, I create a "formula result" - a curve and then use it as a target for the model.


I create a fitness function that maximises the correlation between the price and the model output, but the model output has a limitation: it can only be between -1 and 1.

That is, we get a series that should correlate with the price but is "clamped" within the range of stationary values. If you need real stationarity in the Dickey-Fuller sense and so on, you just change the fitness function to whatever you need.
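A hedged sketch of what "changing the fitness function" could look like. `df_stat` computes a simplified Dickey-Fuller t-statistic from the regression of diff(y) on lagged y (no lagged differences, so it is not the full augmented test), and `score` mixes it with the correlation. Both function names and the weight `w` are assumptions for illustration; a packaged test such as `adf.test` from the tseries package could be used instead.

```r
# Simplified Dickey-Fuller statistic: t-value of the lagged level in
# diff(y) ~ lag(y). More negative means stronger evidence of stationarity.
# This is NOT the full augmented test (no lagged differences are included).
df_stat <- function(y) {
  dy   <- diff(y)
  ylag <- y[-length(y)]
  fit  <- lm(dy ~ ylag)
  summary(fit)$coefficients["ylag", "t value"]
}

# Hypothetical score to maximise: correlation with the price plus a reward
# for a strongly negative DF statistic. The weight `w` is an arbitrary assumption.
score <- function(out, price, w = 0.01) {
  cor(out, price) - w * df_stat(out)
}

set.seed(1)
rw <- cumsum(rnorm(500))   # random walk: non-stationary
wn <- rnorm(500)           # white noise: stationary
c(random_walk = df_stat(rw), white_noise = df_stat(wn))
```

In a fitness function that scores a model output `rf` against `P[ts]`, the last line would then become something like `return(score(rf, P[ts]))`.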



Create the data and train the model with a genetic algorithm:

par(mar=c(2,2,2,2))
# For simplicity, generate a synthetic price (a random walk)
P <- cumsum(rnorm(300))
plot(P,type="l")

# Sliding-window (Hankel) matrix: each row holds n consecutive values, oldest first
hankel <- function(x,n) embed(x, n)[ ,n:1]
# Training data X: each row is a window of 11 prices, rebased to its first value
X <- t(apply(hankel(P,11),1,function(x) cumsum(diff(x))))
# Align the price with the last point of each window
P <- tail(P,nrow(X))

# Index ranges for train, test and validation
tr <- 1:100
ts <- 1:200        # rows scored by the fitness function; note that they include the train rows
al <- 1:nrow(X)

library(randomForest)
# Fitness function: the GA searches for a target Y for the forest such that
# the forest output is maximally correlated with the price
fit <- function(Y){
  set.seed(123)
  rf <- predict(randomForest(Y~.,X[tr,],ntree=100), X[ts,])
  return( cor(rf, P[ts]) )
}

library(GA)
GA <- ga(type = "real-valued", 
         fitness =  fit,
         lower = rep(-1,100), 
         upper = rep(1,100), 
         popSize = 100,
         maxiter = 100,
         run = 40)
plot(GA)
# Best target found by the GA (last row if several solutions are returned)
GA_Y <- tail(GA@solution,1)[1,]

Test the model:

# Refit the forest on the GA target to get the model that does what we need
set.seed(123)
rf <- predict(randomForest(GA_Y~.,X[tr,],ntree=100), X[al,])

# Price vs. raw model output
layout(1:2)
plot(P,type="l",main="original price") ; abline(v=c(100,200),lty=2,col=c(3,4))
plot(rf,type="l",main="model out") ; abline(v=c(100,200),lty=2,col=c(3,4))
abline(h=0,col=3,lty=3)

# Price vs. cumulative sum of the model output
layout(1:2)
plot(P,type="l",main="original price") ; abline(v=c(100,200),lty=2,col=c(3,4))
plot(cumsum(rf),type="l",main="model out cumsum") ; abline(v=c(100,200),lty=2,col=c(3,4))

The vertical lines separate train, test and validation.


As you can see in the picture, the model has learnt to take the price as input and to output a stationary series that correlates with the price.

For better clarity we can take the cumulative sum of the model output.


Like this )))) And you don't need to invent anything, everything can be done automatically.

 
mytarmailS #:

With SR you need more time for code and planning, so for simplicity, speed and clarity I made it simpler.

Instead of creating a formula in real time, I create a "formula result" - a curve, and then use it as a target for the model.


I create a fitness function that maximises the correlation between price and model output, but the model output has a limitation: it can only be between -1 and 1.

That is, we get a series that should correlate with the price but is "clamped" within the range of stationary values. If we need real stationarity in the Dickey-Fuller sense and so on, we simply change the fitness function to what we need.



Create the data and train the model with a genetic algorithm

Test the model

The vertical lines separate train, test and validation.


As you can see in the picture, the model has learnt to take the price as input and to output a stationary series that correlates with the price.

For better clarity we can take the cumulative sum of the model output.


Like this )))) And you don't need to invent anything, everything can be done automatically.

Interesting, I'll try to think about it later. We're on Bloody Marys today, so it's hard to think.
 
Maxim Dmitrievsky #:
Interesting, I'll try to think about it later. We're on Bloody Marys today, so it's hard to think.

I wonder how many lines it would take in Python...

Probably thousands in MQL ))))

 
mytarmailS #:

I wonder how many lines it would take in Python...

Probably thousands in MQL ))))

Yeah, about the same amount, maybe a little more.
It's probably overfitting, though; on new data it will show different curves.
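
One rough way to check the overfitting concern is to compare the correlation between model output and price separately on each segment. This is only a sketch: `segment_cor` is a hypothetical helper, and the synthetic `out` stands in for the forest output so that the snippet runs on its own. With the objects from the example above, the call would be `segment_cor(rf, P)`.

```r
# Correlation between model output and price, computed per segment.
# `segment_cor` and the synthetic `out` below are illustrative stand-ins.
segment_cor <- function(out, price, breaks = c(100, 200)) {
  idx <- seq_along(price)
  seg <- cut(idx, c(0, breaks, length(price)),
             labels = c("train", "test", "validation"))
  sapply(split(idx, seg), function(i) cor(out[i], price[i]))
}

set.seed(1)
price <- cumsum(rnorm(290))
out   <- tanh(price / 10) + rnorm(290, sd = 0.1)  # stand-in for the model output
segment_cor(out, price)
```

Since the fitness function in the example scores rows 1:200, only the rows after 200 are truly unseen, so a sharp drop in the "validation" value would support the overfitting explanation.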