Machine learning in trading: theory, models, practice and algo-trading - page 3041

 
СанСаныч Фоменко #:

It is not clear how to compare. In theory, upSample should lead to overtraining because it duplicates identical rows, and that overtraining is not immediately detectable.

Why not? Train, test, validate and go.
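
To make the concern concrete, a minimal sketch (assuming caret's upSample; the thread doesn't name the implementation): after duplication, identical rows can land on both sides of a naive random split, so the test score is inflated and the overtraining stays hidden.

library(caret)
set.seed(1)
X  <- data.frame(f1 = rnorm(100), f2 = rnorm(100))
y  <- factor(c(rep("a", 80), rep("b", 20)))      # imbalanced classes
up <- upSample(x = X, y = y)                     # duplicates minority-class rows

idx <- sample(nrow(up), 0.7 * nrow(up))          # naive random train/test split
# test rows identical to a row seen earlier (mostly training rows)
dup <- duplicated(rbind(up[idx, 1:2], up[-idx, 1:2]))[-(1:length(idx))]
sum(dup)  # > 0: the same observations sit in both train and test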

 
mytarmailS #:

Why not? Train, test, validate and go.

Too bad, the avatar changed.

 
СанСаныч Фоменко #:

Too bad, the avatar changed.

Why?

 
Militarism has come to this cute, cuddly thread too.
 
Maxim Dmitrievsky #:
Militarism has come to this cute, cuddly thread too.

So is that a sniper he's got?

 

I am trying to linearise the space, or rather to map a non-linear space into a more linear one. I'm interested in the HLLE (Hessian Locally Linear Embedding) algorithm.


https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction


It looks pretty interesting. It seems to me an ML model will find it easier to recognise a sketch like this than the raw price.

Can anyone tell me why there is such a nasty distortion of colours in the animation when I upload it here?


So this is what the price transformed by the algorithm looks like.


For anyone who wants to play around:

p <- cumsum(rnorm(400, sd = 0.01)) + 100        # random-walk "price"
p <- stats::embed(p, dimension = 20)[, 20:1]    # sliding windows of 20 bars
plot(p[, 20], t = "l", pch = 20)                # last column = current price

library(dimRed)
emb <- embed(p, "HLLE", knn = 15)               # Hessian LLE embedding to 2D

pp <- emb@org.data[, 20]                        # original price series
xx <- emb@data@data                             # embedded 2D coordinates

par(mar = c(2,2,2,2), mfrow = c(1,2))
plot(pp, t = "l", pch = 20)
plot(xx, t = "p", pch = 20)

# animate: highlight each bar on the price and its image in the embedding
for(i in 1:nrow(xx)){
  Sys.sleep(0.05)

  plot(pp, t = "l", pch = 20)
  points(i, pp[i], col = 2, lwd = 6)
  plot(xx, t = "p", lwd = 2, pch = 20)
  points(xx[i, 1], xx[i, 2], col = 2, lwd = 6)
}
Files:
anigif.zip  6455 kb
 

Well, manifold learning has the same problems as PCA: you'll have a hard time fitting a non-stationary series.
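
A quick way to see the problem, reusing the dimRed/HLLE setup from above (my sketch, not the thread's code): HLLE has no fixed mapping for new points, so re-embedding once a few new bars arrive can shift the coordinates of all the old windows.

library(dimRed)
set.seed(4)
p  <- cumsum(rnorm(300, sd = 0.01)) + 100
sw <- stats::embed(p, dimension = 10)[, 10:1]
e1 <- embed(sw[1:200, ], "HLLE", knn = 15)@data@data  # embed first 200 windows
e2 <- embed(sw[1:210, ], "HLLE", knn = 15)@data@data  # same windows + 10 new bars
# coordinates of the shared 200 windows before and after: identical at best up
# to rotation/sign, and often visibly different
plot(e1[, 1], e2[1:200, 1], pch = 20)
abs(cor(e1[, 1], e2[1:200, 1]))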

 
Maxim Dmitrievsky #:

Well, manifold learning has the same problems as PCA: you'll have a hard time fitting a non-stationary series.

What is there to fit? There's nothing to fit: the current pattern is simply mapped into a different space, and that's all.

 

Made a nicer picture:

p <- cumsum(rnorm(300, sd = 0.01)) + 100        # random-walk "price"
n <- 10
p <- stats::embed(p, dimension = n)[, n:1]      # sliding windows of n bars

library(dimRed)
emb <- embed(p, "HLLE", knn = 15)               # Hessian LLE embedding to 2D
pp <- emb@org.data[, n]                         # original price series
xx <- emb@data@data                             # embedded coordinates (HLLE1, HLLE2)


gg <- cbind.data.frame(time = 1:length(pp), xx, pp)
library(patchwork)
library(ggplot2)
# top: price coloured by time; bottom: the same bars in HLLE coordinates
p1 <- ggplot(gg, aes(x = time, y = pp, col = time)) +
  geom_point() +
  scale_color_gradientn(colours = rainbow(4))
p2 <- ggplot(gg, aes(x = HLLE1, y = HLLE2, col = time)) +
  geom_point() +
  scale_color_gradientn(colours = rainbow(4))
p1 + p2 + plot_layout(nrow = 2)


 

Extracting a few "good" rules/strategies from the data...

The full pipeline:

1) data transformation and normalisation

2) model training

3) rule extraction

4) rule filtering

5) visualisation

Ready-made code; just substitute your own data:

close <- cumsum(rnorm(10000, sd = 0.00001)) + 100      # random-walk "price"
par(mar = c(2,2,2,2))
plot(close, t = "l")

sw <- stats::embed(x = close, dimension = 10)[, 10:1]  # sliding-window data
X  <- t(apply(sw, 1, scale))                           # normalise each window (z-score)

dp <- c(diff(close), 0)[10:length(close)]              # next-bar move, aligned to window ends
Y  <- as.factor(ifelse(dp >= 0, 1, -1))                # classification target: up / down

tr <- 1:500                                            # training subset
library(inTrees)  # ?inTrees::getRuleMetric()
library(RRF)

rf   <- RRF(x = X[tr,], y = Y[tr], ntree = 100)        # regularised random forest
rule <- getRuleMetric(unique(extractRules(RF2List(rf), X[tr,])), X[tr,], Y[tr])
rule <- data.frame(rule, stringsAsFactors = FALSE)
for(i in c(1,2,3,5)) rule[,i] <- as.numeric(rule[,i])  # len, freq, err, pred -> numeric
buy_rules <- rule$condition[rule$pred == 1]            # keep only the "buy" rules

# grey: equity curves of every buy rule over the whole sample
plot(x = 1:1000, y = rep(NA, 1000), ylim = c(-0.001, 0.001))
for(i in 1:length(buy_rules)){
   cum_profit <- cumsum( dp[ eval(str2expression(buy_rules[i])) ] )
   lines(cum_profit, col = 8, lwd = 1)
}
# coloured: rules whose equity curve is close to a straight rising line
for(i in 1:length(buy_rules)){
  cum_profit <- cumsum( dp[ eval(str2expression(buy_rules[i])) ] )
  ccor <- cor(cum_profit, 1:length(cum_profit))
  if(ccor >= 0.9)  lines(cum_profit, col = i, lwd = 2)
}
abline(h = 0, col = 2, lty = 2)



The question is: if "working" trading systems can be found even in random data, how can one prove that the systems found on real data are not random as well?

Alexey is working on this here; I wonder, is there a statistical test for tasks like this?
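
One standard approach is a permutation test in the spirit of White's Reality Check: run the same mining procedure on many shuffled return series and see how often pure noise yields a statistic as good as the one found on real data. A hedged sketch, where mine_best is a hypothetical stand-in for the whole RRF/inTrees pipeline above:

set.seed(5)
mine_best <- function(dp, k = 200) {
  lag1 <- c(0, dp[-length(dp)])                 # previous bar's move
  best <- -Inf
  for (j in 1:k) {                              # "mine" k random threshold rules
    profit <- cumsum(dp[lag1 > sample(dp, 1)])
    if (length(profit) > 10 && sd(profit) > 0)
      best <- max(best, cor(profit, seq_along(profit)))
  }
  best
}
dp   <- rnorm(5000, sd = 1e-5)                  # stand-in returns; use your real dp
real <- mine_best(dp)                           # best equity-curve statistic found
null <- replicate(100, mine_best(sample(dp)))   # same mining on shuffled returns
mean(null >= real)                              # permutation p-value

If that fraction is not small, the "working" rules are indistinguishable from what the mining procedure digs out of noise.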
