Machine learning in trading: theory, models, practice and algo-trading - page 3041

СанСаныч Фоменко #:

It is not clear how to compare. In theory, upSample should lead to overfitting, because it duplicates identical data, and that overfitting is not immediately detectable.

Why not? Train, test, validate and go.
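
A possible illustration of why the overfitting can go unnoticed, assuming (this is not stated in the thread) that the duplication is done before the train/test split. The data, the 30% test share and all names below are made up for the sketch; it only counts the rows that end up on both sides of the split.

```r
set.seed(1)

# Hypothetical imbalanced sample: 20 minority rows, 180 majority rows
n <- 200
y <- factor(c(rep(1, 20), rep(-1, n - 20)))
minority <- which(y == 1)

# Draw k row ids with replacement (avoids the sample() pitfall for length-1 input)
draw <- function(ids, k) ids[sample.int(length(ids), k, replace = TRUE)]

# Variant A: upsample first, split afterwards
rows_up  <- c(seq_len(n), draw(minority, sum(y == -1) - length(minority)))
test_pos <- sample.int(length(rows_up), round(0.3 * length(rows_up)))
# only duplicated (minority) rows can appear on both sides of the split
leaked   <- intersect(rows_up[test_pos], rows_up[-test_pos])
length(leaked)            # rows present in both train and test

# Variant B: split first, upsample only the training part
test_id   <- sample.int(n, round(0.3 * n))
train_id  <- setdiff(seq_len(n), test_id)
train_min <- train_id[y[train_id] == 1]
train_maj <- train_id[y[train_id] == -1]
train_up  <- c(train_id, draw(train_min, length(train_maj) - length(train_min)))
overlap   <- intersect(train_up, test_id)
length(overlap)           # 0 by construction
```

With variant B the usual train/test/validate check stays honest; with variant A the test score is flattered by rows the model has already seen.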

mytarmailS #:

Why not? Train, test, validate and go.

Too bad, the avatar changed.

СанСаныч Фоменко #:

Too bad, the avatar changed.


Militarism has come to this cute, cuddly topic too.
Maxim Dmitrievsky #:
Militarism has come to this cute, cuddly topic.

So is that a sniper he's got?


I am trying to linearise the space, that is, to map a non-linear space into a more linear one. I'm interested in the HLLE (Hessian locally linear embedding) algorithm.

It looks pretty interesting. It seems to me that an ML algorithm will find such a picture easier to recognise than the raw price.

Can anyone tell me why there is such a nasty distortion of colours in the animation when I upload it here?

So this is what the price looks like after the algorithm has transformed it.

For anyone who wants to play around:

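library(dimRed)  # embed(..., "HLLE"); stats::embed is called explicitly below

p <- cumsum(rnorm(400,sd = 0.01))+100         # random-walk "price"
p <- stats::embed(p,dimension = 20)[,20:1]    # sliding windows of 20 bars, oldest bar first

emb <- embed(p, "HLLE", knn = 15)             # HLLE embedding of the windows

pp <- p[,20]           # last price in each window
xx <- emb@data@data    # embedded coordinates

par(mar=c(2,2,2,2), mfrow=c(1,2))

for(i in 1:nrow(xx)){

  plot(xx, t="p",lwd=2,pch=20)
  # the rest of the loop body is cut off in the original post
}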

Well, manifold learning has the same problems as PCA.

You'll have a hard time fitting it to non-stationary series.

Maxim Dmitrievsky #:

Well, manifold learning has the same problems as PCA.

You'll have a hard time fitting it to non-stationary series.

What is there to fit? There is nothing to fit: the current pattern is transformed into a different space, and that's all.


made a nicer picture

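library(dimRed)     # embed(..., "HLLE")
library(ggplot2)
library(patchwork)  # p1 + p2, plot_layout()

p <- cumsum(rnorm(300,sd = 0.01))+100
n <- 10
p <- stats::embed(p,dimension = n)[,n:1]    # sliding windows of n bars, oldest bar first

emb <- embed(p, "HLLE", knn = 15)
pp <- p[,n]            # last price in each window
xx <- emb@data@data    # embedded coordinates (HLLE1, HLLE2)

# this line is garbled in the original post; reconstructed from the aes() calls below
gg <- data.frame(time = 1:nrow(xx), xx, pp)
p1 <- ggplot(gg, aes(x =time, y = pp, col=time)) +
  geom_point()  +
  scale_color_gradientn(colours = rainbow(4))
p2 <- ggplot(gg, aes(x = HLLE1, y = HLLE2, col=time)) +
  geom_point()  +
  scale_color_gradientn(colours = rainbow(4))
p1 + p2 + plot_layout(nrow = 2)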


Extracting a few "good" rules/strategies from the data...

The full sequence of steps:

1) data transformation and normalisation

2) model training

3) rule extraction

4) rule filtering

5) visualisation

Ready-to-run code; just substitute your own data.

library(RRF)      # RRF()
library(inTrees)  # ?inTrees::getRuleMetric()

close <- cumsum(rnorm(10000,sd = 0.00001))+100

sw <- stats::embed(x = close,dimension = 10)[,10:1] #  make slide window data
X <- t(apply(sw,1,scale)) #  normalise data

#  price change of the bar following each window: same length as nrow(X), no look-ahead into the window
dp <- c(diff(close),0)[ncol(sw):length(close)]
Y <- as.factor( ifelse(dp>=0,1,-1) ) #  target for classification

tr <- 1:500

rf <- RRF(x = X[tr,],y = Y[tr],ntree=100)
rule <- getRuleMetric(unique(extractRules(RF2List(rf),X[tr,])),X[tr,],Y[tr])
rule <- data.frame(rule,stringsAsFactors = F)
for(i in c(1,2,3,5)) rule[,i] <- as.numeric(rule[,i])
buy_rules <- rule$condition[ rule$pred==1 ]

plot(x = 1:1000,y = rep(NA,1000), ylim = c(-0.001,0.001))
for(i in seq_along(buy_rules)){
  cum_profit <- cumsum( dp[  eval(str2expression(buy_rules[i]))  ] ) #  rule conditions refer to X
  ccor <- cor(cum_profit, seq_along(cum_profit))
  if(isTRUE(ccor>=0.9))  lines(cum_profit,col=i,lwd=2) #  keep only near-linear equity curves
}
abline(h = 0,col=2,lty=2)

The question is this: if you can find "working" trading systems even in random data, how can you prove that the trading systems found on real data are not random?

Alexey is doing this here. I wonder whether there is a statistical test for this kind of task?
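
One generic option is a Monte Carlo permutation test applied to the whole selection procedure. This is only a sketch and not necessarily what Alexey does. The inputs here (`ret`, `signals`), the "profit of the best rule" statistic and the number of shuffles are assumptions for illustration: the returns are shuffled so that the rules cannot carry any information, the best rule is re-selected on every shuffle, and the real best result is compared with that null distribution.

```r
set.seed(42)

# Hypothetical inputs: per-bar returns and a logical matrix of rule signals
# (one column per candidate rule, TRUE = the rule is in the market on that bar)
n_bars  <- 2000
n_rules <- 50
ret     <- rnorm(n_bars, sd = 0.001)
signals <- matrix(runif(n_bars * n_rules) < 0.3, nrow = n_bars)

# Statistic: total profit of the best rule, i.e. measured after selection
best_profit <- function(ret, signals) max(colSums(signals * ret))

observed <- best_profit(ret, signals)

# Null distribution: shuffle the returns, then repeat the selection step
n_perm    <- 500
null_stat <- replicate(n_perm, best_profit(sample(ret), signals))

# Monte Carlo p-value with the usual +1 correction
p_value <- (1 + sum(null_stat >= observed)) / (1 + n_perm)
p_value
```

A small p-value would mean that the best rule beats what selection alone produces on shuffled data. Plain shuffling ignores serial dependence in returns, so for real prices a block-wise shuffle may be more appropriate. White's Reality Check is a formal test built for this data-snooping problem.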