Machine learning in trading: theory, models, practice and algo-trading - page 3436

 
Aleksey Vyazmikin #:

How volatility was measured can remain a mystery, but do write how you evaluated the result in the clusters for classifying them, otherwise it is not clear.

I wrote somewhere above, I think. Clustering into n clusters and training to trade on each cluster, ignoring the others. Evaluation through a regular tester.

There is also causal clustering, but I haven't got to it yet.
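
A rough illustration of that scheme (only a sketch, not the code actually used in the thread): k-means on a feature matrix X with a factor target y, one random forest per cluster, and nearest-centroid routing standing in for the real strategy-tester evaluation; all names and the cluster count are placeholders.

library(randomForest)

# Assumed inputs (placeholders): X - numeric feature matrix, y - factor target.
n_clust <- 5
km <- kmeans(X, centers = n_clust)

# One model per cluster, trained only on that cluster's rows, ignoring the others.
models <- lapply(seq_len(n_clust), function(k) {
  idx <- km$cluster == k
  randomForest(x = X[idx, , drop = FALSE], y = y[idx])
})

# Route new observations to the nearest centroid and query only that cluster's model;
# in the thread the per-cluster evaluation is done in the regular strategy tester instead.
assign_cluster <- function(Xnew) {
  apply(Xnew, 1, function(r) which.min(colSums((t(km$centers) - r)^2)))
}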
 
Maxim Dmitrievsky #:
I wrote somewhere above, I think. Clustering into n clusters and training to trade on each cluster, ignoring the others. Evaluation through a regular tester.

There is also causal clustering, but I haven't got to it yet.

I see, so one model for each cluster. Don't you want to just exclude clusters that have a lot of negative examples?
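
A sketch of that filtering variant, continuing the illustration above (assuming a binary factor target y with levels "-1" and "1"; the 0.5 threshold is arbitrary):

# Keep only clusters whose share of negative examples is below a threshold,
# then train a single model on the remaining rows.
neg_share <- tapply(y == "-1", km$cluster, mean)   # fraction of negatives per cluster
good      <- names(neg_share)[neg_share < 0.5]
keep      <- km$cluster %in% as.integer(good)

model <- randomForest(x = X[keep, , drop = FALSE], y = y[keep])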

 
Forester #:

What if we divide it into 27 clusters without trees? Will the result change?

Here's what I got.

The trend seems to be the same. It is still unclear which criterion is better for evaluation; the first graph shows the response-bias indicators ordered in ascending order, and the second graph shows how the number of responses (examples in the sample) is distributed along that ordering (train sample).


And this is the same, but for the tree
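
For reference, one way such per-cluster statistics could be computed and ordered, assuming "response bias" here means the share of positive responses within a cluster (that reading is only a guess; y and cl are placeholder names for the target and the cluster labels):

bias  <- tapply(y == "1", cl, mean)   # share of positive responses in each cluster
sizes <- tapply(y, cl, length)        # number of examples in each cluster
o     <- order(bias)                  # clusters sorted by ascending bias

plot(bias[o],  type = "h", xlab = "cluster (sorted by bias)", ylab = "share of positive responses")
plot(sizes[o], type = "h", xlab = "cluster (sorted by bias)", ylab = "number of examples")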


 

Has anyone tried to predict random prices with a zigzag? I am getting some very optimistic results.

library(TTR)
library(zoo)
library(randomForest)
library(caret)

# a purely random price series (random walk)
price <- rnorm(10000) |> cumsum()

# features: each window's 10 prices minus the window's last price
X <- price |> rollapplyr(10, \(x) x - x[10])
# target: direction of the current zigzag leg, aligned to the feature rows
Y <- price |> ZigZag(change = 0.1, percent = FALSE) |> diff() |> sign() |> as.factor() |> tail(nrow(X))

tr <- 1:5000
ts <- 5001:nrow(X)

randomForest(Y[tr] ~ ., X[tr, ]) |> predict(X[ts, ]) |> confusionMatrix(Y[ts])

Confusion Matrix and Statistics

          Reference
Prediction   -1    1
        -1 2431   49
        1    43 2468
                                          
               Accuracy : 0.9816          
                 95% CI : (0.9774, 0.9851)
    No Information Rate : 0.5043          
    P-Value [Acc > NIR] : <2e-16          
                                          
                  Kappa : 0.9631          
                                          
 Mcnemar's Test P-Value : 0.6022          
                                          
            Sensitivity : 0.9826          
            Specificity : 0.9805          
         Pos Pred Value : 0.9802          
         Neg Pred Value : 0.9829          
             Prevalence : 0.4957          
         Detection Rate : 0.4871          
   Detection Prevalence : 0.4969          
      Balanced Accuracy : 0.9816          
                                          
       'Positive' Class : -1    
 
mytarmailS #:

Has anyone tried to predict random prices with a zigzag? I am getting some very optimistic results.


The zigzag looks into the future.

 
Maxim Dmitrievsky #:

The zigzag looks into the future.

It looks into the past, like any target used for forecasting, but in the test sample the model does not see the zigzag; it predicts it.
 
mytarmailS #:
It looks into the past, like any target used for forecasting, but in the test sample the model does not see the zigzag; it predicts it.

So there's a peek somewhere.

 
Aleksey Vyazmikin #:

Here's what I got

The trend seems to be the same. It is still unclear which criterion is better for evaluation; the first graph shows the response-bias indicators ordered in ascending order, and the second graph shows how the number of responses (examples in the sample) is distributed along that ordering (train sample).


And this is the same, but for the tree


Did you use Alglib's clustering?
It looks good in general, only 2 clusters of 20% spoil the picture.

 
Forester #:

Did you use Alglib's clustering?
It looks good in general, only 2 clusters of 20% spoil the picture.

It was in Python; I ported my tree algorithm there and decided to test it. I think the clustering algorithms are the same, but I have not implemented the statistics calculation for MQL5 yet.

Of course, K-Means partly depends on randomness, but as a tool for throwing out part of the sample before training it seems like an interesting approach.

Earlier I also managed to try clustering into many clusters without a tree and then training on those clusters; the effect was a reduction of the balance spread across 100 models.

The sample is the same; here are the results for it.

 
Maxim Dmitrievsky #:

So there's a peek somewhere.

No, check the code yourself, it's only 3 lines.
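
One way to probe for a peek in this setup (a sketch, not anything posted in the thread) is to keep the same features but swap the zigzag label for a target that is unambiguously in the future, the sign of the next price change, and check whether the accuracy on a random walk falls to about 50%:

library(zoo)
library(randomForest)
library(caret)

price <- rnorm(10000) |> cumsum()
X <- price |> rollapplyr(10, \(x) x - x[10])   # same features as in the earlier post

d  <- price |> diff() |> sign()        # d[t]: direction of the move from bar t to t+1
Xf <- X[1:(nrow(X) - 1), ]             # windows ending at t = 10 ... 9999
Yf <- d[10:length(d)] |> as.factor()   # strictly-future label for each window

tr <- 1:5000
ts <- 5001:nrow(Xf)

randomForest(Yf[tr] ~ ., Xf[tr, ]) |> predict(Xf[ts, ]) |> confusionMatrix(Yf[ts])
# Expected result: accuracy near 0.5 on a random walk. The zigzag label is different in
# kind: a pivot is only confirmed after price has already moved `change` against the leg,
# so the direction of the current leg partly reflects bars that come later.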