Machine learning in trading: theory, models, practice and algo-trading - page 3533

 
Aleksey Vyazmikin #:
in the figure, the probability of choosing a stable quantum segment at each iteration.
Yes. At each iteration of what?
 
Maxim Dmitrievsky #:
Yes. At each iteration of what?
Forester #:

What iterations, and of what? Hardly training additional trees. Some loop of your own? What does it do?

I described this method here - see the quoted post below - the iterations are shown there.

Forum on trading, automated trading systems and testing trading strategies.

Machine learning in trading: theory, models, practice and algo-trading.

Aleksey Vyazmikin, 2024.05.05 08:24

Well, these are different approaches. My method performs a kind of data distillation, which is especially relevant when there is a strong class imbalance. Along with the sifting process comes exploration of the data at each iteration. Primarily it is an exploratory method for selecting predictors and quantization tables. An additional goal is to make training other classifiers easier by removing from the sample the examples that fall into predictor ranges that are hard to detect.

The process can be visualised as a tree like this


The tree has gone through 2 iterations and has reached the third. The oval marks the data that we will no longer explore when building the current model - class "0".

And this is how I ran 100 iterations and looked at the probability of making a split that is just as effective at separating zeros from ones on new data.


 
Maxim Kuznetsov #:

a small, nasty, annoying error for schedules, sessions and everything related to real time: the actual number of minute bars per day != 1440 (more precisely, it is not always equal to it)... the same applies to 5-minute and even 15-minute bars

you either need to pad the missing bars up to 1440 or add a "time" field to the data and the calculations.

I am quite familiar with this (and many other) peculiarities of quotes. In this code, gaps are accounted for by the fact that (a) increments are computed from the open/close of a single bar, not from neighbouring bars; (b) a bar's number within the day is determined only by its time of day and does not depend on how many bars were skipped earlier in that day; (c) correlation is computed with the next available bar, which is not always the bar with the next daily number.

This approach simplifies things considerably while keeping the accuracy sufficient. If a "schedule of reversals" existed, it would certainly show up in such an analysis, and it would then make sense to refine it further. For now, it is clear that the "schedule", if it exists, is not a fixed time of day for reversing.
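To illustrate points (a) and (b) above, here is a minimal Python sketch; the DataFrame column names ("time", "open", "close") and the pandas representation are my assumptions, not the original code:

```
import pandas as pd

def add_intraday_bar_index(df: pd.DataFrame, timeframe_minutes: int = 1) -> pd.DataFrame:
    """Number each bar within its day purely from the timestamp, so bars
    skipped earlier in the day do not shift the numbering of later bars."""
    out = df.copy()
    ts = pd.to_datetime(out["time"])
    out["bar_in_day"] = (ts.dt.hour * 60 + ts.dt.minute) // timeframe_minutes
    # The increment is taken within a single bar (open -> close), so a gap
    # before the bar does not distort it.
    out["increment"] = out["close"] - out["open"]
    return out
```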

 
Aleksey Vyazmikin #:

I described this method here - see the quoted post below - the iterations are shown there.

The figure shows leaves at tree depth = 2. Did you extend the depth of all branches to 100? Or did you bring the number of leaves to 100?
Is this a self-written tree or CatBoost?
 
Aleksey Vyazmikin #:

Here is an example from the first experiments; the figure shows the probability of choosing a stable quantum segment at each iteration - separately for each class (I think this is clear from the curve names). Where a curve name contains the letter D, the described method was used - the red curve.

Overall, we can speak of a positive effect. Yes, it is not significant yet, but there are different ways to implement the process. I am satisfied with the preliminary positive result.

It is very strange that at the first iteration/depth/leaf you have an 80% probability of a stable quantum (I take it this is good), and then due to some manipulations it drops to 30%. I don't understand what positive effect you saw; it's a deterioration.

 
Forester #:
The figure shows leaves at tree depth = 2. Did you extend the depth of all branches to 100? Or did you bring the number of leaves to 100?

So far the maximum number of iterations is 300. I would like to clarify that the figure shows only the splits selected by the algorithm, without the candidates, while the evaluation in the graph is done on both selected and candidate splits. I would also note that the tree form is shown only for visual understanding of the process; in fact no tree as such is built - branches are not computed, and each iteration is almost like running the algorithm from scratch, but excluding the sample rows that were removed at previous iterations.
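As a rough sketch of what such an iteration loop might look like (not the author's actual code; the split search is passed in as a caller-supplied function, and the split representation is my own simplification):

```
from typing import Callable, List, Optional, Tuple

import pandas as pd

# A "split" is simplified here to (feature, threshold): rows with
# feature <= threshold are considered closed and removed.
Split = Tuple[str, float]

def iterative_sifting(
    sample: pd.DataFrame,
    target: str,
    find_split: Callable[[pd.DataFrame, str], Optional[Split]],
    max_iterations: int = 300,
) -> List[Split]:
    """Each iteration restarts the split search from scratch, but only on the
    rows that survived all previous iterations."""
    remaining = sample
    selected: List[Split] = []
    for _ in range(max_iterations):
        split = find_split(remaining, target)
        if split is None:          # nothing stable left to select
            break
        feature, threshold = split
        selected.append(split)
        # drop the rows "closed" by this split (e.g. confidently class "0")
        remaining = remaining[remaining[feature] > threshold]
        if remaining.empty:
            break
    return selected
```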

Forester #:
Is this a self-written tree or CatBoost?

Self-written algorithm.

Forester #:

It is very strange that at the first iteration/depth/leaf you have an 80% probability of a stable quantum (I take it this is good), and then due to some manipulations it drops to 30%. I don't understand what positive effect you saw; it's a deterioration.

I measured the positive effect through the average at each iteration; maybe this measurement needs refining, but I have not figured out how to do it better yet. Here is what the difference between the two graphs looks like.

At each iteration, some selected quantum segments are added and others drop out.

The graphs below show, for a particular predictor, the percentage of the selected quantum segments at each iteration that retain their probability bias on the hold-out samples. A value of -100 means that none of the quantum cutoffs were effective on the new data.
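A rough Python sketch of how such a share could be computed, under my own assumptions about how a quantum segment is represented (feature name, interval bounds, expected sign of the bias); it returns a 0-100 share, which may differ from the exact scaling used in the graphs:

```
import pandas as pd

def stable_segment_share(segments, holdout: pd.DataFrame, target: str) -> float:
    """Share (in %) of selected quantum segments whose probability bias versus
    the base rate keeps its expected sign on the hold-out sample.
    Each segment is assumed to be a (feature, low, high, expected_sign) tuple."""
    if not segments:
        return 0.0
    base_rate = holdout[target].mean()
    stable = 0
    for feature, low, high, expected_sign in segments:
        rows = holdout[(holdout[feature] >= low) & (holdout[feature] < high)]
        if rows.empty:
            continue
        bias = rows[target].mean() - base_rate
        if bias * expected_sign > 0:   # bias persists in the expected direction
            stable += 1
    return 100.0 * stable / len(segments)
```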


 
Aleksey Vyazmikin #:

I described this method here - see the quoted post below - the iterations are shown there.


You pull out the rules, and it works, but in the form of features and labels. The more complex the rule, the more noise it contains, and the lower the probability of assignment to a particular class.
 
Maxim Dmitrievsky #:
You pull out the rules, and it works, but in the form of features and labels. The more complex the rule, the more noise it contains, and the lower the probability of assignment to a particular class.

In general - yes, that is one of the challenges. So I am looking for methods to keep the probability at an appropriate level.

 
Maxim Dmitrievsky #:
The more complex the rule, the more noise it contains, and the lower the probability of assignment to a particular class.

In general, this statement is not entirely confirmed - much depends on how you count. That is, if you count all the "rules" per iteration, then on the contrary there is slow growth with depth...

The important factor is not the complexity of the rule per se, but the small number of examples to estimate the probability bias. This is precisely the main manifestation of insufficient sample size.
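As a numeric illustration (my own example, not from the post): the standard error of a class probability estimated from n examples is sqrt(p*(1-p)/n), so a deep rule covering only a few rows yields a very noisy bias estimate.

```
import math

def probability_se(p_hat: float, n: int) -> float:
    """Standard error of a class probability estimated from n examples."""
    return math.sqrt(p_hat * (1.0 - p_hat) / n)

# A rule covering only 20 rows with an observed 70% class-1 rate:
print(probability_se(0.7, 20))   # ~0.10, so a 0.2 bias over a 0.5 base rate
                                 # is only about two standard errors
```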
 
Aleksey Vyazmikin #:

In general, this statement is not entirely confirmed - much depends on how you count. That is, if you count all the "rules" per iteration, then on the contrary there is slow growth with depth...

The important factor is not the complexity of the rule per se, but the small number of examples to estimate the probability bias. This is precisely the main manifestation of insufficient sample size.

One would need to be more immersed in the discipline of statistical learning to carry on this discussion.