Machine learning in trading: theory, models, practice and algo-trading - page 3176

 
Aleksey Vyazmikin #:

What significance tests do you propose? I'm not saying the algorithm for selecting quantum segments is perfect - on the contrary, a lot of rubbish gets in, and I want to improve it.

I don't understand what made you decide this is some kind of "p-hacking" - and which part exactly: the selection of quantum segments, or the screening of rows that the quantum segments filter out well even without training (i.e. the graphs I built)? Yes, the method differs somewhat from the common approach to building tree-based models, but not by much; the concept remains the same.

Regarding the example with SB (a random walk), there are two considerations:

1. If the process is unknown and there is only data, then one can take it as a pattern that there is some best hour to trade. Or is there a reason to reject this hypothesis?

2. If these observations were distributed relatively evenly over time (over the event history), then it is more like an error of the random number generator.

In training, I use samples over a long period of time - usually at least 10 years.

Let me suggest a modification of my experiment. Suppose there are ten boxes numbered 1 to 10, one hundred white balls and one hundred black balls (the numbers 10 and 100 are arbitrary). The balls are somehow distributed among the boxes; then you look at how many balls are in each box and try to work out whether there is a regularity in the arrangement algorithm - in boxes with which numbers does one colour predominate.

So, if each ball (of either colour) is simply placed randomly, with equal probability 0.1, into one of the boxes, then in the end there will be no uniformity in the colour ratios! Almost always there will be a box with almost all white balls and one with almost all black. And the quality of the RNG has nothing to do with it - you could take a real quantum random number generator and the result would be the same. It is the very nature of probabilistic randomness. There will always be non-uniformity, but the numbers of the boxes where it shows up on the next arrangement are absolutely unpredictable. The same holds in the earlier example with the hour of the week (the hour of the week is the analogue of the box number).

There are two ways out of this. Either try to show that the non-uniformity in practice is much greater than it would be under equal probabilities - that is done with some kind of statistical test. Or simply believe that the non-uniformity, though small, is due to some regularity that is merely weakly expressed because of noise. But that is a matter of faith and practice, and if it works, fine.

I hope it's clear that the box numbers (hour of the week) are an analogy to your quanta.
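A quick way to see this effect is to simulate it. Below is a minimal MQL5 sketch (the counts and the number of trials are arbitrary assumptions): it throws 100 white and 100 black balls into 10 boxes uniformly at random, many times over, and reports the average colour share of the most imbalanced box. The same baseline can serve as the statistical test mentioned above - compare your observed imbalance against what pure chance produces here.

// balls_in_boxes.mq5 - illustrative sketch, numbers arbitrary
void OnStart()
  {
   MathSrand((int)GetTickCount());
   const int boxes=10, balls=100, trials=10000;
   double sum_max_share=0.0;
   for(int t=0; t<trials; t++)
     {
      int w[10], b[10];
      ArrayInitialize(w,0);
      ArrayInitialize(b,0);
      // each ball goes into a uniformly random box
      // (MathRand()%10 is slightly biased, but fine for a sketch)
      for(int i=0; i<balls; i++) w[MathRand()%boxes]++;
      for(int i=0; i<balls; i++) b[MathRand()%boxes]++;
      // find the box with the strongest colour imbalance in this layout
      double max_share=0.0;
      for(int k=0; k<boxes; k++)
        {
         int n=w[k]+b[k];
         if(n==0) continue;
         double share=(double)MathMax(w[k],b[k])/n;
         if(share>max_share) max_share=share;
        }
      sum_max_share+=max_share;
     }
   // 0.5 would mean perfectly even colours in every box;
   // random layouts reliably give noticeably more
   PrintFormat("average worst-box colour share over %d random layouts: %.3f",
               trials,sum_max_share/trials);
  }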

 
Aleksey Nikolayev #:

Let me suggest a modification of my experiment. Suppose there are ten boxes numbered 1 to 10, one hundred white balls and one hundred black balls (the numbers 10 and 100 are arbitrary). The balls are somehow distributed among the boxes; then you look at how many balls are in each box and try to work out whether there is a regularity in the arrangement algorithm - in boxes with which numbers does one colour predominate.

So, if each ball (of either colour) is simply placed randomly, with equal probability 0.1, into one of the boxes, then in the end there will be no uniformity in the colour ratios! Almost always there will be a box with almost all white balls and one with almost all black. And the quality of the RNG has nothing to do with it - you could take a real quantum random number generator and the result would be the same. It is the very nature of probabilistic randomness. There will always be non-uniformity, but the numbers of the boxes where it shows up on the next arrangement are absolutely unpredictable. The same holds in the earlier example with the hour of the week (the hour of the week is the analogue of the box number).

There are two ways out of this. Either try to show that the non-uniformity in practice is much greater than it would be under equal probabilities - that is done with some kind of statistical test. Or simply believe that the non-uniformity, though small, is due to some regularity that is merely weakly expressed because of noise. But that is a matter of faith and practice, and if it works, fine.

I hope it's clear that the box numbers (hour of the week) are an analogy to your quanta.

If we are talking about SB, what models can we even talk about? The essence of models (tree-based or neural) is to find patterns in the data.

About the fact that one box can end up with a majority of one colour: I run the experiment 10 times and record the result each time (I split the sample into 10 parts), and only if most of them give a similar result do I select a quantum segment. What is the probability that, after running the experiment 10 times, we find more balls of a certain colour in the same box than in the other boxes?
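That probability can be estimated by the same kind of simulation. A sketch under my own assumptions (I read "most of them similar" as the same box winning at least 6 of the 10 layouts, and the "winner" of a layout as the box with the largest white surplus):

// repeat_stability.mq5 - how often does one box dominate by pure chance?
void OnStart()
  {
   MathSrand((int)GetTickCount());
   const int boxes=10, balls=100, reps=10, trials=10000, need=6;
   int stable=0;
   for(int t=0; t<trials; t++)
     {
      int wins[10];
      ArrayInitialize(wins,0);
      for(int r=0; r<reps; r++)
        {
         int w[10], b[10];
         ArrayInitialize(w,0);
         ArrayInitialize(b,0);
         for(int i=0; i<balls; i++) w[MathRand()%boxes]++;
         for(int i=0; i<balls; i++) b[MathRand()%boxes]++;
         // the box with the largest white surplus "wins" this layout
         int best=0;
         for(int k=1; k<boxes; k++)
            if(w[k]-b[k] > w[best]-b[best]) best=k;
         wins[best]++;
        }
      // did any single box win at least 6 of the 10 layouts?
      for(int k=0; k<boxes; k++)
         if(wins[k]>=need) { stable++; break; }
     }
   PrintFormat("P(same box dominates >=%d of %d layouts by chance) ~ %.4f",
               need,reps,(double)stable/trials);
  }

On this toy setup the probability comes out tiny, so stable dominance across repeats is already a meaningful filter; the open question is only how sensitive it is to the threshold and the number of repeats.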

Do you know of any additional statistical test that would fit this case?

You can't be sure of anything, especially on SB, though…

I'm looking for methods that will add to the certainty.

Also, I assume that the selected quantum segment still has more potential for non-random splitting than the rest of the predictor - I don't know how to express this as a formula or some kind of estimate. I picture such a segment, abstractly, as a vein of valuable mineral inside a cobblestone…
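One standard way to express this as an estimate is the two-proportion z-test: compare the share of target units inside the segment with the share in the rest of the predictor (p is the pooled share, n_seg and n_rest the observation counts):

z = (p_seg - p_rest) / sqrt( p*(1-p) * (1/n_seg + 1/n_rest) )

A large |z| means the segment's surplus of units is unlikely under homogeneity; it says nothing, of course, about whether the pattern persists out of sample.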

 

I don't know how to make it clearer, so I'm showing two tree splits schematically.

The two bars are two predictors - the vertical bars symbolise time (though I didn't reproduce it exactly).

The thick line is the standard place where the tree model splits the predictor.

The second predictor (on the left in the figure) shows that the range from 2 to 3 inclusive has the largest accumulation of units, which I have highlighted in colour.

After the first split, I highlighted in bluish colour the figures that remained (say it is the right-hand part, where the units went down the branch).

So, if we count the total units left after the first split, the split should be made exactly in the middle, separating columns 1 to 2 inclusive. But the first column has the weakest response statistics in absolute terms, as does column 4 - only 8 units each - while the central columns contain 10 and 12. A quantum cutoff can send columns 1 and 4 to one side and columns 2 and 3 to the other: in total that is only one unit less than without the quantum cutoff, but this range initially holds 8 more observed units, which seems significant. That is, the expectation is that this range will continue to contain more units than the two neighbouring ones.

Have I managed to explain the essence of the quantum segment selection idea?

I should add: the numbers are a convention - allow for arithmetic errors - what matters here is the text and the logic, not the numbers.
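If I have understood the idea, a toy version of the selection can be written directly from the numbers in the post (my reading; the column totals of 20 and the minimum segment size are assumptions): scan all contiguous column ranges and keep the one with the highest share of units.

// quantum_segment_toy.mq5 - toy illustration of picking the densest range
void OnStart()
  {
   int ones[4]  = {8,10,12,8};      // units per column, as in the post
   int total[4] = {20,20,20,20};    // column sizes are my assumption
   const int cols=4;
   const int min_support=30;        // assumed: a segment must cover enough rows
   int best_lo=-1, best_hi=-1;
   double best_share=0.0;
   for(int lo=0; lo<cols; lo++)
      for(int hi=lo; hi<cols; hi++)
        {
         int o=0, n=0;
         for(int k=lo; k<=hi; k++) { o+=ones[k]; n+=total[k]; }
         if(n<min_support) continue;               // too few observations
         double share=(double)o/n;
         if(share>best_share) { best_share=share; best_lo=lo; best_hi=hi; }
        }
   // with these numbers the range "columns 2..3" wins (22 of 40 = 0.55)
   PrintFormat("densest segment: columns %d..%d, share of units %.2f",
               best_lo+1,best_hi+1,best_share);
  }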
 

Well, in everyday terms: we have a predictor with a range from -162 to +162 which sends signals.

With the help of quantum segment detection we can find ranges of levels at which, when the price hits them, a particular outcome occurs more often - for example, a bounce back to lower levels. The remaining sections that are not near any level can simply be categorised in order. So we get one predictor but two ways of representing its data for different purposes - as an option.
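As a sketch of that dual representation (the level boundaries below are hypothetical, purely to show the shape of the idea):

// dual_representation.mq5 - one predictor, two encodings (illustrative)
double level_lo[2] = {-100.0, 50.0};   // hypothetical detected level segments
double level_hi[2] = { -80.0, 70.0};

// encoding 1: index of the level segment the value falls into, -1 if none
int LevelBin(const double x)
  {
   for(int i=0; i<2; i++)
      if(x>=level_lo[i] && x<=level_hi[i]) return i;
   return -1;
  }

// encoding 2: plain ordinal category over the full -162..+162 range
int OrdinalBin(const double x,const int bins=10)
  {
   double z=(x+162.0)/324.0;           // rescale to [0,1]
   int k=(int)(z*bins);
   return MathMin(MathMax(k,0),bins-1);
  }

void OnStart()
  {
   double x=-90.0;                     // example signal value
   int lvl=LevelBin(x);
   if(lvl>=0) PrintFormat("x=%.1f lies inside level segment %d",x,lvl);
   else       PrintFormat("x=%.1f falls into ordinal bin %d",x,OrdinalBin(x));
  }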


 
Aleksey Vyazmikin #:

Do you know of any additional statistical test that would fit this case?

The most universal one is probably Monte Carlo. Repeatedly simulate a situation with obviously inseparable classes and see how your quanta behave on average. If they find something there, it is self-deception.

Obviously inseparable classes can be obtained by generating the features of both classes from the same distribution.
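As I understand the proposal, a sketch would look like this (the bin edges, support threshold and trial count are my assumptions): generate a feature and labels that share no dependence, run the same "best bin" selection every time, and record what score it reaches on pure noise. Whatever the real quanta find has to clearly beat this baseline.

// null_baseline.mq5 - Monte Carlo baseline for quanta on inseparable classes
#include <Math\Stat\Normal.mqh>

void OnStart()
  {
   MathSrand((int)GetTickCount());
   const int n=200, trials=1000, bins=10;
   double sum_best=0.0;
   for(int t=0; t<trials; t++)
     {
      double x[];
      if(!MathRandomNormal(0.0,1.0,n,x)) return;   // feature: one law for both classes
      int ones[10], tot[10];
      ArrayInitialize(ones,0);
      ArrayInitialize(tot,0);
      for(int i=0; i<n; i++)
        {
         int lbl=(MathRand()<16384) ? 1 : 0;       // label: fair coin, no signal
         int k=(int)((x[i]+3.0)/0.6);              // 10 equal bins over [-3,3]
         k=MathMin(MathMax(k,0),bins-1);
         tot[k]++;
         if(lbl==1) ones[k]++;
        }
      double best=0.0;
      for(int k=0; k<bins; k++)
         if(tot[k]>=10)                            // assumed minimum support
            best=MathMax(best,(double)ones[k]/tot[k]);
      sum_best+=best;
     }
   // anything the real quanta report should be compared with this number
   PrintFormat("average best-bin share of ones on pure noise: %.3f",
               sum_best/trials);
  }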

 
Aleksey Nikolayev #:

The most universal one is probably Monte Carlo. Repeatedly simulate a situation with obviously inseparable classes and see how your quanta behave on average. If they find something there, it is self-deception.

Obviously inseparable classes can be obtained by generating the features of both classes from the same distribution.

Monte Carlo is about shuffling sequences and getting random rows - how does that help? And it's probably not correct to shuffle sequences if you assume they are not random… and in time series they are not random. I don't get the idea - could you describe it in more detail?

Can you make such a sample for the test in CSV? I think it's quick in R. Otherwise I'll spend another day writing code, and I don't know if I'll get it right.

 
Aleksey Vyazmikin #:

Monte Carlo is about shuffling sequences and getting random rows - how does that help? And it's probably not correct to shuffle sequences if you assume they are not random… and in time series they are not random. I don't get the idea - could you describe it in more detail?

Can you make such a sample for the test in CSV? I think it's quick in R. Otherwise I'll spend another day writing code, and I don't know if I'll get it right.

You can do it in MT5: the statistical library has functions for generating samples from various distributions. For example, generate a normal sample of 200 values as the feature in the first column, and in the second column assign labels by random choice with probability 0.5.

It would be better to automate this somehow within your package, since you will have to do it many times and calculate something each time - only you know what.
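In case it helps, here is a minimal MT5 script for exactly that (the file name is arbitrary; the output lands in the terminal's MQL5\Files folder):

// make_null_sample.mq5 - 200 normal feature values plus fair-coin labels, as CSV
#include <Math\Stat\Normal.mqh>

void OnStart()
  {
   MathSrand((int)GetTickCount());
   const int n=200;
   double feature[];
   if(!MathRandomNormal(0.0,1.0,n,feature))        // feature column: N(0,1)
     { Print("generation failed"); return; }
   int h=FileOpen("null_sample.csv",FILE_WRITE|FILE_CSV|FILE_ANSI,',');
   if(h==INVALID_HANDLE) { Print("cannot open file"); return; }
   FileWrite(h,"feature","label");                 // header row
   for(int i=0; i<n; i++)
      FileWrite(h,DoubleToString(feature[i],6),    // feature value
                  (MathRand()<16384) ? 1 : 0);     // label: probability 0.5
   FileClose(h);
   Print("wrote null_sample.csv (200 rows) to MQL5/Files");
  }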

 
Aleksey Nikolayev #:

You can do it in MT5: the statistical library has functions for generating samples from various distributions. For example, generate a normal sample of 200 values as the feature in the first column, and in the second column assign labels by random choice with probability 0.5.

It would be better to automate this somehow within your package, since you will have to do it many times and calculate something each time - only you know what.

I have never used this feature before.

Is this the function you mean?

Generates pseudorandom values distributed according to the normal law with parameters mu and sigma. Returns false in case of error. Analogue of rnorm() in R.



bool  MathRandomNormal(
   const double  mu,             // mean (mathematical expectation)
   const double  sigma,          // standard deviation
   const int     data_count,     // number of values required
   double&       result[]        // array to receive the pseudorandom values
   );
 
 
Aleksey Vyazmikin #:

Just as I wrote: random search is an unproductive approach.

I use search with an element of randomness in predictor selection when testing a sample's potential, and I have been using it for many years in CatBoost.

Randomisation gives no grounds to expect the model to keep working, because randomised predictor responses are built into it.

There is a risk of getting bogged down in pointless wrangling again. What is the difference between a randomly found set that works on OOS and one invented through the hardest mental suffering, but equally without fundamental justification, when the validation method is the same? Rhetorical question.

And what is the difference between a random search and a search with an element of randomness in the choice? ))
 
Aleksey Vyazmikin #:

Never used this functionality before.

Is this the function you mean?

Yes
