Machine learning in trading: theory, models, practice and algo-trading - page 2729

 
Aleksey Vyazmikin #:

Fine, if you're just being clever, then you don't want to think in that direction. I won't insist.

Alexei, learn to formulate your thoughts, if you really want to be understood...

 
mytarmailS #:

Alexei, learn to formulate your thoughts if you want to be understood.

Where exactly did you lose my train of thought?

My thoughts are formulated - I understand what I'm talking about; if someone doesn't understand, ask. Maybe you need to learn to grasp the essence better instead of clinging to terminology...

 
Aleksey Vyazmikin #:

Fine, if you're just being clever, then you don't want to think in that direction. I won't insist.

Either you can't formulate the idea or you don't want to share it. In either case there's no point developing the topic further, and let's do without getting personal.

 
Aleksey Vyazmikin #:

Do you think the criterion makes no sense? Take ten samples of different sizes and compare them - choose the one with the best scores on several indicators responsible for the similarity/homogeneity of the samples.

Take the samples and mix them - you get different scores... and you get sad.

* mix them with each other. Nothing forbids it: since it's not a sequential model being trained, the order of the samples doesn't matter. Only the classification error matters, and it can always be reduced by mixing.

To search for something, you need to understand very precisely what you're looking for; otherwise you'll be playing with samples until you're numb. And nobody knows what is being searched for - if anyone finds out, let me know.
 
There is another trick. The less informative the features are, the smaller the training sample should be.

The more informative the features are, and the fewer of them there are, the larger the sample you can/should take. And almost everyone thinks the opposite.
 
Aleksey Nikolayev #:

Either you can't formulate the idea or you don't want to share it. In either case there's no point developing the topic further, and let's do without getting personal.

Didn't I write that the idea is to compare the samples (training and application): if your theory is correct, the samples will stop being similar as the training sample grows, and to detect this you need criteria for assessing that change, which follow from methods of assessing similarity?
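A minimal sketch of such a comparison for a single predictor, assuming the two-sample Kolmogorov-Smirnov statistic is used as the homogeneity criterion (the thread mentions several such criteria; this is just one common choice, and the sample values are made up for illustration):

```python
import bisect

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap
    between the empirical CDFs of the two samples (0 = identical
    distributions of values, 1 = completely disjoint)."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for v in a + b:
        cdf_a = bisect.bisect_right(a, v) / len(a)
        cdf_b = bisect.bisect_right(b, v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

training    = [0.1, 0.2, 0.3, 0.4, 0.5]
application = [0.1, 0.2, 0.3, 0.4, 0.5]   # identical values -> statistic 0.0
shifted     = [5.1, 5.2, 5.3, 5.4, 5.5]   # disjoint values  -> statistic 1.0

print(ks_statistic(training, application))  # → 0.0
print(ks_statistic(training, shifted))      # → 1.0
```

Tracking this statistic as the training sample grows would give one concrete way to see the samples "stop being similar". In practice `scipy.stats.ks_2samp` does the same job and also returns a p-value.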

Besides, I spoke about partitioning the whole sample into sections according to some comparable trend feature, and ranking within those groups. Such ranking can again be done by the "similarity" criteria of samples.

I'm not getting personal - I see the style of the answer and am simply puzzled: what are people doing here, do they just want to show off their uniqueness? I'm interested in finding ways to solve the problem, in using other people's knowledge and in sharing my own.

 
Maxim Dmitrievsky #:
Take the samples and mix them - you get different scores... and you get sad.

* mix them with each other. Nothing forbids it: since it's not a sequential model being trained, the order of the samples doesn't matter. Only the classification error matters, and it can always be reduced by mixing.

To search for something, you need to understand very precisely what you're looking for; otherwise you'll be playing with samples until you're numb. And nobody knows what is being searched for - if anyone finds out, let me know.

You can mix only within a single sample; mixing two samples together amounts to denying that the market changes.
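The distinction can be illustrated with a toy sketch (the row indices stand in for chronologically ordered observations; the split point is arbitrary): shuffling rows inside the training sample changes nothing for a non-sequential model, while pooling train and test and re-splitting lets "future" rows leak into training.

```python
import random

random.seed(0)
n, split = 100, 70
data = list(range(n))                 # stand-in for chronologically ordered rows
train, test = data[:split], data[split:]

# Allowed: shuffle *within* the training sample only
random.shuffle(train)
# Same rows, just reordered - and the test set is still strictly "in the future"
print(set(train) == set(range(split)), min(test) == split)

# Denies market change: pool everything and re-split at random
pooled = data[:]
random.shuffle(pooled)
mixed_train = pooled[:split]
leaked = [r for r in mixed_train if r >= split]   # future rows now in "training"
print(len(leaked) > 0)
```

This is why mixing two samples can always "reduce" the classification error: the mixed training set has already seen the distribution it will be evaluated on.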

 
Maxim Dmitrievsky #:
Take the samples and mix them - you get different scores... and you get sad.

* mix them with each other. Nothing forbids it: since it's not a sequential model being trained, the order of the samples doesn't matter. Only the classification error matters, and it can always be reduced by mixing.

To search for something, you need to understand very precisely what you're looking for; otherwise you'll be playing with samples until you're numb. And nobody knows what is being searched for - if anyone finds out, let me know.

What I don't like about the common ground in your and Alexey's reasoning is that both of you argue in the context of a specific model, studying its behaviour as the training sample changes. Ideally, I'd like the selection of the training sample to be independent of any particular model - that's why I've settled on using zigzag vertices for now. But you're probably both right, and complete independence from the type of trading system (TS) is hardly achievable.

 
Aleksey Vyazmikin #:

Didn't I write that the idea of comparing samples (training and application) is that, if your theory is correct, the samples will stop being similar as the training sample grows, and that to detect this you need criteria for assessing that change, derived from methods of assessing similarity?

Here you apparently mean multivariate samples (each element is a table row, a vector), while the homogeneity criteria in your three links deal with one-dimensional numerical samples. Multivariate homogeneity criteria in mathematical statistics are a separate story and not entirely clear to me.

Aleksey Vyazmikin #:

Besides, I spoke about partitioning the whole sample into sections according to some comparable trend feature, and ranking within those groups. Such ranking can again be done by the "similarity" criteria of samples.

This resembles the multiple change-point detection problem. Again it turns out that we have to work with the multivariate (vector) case, which complicates things a lot.
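For the one-dimensional case the idea behind change-point detection can be sketched very simply: place a single split where it minimises the within-segment squared error (a least-squares mean-shift detector). This is a toy illustration with made-up data, not a production method; for multiple change points, binary segmentation applies the same search recursively, and libraries such as `ruptures` implement the multivariate case.

```python
def best_split(x):
    """Single change-point estimate: the index that minimises the sum of
    squared deviations from each segment's own mean."""
    best_i, best_cost = None, float("inf")
    for i in range(1, len(x)):
        left, right = x[:i], x[i:]
        mean_l = sum(left) / len(left)
        mean_r = sum(right) / len(right)
        cost = (sum((v - mean_l) ** 2 for v in left)
                + sum((v - mean_r) ** 2 for v in right))
        if cost < best_cost:
            best_cost, best_i = cost, i
    return best_i

# A series whose mean jumps from 0 to 3 at index 50
series = [0.0] * 50 + [3.0] * 50
print(best_split(series))  # → 50
```

The multivariate difficulty mentioned above is exactly that "the mean jumps" must be replaced by some notion of a distributional change across a whole feature vector.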

And in general, I don't like the dependence on which features are chosen for the study: take a different set of them and the results may differ.

 
Aleksey Nikolayev #:

Here you apparently mean multivariate samples (each element is a table row, a vector), while the homogeneity criteria in your three links deal with one-dimensional numerical samples. Multivariate homogeneity criteria in mathematical statistics are a separate story and not entirely clear to me.

Each predictor taken separately is a one-dimensional numerical sample, so why not evaluate them separately and average the results? If most of the predictors show deterioration over time, the sample is redundant.
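A sketch of that column-by-column averaging, using a deliberately crude per-predictor drift score (mean gap scaled by the pooled standard deviation) so the example stays self-contained; in practice each column would get a proper two-sample test such as Kolmogorov-Smirnov. The data here is hypothetical: the second predictor drifts, the first does not.

```python
import statistics

def column_drift(a, b):
    """Crude drift score for one predictor: gap between the sample means,
    scaled by the pooled standard deviation (0 = no mean shift)."""
    pooled_sd = statistics.pstdev(a + b) or 1.0
    return abs(statistics.fmean(a) - statistics.fmean(b)) / pooled_sd

def average_drift(train_rows, new_rows):
    """Score each predictor (column) separately, then average the scores."""
    cols = range(len(train_rows[0]))
    scores = [column_drift([r[c] for r in train_rows],
                           [r[c] for r in new_rows]) for c in cols]
    return statistics.fmean(scores)

train = [(0.0, 0.0)] * 5 + [(1.0, 1.0)] * 5
new   = [(0.0, 5.0)] * 5 + [(1.0, 6.0)] * 5   # column 1 has shifted by 5

print(round(average_drift(train, new), 3))  # → 0.981
```

A per-column breakdown (rather than only the average) would also show *which* predictors are deteriorating, which matters for the redundancy argument above.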

Aleksey Nikolayev #:

This resembles the multiple change-point detection problem. Again it turns out that we have to work with the multivariate (vector) case, which complicates things a lot.

And in general, I don't like the dependence on which features are chosen for the study: take a different set of them and the results may differ.

Perhaps we should look for the variants that give the best results both in identifying which group a segment belongs to and in the efficiency of training on the grouped population.