Machine learning in trading: theory, models, practice and algo-trading - page 2478

 
Maxim Dmitrievsky #:
There is more of an effect from standardization than from balancing, in my opinion. Plus, sampling from distributions helps against overfitting.

Did I get it right: the more samples, the closer the features are to being standardized?

 
iwelimorn #:

Did I get it right: the more samples, the closer the features are to being standardized?

It's hard to say what the optimal sample size is; it probably depends on the number of components in the Gaussian mixture. Too large a sample, with a noisy dataset, leads to the generation of very similar samples, i.e. the probability of frequently repeated samples increases, because Gaussians are used for density estimation and generation. So, more likely no than yes.
I read somewhere that GMM doesn't work well with large datasets.
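To make the density-estimation-and-generation step concrete, here is a minimal sketch of fitting a Gaussian mixture and sampling from it (assuming scikit-learn; the two-cluster toy data is illustrative, not the thread's actual dataset):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# toy "features": two noisy clusters of 3-dimensional points
X = np.vstack([rng.normal(0, 1, (200, 3)), rng.normal(3, 1, (200, 3))])

# fit a 2-component mixture as a density estimate of the data
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# generate new samples from the fitted density
X_new, labels = gmm.sample(400)
print(X_new.shape)  # (400, 3)
```

With few components and many draws, `sample` keeps pulling from the same handful of Gaussians, which is the "very similar samples" effect described above.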
 
Maxim Dmitrievsky #:
It's hard to say what the optimal sample size is; it probably depends on the number of components in the Gaussian mixture. Too large a sample, with a noisy dataset, leads to the generation of very similar samples, i.e. the probability of frequently repeated samples increases, because Gaussians are used for density estimation and generation. So, more likely no than yes.

Thanks. I probably didn't phrase the question correctly; I meant: could generating more samples yield a more standardized sample?

 
iwelimorn #:

Thanks. I probably didn't phrase the question correctly; I meant: could generating more samples yield a more standardized sample?

Yes, of course
 
iwelimorn #:

Thanks. I probably didn't phrase the question correctly; I meant: could generating more samples yield a more standardized sample?

It is important that there are as few contradictions as possible when generating the sample, otherwise training will be useless. Imagine that in one case a given input vector has target value 1, and in the next example an identical input vector has target value 0. What should the algorithm do then? How should it react? So increasing the number of training samples makes sense only if it doesn't increase the number of contradictions. This is a philosophical question. I, for example, cover 3 months of the market on M5 with 100 training samples. Just as an example...
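Such contradictions can be found mechanically: group the training set by the input vector and count how many distinct target values each group has. A hedged sketch with pandas (the column names and toy data are hypothetical):

```python
import pandas as pd

# hypothetical training set: two features plus a binary target
df = pd.DataFrame({
    "f1":     [1, 1, 2, 2],
    "f2":     [5, 5, 6, 6],
    "target": [1, 0, 1, 1],
})

feature_cols = ["f1", "f2"]
# for each identical input vector, count distinct target values
n_targets = df.groupby(feature_cols)["target"].nunique()
# any group with more than one distinct target is a contradiction
contradictory = n_targets[n_targets > 1]
print(contradictory)
```

Here the input vector (1, 5) appears with both target 1 and target 0, so it is flagged; (2, 6) always maps to 1 and is not.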
 
And by the way, there is a lady in our ranks, as I understand it? That's a rare case indeed, one might say an exception to the rule... :-)
 
Mihail Marchukajtes #:
It is important that there are as few contradictions as possible when generating the sample, otherwise training will be useless. Imagine that in one case a given input vector has target value 1, and in the next example an identical input vector has target value 0. What should the algorithm do then? How should it react? So increasing the number of training samples makes sense only if it doesn't increase the number of contradictions. This is a philosophical question. I, for example, cover 3 months of the market on M5 with 100 training samples. Just as an example...

Did you even understand what you wrote?

 

Mihail Marchukajtes #:
It is important that there are as few contradictions as possible when generating the sample, otherwise training will be useless. Imagine that in one case a given input vector has target value 1, and in the next example an identical input vector has target value 0. What should the algorithm do then? How should it react? So increasing the number of training samples makes sense only if it doesn't increase the number of contradictions. This is a philosophical question. I, for example, cover 3 months of the market on M5 with 100 training samples. Just as an example...

I agree with you: if one and the same example describes several states, then classification by any available algorithm will give a probability close to 1/n, where n is the number of states.

But there are no absolutely identical examples; they are only similar to a certain degree. The question is how to measure this "similarity".


100 examples in three months on M5... Interesting... Do you select samples from the initial sample according to the same rules that you then use in trading?
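One common way to quantify that "similarity" is the distance to each example's nearest neighbour in feature space. A minimal sketch (assuming scikit-learn; the threshold and random data are illustrative, not a recommendation):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))  # toy feature matrix

# distance of each example to its closest other example
nn = NearestNeighbors(n_neighbors=2).fit(X)
dist, _ = nn.kneighbors(X)
nearest = dist[:, 1]  # column 0 is the point itself at distance 0

# flag examples whose nearest neighbour is "too close": a crude near-duplicate check
threshold = 0.1
near_duplicates = np.where(nearest < threshold)[0]
print(len(near_duplicates))
```

Examples flagged this way are candidates for the "identical input, different target" check above, since near-duplicates with opposite labels act as contradictions in practice.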

 
iwelimorn #:

I agree with you: if one and the same example describes several states, then classification by any available algorithm will give a probability close to 1/n, where n is the number of states.

But there are no absolutely identical examples; they are only similar to a certain degree. The question is how to measure this "similarity".


100 examples in three months on M5... Interesting... Do you select samples from the initial sample according to the same rules that you then use in trading?

If each set of independent variables in the training sample corresponds to exactly one value of the dependent variable, it is a deterministic series.

There's nothing to classify there: the prediction error is 0.

Yes, it's already agony.

 
Dmytryi Nazarchuk #:

If each set of independent variables in the training sample corresponds to exactly one value of the dependent variable, it is a deterministic series.

There's nothing to classify there: the prediction error is 0.

Yes, this is agony.

Thanks; maybe it's not agony, just my lack of fundamental knowledge.

Is the same true if several different sets of independent variables correspond to the same value of the dependent variable?
