Machine learning in trading: theory, models, practice and algo-trading - page 2478

 
Maxim Dmitrievsky #:
There is more of an effect from standardization than from balancing, in my opinion. Plus, sampling from distributions helps against overfitting.

Did I get it right: the more samples, the closer the features are to being standardized?

 
iwelimorn #:

Did I get it right: the more samples, the closer the features are to being standardized?

It's hard to say what the optimal sample size is; it probably depends on the number of components in the Gaussian mixture. Too large a sample, with a noisy dataset, leads to the generation of very similar samples, i.e. the probability of frequently repeated samples increases, because Gaussians are used for density estimation and generation. So, more likely no than yes.
I read somewhere that GMM doesn't work well with large datasets.
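To make the density-estimation-and-generation step concrete, here is a minimal sketch of fitting a Gaussian mixture and sampling from it (assuming scikit-learn; the two-cluster toy data is illustrative, not the thread's actual dataset):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# toy "features": two noisy clusters of 3-dimensional points
X = np.vstack([rng.normal(0, 1, (200, 3)), rng.normal(3, 1, (200, 3))])

# fit a 2-component mixture as a density estimate of the data
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)

# generate new samples from the fitted density
X_new, labels = gmm.sample(400)
print(X_new.shape)  # (400, 3)
```

With few components and many draws, `sample` keeps pulling from the same handful of Gaussians, which is the "very similar samples" effect described above.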
 
Maxim Dmitrievsky #:
It's hard to say what the optimal sample size is; it probably depends on the number of components in the Gaussian mixture. Too large a sample, with a noisy dataset, leads to the generation of very similar samples, i.e. the probability of frequently repeated samples increases, because Gaussians are used for density estimation and generation. So, more likely no than yes.

Thanks. I probably didn't phrase the question correctly; I meant: could generating more samples yield a more standardized sample?

 
iwelimorn #:

Thanks. I probably didn't phrase the question correctly; I meant: could generating more samples yield a more standardized sample?

Yes, of course
 
iwelimorn #:

Thanks. I probably didn't phrase the question correctly; I meant: could generating more samples yield a more standardized sample?

It is important that there are as few contradictions as possible when generating the sample, otherwise training will be useless. Imagine that in one case a given input vector has target value 1, and in the next example an identical input vector has target value 0. What should the algorithm do then? How should it react? So increasing the number of training samples makes sense only if it doesn't increase the number of contradictions. This is a philosophical question. I, for example, cover 3 months of the market on M5 with 100 training samples. Just as an example...
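Such contradictions can be found mechanically: group the training set by the input vector and count how many distinct target values each group has. A hedged sketch with pandas (the column names and toy data are hypothetical):

```python
import pandas as pd

# hypothetical training set: two features plus a binary target
df = pd.DataFrame({
    "f1":     [1, 1, 2, 2],
    "f2":     [5, 5, 6, 6],
    "target": [1, 0, 1, 1],
})

feature_cols = ["f1", "f2"]
# for each identical input vector, count distinct target values
n_targets = df.groupby(feature_cols)["target"].nunique()
# any group with more than one distinct target is a contradiction
contradictory = n_targets[n_targets > 1]
print(contradictory)
```

Here the input vector (1, 5) appears with both target 1 and target 0, so it is flagged; (2, 6) always maps to 1 and is not.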
 
And by the way, there is a lady in our ranks, as I understand it? That's a rare case indeed, one might say an exception to the rule... :-)
 
Mihail Marchukajtes #:
It is important that there are as few contradictions as possible when generating the sample, otherwise training will be useless. Imagine that in one case a given input vector has target value 1, and in the next example an identical input vector has target value 0. What should the algorithm do then? How should it react? So increasing the number of training samples makes sense only if it doesn't increase the number of contradictions. This is a philosophical question. I, for example, cover 3 months of the market on M5 with 100 training samples. Just as an example...

Did you even understand what you wrote?

 

Mihail Marchukajtes #:
It is important that there are as few contradictions as possible when generating the sample, otherwise training will be useless. Imagine that in one case a given input vector has target value 1, and in the next example an identical input vector has target value 0. What should the algorithm do then? How should it react? So increasing the number of training samples makes sense only if it doesn't increase the number of contradictions. This is a philosophical question. I, for example, cover 3 months of the market on M5 with 100 training samples. Just as an example...

I agree with you: if one and the same example describes several states, then classification by any available algorithm will give a probability close to 1/n, where n is the number of states.

But there are no absolutely identical examples; they are only similar to a certain degree. The question is how to measure this "similarity".


100 examples in three months on M5... Interesting... Do you select samples from the initial sample according to the same rules that you then use in trading?
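One common way to quantify that "similarity" is the distance to each example's nearest neighbour in feature space. A minimal sketch (assuming scikit-learn; the threshold and random data are illustrative, not a recommendation):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))  # toy feature matrix

# distance of each example to its closest other example
nn = NearestNeighbors(n_neighbors=2).fit(X)
dist, _ = nn.kneighbors(X)
nearest = dist[:, 1]  # column 0 is the point itself at distance 0

# flag examples whose nearest neighbour is "too close": a crude near-duplicate check
threshold = 0.1
near_duplicates = np.where(nearest < threshold)[0]
print(len(near_duplicates))
```

Examples flagged this way are candidates for the "identical input, different target" check above, since near-duplicates with opposite labels act as contradictions in practice.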

 
iwelimorn #:

I agree with you: if one and the same example describes several states, then classification by any available algorithm will give a probability close to 1/n, where n is the number of states.

But there are no absolutely identical examples; they are only similar to a certain degree. The question is how to measure this "similarity".


100 examples in three months on M5... Interesting... Do you select samples from the initial sample according to the same rules that you then use in trading?

If each set of independent variables in the training sample corresponds to exactly one value of the dependent variable, it is a deterministic series.

There's nothing to classify there: the prediction error is 0.

Yes, it's already agony.

 
Dmytryi Nazarchuk #:

If each set of independent variables in the training sample corresponds to exactly one value of the dependent variable, it is a deterministic series.

There's nothing to classify there: the prediction error is 0.

Yes, this is agony.

Thanks; maybe it's not agony, just my lack of fundamental knowledge.

Is the same true if several different sets of independent variables correspond to the same value of the dependent variable?
