Machine learning in trading: theory, models, practice and algo-trading - page 2746

 
Maxim Dmitrievsky #:
To summarise Sanych's theory (since he himself failed to formalise it properly and give examples):

* His way of feature selection is based on correlation, since "relation" and "relationship" are definitions of correlation.

* In this way we make an implicit fit to history, similar in meaning to LDA (linear discriminant analysis) or PCA; we simplify the learning process and reduce the error.

* There is not even a theoretical reason why the trained model should perform better on new data (data not involved in estimating the feature-target relationships), because the features were fitted in advance to the target or (worse) over the entire available history.

* The situation is somewhat improved by averaging the correlation coefficients in a sliding window: one can estimate their spread and select the more stable ones. At least that gives some statistics to rely on.

* I was thinking of causality or a statistically significant relationship, but that is not what his approach does.

It's completely wrong.

1. I wrote above about my understanding of "predictive ability".

2. The meaning is not clear

3. There is no training set in the usual sense. A random forest is fitted: sample size = 1500 bars, number of trees = 150. The sample size is chosen from the fitting-error plot. On this sample, selection and preprocessing of the 170 predictors is done according to various criteria. Eventually, out of the 20-30 remaining predictors, 5 to 10 are selected on the basis of maximum predictive ability and the model is fitted. The next bar is predicted with the obtained model. When a new bar arrives, the whole model-building process is repeated (a rough code sketch follows this list).

The maximum fitting error is about 20%, but it is quite rare. Usually about 10%.

4. I described my approach earlier.
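
A rough sketch of this re-fitting loop, assuming a pandas DataFrame df with the predictor columns plus a discrete target column (e.g. the sign of the next increment), with mutual information standing in for the unspecified predictive-ability criterion and the intermediate preprocessing steps collapsed into a single ranking:

```python
# Sketch of the rolling re-fit loop described above, not the author's actual code.
# Assumptions: `df` holds the predictor columns plus a discrete "target" column.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif

WINDOW = 1500    # bars used for fitting
N_TREES = 150    # trees in the random forest
TOP_K = 10       # 5-10 predictors are finally kept

def predict_latest_bar(df: pd.DataFrame, t: int) -> int:
    """Re-fit on the WINDOW bars before bar t and predict the label of bar t."""
    train = df.iloc[t - WINDOW:t]
    X, y = train.drop(columns="target"), train["target"]

    # Stand-in for the "predictive ability" ranking: mutual information with the target.
    scores = pd.Series(mutual_info_classif(X, y), index=X.columns)
    selected = scores.nlargest(TOP_K).index

    model = RandomForestClassifier(n_estimators=N_TREES).fit(X[selected], y)
    newest = df.drop(columns="target").iloc[[t]]     # features of the newest bar
    return int(model.predict(newest[selected])[0])   # repeat this on every new bar
```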

 
СанСаныч Фоменко #:

One more time.

but the target isn't a zigzag, is it?

 
СанСаныч Фоменко #:

It's completely wrong.

1. I wrote above about my understanding of "predictive ability".

2. The meaning is not clear

3. There is no training set in the usual sense. A random forest is fitted: sample size = 1500 bars, number of trees = 150. The sample size is chosen from the fitting-error plot. On this sample, selection and preprocessing of the 170 predictors is done according to various criteria. Eventually, out of the 20-30 remaining predictors, 5 to 10 are selected on the basis of maximum predictive ability and the model is fitted. The next bar is predicted with the obtained model. When a new bar arrives, the whole model-building process is repeated.

The maximum fitting error is about 20%, but it is quite rare. Usually about 10%.

4. I described my approach earlier.

That's clearer. Where do the targets come from, from the clustering results?
 

Confidence that future results will be as decent comes from the predictive ability statistic, which:

1. should have a sufficiently high value;

2. should have a low sd.

Usually, if one manages to find predictors with an sd of less than 10%, the variation of the prediction error is about the same.


My conclusion:

1. We should adopt (or develop) one of the "predictive ability" algorithms

2. Find a list of predictors whose predictive ability values differ severalfold.

3. Run a sliding window and collect statistics: the mean and the deviation from the mean. If you are lucky, you will find such a list. I did (a rough sketch of this step follows below).
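
A minimal sketch of steps 1-3, with score_fn standing in for whichever "predictive ability" measure is adopted in step 1, and with illustrative window and step sizes:

```python
# Sketch of step 3: slide a window over the history, score every predictor in
# each window, then keep predictors whose score is high on average and stable.
# `score_fn` is a placeholder; window/step sizes are illustrative.
import pandas as pd

def rolling_predictor_stats(X: pd.DataFrame, y: pd.Series, score_fn,
                            window: int = 1500, step: int = 100) -> pd.DataFrame:
    rows = []
    for start in range(0, len(X) - window + 1, step):
        sl = slice(start, start + window)
        rows.append(score_fn(X.iloc[sl], y.iloc[sl]))   # one score per predictor
    scores = pd.DataFrame(rows, columns=X.columns)
    return pd.DataFrame({"mean": scores.mean(), "sd": scores.std()})

# Example selection rule (thresholds depend on the score's scale):
# stats = rolling_predictor_stats(X, y, score_fn=my_score)
# stable = stats[stats["sd"] < 0.10].sort_values("mean", ascending=False)
```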

The model does not matter much. On my predictors RF, ada, GBM and GLM give about the same result; SVM is slightly worse, and nnet does not work well at all.


All the success lies in the predictors and their preprocessing. And you are talking nonsense here!

 
Maxim Dmitrievsky #:
That's clearer. Where do the targets come from, from the clustering results?

My target is the sign of ordinary increments.

The target is secondary. The target's problem is the predictors: for a particular target you may or may not be able to find matching predictors.
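
A minimal illustration of such a target, assuming close is a pandas Series of close prices:

```python
# Minimal illustration of the mentioned target: the sign of the next plain increment.
# The newest bar has no label yet (NaN).
import numpy as np
import pandas as pd

def sign_of_increment_target(close: pd.Series) -> pd.Series:
    """+1 if the next bar closes higher, -1 if lower, 0 if unchanged."""
    return np.sign(close.shift(-1) - close)
```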

 
СанСаныч Фоменко #:

My target is the sign of ordinary increments.

The target is secondary. The target's problem is the predictors: for a particular target you may or may not be able to find matching predictors.

I fit them to one or more features at the target-labelling stage; it can be done through correlation, or even Mahalanobis. I.e. any informative set can be made.

The sliding-window theme is clear now: just retrain the model and reselect the features.

I would just calculate statistics on them in a sliding window and choose the optimal ones, so as not to retrain on every bar.
 
СанСаныч Фоменко #:

... the choice of predictors that the models produce.

Predictive ability is an information correlation, and it is NOT:

1. Correlation. That is the "similarity" of one stationary series to another; it always takes some value, and there is no value that means "no relationship". Since correlation always has some value, you can easily use it to find a relationship between the teacher and coffee grounds.

2. Feature selection by the frequency with which features are used when building models. If we take predictors that have nothing to do with the teacher, we still get a ranking of the features.

An analogue of my understanding of "predictive ability" is, for example, caret::classDist(), which computes the Mahalanobis distances of the samples to the centre of gravity of each class. Or woeBinning. There are many approaches and many packages in R; there are also ones based on information theory.
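
As a rough numpy illustration of that idea (Mahalanobis distances of samples to each class's centre of gravity over an already chosen predictor set; this is not the caret implementation):

```python
# Illustration only: Mahalanobis distances from samples to each class centroid.
import numpy as np

def class_distances(X: np.ndarray, y: np.ndarray, X_new: np.ndarray) -> dict:
    """For each class label, the Mahalanobis distance of every row of X_new to that class centroid."""
    out = {}
    for cls in np.unique(y):
        Xc = X[y == cls]                                   # samples of this class
        mu = Xc.mean(axis=0)                               # class centre of gravity
        cov_inv = np.linalg.pinv(np.cov(Xc, rowvar=False)) # inverse within-class covariance
        diff = X_new - mu
        out[cls] = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
    return out
```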

I still don't get it. It's not correlation or the frequency of use. How is the predictive ability estimated during training, or what is it estimated by?
Or is it some kind of equilibrium indicator that is simply named that way?
P.S. I read further; it became clearer.
 
СанСаныч Фоменко #:

My target is the sign of ordinary increments.

The target is secondary. The target's problem is the predictors: for a particular target you may or may not be able to find matching predictors.

The sign of the increments and the sign of ZZ do not guarantee profit. Five small increments are easily outweighed by one strong increment in the opposite direction. And, for example, 10 profitable night bars will likewise be wiped out by 1 losing daytime bar (just a 10% error).

What balance line will be obtained on the new data? I hope it is not horizontal with small upward/downward fluctuations?

In Vladimir's articles the error is also around 10-20%, but the balance line does not inspire optimism.

 
Valeriy Yastremskiy #:
I still don't get it. It's not correlation or the frequency of use. How is the predictive ability estimated during training, or what is it estimated by?
Or is it some kind of equilibrium indicator that is simply named that way?

It is the same vector algebra, the same mapping of the features, which removes the multicollinearity problem.

The Mahalanobis distance is one of the most common measures in multivariate statistics.

- i.e. a spatial selection/projection of essentially the same "components"... Placing the data in the space of multicollinear features opens the door to vector(!) algebra: instead of getting rid of multicollinearity in an artisanal way, it is better simply to take it into account (e.g. reduce to a 3-d space, or whatever you like, and operate with projections; if necessary, the initial data can be multiplied by these estimates, something like factor loadings, although usually the library itself computes this Mahalanobis distance and returns the results).

In any case, the end result is the same: estimates of the mean and st.dev., and trading decisions made on their basis.

- there is no other modelling in nature; there are just ways of solving the common problems (heteroscedasticity, multicollinearity, autocorrelation of residuals) in n-dimensional space of one dimensionality or another...

and there is no getting away from statistics... the solution to the feature-correlation problem is given here in explicit form...

P.S.

UPDATED: still, this tool (MD) is used for clustering/grouping/multidimensional classification... for picking out outliers in a multidimensional space... it is sometimes used alongside the Euclidean distance ("when the variables are not correlated, the Mahalanobis distance coincides with the usual Euclidean distance")... in LDA... in general, the point of view is the one I described earlier...

With this post I by no means meant to equate PCA and clustering; I simply remembered that both PCA and MD make it possible to get rid of outliers in a multidimensional space... but the essence of my update does not change: these are all solutions of spatial problems by vector algebra that take the multicollinearity problem into account (so that it does not distort/shift the statistical estimates).
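
A small numeric check of the quoted property; strictly speaking, the two distances coincide only when the covariance matrix is the identity, i.e. uncorrelated variables with unit variance:

```python
# With an identity covariance (uncorrelated, unit-variance variables) the
# Mahalanobis distance equals the Euclidean one; correlation reshapes the metric.
import numpy as np
from scipy.spatial.distance import euclidean, mahalanobis

x, mu = np.array([1.0, 2.0]), np.array([0.0, 0.0])

identity_inv = np.eye(2)                        # inverse covariance = identity
print(mahalanobis(x, mu, identity_inv))         # same as the Euclidean distance
print(euclidean(x, mu))

corr = np.array([[1.0, 0.8],
                 [0.8, 1.0]])                   # correlated variables
print(mahalanobis(x, mu, np.linalg.inv(corr)))  # differs from the Euclidean distance
```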

 
Maxim Dmitrievsky #:
so as not to retrain at every bar.

It is a matter of principle, precisely a matter of principle, to retrain. We do not need a model that lives for 100 years. We need a model that predicts the next bar with a low error. Then comes the Expert Advisor, and it has its own problems with that prediction.