Bayesian regression - Has anyone made an EA using this algorithm? - page 35

 

What, still not enough?

Well, here's more:

Dependent: AUDNZD    Multiple R = .83469441    F = 3845.556
R² = .69671476    df = 1,1674
No. of cases: 1676    adjusted R² = .69653358    p = .000000
Standard error of estimate: .053321255
Intercept: 6.047516031    Std.Error: .0782142    t(1674) = 77.320    p = 0.0000


 

And the control shot to the head:

Dependent: NZDCAD    Multiple R = .87619213    F = 5532.591
R² = .76771265    df = 1,1674
No. of cases: 1676    adjusted R² = .76757389    p = .000000
Standard error of estimate: .032035522
Intercept: -2.664033151    Std.Error: .0469913    t(1674) = -56.69    p = 0.0000
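As a sanity check (not from the thread, just the standard formulas): the printed adjusted R² and F are internally consistent with R² = .76771265, n = 1676 cases, and one predictor.

```python
# Recompute adjusted R^2 and F from the NZDCAD printout above.
r2, n, p = 0.76771265, 1676, 1                   # R^2, cases, predictors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)    # -> .76757389, as printed
f_stat = (r2 / p) / ((1 - r2) / (n - p - 1))     # -> 5532.59, as printed
print(adj_r2, f_stat)
```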


 
Dmitry:

Is this R² already "very low"?

Is there a correlation?

The correlation is undetectable; R is weak. As it happens, I use R² very actively to assess the quality of my strategies' equity curves, and believe me, I have seen hundreds of charts whose R² was roughly what is presented here. This one is utterly tiny, indistinguishable from a random walk (SB).
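One plausible reading of "R² for equity quality" (my sketch, not the poster's code): fit a straight line to the equity curve and take the R² of that fit, so steadier growth scores closer to 1.

```python
import numpy as np

def equity_r2(equity):
    """R^2 of an equity curve against a straight line; closer to 1 = steadier growth."""
    t = np.arange(len(equity))
    b, a = np.polyfit(t, equity, 1)     # OLS line a + b*t
    resid = equity - (a + b * t)
    return 1.0 - resid.var() / equity.var()
```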

 
Vasiliy Sokolov:

The correlation is undetectable; R is weak. As it happens, I use R² very actively to assess the quality of my strategies' equity curves, and believe me, I have seen hundreds of charts whose R² was roughly what is presented here. This one is utterly tiny, indistinguishable from a random walk (SB).

))))))))))))))))))))))))))
 

I remember doing a thing like this in R-project: I generated a thousand random market trajectories, a thousand measurements each. Then I fitted a linear regression to each of them and took its R². The resulting vector of R² values turned out to be roughly uniformly distributed from zero to 0.99, with a mean of about 0.5. I invite everyone to repeat my result and think about the essence of what we are computing.

P.S. Too bad I don't have R or those scripts at hand, otherwise one picture would be worth a thousand words...
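For anyone who wants to repeat the experiment, here is a minimal reconstruction in Python/NumPy (not the original R code; the random-walk model and parameter names are my assumptions). Whether the resulting histogram really looks uniform is left for the reader to check.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_obs = 1000, 1000                      # a thousand paths, a thousand points each
t = np.arange(n_obs)

r2 = np.empty(n_paths)
for i in range(n_paths):
    y = np.cumsum(rng.standard_normal(n_obs))    # random-walk "price" trajectory
    b, a = np.polyfit(t, y, 1)                   # OLS line a + b*t
    resid = y - (a + b * t)
    r2[i] = 1.0 - resid.var() / y.var()          # R^2 of the fit

print(f"mean R^2: {r2.mean():.2f}, spread: {r2.min():.2f} .. {r2.max():.2f}")
```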

 
Vasiliy Sokolov:
I remember doing a thing like this in R-project: I generated a thousand random market trajectories, a thousand measurements each. Then I fitted a linear regression to each of them and took its R². The resulting vector of R² values turned out to be roughly uniformly distributed from zero to 0.99, with a mean of about 0.5. I invite everyone to repeat my result and think about the essence of what we are computing.

And?

What is the point of this? That regression analysis should not be used on the grounds that some of the generated PRNG series can show a large R²?

By that logic, all methods of mathematical statistics and forecasting would have to be thrown out.

 
Vasiliy Sokolov:

I am amazed at the panelists' high level of mastery of mathematical methods and their complete lack of understanding of the principles of their applicability. Any regression analyzes correlated data; if there is no correlation, regression is not applicable. If the distribution of the quantities under study differs from normal, parametric statistics are not applicable either. The market does not have the property of normality. Also, the market as a process does not depend on time. Both of these facts cross out the very idea of regression analysis, whatever form it takes.

The problem is that many participants, including you, don't understand regression and rely on obscure definitions. A proper definition of regression analysis places no restriction on the distribution of the errors. The main requirement is that the errors be statistically independent of each other, so that the total regression error can be represented as a sum of functions of the individual errors. Everything else is a special case of regression. For example, the error normality requirement applies only to mean-square regression, i.e. when the total regression error is represented as the sum of the squares of the individual errors. This is the simplest regression method because it reduces to solving a system of linear equations. If you don't want to assume normality of the errors, use any other distribution: instead of a sum of squares, the total error will be a sum of some other function of the individual errors.

Let me try to explain it this way. Suppose we have measurements y and input data x, and we plot y against x. The points y(x) form a cloud. If this cloud is circular, with uniform density of points in all directions, then no matter how you twist and tweak the error distribution, no model y(x) exists: y and x are independent. If the cloud is stretched in some direction, we can build a model. In that case we have several choices:

1. Choose a linear model y_mod(x) = a + b*x or a nonlinear one, e.g. y_mod(x) = F(x) = a0 + a1*x + a2*x^2 + ...

2. Assuming the measurement errors e[i] = y[i] - y_mod[i] are independent, either assume their normality, which gives err_sum = SUM e[i]^2, or drop normality and take err_sum = SUM G(e[i]), where G() is any "non-square" function, for example G(e) = |e|, or in the general case G(e) = |e|^p. We can get fancy and build an error function that gives more weight to negative errors e[i], for example. Whichever G(e) we choose does not affect the predictability of y from x; it only affects how we draw the line through the cloud y(x). For example, if G(e) = e^10, the line will be pulled toward the points with the largest deviations (a sketch of this follows below the list).

The choice between a linear model y_mod(x) = a + b*x and a polynomial y_mod(x) = a0 + a1*x + a2*x^2 + ... depends on the shape of our elongated cloud. In both cases we can use mean-square regression, which leads to a system of linear equations and is solved quickly.
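A minimal sketch of point 2, assuming NumPy/SciPy (the data and the choice G(e) = |e|^p are illustrative, not from this thread): the same cloud is fitted twice, once with p = 2 (mean-square) and once with p = 1, to show that G only changes where the line sits, not whether y is predictable from x.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 200)
y = 2.0 + 0.5 * x + rng.standard_normal(200)    # elongated cloud y(x)
y[::20] += 8.0                                  # a few upward outliers

def err_sum(params, p):
    a, b = params
    e = y - (a + b * x)                         # individual errors e[i]
    return np.sum(np.abs(e) ** p)               # err_sum = SUM |e|^p

fit2 = minimize(err_sum, [0.0, 0.0], args=(2,), method="Nelder-Mead").x  # mean-square
fit1 = minimize(err_sum, [0.0, 0.0], args=(1,), method="Nelder-Mead").x  # G(e) = |e|
print("p=2 (a, b):", fit2)   # pulled up by the outliers
print("p=1 (a, b):", fit1)   # hugs the bulk of the cloud
```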

Now let's talk about time. If y(t) and x(t) depend on time, which is the case in almost all regressions since measurements are made at different points in time, that by itself changes nothing: we can still speak of a regression y(t) = F(x(t)). But if the function F itself depends on time, i.e. y(t) = F(x(t), t), then a static regression y = F(x) over the whole time interval is not applicable; a dynamic model y = F(x, t) should be used.
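One crude way to check for such time dependence, sketched under the same assumptions (window size and names are mine): re-estimate the coefficients on a sliding window and watch whether they drift.

```python
import numpy as np

def rolling_fit(x, y, window=200):
    """Re-estimate y ~ a + b*x on a sliding window."""
    out = []
    for s in range(len(x) - window + 1):
        b, a = np.polyfit(x[s:s + window], y[s:s + window], 1)
        out.append((a, b))
    return np.array(out)    # drift in (a, b) signals y = F(x, t)
```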

 
Vladimir:
According to research by a mathematician (I don't remember his last name; he works for FINAM), the distribution is close to normal but with heavy tails (and it's understandable why). So linear regression, imho, is quite workable.
 
Yuriy Asaulenko:
According to research by a mathematician (I don't remember his last name; he works for FINAM), the distribution is close to normal but with heavy tails (and it's understandable why). So linear regression, imho, is quite workable.
I have tried many different error distributions. I did not notice any particular difference in the results, but the calculation time grows significantly, which is why I use least-squares regression. I hesitate to call it linear, because the function y(x) may be nonlinear in the variable x yet linear in the model coefficients, and in that case least squares still gives a noticeable speedup (a sketch below illustrates this). Instead of spending so much time on theories about normality and the applicability of regression, it is far more important to talk about preparing the input data: a simple transformation of the inputs x and the measurements y can stretch the cloud y(x) out or make it circular. How we then draw a straight line or a parabola through this cloud, and whether we compute the modelling errors as squares or absolute values, is a secondary matter.
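A small illustration of "nonlinear in x, linear in the coefficients" (hypothetical data; any polynomial would do): the quadratic fit below still reduces to a single linear solve.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-3.0, 3.0, 300)
y = 1.0 - 2.0 * x + 0.7 * x**2 + 0.5 * rng.standard_normal(300)

# Nonlinear in x, but linear in the coefficients a0, a1, a2, so least
# squares is one linear solve: no iterative optimization needed.
X = np.column_stack([np.ones_like(x), x, x**2])   # design matrix
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("a0, a1, a2 =", coef)                       # ~ (1.0, -2.0, 0.7)
```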
 

I appeal to the sceptics.

Ladies and gentlemen, ladies and gentlemen, comrades! There is too much blood in your alcohol circulation system. (c)

What can you mathematically model in R if you haven't settled the conceptual questions behind the Bayes formula: what is the market to the right of the zero bar? Is it a market at all, or maybe a good game simulator with an appropriate algorithm? Which distribution and which likelihood function should we take?

The normal distribution is not the be-all and end-all. Bayes was already dead when Gauss was born. I suggested taking the normal distribution because you skeptics demonstrated it so convincingly. And if you skeptics say it doesn't fit, doesn't apply, then please propose something that does, other than what has already been proposed. Your likelihood function and distribution law can be plugged into the Bayes formula, for example as I described on p. 31 in the post of March 8, under the bouquet, and we can see what happens.
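For concreteness, here is a minimal sketch of "plugging a likelihood into the Bayes formula" for the slope of a linear model, assuming exactly the normal likelihood and normal prior the skeptics question (all data and numbers are illustrative, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(500)                     # stand-in "input" series
y = 0.8 * x + 0.5 * rng.standard_normal(500)     # stand-in "market" series

sigma2, tau2 = 0.25, 1.0    # assumed noise variance; prior variance of b
# Prior b ~ N(0, tau2) and a normal likelihood give a normal posterior:
post_var = 1.0 / (x @ x / sigma2 + 1.0 / tau2)
post_mean = post_var * (x @ y / sigma2)
print(f"posterior for b: mean {post_mean:.3f}, sd {post_var ** 0.5:.3f}")
```

Swapping in a different likelihood or prior changes only these two update lines; the Bayes formula itself stays the same.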