Bayesian regression - Has anyone made an EA using this algorithm? - page 38

 
Vasiliy Sokolov:
I agree with every word. What is the point of building a regression if, on the next segment, its characteristics will be completely different? You can tweak the model to fit the data as much as you like, but it is easier to simply admit that Y (price) does not depend on X (time), at least not linearly.

This is another example of common sense.

A colleague of mine was taking a Data Science course on Coursera and did a graduation project in which he fitted a linear regression to a randomly generated series (a martingale, or, if you like, a Wiener process with normally distributed increments) and showed that on the next segment of the series all the regression parameters drifted completely unpredictably. A toy problem.

If I were to use regression (I would lean towards ridge regression, although I only know the principle, not the details), I would do it on price increments or price derivatives. Then there is a chance of getting robust estimates. But even in that case it is unrealistic to obtain normally distributed residuals.
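
A minimal sketch of that toy problem, assuming Python with NumPy (the series length and the four-segment split are arbitrary, purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
price = np.cumsum(rng.normal(size=2000))      # Wiener-like series: cumulative normal increments

def fit_line(segment):
    # ordinary least-squares fit of price ~ time on one segment
    t = np.arange(len(segment))
    slope, intercept = np.polyfit(t, segment, deg=1)
    return slope, intercept

for i, segment in enumerate(np.split(price, 4)):   # four consecutive segments of 500 points
    slope, intercept = fit_line(segment)
    print(f"segment {i}: slope={slope:+.4f}, intercept={intercept:+.2f}")

# The slope typically changes sign and magnitude from segment to segment,
# although the generating process never changed.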

 
Alexey Burnakov:

This is another example of common sense.

A colleague of mine was taking a Data Science course on Coursera and did a graduation project in which he fitted a linear regression to a randomly generated series (a martingale, or, if you like, a Wiener process with normally distributed increments) and showed that on the next segment of the series all the regression parameters drifted completely unpredictably. A toy problem.

If I were to use regression (I would lean towards ridge regression, although I only know the principle, not the details), I would do it on price increments or price derivatives. Then there is a chance of getting robust estimates. But even in that case it is unrealistic to obtain normally distributed residuals.

;)

For ridge regression, the normality of the distribution of the residuals is not required.

Bayesian regression is similar to ridge regression, but is based on the assumption that the noise in the data is normally distributed - hence it is assumed that a general understanding of the data structure already exists, which makes it possible to obtain a more accurate model than plain linear regression.
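
A rough illustration of that comparison, assuming scikit-learn is available: Ridge uses a hand-picked penalty, while BayesianRidge assumes Gaussian noise and estimates the regularization strength from the data itself.

import numpy as np
from sklearn.linear_model import BayesianRidge, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
true_coef = np.array([1.5, 0.0, -2.0, 0.0, 0.5])
y = X @ true_coef + rng.normal(scale=1.0, size=200)    # Gaussian noise, as Bayesian regression assumes

ridge = Ridge(alpha=1.0).fit(X, y)     # penalty alpha chosen by hand
bayes = BayesianRidge().fit(X, y)      # noise and weight precisions estimated from the data

print("ridge coefficients:   ", np.round(ridge.coef_, 3))
print("bayesian coefficients:", np.round(bayes.coef_, 3))
print("estimated noise precision:", round(bayes.alpha_, 3))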

 
Ridge regression solves the problem of multicollinearity - when there are many independent variables that are correlated with each other.
 
Дмитрий:

;)

For ridge regression, the normality of the distribution of the residuals is not required.


Well, I confess I don't know the subspecies of regression very well. But the fact that normality of the residuals is not required is very good. Ridge regression may be more applicable to markets: in it, constraints are imposed on the values of the coefficients. I know of examples where this type of regression on quotes gave robust results.

There is also regression with L1 regularization, where the coefficients of some regressors can be driven exactly to zero. It is useful when there are many regressors and the dimensionality of the input vector needs to be reduced.
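
A minimal sketch of that coefficient-zeroing effect, assuming scikit-learn (the data and the penalty value are illustrative):

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 50))                               # 50 candidate regressors
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=300)     # only the first two matter

lasso = Lasso(alpha=0.1).fit(X, y)
kept = np.flatnonzero(lasso.coef_)
print("surviving regressors:", kept)
print("their coefficients:  ", np.round(lasso.coef_[kept], 2))

# Typically only columns 0 and 1 survive; the other 48 coefficients are exactly zero,
# so the input vector shrinks from 50 components to 2.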

But without knowing the details, it can be dangerous to dive into the maze of regression variants.

 
Дмитрий:
Ridge regression solves the problem of multicollinearity - when there are many independent variables that are correlated with each other.

And this is also an extremely useful aspect of ridge regression.

In practice, getting independence among the regressors is almost unrealistically difficult, and the presence of collinearity distorts all the statistics of an ordinary linear regression. Therefore, as SanSanych rightly points out, the applicability of the method comes first.
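
A small sketch of that effect, again assuming scikit-learn: with two nearly identical regressors, ordinary least squares scatters the coefficients wildly from sample to sample, while ridge keeps them stable.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(2)

def fitted_coefs(model):
    # one fresh sample with two almost perfectly collinear regressors
    x1 = rng.normal(size=300)
    x2 = x1 + rng.normal(scale=0.01, size=300)
    X = np.column_stack([x1, x2])
    y = x1 + rng.normal(scale=1.0, size=300)           # only x1 actually drives y
    return model.fit(X, y).coef_

for trial in range(3):
    print("OLS:  ", np.round(fitted_coefs(LinearRegression()), 2),
          "  ridge:", np.round(fitted_coefs(Ridge(alpha=1.0)), 2))

# OLS splits the weight between the near-twins arbitrarily (large +/- pairs);
# ridge settles near 0.5 / 0.5 on every sample.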

 
Alexey Burnakov:


There is also regression with L1 regularization, where the coefficients of some regressors can be driven exactly to zero. It is useful when there are many regressors and the dimensionality of the input vector needs to be reduced.


Lasso regression? Yes, there is such a thing.

In practice, ridge regression is more convenient to use - it is implemented as a regression with inclusion or exclusion of factors.

 
Дмитрий:

Lasso regression? Yes, there is such a thing.

In practice, ridge regression is more convenient to use - it is implemented as a regression with inclusion or exclusion of factors.

Yeah, it is.

Here is an example of using robust regressions to predict quotes, 3rd place in the competition, but without details: http://blog.kaggle.com/2016/02/12/winton-stock-market-challenge-winners-interview-3rd-place-mendrika-ramarlina/

And another gorgeous, in my opinion, example: https://www.kaggle.com/c/battlefin-s-big-data-combine-forecasting-challenge/forums/t/5966/share-your-approach

Read Sergey Yurgenson's post and see his code (2nd place in another contest):

My algorithm was written in Matlab and the code will be provided below. The main idea of the algorithm is to use a linear regression model (robust regression) with a small number of predictors, which are chosen based on the p-values of the slopes of each potential predictor.
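
His code was in Matlab; the following is only a rough Python transcription of the idea described, assuming statsmodels is available (the data, the Huber norm and the number of kept predictors are illustrative choices, not his actual setup):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
X = rng.normal(size=(400, 20))                                        # 20 candidate predictors
y = 1.0 * X[:, 3] - 0.5 * X[:, 7] + rng.standard_t(df=3, size=400)    # heavy-tailed noise

# p-value of the slope from a univariate fit of y on each candidate predictor
pvals = [sm.OLS(y, sm.add_constant(X[:, j])).fit().pvalues[1] for j in range(X.shape[1])]
chosen = np.argsort(pvals)[:3]                                        # keep the few most significant slopes
print("chosen predictors:", chosen)

# robust regression (Huber M-estimator) on the chosen predictors only
robust = sm.RLM(y, sm.add_constant(X[:, chosen]), M=sm.robust.norms.HuberT()).fit()
print(np.round(robust.params, 3))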

 

And on the subject of L1/L2 regularization: https://msdn.microsoft.com/ru-ru/magazine/dn904675.aspx

In any case, it is worth getting acquainted with them.

 
Alexey Burnakov:

And this is also an extremely useful aspect of ridge regression.

In practice, getting independence among the regressors is almost unrealistically difficult, and the presence of collinearity distorts all the statistics of an ordinary linear regression. Therefore, as SanSanych rightly points out, the applicability of the method comes first.

I've tried the principal component method. It seems ideal: the transformation yields a set of regressors with zero correlation with each other, and you can also pick out the "principal" components that explain most of the variability.

I killed a lot of time on it for classification tasks, hoping to reduce the error by at least a percent.
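
A short sketch of that decorrelation step, assuming scikit-learn (the correlated regressors here are generated artificially from three hidden factors):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
factors = rng.normal(size=(500, 3))
X = np.column_stack([factors, factors @ rng.normal(size=(3, 5))])   # 8 correlated regressors built from 3 factors
X += 0.05 * rng.normal(size=X.shape)                                # small noise so every component keeps some variance

pca = PCA().fit(X)
Z = pca.transform(X)                                                # rotated regressors, pairwise uncorrelated

corr = np.corrcoef(Z, rowvar=False)
print("max off-diagonal correlation:", round(float(np.max(np.abs(corr - np.eye(corr.shape[0])))), 6))
print("variance explained by first 3 components:", round(float(pca.explained_variance_ratio_[:3].sum()), 3))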

 
СанСаныч Фоменко:

I've tried the principal component method. It seems ideal: the transformation yields a set of regressors with zero correlation with each other, and you can also pick out the "principal" components that explain most of the variability.

I killed a lot of time on it for classification tasks, hoping to reduce the error by at least a percent.

I was recently discussing the history and development of linear regression with colleagues. To make a long story short: initially there was little data and few predictors, and ordinary linear regression coped, given some assumptions. Then, with the development of information technology, the amount of data grew and the number of predictors could easily exceed tens of thousands. Under these conditions ordinary linear regression is no help - it overfits. Hence the regularised versions appeared, versions robust to distributional requirements, and so on.