Market prediction based on macroeconomic indicators

 
transcendreamer:


I would still disagree - regression works fine with any data; not necessarily better than other methods, but still good enough, especially considering how few computational resources it demands


Regression does NOT work with any data. This is especially true for linear regression, which is mentioned at the beginning of this thread.

The problem with applying linear regression can be divided into two levels.

1. Primary estimation of the regression coefficients. Precisely an ESTIMATION. If we write y = a + b*x, there is no exactness here, since a regression is not an equation; the correct notation is y ~ a + b*x, where the tilde stresses that the coefficients are not constants but estimates of random variables with a certain accuracy, and therefore they cannot simply be added together, as you suggest in your post.

Accordingly, in any regression-fitting package each coefficient comes with a set of numbers characterizing that coefficient's value as a random variable. The summary verdict is displayed in the rightmost column as asterisks. Three asterisks mean you may take the coefficient's value as a constant, or rather as an estimate of a random variable with a small error and a small spread. If there are no asterisks, the value means nothing at all and cannot be used in any way.
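As an illustration, a minimal sketch in R (the language Rattle, mentioned later in this thread, runs in) of where those asterisks come from; x and y here are made-up series, not data from this thread:

```r
set.seed(1)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)   # true relation: y = 2 + 3x + noise

fit <- lm(y ~ x)
summary(fit)   # each coefficient gets an estimate, a standard error, a t-value,
               # a p-value and the stars in the rightmost column
               # (Signif. codes: '***' p<0.001, '**' p<0.01, '*' p<0.05)
confint(fit)   # interval estimates: the "spread" of each coefficient
```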

But this is not the whole trouble. The main trouble is the following.

2. Linear regression is applicable ONLY to stationary data, i.e. data with an approximately constant mean (mathematical expectation) and constant variance. The transformation you mentioned, which removes the trend, is exactly an attempt to bring the series to a stationary form. All this is generalized in ARIMA models, but there are financial series - and they are the majority - for which ARIMA models do not solve the problem.
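A hedged sketch of the stationarity check and the detrending transformation in R, assuming a price series `price` and the tseries/forecast packages:

```r
library(tseries)
library(forecast)

adf.test(price)           # augmented Dickey-Fuller test; H0 = unit root (non-stationary)
ret <- diff(log(price))   # log-returns: the usual trend-removing transformation
adf.test(ret)             # returns are normally much closer to stationary

fit <- auto.arima(price)  # ARIMA picks the differencing order d by itself
summary(fit)
```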

If you don't distinguish all these subtleties, the results obtained with linear regression are an empty numbers game.

 
faa1947:

Regression does NOT work with any data. This is especially true for linear regression, which is mentioned at the beginning of this thread.

.....

It works fine for me )))) and it's just linear regression.

Summing up the coefficients is a crude method, I agree.

I have tried analysing the significance of the coefficients and the analysis of variance, but in practice it seems to me to be of little use.

It is much easier and more convenient to see how the final curve behaves and how well the theoretical values fit the original data, visually, on a graph (a small sketch of this check follows below).

That's why I take the solution vector as is, and for most cases that's enough.

If it fits the data well, all is well.
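A tiny sketch of that visual check in R, assuming vectors x and y:

```r
fit <- lm(y ~ x)                       # or any other fitted model
plot(x, y, pch = 16, col = "grey")     # the original data
lines(sort(x), fitted(fit)[order(x)],  # the model's curve laid over it
      col = "red", lwd = 2)
```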

I have tried other, supposedly better solutions and other methods - the results are not much different from the regression's.

I have noticed that some coefficients may float within certain limits, and this does not affect the final curve much.

But this is not a problem either: these coefficients will be unstable anyway, they will gradually change over time, so there is no sense in evaluating them.

About stationarity - of course it does not exist in the market, so what to do then?

It is probably not academic to do it the way I do.

But then what should be substituted?

 
transcendreamer:

.....

But then what should be substituted?

You have limited yourself to linear regression, but you could put the question differently: choose the most appropriate type of regression for the task at hand. You can treat the great many regressions (not just linear ones) as a set of black boxes and concentrate on the meaningful problem of evaluating the results obtained.

Growing out of linear regression, as out of short trousers, takes a lot of time.

Next, decide what kind of thing you are predicting: are you going to predict a value, such as the price of a currency pair, or the direction of the price - some qualitative "long-short" characteristic matching the terminal's order types.
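A sketch of the two problem types in R, assuming a data frame `df` of macro indicators plus two targets: `price` (a level) and `dir` (1 = up, 0 = down):

```r
fit.value <- lm(price ~ . - dir, data = df)                      # predict the value itself
fit.dir   <- glm(dir ~ . - price, data = df, family = binomial)  # predict the direction

predict(fit.value, newdata = tail(df, 1))                    # a number: the price forecast
predict(fit.dir,  newdata = tail(df, 1), type = "response")  # P(up): a long/short signal
```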

Then decide how much time you are willing to invest.

At the first stage I recommend Rattle as a door into a world of more than 100 models. Judging by the level of your reasoning about linear regression, it will cost you a day or two. You end up with 6 types of models, one of them almost your favourite - only called "generalised linear" - while the others are much more interesting, and with them you can actually build predictive models.
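Getting to that door takes three lines of R:

```r
install.packages("rattle")  # once
library(rattle)
rattle()   # opens the GUI: load the data, pick a target, try the models
```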

 
faa1947:

You have limited yourself to linear regression, but you could put the question differently: choose the most appropriate type of regression for the task at hand.

.....

Unfortunately, as I said, other optimizers have not shown significantly better results than linear regression.

Maybe in some scientific applications they give an advantage, but in trading accurate prediction is an illusion.

GLM was developed for insurance, if I'm not mistaken; SVM and ADA are too narrowly focused; logistic regression is not suitable for obvious reasons.

Neural networks and random forests are versatile and more advantageous, because they sidestep the unit-root problem and any target function can be specified.

But they are a real head-scratcher, at least for me, coming from a humanities background.

Principal component analysis was a discovery for me, but I haven't been able to apply it to my problem (portfolios).
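For reference, a minimal sketch of principal components in base R, assuming `returns` is a matrix of portfolio asset returns (rows = dates, columns = instruments):

```r
pc <- prcomp(returns, center = TRUE, scale. = TRUE)
summary(pc)          # proportion of variance explained by each component
head(pc$x)           # scores: the data re-expressed in component space
pc$rotation[, 1:3]   # loadings: how each instrument enters the first 3 components
```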

Random forests are definitely worth attention and I plan to try them after a while, but I don't expect much effect.
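A hedged sketch of a random forest on the same assumed data frame `df` (the randomForest package; the target must be a factor for classification):

```r
library(randomForest)
df$dir <- factor(df$dir)
rf <- randomForest(dir ~ . - price, data = df, ntree = 500)
print(rf)       # reports the out-of-bag (OOB) error estimate
varImpPlot(rf)  # which indicators the forest actually leans on
```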

Too bad there are no genetic algorithms (GA) in Rattle - or I couldn't find them.

 

I wouldn't call linear regression "pitiful". And there's no need to assume that I haven't tried a bunch of other models.

Everyone knows that any non-linear model y = f(x1,x2,...) can be expanded in a Taylor series:

y = a0 + a11*(df/dx1)*x1 + a12*(df/dx2)*x2 + ... + a21*(d^2f/dx1^2)*x1^2 + a22*(d^2f/dx2^2)*x2^2 + b11*(d^2f/(dx1 dx2))*x1*x2 + ...

Those well versed in mathematics will recognize this as a decomposition of the function f(x1,x2,...) over polynomial (more precisely, monomial) bases x, x^2, x^3, etc. Linear regression keeps only the linear terms of this expansion, so it is a first-order approximation. Non-linear bases can be chosen from various well-known polynomial families, e.g. Chebyshev, Hermite, Legendre. But the correct way to select polynomials is QR decomposition or, more generally, the selection of orthogonal polynomials taking into account the statistical properties of the inputs x1, x2, ... Neural networks attempt the same kind of decomposition, but over exponential functions of the inputs, following Kolmogorov's theorem. That is a rather inconvenient decomposition, since exponential functions of the inputs are not orthogonal to each other, which leads to plenty of numerical problems and multiple solution variants.

In any case, all these decompositions of our non-linear function have a linear model as their first-order approximation. So if the linear approximation (regression) does not give the expected result, there is no point in going to higher degrees of non-linearity. Even linear regression can be solved with different error functions (least squares, least absolute deviations, and arbitrary others) - I have tried them all.
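A sketch of both points in R: poly() builds orthogonal polynomial bases (rather than raw x, x^2, ...), and quantreg::rq() with tau = 0.5 fits by least absolute deviations instead of least squares; x and y are assumed vectors:

```r
library(quantreg)

fit.ls  <- lm(y ~ poly(x, 3))             # least squares on an orthogonal cubic basis
fit.lad <- rq(y ~ poly(x, 3), tau = 0.5)  # least-absolute-deviations fit

coef(fit.ls)
coef(fit.lad)  # different error functions generally give different coefficients
```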

By the way, all the econometric ARMA, ARIMA and other models are special cases of the above model y[n] = f(x1[n-d1], x2[n-d2], ...) in which some inputs are delayed outputs, i.e. y[n-1], y[n-2] - hence the name "autoregressive" models. Although it is unhealthy to fit autoregressive models by least squares and the like, because the resulting coefficients lead to oscillatory models; you need Burg, Modified Covariance, and so on. But I went through this "autoregressive" chapter long ago and have no wish to come back to it. My market model does allow a delayed output to be selected as one of the inputs, but so far it has never chosen such an "autoregressive" input - which means that economic indicators are better suited for predicting the price than the price's own past (which forms the basis of the vast majority of traders' methods based on technical analysis).
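Base R happens to expose the Burg method directly; a sketch, assuming a stationary series `ret` (modified covariance is not in base R - that one lives in MATLAB/Octave):

```r
fit.burg <- ar.burg(ret, aic = FALSE, order.max = 4)  # Burg: always yields a stable AR model
fit.ols  <- ar.ols(ret,  aic = FALSE, order.max = 4)  # plain least squares, for comparison
fit.burg$ar  # Burg coefficients
fit.ols$ar   # OLS coefficients can land near or over the stability boundary
```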

 
faa1947:

I have a suggestion.

Send over a TSV file with the column names included. Specify which column (or columns) should be used as the target variable. Naturally, each table row should refer to one point in time.

I'll run it in Rattle and, with your permission, I'll post the results here for 6 very decent models.


Suggestion accepted. Specify an acceptable data file format. Will .mat do? There is a lot of data - CSV would eat up the whole disk, while MAT is only 6 MB.

But I have a condition: forecasts are made for the period from 2000 to 2015, but only on the basis of data available before the forecast date. That is, if you make a forecast for Q1 2000, you operate only with data up to Q1 2000. Selecting predictors from all available data, including 2015, and then using them to predict Q1 2000 - even if the model coefficients are calculated from data before Q1 2000 - is looking ahead. I made this mistake in the past, and my model produced staggeringly accurate predictions. In short, my condition is that the predictors are selected and the predictive model itself is calculated from data strictly BEFORE the predicted date.
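A sketch of this walk-forward condition in R, assuming a quarterly data frame `df` ordered by time with a `price` column; both predictor selection and fitting see only rows strictly before the forecast date:

```r
preds <- rep(NA_real_, nrow(df))
for (t in 40:(nrow(df) - 1)) {            # keep some history before starting
  train <- df[1:t, ]                      # data available BEFORE the forecast date
  fit   <- lm(price ~ ., data = train)    # (any model / predictor selection goes here)
  preds[t + 1] <- predict(fit, newdata = df[t + 1, ])
}
# preds[t + 1] never saw row t + 1 or anything later: no looking ahead
```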

 
gpwr:

Suggestion accepted. Specify an acceptable data file format. Will .mat do? .....


The first problem is the file. We'll have to think about it. I'm sure R can read MAT files - R and MATLAB are very friendly - but I don't know offhand how to do it. As soon as I've figured it out, I'll write back.
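One way that should work (an assumption on my part; the file and variable names below are made up) is the R.matlab package:

```r
install.packages("R.matlab")  # once
library(R.matlab)
d <- readMat("data.mat")      # returns a named list of the MATLAB variables
str(d)                        # inspect what came across
df <- as.data.frame(d$table)  # reshape into a data frame for Rattle / R models
# note: readMat handles the common MAT v5 format; a v7.3 (HDF5) file
# would need to be re-saved from MATLAB in an older format
```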

 
gpwr:

I wouldn't call linear regression "cheesy".

It is "pitiful" for non-stationary data.

And to summarise my posts: the tool has to fit the problem.

For regressions, the non-stationarity of financial series is the underlying problem. So when choosing a toolkit you need to look at how the chosen tool handles non-stationarity. The ARIMA I mentioned solves the non-stationarity problem to some extent, but I have never heard of Taylor series solving it. Within the framework of regressions, ARIMA is not the only tool; it is still used in US government structures, although it is not the most advanced. Of the better-known ones I will mention ARCH, with its bunch of modifications.
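For completeness, a minimal sketch of an ARCH-family fit in R via the tseries package, assuming a return series `ret` (GARCH(1,1) being the workhorse modification):

```r
library(tseries)
fit <- garch(ret, order = c(1, 1))  # conditional-variance (GARCH) model
summary(fit)                        # coefficient estimates and diagnostics
```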

The result of non-stationarity is overfitting of the model. It shows up in the fact that you can build a model of extraordinary accuracy, yet it does not work outside the training sample - and it fails sneakily: now it works, now it does not. Your words about the superiority of simple models over complex ones reflect a well-known fact: a complex model is much easier to overfit than a simple one.
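The effect is easy to reproduce; a sketch on made-up pure noise, where in-sample error keeps falling as the model grows while out-of-sample error does not:

```r
set.seed(2)
x <- rnorm(60); y <- rnorm(60)   # y has NO real relation to x
d.tr <- data.frame(x = x[1:40],  y = y[1:40])
d.te <- data.frame(x = x[41:60], y = y[41:60])
for (deg in c(1, 5, 10, 15)) {
  fit <- lm(y ~ poly(x, deg), data = d.tr)
  cat("degree", deg,
      "in-sample MSE",     round(mean(residuals(fit)^2), 3),
      "out-of-sample MSE", round(mean((d.te$y - predict(fit, d.te))^2), 3), "\n")
}
```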

 
gpwr:

I wouldn't call linear regression "pitiful". And there's no need to assume that I haven't tried a bunch of other models.

.....

I guess that's what I meant ))))

I build a regression on a data set and get a "so-so" model, and other methods almost always give "so-so" models as well.

And if linear regression gives a "more or less" decent model, then other methods may improve it a bit.

 
It would be helpful to give a clear definition, or at least a clarification, of what is meant by "forecast", "prediction", etc. What is the horizon of a "forecast"? Without this, "forecasts" are meaningless, because, depending on the horizon, the same "forecast" may be correct on one horizon and incorrect on another. Moreover, such stretches may alternate many times.
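A hedged sketch of what such a clarification could look like in practice: score the same model separately per horizon h, e.g. the directional hit-rate of walk-forward ARIMA forecasts (assuming a series `price`; slow, since the model is refit at every step):

```r
library(forecast)
for (h in c(1, 2, 4, 8)) {
  hits <- 0; n <- 0
  for (t in 40:(length(price) - h)) {
    fc <- forecast(auto.arima(price[1:t]), h = h)$mean[h]   # h-step-ahead forecast
    hits <- hits + (sign(fc - price[t]) == sign(price[t + h] - price[t]))
    n <- n + 1
  }
  cat("horizon", h, ": share of correct directions", round(hits / n, 2), "\n")
}
```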