is only true for the limited class of models that "your universities" have taught you.
I didn't study this at university. I'm self-taught; I think with my own brain, and I question and double-check everything. I became convinced of the necessity of stationarity myself, after multiple unsuccessful attempts to fit a model to non-stationary data. I could prove it in detail, but I begrudge the time, since everyone will stick to their own opinion anyway.
My interest in this topic started after watching the market news, where Professor Steve Keen bragged about how his economic model had predicted the crash of 2008, while the DSGE model used by the Fed was unable to predict anything. So I studied both the DSGE model and Keen's model. For those who want to follow my path, I suggest starting with this MathWorks article about the DSGE model. It has all the necessary code, including the code to download economic data from the Federal Reserve's FRED database:
http://www.mathworks.com/help/econ/examples/modeling-the-united-states-economy.html
The Fed model uses the following predictors:
Then watch Steve Keen's lectures on YouTube:
https://www.youtube.com/watch?v=aJIE5QTSSYA
https://www.youtube.com/watch?v=DDk4c4WIiCA
https://www.youtube.com/watch?v=wb7Tmk2OABo
And read his articles.
http://www.ideaeconomics.org/minsky/
ProfSteveKeen
And, for a gentler introduction in plain language, a machine-translated version of his Wikipedia page :)
https://translate.google.com.ua/translate?sl=en&tl=ru&js=y&prev=_t&hl=ru&ie=UTF-8&u=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FSteve_Keen&edit-text=
So, the task is to predict the S&P 500 index based on available economic indicators.
Step 1: Find the indicators. The indicators are publicly available here: http://research.stlouisfed.org/fred2/ There are 240,000 of them. The most important one is GDP growth, which is calculated every quarter, so our step is 3 months. All indicators with a shorter period are resampled to 3 months; annual ones are discarded. We also discard the indicators for all countries except the USA and the indicators that do not have a deep history (at least 15 years). After laboriously sifting through this pile, we are left with about 10 thousand indicators. So the task becomes more specific: forecast the S&P 500 index one or two quarters ahead, given 10 thousand economic indicators with a quarterly period. I do everything in Matlab, although it could also be done in R.
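For reference, a minimal Matlab sketch of pulling one series from FRED and bringing it to a quarterly step. It assumes the Datafeed Toolbox is available; the connection URL, the series ID and the crude resampling are illustrative choices, not the exact code from the MathWorks article.

c = fred('https://fred.stlouisfed.org/');   % connect to the FRED database (Datafeed Toolbox)
s = fetch(c, 'INDPRO');                     % example: a monthly industrial production series
dates  = s.Data(:,1);                       % serial date numbers
values = s.Data(:,2);                       % series values
% Crude quarterly resampling: keep the last monthly value of each quarter
% (assumes the series starts at the beginning of a quarter).
idx = 3:3:numel(values);
quarterlyValues = values(idx);
quarterlyDates  = dates(idx);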
Step 2: Convert all the data to a stationary form by differencing and normalizing. There are many ways to do this; the main requirement is that the original data can be recovered from the transformed data. No model will work without stationarity. The S&P 500 series before and after the transformation is shown below.
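For example, plain first differencing already satisfies that invertibility requirement (the nonlinear normalization added later in Step 5 is invertible as well). A small Matlab illustration:

dx = diff(x);                            % x is the original series as a column vector
xRestored = [x(1); x(1) + cumsum(dx)];   % recovers the original series exactly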
Step 3: Choose a model. It could be a neural network. It could be a multivariable linear regression. It could be a multivariable polynomial regression. After trying linear and non-linear models, we conclude that the data is so noisy that there is no point in fitting a non-linear model: the y(x) graph, where y = S&P 500 and x = one of the 10 thousand indicators, is almost a round cloud. Thus, we formulate the task even more concretely: predict the S&P 500 index one or two quarters ahead, given 10 thousand economic indicators with a quarterly period, using multivariable linear regression.
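Once a set of inputs has been chosen, fitting such a regression in Matlab is nearly a one-liner; the sketch below uses the backslash operator (regress() from the Statistics Toolbox would do the same job and also return diagnostics). X and y here are placeholder names for the input matrix and the stationary S&P 500.

n = size(X, 1);               % number of quarters
A = [ones(n,1) X];            % add an intercept column
b = A \ y;                    % least-squares coefficients: b(1) intercept, b(2:end) weights
yHat = A * b;                 % in-sample fit of the stationary S&P 500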
Step 4: Select the most important economic indicators out of the 10 thousand (reduce the dimension of the problem). This is the most important and difficult step. Suppose we take a 30-year history of the S&P 500 (120 quarters). To represent the S&P 500 as a linear combination of economic indicators, 120 indicators are enough to describe those 30 years of the S&P 500 exactly. Moreover, the indicators can be of absolutely any kind: with 120 indicators and 120 S&P 500 values, such an exact model can always be constructed, so the fit itself means nothing. We must therefore reduce the number of inputs well below the number of function values being described, for example to the 10-20 most important indicators/inputs. Problems of describing data with a small number of inputs selected from a large number of candidate bases (a dictionary) are called sparse coding.
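As a purely illustrative example of one such method, here is a sketch of greedy forward selection, a simple flavor of sparse coding: starting from an empty model, keep adding the candidate input that most reduces the in-sample residual. D is a 120 x 10000 dictionary of candidate inputs, y the 120 x 1 stationary S&P 500 and k the number of inputs to keep; none of these names come from the original post.

k = 10;                                        % how many inputs to keep
selected = [];
for j = 1:k
    bestErr = inf;
    bestIdx = 0;
    for i = setdiff(1:size(D,2), selected)
        A = [ones(numel(y),1) D(:, [selected i])];
        e = y - A * (A \ y);                   % residual with candidate i added
        if sum(e.^2) < bestErr
            bestErr = sum(e.^2);
            bestIdx = i;
        end
    end
    selected = [selected bestIdx];             % keep the best candidate this round
end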
There are many methods of selecting predictor inputs. I've tried them all. Here are the main two:
Here are the first 10 indicators with the maximum correlation coefficient with the S&P 500:
Here are the top 10 indicators with maximum mutual information with the S&P 500:
Lag is the lag of the input series relative to the simulated S&P 500 series. As you can see from these tables, different methods of choosing the most important inputs result in different sets of inputs. Since my ultimate goal is to minimize model error, I chose the second method of input selection, i.e. enumerating all inputs and selecting the input that gave the smallest error.
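As an illustration of how the rankings can differ, here is a sketch that scores every candidate input in two ways: by its correlation with the S&P 500 and by the residual error of a one-input regression. D and y are the same placeholder names as above; mutual information would need a separate histogram-based estimator and is not shown.

nCand = size(D, 2);
r   = zeros(nCand, 1);
err = zeros(nCand, 1);
for i = 1:nCand
    c = corrcoef(D(:,i), y);
    r(i) = c(1,2);                            % correlation with the S&P 500
    A = [ones(numel(y),1) D(:,i)];
    err(i) = norm(y - A * (A \ y));           % one-input regression error
end
[~, byCorr] = sort(abs(r), 'descend');        % ranking behind the correlation table
[~, byErr]  = sort(err, 'ascend');            % ranking by smallest regression error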
Step 5: Choose a method for calculating the error and the coefficients of the model. The simplest is the least-squares (RMS error) method, which is why linear regression based on it is so popular. The problem with the least-squares method is that it is sensitive to outliers, i.e. outliers have a significant effect on the model coefficients. To reduce this sensitivity, the sum of absolute errors can be used instead of the sum of squared errors, which leads to the least-modulus method (robust regression). Unlike linear regression, this method has no analytical solution for the model coefficients; usually the absolute values are replaced by smooth, differentiable approximating functions and the solution is found numerically, which is slow. I have tried both methods (least squares and least modulus) and have not noticed any particular advantage of the least-modulus method. Instead, I went a roundabout way. At the second step, when obtaining stationary data by differencing, I added a non-linear normalization operation. That is, the original series x[1], x[2], ... x[i-1], x[i] ... is first converted to the difference series x[2]-x[1], ... x[i]-x[i-1], ... and then each difference is normalized by replacing it with sign(x[i]-x[i-1])*abs(x[i]-x[i-1])^u, where 0 <= u <= 1. At u=1 we get the classical least-squares method with its sensitivity to outliers. At u=0 all values of the input series are replaced by binary +/-1 values, eliminating outliers almost entirely. At u=0.5 we get something close to the least-modulus method (squaring |x|^0.5 gives |x|, so the least-squares fit on the transformed differences behaves like an absolute-error fit on the raw ones). The optimal value of u lies somewhere between 0.5 and 1.
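In code the transform and its exact inverse look like this (u = 0.7 is just an example value within the suggested range):

u  = 0.7;                                % example value between 0.5 and 1
dx = diff(x);                            % differences from Step 2
dz = sign(dx) .* abs(dx).^u;             % compress large moves / outliers
dxBack = sign(dz) .* abs(dz).^(1/u);     % exact inverse, so the series stays recoverable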
Note that one popular way of converting data to a stationary form is to replace the values of the series by the difference of their logarithms, i.e. log(x[i]) - log(x[i-1]), or log(x[i]/x[i-1]). This transformation is dangerous in my case because the dictionary of 10 thousand inputs contains many series with zero and negative values. The logarithm also has the advantage of reducing the sensitivity of the least-squares method to outliers. In essence, my sign(x)*|x|^u transform serves the same purpose as log(x), but without the problems associated with zero and negative values.
Step 6: Compute the model prediction by feeding in the fresh input data and computing the model output with the same coefficients that were found by linear regression on the previous history. Keep in mind that the quarterly economic indicators and the S&P 500 values arrive almost simultaneously (within the same 3 months). Therefore, to predict the S&P 500 one quarter ahead, the model must be built between the current quarterly S&P 500 value and inputs delayed by at least 1 quarter (Lag>=1). To predict the S&P 500 two quarters ahead, the model must be built between the current quarterly S&P 500 value and inputs delayed by at least 2 quarters (Lag>=2). And so on. The accuracy of the predictions decreases significantly for lags greater than 2.
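A sketch of what such a lagged regression looks like in Matlab, with D and y the placeholder dictionary and stationary S&P 500 from the earlier sketches, and the input column chosen only as an example:

lag = 2;                                   % Lag >= 2: aim two quarters ahead
xi  = D(:, 1);                             % one candidate input (example column)
yDep = y(lag+1:end);                       % S&P 500 values being explained
xLag = xi(1:end-lag);                      % the same input, delayed by `lag` quarters
A    = [ones(numel(yDep),1) xLag];
b    = A \ yDep;                           % regression coefficients
yAhead = [1, xi(end)] * b;                 % forecast `lag` quarters past the last known value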
Step 7: Check the accuracy of the predictions on the previous history. The first method described above (fit each input to the previous history, pick the input with the smallest RMS error, and use the latest value of that input to generate a prediction) produced predictions that were even worse than random or zero predictions. I asked myself: why should an input that fits the past well have any predictive power for the future? It makes more sense to select model inputs based on their past prediction error, rather than on the smallest regression error on already-known data.
In the end, my model can be summarized like this:
In short, the choice of predictor depends on its RMS of predictions of previous S&P 500 values. There is no looking into the future. The predictor can change over time, but at the end of the test segment it basically stops changing. My model has chosen PPICRM with a 2 quarter lag as the first input to predict Q2 2015. The linear regression of the S&P 500 by the selected PPICRM(2) input for 1960 - Q4 2014 is shown below. The black circles are the linear regression. Multicoloured circles are historical data for 1960 - Q4 2014. The colour of the circle indicates the time.
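A sketch of that walk-forward selection, under the same placeholder names as before (D holds the candidate inputs already shifted by their lags, y is the stationary S&P 500, and tStart, the first quarter of the test segment, is an arbitrary example value):

nCand   = size(D, 2);
tStart  = 80;                               % first quarter used for out-of-sample testing
predErr = zeros(nCand, 1);
for i = 1:nCand
    e = zeros(numel(y) - tStart + 1, 1);
    for t = tStart:numel(y)
        A = [ones(t-1,1) D(1:t-1, i)];
        b = A \ y(1:t-1);                   % fit on the past only, no look-ahead
        e(t - tStart + 1) = y(t) - [1, D(t, i)] * b;   % prediction error at quarter t
    end
    predErr(i) = sqrt(mean(e.^2));          % RMS of past prediction errors
end
[~, bestInput] = min(predErr);              % the predictor trusted going forward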
Predictions of S&P 500 in stationary form (red line):
S&P 500 predictions in raw form (red line):
The chart shows that the model predicts a rise in the S&P 500 in the second quarter of 2015. Adding a second input increases the prediction error:
1 err1=0.900298 err2=0.938355 PPICRM (2)
2 err1=0.881910 err2=0.978233 PERMIT1 (4)
Here err1 is the in-sample regression error; naturally, it decreases when a second input is added. err2 is the root-mean-square prediction error divided by the root-mean-square error of random predictions. So err2 >= 1 means my model's predictions are no better than random predictions, while err2 < 1 means they are better than random predictions.
PPICRM = Producer Price Index: Crude Materials for Further Processing
PERMIT1 = New Private Housing Units Authorized by Building Permits - In Structures with 1 Unit
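For completeness, a sketch of how an err2-style score can be computed. The exact baseline is not spelled out in the post, so the "random prediction" here is my own assumption: values drawn at random from the realized stationary history. yTrue and yPred are placeholder names for the realized stationary S&P 500 values and the model's predictions over the test segment.

modelRms = sqrt(mean((yTrue - yPred).^2));             % RMS prediction error of the model
randPred = yTrue(randi(numel(yTrue), size(yTrue)));    % assumed random baseline
randRms  = sqrt(mean((yTrue - randPred).^2));          % RMS error of the baseline
err2     = modelRms / randRms;                         % err2 < 1: better than random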
The model described above can be rephrased like this. We gather 10 thousand economists and ask each of them to predict the market one quarter ahead. Each economist comes up with his or her own prediction. But instead of picking a prediction based on the number of textbooks they have written or the Nobel prizes they have received, we wait a few years, collecting their predictions. After a significant number of predictions, we see which economist is more accurate and start believing his or her predictions, until some other economist surpasses them in accuracy.
The answer is simple - trade on annual timeframes....
Is this a joke?
:-) i don't know.... if the analysis is on years..... i don't know what to trade on... On m5 it's unlikely to have any practical effect...
As an option, try to apply your analysis to H4...
gpwr:
...After a significant number of predictions, we see which economist is more accurate and start believing his predictions until some other economist surpasses him in accuracy...
Mmmm, that kind of contradicts Taleb and his black swan. How can economists who predict well in one environment predict a collapse?
I mean, not how, but why would it happen? They are pretty sure they are right, so why would they revise that view? And so we get lemmings enthusiastically rushing into the abyss.
Here's Keen's article on his model:
http://keenomics.s3.amazonaws.com/debtdeflation_media/papers/PaperPrePublicationProof.pdf
Although I will say right away that I don't like his model. Its purpose is to explain economic cycles and collapses, not to predict the market or economic indicators such as GDP with any accuracy. For example, his model predicted that rising household debt would lead to an economic collapse, but exactly when that would happen it could not say. Nor is it capable of predicting what happens after the collapse. All of his theoretical curves go off to infinity and stay there, even though the market and the US economy recovered in 2009. That must be why he remains very negative about this recovery, not believing in it and claiming that a depression worse than Japan's two-decade-long one is coming. I think this is the problem with all dynamic economic models: they are hard to stabilise, and once they become unstable they get stuck and can no longer predict the future. Even so, a well-known hedge fund has hired Keen as an economic adviser.