Market prediction based on macroeconomic indicators

 
avtomat:


is only true for the limited class of models that "your universities" have taught you.


I didn't study this at university. I'm self-taught. I think with my own brain, and I question and double-check everything. The need for stationarity itself became clear to me after many unsuccessful attempts to build a model on non-stationary data. I could prove it in detail, but I'd rather not spend the time, since everyone will stick to their own opinion anyway.

My interest in this topic started after watching the market news, where professor Steve Keen bragged about how his economic model predicted the crash of 2008, while the DSGE model used by the Fed was unable to predict anything. So I studied both the DSGE model and Keen's model. For those who want to follow my path, I suggest starting with this Matlab article about the DSGE model. It has all the necessary code, including the code to pull economic data from the Federal Reserve's FRED database:

http://www.mathworks.com/help/econ/examples/modeling-the-united-states-economy.html

The Fed model uses the following predictors:

% FRED Series   Description
% ---------------------------------------------------------------------------
% COE           Paid compensation of employees in $ billions
% CPIAUCSL      Consumer price index
% FEDFUNDS      Effective federal funds rate
% GCE           Government consumption expenditures and investment in $ billions
% GDP           Gross domestic product in $ billions
% GDPDEF        Gross domestic product price deflator
% GPDI          Gross private domestic investment in $ billions
% GS10          Ten-year treasury bond yield
% HOANBS        Non-farm business sector index of hours worked
% M1SL          M1 money supply (narrow money)
% M2SL          M2 money supply (broad money)
% PCEC          Personal consumption expenditures in $ billions
% TB3MS         Three-month treasury bill yield
% UNRATE        Unemployment rate


Then watch Steve Keen's lectures on YouTube:

https://www.youtube.com/watch?v=aJIE5QTSSYA

https://www.youtube.com/watch?v=DDk4c4WIiCA

https://www.youtube.com/watch?v=wb7Tmk2OABo

And read his articles.

 
The Minsky program (an economic simulator) is attached, along with the site it was pulled from; the site has a lot of videos explaining how it works and plenty of other material.

http://www.ideaeconomics.org/minsky/

 
And for the underdeveloped, in plain language
 
Vinin:
And for the underdeveloped, in plain language

For the Germans :)

https://translate.google.com.ua/translate?sl=en&tl=ru&js=y&prev=_t&hl=ru&ie=UTF-8&u=http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FSteve_Keen&edit-text=

 
gpwr:

So, the task is to predict the S&P 500 index based on available economic indicators.

Step 1: Find the indicators. The indicators are publicly available here: http://research.stlouisfed.org/fred2/ There are 240,000 of them. The most important one is GDP growth, which is calculated every quarter, so our step is 3 months. All indicators on shorter timeframes are resampled to 3 months, and the rest (annual) are discarded. We also discard indicators for all countries except the USA, as well as indicators without a sufficiently deep history (at least 15 years). After laboriously sifting, we are left with about 10 thousand indicators. Now we can formulate a more specific task: to forecast the S&P 500 index one or two quarters ahead, given 10 thousand economic indicators with a quarterly period. I do everything in Matlab, but it could also be done in R.
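Not the author's code, but a minimal MATLAB sketch of the kind of step needed here: pulling one series from FRED and bringing a monthly series to a quarterly step. It assumes the Datafeed Toolbox (fred/fetch) is installed, and the series ID is only illustrative.

% Minimal sketch (assumes the Datafeed Toolbox); the series ID is an example only.
c   = fred('https://fred.stlouisfed.org/');   % open a FRED connection
cpi = fetch(c, 'CPIAUCSL');                   % a monthly candidate predictor
close(c);

% fetch() returns a struct whose Data field is [serial date, value].
t = cpi.Data(:,1);
v = cpi.Data(:,2);

% Aggregate the monthly series to a quarterly step by averaging within quarters.
[y, m] = datevec(t);
q      = (y - y(1))*4 + ceil(m/3);            % running quarter index
q      = q - q(1) + 1;
vq     = accumarray(q, v, [], @mean);         % one value per quarter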

Step 2: Convert all the data to a stationary form by differencing and normalizing. There are many ways to do this; the main requirement is that the transformation be invertible, so that the original data can be recovered from the transformed data. No model will work without stationarity. The S&P 500 series before and after transformation is shown below.
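A minimal sketch of one possible version of this transformation (plain first differences scaled to unit variance); the author's actual transform adds a non-linear normalization described in Step 5, and the variable names here are assumptions.

% Sketch only: first differences, scaled to unit variance.
x  = sp500_quarterly(:);     % hypothetical vector of quarterly S&P 500 values
s  = std(diff(x));           % scale factor, stored so the transform stays invertible
dx = diff(x) / s;            % stationarized series

% Recover the original series from the first value and the differences:
x_rec = [x(1); x(1) + cumsum(dx * s)];   % equals x up to round-off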

Step 3: Choose a model. It could be a neural network, a multivariable linear regression, or a multivariable polynomial regression. After trying linear and non-linear models, we conclude that the data is so noisy that there is no point in fitting a non-linear model: the y(x) graph, where y = S&P 500 and x = one of the 10 thousand indicators, is almost a round cloud. Thus, we formulate the task even more concretely: to predict the S&P 500 index one or two quarters ahead, given 10 thousand economic indicators with a quarterly period, using multivariable linear regression.

Step 4: Select the most important economic indicators out of the 10 thousand (reduce the dimension of the problem). This is the most important and difficult step. Suppose we take a 30-year history of the S&P 500 (120 quarters). To represent the S&P 500 as a linear combination of economic indicators, 120 indicators are enough to describe those 30 years of the S&P 500 exactly; moreover, the indicators can be absolutely anything, since 120 inputs and 120 target values always admit such an exact fit, which says nothing about predictive power. So we must keep the number of inputs well below the number of function values being described; for example, we look for the 10-20 most important indicators/inputs. Such tasks of describing data by a small number of inputs selected from a large dictionary of candidate bases are called sparse coding.
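A toy illustration of the point about 120 indicators and 120 values: with as many arbitrary regressors as observations, a linear model fits any target exactly, so a perfect in-sample fit proves nothing.

% 120 random "indicators" fit 120 random "S&P 500" values exactly.
rng(0);
n = 120;
X = randn(n, n);                 % arbitrary inputs
y = randn(n, 1);                 % arbitrary target
b = X \ y;                       % least-squares (here: exact) solution
max(abs(X*b - y))                % numerically zero, i.e. round-off level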

There are many methods of selecting predictor inputs. I've tried them all. Here are the main two:

  1. Rank all 10 thousand indicators by their ability to predict the S&P 500. Predictive ability can be measured by the correlation coefficient or by mutual information.
  2. Go through all 10 thousand indicators one by one and select the one whose linear model y_mod = a + b*x1 describes the S&P 500 with the minimum error. Then select the second input by trying the remaining 9,999 indicators and picking the one that describes the residual y - y_mod = c + d*x2 with the minimum error, and so on. This method is called stepwise regression or matching pursuit (a minimal sketch of this greedy loop follows below).
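A minimal sketch of the greedy loop in method 2, under the assumption that X is an n-by-p matrix of stationarized, lag-aligned indicators and y is the stationarized S&P 500 (these names are mine, not the author's):

function sel = greedy_select(X, y, k)
% Pick k inputs by matching pursuit / stepwise regression on the residual.
    [n, p] = size(X);
    sel = [];                                 % indices of chosen inputs
    r   = y;                                  % current residual
    for step = 1:k
        bestErr = inf; bestJ = 0;
        for j = setdiff(1:p, sel)
            A = [ones(n,1) X(:,j)];
            e = r - A * (A \ r);              % residual of the fit a + b*x_j
            if sum(e.^2) < bestErr
                bestErr = sum(e.^2); bestJ = j;
            end
        end
        sel = [sel bestJ];                    % keep the best input
        A   = [ones(n,1) X(:,bestJ)];
        r   = r - A * (A \ r);                % update the residual
    end
end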

Here are the first 10 indicators with the maximum correlation coefficient with the S&P 500:

Series ID          Lag   Corr    Mut Info
PPICRM              2    0.315   0.102
CWUR0000SEHE        2    0.283   0.122
CES1021000001       1    0.263   0.095
B115RC1Q027SBEA     2    0.262   0.102
CES1000000034       1    0.261   0.105
A371RD3Q086SBEA     2    0.260   0.085
B115RC1Q027SBEA     1    0.256   0.102
CUUR0000SAF111      1    0.252   0.117
CUUR0000SEHE        2    0.251   0.098
USMINE              1    0.250   0.102

Here are the top 10 indicators with maximum mutual information with the S&P 500:

Series ID          Lag   Corr    Mut Info
CPILEGSL            3    0.061   0.136
B701RC1Q027SBEA     3    0.038   0.136
CUSR0000SAS         3    0.043   0.134
GDPPOT              3    0.003   0.134
NGDPPOT             5    0.102   0.134
OTHSEC              4    0.168   0.133
LNU01300060         3    0.046   0.132
LRAC25TTUSM156N     3    0.046   0.132
LRAC25TTUSQ156N     3    0.046   0.131
CUSR0000SAS         1    0.130   0.131

Lag is the lag of the input series relative to the simulated S&P 500 series. As you can see from these tables, different methods of choosing the most important inputs result in different sets of inputs. Since my ultimate goal is to minimize model error, I chose the second method of input selection, i.e. enumerating all inputs and selecting the input that gave the smallest error.
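For completeness, a minimal sketch of method 1: ranking candidates by lagged correlation, with a crude histogram-based mutual information estimate. The author does not say which MI estimator he used, so that part, like the variable names, is an assumption.

% X: n-by-p stationarized candidate inputs, y: stationarized S&P 500 (assumptions).
[n, p] = size(X);
maxLag = 5;  nb = 8;                        % maximum lag tried and number of histogram bins
score  = zeros(p, 3);                       % [best lag, |corr|, mutual info]
for j = 1:p
    for L = 1:maxLag
        xl = X(1:n-L, j);  yl = y(1+L:n);   % input leads the target by L quarters
        cc = corrcoef(xl, yl);  c = cc(1,2);
        h   = histcounts2(xl, yl, [nb nb]); % joint histogram
        pxy = h / sum(h(:));
        pr  = sum(pxy, 2) * sum(pxy, 1);    % product of the marginals
        nz  = pxy > 0;
        mi  = sum(pxy(nz) .* log(pxy(nz) ./ pr(nz)));
        if abs(c) > score(j, 2)
            score(j, :) = [L, abs(c), mi];  % remember the best lag for this input
        end
    end
end
[~, order] = sort(score(:,2), 'descend');   % inputs ranked by lagged correlation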

Step 5: Choose a method for calculating the error and the model coefficients. The simplest is the least-squares method, which is why linear regression based on it is so popular. The problem with least squares is its sensitivity to outliers: outliers have a significant effect on the model coefficients. To reduce this sensitivity, the sum of absolute errors can be used instead of the sum of squared errors, which leads to the least-modulus method (LMM), i.e. robust regression. Unlike linear regression, this method has no analytical solution for the coefficients: the absolute values are usually replaced by smooth/differentiable approximating functions, and the solution is found numerically, which takes a long time. I've tried both methods (linear regression and LMM) and haven't noticed any particular advantage of LMM. Instead of LMM, I went a roundabout way. At the second step, where stationary data is obtained by differencing, I added a non-linear normalization operation. That is, the original series x[1], x[2], ... x[i-1], x[i] ... is first converted to the difference series x[2]-x[1], ... , x[i]-x[i-1], ... and then each difference is normalized by replacing it with sign(x[i]-x[i-1])*abs(x[i]-x[i-1])^u, where 0 <= u <= 1. With u=1 we get the classical least-squares method with its sensitivity to outliers. With u=0 all input series values are replaced by binary +/-1 values with almost no outliers. With u=0.5 we get something close to the least-modulus method. The optimal value of u is somewhere between 0.5 and 1.

Note that one popular way of converting data to a stationary form is to replace the values of the series with the difference of their logarithms, i.e. log(x[i]) - log(x[i-1]) = log(x[i]/x[i-1]). This transformation is dangerous in my case, since the dictionary of 10 thousand inputs contains many series with zero and negative values. The logarithm also has the advantage of reducing the sensitivity of the least-squares method to outliers. Essentially, my sign(x)*|x|^u transform serves the same purpose as log(x) but without the problems associated with zero and negative values.
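A minimal sketch of this normalization applied to a differenced series, with the log-difference alternative shown as a comment; the value of u and the variable names are assumptions.

u  = 0.7;                          % somewhere between 0.5 and 1, as suggested above
d  = diff(x);                      % x: original quarterly series
dn = sign(d) .* abs(d).^u;         % compress outliers while keeping the sign

% The popular alternative, valid only for strictly positive series:
% dlog = diff(log(x));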

Step 6: Compute the model prediction by feeding the fresh input data into the model and computing its output with the same coefficients that linear regression found on the preceding section of history. Keep in mind that quarterly economic indicators and S&P 500 values arrive almost simultaneously (within 3 months). Therefore, to predict the S&P 500 for the next quarter, the model should be built between the current quarterly S&P 500 value and inputs delayed by at least 1 quarter (Lag>=1). To predict the S&P 500 two quarters ahead, the model should be built between the current quarterly S&P 500 value and inputs delayed by at least 2 quarters (Lag>=2), and so on. The accuracy of the predictions decreases significantly for lags greater than 2.
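A minimal sketch of this alignment for a single input with Lag = 1; y and x are the stationarized quarterly target and input, and all names are introduced here rather than taken from the author.

lag = 1;
n   = numel(y);
Y   = y(1+lag:n);                  % current S&P 500 values
A   = [ones(n-lag,1) x(1:n-lag)];  % the input delayed by "lag" quarters
B   = A \ Y;                       % coefficients a and b from linear regression
y_next = [1 x(n)] * B;             % forecast one step beyond the known history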

Step 7: Check the accuracy of the predictions on the previous history. The first method described above (fit each input to the previous history, pick the input with the lowest regression RMS, and use the latest value of that input to generate a prediction) gave predictions that were even worse than random or null predictions. I asked myself: why should an input that fits the past well have good predictive ability for the future? It makes sense to select model inputs based on their past prediction error, rather than on the smallest regression error on the known data.

So, to recap, my model can be described step by step like this (a condensed code sketch of the walk-forward loop follows after the list):

  1. We download economic data from stlouisfed (about 10k indicators).
  2. Transform data to a stationary form and normalize it.
  3. Select a linear model of the S&P 500 index, analytically solved by the RMS method (linear regression).
  4. We select the length of history (1960 - Q2 2015) and partition it into a training period (1960 - Q4 1999) and a test period (Q1 2000 - Q2 2015).
  5. We start the predictions at year 1960 + N + 1, where N*4 is the initial number of known quarterly values of the S&P 500.
  6. From the first N data a linear model y_mod = a + b*x is constructed for each economic indicator, where y_mod is the S&P 500 model and x is one of the economic indicators.
  7. We predict bar N + 1 with each model.
  8. We calculate each model's prediction error for bar N + 1 and remember these errors.
  9. We increase the number of known values of the S&P 500 by 1, i.e. N + 1, and repeat steps 6-9 until we reach the end of the training period (Q4 1999). At this step, we have memorized the prediction errors from 1960 + N +1 years to Q4 1999 for each economic indicator.
  10. We start testing the model on the second history interval (Q1 2000 - Q2 2015).
  11. For each of 10 thousand inputs we calculate root-mean-square error of predictions for the period 1960 - Q4 1999.
  12. From 10 thousand inputs, choose the one with the lowest RMS prediction error from 1960 - Q4 1999.
  13. We construct a linear model y_mod = a + b*x for each economic indicator for 1960 - Q4 1999.
  14. We predict Q1 2000 by each model.
  15. We select the prediction of the selected input with the lowest RMS of the predictions for the previous time period (1960 - Q4 1999) as our main prediction of Q1 2000.
  16. Calculate the prediction errors of all inputs on Q1 2000 and add them to the RMS of the same inputs in the previous time interval (1960 - Q4 1999).
  17. We go to Q2 2000 and repeat steps 12-17 until we reach the end of the test section (Q2 2015) with an unknown value of the S&P 500, the prediction of which is our main target.
  18. We accumulate the prediction errors for Q1 2000 - Q4 2014 made by the inputs with the lowest RMS of predictions on the preceding segments. This error (err2) is our model's out-of-sample prediction error.
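A condensed MATLAB sketch of steps 5-18 as I read them, written as a function; X, y and the index of Q4 1999 are assumptions, and the "random prediction" benchmark in the last line is just one possible choice (a no-change forecast in stationary form), not necessarily the author's.

function err2 = walk_forward(X, y, N, iTrainEnd)
% X: n-by-p lag-aligned, stationarized inputs; y: stationarized S&P 500 (column);
% N: initial number of known quarters; iTrainEnd: index of Q4 1999 (assumption).
    [n, p]  = size(X);
    cumErr  = zeros(1, p);                  % each input's accumulated squared prediction error
    predErr = [];                           % out-of-sample errors of the chosen input
    for t = N:n-1
        e_t = zeros(1, p);
        for j = 1:p
            B      = [ones(t,1) X(1:t,j)] \ y(1:t);   % fit on the known history
            yhat   = [1 X(t+1,j)] * B;                % predict the next bar
            e_t(j) = (y(t+1) - yhat)^2;
        end
        if t+1 > iTrainEnd                  % test period: trust the best track record so far
            [~, best] = min(cumErr);
            predErr(end+1) = e_t(best);
        end
        cumErr = cumErr + e_t;              % update every input's track record
    end
    % Compare to a naive zero (no-change) forecast in stationary form.
    err2 = sqrt(mean(predErr)) / sqrt(mean(y(iTrainEnd+1:n).^2));
end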

In short, the choice of predictor depends on its RMS of predictions of previous S&P 500 values. There is no looking into the future. The predictor can change over time, but at the end of the test segment it basically stops changing. My model has chosen PPICRM with a 2 quarter lag as the first input to predict Q2 2015. The linear regression of the S&P 500 by the selected PPICRM(2) input for 1960 - Q4 2014 is shown below. The black circles are the linear regression. Multicoloured circles are historical data for 1960 - Q4 2014. The colour of the circle indicates the time.


Predictions of S&P 500 in stationary form (red line):

S&P 500 predictions in raw form (red line):

The chart shows that the model predicts a rise in the S&P 500 in the second quarter of 2015. Adding a second input increases the prediction error:

1 err1=0.900298 err2=0.938355 PPICRM (2)

2 err1=0.881910 err2=0.978233 PERMIT1 (4)

Where err1 is the regression error; naturally, it decreases when a second input is added. err2 is the root-mean-square prediction error divided by the error of random predictions, so err2>=1 means my model predicts no better than random, and err2<1 means it predicts better than random.

PPICRM = Producer Price Index: Crude Materials for Further Processing

PERMIT1 = New Private Housing Units Authorized by Building Permits - In Structures with 1 Unit

The model described above can be rephrased like this: we gather 10 thousand economists and ask each of them to predict the market one quarter ahead. Each economist comes up with his or her prediction. But instead of choosing a prediction based on the number of textbooks they have written or the Nobel prizes they have received, we wait a few years, collecting their predictions. After a significant number of predictions, we see which economist is more accurate and start believing their predictions, until some other economist outperforms them in accuracy.

The answer is simple - trade on annual timeframes....
 
IvanIvanov:
The answer is simple - trade on annual timeframes....
Is this a joke?
 
gpwr:
Is this a joke?

:-) I don't know... if the analysis is on years, I don't know what to trade on. On M5 it's unlikely to have any practical effect...

As an option, try to apply your analysis to H4...

 

gpwr:

...After a significant number of predictions, we see which economist is more accurate and start believing his predictions until some other economist surpasses him in accuracy...


Mmmm, that kind of contradicts Taleb with his black swan. How can economists who predict well in one environment predict collapse?

I mean not how, but why would it happen? They are quite sure they are right, so why would they revise that view? And so we get lemmings enthusiastically rushing into the abyss.

 

Here's Keen's article on his model:

http://keenomics.s3.amazonaws.com/debtdeflation_media/papers/PaperPrePublicationProof.pdf

Although I will say right off the bat that I don't like his model. Its purpose is to explain economic cycles and collapses, not to predict the market or economic indicators such as GDP with any accuracy. For example, his model predicted that rising household debt would lead to the collapse of the economy, but exactly when, it could not say. Nor is it capable of predicting what will happen after the collapse. All of his theoretical curves go to infinity and stay there indefinitely, even though the market and the US economy recovered in 2009. That must be why he remains very negative about this recovery, not believing in it and claiming that a depression worse than Japan's two-decade-long one is coming. I think this is the problem with all dynamic economic models: they are hard to stabilize, and once they become unstable they lock up and can no longer predict the future. Even so, a well-known hedge fund has hired Keen as an economic adviser.