3: Yields on securities of different maturities plus one more
Buy & hold annual percentage rate, 1974 - present: APR = 7.35%
Buy & sell strategy using economic indicators: APR = 13.18%.
This strategy gave a sell signal in December 2019. No buy signal has been given so far. Apparently the market will go down.
Buy and hold.
It would be interesting to see a forward test of such a model, but it is not possible here.
Right now, as far as I understand, everyone is waiting for the election.
Are we talking about a specific instrument or a general indicator?
So, the task is to predict the S&P 500 index based on the available economic indicators.
Step 1: Find the indicators. The indicators are publicly available here: http://research.stlouisfed.org/fred2/ There are 240,000 of them. The most important one is GDP growth, which is calculated quarterly; hence our time step is 3 months. All indicators with a shorter period are resampled to a 3-month period, and the rest (annual) are discarded. We also discard indicators for all countries except the United States, as well as indicators without a deep enough history (at least 15 years). After this painstaking filtering we are left with about 10 thousand indicators. So the more specific task is to predict the S&P 500 index one or two quarters ahead, given 10 thousand economic indicators with a quarterly period. I do everything in MATLAB, although it could also be done in R.
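The filtering described in Step 1 can be sketched roughly as follows (the author works in MATLAB against the FRED database; the series names and data below are made up for illustration, and the aggregation-to-quarters rule is an assumption):

```python
import numpy as np
import pandas as pd

def filter_indicators(series_dict, min_years=15):
    """Resample each candidate series to quarterly frequency and keep
    only those with at least min_years of history."""
    kept = {}
    for name, s in series_dict.items():
        q = s.groupby(s.index.to_period("Q")).mean()  # e.g. monthly -> quarterly
        if len(q) / 4.0 >= min_years:                 # 4 quarters per year
            kept[name] = q
    return kept

rng = np.random.default_rng(0)
idx = pd.period_range("1990-01", periods=240, freq="M").to_timestamp()
candidates = {
    "LONG_SERIES": pd.Series(rng.standard_normal(240), index=idx),       # 20 years
    "SHORT_SERIES": pd.Series(rng.standard_normal(60), index=idx[:60]),  # 5 years
}
kept = filter_indicators(candidates)   # only LONG_SERIES survives
```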
Step 2: Convert all data to stationary form by differencing and normalization. There are many methods here; the main requirement is that the original data can be restored from the transformed data. Without stationarity, no model will work. The S&P 500 series before and after the transformation is shown below.
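A minimal sketch of the stationarity step: first-difference the series, and check that the original can be reconstructed from the differences (the invertibility requirement stated above). The toy values are made up:

```python
import numpy as np

def to_stationary(x):
    """First differences: d[i] = x[i+1] - x[i]."""
    return np.diff(x)

def restore(d, x0):
    """Invert the differencing given the starting value x0."""
    return np.concatenate([[x0], x0 + np.cumsum(d)])

x = np.array([100.0, 103.0, 101.0, 106.0, 110.0])  # e.g. quarterly closes
d = to_stationary(x)          # [3., -2., 5., 4.]
x_back = restore(d, x[0])     # reproduces x exactly
```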
Step 3: Choose a model. It could be a neural network, multi-variable linear regression, or multi-variable polynomial regression. After testing linear and nonlinear models, we conclude that the data is so noisy that a nonlinear model makes no sense: the y(x) graph, where y = S&P 500 and x = one of the 10 thousand indicators, is a nearly circular cloud. Thus, the task becomes even more specific: predict the S&P 500 index one or two quarters ahead, given 10 thousand quarterly economic indicators, using multi-variable linear regression.
Step 4: Select the most important economic indicators out of the 10 thousand (reduce the dimensionality of the problem). This is the most important and difficult step. Suppose we take 30 years (120 quarters) of S&P 500 history. To represent the S&P 500 as a linear combination of economic indicators, 120 indicators are enough to describe the S&P 500 exactly over those 30 years; moreover, the indicators can be absolutely arbitrary and such an exact model of 120 indicators fitted to 120 S&P 500 values tells us nothing. So the number of inputs must be reduced well below the number of function values being described; for example, we look for the 10-20 most important inputs. Such tasks of describing data with a small number of inputs selected from a huge dictionary of candidate bases are called sparse coding.
There are many methods for selecting predictor inputs. I tried them all. Here are the two main ones:
Here are the top 10 indicators with the highest correlation coefficient with the S&P 500:
Here are the top 10 indicators with the most mutual information with the S&P 500:
Lag is the delay of the input series relative to the modeled S&P 500 series. As these tables show, different methods of choosing the most important inputs yield different sets of inputs. Since my ultimate goal is to minimize model error, I chose the second selection method: enumerating all inputs and selecting the one that gave the smallest error.
Step 5: Choose a method for calculating the error and the model coefficients. The simplest is the least-squares method, which is why linear regression based on it is so popular. The problem with least squares is its sensitivity to outliers: outliers significantly affect the model coefficients. To reduce this sensitivity, the sum of absolute errors can be used instead of the sum of squared errors, which leads to the method of least moduli (MLM), also known as robust regression. Unlike linear regression, this method has no analytical solution for the model coefficients; usually the absolute values are replaced by smooth, differentiable approximating functions and the solution is found numerically, which takes a long time. I tried both methods (linear regression and MLM) and saw no great advantage in MLM. Instead of MLM, I took a detour: in the second step, when obtaining stationary data by differencing, I added a nonlinear normalization operation. That is, the original series x[1], x[2], ... x[i-1], x[i] ... is first converted into the difference series x[2]-x[1], ... x[i]-x[i-1], ... and each difference is then normalized by replacing it with sign(x[i]-x[i-1])*abs(x[i]-x[i-1])^u, where 0 < u < 1. For u=1 we get the classical least-squares method with its sensitivity to outliers. For u=0, all values of the input series are replaced by binary values +/-1, with almost no outliers. For u=0.5 we get something close to MLM. The optimal value of u lies somewhere between 0.5 and 1.
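The sign(d)*|d|^u normalization described in Step 5, sketched in NumPy (toy values made up): u=1 leaves the differences unchanged, u=0 collapses every difference to +/-1, and intermediate u compresses outliers.

```python
import numpy as np

def power_normalize(d, u):
    """The author's nonlinear normalization: sign(d) * |d|^u, 0 <= u <= 1."""
    return np.sign(d) * np.abs(d) ** u

d = np.array([0.5, -2.0, 30.0, -0.1])   # differences, with one large outlier

raw   = power_normalize(d, 1.0)  # identical to d: classical least squares
signs = power_normalize(d, 0.0)  # only the signs: [ 1., -1.,  1., -1.]
mid   = power_normalize(d, 0.5)  # outlier compressed from 30 to ~5.48
```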
It should be noted that one of the popular ways to convert data to stationary form is to replace the values of the series with the differences of their logarithms, i.e. log(x[i]) - log(x[i-1]) = log(x[i]/x[i-1]). Such a transformation is dangerous in my case, since the dictionary of 10 thousand entries contains many series with zero and negative values. The logarithm also has the benefit of reducing the sensitivity of the least-squares method to outliers. Essentially, my transformation function sign(x)*|x|^u serves the same purpose as log(x), but without the problems associated with zero and negative values.
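A quick check of that point, on made-up values: the log-difference transform produces infinities and NaNs on series containing zeros or negative values, while sign(x)*|x|^u stays finite everywhere.

```python
import numpy as np

x = np.array([5.0, 0.0, -3.0, 2.0])   # a series with a zero and a negative value
d = np.diff(x)

with np.errstate(divide="ignore", invalid="ignore"):
    log_diff = np.diff(np.log(x))     # log(0) = -inf, log(-3) = nan: unusable

u = 0.7
power = np.sign(d) * np.abs(d) ** u   # finite for every element
```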
Step 6: Calculate the model prediction by substituting the fresh input data and computing the model output with the same coefficients found by linear regression on the preceding history segment. It is important to keep in mind that the quarterly values of the economic indicators and of the S&P 500 arrive almost simultaneously (to within 3 months). Therefore, to predict the S&P 500 for the next quarter, the model must be built between the current quarterly value of the S&P 500 and inputs delayed by at least 1 quarter (Lag >= 1). To predict the S&P 500 two quarters ahead, the model must be built between the current quarterly value of the S&P 500 and inputs delayed by at least 2 quarters (Lag >= 2). And so on. Prediction accuracy decreases significantly for delays greater than 2.
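The lag alignment in Step 6 amounts to pairing y[t] with x[t-lag] so no future information leaks into the regression. A sketch on toy arrays (the helper name and data are made up):

```python
import numpy as np

def align(y, x, lag):
    """Pair y[t] with x[t-lag] for lag >= 1; rows without enough history
    are dropped from the front of y and the back of x."""
    return y[lag:], x[:-lag]

sp500     = np.arange(10.0, 20.0)    # 10 quarterly values (toy data)
indicator = np.arange(100.0, 110.0)  # candidate input, same 10 quarters

y_t, x_lag = align(sp500, indicator, 2)   # Lag = 2
# y_t[0] is sp500[2]; it is paired with indicator[0], two quarters earlier.
```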
Step 7: Check the accuracy of the predictions on past history. The original technique described above (fitting each input to the previous history, choosing the input that gives the smallest RMS error, and computing the prediction from the fresh value of that input) produced a prediction RMS error that was even worse than random or null predictions. I asked myself: why should an input that fits the past well have good predictive ability for the future? It makes sense to select model inputs based on their previous prediction error, rather than on the smallest regression error on the known data.
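The revised selection rule can be sketched as a walk-forward score (an assumed implementation on synthetic data): each candidate input is scored by its out-of-sample prediction error, refitting the regression on past data only at every step, and the input with the smaller score is preferred.

```python
import numpy as np

def walk_forward_rmse(x, y, min_train=40):
    """RMS error of one-step-ahead predictions: at each step t, fit a
    one-variable regression on data before t, then predict y[t]."""
    errs = []
    for t in range(min_train, len(y)):
        A = np.column_stack([x[:t], np.ones(t)])
        coef, *_ = np.linalg.lstsq(A, y[:t], rcond=None)  # past data only
        pred = coef[0] * x[t] + coef[1]
        errs.append((pred - y[t]) ** 2)
    return np.sqrt(np.mean(errs))

rng = np.random.default_rng(1)
x_good  = rng.standard_normal(120)
y       = 0.9 * x_good + 0.1 * rng.standard_normal(120)  # x_good is predictive
x_noise = rng.standard_normal(120)                       # pure noise input

err_good  = walk_forward_rmse(x_good, y)    # small out-of-sample error
err_noise = walk_forward_rmse(x_noise, y)   # roughly the error of guessing
```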
In the end, my model can be described step by step like this:
In short, the choice of a predictor depends on its RMS error on previous S&P 500 predictions. There is no look-ahead. The predictor can change over time, but toward the end of the test segment it basically stops changing. To predict Q2 2015, my model selected PPICRM with a 2-quarter lag as the first input. Linear regression of the S&P 500 against the selected PPICRM(2) input for 1960 through Q4 2014 is shown below. Black circles: the linear regression. Multi-colored circles: historical data for 1960 through Q4 2014, with the color of each circle indicating time.
Stationary S&P 500 predictions (red line):
S&P 500 predictions in raw form (red line):
The graph shows that the model predicts the growth of the S&P 500 in the second quarter of 2015. Adding a second input increases the prediction error:
1 err1=0.900298 err2=0.938355 PPICRM (2)
2 err1=0.881910 err2=0.978233 PERMIT1 (4)
where err1 is the regression error; as expected, it decreases when a second input is added. err2 is the root-mean-square prediction error divided by the error of random predictions. So err2 >= 1 means my model's prediction is no better than random, and err2 < 1 means it is better than random.
PPICRM = Producer Price Index: Crude Materials for Further Processing
PERMIT1 = New Private Housing Units Authorized by Building Permits - In Structures with 1 Unit
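The err2 metric above, as described, is the model's prediction RMSE divided by the RMSE of random (or null) predictions on the same data, so values below 1 beat chance. A sketch on made-up numbers:

```python
import numpy as np

def err2(pred, actual, baseline_pred):
    """Prediction RMSE normalized by a baseline predictor's RMSE."""
    model_rmse = np.sqrt(np.mean((pred - actual) ** 2))
    base_rmse = np.sqrt(np.mean((baseline_pred - actual) ** 2))
    return model_rmse / base_rmse

actual = np.array([1.0, -0.5, 2.0, 0.3])
good   = np.array([0.9, -0.4, 1.8, 0.2])  # predictions close to actual
null   = np.zeros(4)                       # null prediction: always zero

ratio = err2(good, actual, null)           # < 1: better than the null model
```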
The model described above can be rephrased this way. We gather 10 thousand economists and ask each of them to predict the market one quarter ahead. Each economist comes up with his own prediction. But instead of picking a prediction based on the number of textbooks its author has written or the Nobel Prizes he has won, we wait a few years, collecting their predictions. After a significant number of predictions, we see which economist is more accurate and begin to believe his predictions, until some other economist surpasses him in accuracy.
Wrong. Even though the topic is called "predicting the market based on macroeconomic indicators", the indicators themselves are irrelevant in this analysis. They are just variables substituted into some formula after being mathematically depersonalized, stripped of all external semantic and logical links to the world. Dry numbers, arranged in abstract numerical series, are fed to a model which predicts... no, not the market, but those very same numerical series.
It turns out to be technical analysis on fundamental data.
Fundamental analysis is not that simple. There are many factors affecting prices that do not fall under economic indicators: elections, Brexit, all sorts of rumours and so on. They can affect the price more than all the economic indicators.
To Peter: I am not predicting the S&P 500 directly. The purpose of this work is to predict recessions in order to exit the market before they occur and improve on the profitability of the buy & hold strategy. Although the S&P 500 contains the stocks of 500 companies, it is driven by institutional investors who buy and sell the index itself (or options on it), not its components. 13% a year doesn't seem like much, but it is enough for big money, where turnover matters. Bernie Madoff attracted his clients by promising them a modest 10% a year, which he failed to achieve.
To Uladzimir: I agree that price fluctuations depend on various social and political events: elections, Brexit, epidemics and so on. But in the end it all comes down to supply and demand for products and services, unemployment, and the other indicators of the economy. I don't care about day-to-day market price fluctuations. Even a simple buy & hold strategy earns 7.4% a year. What I care about is avoiding long positions during recessions and thereby improving the profitability of that strategy. By the way, another strategy is buying real estate, but in the US that only yields 5% a year.
So, what is the forecast for the S&P 500?
I'm sorry, but all this for 5-13% a year??? It's not worth the effort.)