Market prediction based on macroeconomic indicators - page 51

 
Vladimir:
I agree, I have even written something like that myself somewhere in this thread. Selecting predictors on the entire history and then running a forward test on part of that same history is a self-deception practised by everyone from traders to scientists. Many articles on predicting the economy start with a list of already selected predictors and then report "great" results. Traders choose strategies based on, e.g., rebounds or breakouts because "it worked in the past" and hope it will work in the future. They show forward tests on past data without realizing that the choice of strategy itself was based on studying ALL of the history, including the part used for the forward test. For me, the forward test of my GDP and market model will be the future. That is why I opened this thread: to post predictions and see in real time how they come true. The work is not finished. There are a lot of ideas for non-linear data transformations. For example, some predictors like HOUST affect GDP growth via some threshold function.

Non-linearity, yes.

But how do you find a non-linear function? By trying different variants? Or just use neural networks?

 
Дмитрий:

Non-linearity, yes.

But how do you find a non-linear function? By trying different variants? Or just use neural networks?

You could also try Random Forest; it is easier to use and also models non-linearities.

Example: https://www.quora.com/How-does-random-forest-work-for-regression-1

The shape of the fitted function can also be examined with the built-in tools.

 
Alexey Burnakov:

You can also try Random Forest; it is easier to use and also models non-linearities.

Example: https://www.quora.com/How-does-random-forest-work-for-regression-1

The shape of the fitted function can also be examined with the built-in tools.

Thank you, Random Forest is familiar to me
 
Дмитрий:

Non-linearity, yes.

But how do you find a non-linear function? By trying different variants? Or just use neural networks?

So let's think together. I wanted to pick a simple step function:

out = -1 if input < threshold, +1 if input > threshold

Here threshold is an unknown cutoff that differs between predictors. For example, for S&P500 and GDP increments threshold = 0, i.e. what matters is whether these indicators fall at all, not whether they cross some other cutoff. For other economic indicators it is not so simple: the threshold needs to be tuned. The modelling could look like this:

1. Determine the type of data: rising (S&P500, GDP,...) or ranging (unemployment rate, federal rates,...) by comparing the values at the beginning and end of the history. A robust automatic method for making this classification still needs to be worked out.

2. If the data is rising, replace it with the increments x[i] - x[i-1]. If it is ranging, leave it unchanged.

3. Choose the output to be modelled, such as GDP increments (growth), and apply a step function with zero threshold to it, i.e. replace GDP growth with a binary series of +/-1.

4. Go through all predictors and their lagged versions and test each for predictive ability as follows. Take a predictor or its increments (depending on step 2), measure its range over the entire history, and divide this range into e.g. 10 equal parts to obtain 9 thresholds. For each of the 9 thresholds, replace the predictor with a binary series of +/-1 and count how many of its +1 and -1 values coincide with the +1 and -1 values of the modelled series (GDP), giving M coincidences out of N complete bars in the history. Compute M/N for each of the 9 thresholds and keep the threshold that gives the highest coincidence rate. Repeat this for each predictor. This should be a quick calculation.
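The threshold scan from step 4 can be sketched roughly as follows. This is only an illustration under my own assumptions, not the code used for the model: the function names `to_binary` and `best_threshold` are made up, the toy numbers are invented, and a value exactly equal to the threshold is mapped to +1 because the step function above leaves that case undefined.

```python
def to_binary(series, threshold):
    """Step function: -1 below the threshold, +1 otherwise.
    Assumption: a value equal to the threshold maps to +1."""
    return [-1 if x < threshold else 1 for x in series]


def best_threshold(predictor, target_binary, n_bins=10):
    """Scan n_bins - 1 equally spaced thresholds across the predictor's
    range and return (threshold, rate) with the highest rate M/N."""
    lo, hi = min(predictor), max(predictor)
    step = (hi - lo) / n_bins
    n = len(target_binary)
    best = None
    for k in range(1, n_bins):
        threshold = lo + k * step
        binary = to_binary(predictor, threshold)
        # M = number of bars where predictor and target signs coincide
        m = sum(1 for p, t in zip(binary, target_binary) if p == t)
        rate = m / n
        if best is None or rate > best[1]:
            best = (threshold, rate)
    return best


# Invented toy data, only to show the call pattern
gdp_growth = [0.5, 0.7, -0.2, 0.4, -0.6, 0.3, 0.8, -0.1]
predictor = [1.2, 1.5, 0.4, 1.1, 0.2, 0.9, 1.6, 0.5]

target = to_binary(gdp_growth, 0.0)  # step 3: zero threshold on the output
threshold, hit_rate = best_threshold(predictor, target)
print(threshold, hit_rate)
```

Lagged versions of a predictor would be passed in as shifted copies of the series, trimmed so that predictor and target have the same length N.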

If anyone wants to help, take the data I posted here a few pages ago and try it. I want to finish the linear model for now and then move on to the non-linear one.

PS: Since there are many more positive values (+1) than negative (-1) in the series of S&P500 and GDP increments, you can come up with a modification of the method described above to weight the coincidence of negative values more heavily, thus emphasizing the declines rather than increases in these indicators. For example, the goodness of fit indicator might look like this:

J = M(+1)/N + W*M(-1)/N

where M(+1) and M(-1) are the numbers of coinciding +1 and -1 values, and W is a weight > 1 reflecting how many fewer negative values than positive values there are in GDP growth.
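A small sketch of this weighted indicator, under stated assumptions: the function name `weighted_fit` is hypothetical, and the default W = (number of +1) / (number of -1) in the target is just one possible reading of "how many fewer negative values"; the post does not fix a formula for W.

```python
def weighted_fit(predictor_binary, target_binary, w=None):
    """J = M(+1)/N + W * M(-1)/N for two +/-1 series of equal length."""
    n = len(target_binary)
    pairs = list(zip(predictor_binary, target_binary))
    m_pos = sum(1 for p, t in pairs if p == t == 1)   # coinciding +1
    m_neg = sum(1 for p, t in pairs if p == t == -1)  # coinciding -1
    if w is None:
        # Assumption: weight declines by the class imbalance of the target
        n_neg = target_binary.count(-1)
        if n_neg == 0:
            raise ValueError("target has no negative values, W is undefined")
        w = target_binary.count(1) / n_neg
    return m_pos / n + w * m_neg / n


# Invented example: three rises and one decline in the target
target = [1, 1, 1, -1]
predictor = [1, 1, -1, -1]
print(weighted_fit(predictor, target))       # W defaults to 3/1
print(weighted_fit(predictor, target, w=1))  # plain M/N
```

With W = 1 the indicator reduces to the unweighted coincidence rate M/N from step 4.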

A big problem will arise if we want to find a model with 2 or more predictors. We have to think about how to connect these predictors: with AND, OR or XOR functions. Once they are connected, the thresholds will need to be optimized again.
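To make the two-predictor case concrete, here is a brute-force sketch that re-optimizes both thresholds jointly for each logical connective. Everything in it is an assumption for illustration: the names `COMBINERS`, `step`, `grid` and `best_pair` are made up, +1 is represented as True and -1 as False, and the data is a toy XOR pattern.

```python
from itertools import product

# Logical connectives on booleans (True stands for +1, False for -1)
COMBINERS = {
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
    "XOR": lambda a, b: a != b,
}


def step(series, threshold):
    """Binary series: True where the value is at or above the threshold."""
    return [x >= threshold for x in series]


def grid(series, n_bins=10):
    """n_bins - 1 equally spaced thresholds across the series' range."""
    lo, hi = min(series), max(series)
    return [lo + k * (hi - lo) / n_bins for k in range(1, n_bins)]


def best_pair(x1, x2, target, n_bins=10):
    """Try every connective and every threshold pair.
    Returns (rate, connective name, threshold for x1, threshold for x2)."""
    best = None
    for (name, f), t1, t2 in product(
        COMBINERS.items(), grid(x1, n_bins), grid(x2, n_bins)
    ):
        combined = [f(a, b) for a, b in zip(step(x1, t1), step(x2, t2))]
        rate = sum(c == t for c, t in zip(combined, target)) / len(target)
        if best is None or rate > best[0]:
            best = (rate, name, t1, t2)
    return best


# Invented toy data where the target is the XOR of the two predictors
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
target = [False, True, True, False]
print(best_pair(x1, x2, target))
```

With 9 thresholds per predictor and 3 connectives this is 3 * 9 * 9 = 243 evaluations per pair of predictors, so the cost grows quickly as more predictors are added.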

 
Vladimir:

A big problem will arise if we want to find a model with 2 or more predictors. Here we need to think about how to connect these predictors: with AND, OR or XOR functions. When connected, the thresholds will need to be optimized again.

If you feed the data to a neural network, it will find the thresholds "automatically", and for all the predictors included in the input vector at once.
 
Vladimir:

If anyone wants to help, take the data I posted here a few pages ago and try it out.

Can the same data be converted to csv?
 
Stanislav Korotky:
Can the same data be converted to csv?
Attached. The first column shows dates in Matlab format, from Q1 1959 to Q4 2015. The remaining columns are unconverted economic and financial figures. GDP is in the 1168th column.
Files:
Data.zip  1037 kb
 

Finished the linear GDP predictions. Here are two quarters ahead:


There are 4 predictors in the model, although 3 would be sufficient. Beyond 3-4 predictors, the residual looks like noise. Predicting the S&P500 with the same method as for GDP works very poorly, so I am not showing it here. I also quickly tried the non-linear transformation with a step function that I described earlier. It works worse than linear regression.

Waiting for the release of the new GDP value at the end of April. Resting for now.

 
Vladimir:
Attached. The first column shows dates in Matlab format, from Q1 1959 to Q4 2015. The remaining columns are unconverted economic and financial figures. GDP is in the 1168th column.
Thank you. However, it would be good to have the names of all the columns. Also, as far as I can tell, the dates were not copied quite correctly (numeric precision was lost), so the entries come in groups of 11 with the same date.
 
Vladimir:

Finished the linear GDP predictions. Here's two quarters ahead:

The picture is nice, but could we instead compute, at each step, the product of the predicted change and the actual change, sum these products over the whole period, and divide by the same sum in which the predicted and actual changes are taken in absolute value?
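In code, the measure I have in mind would look roughly like this. It is a minimal sketch: the function name `directional_score` and the numbers are made up for illustration. The result lies between -1 and +1, where +1 means the predicted change had the correct sign at every step, and each step is weighted by the size of both changes.

```python
def directional_score(predicted_changes, actual_changes):
    """sum(p * a) / sum(|p| * |a|) over all steps of the period."""
    pairs = list(zip(predicted_changes, actual_changes))
    numerator = sum(p * a for p, a in pairs)
    denominator = sum(abs(p) * abs(a) for p, a in pairs)
    if denominator == 0:
        raise ValueError("all products are zero, the score is undefined")
    return numerator / denominator


# Invented numbers: the first two steps have the right sign, the third does not
predicted = [1.0, -2.0, 0.5]
actual = [2.0, -1.0, -1.0]
print(directional_score(predicted, actual))  # (2 + 2 - 0.5) / (2 + 2 + 0.5)
```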