Machine learning in trading: theory, models, practice and algo-trading - page 91

 

a package that can separate BPs that can be predicted from those that cannot, if I understand correctly

http://www.gmge.org/2012/05/foreca-forecastable-component-analysis/

http://www.gmge.org/2015/01/may-the-forec-be-with-you-r-package-foreca-v0-2-0/

ForeCA: Forecastable Component Analysis
  • 2012.05.22
  • Georg
  • www.gmge.org
Forecastable component analysis (ForeCA) is a novel dimension reduction (DR) technique for finding optimally forecastable signals in multivariate time series (published at JMLR). ForeCA works similarly to PCA or ICA, but instead of finding high-variance or statistically independent components, it finds forecastable linear combinations. ForeCA is...
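
A minimal sketch of what trying it looks like, assuming the CRAN ForeCA package (function names are taken from its documentation; exact arguments and methods may differ between versions):

library(ForeCA)   # install.packages("ForeCA")

# a stationary multivariate series: daily log-returns of the EuStockMarkets data shipped with R
ret <- ts(diff(log(EuStockMarkets)))

# extract the most forecastable linear combinations (the ForeCA analogue of principal components)
mod <- foreca(ret, n.comp = 2)
mod
plot(mod)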
 
Vizard_:
To all comers: the z1 archive contains two files, train and test. For Target, build a model on train, apply it to test, and post the results in % (successfully predicted cases) for both samples (train = xx%, test = xx%). Methods and models do not need to be disclosed, just the numbers. Any data manipulation and mining methods are allowed.
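
A minimal sketch of the reporting step under those conditions, assuming the archive's files are CSVs with the target in a column named Target and that some classifier has already been fitted as fit (the file names, the column name, and fit are placeholders):

train <- read.csv("train.csv")
test  <- read.csv("test.csv")

# percentage of successfully predicted cases on a given sample
acc <- function(model, data) 100 * mean(predict(model, newdata = data) == data$Target)

cat(sprintf("train = %.1f%%, test = %.1f%%\n", acc(fit, train), acc(fit, test)))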

1. None of your predictors has any predictive power - without exception, they are all noise.

2. Three models were built: rf, ada, and SVM. Here are the results:

rf

Call:
 randomForest(formula = TFC_Target ~ .,
              data = crs$dataset[crs$sample, c(crs$input, crs$target)],
              ntree = 500, mtry = 3, importance = TRUE, replace = FALSE,
              na.action = randomForest::na.roughfix)

               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 3

        OOB estimate of error rate: 49.71%
Confusion matrix:
      [0,0] (0,1] class.error
[0,0]   197   163   0.4527778
(0,1]   185   155   0.5441176
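
For readers unfamiliar with randomForest output: the error figures above follow directly from the OOB confusion matrix, e.g.

cm <- matrix(c(197, 163,
               185, 155), nrow = 2, byrow = TRUE,
             dimnames = list(c("[0,0]", "(0,1]"), c("[0,0]", "(0,1]")))
1 - diag(cm) / rowSums(cm)     # class errors: 0.4528 and 0.5441
1 - sum(diag(cm)) / sum(cm)    # OOB error rate: (163 + 185) / 700 = 0.4971, i.e. 49.71%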

ada

Call:
 ada(TFC_Target ~ ., data = crs$dataset[crs$train, c(crs$input, crs$target)],
     control = rpart::rpart.control(maxdepth = 30, cp = 0.01, minsplit = 20,
         xval = 10), iter = 50)

Loss: exponential   Method: discrete   Iteration: 50

Final Confusion Matrix for Data:
          Final Prediction
True value (0,1] [0,0]
     (0,1]   303    37
     [0,0]    29   331

Train Error: 0.094

Out-Of-Bag Error: 0.157   iteration = 50

SVM

Summary of the SVM model (built using ksvm):

Support Vector Machine object of class "ksvm"

SV type: C-svc (classification)
 parameter : cost C = 1

Gaussian Radial Basis kernel function.
 Hyperparameter : sigma = 0.12775132444179

Number of Support Vectors : 662

Objective Function Value : -584.3646
Training error : 0.358571
Probability model included.

Time taken: 0.17 secs

On the test set (I mean Rattle's, not yours):

Error matrix for the Ada Boost model on test.csv [validate] (counts):

       Predicted
Actual  (0,1] [0,0]
 [0,0]     33    40
 (0,1]     35    42

Error matrix for the Ada Boost model on test.csv [validate] (proportions):

       Predicted
Actual  (0,1] [0,0] Error
 [0,0]   0.22  0.27  0.55
 (0,1]   0.23  0.28  0.45

Overall error: 50%, Averaged class error: 50%

Rattle timestamp: 2016-08-08 15:48:15 user

======================================================================

Error matrix for the Random Forest model on test.csv [validate] (counts):

       Predicted
Actual  [0,0] (0,1]
 [0,0]     44    29
 (0,1]     44    33

Error matrix for the Random Forest model on test.csv [validate] (proportions):

       Predicted
Actual  [0,0] (0,1] Error
 [0,0]   0.29  0.19  0.40
 (0,1]   0.29  0.22  0.57

Overall error: 49%, Averaged class error: 48%

Rattle timestamp: 2016-08-08 15:48:15 user

======================================================================

Error matrix for the SVM model on test.csv [validate] (counts):

       Predicted
Actual  [0,0] (0,1]
 [0,0]     41    32
 (0,1]     45    32

Error matrix for the SVM model on test.csv [validate] (proportions):

       Predicted
Actual  [0,0] (0,1] Error
 [0,0]   0.27  0.21  0.44
 (0,1]   0.30  0.21  0.58

Overall error: 51%, Averaged class error: 51%

Rattle timestamp: 2016-08-08 15:48:15 user
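
A minimal sketch of how such an error matrix can be reproduced outside Rattle, assuming a fitted classifier fit (any of the three above) and the validation data loaded as test with the TFC_Target column (object names are placeholders):

pred <- predict(fit, newdata = test)                 # predicted classes on the validation data
cm   <- table(Actual = test$TFC_Target, Predicted = pred)

cm / sum(cm)                                         # proportions, as in the Rattle output
overall.error   <- 1 - sum(diag(cm)) / sum(cm)       # overall error
avg.class.error <- mean(1 - diag(cm) / rowSums(cm))  # averaged class error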

ROC analysis for the random forest confirms the above.
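
A minimal sketch of such an ROC check, assuming the pROC package and a randomForest fit rf.fit that returns class probabilities (the object names and the class-column name are placeholders):

library(pROC)           # install.packages("pROC")
library(randomForest)

# probability of the "(0,1]" class on the validation data
prob <- predict(rf.fit, newdata = test, type = "prob")[, "(0,1]"]

# ROC curve and AUC; an AUC near 0.5 means the model does no better than coin flipping
roc.obj <- roc(response = test$TFC_Target, predictor = prob)
plot(roc.obj)
auc(roc.obj)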

Conclusion.

Your set of predictors is hopeless.

 
Alexey Burnakov:
That is, we train the best model on train until we're blue in the face. Maybe two or three models. Then a one-time test of them.

Yes, that's exactly what the conditions say (build the model on train, apply it to test).
 
mytarmailS:

a package that can separate BPs that can be predicted from those that cannot, if I understand correctly

I read up on it; judging by the description it's a very good package (ForeCA; it's even in the R repository, no need to download anything from GitHub). Its main feature is that it rates the "predictability" of the data.
Also important: it can be used to reduce the dimensionality of the data. That is, out of the existing predictors the package will build two new ones with surprisingly good predictability, sifting out the garbage along the way. It is somewhat reminiscent of Principal Component Analysis, only instead of principal components it produces components of its own.

Put very simply: you give the package a table with a bunch of predictors (prices, indicators, deltas, garbage, etc.), and ForeCA returns a new table in place of the original one. We then use this new table to train our predictive model (gbm, rf, nnet, etc.).
Put a little less simply, it is yet another package for kernel-style transformation of the data, this one with a slant toward the stock market.

It all sounds great, just great, even too great, so I'll have to check it out.
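
A minimal sketch of that workflow, assuming the CRAN ForeCA package, a predictor matrix X that is already stationary (raw prices would normally be differenced first), and a factor target y aligned with its rows; the n.comp value and the $scores element are assumptions based on the package's princomp-like output:

library(ForeCA)
library(randomForest)

mod  <- foreca(ts(X), n.comp = 2)   # the two most forecastable linear combinations of the predictors
newX <- as.data.frame(mod$scores)   # the new table that replaces the original predictors

# train the usual predictive model on the ForeCA components instead of the raw columns
fit <- randomForest(x = newX, y = y, ntree = 500)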

 
mytarmailS:

a package that can separate BPs that can be predicted from those that cannot, if I understand correctly

http://www.gmge.org/2012/05/foreca-forecastable-component-analysis/

http://www.gmge.org/2015/01/may-the-forec-be-with-you-r-package-foreca-v0-2-0/

Extremely curious.

The package is installed, documentation is available.

Maybe someone will try it and post the result?

 
Dr.Trader:

I read up on it; judging by the description it's a very good package (ForeCA; it's even in the R repository, no need to download anything from GitHub). Its main feature is that it rates the "predictability" of the data.
Also important: it can be used to reduce the dimensionality of the data. That is, out of the existing predictors the package will build two new ones with surprisingly good predictability, sifting out the garbage along the way. It is somewhat reminiscent of Principal Component Analysis, only instead of principal components it produces components of its own.

Put very simply: you give the package a table with a bunch of predictors (prices, indicators, deltas, garbage, etc.), and ForeCA returns a new table in place of the original one. We then use this new table to train our predictive model (gbm, rf, nnet, etc.).
Put a little less simply, it is yet another package for kernel-style transformation of the data, this one with a slant toward the stock market.

It all sounds great, just great, even too great, so I'll have to check it out.

Wouldn't that require a pre-screening?

Guys, take it!

 
SanSanych Fomenko:

Conclusion.

Your set of predictors is hopeless.

Okay)))) But read the conditions carefully: "post the results in % (successfully predicted cases) for both samples (train = xx%, test = xx%). Methods and models do not need to be disclosed, just the numbers."
We are waiting for more results. It will be interesting to see what Mihail Marchukajtes comes up with.
 
Vizard_:
Okay)))) But read the conditions carefully: "post the results in % (successfully predicted cases) for both samples (train = xx%, test = xx%). Methods and models do not need to be disclosed, just the numbers."
We are waiting for more results. It will be interesting to see what Mihail Marchukajtes comes up with.

You don't need a test!

The model cannot be trained! You can't test an empty space.

 
Let me try... I only just saw it...
 
Dr.Trader:

I read the description and it seems to be a very good package (ForeCA, ..............

I don't understand how this "predictability" is calculated, or whether it makes any sense if the target is not taken into account.
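
For what it's worth: as I understand the ForeCA paper, the "predictability" score Omega is computed from the series alone, with no target involved; roughly, it is one minus the normalized entropy of the series' spectral density, so white noise scores at the bottom of the scale and a strongly periodic series scores near the top. A minimal sketch, assuming the CRAN ForeCA package's Omega() function:

library(ForeCA)

set.seed(1)
noise <- rnorm(1000)                                           # flat spectrum, nothing to forecast
wave  <- sin(2 * pi * (1:1000) / 50) + rnorm(1000, sd = 0.1)   # strongly periodic, highly forecastable

Omega(noise)   # near the bottom of the scale
Omega(wave)    # much higher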