Bayesian regression - Has anyone made an EA using this algorithm?

 
new-rena:

I would probably try all 10 of them... Bayesian doesn't seem to work at all.

http://datareview.info/article/10-tipov-regressii-kakoy-vyibrat/

What do you think?

The idea was to see whether the real trading strategy presented by Devavrat Shah and Kang Zhang (Laboratory for Information and Decision Systems, Department of EECS, Massachusetts Institute of Technology) was suitable for forex.
 
lilita bogachkova:
The idea was to see if the real trading strategy that was presented was suitable for forex
I thought I made it clear that it wasn't. But the good news is the markets are not limited to forex. Cryptocurrencies have an entry threshold of a few quid.
 
lilita bogachkova:
The idea was to see whether the real trading strategy presented by Devavrat Shah and Kang Zhang (Laboratory for Information and Decision Systems, Department of EECS, Massachusetts Institute of Technology) was suitable for forex.

I said no, it doesn't work.

The Russian version of the link describes the calculation principle.

"6. Bayesian regression is similar to ridge regression, but it is based on the assumption that the noise (error) in the data is normally distributed; it is therefore assumed that a general understanding of the data structure already exists."

There is no noise in forex, let alone normally distributed noise. If there were, there would be no boundary between flat and trend and no trend reversals. With a normal distribution, the price would simply drift with the noise to one side at some angle, and that would be the end of it.
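To make the "similar to ridge regression" remark concrete: for a single slope w in y = w*x + noise, assuming Gaussian noise with variance sigma2 and a Gaussian prior N(0, tau2) on w, the posterior mean of w is exactly the ridge estimate with lambda = sigma2 / tau2. A minimal sketch in standard C++ (no intercept; the names bayesianSlope, ridgeSlope and SlopePosterior are hypothetical, for illustration only):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Posterior of the slope w in y = w*x + noise, assuming (for illustration)
// noise ~ N(0, sigma2) and a prior w ~ N(0, tau2); no intercept term.
struct SlopePosterior {
    double mean;      // same value as ridge with lambda = sigma2 / tau2
    double variance;  // extra output: ridge alone gives no uncertainty
};

SlopePosterior bayesianSlope(const std::vector<double>& x,
                             const std::vector<double>& y,
                             double sigma2, double tau2) {
    assert(x.size() == y.size() && sigma2 > 0 && tau2 > 0);
    double sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    const double lambda = sigma2 / tau2;  // ridge penalty implied by the prior
    return {sxy / (sxx + lambda), sigma2 / (sxx + lambda)};
}

// Ordinary ridge regression through the origin, for comparison.
double ridgeSlope(const std::vector<double>& x, const std::vector<double>& y,
                  double lambda) {
    assert(x.size() == y.size() && lambda >= 0);
    double sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    return sxy / (sxx + lambda);
}
```

With x = {1, 2, 3}, y = {2, 4, 6} and sigma2 = tau2 = 1, both functions give 28/15 (about 1.867) instead of the least-squares slope 2, and the posterior variance is 1/15. The normality assumption discussed above enters only through the Gaussian noise and prior.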

 
new-rena:

I said no, it doesn't work.

The Russian version of the link describes the calculation principle.

"6. Bayesian regression is similar to ridge regression, but it is based on the assumption that the noise (error) in the data is normally distributed; it is therefore assumed that a general understanding of the data structure already exists."

There is no noise in forex, let alone normally distributed noise. If there were, there would be no boundary between flat and trend and no trend reversals. With a normal distribution, the price would simply drift with the noise to one side at some angle, and that would be the end of it.

You'd think Bitcoin, noise and all, would then also drift in one direction at some angle, and that would be the end of it.

If a thread has a topic, the conversation is expected to stay on it. The topic is the presented strategy (Bayesian regression - Has anyone made an EA using this algorithm?), not a choice between regression calculation methods.

 

It seems everything needed is already available in MQL: ALGLIB has k-means and multivariate linear fit. What remains is to work out how the algorithm itself works (and nobody seems interested: some praise R, others are stuck on regression, everyone is on about something different). Would anyone like to discuss the algorithm?
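As I understand the Shah and Zhang paper (worth checking against the paper itself), its "Bayesian regression" step predicts the next price change as a weighted average of the changes that followed a set of representative historical windows (for example k-means centers): Δp = Σ_i y_i·exp(c·s(x, x_i)) / Σ_i exp(c·s(x, x_i)), where x is the current window, x_i a template window, y_i the move that followed it, and s a similarity measure. Estimates for several window lengths would then be combined with a linear fit. The sketch below assumes Pearson correlation for s and a free constant c; correlation, predictMove and c are hypothetical names, and the code is standard C++ rather than MQL5:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation between two equal-length windows (assumed similarity s).
double correlation(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size() && !a.empty());
    const double n = static_cast<double>(a.size());
    double ma = 0.0, mb = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) { ma += a[i]; mb += b[i]; }
    ma /= n;
    mb /= n;
    double cov = 0.0, va = 0.0, vb = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        cov += (a[i] - ma) * (b[i] - mb);
        va += (a[i] - ma) * (a[i] - ma);
        vb += (b[i] - mb) * (b[i] - mb);
    }
    if (va == 0.0 || vb == 0.0) return 0.0;  // flat window: treat as uncorrelated
    return cov / std::sqrt(va * vb);
}

// Similarity-weighted estimate of the next price change.
// templates[i]: representative window (e.g. a k-means center), hypothetical input
// moves[i]:     price change observed after that window
// c:            hypothetical sharpness constant; larger c favors the closest templates
double predictMove(const std::vector<double>& window,
                   const std::vector<std::vector<double>>& templates,
                   const std::vector<double>& moves, double c) {
    assert(templates.size() == moves.size() && !templates.empty());
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < templates.size(); ++i) {
        const double w = std::exp(c * correlation(window, templates[i]));
        num += w * moves[i];
        den += w;
    }
    return num / den;
}
```

With one rising template {1, 2, 3} followed by +1 and one falling template {3, 2, 1} followed by -1, a rising current window gives tanh(c): c = 0 weights both templates equally (prediction 0), and a large c follows the best-matching template almost entirely.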

 
Valerii Mazurenko:

It seems everything needed is already available in MQL: ALGLIB has k-means and multivariate linear fit. What remains is to work out how the algorithm itself works (and nobody seems interested: some praise R, others are stuck on regression, everyone is on about something different). Would anyone like to discuss the algorithm?

To do that, you need to point everyone in the right direction. k-means is implemented by the CKMeans class located in the dataanalysis.mqh file.

Here is the class itself:

class CKMeans
  {
private:
   //--- private method
   static bool       SelectCenterPP(CMatrixDouble &xy,const int npoints,const int nvars,CMatrixDouble &centers,bool &cbusycenters[],const int ccnt,double &d2[],double &p[],double &tmp[]);
public:
   //--- constructor, destructor
                     CKMeans(void);
                    ~CKMeans(void);
   //--- public method
   static void       KMeansGenerate(CMatrixDouble &xy,const int npoints,const int nvars,const int k,const int restarts,int &info,CMatrixDouble &c,int &xyc[]);
  };
//+------------------------------------------------------------------+
//| Constructor without parameters                                   |
//+------------------------------------------------------------------+
CKMeans::CKMeans(void)
  {

  }
//+------------------------------------------------------------------+
//| Destructor                                                       |
//+------------------------------------------------------------------+
CKMeans::~CKMeans(void)
  {

  }
//+------------------------------------------------------------------+
//| k-means++ clusterization                                         |
//| INPUT PARAMETERS:                                                |
//|     XY          -   dataset, array [0..NPoints-1,0..NVars-1].    |
//|     NPoints     -   dataset size, NPoints>=K                     |
//|     NVars       -   number of variables, NVars>=1                |
//|     K           -   desired number of clusters, K>=1             |
//|     Restarts    -   number of restarts, Restarts>=1              |
//| OUTPUT PARAMETERS:                                               |
//|     Info        -   return code:                                 |
//|                     * -3, if task is degenerate (number of       |
//|                           distinct points is less than K)        |
//|                     * -1, if incorrect                           |
//|                           NPoints/NFeatures/K/Restarts was passed|
//|                     *  1, if subroutine finished successfully    |
//|     C           -   array[0..NVars-1,0..K-1].matrix whose columns|
//|                     store cluster's centers                      |
//|     XYC         -   array[NPoints], which contains cluster       |
//|                     indexes                                      |
//+------------------------------------------------------------------+
static void CKMeans::KMeansGenerate(CMatrixDouble &xy,const int npoints,
                                    const int nvars,const int k,
                                    const int restarts,int &info,
                                    CMatrixDouble &c,int &xyc[])
  {
//--- create variables
   int    i=0;
   int    j=0;
   double e=0;
   double ebest=0;
   double v=0;
   int    cclosest=0;
   bool   waschanges;
   bool   zerosizeclusters;
   int    pass=0;
   int    i_=0;
   double dclosest=0;
//--- creating arrays
   int    xycbest[];
   double x[];
   double tmp[];
   double d2[];
   double p[];
   int    csizes[];
   bool   cbusy[];
   double work[];
//--- create matrix
   CMatrixDouble ct;
   CMatrixDouble ctbest;
//--- initialization
   info=0;
//--- Test parameters
   if(npoints<k || nvars<1 || k<1 || restarts<1)
     {
      info=-1;
      return;
     }
//--- TODO: special case K=1
//--- TODO: special case K=NPoints
   info=1;
//--- Multiple passes of k-means++ algorithm
   ct.Resize(k,nvars);
   ctbest.Resize(k,nvars);
   ArrayResizeAL(xyc,npoints);
   ArrayResizeAL(xycbest,npoints);
   ArrayResizeAL(d2,npoints);
   ArrayResizeAL(p,npoints);
   ArrayResizeAL(tmp,nvars);
   ArrayResizeAL(csizes,k);
   ArrayResizeAL(cbusy,k);
//--- change value
   ebest=CMath::m_maxrealnumber;
//--- calculation
   for(pass=1;pass<=restarts;pass++)
     {
      //--- Select initial centers  using k-means++ algorithm
      //--- 1. Choose first center at random
      //--- 2. Choose next centers using their distance from centers already chosen
      //--- Note that for performance reasons centers are stored in ROWS of CT,not
      //--- in columns. We'll transpose CT in the end and store it in the C.
      i=CMath::RandomInteger(npoints);
      for(i_=0;i_<=nvars-1;i_++)
         ct[0].Set(i_,xy[i][i_]);
      cbusy[0]=true;
      for(i=1;i<=k-1;i++)
         cbusy[i]=false;
      //--- check
      if(!SelectCenterPP(xy,npoints,nvars,ct,cbusy,k,d2,p,tmp))
        {
         info=-3;
         return;
        }
      //--- Update centers:
      //--- 2. update center positions
      for(i=0;i<=npoints-1;i++)
         xyc[i]=-1;
      //--- cycle
      while(true)
        {
         //--- fill XYC with center numbers
         waschanges=false;
         for(i=0;i<=npoints-1;i++)
           {
            //--- change values
            cclosest=-1;
            dclosest=CMath::m_maxrealnumber;
            for(j=0;j<=k-1;j++)
              {
               //--- calculation
               for(i_=0;i_<=nvars-1;i_++)
                  tmp[i_]=xy[i][i_];
               for(i_=0;i_<=nvars-1;i_++)
                  tmp[i_]=tmp[i_]-ct[j][i_];
               v=0.0;
               for(i_=0;i_<=nvars-1;i_++)
                  v+=tmp[i_]*tmp[i_];
               //--- check
               if(v<dclosest)
                 {
                  cclosest=j;
                  dclosest=v;
                 }
              }
            //--- check
            if(xyc[i]!=cclosest)
               waschanges=true;
            //--- change value
            xyc[i]=cclosest;
           }
         //--- Update centers
         for(j=0;j<=k-1;j++)
            csizes[j]=0;
         for(i=0;i<=k-1;i++)
           {
            for(j=0;j<=nvars-1;j++)
               ct[i].Set(j,0);
           }
         //--- change values
         for(i=0;i<=npoints-1;i++)
           {
            csizes[xyc[i]]=csizes[xyc[i]]+1;
            for(i_=0;i_<=nvars-1;i_++)
               ct[xyc[i]].Set(i_,ct[xyc[i]][i_]+xy[i][i_]);
           }
         zerosizeclusters=false;
         for(i=0;i<=k-1;i++)
           {
            cbusy[i]=csizes[i]!=0;
            zerosizeclusters=zerosizeclusters || csizes[i]==0;
           }
         //--- check
         if(zerosizeclusters)
           {
            //--- Some clusters have zero size - rare,but possible.
            //--- We'll choose new centers for such clusters using k-means++ rule
            //--- and restart algorithm
            if(!SelectCenterPP(xy,npoints,nvars,ct,cbusy,k,d2,p,tmp))
              {
               info=-3;
               return;
              }
            continue;
           }
         //--- divide the accumulated sums by the cluster sizes to get the new centers
         for(j=0;j<=k-1;j++)
           {
            v=1.0/(double)csizes[j];
            for(i_=0;i_<=nvars-1;i_++)
               ct[j].Set(i_,v*ct[j][i_]);
           }
         //--- if nothing has changed during iteration
         if(!waschanges)
            break;
        }
      //--- 3. Calculate E,compare with best centers found so far
      e=0;
      for(i=0;i<=npoints-1;i++)
        {
         for(i_=0;i_<=nvars-1;i_++)
            tmp[i_]=xy[i][i_];
         for(i_=0;i_<=nvars-1;i_++)
            tmp[i_]=tmp[i_]-ct[xyc[i]][i_];
         //--- calculation
         v=0.0;
         for(i_=0;i_<=nvars-1;i_++)
            v+=tmp[i_]*tmp[i_];
         e=e+v;
        }
      //--- check
      if(e<ebest)
        {
         //--- store partition.
         ebest=e;
         //--- function call
         CBlas::CopyMatrix(ct,0,k-1,0,nvars-1,ctbest,0,k-1,0,nvars-1);
         //--- copy
         for(i=0;i<=npoints-1;i++)
            xycbest[i]=xyc[i];
        }
     }
//--- Copy and transpose
   c.Resize(nvars,k);
//--- function call
   CBlas::CopyAndTranspose(ctbest,0,k-1,0,nvars-1,c,0,nvars-1,0,k-1);
//--- copy
   for(i=0;i<=npoints-1;i++)
      xyc[i]=xycbest[i];
  }
//+------------------------------------------------------------------+
//| Select center for a new cluster using k-means++ rule             |
//+------------------------------------------------------------------+
static bool CKMeans::SelectCenterPP(CMatrixDouble &xy,const int npoints,
                                    const int nvars,CMatrixDouble &centers,
                                    bool &cbusycenters[],const int ccnt,
                                    double &d2[],double &p[],double &tmp[])
  {
//--- create variables
   bool   result;
   int    i=0;
   int    j=0;
   int    cc=0;
   double v=0;
   double s=0;
   int    i_=0;
//--- create array
   bool   busycenters[];   // same element type as cbusycenters[]
//--- copy
   ArrayCopy(busycenters,cbusycenters);
//--- initialization
   result=true;
//--- calculation
   for(cc=0;cc<=ccnt-1;cc++)
     {
      //--- check
      if(!busycenters[cc])
        {
         //--- fill D2
         for(i=0;i<=npoints-1;i++)
           {
            d2[i]=CMath::m_maxrealnumber;
            for(j=0;j<=ccnt-1;j++)
              {
               //--- check
               if(busycenters[j])
                 {
                  for(i_=0;i_<=nvars-1;i_++)
                     tmp[i_]=xy[i][i_];
                  for(i_=0;i_<=nvars-1;i_++)
                     tmp[i_]=tmp[i_]-centers[j][i_];
                  //--- calculation
                  v=0.0;
                  for(i_=0;i_<=nvars-1;i_++)
                     v+=tmp[i_]*tmp[i_];
                  //--- check
                  if(v<d2[i])
                     d2[i]=v;
                 }
              }
           }
         //--- calculate P (non-cumulative)
         s=0;
         for(i=0;i<=npoints-1;i++)
            s=s+d2[i];
         //--- check
         if(s==0.0)
            return(false);
         //--- change value
         s=1/s;
         for(i_=0;i_<=npoints-1;i_++)
            p[i_]=s*d2[i_];
         //--- choose one of points with probability P
         //--- random number within (0,1) is generated and
         //--- inverse empirical CDF is used to randomly choose a point.
         s=0;
         v=CMath::RandomReal();
         //--- calculation
         for(i=0;i<=npoints-1;i++)
           {
            s=s+p[i];
            //--- check
            if(v<=s || i==npoints-1)
              {
               for(i_=0;i_<=nvars-1;i_++)
                  centers[cc].Set(i_,xy[i][i_]);
               busycenters[cc]=true;
               //--- break the cycle
               break;
              }
           }
        }
     }
//--- return result
   return(result);
  }
 

Note the parameter:

K           -   desired number of clusters, K>=1

So you must set the desired number of centres yourself.
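For anyone who wants to experiment without the terminal, here is a rough sketch in standard C++ (close to MQL5 syntax) of what one pass of KMeansGenerate with Restarts = 1 does: k-means++ seeding, Lloyd iterations until no point changes cluster, and the error E that the class uses to keep the best restart. The names kmeans, dist2 and KMeansResult are hypothetical, not ALGLIB API, and there is one deliberate simplification: an empty cluster keeps its old center instead of being reseeded.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <stdexcept>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance, the measure CKMeans also uses.
double dist2(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t j = 0; j < a.size(); ++j) s += (a[j] - b[j]) * (a[j] - b[j]);
    return s;
}

struct KMeansResult {
    std::vector<Point> centers;  // one row per cluster (CKMeans transposes at the end)
    std::vector<int> labels;     // cluster index for each point, like XYC
    double e;                    // sum of squared distances to assigned centers
};

// One pass (Restarts = 1): k-means++ seeding, then Lloyd iterations.
KMeansResult kmeans(const std::vector<Point>& xy, int k, unsigned seed) {
    assert(k >= 1 && static_cast<std::size_t>(k) <= xy.size());  // CKMeans: info = -1
    std::mt19937 rng(seed);
    const int n = static_cast<int>(xy.size());
    // 1. First center: a point chosen uniformly at random.
    std::vector<Point> ct{xy[std::uniform_int_distribution<int>(0, n - 1)(rng)]};
    // 2. Next centers: chosen with probability proportional to D^2,
    //    the squared distance to the nearest center picked so far.
    std::vector<double> d2(n);
    while (static_cast<int>(ct.size()) < k) {
        double total = 0.0;
        for (int i = 0; i < n; ++i) {
            d2[i] = std::numeric_limits<double>::max();
            for (const Point& c : ct) d2[i] = std::min(d2[i], dist2(xy[i], c));
            total += d2[i];
        }
        if (total == 0.0)  // CKMeans: info = -3
            throw std::runtime_error("fewer than k distinct points");
        std::discrete_distribution<int> byD2(d2.begin(), d2.end());
        ct.push_back(xy[byD2(rng)]);
    }
    // 3. Lloyd iterations: reassign points, recompute means, stop when stable.
    std::vector<int> labels(n, -1);
    for (bool changed = true; changed;) {
        changed = false;
        for (int i = 0; i < n; ++i) {
            int best = 0;
            for (int j = 1; j < k; ++j)
                if (dist2(xy[i], ct[j]) < dist2(xy[i], ct[best])) best = j;
            if (labels[i] != best) { labels[i] = best; changed = true; }
        }
        std::vector<Point> sum(k, Point(xy[0].size(), 0.0));
        std::vector<int> count(k, 0);
        for (int i = 0; i < n; ++i) {
            ++count[labels[i]];
            for (std::size_t v = 0; v < xy[i].size(); ++v) sum[labels[i]][v] += xy[i][v];
        }
        for (int j = 0; j < k; ++j)
            if (count[j] > 0)  // simplification: an empty cluster keeps its old center,
                               // whereas CKMeans reseeds it with the k-means++ rule
                for (std::size_t v = 0; v < sum[j].size(); ++v) ct[j][v] = sum[j][v] / count[j];
    }
    double e = 0.0;
    for (int i = 0; i < n; ++i) e += dist2(xy[i], ct[labels[i]]);
    return {ct, labels, e};
}
```

Since K is an input, one informal option (not something CKMeans does for you) is to run this for K = 1, 2, 3, ... and look for where E stops dropping sharply. With a single pass the result depends on the random seed, which is why KMeansGenerate repeats the whole procedure Restarts times and keeps the partition with the lowest E.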

 
lilita bogachkova:

You'd think Bitcoin, noise and all, would then also drift in one direction at some angle, and that would be the end of it.

If a thread has a topic, the conversation is expected to stay on it. The topic is the presented strategy (Bayesian regression - Has anyone made an EA using this algorithm?), not a choice between regression calculation methods.

Valerii Mazurenko:

It seems everything needed is already available in MQL: ALGLIB has k-means and multivariate linear fit. What remains is to work out how the algorithm itself works (and nobody seems interested: some people praise R, others are stuck on regression, everyone is on about something different). Anyone interested in discussing the algorithm?

Ok.

Practical implementation always starts with a project, provided the project is worthwhile.

Why have you decided that this method is applicable to forex?

 
new-rena:

OK.

Practical implementation always starts with a project, provided the project is worthwhile.

What makes you think this method is applicable to forex?

Now we are talking about how the algorithm works.

As for applicability, there will be some task where it comes in handy. Clustering prices won't work.

 
Dmitry Fedoseev:

We are now talking about how the algorithm works.

That's what we're talking about.

new-rena:

Okay.

Practical implementation always starts with a project, provided the project is worthwhile.

Why have you decided that this method is applicable to forex?

The researchers chose a period without a pronounced trend, which is why the results are interesting.

Bitcoin 2014.01 - 2014.09