Causal analysis of time series using transfer entropy

MetaTrader 5 / Examples | 26 July 2024
Francis Dube

Introduction

Transfer entropy is a statistical tool that quantifies the amount of information transferred from one time series to another, providing insights into the nature and behavior of a target variable. In this article, we delve into the concept of statistical causality, calculated in terms of transfer entropy. We explore how this method can reveal the direction of causal influence between various processes. Additionally, we provide a detailed description of an MQL5 implementation for measuring transfer entropy, demonstrating how this technique can be practically applied to analyze potentially coupled time series. By leveraging transfer entropy, we aim to identify variables that can enhance prediction tasks.


Causality

Empirical data can be deceiving. Just because two variables seem to move in tandem does not mean one causes the other, which is why the saying "correlation is not causation" rings true. Correlation simply measures how connected two variables are, not why they are connected. For instance, imagine a strong correlation between ice cream sales and a stock price during the summer. This doesn't mean buying ice cream makes the stock go up! A more likely culprit is a hidden factor, like the season itself, affecting both variables independently. Similarly, a link between a company's stock and gold prices might exist, but the real cause could be something else entirely, like overall market sentiment or inflation influencing both prices. These examples highlight that correlated data can be misleading. They show a connection, but not the reason behind it. To truly understand if one thing causes another, we need more advanced tools.

Figure: Pendulum

The concept of causality, the notion that one event brings about another, is fundamental to scientific exploration. However, defining causality precisely presents a multifaceted challenge with deep philosophical, physical, and statistical considerations. Ideally, a cause would invariably produce a singular effect. However, isolating a single causal factor from the often complex web of influences impacting an outcome can be difficult. For instance, a surge in trading volume might correlate with a rise in stock price, but other factors, such as market sentiment and economic data releases, could also play a significant role. In such scenarios, researchers employ statistical techniques to infer causal relationships.

The nature of causality, whether deterministic (guaranteed outcome) or probabilistic (influencing the likelihood of the subsequent event), hinges on the underlying process. In deterministic systems, the first event demonstrably leads to the second, as observed in the predictable fall of a dropped object. Conversely, in probabilistic systems, the focus shifts to whether the first event enhances our ability to predict the occurrence of the second. For example, while recent rainfall might be associated with subsequent blooming flowers, other environmental factors could also contribute. In such cases, the question becomes whether knowledge of the first event improves our ability to predict the second event.

Economist Clive Granger, building on Norbert Wiener's work, developed the concept of causal relations, positing that a first signal causes a second signal if future values of the second signal can be better explained using past information from both the first and second signals, as opposed to just using lagged values of the second signal alone. Granger based his definition of causality on two principles. First, an effect cannot manifest before its cause. Second, a cause will contain unique information transferred to the effect. These principles suggest that, to quantify causality, we need to understand the temporal properties of the variables involved, as well as some measure of their information content. This makes time series well-suited to causal analysis.

Cause and Effect

Due to the nature of time series data, we can analyze how information from one series at a certain time affects the predictability of another series at a later time. Granger defines causality as a reduction in uncertainty. If knowing the past values of series X improves our prediction of the future value of series Y compared to using just the past values of Y itself, then X is said to be predictive of Y. Based on this idea, Granger developed tests for causality using lagged time series and autoregressive models. He proposed that there's no causal relationship between X and Y unless including past values of X significantly improves the prediction of Y's future values. This improvement is typically measured by a reduction in prediction error, such as a better fit in a regression model.

Granger causality allows us to statistically detect relationships between time series. However, it is important to note its limitations. Granger causality only identifies a directional relationship, not necessarily a definitive causal mechanism. There could be other causes at play outside of those we have data on. Additionally, since Granger causality is essentially built on the autoregression framework, it is most effective at uncovering linear causal relations. Non-linear causality requires a different approach.

Mathematically, Granger causality can be expressed by considering two time series, X and Y. Lagged values from each are denoted by X(t−k) and Y(t−k), representing values at lag k. The maximum lag considered is denoted as p. Applying the autoregression model, the future value of Y is regressed on its own past values.

$$Y_t = \alpha_0 + \sum_{k=1}^{p} \alpha_k\,Y_{t-k} + \varepsilon_t$$

This expression considers the future value of Y in terms of its past values only. By introducing X to the model, the future values are expressed in terms of past values from both X and Y.

$$Y_t = \alpha_0 + \sum_{k=1}^{p} \alpha_k\,Y_{t-k} + \sum_{k=1}^{p} \beta_k\,X_{t-k} + u_t$$

If the inclusion of the past values of X significantly improves the prediction of Y compared to the model using only the past values of Y, then X is said to Granger-cause Y. This is typically assessed by testing the null hypothesis that the β coefficients on the lagged values of X are jointly zero. If this null hypothesis is rejected, it indicates that X provides significant predictive information about Y beyond what is contained in the past values of Y alone. To test whether X Granger-causes Y, we compare the two models under the hypothesis test:

  • Null Hypothesis: X does not Granger-cause Y.
  • Alternative Hypothesis: X Granger-causes Y.

An F-test is used to compare the fit of the restricted model (without X) and the unrestricted model (with X) by examining the residuals of the respective models. The restricted sum of squared residuals, RSS_r, comes from the model without X, and the unrestricted sum of squared residuals, RSS_u, comes from the model with X. The F-statistic is calculated as:

$$F = \frac{(RSS_r - RSS_u)/p}{RSS_u/(n - 2p - 1)}$$

Where n is the number of observations. The calculated F-statistic is compared to the critical value from the F-distribution with p and n − 2p − 1 degrees of freedom. If the F-statistic is greater than the critical value, we reject the null hypothesis and conclude that X Granger-causes Y. Alternatively, the F-statistic can also be calculated using a one-way analysis of variance (ANOVA) test, whose formula is given below.

ANOVA based Granger Causality
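To make the mechanics concrete, here is a minimal MQL5 sketch of the residual-based F-statistic; the function granger_f_stat and its inputs are illustrative and are not part of the library developed later in this article.

//+------------------------------------------------------------------+
//| Illustrative sketch: Granger causality F-statistic from the      |
//| restricted (rss_r) and unrestricted (rss_u) residual sums        |
//+------------------------------------------------------------------+
double granger_f_stat(double rss_r, double rss_u, ulong n, ulong p)
  {
//--- average reduction in squared error per added lag of X
   double num = (rss_r - rss_u)/double(p);
//--- residual variance of the unrestricted model
   double den = rss_u/double(n - 2*p - 1);
//--- compare the result against the critical value of F(p, n - 2p - 1)
   return num/den;
  }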

Transfer entropy

In the early days of information theory, scientists used mutual information to understand how coupled processes interacted. This concept, based on Claude Shannon's entropy, tells us if the information within one time series overlaps with another. In simpler terms, it reveals if we can encode both series together using less information than encoding them separately. Because of this, mutual information is sometimes called redundancy. One process shares information with another, allowing the second process to be described efficiently by reusing information already captured from the first process.

Formally, mutual information between two stochastic processes, X(t) and Y(t), is established when the sum of their marginal entropies exceeds the joint entropy of the combined system. This mathematical relationship reflects the reduction in uncertainty about the combined system compared to the individual processes. In other words, it captures the degree to which information about one process can be used to reduce the inherent entropy associated with the other. Since entropy is solely determined by the underlying probability distribution, any such distribution can be characterized by an associated entropy value. This value quantifies the level of unexpectedness associated with a particular outcome, given the known probability distribution.

This concept becomes particularly relevant in the context of Granger causality. When investigating potential causal relationships between time series, the goal is to reduce the uncertainty associated with a target process by incorporating information from a potential source process. If the inclusion of a secondary time series provably reduces the entropy of the target process distribution, it suggests the presence of a statistical causal influence from the source series to the target series. This reduction is referred to as transfer entropy.


Transfer entropy (TE) builds on the concept of Kullback-Leibler divergence to measure the direction of information transfer between two time series. Specifically, TE is a form of conditional mutual information and can be expressed using the Kullback-Leibler (KL) distance, also known as KL divergence or relative entropy. KL divergence measures the difference between two probability distributions. In TE, it quantifies how much the conditional distribution of Y's next state changes when the past of X is taken into account in addition to Y's own past. Mathematically, the transfer of information from a time series X to a time series Y can be expressed as:

$$TE_{X \to Y} = \sum_{y_{t+1},\,y_t,\,x_t} p(y_{t+1}, y_t, x_t)\,\log\frac{p(y_{t+1} \mid y_t, x_t)}{p(y_{t+1} \mid y_t)}$$



where y(t+1)​ is the future state of Y, y(t)​ is the past state of Y, and x(t) is the past state of X. This formulation highlights that transfer entropy measures how much the probability distribution of y(t+1)​ changes when considering the information from x(t) in addition to y(t)​.

In 2009, Lionel Barnett, Adam Barrett, and Anil Seth co-authored the paper, "Granger Causality and Transfer Entropy Are Equivalent for Gaussian Variables," demonstrating that when time series follow a Gaussian distribution, transfer entropy is equivalent to half of the F-statistic for Granger causality.

$$TE_{X \to Y} = \tfrac{1}{2}\,F_{X \to Y} = \tfrac{1}{2}\,\ln\frac{\mathrm{var}(\varepsilon_r)}{\mathrm{var}(\varepsilon_u)}$$

where ε_r are the residuals of the restricted (Y-only) regression and ε_u the residuals of the unrestricted regression.

This result provides the definition of linear transfer entropy that we will implement in code later on. To account for non-linear causality, we extend the concept of uncertainty reduction by following Thomas Schreiber's work, which treats time series as a Markov process with distinct transition probability distributions.

Schreiber's approach to modeling uncertainty reduction leverages information theory by treating the time series X(t) and Y(t) as Markov processes with known transition probabilities p(x) and q(x). Unlike Granger's autoregressive model, which relies on linear models, this approach uses conditional mutual information to describe information transfer. Since mutual information is derived from the difference in entropies, conditional mutual information is obtained by conditioning each entropy term on additional information. Transfer entropy is then calculated by substituting lagged variables into the conditional mutual information equation, allowing us to analyze information transfer from X(t) to Y(t) at a specific lag k using conditional mutual entropy.

Joint conditional entropy:

$$H(Y_{t+1} \mid Y_t, X_t) = H(Y_{t+1}, Y_t, X_t) - H(Y_t, X_t)$$

Independent conditional entropy:

$$H(Y_{t+1} \mid Y_t) = H(Y_{t+1}, Y_t) - H(Y_t)$$

Computationally, this method is attractive because joint entropy requires only a probability distribution. Transfer entropy for a single lag k can be expressed as four separate joint entropy terms, which are easily computed with an accurate probability distribution from the data. The advantage of this formula is its ability to handle more lagged dimensions. However, each additional lag increases the state space dimensionality by two, significantly impacting the ability to accurately quantify transfer entropy due to the exponential growth in finite data issues associated with estimating probability densities.

$$TE_{X \to Y} = \big[H(Y_{t+1}, Y_t) - H(Y_t)\big] - \big[H(Y_{t+1}, Y_t, X_t) - H(Y_t, X_t)\big]$$

A key strength of this approach lies in its non-parametric nature. Unlike other methods, it makes no assumptions about the underlying data distribution beyond stationarity, allowing for application without prior knowledge of the data-generating processes. However, this advantage comes with a caveat: the results heavily rely on accurate estimation of the underlying distribution. Transfer entropy requires approximating the true probability distribution of the stochastic processes involved, using limited data to calculate the four entropy terms. The accuracy of this estimation significantly impacts the reliability of the transfer entropy findings. Keeping this in mind, one then must consider the possibility that the calculated entropy values may be spurious. We therefore need some way to determine the robustness of the results.

Our insistence on employing a non-parametric approach to estimate transfer entropy comes with the considerable challenge of ensuring that the results convey some truth as opposed to garbage. It therefore behooves us to consider a more informative approach to interpreting transfer entropy, which involves assessing the statistical significance of the estimated value. Common significance tests involve shuffling the time series data a predefined number of times. Transfer entropy is then calculated for each shuffled version. The p-value is subsequently calculated as the proportion of shuffled versions whose transfer entropy is at least as large as the original value.

Another approach requires one to calculate the number of standard deviations a result lies from the mean of shuffled data. Since shuffling disrupts the temporal structure, the mean of shuffled transfer entropy values is expected to be close to zero. The spread of the data around this mean reflects the significance of the original result. The calculated value is called a z-score. Z-scores typically require fewer shuffles compared to p-values, making them computationally more efficient. 

In the case of the p-value, the goal is to obtain a probability as close to zero as possible, whilst a z-score indicative of statistical significance should be above 3.0.
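To sketch how such a test can be organized in MQL5 (the helper functions below are illustrative, not the class methods described later; shuffled_te is assumed to hold the transfer entropies of the shuffled datasets):

//+------------------------------------------------------------------+
//| Illustrative sketch: shuffle-based significance measures         |
//+------------------------------------------------------------------+
double shuffle_zscore(vector &shuffled_te, double te_original)
  {
   double mean = shuffled_te.Mean();   // expected near zero after shuffling
   double std  = shuffled_te.Std();    // spread of the null distribution
   return (te_original - mean)/std;
  }

double shuffle_pvalue(vector &shuffled_te, double te_original)
  {
   double count = 0.0;
   for(ulong i = 0; i<shuffled_te.Size(); i++)
      if(shuffled_te[i]>=te_original)  // shuffled TE at least as large
         count += 1.0;
   return count/double(shuffled_te.Size());
  }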


MQL5 implementation

The code that implements the tools for quantifying transfer entropy and determining its significance is enclosed in transfer_entropy.mqh. The file contains the definition of the class CTransEntropy, along with other helper classes and functions. This class offers a framework for the statistical analysis of time series data, specifically geared towards assessing causal relationships between variables. It exposes two distinct methods for quantifying linear Granger causality (linear transfer entropy) and non-linear transfer entropy. Each is calculated in both directions, providing a more complete picture of the information flow between the variables.

To address potential non-stationarity in the data, the class incorporates a windowing procedure. Users can define the window size and stride, enabling the analysis of the data in smaller, overlapping segments. This approach yields results specific to each window, facilitating the identification of temporal variations in causality strength. Additionally, it mitigates the challenges associated with analyzing non-stationary data. The class also provides an integrated significance testing mechanism. Users can specify the number of data shuffles to perform while preserving the marginal distributions. Based on these shuffled datasets, the class calculates p-values and z-scores for transfer entropy in each direction. These statistical values provide essential insights into the likelihood that the observed causal relationships or information transfer are due to chance, enhancing the robustness of the analysis.

An instance of the class is instantiated using the parameterless default constructor.

public:
                     CTransEntropy(void)
     {
      if(!m_transfer_entropies.Resize(2))
         Print(__FUNCTION__, " error ", GetLastError());

     }

Users should then call the Initialize() method, which initializes the object with a given dataset and sets up various parameters for analysis.

bool              Initialize(matrix &in, ulong endog_index, ulong exog_index, ulong lag, bool maxLagOnly=true, ulong winsize=0,ulong winstride=0)
     {
      if(!lag || lag>in.Rows()/2)
        {
         Print(__FUNCTION__, " Invalid parameter(s) : lag must be > 0  and < rows/2");
         return false;
        }

      if(endog_index==exog_index)
        {
         Print(__FUNCTION__, " Invalid parameter(s) : endog cannot be = exog ");
         return false;
        }

      if(!m_dataset.Resize(in.Rows(),2))
        {
         Print(__FUNCTION__, " error ", GetLastError());
         return false;
        }

      if(!m_dataset.Col(in.Col(endog_index),0) || !m_dataset.Col(in.Col(exog_index),1))
        {
         Print(__FUNCTION__, " error ", GetLastError());
         return false;
        }

      if(!m_wins.Initialize(m_dataset,lag,maxLagOnly,winsize,winstride))
         return false;

      m_tlag = lag;
      m_endog = endog_index;
      m_exog = exog_index;
      m_maxlagonly = maxLagOnly;

      return true;
     }

The first required parameter is a matrix with at least two columns, with the time series to be analyzed placed in its columns. If dealing with non-stationary data, it is recommended to difference the data beforehand. The second and third parameters are the column indices of the input data matrix, indicating the endogenous (dependent) time series and exogenous (independent) time series, respectively.

The fourth parameter, lag, defines the lag parameter considered in the analysis. The next boolean parameter, maxLagOnly, determines whether lag defines a single term (if true) or all lagged values up to and including lag (if false). The second-to-last parameter, winsize, denotes the window length. If set to 0, no windowing will be applied to the data. Lastly, winstride optionally sets the stride of the window for windowing operations, defining the step between consecutive windows as they pass over the time series data.
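As a usage sketch, initializing an analysis on a two-column matrix with a maximum lag of 3, 100-row windows, and a stride of 50 might look like this; all values are illustrative.

CTransEntropy te;
matrix data;   // assumed to hold two differenced series in its columns
//--- column 0 is treated as endogenous, column 1 as exogenous
if(!te.Initialize(data,0,1,3,true,100,50))
   Print(" initialization failed ");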

The method first validates the lag and then ensures that the endogenous and exogenous indices are not the same; if either check fails, it prints an error message and returns false. The internal matrix m_dataset is resized to store the bivariate dataset to be analyzed. It then copies the columns specified by endog_index and exog_index from the input matrix in to the first and second columns of m_dataset, respectively. If windowing is requested, the helper class CDataWindows is used to window the m_dataset matrix. Once this is done, the method sets internal variables with the provided parameters for later use.

//+------------------------------------------------------------------+
//|class that generates windows of the dataset to be analyzed        |
//+------------------------------------------------------------------+
class CDataWindows
  {
private:
   matrix m_dwins[],
          m_data;
   ulong  m_lag,
          m_win_size,
          m_stride_size;

   bool m_max_lag_only,
        m_has_windows;

   matrix            applylags(void)
     {
      matrix out=np::sliceMatrixRows(m_data,m_lag);

      if(m_max_lag_only)
        {
         if(!out.Resize(out.Rows(),m_data.Cols()+2))
           {
            Print(__FUNCTION__, " error ", GetLastError());
            return matrix::Zeros(1,1);
           }

         for(ulong i = 2; i<4; i++)
           {
            vector col = m_data.Col(i-2);
            col = np::sliceVector(col,0,col.Size()-m_lag);

            if(!out.Col(col,i))
              {
               Print(__FUNCTION__, " error ", GetLastError());
               return matrix::Zeros(1,1);
              }
           }
        }
      else
        {
         if(!out.Resize(out.Rows(),m_data.Cols()+(m_lag*2)))
           {
            Print(__FUNCTION__, " error ", GetLastError());
            return matrix::Zeros(1,1);
           }

         for(ulong i = 0,k = 2; i<2; i++)
           {
            for(ulong t = 1; t<(m_lag+1); t++,k++)
              {
               vector col = m_data.Col(i);
               col = np::sliceVector(col,m_lag-t,col.Size()-t);

               if(!out.Col(col,k))
                 {
                  Print(__FUNCTION__, " error ", GetLastError());
                  return matrix::Zeros(1,1);
                 }
              }

           }
        }

      return out;

     }

   bool              applywindows(void)
     {
      if(m_dwins.Size())
         ArrayFree(m_dwins);

      for(ulong i = (m_stride_size+m_win_size); i<m_data.Rows(); i+=ulong(MathMax(m_stride_size,1)))
        {
         if(ArrayResize(m_dwins,int(m_dwins.Size()+1),100)<0)
           {
            Print(__FUNCTION__," error ", GetLastError());
            return false;
           }
         m_dwins[m_dwins.Size()-1] = np::sliceMatrixRows(m_data,i-m_win_size,(i-m_win_size)+m_win_size);
        }

      return true;
     }


public:
                     CDataWindows(void)
     {

     }

                    ~CDataWindows(void)
     {

     }

   bool              Initialize(matrix &data, ulong lag, bool max_lag_only=true, ulong window_size=0, ulong window_stride =0)
     {
      if(data.Cols()<2)
        {
         Print(__FUNCTION__, " matrix should contain at least 2 columns ");
         return false;
        }

      m_data = data;

      m_max_lag_only = max_lag_only;

      if(lag)
        {
         m_lag = lag;
         m_data = applylags();
        }

      if(window_size)
        {
         m_win_size = window_size;
         m_stride_size = window_stride;
         m_has_windows = true;
         if(!applywindows())
            return false;
        }
      else
        {
         m_has_windows = false;

         if(m_dwins.Size())
            ArrayFree(m_dwins);

         if(ArrayResize(m_dwins,1)<0)
           {
            Print(__FUNCTION__," error ", GetLastError());
            return false;
           }

         m_dwins[0]=m_data;
        }

      return true;
     }

   matrix            getWindowAt(ulong ind)
     {
      if(ind < ulong(m_dwins.Size()))
         return m_dwins[ind];
      else
        {
         Print(__FUNCTION__, " Index out of bounds ");
         return matrix::Zeros(1,1);
        }
     }

   ulong             numWindows(void)
     {
      return ulong(m_dwins.Size());
     }

   bool              hasWindows(void)
     {
      return m_has_windows;
     }
  };

If the Initialize() method completes successfully, users can call either Calculate_Linear_TE() or Calculate_NonLinear_TE() to test for linear and non-linear transfer entropy, respectively. Both methods return a boolean value upon completion. The method Calculate_Linear_TE() can take a single optional parameter, n_shuffles. If n_shuffles is zero (the default), no significance tests are conducted.

bool              Calculate_Linear_TE(ulong n_shuffles=0)
     {
      ulong c = m_wins.numWindows();

      matrix TE(c,2);
      matrix sTE(c,2);
      matrix pvals(c,2);
      matrix zscores(c,2);

      for(ulong i=0; i<m_wins.numWindows(); i++)
        {
         matrix df = m_wins.getWindowAt(i);

         m_transfer_entropies[0] = linear_transfer(df,0,1);

         m_transfer_entropies[1] = linear_transfer(df,1,0);


         if(!TE.Row(m_transfer_entropies,i))
           {
            Print(__FUNCTION__, " error ", GetLastError());
            return false;
           }

         SigResult rlts;

         if(n_shuffles)
           {
            significance(df,m_transfer_entropies,m_endog,m_exog,m_tlag,m_maxlagonly,n_shuffles,rlts);

            if(!sTE.Row(rlts.mean,i) || !pvals.Row(rlts.pvalue,i) || !zscores.Row(rlts.zscore,i))
              {
               Print(__FUNCTION__, " error ", GetLastError());
               return false;
              }

           }

        }

      m_results.TE_XY = TE.Col(0);
      m_results.TE_YX = TE.Col(1);
      m_results.p_value_XY = pvals.Col(0);
      m_results.p_value_YX = pvals.Col(1);
      m_results.z_score_XY = zscores.Col(0);
      m_results.z_score_YX = zscores.Col(1);
      m_results.Ave_TE_XY = sTE.Col(0);
      m_results.Ave_TE_YX = sTE.Col(1);

      return true;
     }

The method calculates linear transfer entropy using Granger's method. This is implemented in the private method linear_transfer(). The last two parameters of this routine identify the dependent and independent variable (column) in the input matrix. By calling the method twice with the column indices switched, we can get the transfer entropy in both directions.

double            linear_transfer(matrix &testdata,long dep_index, long indep_index)
     {
      vector joint_residuals,independent_residuals;
      double entropy=0.0;

      OLS ols;

      double gc;
      vector y;
      matrix x,xx;

      matrix joint;
      if(m_maxlagonly)
         joint = np::sliceMatrixCols(testdata,2);
      else
        {
         if(!joint.Resize(testdata.Rows(), testdata.Cols()-1))
           {
            Print(__FUNCTION__, " error ", GetLastError());
            return entropy;
           }
         matrix sliced = np::sliceMatrixCols(testdata,2);
         if(!np::matrixCopyCols(joint,sliced,1) || !joint.Col(testdata.Col(indep_index),0))
           {
            Print(__FUNCTION__, " error ", GetLastError());
            return entropy;
           }
        }
      matrix indep = (m_maxlagonly)?np::sliceMatrixCols(testdata,dep_index+2,dep_index+3):np::sliceMatrixCols(testdata,(dep_index==0)?2:dep_index+m_tlag+1,(dep_index==0)?2+m_tlag:END);

      y = testdata.Col(dep_index);

      if(dep_index>indep_index)
        {
         if(m_maxlagonly)
           {
            if(!joint.SwapCols(0,1))
              {
               Print(__FUNCTION__, " error ", GetLastError());
               return entropy;
              }
           }
         else
           {
            for(ulong i = 0; i<m_tlag; i++)
              {
               if(!joint.SwapCols(i,i+m_tlag))
                 {
                  Print(__FUNCTION__, " error ", GetLastError());
                  return entropy;
                 }
              }
           }
        }

      if(!addtrend(joint,xx))
         return entropy;

      if(!ols.Fit(y,xx))
         return entropy;

      joint_residuals = ols.Residuals();

      if(!addtrend(indep,x))
         return entropy;

      if(!ols.Fit(y,x))
         return entropy;

      independent_residuals = ols.Residuals();

      gc = log(independent_residuals.Var()/joint_residuals.Var());

      entropy = gc/2.0;

      return entropy;

     }

The method Calculate_NonLinear_TE() takes an additional parameter, numBins, alongside n_shuffles. This parameter defines the number of bins used in estimating the probability density of the variables. 

bool              Calculate_NonLinear_TE(ulong numBins, ulong n_shuffles=0)
     {
      ulong c = m_wins.numWindows();

      matrix TE(c,2);
      matrix sTE(c,2);
      matrix pvals(c,2);
      matrix zscores(c,2);

      for(ulong i=0; i<m_wins.numWindows(); i++)
        {
         matrix df = m_wins.getWindowAt(i);

         m_transfer_entropies[0] = nonlinear_transfer(df,0,1,numBins);

         m_transfer_entropies[1] = nonlinear_transfer(df,1,0,numBins);


         if(!TE.Row(m_transfer_entropies,i))
           {
            Print(__FUNCTION__, " error ", GetLastError());
            return false;
           }

         SigResult rlts;

         if(n_shuffles)
           {
            significance(df,m_transfer_entropies,m_endog,m_exog,m_tlag,m_maxlagonly,n_shuffles,rlts,numBins,NONLINEAR_TE);

            if(!sTE.Row(rlts.mean,i) || !pvals.Row(rlts.pvalue,i) || !zscores.Row(rlts.zscore,i))
              {
               Print(__FUNCTION__, " error ", GetLastError());
               return false;
              }

           }

        }

      m_results.TE_XY = TE.Col(0);
      m_results.TE_YX = TE.Col(1);
      m_results.p_value_XY = pvals.Col(0);
      m_results.p_value_YX = pvals.Col(1);
      m_results.z_score_XY = zscores.Col(0);
      m_results.z_score_YX = zscores.Col(1);
      m_results.Ave_TE_XY = sTE.Col(0);
      m_results.Ave_TE_YX = sTE.Col(1);

      return true;


     }
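A call with 3 histogram bins and 100 significance shuffles might look as follows; the values are illustrative and assume an object initialized as shown earlier.

if(!te.Calculate_NonLinear_TE(3,100))   // 3 bins per dimension, 100 shuffles
   Print(" non-linear TE calculation failed ");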

The histogram method is used to estimate the probability density. It was chosen because it is the simplest to implement. The responsibility of computing the generalized version of transfer entropy is delegated to the private methods nonlinear_transfer() and get_entropy().
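For reference, get_entropy() computes the plug-in Shannon entropy of the binned distribution,

$$\hat{H} = -\sum_{i}\hat{p}_i\,\log\hat{p}_i, \qquad \hat{p}_i = \frac{n_i}{\sum_j n_j}$$

where n_i is the count in histogram bin i.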

double            get_entropy(matrix &testdata, ulong num_bins)
     {

      vector hist;
      vector bounds[];
      hist=vector::Ones(10);   // pre-initialized; populated by histogramdd()

      if(!np::histogramdd(testdata,num_bins,hist,bounds))
        {
         Print(__FUNCTION__, " error ");
         return EMPTY_VALUE;
        }

      vector pdf = hist/hist.Sum();   // normalize bin counts to probabilities
      vector lpdf = pdf;

      for(ulong i = 0; i<pdf.Size(); i++)
        {
         if(lpdf[i]==0.0)
            lpdf[i] = 1.0;   // log(1) = 0, so empty bins contribute nothing
        }

      vector ent = pdf*log(lpdf);

      return -1.0*ent.Sum();   // Shannon entropy: -sum(p * log(p))

     }

The four component values used to calculate the joint and independent conditional entropies are combined in nonlinear_transfer() to get the final estimate.

double            nonlinear_transfer(matrix &testdata,long dep_index, long indep_index, ulong numbins)
     {
      double entropy=0.0;

      matrix one;
      matrix two;
      matrix three;
      matrix four;

      if(m_maxlagonly)
        {
         if(!one.Resize(testdata.Rows(),3) || !two.Resize(testdata.Rows(),2) || !three.Resize(testdata.Rows(),2) || !four.Resize(testdata.Rows(),1) ||
            !one.Col(testdata.Col(dep_index),0) || !one.Col(testdata.Col(dep_index+2),1) || !one.Col(testdata.Col(indep_index+2),2) ||
            !two.Col(testdata.Col(indep_index+2),0) || !two.Col(testdata.Col(dep_index+2),1) ||
            !three.Col(testdata.Col(dep_index),0) || !three.Col(testdata.Col(dep_index+2),1) ||
            !four.Col(testdata.Col(dep_index),0))
           {
            Print(__FUNCTION__, " error ", GetLastError());
            return entropy;
           }
        }
      else
        {

         if(!one.Resize(testdata.Rows(), testdata.Cols()-1) || !two.Resize(testdata.Rows(), testdata.Cols()-2) ||
            !three.Resize(testdata.Rows(), m_tlag+1))
           {
            Print(__FUNCTION__, " error ", GetLastError());
            return entropy;
           }

         matrix deplag = np::sliceMatrixCols(testdata,dep_index?dep_index+m_tlag+1:2,dep_index?END:2+m_tlag);
         matrix indlag = np::sliceMatrixCols(testdata,indep_index?indep_index+m_tlag+1:2,indep_index?END:2+m_tlag);
         //one
         if(!np::matrixCopyCols(one,deplag,1,1+m_tlag) || !np::matrixCopyCols(one,indlag,1+m_tlag) || !one.Col(testdata.Col(dep_index),0))
           {
            Print(__FUNCTION__, " error ", GetLastError());
            return entropy;
           }
         //two
         if(!np::matrixCopyCols(two,indlag,indlag.Cols()) || !np::matrixCopyCols(two,deplag,indlag.Cols()))
           {
            Print(__FUNCTION__, " error ", GetLastError());
            return entropy;
           }
         //three
         if(!np::matrixCopyCols(three,deplag,1) || !three.Col(testdata.Col(dep_index),0))
           {
            Print(__FUNCTION__, " error ", GetLastError());
            return entropy;
           }
         //four
         four = deplag;
        }

      double h1=get_entropy(one,numbins);
      double h2=get_entropy(two,numbins);
      double h3=get_entropy(three,numbins);
      double h4=get_entropy(four,numbins);

      // entropy = independent conditional entropy (h3-h4)  - joint conditional entropy (h1-h2)
      entropy = (h3-h4) - (h1-h2);

      return entropy;

     }

Comprehensive results of the test can be accessed using the get_results() method, which returns a structure of vectors. Each member of this structure refers to a different aspect of the results, with the length of each vector depending on the parameters set by the Initialize() method and the type of transfer entropy analysis performed.

//+------------------------------------------------------------------+
//| Transfer entropy results struct                                  |
//+------------------------------------------------------------------+
struct TEResult
  {
   vector            TE_XY;
   vector            TE_YX;
   vector            p_value_XY;
   vector            p_value_YX;
   vector            z_score_XY;
   vector            z_score_YX;
   vector            Ave_TE_XY;
   vector            Ave_TE_YX;
  };

The properties of the results structure are listed below.

  • TE_XY : Transfer entropy from exogenous to endogenous variable
  • TE_YX : Transfer entropy from endogenous to exogenous variable
  • z_score_XY : Significance of transfer entropy from exogenous to endogenous variable
  • z_score_YX : Significance of transfer entropy from endogenous to exogenous variable
  • p_value_XY : p-value significance of transfer entropy from exogenous to endogenous variable
  • p_value_YX : p-value significance of transfer entropy from endogenous to exogenous variable
  • Ave_TE_XY : Average transfer entropy from exogenous to endogenous variable
  • Ave_TE_YX : Average transfer entropy from endogenous to exogenous variable

Calling get_transfer_entropies() returns a vector of the estimated transfer entropies for the last window in the dataset, measured in both directions. The order of the results follows the column order of the original data passed to the class, so the first entropy value in the vector corresponds to the series in the first column.
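Putting the pieces together, a complete linear analysis with 100 shuffles might be sketched as follows, assuming an object initialized as shown earlier.

if(te.Calculate_Linear_TE(100))
  {
   TEResult r = te.get_results();
   Print(" TE X->Y per window ", r.TE_XY, " p-values ", r.p_value_XY);
//--- entropies for the last window, in the column order of the input matrix
   vector both = te.get_transfer_entropies();
   Print(" both directions ", both);
  }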


Examples

The functionality of the class will be tested on randomly generated series with predetermined traits. The series are generated using the functions listed below, both of which are defined in generate_time_series.mqh.

//+------------------------------------------------------------------+
//|Generate a random walk time series under Geometric Brownian Motion|
//+------------------------------------------------------------------+
vector random_series(double initial_val, ulong steps, ulong len, double muu, double sgma)
  {
   vector out(len);

   out[0] = initial_val;

   int err=0;

   for(ulong i=1; i<len; i++)
     {
      out[i] = out[i-1]*(1.0+(muu*(double(steps)/double(len)))+(MathRandomNormal(muu,sgma,err)*sqrt(double(steps)/double(len))));
      if(err)
        {
         Print(__FUNCTION__, " MathRandonNormal() ", GetLastError());
         return vector::Zeros(1);
        }
     }

   return out;
  }

The function random_series() generates a random walk time series characteristic of geometric Brownian motion. Its parameters are:

  • initial_val : The initial value of the time series.
  • steps : The total number of steps in the random walk.
  • len : The length of the time series to be generated.
  • muu : The drift term (mean) of the GBM.
  • sgma : The volatility (standard deviation) of the GBM.
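In equation form, each pass of the loop above applies the Euler-discretized update used by the function, namely

$$S_i = S_{i-1}\Big(1 + \mu\,\Delta t + Z_i\,\sqrt{\Delta t}\Big), \qquad Z_i \sim \mathcal{N}(\mu,\sigma), \qquad \Delta t = \frac{\text{steps}}{\text{len}}$$

mirroring the code, where the noise term is drawn with mean muu rather than zero.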
//+-----------------------------------------------------------------------------------------------+
//|Generate two time series under Geometric Brownian Motion with S2 dependent in part on S1-lagged|
//+-----------------------------------------------------------------------------------------------+
matrix coupled_random_series(double init_1,double init_2,ulong steps, ulong len, double muu_1, double muu_2, double sgma_1, double sgma_2,
                            double alpha, double epsilon, ulong lag)
  {

   vector gbm1 = random_series(init_1,steps,len,muu_1,sgma_1);
   vector gbm2 = random_series(init_2,steps,len,muu_2,sgma_2);

   if(gbm1.Size()!=gbm2.Size())
     {
      return matrix::Zeros(1,1);
     }

   matrix out(gbm2.Size()-lag,2);

   for(ulong i = lag; i<gbm2.Size(); i++)
     {
      gbm2[i]=(1.0-alpha)*(epsilon*gbm2[i-lag] + (1.0-epsilon) * gbm2[i]) + (alpha) * gbm1[i-lag];
      out[i-lag][0] = gbm2[i];
      out[i-lag][1] = gbm1[i];
     }

   return out;
  }

The function coupled_random_series() generates two coupled random walk time series, where the second series (gbm2) is partially dependent on the lagged values of the first series (gbm1). The function returns a matrix with two columns, with the dependent series placed in the first column. The parameters of the function are as follows:

  • init_1 : The initial value of the first time series.
  • init_2 : The initial value of the second time series.
  • steps : The total number of steps in the random walk.
  • len : The length of the time series to be generated.
  • muu_1 : The drift term of the first series.
  • muu_2 : The drift term of the second series.
  • sgma_1 : The volatility of the first series.
  • sgma_2 : The volatility of the second series.
  • alpha : A mixing parameter for the influence of the independent series on the dependent series.
  • epsilon : A parameter that adjusts the influence of lagged dependent series values.
  • lag : The lag for the dependency of the dependent series on the independent series.
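The coupling applied inside the loop can be written as

$$Y_i = (1-\alpha)\big(\epsilon\,Y_{i-\text{lag}} + (1-\epsilon)\,Y_i\big) + \alpha\,X_{i-\text{lag}}$$

where Y is the dependent series (gbm2) and X the independent series (gbm1); larger values of alpha strengthen the causal influence of X on Y.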

To demonstrate the capabilities of the CTransEntropy class, two MetaTrader 5 scripts were prepared. Both scripts illustrate how the class can be used to analyze a dataset and detect the lag of the independent variable (time series) that best characterizes the dependency observed in the dependent variable (time series). The first method relies on visual inspection to determine the most significant directional entropy value from a set of results obtained by analyzing transfer entropy at different lags. This method is implemented in the script LagDetection.ex5.

//+------------------------------------------------------------------+
//|                                                 LagDetection.mq5 |
//|                                  Copyright 2024, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2024, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"
#property script_show_inputs
#include<transfer_entropy.mqh>
#include<generate_time_series.mqh>
//--- input parameters

input double   Init1=100.0;
input double   Init2=90.0;
input ulong    Steps=1;
input ulong    Len=500;
input double   Avg1=0;
input double   Avg2=0;
input double   Sigma1=1;
input double   Sigma2=1;
input double   Alph=0.5;
input double   Epsilon=0.3;
input ulong    Lag=3;
input bool     UseSeed = true;
input ulong    Bins=3;
input ENUM_TE_TYPE testtype=NONLINEAR_TE;
input ulong    NumLagsToTest = 10;
input int      PlotViewTimeInSecs = 20;
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
void OnStart()
  {
   if(UseSeed)
    {
     MathSrand(256);
    }
//---
   if(!NumLagsToTest)
     {
      Print(" Invalid input parameter value for \'NumLagsToTest\'. It must be > 0 ");
      return;
     }

   matrix series = coupled_random_series(Init1,Init2,Steps,Len,Avg1,Avg2,Sigma1,Sigma2,Alph,Epsilon,Lag);

   series = log(series);

   series = np::diff(series,1,false);
   
   matrix entropies(NumLagsToTest,2);

   for(ulong k = 0; k<NumLagsToTest; k++)
     {
      CTransEntropy ote;

      if(!ote.Initialize(series,0,1,k+1))
         return;

      if((testtype==NONLINEAR_TE && !ote.Calculate_NonLinear_TE(Bins)) ||
         (testtype==LINEAR_TE && !ote.Calculate_Linear_TE()))
         return;

      vector res = ote.get_transfer_entropies();

      entropies.Row(res,k);
     }

   Print(" entropies ", entropies);

   CGraphic* g = np::plotMatrix(entropies,"Transfer Entropies","Col 0,Col 1","Lag","TE");

   if(g==NULL)
      return;
   else
     {
      Sleep(int(MathAbs(PlotViewTimeInSecs))*1000);
      g.Destroy();
      delete g;
     }

   return;
  }
//+------------------------------------------------------------------+

The first 11 user-accessible input parameters of the script control the properties of the generated series. The remaining 5 input parameters configure various aspects of the analysis:

  • Bins : Sets the number of bins for the histogram method used to estimate the probability density of the data.
  • testtype : Enables the selection of either linear or non-linear transfer entropy analysis.
  • NumLagsToTest : Sets the maximum number of lags at which the tests will be conducted, starting at 1.
  • PlotViewTimeInSecs : Determines the amount of time the graph will remain visible before the program quits.
  • UseSeed : If true, it seeds the random number generator to ensure reproducibility of test results.

The script generates two time series with a predetermined dependency and estimates the transfer entropy at different lags. Note that the data was differenced before being analyzed; this is probably unnecessary in this case, but it is good practice. The results (transfer entropies) are then visualized on a graph, where transfer entropy is plotted on the vertical axis against the corresponding lag on the horizontal axis. A successful outcome of the test should produce a plot with a clear peak at the lag that was chosen to generate the random series.

Running the program shows that the linear test successfully identified the lag dependency used to generate the series. Recall that the dependent series is in the first column of the randomly generated dataset.

LagDetection Graph : Linear Test

Performing the test again with the non-linear test option yields similar results. In this instance, the magnitude of the entropy value is notably smaller. This could be due to the limitations of using the histogram method to estimate the probability distribution of the data. It should also be noted that the number of bins selected will affect the estimated transfer entropy.

LagDetection Result : NonLinear Test

In the next demonstration, we test the significance of the entropy values obtained at specific lags. This is implemented in the script LagDetectionUsingSignificance.ex5.

//+------------------------------------------------------------------+
//|                                LagDetectionUsingSignificance.mq5 |
//|                                  Copyright 2024, MetaQuotes Ltd. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2024, MetaQuotes Ltd."
#property link      "https://www.mql5.com"
#property version   "1.00"
#property script_show_inputs
#include<transfer_entropy.mqh>
#include<generate_time_series.mqh>
//--- input parameters
input double   Init1=100.0;
input double   Init2=90.0;
input ulong    Steps=1;
input ulong    Len=500;
input double   Avg1=0;
input double   Avg2=0;
input double   Sigma1=1;
input double   Sigma2=1;
input double   Alph=0.5;
input double   Epsilon=0.3;
input ulong    Lag=3;
input bool     UseSeed = true;
input ulong    Bins=3;
input ENUM_TE_TYPE testtype=LINEAR_TE;
input ulong    LagToTest = 3;
input ulong    NumIterations = 100;
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
void OnStart()
  {
   if(UseSeed)
    {
     MathSrand(256);
    }
//---
   if(!LagToTest)
     {
      Print(" Invalid input parameter value for \'LagToTest\'. It must be > 0 ");
      return;
     }

   matrix series = coupled_random_series(Init1,Init2,Steps,Len,Avg1,Avg2,Sigma1,Sigma2,Alph,Epsilon,Lag);

   series = log(series);

   series = np::diff(series,1,false);

   matrix entropies(1,2);


   CTransEntropy ote;

   if(!ote.Initialize(series,0,1,LagToTest))
      return;

   if((testtype==NONLINEAR_TE && !ote.Calculate_NonLinear_TE(Bins,NumIterations)) ||
      (testtype==LINEAR_TE && !ote.Calculate_Linear_TE(NumIterations)))
      return;

   vector res = ote.get_transfer_entropies();

   entropies.Row(res,0);

   TEResult alres = ote.get_results();

   Print(" significance: ", " pvalue 1->0 ",alres.p_value_XY, " pvalue 0->1 ",alres.p_value_YX);
   Print(" zscore 1->0 ",alres.z_score_XY, " zscore 0->1 ",alres.z_score_YX);

   return;
  }
//+------------------------------------------------------------------+

The script has similar user-adjustable input parameters, except for the last two:

  • LagToTest : Sets the specific lag at which the test will be conducted.
  • NumIterations : Defines the number of times the data will be shuffled for significance testing.

The script generates a pair of dependent series and performs a test at the chosen lag. The transfer entropy, along with the corresponding p-value and z-score, are written to the terminal's Experts tab.

Script Parameters at Lag 3

For the first run, the script is executed with the LagToTest and Lag parameters set to the same value. The results are displayed below. They show that the series in the first column is dependent on the series in the second column of the matrix.

JS      0       21:33:43.464    LagDetectionUsingSignificance (Crash 1000 Index,M10)     significance:  pvalue 1->0 [0] pvalue 0->1 [0.66]
LE      0       21:33:43.464    LagDetectionUsingSignificance (Crash 1000 Index,M10)     zscore 1->0 [638.8518379295961] zscore 0->1 [-0.5746565128024472]

In the second run, we modify only the value of the LagToTest parameter and compare these results with those from the previous run.

Script Parameters at Lag 5


Notice the differences in p-values and z-scores. In this case both p-values and z-scores are insignificant.

RQ      0       21:33:55.147    LagDetectionUsingSignificance (Crash 1000 Index,M10)     significance:  pvalue 1->0 [0.37] pvalue 0->1 [0.85]
GS      0       21:33:55.147    LagDetectionUsingSignificance (Crash 1000 Index,M10)     zscore 1->0 [-0.2224969673139822] zscore 0->1 [-0.6582062358345131]

While the outcomes of the tests indicate that the CTransEntropy class performs well, there is a significant limitation when conducting analysis with larger lags, especially when the option for multiple lag terms is enabled (maxLagOnly is false). This is particularly problematic with the non-linear test and stems from the use of the histogram method to estimate the distribution of the data. The histogram method has notable drawbacks: the choice of bin width (or bin number) significantly affects the appearance and accuracy of the histogram. Too small a bin width can result in a noisy and fragmented histogram, while too large a bin width can obscure important details and smooth over features.

The biggest problem is that histograms are primarily effective for one-dimensional data. For higher-dimensional data, the number of bins grows exponentially. If there are numerous lags to consider, the demands on available compute resources may be stretched significantly. It is therefore recommended to keep the maximum number of lags small when analysis is conducted in consideration of multiple lags using generalized transfer entropy.
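To make the growth concrete: with B bins per dimension and k lag terms, the largest joint entropy term in the generalized estimator is computed over a histogram of $B^{2k+1}$ cells. Even modest settings such as B = 3 and k = 5 imply $3^{11} \approx 177{,}000$ cells, far more than a few hundred observations can populate reliably.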


Conclusion

In conclusion, the CTransEntropy class enables analysis of transfer entropy in both linear and non-linear contexts. Through practical demonstrations, we have shown its ability to detect and quantify the influence of one time series on another, with results validated by visual inspection and significance testing. The class effectively handles various scenarios, offering valuable insights into causal relationships within time series applications. However, users should be mindful of the computational challenges associated with analyzing multiple lags, particularly when applying non-linear methods. To ensure efficient performance and accurate results, it is advisable to limit the number of lags considered. Overall, the CTransEntropy class is a handy tool for uncovering complex dependencies and enhancing the understanding of dynamic systems.

The files accompanying the article are listed below.

  • Mql5\include\generate_time_series.mqh : contains functions for generating random time series
  • Mql5\include\np.mqh : a collection of vector and matrix utility functions
  • Mql5\include\OLS.mqh : contains the definition of the OLS class implementing ordinary least squares regression
  • Mql5\include\TestUtilities.mqh : provides a collection of tools used to prepare datasets for OLS evaluation
  • Mql5\include\transfer_entropy.mqh : contains the definition of the CTransEntropy class, which implements transfer entropy analysis
  • Mql5\scripts\LagDetection.mq5 : a script that demonstrates the functionality of the CTransEntropy class
  • Mql5\scripts\LagDetectionUsingSignificance.mq5 : a second script that illustrates a different approach to interpreting the results of CTransEntropy


Attached files |
np.mqh (51.13 KB)
OLS.mqh (13.36 KB)
TestUtilities.mqh (4.36 KB)
LagDetection.mq5 (2.43 KB)
Mql5.zip (18.26 KB)