How to speed up big array reading?

 

Hello 

I'm trying to read a news file during a backtest that runs from 2000 to the present.

Is there a way to read the array faster? What I'm doing now is painfully slow:

input int GMT_Offset = 3;
string News[][5];
//+------------------------------------------------------------------+
//|                                                                  |
//+------------------------------------------------------------------+
int  OnInit()
  {
   string sep = ",";
   ushort u_sep;
   string substrings[];
   int i = 0;
   ArrayResize(News, 10000, 10000);
   u_sep = StringGetCharacter(sep, 0);
   ResetLastError();
   int Handle = FileOpen("Events.txt", FILE_TXT | FILE_COMMON | FILE_READ | FILE_ANSI, ",");
   if(Handle == INVALID_HANDLE)
     {
      Print("File Error : ", GetLastError());
      return(INIT_FAILED);
     }
   while(!FileIsEnding(Handle))
     {
      string line = FileReadString(Handle);
      int k = StringSplit(line, u_sep, substrings);
      if(k < 5)                      // skip blank or malformed lines instead of reading past the end of substrings[]
         continue;
      ArrayResize(News, i + 1, 10000);
      News[i][0] = substrings[0];
      News[i][1] = substrings[1];
      News[i][2] = substrings[2];
      News[i][3] = substrings[3];
      News[i][4] = substrings[4];
      i++;
     }
   ArrayResize(News, i);
   FileClose(Handle);
   return(INIT_SUCCEEDED);
  }
//+------------------------------------------------------------------+
//| Expert deinitialization function                                 |
//+------------------------------------------------------------------+
void OnDeinit(const int reason)
  {
  }
//+------------------------------------------------------------------+
//|                                                                  |
//+------------------------------------------------------------------+
datetime Gmt(datetime time)
  {
   return(time - (GMT_Offset * 3600));
  }
//+------------------------------------------------------------------+
//| Expert tick function                                             |
//+------------------------------------------------------------------+
void OnTick()
  {
   MqlRates mrate[];
   ArraySetAsSeries(mrate, true);
   if(CopyRates(_Symbol, _Period, 0, 3, mrate) <= 0)
     {
      Alert("Error copying rates/history data - error:", GetLastError(), "!!");
      return;
     }
   static datetime Prev_time;
   datetime Bar_time[1];
   Bar_time[0] = mrate[0].time;
   if(Prev_time == Bar_time[0])
     {
      return;
     }
   Prev_time = Bar_time[0];
//----------------------------------
   for(int i = 0; i < ArrayRange(News, 0); i++)
      if(Gmt(TimeCurrent()) == StringToTime(News[i][0]))
        {
         Print(News[i][0], News[i][1], News[i][2], News[i][3]);
        }
  }
 

So I read this:

"If instead of using a variable for the array size you call the ArraySize() function when checking condition in a for loop, the looping time may be significantly prolonged as the ArraySize() function will be called at every loop iteration. So the function call takes more time than calling a variable"

So I replaced the ArrayRange() call in the loop condition with a variable. It helped slightly, but it is still painfully slow:

   int Size = ArrayRange(News, 0);
   for(int i = 0; i < Size; i++)
      if(Gmt(TimeCurrent()) == StringToTime(News[i][0]))
        {
         Print(News[i][0], News[i][1], News[i][2], News[i][3]);
        }
(Quote source: MQL5 Reference, Language Basics / Operators / Loop Operator for, www.mql5.com)
 
Don't go through the whole array from 0 every time. When time is met, remember the index of the array and use that index as the starting point for the next iteration.
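To make the suggestion concrete, here is a minimal sketch in C++ (MQL5 only runs inside the terminal, but the logic ports almost line for line). It assumes the event times have already been converted to integer seconds and are sorted ascending, and that `cursor` survives between calls, which in an EA means a global or a `static` local. `find_from_cursor` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scans `times` starting at `cursor` instead of at 0. When an event time
// equals `now`, its index is stored in `cursor`, so the next call resumes
// there rather than rescanning everything already passed.
// Returns the matching index, or -1 when nothing matches.
long find_from_cursor(const std::vector<std::int64_t>& times,
                      std::int64_t now, std::size_t& cursor) {
    for (std::size_t i = cursor; i < times.size(); ++i) {
        if (times[i] == now) {
            cursor = i;                      // remember where we stopped
            return static_cast<long>(i);
        }
    }
    return -1;                               // no match: cursor unchanged
}
```

Events that lie before the cursor are no longer found, which is fine in a backtest because the clock only moves forward.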
 
Lativ_ #:

So I read this:

"If instead of using a variable for the array size you call the ArraySize() function when checking condition in a for loop, the looping time may be significantly prolonged as the ArraySize() function will be called at every loop iteration. So the function call takes more time than calling a variable"

And changed the ArraySize() to be a variable instead, slight improvement but still painfully slow

I don't know the structure of your file, but let me assume the following.

You are using a CSV file which you want to read.

The file should have two kinds of delimiter: one that separates the records from each other, and another that separates the values inside a record.

Let's say a record is one line of text in your file, terminated by CRLF "\r\n".

Then the values would most probably be separated by a comma, right?


Well, then do exactly that. 

short delimiter = ('\r' << 8) + '\n';  // packs '\r' and '\n' into one 16-bit value (0x0D0A); FileOpen() treats the delimiter as a single character, and FileReadString() on a FILE_TXT file already stops at "\r\n"

int Handle = FileOpen("Events.txt", FILE_TXT | FILE_COMMON | FILE_READ | FILE_ANSI, delimiter);
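A quick check of the arithmetic in the delimiter line, written in C++: the expression packs the two character codes into a single 16-bit number, 0x0D0A (3338). Since `FileOpen()` takes its delimiter as one `short` value, this is one character code rather than a two-character "\r\n" separator; with `FILE_TXT`, `FileReadString()` already reads up to the line end anyway. `packed_crlf` is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>

// What ('\r' << 8) + '\n' evaluates to: the codes 13 and 10 packed into
// one 16-bit number, 13 * 256 + 10.
constexpr std::uint16_t packed_crlf =
    static_cast<std::uint16_t>(('\r' << 8) + '\n');
static_assert(packed_crlf == 0x0D0A, "13 * 256 + 10");
```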


Then, when you split the string into its values, you can use the comma as separator. 
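A C++ sketch of the two-level split: each line is one record, and the comma separates its values. The five-column record in the test is a made-up example, and `split_record` is an illustrative name; in the original MQL5 code, `StringSplit()` plays this role for each line returned by `FileReadString()`.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits one record (one text line) into its comma-separated values.
// A trailing '\r' left over from a "\r\n" line ending is removed first.
std::vector<std::string> split_record(std::string line, char sep = ',') {
    if (!line.empty() && line.back() == '\r') line.pop_back();
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, sep)) fields.push_back(field);
    return fields;
}
```

A caller can then skip any line whose field count is not the expected five.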


If that doesn't help, you could open the file in binary mode and read it into a char array with a single function call. Then process the array in one loop, keeping the logic as simple as possible, for example with the ternary operator: variable = (condition) ? expr1 : expr2;

I would guess you cannot get faster than that.
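A minimal C++ sketch of the single-pass approach, assuming the whole file is already in one in-memory buffer (in MQL5 that could be a `uchar` array filled by `FileReadArray()` on a handle opened with `FILE_BIN`; that step is not shown). Each byte is looked at once: ',' ends a field, '\n' ends a record, and '\r' is skipped. `Record` and `parse_buffer` are illustrative names, and quoted fields containing commas are not handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Record = std::vector<std::string>;

// Parses a whole file that is already in memory, in one pass over its bytes.
std::vector<Record> parse_buffer(const std::string& buf) {
    std::vector<Record> records;
    Record current;
    std::string field;
    for (char c : buf) {
        if (c == '\r') continue;             // ignore the CR of "\r\n"
        if (c == ',') {                      // end of a field
            current.push_back(field);
            field.clear();
        } else if (c == '\n') {              // end of a record
            current.push_back(field);
            field.clear();
            records.push_back(current);
            current.clear();
        } else {
            field += c;
        }
    }
    if (!field.empty() || !current.empty()) { // last line without '\n'
        current.push_back(field);
        records.push_back(current);
    }
    return records;
}
```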

 
Enrique Dangeroux #:
Don't go through the whole array from 0 every time. When time is met, remember the index of the array and use that index as the starting point for the next iteration.

That improved it by quite a few seconds!!

Thank you

 
Dominik Christian Egert #:

If that doesn't help, you could open the file in binary mode and read it into a char array with a single function call. Then process the array in one loop, keeping the logic as simple as possible, for example with the ternary operator: variable = (condition) ? expr1 : expr2;

I would guess you cannot get faster than that.

Would splitting the file by year not help much?

I'll try reading it as binary into a char array.

Thank you

 

Not starting from zero every time actually saved about a minute. But 1:26 minutes for a one-month backtest is still slow.

   static int memory = 0;   // static: keeps the last matched index between OnTick() calls
   datetime t = TimeCurrent();
   int Size = ArrayRange(News, 0);
   for(int i = memory; i < Size; i++)
      if(Gmt(t) == StringToTime(News[i][0]))
        {
         Print(News[i][0], News[i][1], News[i][2], News[i][3]);
         memory = i;
        }
 
Compare integers instead of strings. 
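One reading of this advice: turn the time column into integers once, while loading the file, so each bar compares numbers instead of calling `StringToTime()` on every row. In MQL5 that would mean filling a `datetime` array in `OnInit()`. The C++ sketch below stands in for that conversion, assuming the column is formatted "YYYY.MM.DD HH:MM" (a guess about the file); `days_from_civil` follows Howard Hinnant's well-known date algorithm, and `parse_time` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Days since 1970-01-01 for a proleptic Gregorian date.
std::int64_t days_from_civil(std::int64_t y, unsigned m, unsigned d) {
    y -= m <= 2;                                        // treat Jan/Feb as months of the previous year
    const std::int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = static_cast<unsigned>(y - era * 400);            // year of era
    const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; // day of year
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // day of era
    return era * 146097 + static_cast<std::int64_t>(doe) - 719468;
}

// Converts "YYYY.MM.DD HH:MM" to seconds since 1970, like a datetime.
// Returns -1 if the text does not match that layout.
std::int64_t parse_time(const std::string& s) {
    int y, mo, d, h, mi;
    if (std::sscanf(s.c_str(), "%d.%d.%d %d:%d", &y, &mo, &d, &h, &mi) != 5) return -1;
    return days_from_civil(y, static_cast<unsigned>(mo), static_cast<unsigned>(d)) * 86400
           + h * 3600 + mi * 60;
}
```

After this runs once per row at load time, the per-bar check is a plain integer comparison.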
 

You can exit the loop as soon as the current date is before an event:

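   static int memory = 0;   // static: keeps the last matched index between OnTick() calls
   datetime t = TimeCurrent();
   int Size = ArrayRange(News, 0);
   for(int i = memory; i < Size; i++)
      if(Gmt(t) == StringToTime(News[i][0]))
        {
         Print(News[i][0], News[i][1], News[i][2], News[i][3]);
         memory = i;
        }
      else if(Gmt(t) < StringToTime(News[i][0]))
         break;   // assumes the file is sorted by time, so no later row can match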
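A C++ sketch that combines this early exit with the remembered index, again assuming integer times sorted ascending. Past events move the cursor forward so they are never compared again, matching events are collected, and the first future event ends the loop. `events_at` and the `compared` counter are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Collects the indices of events whose time equals `now`. `cursor` persists
// between calls (like a static variable in OnTick); `compared` counts the
// time comparisons made, to show how little of the array is touched.
std::vector<std::size_t> events_at(const std::vector<std::int64_t>& times,
                                   std::int64_t now, std::size_t& cursor,
                                   std::size_t& compared) {
    std::vector<std::size_t> hits;
    for (std::size_t i = cursor; i < times.size(); ++i) {
        ++compared;
        if (times[i] < now) {
            cursor = i + 1;          // event is in the past: never revisit it
        } else if (times[i] == now) {
            hits.push_back(i);       // match; more events may share this time
        } else {
            break;                   // first future event: nothing later can match
        }
    }
    return hits;
}
```

Each call therefore touches only the events between the cursor and the first future event, instead of the whole file.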

And this:

Enrique Dangeroux #:
Compare integers instead of strings. 

(Sorry for the constant edits, mod; the interface is very easy to use.)

 
Lorentzos Roussos #:

And this 

This was the solution! Exiting a loop early will be my favorite thing to do :)

Thank you very much!

 
Stop doing the same things multiple times
   datetime t = TimeCurrent();
   int Size = ArrayRange(News, 0);
   for(int i = memory; i < Size; i++)
      if(Gmt(t) == StringToTime(News[i][0]))
        {
         Print(News[i][0], News[i][1], News[i][2], News[i][3]);
         memory = i;
        }
      else if(Gmt(t) < StringToTime(News[i][0]))
         break;
Simplify your code
   datetime t = Gmt(TimeCurrent());              // computed once per call, not once per row
   int Size = ArrayRange(News, 0);
   for(int i = memory; i < Size; i++)
     {
      datetime n = StringToTime(News[i][0]);     // converted once per row, not twice
      if(t == n)
        {
         Print(n, News[i][1], News[i][2], News[i][3]);
         memory = i;
        }
      else if(t < n)
         break;
     }