Efficiently reading in a csv file?

 

I placed "AAPL.csv" into the MetaTrader terminal folder's Files subfolder (MQL4/Files) to be accessible by the EA. The structure of this csv is as follows:

Date,Open,High,Low,Close,Adj Close,Volume
1980-12-12,0.1283479928970337,0.1289059966802597,0.1283479928970337,0.1283479928970337,0.10092189908027649,469033600
1980-12-15,0.12221000343561172,0.12221000343561172,0.12165199965238571,0.12165199965238571,0.09565676748752594,175884800

I want to read this, as well as many other similar csv files. All files have different lengths. My question is, what is the best practice when reading variable-length files? For now, I managed to read the content of my file by creating a 2-dimensional array: 

string s[7][1000000];

although this is poor programming (what if the file only has 500 rows?) and it can still fail if I encounter a csv that is longer (what if file has 1000001 rows?). I tried using a dynamic array: 

string s[7][];

but it returns '[' - invalid index value error. Yet another idea I had to use the FileSize() function and allocate just-the-necessary amount of memory to the 2-dimensional array. However, 

int handle=FileOpen(FileName,FILE_CSV|FILE_READ,",");
   if(handle>0)
      {
        int size = FileSize(handle);
      ...

yielded a size that equals the product of column numbers and row numbers. I was hoping to obtain a row_count and col_count and use them to define s:

string s[col_count][row_count];


My full working code:

extern string FileName = "AAPL.csv";
 
int init()
  {
   int row=0,col=0;
   string s[7][1000000];
   
   ResetLastError();
   int handle=FileOpen(FileName,FILE_CSV|FILE_READ,",");
   if(handle>0)
      {
        //int size = FileSize(handle);
        //Print(size);
        while(True)
        {
          string temp = FileReadString(handle);
          if(FileIsEnding(handle)) break; //FileIsEnding = End of File
          s[col][row]=temp;
          if(FileIsLineEnding(handle)) //FileIsLineEnding = End of Line
          {
            col = 0; //reset col = 0 for the next row
            row++; //next row
          }
          else
          {
            col++; //next col of the same row
          }
        }
        FileClose(handle);
      }
      else
      {
        Comment("File "+FileName+" not found, the last error is ", GetLastError());
      }  
   return(0);
  }

int start()
  {
   return(0);
  }
//+------------------------------------------------------------------+
 
Levente Csibi:

I placed "AAPL.csv" into the MetaTrader terminal folder's Files subfolder (MQL4/Files) to be accessible by the EA. The structure of this csv is as follows:

I want to read this, as well as many other similar csv files. All files have different lengths. My question is, what is the best practice when reading variable-length files? For now, I managed to read the content of my file by creating a 2-dimensional array: 

although this is poor programming (what if the file only has 500 rows?) and it can still fail if I encounter a csv that is longer (what if file has 1000001 rows?). I tried using a dynamic array: 

but it returns '[' - invalid index value error. Yet another idea I had to use the FileSize() function and allocate just-the-necessary amount of memory to the 2-dimensional array. However, 

yielded a size that equals the product of column numbers and row numbers. I was hoping to obtain a row_count and col_count and use them to define s:


My full working code:

you might use self-grown array resize approach to parse out csv file.  i illustrate below code sample for an AAPL.csv from Yahoo Finance.

please note that Yahoo Finance csv's text coding is utf-8 with 0x0a (i.e. \n) row separtor.  below codes has comments to highlight such adhoc process.

class CCsvData
  {
public:
   datetime          time;
   double            open,high,low,close,adj_close,volume;
  };

CCsvData CsvList[];
//+------------------------------------------------------------------+
//| Expert initialization function                                   |
//+------------------------------------------------------------------+
int OnInit()
  {
   ReadCsvData();
   return(INIT_SUCCEEDED);
  }
//+------------------------------------------------------------------+
//| Expert deinitialization function                                 |
//+------------------------------------------------------------------+
void OnDeinit(const int reason)
  {
   ArrayFree(CsvList);
  }
//+------------------------------------------------------------------+
//|                                                                  |
//+------------------------------------------------------------------+
void ReadCsvData()
  {
   ArrayResize(CsvList,0);
   int fHandle=FileOpen("AAPL.csv",FILE_BIN|FILE_READ);
   if(fHandle==INVALID_HANDLE)
     {
      Print("failed to open csv file, error code: ",GetLastError());
      return;
     }
   uchar buf[];
   int ii,csvColumnSz=7;//yahoo finance csv has 7 columns
   string readStr="";
   FileSeek(fHandle,0,SEEK_SET);
   FileReadArray(fHandle,buf,0,WHOLE_ARRAY);
   FileClose(fHandle);
   readStr=CharArrayToString(buf,0,WHOLE_ARRAY,CP_UTF8);//yahoo csv's text coding is utf-8
   if(readStr!="")
     {
      string elArr[],dataArr[],tmpStr="";
      StringSplit(readStr,'\n',elArr);//yahoo's csv row separator is 0x0a (i.e. \n)
      for(ii=0; ii<ArraySize(elArr); ii++)
        {
         if(elArr[ii]=="" || StringToDouble(elArr[ii])==0)//filter out empty row and first title row
            continue;
         StringSplit(elArr[ii],',',dataArr);// ',' is an inline separator
         if(ArraySize(dataArr)<csvColumnSz)
            continue;
         ArrayResize(CsvList,ArraySize(CsvList)+1);
         int lastIndex=ArraySize(CsvList)-1;
         tmpStr=dataArr[0];
         StringReplace(tmpStr,"-",".");//reformat csv's "yyyy-MM-dd" to mql's "yyyy.MM.dd" time string
         CsvList[lastIndex].time=StringToTime(tmpStr);
         CsvList[lastIndex].open=StringToDouble(dataArr[1]);
         CsvList[lastIndex].high=StringToDouble(dataArr[2]);
         CsvList[lastIndex].low=StringToDouble(dataArr[3]);
         CsvList[lastIndex].close=StringToDouble(dataArr[4]);
         CsvList[lastIndex].adj_close=StringToDouble(dataArr[5]);
         CsvList[lastIndex].volume=StringToDouble(dataArr[6]);
        }
      // sanity check if csv data is well loaded
      Print("total csv rows: ",ArraySize(CsvList));
      int prDigit=2;
      for(ii=ArraySize(CsvList)-1; ii>=(int)MathMax(0,ArraySize(CsvList)-10); ii--)//print last 10 rows
        {
         Print(ii," ",TimeToString(CsvList[ii].time,TIME_DATE|TIME_MINUTES|TIME_SECONDS)," ",
               DoubleToString(CsvList[ii].open,prDigit)," ",
               DoubleToString(CsvList[ii].open,prDigit)," ",
               DoubleToString(CsvList[ii].high,prDigit)," ",
               DoubleToString(CsvList[ii].low,prDigit)," ",
               DoubleToString(CsvList[ii].close,prDigit)," ",
               DoubleToString(CsvList[ii].adj_close,prDigit)," ",
               DoubleToString(CsvList[ii].volume,prDigit));
        }
      /* example output as of 2021.03.25
      total csv rows: 252
      251 2021.03.25 00:00:00 119.54 119.54 121.66 119.00 120.59 120.59 98696600.00
      250 2021.03.24 00:00:00 122.82 122.82 122.90 120.07 120.09 120.09 88530500.00
      249 2021.03.23 00:00:00 123.33 123.33 124.24 122.14 122.54 122.54 95467100.00
      248 2021.03.22 00:00:00 120.33 120.33 123.87 120.26 123.39 123.39 111912300.00
      247 2021.03.19 00:00:00 119.90 119.90 121.43 119.68 119.99 119.99 185023200.00
      246 2021.03.18 00:00:00 122.88 122.88 123.18 120.32 120.53 120.53 121229700.00
      245 2021.03.17 00:00:00 124.05 124.05 125.86 122.34 124.76 124.76 111437500.00
      244 2021.03.16 00:00:00 125.70 125.70 127.22 124.72 125.57 125.57 114740000.00
      243 2021.03.15 00:00:00 121.41 121.41 124.00 120.42 123.99 123.99 92403800.00
      242 2021.03.12 00:00:00 120.40 120.40 121.17 119.16 121.03 121.03 87963400.00
      */
     }
  }
//+------------------------------------------------------------------+
//| Expert tick function                                             |
//+------------------------------------------------------------------+
void OnTick()
  {
  }
//+------------------------------------------------------------------+
 
Tsungche Kuo:

you might use self-grown array resize approach to parse out csv file.  i illustrate below code sample for an AAPL.csv from Yahoo Finance.

please note that Yahoo Finance csv's text coding is utf-8 with 0x0a (i.e. \n) row separtor.  below codes has comments to highlight such adhoc process.

Thanks a ton, Tsungche Kuo! It works like a charm! I really appreciate it! 

 

Hello, thanks a lot,

I could use this example as well. I have adopted it a little bit to my case, but great, it works, thanks a lot.


A little question in addition:


I would like to do a kind of "transformation".

In my source file, I have 1 field for "date", and one field for "time".

Ideally, I would combine those 2 fields from the file into 1 field in the Array (which has the date time format of mql5.


Do you have a little tip how I could implement this transformation in the context of the above program?

Thanks a lot in advance!

 
Tsung Che Kuo #:

you might use self-grown array resize approach to parse out csv file.  i illustrate below code sample for an AAPL.csv from Yahoo Finance.

please note that Yahoo Finance csv's text coding is utf-8 with 0x0a (i.e. \n) row separtor.  below codes has comments to highlight such adhoc process.

Nice, thank you. I can use that too.
 
Tsung Che Kuo #:

you might use self-grown array resize approach to parse out csv file.  i illustrate below code sample for an AAPL.csv from Yahoo Finance.

please note that Yahoo Finance csv's text coding is utf-8 with 0x0a (i.e. \n) row separtor.  below codes has comments to highlight such adhoc process.

This is not really efficient with big files. You should use the third parameter of the ArrayResize function. Increasing the size of an integer array from 0 to a million without making any other operations takes 25 seconds on my computer.