Extracting backtest data --> weird results? What am I doing wrong?

 

Hi guys,

I'm currently exporting backtest data from MT4 via this very simple code I made:

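int fileHandle;   // handle of the dump file, shared by init/start/deinit

int init()
{
   // Open (or overwrite) the CSV dump, using ';' as the field delimiter
   fileHandle = FileOpen("MT4dump.txt", FILE_CSV|FILE_WRITE, ';');
   if(fileHandle < 0){
      Alert("Error ", GetLastError());
   }
   return(0);
}

int deinit()
{
   // Flush and close only if the file was opened successfully
   if(fileHandle > 0){
      FileFlush(fileHandle);
      FileClose(fileHandle);
   }
   return(0);
}

int start()
{
   // Called on every incoming tick: write the current bar's values so far
   if(fileHandle > 0){
      FileWrite(fileHandle, Open[0], Close[0], High[0], Low[0], Bid, Ask);
   }
   return(0);
}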

...which as you can see simply dumps the open/close/high/low/bid/ask per 'tick'.

What confuses me is what the strategy tester considers a 'tick' with respect to the 'period'.

For instance, this is the dataset obtained from XAGUSD, from 2010-01-01 to 2011-01-01 (the last full year), by varying the period in the strategy tester:

M1: 15907 entries
M5: 58320 entries
M15: 149580 entries
M30: 286252 entries
H1: 392812 entries
H4: 1046989 entries
D1: 1046263 entries

I was under the assumption that the 'period' you set in the strategy tester would essentially define how large a 'tick' is. For example, if I chose D1, then one 'tick' should be 1 day, so there should be 365 entries, more or less. There are obviously a lot more than that!

So I don't understand firstly why the ST period doesn't reflect the 'tick' duration in the MQL4 code. Any ideas why not / how I should fix this? I know I can probably get a timestamp and quantize the data, but that seems a dirty way to do it, and also makes it harder to develop my EA.

Additionally, I do not understand why all the periods have different numbers of entries - the date range for all sets is the same, and I established above that the 'tick' is independent of the strategy tester 'period', so I can only assume that a tick is always a tick is always a tick (smallest unit of data possible). But even that doesn't fit the results, because in that case the number of ticks for M1 should be equal to M5, M15, M30, H1, H4 and D1...

I really do not know what is going on here guys but I would be most grateful if anyone could indicate what I am doing wrong or where my method of thinking is flawed.

 
I should point out I suppose that my goal is to be able to extract backtest data such that each entry in the data file represents one candlestick in the given period, that is, for D1 there should be 365 candlesticks, for H1 there should be around 8765.
 

I'm not sure, but I think if you add Time[0] (or TimeToStr(Time[0],TIME_DATE), or iTime()) you will see that you have a lot of duplicates.

 
Hmm, I'll check that in a second and compare the results... thanks for the lead.
 

OK, here are the results, using Open;Close;High;Low;Bid;Ask;Time[0]:

27.56;27.56;27.56;27.56;27.56;27.6;1290578400
27.56;27.564;27.564;27.56;27.564;27.604;1290578400
27.56;27.567;27.567;27.56;27.567;27.607;1290578400
27.56;27.571;27.571;27.56;27.571;27.611;1290578400
27.56;27.574;27.574;27.56;27.574;27.614;1290578400
27.56;27.573;27.574;27.56;27.573;27.613;1290578400
27.56;27.572;27.574;27.56;27.572;27.612;1290578400
27.56;27.571;27.574;27.56;27.571;27.611;1290578400
27.56;27.573;27.574;27.56;27.573;27.613;1290578400
27.56;27.575;27.575;27.56;27.575;27.615;1290578400
27.56;27.578;27.578;27.56;27.578;27.618;1290578400
27.56;27.58;27.58;27.56;27.58;27.62;1290578400
27.56;27.582;27.582;27.56;27.582;27.622;1290578400
27.56;27.581;27.582;27.56;27.581;27.621;1290578400
27.56;27.58;27.582;27.56;27.58;27.62;1290578400
27.56;27.579;27.582;27.56;27.579;27.619;1290578400
27.56;27.582;27.582;27.56;27.582;27.622;1290578400
27.56;27.585;27.585;27.56;27.585;27.625;1290578400
27.56;27.587;27.587;27.56;27.587;27.627;1290578400
27.56;27.59;27.59;27.56;27.59;27.63;1290578400
27.56;27.589;27.59;27.56;27.589;27.629;1290578400
27.56;27.588;27.59;27.56;27.588;27.628;1290578400
27.56;27.587;27.59;27.56;27.587;27.627;1290578400
27.56;27.586;27.59;27.56;27.586;27.626;1290578400
27.56;27.585;27.59;27.56;27.585;27.625;1290578400
27.56;27.586;27.59;27.56;27.586;27.626;1290578400
27.56;27.585;27.59;27.56;27.585;27.625;1290578400
27.56;27.587;27.59;27.56;27.587;27.627;1290578400
27.56;27.586;27.59;27.56;27.586;27.626;1290578400
27.56;27.587;27.59;27.56;27.587;27.627;1290578400
27.56;27.585;27.59;27.56;27.585;27.625;1290578400
27.56;27.584;27.59;27.56;27.584;27.624;1290578400
27.56;27.582;27.59;27.56;27.582;27.622;1290578400
27.56;27.581;27.59;27.56;27.581;27.621;1290578400
27.56;27.58;27.59;27.56;27.58;27.62;1290578400
27.59;27.59;27.59;27.59;27.59;27.63;1290579300
27.59;27.616;27.616;27.59;27.616;27.656;1290579300
27.59;27.615;27.616;27.59;27.615;27.655;1290579300

Time[0] transitions to a new value on the third row from the end (1290578400 to 1290579300). So it seems many ticks are being recorded for each period. The difference between consecutive Time[0] values is 900, which corresponds directly to the 900 seconds in 15 minutes, so yes qjol, you are correct - thanks for this!
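To double-check that observation, here is a rough sketch in Python (my choice for post-processing; nothing in this thread uses it) that counts the rows per Time[0] value and the spacing between consecutive values. The rows are a handful copied from the dump above, and `bar_time`, `ticks_per_bar` and `bar_spacing` are made-up helper names. It assumes Time[0] is seconds since 1970-01-01, which is how MQL4 stores datetime values. The time zone is really the broker's server time, so the UTC label is only nominal.

```python
from collections import Counter
from datetime import datetime, timezone

# A few rows copied from the dump above: Open;Close;High;Low;Bid;Ask;Time[0]
rows = """\
27.56;27.581;27.59;27.56;27.581;27.621;1290578400
27.56;27.58;27.59;27.56;27.58;27.62;1290578400
27.59;27.59;27.59;27.59;27.59;27.63;1290579300
27.59;27.616;27.616;27.59;27.616;27.656;1290579300
27.59;27.615;27.616;27.59;27.615;27.655;1290579300
""".splitlines()


def bar_time(line):
    """Return the Time[0] field (7th column) as an integer."""
    return int(line.split(";")[6])


def ticks_per_bar(lines):
    """Count how many tick rows were written for each Time[0] value."""
    return Counter(bar_time(line) for line in lines)


def bar_spacing(lines):
    """Differences in seconds between consecutive distinct Time[0] values."""
    times = sorted(set(bar_time(line) for line in lines))
    return [b - a for a, b in zip(times, times[1:])]


counts = ticks_per_bar(rows)
spacing = bar_spacing(rows)
# Nominal UTC; MT4 actually reports broker server time (assumption noted above)
first_bar = datetime.fromtimestamp(min(counts), tz=timezone.utc)

print(counts)     # rows per bar
print(spacing)    # 900 s between bars = 15 minutes
print(first_bar)  # calendar time of the first bar in the sample
```

Run over the whole dump file, the same counting would show how many ticks the tester generates per bar, and the converted timestamps would show which dates the data actually covers.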

Now the question remains: how do I obtain a high/low/open/close for that timeframe? Should I take all the results and average them? Or should I take all the minimums and find the smallest one, and all the maximums and select the largest one, from each Time[0] set? I'm not sure which one would be more accurate.

 

1) don't write duplicates, or

2) open it in Excel or Access and delete the unnecessary lines

 

They aren't duplicates - within each Time[0] frame, there are different open/close values. Also, I cannot use Excel as the dataset is huge. I can remove the lines with my own code, however.

I am just unsure whether it is more accurate to take all that data and merely average open/close/high/low/bid/ask within the same timeframe, or whether I should take the 'highest high', 'lowest low', and depending on the direction of the candle, the highest/lowest open/close. In essence, for data within a single Time[0], is it better to use the average to obtain the values, or the range?
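One way to look at the average-versus-range question, as a sketch only: in MT4, index 0 refers to the bar that is still forming, so High[0] and Low[0] are the extremes so far and Close[0] is the latest price. The dump above is consistent with that, since High never decreases within a Time[0] group. Under that reading the range approach (first open, highest high, lowest low, last close) reproduces the candle, and an average would not be a candlestick at all. The Python below (my choice of language for post-processing) collapses the tick rows that way. `to_candles` is a made-up helper name, and the rows are a few copied from the dump.

```python
def to_candles(lines):
    """Collapse tick rows (Open;Close;High;Low;Bid;Ask;Time[0]) into one
    candle per Time[0]: first open, highest high, lowest low, last close."""
    candles = {}
    for line in lines:
        o, c, h, l, _bid, _ask, t = line.split(";")
        o, c, h, l, t = float(o), float(c), float(h), float(l), int(t)
        if t not in candles:
            # First tick of a new bar: it defines the open
            candles[t] = {"open": o, "high": h, "low": l, "close": c}
        else:
            bar = candles[t]
            bar["high"] = max(bar["high"], h)
            bar["low"] = min(bar["low"], l)
            bar["close"] = c  # the latest tick seen so far is the close
    return candles


# Sample rows copied from the dump (first, one middle and last row of the
# first bar, then the three rows of the second, still incomplete, bar)
rows = [
    "27.56;27.56;27.56;27.56;27.56;27.6;1290578400",
    "27.56;27.59;27.59;27.56;27.59;27.63;1290578400",
    "27.56;27.58;27.59;27.56;27.58;27.62;1290578400",
    "27.59;27.59;27.59;27.59;27.59;27.63;1290579300",
    "27.59;27.616;27.616;27.59;27.616;27.656;1290579300",
    "27.59;27.615;27.616;27.59;27.615;27.655;1290579300",
]

candles = to_candles(rows)
for t, bar in candles.items():
    print(t, bar)
```

For the first bar this gives open 27.56, high 27.59, low 27.56, close 27.58, which are exactly the values in the last row written for 1290578400. So, if the reading above is right, keeping only the last row per Time[0] should be equivalent.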

 

More weirdness ensues.

For a 1-year backtest on XAGUSD with these periods, then filtering for unique Time[0]'s (i.e. selecting only one open/close/high/low/bid/ask per Time[0]):

Period   Expected candlesticks   Unique Time[0]'s in 1yr dataset
M5       1yr/5m  = 105189        3395
M15      1yr/15m = 35063         2451
M30      1yr/30m = 17531         2210
H1       1yr/1hr = 8765          2083
H4       1yr/4hr = 2191          1156
D1       1yr/1d  = 365           260
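A worked check on the 'Expected' column, under an assumption that nobody in this thread has confirmed: that XAGUSD is only quoted on weekdays, so a year holds roughly 261 daily bars and not 365. The Python sketch below counts the weekdays in the tested range and converts each observed bar count into the approximate number of 24-hour trading days it would cover. `weekdays_between` and `BARS_PER_DAY` are made-up names, and broker session hours and holidays are ignored.

```python
from datetime import date, timedelta


def weekdays_between(start, end):
    """Count Monday-Friday dates in the half-open range [start, end)."""
    days = (end - start).days
    return sum(1 for i in range(days)
               if (start + timedelta(days=i)).weekday() < 5)


# Bars per 24-hour trading day for each period (assumes round-the-clock quotes)
BARS_PER_DAY = {"M5": 288, "M15": 96, "M30": 48, "H1": 24, "H4": 6, "D1": 1}

# Unique Time[0] counts taken from the table above
observed = {"M5": 3395, "M15": 2451, "M30": 2210,
            "H1": 2083, "H4": 1156, "D1": 260}

trading_days = weekdays_between(date(2010, 1, 1), date(2011, 1, 1))
print("weekdays in range:", trading_days)

for period, bars in observed.items():
    covered = bars / BARS_PER_DAY[period]
    print(f"{period}: {bars} bars ~ {covered:.1f} trading days of data")
```

If the weekday assumption holds, the 260 bars on D1 are close to a full year, while the smaller periods cover far fewer days than that (roughly 25 days on M15, for example). That might mean less history is available for the smaller timeframes, but this is a guess from the numbers and not something confirmed here.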

I really really have to be doing something wrong here. Filtering the results as qjol has suggested (which makes perfect sense - remove duplicates) leads again to numbers that just don't make any sense...