Optimizing an expert, profiling the time spent

 

I've developed an expert which I have tried to optimize as much as I can to allow as accurate backtesting as possible.

But now I realize that much of the overall time is actually spent between returning from my expert and receiving the next tick.

I'm using M1, backtesting 10 years. I'm recording times using Intel TBB tick_count. Ten years with "control points" in the tester now takes about 4 minutes overall.

Time is consumed as follows:

Recorded times          Index    Time (s)        Count        %
Buffer handling             0       15.88     34878251     7.59
Account system calls        1        1.27     34878251     0.61
Calculate Orders            2        8.82     34878251     4.21
Ticket closing              3       19.90     34878251     9.50
SLMON                       4       27.23     34878251    13.00
Statistics monitor          5        1.34     34878251     0.64
Misc                        6        6.44    139513004     3.08
New orders                  7       70.60     34878251    33.72
Time between                8       57.91     34878250    27.66
=== main entries total             209.40 ====

Time between is the time from the end of start() to the beginning of the next start(). So that contributes over 27% of the total time. The contribution increases to even crazier numbers if I use every tick in the tester, and goes down when using just M1 open prices.
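To put the table in per-tick terms, the accumulated seconds can be divided by the call count. The helper below is only an illustrative sketch; the name `avg_us_per_call` is hypothetical and not part of the poster's code.

```cpp
#include <cassert>

// Average cost of one start/stop pair in microseconds, given the
// accumulated seconds and the number of recorded pairs for a section.
// Hypothetical helper for reading the table, not the poster's code.
double avg_us_per_call(double total_seconds, long long count) {
    assert(count > 0);  // a section that never ran has no average
    return total_seconds * 1e6 / static_cast<double>(count);
}
```

With the figures from the table, "Time between" (57.91 s over 34878250 calls) works out to roughly 1.66 µs per tick, and "New orders" (70.60 s over 34878251 calls) to roughly 2.02 µs per tick.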

I would like to optimize the largest portions, but I don't have a clue what's happening in between. I already moved many start() variables to static, hoping it was a stack allocation issue, but that didn't bring much improvement.

No dynamic memory is used. I have moved the main loops to a DLL to get the speed to this level. I'm running on the fastest machine I could get: i7 7700 at 4.7GHz, SSD, 3000MHz DDR4.

I wonder if anyone has a good guess, a suggestion, or even actual knowledge?

 

Forum on trading, automated trading systems and testing trading strategies

How to get backtesting faster ?

whroeder1, 2017.08.07 15:53

Don't do per tick what can be delayed.

void OnTick(){
   // Detect a new bar: the open time of the current bar differs from the one seen on the previous tick.
   static datetime time0=0; datetime timep=time0; time0=Time[0];
   bool isNewBar = timep != time0;

   #define PRICE_MAX EMPTY_VALUE
   #define PRICE_MIN 0
   // Trigger levels persist between ticks; the defaults can never be reached by Bid.
   static double buyLevel=PRICE_MAX, sellLevel=PRICE_MIN;
   if(isNewBar){
      // Once per bar: reset the levels and do the expensive work.
      buyLevel=PRICE_MAX; sellLevel=PRICE_MIN;
      // calls to indicators etc. go here.
      if(isBuyCondition)   buyLevel= ...;
      if(isSellCondition) sellLevel= ...;
   }
   // Per tick: only cheap comparisons until a level is hit.
   if(Bid >= buyLevel){
      // reevaluate buy conditions/SL/TP if necessary
      // calls to indicators etc. go here.
      // compute lots
      // open buy
   }
   else if(Bid <= sellLevel){
      ...
   }
   // else nothing to do yet, except maybe update status.
}

That said, it has nothing to do with the accuracy of the backtest. Sure, it is important to speed up your code as much as you can, but the accuracy of a backtest is a whole different topic.

 
Mikko Siltanen:

I've developed an expert which I have tried to optimize as much as I can to allow as accurate backtesting as possible.

Is it about speed or accuracy ?

But now I realize that much of the overall time is actually spent between returning from my expert and receiving the next tick.

I'm using M1, backtesting 10 years. I'm recording times using Intel TBB tick_count. Ten years with "control points" in the tester now takes about 4 minutes overall.

How are you doing that ? With a DLL ? Why aren't you using GetMicrosecondCount() ?

4 minutes on open prices (10 years) is a LOT. Of course it depends on your strategy; maybe you need a lot of calculation. But I suspect your code could be seriously improved.

Time is consumed as follows:

Recorded times          Index    Time (s)        Count        %
Buffer handling             0       15.88     34878251     7.59
Account system calls        1        1.27     34878251     0.61
Calculate Orders            2        8.82     34878251     4.21
Ticket closing              3       19.90     34878251     9.50
SLMON                       4       27.23     34878251    13.00
Statistics monitor          5        1.34     34878251     0.64
Misc                        6        6.44    139513004     3.08
New orders                  7       70.60     34878251    33.72
Time between                8       57.91     34878250    27.66
=== main entries total             209.40 ====

Time between is the time from the end of start() to the beginning of the next start(). So that contributes over 27% of the total time. The contribution increases to even crazier numbers if I use every tick in the tester, and goes down when using just M1 open prices.

Not sure it's very clear what is measured and how exactly. 

I would like to optimize the largest portions, but I don't have a clue what's happening in between. I already moved many start() variables to static, hoping it was a stack allocation issue, but that didn't bring much improvement.

No dynamic memory is used. I have moved the main loops to a DLL to get the speed to this level. I'm running on the fastest machine I could get: i7 7700 at 4.7GHz, SSD, 3000MHz DDR4.

I wonder if anyone has a good guess, a suggestion, or even actual knowledge?

Are you sure using a DLL speeds up anything ?

 

It's both. Of course you get speed with open prices, but results tend to get worse with control points and every tick. Optimising with 'control points' or 'every tick' needs a highly optimized strategy.

Yes, with a DLL. The 4 minutes is with control points, or somewhat less if I disable the time profiling, and about 30s with M1 open prices. I can configure it to use GetMicrosecondCount() as well, but I ended up preferring TBB; maybe there was also some issue recording the time between with it.

My code is already optimized to the extreme. An enormous amount of hours and sweat went into this. The problem now is, as I said, the time between the end of my processing (return() at the end of start()) and getting the next tick (start()).

I have constantly compared the speed of MQL4 code to optimized C++ and the difference is clear. I would appreciate comments on the time between described above. Nothing explains it. It shouldn't take that long for the tester to just deliver the next tick. It shouldn't be an issue of disk or memory speed, or should I try running MT4 from a ramdisk next? My measurement is simple: I have an array of amounts and counts, and I accumulate the amount from stoptime-starttime pairs. Basically my strategy calculates several (or tens of) SMIs with divergences etc. I have no external indicators to slow it down, I use ring buffers to avoid buffer handling cost, etc. Sure, it can always be optimized further, but it's not very motivating when you realize that the largest contributor is somewhere else and you cannot explain what causes it.

Thanks for the comments anyway! Let's keep digging.

 
Mikko Siltanen:

It's both. Of course you get speed with open prices, but results tend to get worse with control points and every tick. Optimising with 'control points' or 'every tick' needs a highly optimized strategy.

Accuracy is related to your strategy. Every tick is the most accurate in general; if you want to use Open Prices or Control points, your strategy needs to be compatible. I suppose you already know that.

Yes, with a DLL. The 4 minutes is with control points, or somewhat less if I disable the time profiling, and about 30s with M1 open prices. I can configure it to use GetMicrosecondCount() as well, but I ended up preferring TBB; maybe there was also some issue recording the time between with it.

Yeah sorry I misread.

My code is already optimized to the extreme. An enormous amount of hours and sweat went into this. The problem now is, as I said, the time between the end of my processing (return() at the end of start()) and getting the next tick (start()).

Maybe. Impossible for us to say anything about that.

I have constantly compared the speed of MQL4 code to optimized C++ and the difference is clear. I would appreciate comments on the time between described above. Nothing explains it. It shouldn't take that long for the tester to just deliver the next tick. It shouldn't be an issue of disk or memory speed, or should I try running MT4 from a ramdisk next? My measurement is simple: I have an array of amounts and counts, and I accumulate the amount from stoptime-starttime pairs. Basically my strategy calculates several (or tens of) SMIs with divergences etc. I have no external indicators to slow it down, I use ring buffers to avoid buffer handling cost, etc. Sure, it can always be optimized further, but it's not very motivating when you realize that the largest contributor is somewhere else and you cannot explain what causes it.


Time between is the time from the end of start() to the beginning of next start(). So, that contributes 27% of total time. 

Explain how you are measuring exactly, with code example.

 
Besides @Alain Verleyen's valid points: if it is only speed you are after, consider a multithreaded backtesting platform. I am not advocating MT5, but it will speed up your optimization by roughly a factor of n, where n is your number of cores. There might be other options out there.
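The gain described here comes from running independent optimization passes side by side, not from splitting a single backtest across threads. A minimal C++ sketch of that idea follows, assuming each pass depends only on its own parameter set. `run_backtest` and `optimize_parallel` are hypothetical stand-ins, not an MT4 or MT5 API.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical stand-in for one complete backtest with one parameter value.
// A real pass would replay the whole tick history; this is just dummy work.
double run_backtest(int param) {
    assert(param >= 0);
    double balance = 0.0;
    for (int tick = 0; tick < 1000; ++tick)
        balance += (tick % (param + 2) == 0) ? 1.0 : -0.1;
    return balance;
}

// Launch one asynchronous task per parameter set and collect the results
// in the same order as the inputs. The passes share no state.
std::vector<double> optimize_parallel(const std::vector<int>& params) {
    assert(!params.empty());
    std::vector<std::future<double>> jobs;
    jobs.reserve(params.size());
    for (int p : params)
        jobs.push_back(std::async(std::launch::async, run_backtest, p));

    std::vector<double> results;
    results.reserve(jobs.size());
    for (auto& job : jobs)
        results.push_back(job.get());  // blocks until that pass has finished
    return results;
}
```

Because every pass is independent, the wall-clock time for many passes can shrink roughly with the number of cores, while a single pass takes as long as before.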
 

ok.


#ifdef recordTimes

enum {
   recBufHandle,
   recAccSysCalls,
   recCalcOrders,
   recTicketClose,
   recSLMON,
   recStatMon,
   recMisc,
   recNewOrders,
   recTimeBetween,
   recTimesAmountMain,     // overall sum of the above
   recCountSmiLib,
   recCountsmilibmisc,
...
   recTimesAmount       // details also from these
} recTimes;

static string rectimetxt[recTimesAmount] = {"Buffer handling          ",
                                            "Account system calls     ",
                                            "Calculate Orders         ",
                                            "Ticket closing           ",
                                            "SLMON                    ",
                                            "Statistics monitor       ",
                                            "Misc                     ",
                                            "New orders               ",
                                            "Time between             ",
                                            "---",
                                            "CountSmiLib              ",
                                            "countsmilib misc         ",
...
                                           };

#ifdef TBBTIMING

#else
static ulong timesData[30][3];

void startrectime(int i) {
   timesData[i][0] = GetMicrosecondCount();
}
void stoprectime(int i) {
   ulong latestcount = GetMicrosecondCount();
   timesData[i][1]+=latestcount-timesData[i][0];
   timesData[i][2]++;
}
#endif

#else
#define startrectime(i)
#define stoprectime(i)
#endif

#import "misc42.dll"
void     miscini();
void     mytimerstart(int i);
double   mytimerstop(int i);
double   gettimesamount(int i);
int      gettimescount(int i);
double   timerresolution();
#define  startrectime mytimerstart
#define  stoprectime mytimerstop
#import


printing out:

...
#ifndef TBBTIMING
      tmp = StringFormat("%35s%7d%20i%20i%20.2f", rectimetxt[mat],mat,timesData[mat][1],timesData[mat][2],((double)timesData[mat][1]/(double)sum)*100.0);
#else
      double tmpd = gettimesamount(mat);
      tmp = StringFormat("%35s%7d%20.2f%20i%20.2f", rectimetxt[mat],mat,tmpd,gettimescount(mat),(tmpd/sum)*100.0);
#endif

usage example:

            startrectime(recCountSmiLib);

            countSmiDlib(optype, tfindx,tfr, SMISi[tfr][tfindx][optype], SMILen[tfr][tfindx][optype]);
            SMIcounted[tfr][tfindx][optype] = 1;

            stoprectime(recCountSmiLib);

and the problem:

int start()  {
   stoprectime(recTimeBetween);
...
...
   startrectime(recTimeBetween);
   return(1);
} // start

Hopefully you get the idea, though it gets a bit messy with ifdefs.
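For readers who want to picture the DLL side of this scheme, here is a minimal C++ sketch. The function names follow the `#import` declarations above, but the bodies are assumptions: `std::chrono::steady_clock` is swapped in for Intel TBB `tick_count`, the `running` flag is a guess at how a stop without a preceding start (the very first tick) could be skipped, and the export decorations a real Windows DLL needs are left out so the sketch compiles anywhere.

```cpp
#include <array>
#include <cassert>
#include <chrono>

// Assumption: steady_clock stands in for tbb::tick_count.
using Clock = std::chrono::steady_clock;

constexpr int kMaxTimers = 30;  // same slot count as timesData[30][3]

struct TimerSlot {
    Clock::time_point started{};  // time of the last mytimerstart call
    double total_seconds = 0.0;   // accumulated stop-minus-start time
    int count = 0;                // number of completed start/stop pairs
    bool running = false;         // true between a start and its stop
};

static std::array<TimerSlot, kMaxTimers> g_slots{};

void mytimerstart(int i) {
    assert(i >= 0 && i < kMaxTimers);
    g_slots[i].started = Clock::now();
    g_slots[i].running = true;
}

// Returns the elapsed seconds of this pair, or 0.0 if the slot was never
// started (for example the first tick's stop of the "time between" slot).
double mytimerstop(int i) {
    assert(i >= 0 && i < kMaxTimers);
    TimerSlot& slot = g_slots[i];
    if (!slot.running) return 0.0;
    const std::chrono::duration<double> elapsed = Clock::now() - slot.started;
    slot.running = false;
    slot.total_seconds += elapsed.count();
    ++slot.count;
    return elapsed.count();
}

double gettimesamount(int i) {
    assert(i >= 0 && i < kMaxTimers);
    return g_slots[i].total_seconds;
}

int gettimescount(int i) {
    assert(i >= 0 && i < kMaxTimers);
    return g_slots[i].count;
}
```

Note that with this layout every start/stop pair costs two calls across the MQL4/DLL boundary plus two clock reads, and for the "time between" slot part of that cost lands inside the measured interval.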

I found a somewhat promising article here: https://www.metatrader5.com/en/terminal/help/start_advanced/start, but the execution_mode parameter for the tester mentioned there makes no difference in my environment. The article is about MT5, so maybe that's the reason, but I gave it a try anyway. In any case, a configurable delay on the scale of milliseconds is huge for the tester. It may be fine for normal operation, but the parameter is explicitly for the tester. Any delay over a millisecond would surely ruin tester performance, and no configurability is promised below a millisecond.

Does anyone know if MT4 builds from different brokers behave differently? Mine is FXCM MT4 v4.00 build 1090.




 

Regarding multithreading, for some reason I cannot get any gains from Intel TBB. I finally managed to run my system on several cores and tried different approaches, but none gave me an advantage. Processor load certainly goes up, but there are no real speed gains. Maybe it's because of these strange delays no one is able to explain. And you cannot profile MT4 execution with a debugger or Intel VTune, for example, because MT4 closes immediately when it sees a debugger connection. All this also makes it very difficult to debug your DLL, but it is possible after all (it was even for me). It would of course be interesting to try MT5, but I once got halfway through that and gave up. It is too different in my case. And who could guarantee that there's not something similar lurking there?

 
Mikko Siltanen:


and the problem:


int start()  {
   stoprectime(recTimeBetween);
...
...
   startrectime(recTimeBetween);
   return(1);
} // start

Hopefully you get the idea, though it gets a bit messy with ifdefs.

That's the MT4 Strategy Tester processing time. In my opinion the problem is that it's not possible to measure it accurately like that, as it certainly takes far less than 1µs, but your time resolution is 1µs. I would measure it as the total time (as given by the Strategy Tester log) minus the time of all other operations.
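The "total minus everything else" approach, plus a rough check of what one clock read costs, could be sketched in C++ as follows. Both function names are hypothetical, and `std::chrono::steady_clock` is an assumption standing in for whichever timer the DLL actually uses.

```cpp
#include <cassert>
#include <chrono>
#include <numeric>
#include <vector>

// Time not covered by any instrumented section: the wall-clock total
// (for example from the Strategy Tester log) minus the measured sections.
double residual_seconds(double total_wall_seconds,
                        const std::vector<double>& sections) {
    const double measured =
        std::accumulate(sections.begin(), sections.end(), 0.0);
    assert(total_wall_seconds >= measured);
    return total_wall_seconds - measured;
}

// Rough average cost in nanoseconds of one clock read, estimated from
// back-to-back reads. Multiplied by the number of timer calls per run,
// it indicates how much of a measured total may be timer overhead.
double clock_read_cost_ns(int reads) {
    assert(reads > 0);
    using Clock = std::chrono::steady_clock;
    const auto begin = Clock::now();
    auto last = begin;
    for (int i = 0; i < reads; ++i)
        last = Clock::now();
    return std::chrono::duration<double, std::nano>(last - begin).count() / reads;
}
```

The section times for `residual_seconds` would be the eight rows of the table other than "Time between", and the total would come from the tester log for the same run. Comparing that residual with the directly measured 57.91 s would show how much the per-tick timer calls themselves contribute.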

 

Mikko Siltanen:

Does anyone know if MT4 builds from different brokers behave differently? Mine is FXCM MT4 v4.00 build 1090.

No difference.

Forum on trading, automated trading systems and testing trading strategies

Why is column "Flags" no longer present in the CSV tick data file exported from MT5 ?

Alain Verleyen, 2018.01.14 04:12

The terminal is the same for everyone, what the broker can customize is just cosmetic.

Just for information.


 

Thanks. Good to know.

But it still bothers me how that processing time behaves or changes between open prices, control points and every tick. My gut feeling says that the processing time should be smaller in the every tick case, as on average there are fewer things for MT4 to do: fewer new orders, fewer orders to close. The contribution should go down. But it stays roughly the same (in my latest run) or increases (the comment I gave a few days ago).