Find average value over number of bars? - page 3

 
As you can see, my code has two functions, a min and a max, to get the indexes of the min/max values from the buffers.

It is impossible to implement an O(1) algorithm to compute the min/max values from the ring buffer.

At least I cannot think of a solution to do it. It will never be deterministic, no matter what model I think of. 
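
For reference, a minimal sketch of what such a linear scan can look like (the function name and buffer layout here are illustrative, not my actual code; a max function mirrors the comparison):

int min_index(const double &buffer[], const int count)
{
    int idx = 0;
    for(int i = 1; i < count; i++)
        if(buffer[i] < buffer[idx])
            idx = i;        // remember the position of the smallest value so far
    return(idx);            // one full pass over the buffer: O(n), not O(1)
}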
 
Dominik Christian Egert #:
I am not certain I understood that.

What I understood is that every new value, you add to the calculation, would also increase your length of the MA.

I think we are looking at 2 different things - I was referring to the OP's requirement to "find the ave value over a number of bars". That is easy to do in a deterministic way if one maintains the sum and count and that was my point. Min and max are also easy to collect.

You have opted for an MA (which I am not sure was the original request, but may be...), which is of course different, but it is not difficult to do the necessary operations on a fixed-size array.

Ultimately it's about effort vs. benefit - none of this is new and all of it is easily done.

 
Dominik Christian Egert #:

The essential issue is in fact a limitation in MQL's OO model: you cannot bind an object to an external array. If that were possible, you could use the external array as a data feed and wouldn't be required to use an internal ring buffer.

This was not clear to me - I never found any issues putting objects into arrays, or arrays into objects - is it something else you are trying to do?

 
R4tna C #:

I was thinking about this - if you have a variable to sum up the values, and another to count the values, you have the simplest/fastest (and deterministic) way to compute the mean.

Only 3 operations: (1) add the new value to the sum, (2) increment the count, (3) mean = new sum / new count.

Of course this does not retain a complete history of all values, but that is a simple matter of adding each new entry to an array, should one need it for deeper analysis.

Alternatively, if one chooses to keep an incremental history array, just use ArraySize() instead of a count variable - all of this would occur in microseconds anyway.

A thought on using a variable compared to a function call, in this case ArraySize().

A function call ends up in a stack allocation, moving the current context out of the CPU's registers and loading a new set into them.

This is computationally much more expensive than tracking a variable in a local scope that is already loaded anyway; hence, it will be much quicker to keep track of such a value in a variable than to call a function.

Although we cannot know whether ArraySize() is inlined by the compiler.
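
For illustration, the two variants side by side (just a sketch of the pattern, not timed code):

double arr[];
ArrayResize(arr, 10000);

// Variant 1: ArraySize() is called on every iteration of the loop.
for(int i = 0; i < ArraySize(arr); i++)
    arr[i] = i;

// Variant 2: the size is read once into a local variable.
int size = ArraySize(arr);
for(int i = 0; i < size; i++)
    arr[i] = i;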

Since the new ME5 update supports optimized compilation, I've conducted some tests with my code, and my impression is this: if my code is already written in an optimized way, the "optimized compile" option has zero effect on it.

The execution times are the same within the margin of error.

So I assume that coding in a way that is optimal for the CPU's prefetching unit and memory-management unit, as well as being in line with the CPU's architecture and the OS's task-allocation algorithm, will bring you the most benefit right from the get-go.

That's what I like about William's code: it is perfectly aligned with all these requirements and is surely the most efficient way of executing exactly that programmed code.

Reducing function calls, aligning variables in the correct order, accessing them within a volatile code block - all this ensures everything lines up.


 
R4tna C #:

This was not clear to me - I never found any issues putting objects into arrays, or arrays into objects - is it something else you are trying to do?

Ahm... No.

Apples and Peaches... Haha

OK, let me try to explain the issue.

Let's say you have an array of close prices, given to you by a function like OnCalculate().

Let's say you want an object that calculates the mean over the last 15 values. When instantiating this object, you cannot pass a reference to the array and have it store this reference, because you cannot create reference variables except as function parameters, which lose scope on return.

This means you cannot keep a pointer to an external array inside the object, and you are forced to pass the array to the object every time.

Therefore you cannot write code such as this:

// NOTE: this is what one would write in C++; it does not compile in MQL5,
// because pointers to plain types and arrays are not supported there.
template <typename T>
struct _obj
{
        // Local storage: pointer to externally owned data
        T*      external_data;

        _obj() :
                external_data(NULL)
        {};

        _obj(T &ext_data) :
                external_data(&ext_data)       // store the address, not the value
        {};

        // Continue working with external_data, if assigned....
};


That's why it is necessary to keep track of the data internally, within the object itself.
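
What does work is to pass the array on every call - a minimal sketch of that workaround (the names here are illustrative):

template <typename T>
struct _mean_obj
{
        double  sum;
        int     count;

        _mean_obj() :
                sum(0.0),
                count(0)
        {};

        // The array reference cannot be stored, so the caller
        // must hand the array in on every single call.
        double update(const T &ext_data[], const int index)
        {
                sum += (double)ext_data[index];
                count++;
                return(sum / count);
        }
};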

 
R4tna C #:

I think we are looking at 2 different things - I was referring to the OP's requirement to "find the ave value over a number of bars". That is easy to do in a deterministic way if one maintains the sum and count and that was my point. Min and max are also easy to collect.

You have opted for an MA (which I am not sure was the original request, but may be...), which is of course different, but it is not difficult to do the necessary operations on a fixed-size array.

Ultimately it's about effort vs. benefit - none of this is new and all of it is easily done.

To be honest, I think you have an error in your thinking here.

"sum and count and that was my point" -> No, thats not a mean or average value over a given amount of values, especially when they keep changing, or better said, as the scope of relevance keeps shifting, not adding or extending the scope. 


"Min and max are also easy to collect" -> Please show me how to do this, because I cannot think of a way on how to do this in a O 1 way, or a deterministic execution environment. - How would you tackle this issue?

 
Dominik Christian Egert #:
A thought on using a variable compared to a function call, in this case ArraySize().

A function call ends up in a stack allocation, moving the current context out of the CPU's registers and loading a new set into them.

This is computationally much more expensive than tracking a variable in a local scope that is already loaded anyway; hence, it will be much quicker to keep track of such a value in a variable than to call a function.

Although we cannot know whether ArraySize() is inlined by the compiler.

Since the new ME5 update supports optimized compilation, I've conducted some tests with my code, and my impression is this: if my code is already written in an optimized way, the "optimized compile" option has zero effect on it.

The execution times are the same within the margin of error.

So I assume that coding in a way that is optimal for the CPU's prefetching unit and memory-management unit, as well as being in line with the CPU's architecture and the OS's task-allocation algorithm, will bring you the most benefit right from the get-go.

That's what I like about William's code: it is perfectly aligned with all these requirements and is surely the most efficient way of executing exactly that programmed code.

Reducing function calls, aligning variables in the correct order, accessing them within a volatile code block - all this ensures everything lines up.


This seems to be overthinking it.

Over the years I did a lot of performance optimization, usually with very large data sets/systems, when processing could take days (yes days.. even with extremely expensive hardware clusters) and involved many disparate systems & technologies.

I met many teams who would approach the issues from this perspective: caches, buffers, OS tuning etc. and frankly it was always wrong. At best you could shave off a few millisecs of the runtime when we need to be reducing it by factors of hours. We were taught very clearly that this is the wrong methodology (not far from you actually - in Walldorf and Munich).

The bottlenecks, and solutions, were always higher up in the application stack, and the gains to be had were immense once found. I won't go into detail as it's way beyond the scope of this discussion, but suffice to say a few nanoseconds here and there because of using ArraySize() rather than keeping a variable are largely irrelevant and not worth worrying about until it manifests itself as a real problem which needs a solution.

 
Dominik Christian Egert #:

"Min and max are also easy to collect" -> Please show me how to do this, because I cannot think of a way on how to do this in a O 1 way, or a deterministic execution environment. - How would you tackle this issue?

I outlined the 3 operations in maintaining a sum, count and mean. You could also have variables for min and max - just compare these with the incoming number, decide whether it is a new min or max, and keep it if so. A fixed number of operations... two ternary statements would do it.
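
For example (a minimal sketch; the variable names are illustrative):

double sum   = 0.0;
int    count = 0;
double mean  = 0.0;
double min_v = DBL_MAX;     // any incoming value will replace these
double max_v = -DBL_MAX;

void update(const double value)
{
    sum  += value;                              // 1. add the new value to the sum
    count++;                                    // 2. increment the count
    mean  = sum / count;                        // 3. new sum divided by new count
    min_v = (value < min_v) ? value : min_v;    // first ternary: keep the smaller
    max_v = (value > max_v) ? value : max_v;    // second ternary: keep the larger
}

Note this tracks the running min/max over all values seen so far, in line with the cumulative sum and count.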

 
R4tna C #:

This seems to be overthinking it.

Over the years I did a lot of performance optimization, usually with very large data sets/systems, when processing could take days (yes days..) and involved many disparate systems & technologies.

I met many teams who would approach the issues from this perspective: caches, buffers, OS tuning etc. and frankly it was always wrong. At best you could shave off a few millisecs of the runtime when we need to be reducing it by factors of hours. We were taught very clearly that this is the wrong methodology (not far from you actually - in Walldorf and Munich).

The bottlenecks, and solutions, were always higher up in the application stack, and the gains to be had were immense once found. I won't go into detail as it's way beyond the scope of this discussion, but suffice to say a few nanoseconds here and there because of using ArraySize() rather than keeping a variable are largely irrelevant and not worth worrying about until it manifests itself as a real problem which needs a solution.

Please repeat your statement after testing this piece of code:

    double test_arr[];
    ArrayResize(test_arr, 10000);

    // Variant 1: ArraySize() is evaluated on every loop iteration.
    ulong start = GetMicrosecondCount();
    for(int cnt = 0; (cnt < ArraySize(test_arr)) && !_StopFlag; cnt++)
    { test_arr[cnt] = cnt; }
    printf("Time in microseconds used: %llu", GetMicrosecondCount() - start);

    // Variant 2: ArraySize() is evaluated only once, in the loop initializer.
    start = GetMicrosecondCount();
    for(int cnt = ArraySize(test_arr) - 1; (cnt >= 0) && !_StopFlag; cnt--)
    { test_arr[cnt] = cnt; }
    printf("Time in microseconds used: %llu", GetMicrosecondCount() - start);


Non optimized compile results:

2022.08.27 11:05:28.394 2022.08.06 00:00:00   Time in microseconds used: 53

2022.08.27 11:05:28.394 2022.08.06 00:00:00   Time in microseconds used: 34



Optimized compile results:

2022.08.27 11:05:55.261 2022.08.06 00:00:00   Time in microseconds used: 65

2022.08.27 11:05:55.261 2022.08.06 00:00:00   Time in microseconds used: 33


EDIT: 

It's half the execution time; that is a relevant factor, no matter how many nanoseconds you shave off the execution.

 
Dominik Christian Egert #:

Please repeat your statement after testing this piece of code:


Non optimized compile results:

2022.08.27 11:05:28.394 2022.08.06 00:00:00   Time in microseconds used: 53

2022.08.27 11:05:28.394 2022.08.06 00:00:00   Time in microseconds used: 34



Optimized compile results:

2022.08.27 11:05:55.261 2022.08.06 00:00:00   Time in microseconds used: 65

2022.08.27 11:05:55.261 2022.08.06 00:00:00   Time in microseconds used: 33


You are worried about a few microseconds? Nothing more to be said.