Generic Class Library - bugs, description, questions, usage and suggestions - page 18

 

I ran this

#include <Generic\ArrayList.mqh>
#include <Generic\HashMap.mqh>

CHashMap<int, int> MagicsByDeals;

#define  AMOUNT 1000000

template <typename T>
int Test( T &Object )
{
  int Res = 0;
  
  for(int i = 0; i < AMOUNT; i++)
  {
    int Tmp = 0;
    
    if (Object.TryGetValue(i, Tmp))    
      Res += Tmp;
  }
  
  return(Res);
}

#define  BENCH(A)                                                              \
{                                                                             \
  const ulong StartTime = GetMicrosecondCount();                              \
  A;                                                                          \
  Print("Time[" + #A + "] = " + (string)(GetMicrosecondCount() - StartTime)); \
} 

void OnStart()
{   
  CArrayList<int> CollectionList;
  CHashMap<int, int> CollectionHash;
  
  for(int i = 0; i < AMOUNT; i++)
  {      
    CollectionHash.Add(i, i);
    CollectionList.Add(i);
  }
  
  BENCH(Print(Test(CollectionList)));
  BENCH(Print(Test(CollectionHash)));
}


and got this

1783293664
Time[Print(Test(CollectionList))] = 24819
1783293664
Time[Print(Test(CollectionHash))] = 91270


CArrayList is faster than the hash variant. Went into the innards of CArrayList, and there is this

template<typename T>
class CArrayList: public IList<T>
  {
protected:
   T                 m_items[];
   int               m_size;

Now I feel cheated. CArrayList turned out to be a wrapper for an ordinary array. I thought it was a normal list with Prev/Next pointers, but here it looks like this

template<typename T>
bool CArrayList::TryGetValue(const int index,T &value)
  {
//--- check index
   if((uint)index<(uint)m_size)
     {
      //--- get value by index
      value=m_items[index];
      return(true);
     }
   return(false);
  }
 
fxsaber:
In addition to the algorithms for working with structures itself, there is also the problem of choosing the right container based on the specifics of the task.
 
Combinator:
In addition to the algorithms of working with structures themselves, there is also the problem of choosing the right container based on the specifics of the task.

So the interface may be the same for different implementations. Some disappointment I experienced, not from the interface, but from a particular implementation - an array. Compared to the hash - heaven and earth.

 
fxsaber:

I feel cheated now. CArrayList turned out to be a wrapper for an ordinary array. I thought it was a normal list with Prev/Next pointers, but here it is


Forum on trading, automated trading systems and strategy testing

Algorithms, solution methods, performance comparison

Sergey Dzyublik, 2017.12.10 20:52


What kind of DBMS, what you say to a man who understands zero indata structures.
If there is no concept of ArrayList (vector from C++), what can we talk about here.....


It's more about your inattention...
Read all available list -https://www.mql5.com/ru/docs/standardlibrary/generic

CArrayList is an analog of std::vector from C++
CLinkedList is most likely an analogue of std::list from C++

 
fxsaber:

I ran this

and got this

CArrayList is faster than the hash variant. Went into the innards of CArrayList, and there is this

Now I feel cheated. CArrayList turned out to be a wrapper for an ordinary array. I thought it was a normal list with Prev/Next pointers, but here you get something like this

Historically, List is not a leaf at all, but an array. That's why everything is correct. If you get a list in generic, it'll probably be called LinkedList or something like that.

 
fxsaber:

Some disappointment, not from the interface, but from the specific implementation - the array.

well... this container is an array (i.e. analog of vector in C++), and native mql array is very good, so arraylist is rather a bit more convenient to use.
 

Forum on trading, automated trading systems and trading strategies testing

Generic classes library - errors, description, questions, peculiarities of use and suggestions

Sergey Dzyublik, 2017.12.09 01:12


There are a few suggestions for possible improvement:

1) In my humble opinion, the implementation contains not quite correct choice of capacity - both initial size 3 and subsequent ones when rebuilding a hash table.
Yes, it is true that you need to choose a prime number for uniformity of distribution.
However, the CPrimeGenerator implementation does not meet expectations and contains omissions of prime numbers.
The purpose of such a "generator" is clear - to provide an increment factor of the order of 1.2 with automatic generation of prime numbers.
However, isn't this too small a coefficient? In C++ for vectors, the coefficient is usually 1.5-2.0, depending on the library (because it strongly affects the average complexity of the operation).
The way out could be a constant coefficient followed by rounding the number to prime via a binary search of a list of prime numbers.
And an initial size of 3 is way too small, we don't live in the last century.

2) Hash table rebuilding (Resize) is currently performed only when 100% capacity is filled (all m_entries[] are filled).
In this regard, there may be a significant number of collisions for keys with not very evenly distributed.
As an option, consider introducing a fill factor, which will reduce 100% by the necessary limit for hash table rebuilding.

1) The volume growth factor(capacity) is not equal to 1.2, the CPrimeGenerator::ExpandPrime method is used to calculate the new volume in CHashMap:

int CPrimeGenerator::ExpandPrime(const int old_size)
  {
   int new_size=2*old_size;
   if((uint)new_size>INT_MAX && INT_MAX>old_size)
      return INT_MAX;
   else
      return GetPrime(new_size);
  }

In this method, the old collection size is multiplied by two, then for the resulting value we find the nearest simple number from the top and return it as the new volume.

About the initial value of capacity - I agree, it's very small.

But on the other hand there are always constructors, where you can explicitly specify the initial volume:

class CHashMap: public IMap<TKey,TValue>
  {
public:
                     CHashMap(const int capacity);
                     CHashMap(const int capacity,IEqualityComparer<TKey>*comparer);
  }

So I don't see much sense to change anything here.

2) Adding a fill factor parameter would really help avoid collisions in some cases. Most likely to be implemented.

 
Roman Konopelko:

1) The coefficient of volume increment (capacity) is not equal to 1.2, to calculate the new volume in CHashMap, the method CPrimeGenerator::ExpandPrime is used:

In this method, the old collection size is multiplied by two, then for the resulting value we find the nearest simple number from the top and return it as the new volume.

About the initial value of capacity - I agree, it's very small.

But on the other hand there are always constructors, where you can explicitly specify the initial volume:

That's why I don't see any particular point in changing something here.

2) Adding fill factor parameter will really help avoid collisions in some cases. Most likely it will be realized.

Roman, I think you have got the wrong idea there. At present, some classes don't even contain the most necessary methods. Have you ever tried to use them? Take for example CRedBlackTree - the classic implementation of a red-black tree. Pretty cool algorithm, but without iteration capability. What do you mean by that? There are many other things that you need but no one bothered to implement for some reason. GetHashCode, for example, is horrible. Sorry, but this is not the level of MQ!

 
Vasiliy Sokolov:

Roman, I think you're on the wrong track there. Now some generic classes don't even contain the most necessary methods. Have you even tried to use them? Take for example CRedBlackTree - classic red-black tree implementation. Pretty cool algorithm, but without iteration capability. What do you mean by that? There are many other things that you need but no one bothered to implement for some reason. GetHashCode, for example, is horrible. Sorry, but this is not the level of MQ!

I perfectly understand the need for iterators for all the template collections of the Genric library, but without the possibility to create nested classes and multiple inheritance from the interfaces, you can not implement them properly.

I would like to hear more specific comments about GetHashCode.

 
Roman Konopelko:

...

As for GetHashCode, I'd like to hear more specific comments.

The Problem

Obviously, it is impossible to implement GetHashCode properly in custom MQL5 environment. This is because not all objects can be accessed. And while for the primitive members like double int, etc. the implementation works fine, for complex objects (classes, structures and even enumerations) the hash is calculated by name. Obviously, if we have a thousand objects like CObject or even ENUM_something_that, then both CHashMap and CHashSet degenerate into LinkedList:

#include <Generic\HashSet.mqh>
//+------------------------------------------------------------------+
//| Безобидное перечисление, но может быть класс или структура       |
//+------------------------------------------------------------------+
enum ENUM_NUMBER
{
   NUMBER_FIRTS,  // First
   NUMBER_SECOND, // Second
   NUMBER_THIRD   // Third
};
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
void OnStart()
{
   CHashSet<ENUM_NUMBER> set_enum;
   for(int i = 0; i < 3; i++)
      set_enum.Add((ENUM_NUMBER)i);
   //-- Поздравляю, мы выродились в LinkedList.... 
   for(int i = 0; i < 3; i++)
   {
      //-- Здесь дикий оверхед по производительности, который тчательно скрывается...
      string is = set_enum.Contains((ENUM_NUMBER)i) ? " присутствует " : " отсутствует ";
      //...
   }
}

You can't avoid this because all we have at the user level is the name of the object:

//+------------------------------------------------------------------+
//| Returns a hashcode for custom object.                            |
//+------------------------------------------------------------------+
template<typename T>
int GetHashCode(T value)
  {
//--- try to convert to equality comparable object  
   IEqualityComparable<T>*equtable=dynamic_cast<IEqualityComparable<T>*>(value);// <-- Здесь будет получено название класса, структуры или перечисление
   if(equtable)
     {
      //--- calculate hash by specied method   
      return equtable.HashCode();
     }
   else
     {
      //--- calculate hash from name of object
      return GetHashCode(typename(value));
     }
  }

Consequently, the hashes of all objects of complex types are equal to each other and this code here involves a full array enumeration to search:

//+------------------------------------------------------------------+
//| Determines whether a set contains the specified element.         |
//+------------------------------------------------------------------+
template<typename T>
bool CHashSet::Contains(T item)
  {
//--- check buckets
   if(ArraySize(m_buckets)!=0)
     {
      //--- get hash code for item      
      int hash_code=m_comparer.HashCode(item);
      //--- search item in the slots       
      for(int i=m_buckets[hash_code%ArraySize(m_buckets)]-1; i>=0; i=m_slots[i].next)
         if(m_slots[i].hash_code==hash_code && m_comparer.Equals(m_slots[i].value,item))
            return(true);
     }
   return(false);
  }

In fact, this is so critical that it defeats the point of using all Generic at the root. The point is that in most cases you need to associate or store complex objects: enumerations, structures or classes. If you needed to operate with simple types, you could do with something simpler.


Solution

Obviously, MQL5 is a type-safe user programming language running on the internal virtual machine. Something like Net. This means that the necessary access to objects is still there, but at a more system level. Hence,It is possible to write the GetHashCode system function with the next prototype:

int GetHashCode(void sample);

How should it work? Firstly it should be omnivorous and fast. Give you a uniform distribution. For example:

int hash = GetHashCode(3);
// hash = 3

hash = GetHashCode(3.14159);
// hash = 2039485

enum EType{E1, E2, E3};
hash = GetHashCode(E2);
// hash = 2

class C1{};
C1* class1 = new C1();
hash = GetHashCode(class1);
//hash = 39203847

struct S1{};
S1 struct1;
hash = GetHashCode(struct1);
//hash = 83928342

This function can be located in the system functions section. I don't see any other solutions.

s.e. I'll make other suggestions about interfaces, multiple inheritance, and other C++ legacy stuff a bit later.