Generic Class Library - bugs, description, questions, usage and suggestions - page 17

 
Alexey Oreshkin:

Great theoretical example! In practice, has anyone ever operated thousands of transactions?

p.s. I'm not trying to prove that it's crap and nobody needs it. I'm trying to understand the value for real trading. I'm not a theorist in general, but a pure practitioner.

It's not about 1,000 trades or just 10. The point is that we write short code that works equally effectively with both 10 and 1000000..... trades.
 

Briefly about the current implementation ofCHashMap:

template<typename TKey,typename TValue>
class CHashMap: public IMap<TKey,TValue>
  {
protected:
   int               m_buckets[];                        
   Entry<TKey,TValue>m_entries[];
   int               m_count;
   int               m_free_list;
   int               m_free_count;
   IEqualityComparer<TKey>*m_comparer;
   bool              m_delete_comparer;

     .................................

.................................

}

First we need to find out whatEntry<TKey,TValue> is.
Basically it's a Node like in CLinkedList, which contains:

- placed in a key and value container
- hash hashed value for key - hash_code
- next - "pointer" to the nextEntry<TKey,TValue> in order to eliminate collisions through one-way list


m_entries[] - array of "cells" where added key and value, hash_code, next are placed. The size of the array corresponds to Capacity.
m_count - specifies the index of the first unused cell in m_entries. Initial value is 0, growing to Capacity, next is rebuilding all arrays with increasing Capacity and size of all arrays.
m_buckets[] - index array on m_entries[]. The default value is -1. Array size corresponds to Capacity.


Without collisions, adding a unique value in theCHashMap container:

  1. Calculate hash by key, get hash_code
  2. Put key, value, hash_code in m_entries[] by index m_count. (next = -1 because the example has no collisions)
  3. Put in m_buckets[] by hash_code % (ArraySize(m_buckets)) the index value of the element just added in m_entries[] (this is m_count).
  4. Increase m_count by +1

Without collisions, get value by key inCHashMap container:

  1. Compute hash by key, get hash_code
  2. Using the hash_code % (ArraySize(m_buckets)) from m_buckets[], get the value which is an index of the array m_entries[]
  3. From array m_entries[] we obtain our value value from the obtained index m_buckets[hash_code % (ArraySize(m_buckets)).

Collision resolving:

For different keys, it can happen that hash_code_key_1 % (ArraySize(m_buckets)) is equal to hash_code_key_2 % (ArraySize(m_buckets)) This is called a collision.
All elements with collision added to m_entries[] are linked in a single linked list via next (seeEntry<TKey,TValue> structure)
And m_buckets[] always points to the index of the first head element in that collision list.
You get not one big list with collisions, but many small lists.


Collision, getting value by key inCHashMap container:

  1. Calculate hash by key, get hash_code
  2. Using the hash_code % (ArraySize(m_buckets)) from m_buckets[], get the value, which is an index of the m_entries[] array
  3. We see that the value of next is not equal to -1, which indicates the presence of a collision
  4. Walk through the single-link list consisting of elements of m_entries[] and compare the sought value and the saved keys for equality


Deleting the value from theCHashMap container:

- when deleting a cell in m_entries[] no deletion occurs, the cell is marked as free and hash_code = -1.
- free cells form their own list of free cells (through the same next of Entry<TKey,TValue>), for the head index of the list is responsible -m_free_list
-
free cells are first used when adding new values to the container.
-m_free_count is used to control the current number of free cells


Rebuild hash table (process of increasing Capacity) :

- rebuild only when Capacity is 100% full.
- next size of Capacity is taken from the list:

const static int  CPrimeGenerator::s_primes[]=
  {
   3,7,11,17,23,29,37,47,59,71,89,107,131,163,197,239,293,353,431,521,631,761,919,
   1103,1327,1597,1931,2333,2801,3371,4049,4861,5839,7013,8419,10103,12143,14591,
   17519,21023,25229,30293,36353,43627,52361,62851,75431,90523,108631,130363,156437,
   187751,225307,270371,324449,389357,467237,560689,672827,807403,968897,1162687,1395263,
   1674319,2009191,2411033,2893249,3471899,4166287,4999559,5999471,7199369
  };

- m_entries[] and m_buckets[] arrays sizes are increased.
- m_buckets[] is cleared and completely refilled based on data from m_entries[] (cached hash value for key - hash_code)



Forum on trading, automated trading systems and strategy testing

Generic Class Library - bugs, description, questions, usage and suggestions

Sergey Dzyublik, 2017.12.09 01:12

Got acquainted withCHashMap implementation
Honestly, I liked it.


There are several suggestions for possible improvement:

1) In my humble opinion, the implementation contains not quite correct choice of capacity - both initial size 3 and subsequent ones when rebuilding the hash table.
Yes, it is true that you need to choose a prime number for uniformity of distribution.
However, the CPrimeGenerator implementation does not meet expectations and contains omissions of prime numbers.
The purpose of such a "generator" is clear - to provide an increment factor of the order of 1.2 with automatic generation of prime numbers.
However, isn't this too small a coefficient? In C++ for vectors, the coefficient is usually 1.5-2.0, depending on the library (because it strongly affects the average complexity of the operation).
The way out could be a constant coefficient followed by rounding the number to prime via a binary search of a list of prime numbers.
And an initial size of 3 is way too small, we don't live in the last century.

2) Hash table rebuilding (Resize) is currently performed only when 100% capacity is filled (all m_entries[] are filled).
In this regard, there may be a significant number of collisions for keys with not very evenly distributed.
As an option, consider introducing a fill factor that reduces 100% by the necessary limit to perform hash table rebuilding.


 
Alexey Oreshkin:

In practice, has anyone ever operated thousands of transactions?

Yes, millions of references to history on one pass is practiced by many.

 
Vasiliy Sokolov:

On Forts every first.

It's clear.

Sending orders by hft algorithm and raising them later for analysis are different things. These hashes are not needed for sending, and they are not needed for analysis, since we analyze something else entirely.

So, no offense. You also need theory.

 
Alexey Oreshkin:

It's clear.

Sending orders by hft algorithm and raising them later for analysis are different things. These hashes are not needed for sending, and they are not needed for analysis either, since they are analyzed for something else entirely.

So, no offense. We also need theory.


I do not use dishwasher, I use sponge.
No offense meant. Dishwashers are lame, what are they for.

 
Alexey Oreshkin:

It's clear.

Sending orders by hft algorithm and raising them later for analysis are different things. You don't need these hashes to submit them and you don't need them to analyze them, since you will analyze something else entirely.

So, no offense. We also need theory.

What grudges? Keep writing your crutches.

 
Sergey Dzyublik:

Briefly about the current implementation ofCHashMap

Thank you, I will use when parsing the source code.

 
Vasiliy Sokolov:

What grudges? Keep writing your crutches.

The thing is, I don't use the subject offered here and I'm trying to understand whether I'm right or not. I'm satisfied with ordinary associative arrays, but I always want to be better.
 
fxsaber:

Thank you, I will use when parsing the sources.


Omitted checking for existence of key in container when adding new key-value pair, executed first.
But it is not important.

 
Faced an error message caused by the absence of this
//+------------------------------------------------------------------+
//| Gets the element at the specified index.                         |
//+------------------------------------------------------------------+
template<typename T>
bool CArrayList::TryGetValue(const int index,T &value) const

Please fix something like this throughout Generic.