Generic Class Library - bugs, description, questions, usage and suggestions - page 5

 
Sergey Dzyublik:

Hashes average O(1), if the size of the dictionary to the number of expected elements allows.
And then it depends on collision handling implementations, it could be through a list, it could be more subhash....

My terminological ignorance hinders the process of understanding. I will wait for examples. I would like concise and to the point - warrants, ticks and the like.

 
fxsaber:

My terminological ignorance hinders the process of understanding. I will wait for examples. I would like examples that are concise and to the point - warrants, ticks, and the like.


I do not care about orders. I am interested in the deals.

 
fxsaber:

I read it on the wiki. That's a case where you don't understand the logic of intuitive reasoning at all.


365 days is the limit on the size of the dictionary
number of students in class is the expected number of items in the collection
coincidence of birthdays is a collision

 
Sergey Dzyublik:

365 days is a limit on the size of the dictionary
the number of students in the class is the expected number of items in the collection
coincidence of birthdays is a collision

Thank you, such a terminological definition is clearer. I just don't understand what this "paradox" has to do with the topic of this thread?

 
Sergey Dzyublik:

365 days is the limit on the size of the dictionary
number of students in class is expected number of items in the collection
coincidence of birthdays is a collision


In this example, the hash is the student's birthday.
We have a closet with 365 shelves that contain students' journals.
We put each diary on the shelf corresponding to the student's birthday.

We need the diary of the student Petrov and we know when he was born.
Now by the birthday in O(1) we can easily find Petrov's diary, as well as the diary of any other student.
The exception is when two students have the same birthday - this is called a collision.

Collision solving is using additional information to determine which of two or more diaries we need.
List-based resolving is simply going through all the items in a collision in sequence to match the one we are looking for. Tear off each diary and see whose it is.
A subhash is organizing a hash table of the items involved in the collision, but by a different criterion. For example, by the hour the student was born.


If you are interested in the topic, I suggest a course from MailRu on youtube about algorithms and data structures.

 
Sergey Dzyublik:

In this example the hash is the student's birthday.
We have a closet with 365 shelves that contain students' diaries.
We put each diary on the shelf that corresponds to the student's birthday.

We need the diary of Petrov and we know when he was born.
Now by the date of birth in O(1) we can easily find the diary of Petrov, as well as another student.
The exception is when two students have the same birthday - this is called a collision.

Collision solving is using additional information to determine which of two or more diaries we need.
List-based resolving is simply going through all the items in a collision in sequence to match the one we are looking for. Tear off each diary and see whose it is.
A subhash is organizing a hash table of the items involved in the collision, but by a different criterion. For example, by the hour when a student was born.


If you are interested in the topic, I suggest a course from MailRu on youtube about algorithms and data structures.

This is how I originally imagined it. I just don't understand the highlighted one. Hash is not an index, so you can find an offset in an array of pointers. All the same, we need to search in a list. This is a binary search with proper sorting.

 
fxsaber:

That's how I imagined it from the beginning. I just don't understand the highlighted one. Hash is not an index, so you can find it by offset in an array of pointers. All the same, you need to search in the list. And this is a binary search with proper sorting.


Highlighted are verbal typos)) And already corrected.
Hash is an index. It is given to the size of the dictionary.

Each shelf has the same height, and by the number of the shelf you can know exactly at what height to look for it. O(1)

 

Simple as possible about associative array #1

A lot of algorithms that are presented in Generic are based in one way or another on the associative array or dictionary. It is also one of the two most commonly used algorithms. I think I would be close to the truth if I said that 90% of all programming tasks are covered by arrays and dictionaries. Before moving on to practice, let us describe the work of the dictionary as simple and clear as possible, deliberately simplifying some of the details.

We will work with a very simple example: a regular wordlist. Suppose we have just a few of those words and they all start with different letters of the alphabet:

string words[] = {"apple", "cat", "fog", "dog", "walk", "zero"};

The English alphabet contains 26 characters. Let's create an array of strings, of size 26 elements:

string dictionary[26];

Now, if we agree to store words in indexes corresponding to their first letter we will get a simple dictionary. We will do the indexing from scratch. The word 'apple' will be stored in our index 0, because the character 'a' is the first letter of the alphabet, 'cat' - in index 1, 'dog' - in index 3, fog - will occupy index 4, walk - index 24 and zero - the last index 25.

Note that indexes from 5 to 23 will not be used. This is an additional waste of memory, but we can instantly access, for example, the word "walk", because we know that if it exists, it is located at index 24.

Let's write our first empty dictionary:

//+------------------------------------------------------------------+
//|                                                         Dict.mq5 |
//|                        Copyright 2017, MetaQuotes Software Corp. |
//|                                              http://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2017, MetaQuotes Software Corp."
#property link      "http://www.mql5.com"
#property version   "1.00"
string words[26];

bool Contains(string word)
{
   uchar index = (uchar)StringGetCharacter(word, 0)-97;
   return words[index] != NULL;
}
bool Add(string word)
{
   uchar index = (uchar)StringGetCharacter(word, 0)-97;
   if(words[index] == NULL)
   {
      words[index] = word;
      return true;
   }
   return false;
}
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
void OnStart()
  {
//---
   Add("apple");
   if(Contains("apple"))
      printf("Словарь содержит слово apple");
   else
      printf("Словарь не содержит слово apple");
  }
//+------------------------------------------------------------------+

If we add the word "apple" to it and then find it, the search operation to access this word will be instantaneous. Because we know in advance the index by which this word can be found. There can be no other options for the index, so it is not necessary to search the entire list. This is roughly the basis of all dictionaries.

In the following example, we will improve it so that we can store words that begin with the same letter.

 

About associative array #2

It happens that different words begin with the same letter of the alphabet. If we put the word "apple" in our previous dictionary, and then try to put "application" there, it won't work. Index 0 will be already occupied by the word apple. In this case they talk about a hash function collision. Our hash function is very simple - it returns the number of the first character of the word, that's why collisions in this function are very frequent. In order to store different words starting with the same letter, we make an addition: we will store elements not in an array, but in an array of arrays. Then index 0 will contain another array with two words: "apple" and "application":

//+------------------------------------------------------------------+
//|                                                         Dict.mq5 |
//|                        Copyright 2017, MetaQuotes Software Corp. |
//|                                              http://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2017, MetaQuotes Software Corp."
#property link      "http://www.mql5.com"
#property version   "1.00"
#include <Arrays\ArrayObj.mqh>
#include <Arrays\ArrayString.mqh>
CArrayString* WordsArray[26];

bool Contains(string word)
{
   uchar index = (uchar)StringGetCharacter(word, 0)-97;
   CArrayString* words = WordsArray[index];
   if(words == NULL)
      return false;
   for(int i = 0; i < words.Total(); i++)
      if(word == words.At(i))
         return true;
   return false;
}
bool Add(string word)
{
   uchar index = (uchar)StringGetCharacter(word, 0)-97;
   CArrayString* words;
   if(WordsArray[index] == NULL)
      WordsArray[index] = new CArrayString();
   words = WordsArray[index];
   for(int i = 0; i < words.Total(); i++)
      if(word == words.At(i))
         return false;
   words.Add(word);
   return true;
}
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
void OnStart()
  {
//---
   //ArrayResize(WordsArray, 26);
   Add("apple");
   Add("application");
   if(Contains("application"))
      printf("Словарь содержит слово application");
   else
      printf("Словарь не содержит слово application");
  }
//+------------------------------------------------------------------+

Now our dictionary stores words even starting with the same letter. But the cost of accessing words with the same first letter increases. Because now we need a full search of all the words beginning with 'a' to find out if the word 'application' is in the dictionary or not. If our hash function would produce different numbers even if the words differed by one character, then the probability of trying words with the same indexes would tend to zero, and access to an arbitrary element would tend to o(1). This is exactly what happens in real dictionaries. The functions used there are collision-resistant, so we can say with certainty that accessing the items in these collections is very fast and virtually search-free.

 
Vasiliy Sokolov:

About associative array as simple as possible

Very many algorithms that are presented in Generic are based in one way or another on the associative array or dictionary. It's also one of the two most commonly used algorithms. I think I would be close to the truth if I said that 90% of all programming tasks are covered by arrays and dictionaries. Before moving on to practice, let us describe the work of the dictionary as simple and clear as possible, deliberately simplifying some of the details.

We will work with a very simple example: a simple wordlist. Suppose we have just a few of those words and they all start with different letters of the alphabet:

The English alphabet contains 26 characters. Let's create an array of strings, of size 26 elements:

Now, if we agree to store words in indexes corresponding to their first letter we will get a simple dictionary. We will do the indexing from scratch. The word 'apple' will be stored in our index 0, because the character 'a' is the first letter of the alphabet, 'cat' - in index 1, 'dog' - in index 3, fog - will occupy index 4, walk - index 24 and zero - the last index 25.

Note that indexes from 5 to 23 will not be used. This is an additional waste of memory, but we can instantly access, for example, the word "walk", because we know that if it exists, it is located at index 24.

Let's write our first empty dictionary:

If we add the word "apple" to it and then find it, the search operation to access this word will be instantaneous. Because we know in advance the index by which this word can be found. There can be no other options for the index, so it is not necessary to search the entire list. This is roughly the basis of all dictionaries.

In the next example we will improve it so that we can store words, which begin with the same letter.

This solution is perfect. Everything is as simple as possible. Only functions, arrays and correct data organization. This is what I was talking about.