What data organization to use for finding... - page 2

 
Dominik Christian Egert #:
You suggest a hybrid solution, little bit like 3d rendering, where you only render what you see...


Maybe that's a way to go. And I could use a partial plane for the currently visible area as well as some additional frame around the visible area. Maybe even go as far as gradually adding and removing from the partial plane. It probably would need to span multiple visible areas in all directions.

...

Some additional challenges come up. What happens if zoom level is changed. What happens if price bar scale is changed....

...

I am not convinced this is feasible in a more efficient way than just going without taking a "viewport" into consideration.

I think, it will turn out to be an implementation of R*-Tree in mql....

But. - What about vectors? I am not very educated on vector math, but any shape could be seen as a vector collection. Isn't there some math that could be used to solve the task?

Especially because vectors are native implemented in mql.





I don't get why you want to extend around the visible area ? A user cannot move the mouse on something invisible (of course an object partially visible need to be include in the visible area, that's a different point).

When the chart is zoomed, scaled, moved, etc... just update your visible area and the selected objects according to the new time/price intervals. In all cases working with chart objects is relatively slow, you would need to keep all data in memory for a fast processing.

A vector is a data structure, not an algorithm, you need to start from the algorithm then see what data structure you need to implement it.

 
Alain Verleyen #:

I don't get why you want to extend around the visible area ? A user cannot move the mouse on something invisible (of course an object partially visible need to be include in the visible area, that's a different point).

When the chart is zoomed, scaled, moved, etc... just update your visible area and the selected objects according to the new time/price intervals. In all cases working with chart objects is relatively slow, you would need to keep all data in memory for a fast processing.

A vector is a data structure, not an algorithm, you need to start from the algorithm then see what data structure you need to implement it.

The idea was to precalculate. Little bit like 3d rendering is done where the next frame gets rendered while displaying the current one.

Something like this in the control path: No element is hovered, update the "viewport" data. Else, Update the charts objects.

Since mouse move are frequent events, this would make the code be more responsive.

When using a tree as a data structure, like a BSP Tree (Binary Spatial Partitioning) parts of the tree could be dropped, a new root of the tree would be selected and new areas of the chart would be added to the tree when the viewport changes.

This would make it work somewhat gradually responsive to the changes.

Concerning zoom and scale change, I could simply go with the maximum possible visible area and overlay the actual visible area. This way I wouldn't need to worry about changes to much and could feed that as a boundary when traversing the tree.

Concerning the data structure, yes, that's right, vectors are not an algorithm. But I think, they come in handy as a data structure to store the points and make a selection/comparison when traversing the tree.

I first need to decide on the tree type. BSP Trees seem to be a viable choice. They have some interesting properties that could be used for other tasks as well.


I am still thinking about how to solve this. But it seems, a tree is one of the better approaches.

 
Dominik Christian Egert:
I have a chart on which I draw 10s of thounds of objects of different types, iE trendlines, rectangles, vertical and horizontal lines, triangles... All have coordinates of type price and datetime.

I have a "list" of all objects drawn.

What I would like to do now is, given the coordinates of the mouse, converted to price and datetime, query this list efficiently (meaning not iterate the whole list) and get the object(s) that are coinciding with the mouse coordinates.

Instead of performing a linear search (iterating) through the list, you can build an index from keys in the list, then use that index to get the values directly from the list.

  1. when the list is populated, you build an index for the list.
  2. every time the list is changed, you have to re-build the index again.

This idea is similar to index rebuilding in relational databases.

The index itself is a hash map of key-value pairs, where (key) is a transformation of price and time; (value) = index of a graphical object in the list;

The run-time complexity of linear search by iterating though the list is O(n), where n is the total objects in the list. The time complexity of hashed search using the index map method is O(1).

#include <Arrays\ArrayObj.mqh>
#include <Generic\HashMap.mqh>

class t_obj : public CObject
  {
public:
   string            name;
   double            price;
   datetime          time;
                     t_obj(string n, double p, datetime t) { name = n; price = p; time = t; }
  };

CArrayObj              List;      // list of chart objects
CHashMap<string, int>  MapIndex;  // hashmap for faster indexing by price and time

int OnInit()
  {
   int sub_window = 0;

   // Populate the list of chart objects
   int obj_total = ObjectsTotal(0, sub_window);
   for(int i = 0; i < obj_total; i++)
     {
      string name = ObjectName(0, i, sub_window);
      double price = ObjectGetDouble(0, name, OBJPROP_PRICE);
      datetime time = (datetime)ObjectGetInteger(0, name, OBJPROP_TIME);

      List.Add(new t_obj(name, price, time));
     }
   
   // Rebuild the index of the list
   RebuildIndex();
   
   return INIT_SUCCEEDED;
  }

void RebuildIndex()
{
   MapIndex.Clear();
   
   for(int i = 0; i < List.Total(); i++)
     {
      string key;
      t_obj *obj = List.At(i);
      StringConcatenate(key, obj.price, obj.time);
      MapIndex.Add(key, i);  // map each <price, time> to a numeric index.
     }
}

void OnChartEvent(const int id, const long &lparam, const double &dparam, const string &sparam)
  {
//--- If this is an event of a mouse click on the chart
   if(id==CHARTEVENT_CLICK)
     {
      //--- Prepare variables
      int      x      = (int)lparam;
      int      y      = (int)dparam;
      datetime time   = 0;
      double   price  = 0;
      int      window = 0;
      //--- Convert the X and Y coordinates in terms of date/time
      if(ChartXYToTimePrice(0, x, y, window, time, price))
        {
         string key;
         int index;

         // get the key from price and time.
         StringConcatenate(key, price, time);

         // get array index from the hashed key
         if(MapIndex.TryGetValue(key, index))
           {
            t_obj *obj = List.At(index);
            PrintFormat("X=%d  Y=%d  =>  list_index=%d  object_name=%s", x, y, index, obj.name);
           }
        }
     }
  }

 
amrali #:

Instead of performing a linear search (iterating) through the list, you can build an index from keys in the list, then use that index to get the values directly from the list.

  1. when the list is populated, you build an index for the list.
  2. every time the list is changed, you have to re-build the index again.

This idea is similar to index rebuilding in relational databases.

The index itself is a hash map of key-value pairs, where (key) is a transformation of price and time; (value) = index of a graphical object in the list;

The run-time complexity of linear search by iterating though the list is O(n), where n is the total objects in the list. The time complexity of hashed search using the index map method is O(1).


Thank you for this idea and demo code.

This was my first attempt to solve the task. There is just one challenge I couldn't solve using a hash algorithm.
A Trendline will have at most only two pairs of coordinates, thus only two entries in the list that match. Therefore only the coordinates of the edges will give a hit. And these hashes are completely anywhere in the list, they have no relation to the spacial plain.

What I would need, is a hash function that allows for closest point searches. So I thought, I'll just create enough hashes for all the coordinates a Trendline is composed of, as if it were made of dots.

Now that leads to an array, that is more complex to maintain and has additionally the computational need for the hash function.

This lead me to try a simple x-y map where I store the objects at their coordinates.

It's the either memory or CPU tradeoff I am facing.

So I came to the conclusion, it be best to be able to search vectors. (The mathematical vectors, with coordinates) so I started this thread to see what possible options other people have.

If a hash function were able to create a hash that matches multiple points, I could use that approach. But I cannot think of one. That's where my question about "some math for vectors" comes in.

One idea I had was to "transform" all objects to vectors and then transform the vectors from [[x1, y1], [x2, y2]] to [[0, 0, y1], [x1, x2, y1]] but thats beyond my capabilities to understand how this could solve my search problem. The idea was to use the length of the vector here to calculate the hash, instead of the coordinates itself.

So the issue is, using a hash, I would need a special type of hash function, that is capable of incorporating the spacial information between the two endpoints of the vector in such way, that intermediate coordinates also produce the same hash.




 
if the coordinates are limited like 0-1000 for both for a hash you can squeeze them together into one new number = x + 1000*y. Then x= number%1000 and y= number/1000 (one need to check the rounding of big numbers)? max of ulong (18 446 744 073 709 551 615) has 20 digits so this would work up to 8 digits (99 999 999) for each coordinate?
 

My idea is to round-down the coordinates of the graphical objects (e.g., price to 3 digits, time to 1 minute) before calculating the hash key.

This effectively will transform the chart area to small rectangles that can be easily encoded in the index database.

The search by using these approximate keys (mouse price and time) will return the "nearest bounding rectangle". However, each rectangle has a limitation that it can contain only one object (HashMap does not allow values with duplicate keys).

string GetHash(double price, datetime time)
  {
   string key;
   price = floor(price*1000)/1000;
   time = time - time % PeriodSeconds(PERIOD_M1);
   StringConcatenate(key, price, time);
   return key;
  }
 
amrali #:

My idea is to round-down the coordinates of the graphical objects (e.g., price to 3 digits, time to 1 minute) before calculating the hash key.

This effectively will transform the chart area to small rectangles that can be easily encoded in the index database.

The search by using these approximate keys will return the "nearest bounding rectangle".

So, in fact reducing the effective resolution and by that introducing some blurriness to the hashes.

In contrast a BSP Tree has O(log-n) and is computational not as expensive as a hash function. It lays out more to memory, but only relevant parts.

The hash would need to be computed on all objects. And on every mouse move as well.

Maybe a combination would give a good compromise. Instead of using all digits, just removing one digit and reducing time scale to maybe 6 seconds and using a BSP Tree in combination would give good results.

The hash table still has the problem of working only on the coordinate points, meaning, intermediate coordinates along a iE Trendline will still not produce a match.


EDIT:

Collisions in hashes is a problem especially when reducing the "resolution". The "endpoint" of one Trendline is the "starting point" of another, thus drawing an ABC pattern.


 
Carl Schreiber #:
if the coordinates are limited like 0-1000 for both for a hash you can squeeze them together into one new number = x + 1000*y. Then x= number%1000 and y= number/1000 (one need to check the rounding of big numbers)? max of ulong (18 446 744 073 709 551 615) has 20 digits so this would work up to 8 digits (99 999 999) for each coordinate?
I am not sure, I properly understand what you are trying to say.

A hash is by definition an always same length representation of the data it is generated on.

So the amount of data I would feed to iE CRC64 is independent of the resulting amount of data for the hash. The hash will in this case always be 64 bit. That's where the so called "hash-collision" issue comes in, where two different inputs produce the same hash.

The mql std-lib hash solutions do not allow collisions, they require unique hashes.

That in it's own wouldn't be an issue, as I am able to build a hash table that allows collisions.

The issue I am having with hashes is the spacial partitioning.

What I mean is, a vector has only two points, and at least one point is shared with another vector (in my case). Also the vector itself has no coordinates along it reaching from start to end.

For a hash table, I would need to interpolate these points and fill the table with these points.

When I do so, reducing the "resolution" will reduce either the amount of objects uniquely representable, or increase the collisions.

Computing the interpolated points, creating a hash from them, managing collisions, is going to be a heavier tradeoff than using a BSP Tree, as I see it at the moment.


 
I think you need to code a mini "Google Maps" algorithm, hehehee. 
 
For triangles for instance, does it need to return only from the borders or also from within?