Comparison of two quotation charts with non-linear distortions on the X-axis - page 7

 
hrenfx:

How is this problem solved via DTW (example):

  1. We need to find situations in the history similar to the pattern formed by the last 100 bars.
  2. The available history is 1 000 000 bars.
  3. First, we take 1,000,000 sequences of 50 bars each and compare them with the pattern via DTW.
  4. Then we take another 1,000,000 sequences of 55 bars and compare them with the pattern via DTW.
  5. Then 60 bars.
  6. .....
  7. At 100 bars.
  8. .....
  9. 300 bars. And this is where we can stop.

In total we have performed about 50,000,000 comparisons using the DTW algorithm, whose cost per comparison is O(N^2). Very roughly, that is 5 * 10^11 (500 billion) elementary computational operations.
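As a rough sanity check of these figures, the counting can be sketched in a few lines (the window lengths 50..300 in steps of 5 and the n*m cost against a 100-bar pattern come from the example above; the exact constants are assumptions, and the result lands on the same order of magnitude as the post's estimate):

```python
# Rough order-of-magnitude check of the operation counts above.
# Assumptions: one pass per window length 50, 55, ..., 300;
# DTW cost ~ n*m against a 100-bar pattern; 1,000,000 windows per pass.
pattern_len = 100
window_lengths = range(50, 301, 5)          # 50, 55, ..., 300 (51 passes)
windows_per_pass = 1_000_000

comparisons = windows_per_pass * len(window_lengths)
ops = sum(windows_per_pass * length * pattern_len for length in window_lengths)

print(f"comparisons ~ {comparisons:.1e}")   # ~5.1e7, i.e. the post's ~50 million
print(f"operations  ~ {ops:.1e}")           # ~8.9e11, same order as the post's 5e11
```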

Now a new bar arrives - and we perform 500 billion calculations all over again.

We decided to run this over history, over the last 200,000 elements. Roughly, such a run takes 200,000 passes of 500 billion calculations each. That is 10^17 calculations in total.

Even if there is a clever optimization, it will not yield a gain of more than two orders of magnitude. That is, you will still have to perform at least 10^15 calculations.


First understand the principle of the algorithm before writing such nonsense. The complexity of the algorithm is O(n*m), where n and m are the lengths of the two input vectors.
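For reference, the O(n*m) dynamic-programming formulation of DTW can be sketched like this (a minimal textbook version, not anyone's code from this thread; the absolute-difference point cost is my assumption):

```python
def dtw(a, b):
    """Dynamic Time Warping distance between two numeric sequences.

    Fills an (n+1) x (m+1) cost table, so the complexity is O(n*m),
    where n and m are the lengths of the two input vectors.
    """
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])      # point-wise cost (assumed)
            # best of: insertion, deletion, match
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Note that because the warping path may repeat or skip observations, sequences of different lengths can still match with zero cost, e.g. `dtw([1, 2, 3], [1, 2, 2, 3])` is 0.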

p.s. No passes "at 50 bars, at 55 bars" need to be done, because one of the three possible distortions DTW already handles is the omission of an observation.

p.p.s. DTW with path-shape constraints also exists, so your "and here we can stop" is completely unnecessary as well.
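For illustration, one common path-shape constraint is the Sakoe-Chiba band, which only admits warping paths with |i - j| <= w; a sketch under the same absolute-difference cost assumption (the band width w is an illustrative parameter):

```python
def dtw_banded(a, b, w):
    """DTW restricted to a Sakoe-Chiba band of half-width w.

    Only cells with |i - j| <= w are filled, reducing the work from
    O(n*m) to roughly O(n*w).
    """
    n, m = len(a), len(b)
    w = max(w, abs(n - m))          # band must at least cover the length gap
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```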

p.p.p.s. Your "zigzag + correlation" method will produce nonsense, since the zigzag is built on the extrema of a noisy random variable, which tells you little.

p.p.p.p.s. For searching in a large audio stream, completely different methods are used.

 
IgorM:
The article about DTW on Habrahabr http://habrahabr.ru/blogs/algorithm/135087/ seems to be quite well written, but I can't figure out how to apply DTW to OHLC data - could someone explain it to me?


For OHLC you need to devise a distance function between bars. For example:

1. two bars are given

2. approximate each bar by a third-degree polynomial (for a bar with C>O, through the points {(t[0];Open), (t[1];Low), (t[2];High), (t[3];Close)}; for a bar with O>C, through the points {(t[0];Open), (t[1];High), (t[2];Low), (t[3];Close)}, where t[i]=i/3)

3. take the distance to be the square root of the integral of the squared difference of the two polynomials over the interval 0..1.

(not my invention - the method seems to be well known and available to anyone interested).
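A pure-Python sketch of this bar distance (my own illustration of the method described above: the cubic through the four points is found by solving the Vandermonde system with Gauss-Jordan elimination, and the integral of the squared difference, a degree-6 polynomial, is evaluated exactly term by term):

```python
from math import sqrt

def bar_points(o, h, l, c):
    """Four interpolation points (t, price) for an OHLC bar, t[i] = i/3.
    Bullish bar (C > O): Open, Low, High, Close; otherwise Open, High, Low, Close."""
    ys = [o, l, h, c] if c > o else [o, h, l, c]
    return [(i / 3.0, y) for i, y in enumerate(ys)]

def cubic_coeffs(points):
    """Coefficients [a0, a1, a2, a3] of the cubic through 4 points,
    via Gauss-Jordan elimination on the Vandermonde system."""
    A = [[t ** k for k in range(4)] + [y] for t, y in points]
    for col in range(4):
        piv = max(range(col, 4), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(4):
            if r != col:
                f = A[r][col] / A[col][col]
                for k in range(col, 5):
                    A[r][k] -= f * A[col][k]
    return [A[i][4] / A[i][i] for i in range(4)]

def bar_distance(bar1, bar2):
    """sqrt of the integral over [0,1] of the squared difference of the
    interpolating cubics of two (open, high, low, close) tuples."""
    p = cubic_coeffs(bar_points(*bar1))
    q = cubic_coeffs(bar_points(*bar2))
    d = [a - b for a, b in zip(p, q)]            # difference cubic
    sq = [0.0] * 7                               # its square: degree 6
    for i, a in enumerate(d):
        for j, b in enumerate(d):
            sq[i + j] += a * b
    # integrate sum(c_k * t^k) over [0,1] = sum(c_k / (k+1))
    return sqrt(sum(c / (k + 1) for k, c in enumerate(sq)))
```

A quick sanity check: shifting a bar by a constant c shifts its cubic by c, so the distance between a bar and its shifted copy is exactly |c|.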

 
Integer:

You should first read the task of the author of the thread and his answers.

I agree, the task was formulated as finding a similarity criterion. I took a further logical step as to how the author will apply it, based on his earlier work with a similarity criterion via Spearman's rank correlation coefficient. Perhaps mistakenly, the author is thinking of applying this similarity criterion, new to him, to another task.
 
hrenfx:

We decided to run this over history, over the last 200,000 elements. Roughly, such a run takes 200,000 passes of 500 billion calculations each. That is 10^17 calculations in total.

Even if there is a clever optimization, it will not yield a gain of more than two orders of magnitude. That is, you will still have to perform at least 10^15 calculations.

The task is much more modest - to compare what has already happened today with the start of the previous day (the pattern sizes differ!) and evaluate the similarity. If it is present, draw an approximate price trajectory for the rest of today. On H1 that is 24 bars, so no more than 24 comparisons in any case. On M15 it is at most 96 comparisons. In my observation the similarity holds within 2 days at most, after which the market "forgets" everything. Optimizing over months and years is self-deception.
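This modest intraday task can be sketched as follows (an illustration only: it compares a scalar series such as closes, ignores the level normalization the method would need in practice, and `best_prefix_match` is a hypothetical helper name, not code from this thread):

```python
def dtw(a, b):
    """Textbook O(n*m) DTW with an absolute-difference point cost."""
    n, m = len(a), len(b)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def best_prefix_match(today, yesterday):
    """Compare today's bars so far with every prefix of yesterday
    (at most len(yesterday) DTW calls, i.e. <= 24 on H1, <= 96 on M15)
    and return (best prefix length, its DTW distance)."""
    scores = [(dtw(today, yesterday[:length]), length)
              for length in range(1, len(yesterday) + 1)]
    d, length = min(scores)
    return length, d
```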
 
anonymous:


First understand the principle of the algorithm before writing such nonsense. The complexity of the algorithm is O(n*m), where n and m are the lengths of the two input vectors.

Why so much hostility and aggression? Read up on estimates of the algorithm's complexity: O(N^2) does not contradict what you have written. Algorithms of such complexity are suitable only for tasks with small amounts of data.

p.s. No passes "at 50 bars, at 55 bars" need to be done, because one of the three possible distortions DTW already handles is the omission of an observation.

p.p.s. DTW with path-shape constraints also exists, so your "and here we can stop" is completely unnecessary as well.

Note the word "roughly" used several times in the example above. If every nuance were taken into account, the post would grow to an enormous, unreadable size. One could also mention the FastDTW algorithm and other algorithmic optimizations. It is better to show one's brains and knowledge in practice.

p.p.p.s. Your "zigzag + correlation" method will produce nonsense, since the zigzag is built on the extrema of a noisy random variable, which tells you little.

In your language - "nonsense". Could you constructively illustrate this with an example?

p.p.p.p.s. For searching in a large audio stream, completely different methods are used.

That's interesting.
 
wmlab:
The task is much more modest.

Then I've overcomplicated things. For such a simple task there is probably nothing simpler than DTW. But whether comparing such short data sequences makes sense is questionable.

Can you give an example from life when your hypothesis seems to have worked?

 
hrenfx:

Then I've overcomplicated things. For such a simple task there is probably nothing simpler than DTW. But whether comparing such short data sequences makes sense is questionable.

Can you give an example from real life when your hypothesis seems to have worked?


It does come from real life - I noticed that on EURUSD I mentally overlay today's chart onto yesterday's. If at least the first half of the day coincides visually, a forecast can be made (provided it is not Friday and there is no news). "Visually" means the alternation of ups and downs; the levels themselves do not match. It is like looking at an illustration of evolution - adjacent pictures are similar, distant ones are not. And if today's and yesterday's pictures do not coincide, the method simply does not work.
 
Could you post charts of such visually matching sections?
 
wmlab:
The task is much more modest - to compare what has already happened today with the start of the previous day (the pattern sizes differ!) and evaluate the similarity. If it is present, draw an approximate price trajectory for the rest of today. On H1 that is 24 bars, so no more than 24 comparisons in any case. On M15 it is at most 96 comparisons. In my observation the similarity holds within 2 days at most, after which the market "forgets" everything. Optimizing over months and years is self-deception.

However, my indicator and Expert Advisor on H4, for example, operate with a lookback of 900-1000 bars and clearly catch the events of the last bars https://forum.mql4.com/ru/46596/page124 - so the market's memory is not such a fleeting thing after all?
 

Got something... Just didn't realise it. Experiencing a sense of mystical ecstasy:)

Files:
idtw2.mq4  8 kb