How to find out how to filter data before processing - Trading Systems

Aleksey Vyazmikin 2016.12.21 17:43 #21

Vladimir:
I remembered one way of clustering. It goes roughly like this: you pick out groups of elements (clusters) in a set such that the maximum distance between elements of one cluster is less than the minimum distance from any element of that cluster to any element outside it. The distance can be the ordinary one, the modulus (absolute value) of the difference of real numbers. Of course, there will not necessarily be only one such cluster. Maybe you don't need exactly one cluster; maybe you should also compare the clusters with each other in other ways, for example by the average time at which a level occurs in the group.

This is interesting, but so far I can't figure out how to determine which numbers belong to the cluster. By brute force? Then, I suppose, there will be groups that overlap each other: if we define a cluster by searching for the smallest delta that is still larger relative to the other elements, then dropping some element from such a cluster will shift the cluster. The distance between clusters will matter; if it is significant, it should work out.

Vladimir:

We all measure the distance between two points on the real axis as the modulus of their difference. In mathematics such a distance function is called a metric. How do we measure the distance in the plane between points, i.e. pairs of real numbers? Again there is a familiar ready-made solution: the Euclidean distance, the square root of the sum of the squares of the coordinate differences. Mathematicians also use other metrics in the plane, e.g. the larger of the two absolute differences, or the sum of the absolute differences (http://ad.cctpu.edu.ru/Math_method/math/45.htm). And this is only for pairs of numbers: only two numbers, and always two. You, however, need to introduce a proximity measure in a much more complicated situation: a group contains more than two numbers, and different groups contain different numbers of elements.

We need to identify the largest group, or groups with the same number of elements. My weakness is that I can't read complex formulas properly, so I have to understand everything from examples and the comments on them.

Vladimir:

In mathematics there are also metrics that measure the distance between two functions. But again, always between two, so again they don't suit you: you have a group.

That's why it's important that you understand it thoroughly yourself. Write it up; maybe we can formalise it into an algorithm that yields a numerical measure of proximity within a set.

However, also consider abandoning the attempt to build one. The link above states which requirements a metric must meet. They didn't appear there out of the blue: drop any of them and strange effects will occur. In the post above I gave an example of how to avoid such all-encompassing attempts: let the points of the group be pairwise closer to each other on the real axis than to any element outside the group. Then you wouldn't have to invent anything very non-trivial.
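For reference, these are the standard requirements a distance function d must satisfy, for all elements x, y, z, to be a metric (presumably what the linked page lists):

```latex
\begin{aligned}
&d(x,y) \ge 0, \quad d(x,y) = 0 \iff x = y && \text{(non-negativity, identity)}\\
&d(x,y) = d(y,x) && \text{(symmetry)}\\
&d(x,z) \le d(x,y) + d(y,z) && \text{(triangle inequality)}
\end{aligned}
```

For example, the plain difference x - y fails the first two requirements (it can be negative and changes sign when the arguments are swapped), which is why the modulus is used on the real axis.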

That's right: first we determine the proximity of pairs of points, and then we try to exclude the distances that are large. That is the question: how do we decide whether a distance is large or not? This is exactly where the algorithm currently fails, when one distance turns out to be an order of magnitude greater than the others.

Aleksey Vyazmikin 2016.12.21 17:44 #22

Dmitry Fedoseev:
I already wrote it: count the differences first. Then everything else.

So you have counted the differences in the "Delta" column; what do you suggest we do next?

Aleksey Vyazmikin 2016.12.21 18:09 #23

I'm testing the following algorithm for filtering the data before processing:

1. Sum each two consecutive deltas and multiply the result by two

2. Find the mean of the resulting series

3. Build a new series from the values that are lower than the mean

4. Repeat steps 2-3 until the series is shorter than half of the original series

No.	Number	Delta	Series 1 (mean 53.33)	Series 2 (mean 25.82)	Series 3 (mean 9.60)
1	10
2	20	10
3	30	10	40	40
4	40	10	40	40
5	50	10	40	40
6	51	1	22	22	22
7	52	1	4	4	4
8	53	1	4	4	4
9	54	1	4	4	4
10	60	6	14	14	14
11	70	10	32	32
12	80	10	40	40
13	120	40	100
14	150	30	140
15	190	40	140
16	210	20	120
17	223	13	66
18	232	9	44	44
19	250	18	54
20	260	10	56

5. After filtering, we run the calculation using the algorithm described above

No.	Number	Delta	Close (delta < mean)	Run of close values	Maximum	Dense range	Density	Density v2
1	40				4
2	50	10	0	0		50
3	51	1	1	1		51	0.80	1.00
4	52	1	1	2		52
5	53	1	1	3		53
6	54	1	1	4		54
7	60	6	0	0

I tried different numbers and got a plausible result; I'd be glad to hear critical comments.

Dmitry Fedoseev 2016.12.21 18:25 #24

-Aleks-:
So you have counted the differences in the "Delta" column; what do you suggest we do next?

Why are you going in circles? It was written here long ago.

Aleksey Vyazmikin 2016.12.21 20:38 #25

Dmitry Fedoseev:
Why are you going in circles? It was written here long ago.

Here you state "The longest stretch is where the original series is below average", but as I understand it, that is exactly the flaw in my algorithm which led me to add a filter. I have done so, and now the algorithm no longer fails so obviously when the numbers differ significantly from each other.

Dmitry Fedoseev 2016.12.21 20:56 #26

-Aleks-:

Here you state "The longest stretch is where the original series is below average", but as I understand it, that is exactly the flaw in my algorithm which led me to add a filter. I have done so, and now the algorithm no longer fails so obviously when the numbers differ significantly from each other.

What flaw?

The filter is not a substitute for the algorithm. The filter is an addition to the algorithm.

Aleksey Vyazmikin 2016.12.21 21:02 #27

Dmitry Fedoseev:

What flaw?

The filter is not a substitute for the algorithm. The filter is an addition to the algorithm.

I don't know what the flaw is; maybe I just don't see it yet.

I think I should try to code it now. Can you help me if I run into difficulties?

Dmitry Fedoseev 2016.12.21 21:12 #28

-Aleks-:

I don't know what the flaw is; maybe I just don't see it yet.

I think I should try to code it now. Can you help me if I run into difficulties?

Start first. Maybe you won't run into any difficulties. But I won't think about anything before then, because it keeps turning out that I'm thinking about the wrong thing or in the wrong way...

Aleksey Vyazmikin 2016.12.21 21:20 #29

Dmitry Fedoseev:
because it keeps turning out that I'm thinking about the wrong thing or in the wrong way...

That's what makes people unique...

Aleksey Vyazmikin 2017.02.08 21:57 #30

I've started developing the algorithm and am working on the filter now. I've run into a difficulty keeping the two columns, "Number" and "Delta", in sync.

Any ideas on how to eliminate this inaccuracy would be welcome:

//+------------------------------------------------------------------+
//|                                              Test_FindOblast.mq4 |
//|                        Copyright 2017, MetaQuotes Software Corp. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2017, MetaQuotes Software Corp."
#property link      "https://www.mql5.com"
#property version   "1.00"
#property strict
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
void OnStart()
  {
   double Digit[20]=
     {
      10,
      20,
      30,
      40,
      50,
      51,
      52,
      53,
      54,
      60,
      70,
      80,
      120,
      150,
      190,
      210,
      223,
      232,
      250,
      260
     };
   int massivSize=ArraySize(Digit); //array size (20 values)
   int pairCount=massivSize-2;      //number of doubled two-step deltas
   double summDelta[];              //dynamic array, so it can be resized
   bool keep[];                     //keep[i] is true while summDelta[i] is still in the series
   ArrayResize(summDelta,pairCount);
   ArrayResize(keep,pairCount);
   int N=pairCount;                 //number of values remaining in the series
   double avrMass=0;                //mean of the remaining values

//-Filter
//1. Sum two consecutive deltas and multiply the value by two.
//   summDelta[i] spans Digit[i]..Digit[i+2], so its index links it back to the "Number" column.
   for(int i=0;i<pairCount;i++)
     {
      summDelta[i]=((Digit[i+1]-Digit[i])+(Digit[i+2]-Digit[i+1]))*2;
      keep[i]=true;
      PrintFormat("summDelta[%d] = %G",i,summDelta[i]);
     }

//2. Find the mean of the resulting series.
//3. Build a new series from the values lower than the mean.
//4. Repeat steps 2-3 until the series is shorter than half of the original series.
//   Values are only flagged as dropped, never sorted or moved, so "Number" and "Delta" stay in sync.
   for(int Z=0;N>=massivSize/2;Z++)
     {
      double total=0;
      for(int i=0;i<pairCount;i++)
         if(keep[i]) total+=summDelta[i];
      avrMass=total/N;
      Print("Mean of series ",Z," = ",avrMass);

      for(int i=0;i<pairCount;i++)
        {
         if(keep[i] && summDelta[i]>=avrMass)
           {
            keep[i]=false;
            N--;
           }
        }
      Print("N=",N);
      if(N==0) return; //nothing was below the mean (all values equal)
     }

//Mean of the final series and the numbers it covers, in their original order
   double finalTotal=0;
   bool used[];
   ArrayResize(used,massivSize);
   for(int i=0;i<massivSize;i++) used[i]=false;
   for(int i=0;i<pairCount;i++)
     {
      if(!keep[i]) continue;
      finalTotal+=summDelta[i];
      used[i]=true;
      used[i+1]=true;
      used[i+2]=true;
     }
   avrMass=finalTotal/N;
   Print("Mean of the final series = ",avrMass);
   for(int i=0;i<massivSize;i++)
      if(used[i]) PrintFormat("Digit[%d] = %G",i,Digit[i]);

//-Main algorithm
//1. Find the difference between neighbouring numbers; this is their closeness to each other.

//2. If a difference is less than the mean of the differences from step 1, then 1, otherwise 0.

//3. If the value from step 2 equals 1, add it to the previous running total, otherwise 0.

//4. Find the maximum value from step 3.

//5. Determine the range: take the value from step 4 and search upward in step 3 for a zero, then increase the number found by one.
//This gives the range of numbers whose density is the highest relative to the others.
  }
//+------------------------------------------------------------------+

Numerical series density - page 3
