Market etiquette or good manners in a minefield - page 84

 

Neutron, thanks for sending in the thesis.


Thanks also for the topic raised and the constructive discussion.

I have so far managed to implement something similar using SSAS (unfortunately, I only read both this thread and the thesis today);

maybe someone will find something useful in it:

1. The intraday range is clustered into several clusters by the average value of the H1 candlestick, and the day is divided into time intervals (TIs)

(2 clusters and 2 TIs, though one yen pair turned out to have 4 clusters and 7 TIs)

2. Models are created for each instrument and TI:

- clustering of previous upward and downward moves separately by two input parameters: price and time

- Clustering of future moves upwards and downwards separately by two input parameters: price and time
(this model is of course only used to prepare data for the training phase)

- predictive models:

-- naive Bayes

-- rule of thumb

-- association rules

-- neural network (NS)

-- decision trees

3. Based on a minimal deviation from an extremum, a zigzag is constructed - we obtain a piecewise monotone function and, accordingly, patterns of N segments (see the sketch below).

4. We train the appropriate clustering models and cluster the segments

5. We calculate the prediction horizon of the pattern: either as a fraction of the pattern length, or as a fixed length, or rounded to the nearest multiple of a larger fraction of the pattern length

6. We calculate and classify the "forecast"

7. We train the predictive models

During operation, the Expert Advisor loads data into MS SQL, periodically generates zigzag points and receives forecasts. Having received them, it analyzes their support and the confidence of the rules, and makes a decision based on the comparison.
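For illustration only - a minimal MQL4-style sketch (not M1kha1l's SSAS code; the function name and the pivot arrays are hypothetical) of how the price/time features of the last N zigzag segments from step 3 could be assembled before being handed to the clustering models from step 2:

// Builds the feature vector of the last N zigzag segments: each segment is
// described by its price move and its duration - the two inputs mentioned in
// step 2. The pivot arrays are assumed to hold the zigzag points, newest first.
bool BuildPattern(double &pivotPrice[], datetime &pivotTime[], int N, double &features[])
{
   if(ArraySize(pivotPrice) < N + 1 || ArraySize(pivotTime) < N + 1) return(false);
   ArrayResize(features, 2 * N);
   for(int k = 0; k < N; k++)
   {
      features[2*k]     = pivotPrice[k] - pivotPrice[k + 1];          // price input
      features[2*k + 1] = (pivotTime[k] - pivotTime[k + 1]) / 60.0;   // time input, minutes
   }
   return(true);
}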


Advantages:

- It is possible to use and compare the results of several DM models

- the models themselves can be chosen from a variety of methods

Disadvantages:

- few parameters available for tuning the models

- resource-intensive operation of MSSQL

- lack of a scientific basis (something to be corrected, including with the help of the conclusions of the aforementioned dissertation)


Without in any way encroaching on the great shadow of science and the proofs in the mentioned work, I would very much like to ask everyone reading this thread about the expediency of using additional inputs, such as volume, market delta, etc., when clustering the piecewise monotone segments.

Perhaps they could play an additional, e.g. filtering, role.


S.Privalov: In fact, this work (the dissertation) defines a method of creating a pattern for forecasting.

Imho, that is exactly what we are talking about - for example, when you get a forecast by association rules, you can specify the number N of pattern segments and ask for a different number of forecast steps ahead, and get them with different supports and confidences - i.e. we obtain a model that can serve as a basis for a forecast.

Do you think it is possible - and if so, how - to apply your idea of using the Kalman filter to them?

Having such a model, it would be interesting to train it on them.
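For what it's worth, here is a minimal scalar Kalman filter sketch (random-walk state model), offered only to make the question concrete - it is not the method from the dissertation, and the noise variances q and r are assumptions to be tuned by hand:

// One Kalman step: x and p are the state estimate and its variance, updated in
// place; z is the new measurement (e.g. the next forecast or price increment).
double KalmanStep(double z, double &x, double &p, double q, double r)
{
   p = p + q;                 // predict: variance grows by the process noise
   double k = p / (p + r);    // Kalman gain
   x = x + k * (z - x);       // correct the estimate toward the measurement
   p = (1.0 - k) * p;         // shrink the variance
   return(x);
}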


This is what the models look like in SSAS; for this instrument, 3 TIs have been identified.

Here http://www.fxexpert.ru/forum/index.php?showtopic=656 there are libraries of networks, including MQL, and other useful stuff.

 

I started digging deeper into the network, trying to determine the optimal number of hidden neurons, and came to a rather interesting conclusion, which others have probably reached long ago. I will try to explain it intuitively. Imagine a hidden-layer neuron with 20 inputs. With these inputs, like tentacles, it probes the network inputs, and as it probes, the input weights are optimized to obtain the expected network outputs. Now we add another hidden neuron, with its 20 inputs connected to the same 20 inputs as the first neuron, but with different weights. Since the second neuron is probing the same inputs, it essentially adds no new information. If this is true, then training the network will lead to the same input weights for both neurons. Since the outputs of both neurons are multiplied by the corresponding weights of the output neuron and summed, the input weights of the two hidden neurons may differ from each other in amplitude by a factor equal to the ratio of the output-neuron weights.

Anyway, after training such a 20-2-1 network, I get these input weights of two hidden neurons:

Notice that both sets of weights behave the same way as a function of the input number, up to a constant multiplier. That is, both neurons feed the same information to the output neuron. Maybe that is why the learning error of my network hardly improves after adding a second hidden neuron. Here is another picture of the hidden-neuron weights: here they differ by sign, but the signs of the output-neuron weights differ as well.


Generally speaking, such a two-layer network behaves like a simple single-layer network (an AR model). With this conclusion I am at a dead end. It turns out that neural networks are only suitable for classification, i.e. when the output takes only two values: yes or no. Trying to use the network for price prediction gives the same results as the AR model, since the neuron outputs are not saturated and the non-linear activation function plays only a minor role, even if we normalize the input data to -1...+1. I'm off for a beer - maybe new ideas will come (for example, feeding +1 and -1 to the network inputs?).
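To make the argument tangible, here is a toy MQL4 script (mine, not gpwr's code): it evaluates a 20-2-1 net whose second hidden neuron's input weights are proportional to the first's, and compares it with the equivalent single linear neuron; with small inputs the two outputs practically coincide.

#define D 20

double tanh_(double x) { return((MathExp(2.0*x) - 1.0) / (MathExp(2.0*x) + 1.0)); }

int start()
{
   double w1[D], x[D];
   double c  = -1.7;                  // second neuron's weights = c * first neuron's weights
   double v1 = 0.8, v2 = -0.47;       // output-neuron weights
   double s1 = 0.0, s2 = 0.0, lin = 0.0;
   int i;

   for(i = 0; i < D; i++)
   {
      w1[i] = 0.05 * MathCos(i);      // small weights and small inputs keep tanh near-linear
      x[i]  = 0.10 * MathSin(3 * i);
      s1  += w1[i] * x[i];
      s2  += c * w1[i] * x[i];
      lin += (v1 + c * v2) * w1[i] * x[i];           // equivalent single-layer (AR-like) output
   }
   double out2 = v1 * tanh_(s1) + v2 * tanh_(s2);    // 20-2-1 output with proportional weights
   Print("two hidden neurons: ", out2, "   single linear neuron: ", lin);
   return(0);
}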

 
M1kha1l wrote >>

I have so far managed to implement something similar using SSAS (unfortunately, I only read this thread and the thesis today),

Here http://www.fxexpert.ru/forum/index.php?showtopic=656 there are libraries of networks, including MQL, and other useful stuff.

Thank you, M1kha1l, for your kind words and for sharing your knowledge.

Your post will take some time to digest. I'm reading it now.

gpwr wrote >>

I started digging deeper into the network, trying to determine the optimal number of hidden neurons, and came to a rather interesting conclusion, which others have probably reached long ago. I will try to explain it intuitively. Imagine a hidden-layer neuron with 20 inputs. With these inputs, like tentacles, it probes the network inputs, and as it probes, the input weights are optimized to obtain the expected network outputs. Now we add another hidden neuron, with its 20 inputs connected to the same 20 inputs as the first neuron, but with different weights. Since the second neuron is probing the same inputs, it essentially adds no new information. If this is true, then training the network will lead to the same input weights for both neurons. Since the outputs of both neurons are multiplied by the corresponding weights of the output neuron and summed, the input weights of the two hidden neurons may differ from each other in amplitude by a factor equal to the ratio of the output-neuron weights.

Great work, gpwr - it didn't even occur to me to look at the weights of the parallel neurons!

Let's look at how the learning error (red) and the generalization error (blue) decrease during training of a multilayer NS:

We can see that the minimum of the generalization error (which is what gives us the statistical advantage in decision making) does not coincide with the minimum of the learning error. This is understandable: history does not repeat itself exactly, it repeats approximately. And while there is only one global minimum for learning, there are many local minima for generalization and there is no way to choose the best one - you can only guess. This is where statistics, in the form of multiple neurons in the hidden layer, comes to the fore. All of them are slightly undertrained and "wallow" in local minima, some at their best (in terms of generalization error), some at their worst. Get it? Their solutions are averaged by the output linear neuron, and such an estimate is the best one. Moreover, the estimation error falls as the square root of the number of neurons in the hidden layer. This is why the predictive power of the NS increases with the number of neurons in the hidden layer.

The data you cited indicate overtraining of the network and, as a consequence, synchronous operation of all the neurons in its hidden layer. The multilayer NS has degenerated into a linear perceptron!
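A toy Monte Carlo check of the square-root claim (a sketch only, with made-up noise): average N noisy "undertrained" estimates of the same quantity and watch the scatter of the committee average shrink roughly as 1/sqrt(N).

int start()
{
   int    trials = 2000;
   double truth  = 1.0;
   for(int n = 1; n <= 64; n *= 4)
   {
      double sum2 = 0.0;
      for(int t = 0; t < trials; t++)
      {
         double avg = 0.0;
         for(int k = 0; k < n; k++)
            avg += truth + (MathRand() / 32767.0 - 0.5);   // one noisy neuron's estimate
         avg /= n;                                         // committee average
         sum2 += (avg - truth) * (avg - truth);
      }
      Print("N=", n, "  rms error of the average=", DoubleToStr(MathSqrt(sum2 / trials), 4));
   }
   return(0);
}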

 
Neutron >> :

Thank you, M1kha1l, for your kind words and for choosing to share your knowledge.

Your post will take some time to digest. I'm reading it now.

Great work, gpwr, I didn't even think to look at the weights of the parallel neurons!

Let's look at how the learning error (red) and the generalization error (blue) decrease during training of a multilayer NS:

We can see that the minimum of the generalization error (which is what gives us the statistical advantage in decision making) does not coincide with the minimum of the learning error. This is understandable: history does not repeat itself exactly, it repeats approximately. And while there is only one global minimum for learning, there are many local minima for generalization and there is no way to choose the best one - you can only guess. This is where statistics, in the form of multiple neurons in the hidden layer, comes to the fore. All of them are slightly undertrained and "wallow" in local minima, some at their best (in terms of generalization error), some at their worst. Get it? Their solutions are averaged by the output linear neuron, and such an estimate is the best one. Moreover, the estimation error falls as the square root of the number of neurons in the hidden layer. This is why the predictive power of the NS increases with the number of neurons in the hidden layer.

The data you cited indicate overtraining of the network and, as a consequence, synchronous operation of all the neurons in its hidden layer. The multilayer NS has degenerated into a linear perceptron!

My network was given 300 training examples and the number of weights was 45. The literature holds that with 5 times more training examples than weights, the network will generalize well with 95% probability. That is, according to the theory my network ought to generalize well, but in fact it does not, which is why I gave the examples to confirm it. I think the point here is not that more training examples are needed; it is the nature of the problem I am forcing the network to solve. If you try to make the network predict the size of the next price step, then during training it will tend toward weights at which the neurons operate in the linear region of the activation function, so as to preserve the proportionality between the predicted step and the past input steps. That is, the task itself is linear. Given this state of affairs, adding hidden neurons will not improve anything, and the hidden layer itself becomes unnecessary. Experimenting with my network, I came to the conclusion that a single layer works as well as a double layer. Reading your previous posts in this thread, I think you came to the same conclusion for EURUSD as well.

In my opinion, the network should be used for highly non-linear problems (such as XOR or classification problems), for which the neuron activation function can be chosen to be step-shaped.
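As a reminder of what such a step-activation case looks like, here is a hand-weighted 2-2-1 net solving XOR with a threshold activation (a toy sketch, nothing to do with market data):

double step(double x) { if(x > 0.0) return(1.0); return(0.0); }

int start()
{
   for(int a = 0; a <= 1; a++)
      for(int b = 0; b <= 1; b++)
      {
         double h1  = step(a + b - 0.5);     // fires on "a OR b"
         double h2  = step(a + b - 1.5);     // fires on "a AND b"
         double out = step(h1 - h2 - 0.5);   // OR and not AND = XOR
         Print(a, " XOR ", b, " -> ", out);
      }
   return(0);
}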


 

I wanted to point out that the essence of the detected effect may be explained by overtraining of the NS due to an excessive number of training epochs, and that there must be an optimum in the number of training epochs Nopt (see the figure above), which depends on the learning-rate coefficient. I did not say that we should "...take even more training examples"; the essence here is the problem of the optimal training-sample length, P_opt = w^2/d, where w is the number of weights and d is the number of NS inputs. That is what theory says, not that "...the number of training examples is 5 times the number of weights...".

We're talking about different causes for the network overtraining effect.

I agree with you that the task of predicting the sign and amplitude of the next bar is predominantly linear, and there cannot be anything else. The market is as simple as a crowbar and as unpredictable as a weather forecast. A bit later, though, I will post comparative results on the quality of predictions on hourly bars for a single-layer net, a two-layer net with two hidden neurons, and a two-layer net with 4 hidden neurons. All of these will be presented as functions of the number of inputs d of the NS, with the data averaged over 50 independent numerical experiments.
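Just to put numbers to that formula (my own arithmetic, using the 20-2-1 example discussed above: w = 45 weights, d = 20 inputs):

int start()
{
   int    w = 45;                     // total number of weights in the net
   int    d = 20;                     // number of NS inputs
   double Popt = 1.0 * w * w / d;     // optimal training-sample length, P_opt = w^2/d
   Print("P_opt = ", DoubleToStr(Popt, 1), " training examples");   // about 101
   return(0);
}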

 

Is there no way... to harness a horse and a doe to the same cart? I mean MT4 and MathCad: receive current quotes directly in MathCad, perform all the analysis there, and then pass the generated signal to MT4. The first thing that comes to mind is exchanging through a file, but then both programs would have to keep checking the contents of two different files all the time, which is not very convenient. Maybe there is something better? It's a pity there is no interrupt handling in MQL. How inconveniently everything is done! MT4, this fucking thing... I might as well just sit down and learn C++.


P.S. I'm messing around with data for the grid.
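If the file route is chosen after all, something along these lines could work (a sketch only; the file names and the signal format are invented): the EA dumps fresh quotes to a CSV that MathCad polls, and reads back a signal file that MathCad writes; both sides simply re-check the files on a timer.

// Write the last n bar opens to the terminal's Files folder for MathCad to pick up.
void ExportQuotes(int n)
{
   int h = FileOpen("quotes.csv", FILE_CSV | FILE_WRITE, ';');
   if(h < 0) return;
   for(int i = n - 1; i >= 0; i--)                      // oldest first
      FileWrite(h, TimeToStr(Time[i]), DoubleToStr(Open[i], Digits));
   FileClose(h);
}

// Read back the signal MathCad left behind: -1 sell, 0 flat, +1 buy.
int ReadSignal()
{
   int h = FileOpen("signal.csv", FILE_CSV | FILE_READ, ';');
   if(h < 0) return(0);
   double v = FileReadNumber(h);
   FileClose(h);
   if(v >  0.5) return(1);
   if(v < -0.5) return(-1);
   return(0);
}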

 

Why, is it really that tedious to implement the whole analysis algorithm in MQL? Or are the difficulties somehow prohibitive?

P.S. And I'm getting statistics...

 

Yes, there are difficulties with MQL at every turn. For example, here is the simplest code (an indicator) for splitting the quote flow into a series of transactions:

#property indicator_chart_window
#property indicator_buffers 2

#property indicator_color1 Red
#property indicator_color2 MediumBlue

extern int step = 5;   // partition step, in spreads
extern int sp   = 3;   // spread size, in points

double Trans[], Kagi[];          // transaction points and Kagi reversal points
int    mn, mx, H, Cotir;         // running minimum/maximum, reversal threshold, quote in points
bool   set_new = false, SetMax = false, SetMin = false;

//******************************************************************************
int init()
{
   SetIndexBuffer(0, Trans);
   SetIndexBuffer(1, Kagi);

   SetIndexEmptyValue(0, 0.0);
   SetIndexEmptyValue(1, 0.0);

   SetIndexStyle(0, DRAW_ARROW);
   SetIndexStyle(1, DRAW_ARROW);

   SetIndexArrow(0, 119);
   SetIndexArrow(1, 162);

   IndicatorShortName("Kagi++");
   return(0);
}
//*******************************************************************************

int start()
{
   int MaxBar, MaxBarJ, counted_bars = IndicatorCounted(), mx_j, mn_j;

   if(counted_bars < 0) return(-1);
   if(counted_bars > 0) counted_bars--;
   int limit = Bars - counted_bars - 1;
   MaxBar  = Bars - 2;
   MaxBarJ = MaxBar - 30;
   if(limit == Bars - 1) limit = Bars - 2;

   //----+ MAIN INDICATOR CALCULATION LOOP
   for(int i = limit; i >= 0; i--)
   {
      Cotir = Open[i] * MathPow(10.0, Digits);    // quote converted to integer points

      if(!set_new)                                // first processed bar: initialise the extremes
      {
         mx = Cotir;
         mn = Cotir;
         mx_j = i;                                // remember the bars of the extremes
         mn_j = i;                                // (otherwise they are used uninitialised)
         H = step * sp;                           // reversal threshold in points
         set_new = true;
      }
      if(Cotir - mx > 0)                          // new maximum
      {
         mx = Cotir;
         mx_j = i;
      }
      if(Cotir - mn < 0)                          // new minimum
      {
         mn = Cotir;
         mn_j = i;
      }
      if(!SetMax && Cotir <= mx - H)              // reversal down by H: fix the top
      {
         Trans[i] = Cotir / MathPow(10.0, Digits);
         mn   = Cotir;
         mn_j = i;
         SetMax = true;
         SetMin = false;
         Kagi[mx_j] = mx / MathPow(10.0, Digits);
      }
      if(!SetMin && mn + H <= Cotir)              // reversal up by H: fix the bottom
      {
         Trans[i] = Cotir / MathPow(10.0, Digits);
         mx   = Cotir;
         mx_j = i;
         SetMax = false;
         SetMin = true;
         Kagi[mn_j] = mn / MathPow(10.0, Digits);
      }
   }
   return(0);
}


It works, but its results cannot be drawn as lines in the chart window. Besides - and this is beyond my understanding - if the partition step is less than 6 spreads, it outputs nothing at all, even though the arrays are formed correctly. In MQL, as in any home-brew technology, there are plenty of completely unexpected difficulties (starting with comparing doubles, and so on...) and "peculiarities", undocumented of course. I'm grumbling, in general...

Also, I understand how to check the grid in MathCad, but how to do it in MQL I cannot figure out. What if I have a bug somewhere?
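One way to check the grid in MQL that comes to mind (a sketch with made-up names): dump every forward pass to a file and compare the numbers with the MathCad worksheet on the same input vector - whichever layer diverges first is where the bug sits.

// Log the inputs, hidden-layer outputs and final output of one forward pass.
void DumpForwardPass(double &inputs[], double &hidden[], double out)
{
   int h = FileOpen("nn_check.csv", FILE_CSV | FILE_WRITE, ';');
   if(h < 0) return;
   for(int i = 0; i < ArraySize(inputs); i++) FileWrite(h, "in",  i, DoubleToStr(inputs[i], 8));
   for(int j = 0; j < ArraySize(hidden); j++) FileWrite(h, "hid", j, DoubleToStr(hidden[j], 8));
   FileWrite(h, "out", DoubleToStr(out, 8));
   FileClose(h);
}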

 
paralocus wrote >>

Yes, there are difficulties with MQL at every turn. Here's the simplest code, for example, to split the quote flow into a series of transactions:

see attachment - draw - enjoy

paralocus wrote >>

I also know how to check the grid in MathCad, but how to do it in MQL I have no idea. What if I have a bug somewhere?

Why not consider using NeuroShell or Statistica? There you can create networks and wrap them into libraries, then train them from MT and monitor their parameters.
 

It looks like I won't manage to wait for the statistics on my MS to be collected...

I think I will have to give up the idea of presenting prediction-accuracy data in increments of 1 in the number of NS inputs. For example, we could simplify the task by taking the number of inputs only in multiples of 2...