Using Neural Networks in Trading - page 4

 
StatBars >> :

I think it follows from your post that normalisation depends more on the data than on the task at hand.

About the second part: do you mean MA increments and increments of the series?

And generally speaking, do you mean that the trained network should be insensitive to the input data (to each individual input), or that you can change the input data and the network should still make predictions?

Yes, the dependence on the data distribution affects the speed and quality of learning, so normalisation essentially affects both speed and quality. About the second part: no, of course not. You cannot feed a trained network data that is completely different from what it was trained on; the distribution and the type of the data must always stay the same, while describing the process accurately enough. But if you train a network on one set of data that describes the process exactly, and then get very different results from other data that describes the same process just as exactly, it most likely means you have asked the network the wrong question. The data must, first, describe the process fully, and second, you must reduce the generalisation error to a level that is adequate for the quality of generalisation you require of the network. All of this happens practically at the level of intuition. There is no point fiddling with the data type if the process is already fully described; what matters is asking the network the right question.
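To make "normalisation" concrete, here is a minimal sketch (my illustration, not StatBars's code) of the usual z-score scaling of the inputs; the function name and the synthetic series are made up for the example:

    import numpy as np

    def zscore(x):
        # Standardise a series to zero mean and unit variance,
        # so the network sees the shape of the distribution, not its scale.
        x = np.asarray(x, dtype=float)
        return (x - x.mean()) / x.std()

    # Example: scale price increments before feeding them to the network
    prices = np.cumsum(np.random.normal(size=1000)) + 100.0
    inputs = zscore(np.diff(prices))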

 

A word or two about preprocessing.

Source sample: the output correlates with the input (the most significant correlation), corr = 0.64.

On the graph: the X axis is the input data, the Y axis is the required output.

Now remove the linear relationship. You do not need a network to find a linear dependence, and besides, leaving it in will worsen the network's results.

This is what the decorrelated data looks like.
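A minimal sketch of the decorrelation step as described, assuming a one-dimensional input x and target y (NumPy; the names are illustrative, this is not the author's code):

    import numpy as np

    def remove_linear(x, y):
        # Fit y ~ a*x + b by least squares and return the residuals.
        # The network then only has to model what the linear fit misses;
        # keeping (a, b) makes the step exactly reversible.
        a, b = np.polyfit(x, y, deg=1)
        return y - (a * x + b), (a, b)

    # residuals, (a, b) = remove_linear(x, y)
    # y is restored exactly as residuals + a*x + b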

You could also see from the first graph that the density of the data points is concentrated in the centre and sparse towards the edges.

So the points concentrated in the centre provide the main stimulus for training the network: their combined error outweighs the error contributed by the data at the edges. The network will first find the expectation of the sample, which lies exactly in the centre, and only then spread its outputs around it while keeping the error at a minimum.

Therefore the frequency distribution is equalised. This levels out the importance of the individual errors, and the network gets a definite incentive to achieve the smallest error at the centre of the distribution as well as at its edges.

After passing through the sigmoidal function, the input and output data are almost evenly distributed.

This is what the transformed data looks like. It is on this data that the network learns.
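As I read it, the equalising transform is a parameterised sigmoid whose coefficients are tuned until the histogram of the output is flat. A sketch under that assumption (k and m are the hand-adjusted slope and centre mentioned below; not the author's code):

    import numpy as np

    def sigmoid_equalise(x, k, m):
        # Monotonic squashing to (0, 1): u = 1 / (1 + exp(-k*(x - m))).
        # k (slope) and m (centre) are tuned so the histogram of u is flat.
        # The map is invertible: x = m - np.log(1.0/u - 1.0) / k,
        # so no information is lost.
        return 1.0 / (1.0 + np.exp(-k * (x - m)))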

So the data cloud is now evenly distributed. It is worth saying, though, that there are some nuances that do not allow calling such preprocessing optimal for the network.

It is also worth noting that all the transformations are reversible and introduce no inaccuracy.

All methods (in principle) have been discussed in this thread.

 
StatBars wrote >>

The distribution function of the data after the sigmoid conversion: the input and output data are distributed almost evenly.

StatBars, is this procedure automated, or do you have to do it manually - adjust the coefficients of the sigmoid function?

 
Neutron wrote >>

StatBars, is this procedure automated, or do you have to do it manually - adjust the coefficients of the sigmoid function?

The coefficients have to be adjusted by hand, for now... But I plan to automate it... The idea is that if the approximating function is chosen correctly, the resulting distribution will be a rectangle.

I automated the equalisation only through the distribution function, but there are so many "slippery" moments there that I had to abandon it...

 

Yes - I have the same thing.

I need to ask Prival how to get the desired distribution (rectangular) from an arbitrary distribution in analytical form.

And why do you use the sigmoid as the activation function (FA) and not the hyperbolic tangent? The advantages are obvious...
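For reference, the standard analytical recipe for the rectangular distribution is the probability integral transform: if X has a continuous distribution function F, then U = F(X) is uniform on [0, 1]. A minimal check, with a Gaussian purely as an example of an "arbitrary" distribution (SciPy assumed):

    import numpy as np
    from scipy import stats

    x = np.random.normal(loc=0.0, scale=2.0, size=100000)   # arbitrary source distribution
    u = stats.norm.cdf(x, loc=0.0, scale=2.0)               # apply its own CDF to it

    # u is uniform on [0, 1]: every histogram bin holds roughly 10000 points
    print(np.histogram(u, bins=10, range=(0.0, 1.0))[0])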

 
Neutron wrote >>

And, why are you using sigmoid as an FA rather than a hyperbolic tangent? The advantages are obvious...

And the advantages - could you describe them in more detail?

 
Yes, a neuron activated by a symmetric function learns twice as fast. Besides, during training some of the weights take values close to zero, which switches the corresponding synapses off; i.e. the effective number of "working" synapses in a neuron with a sigmoidal FA is always smaller than in one with the hyperbolic tangent. This is not good, because you still have to drag the "dead" synapses back and forth.
 
Neutron wrote >>
Yes, a neuron activated by a symmetric function learns twice as fast. Besides, during training some of the weights take values close to zero, which switches the corresponding synapses off; i.e. the effective number of "working" synapses in a neuron with a sigmoidal FA is always smaller than in one with the hyperbolic tangent. This is not good, because you still have to drag the "dead" synapses back and forth.

But a simple conversion lets you get values from -1 to 1 with the sigmoid as well. There is nothing complicated about it.
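The conversion in question is presumably the affine rescaling 2*s - 1 of the sigmoid output s; in fact 2*sigmoid(2x) - 1 is exactly tanh(x). A quick check:

    import numpy as np

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    x = np.linspace(-5.0, 5.0, 11)
    lhs = 2.0 * sigmoid(2.0 * x) - 1.0   # rescaled sigmoid, now in (-1, 1)
    rhs = np.tanh(x)
    print(np.allclose(lhs, rhs))         # True: the two activations coincide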

 

Yeah, who can argue with that?

It's just a case of doing the same thing the roundabout way.

 
Neutron wrote >>

That's right - I have the same thing.

I need to ask Prival how to get the desired distribution (rectangular) from an arbitrary distribution in analytical form.

And why do you use the sigmoid as the activation function (FA) and not the hyperbolic tangent? The advantages are on the surface, after all...

I use precisely the hyperbolic tangent.