Neural networks. Questions from the experts.

 
alsu:

4) Weight initialisation parameters - the shape of the value distribution and its variance.

Found. It is possible to set the initialisation parameters for the weights.

It looks like I used the default values.


 
LeoV:
A short advanced training course, "Artificial Neural Networks and Genetic Algorithms"

Applications are open for the short advanced training course "Artificial Neural Networks and Genetic Algorithms", conducted by the Department of Continuing Education of M.V. Lomonosov Moscow State University on the basis of the University's Research Institute of Nuclear Physics. Those who complete the course receive a state certificate of advanced professional training.

Classes will be held twice a week, in the evenings from 19:00, starting on February 25, 2011.

To learn more about the course programme and to apply, please follow the link:
http://www.neuroproject.ru/kpk.php
I am sorry, is this an advertisement or will you be taking the course yourself?
 
lasso:

Found. It is possible to set the initialisation parameters for the weights.

I seem to have used the default values.


Well, it seems clear what the issue is here.

Starting from your problem (as I understand it, you have one-dimensional data that you want to split into two classes), the whole task is to find a single point (just one!) on the axis of input values that performs the required split optimally. Now suppose you have a 1-7-1 network. That network has 21 weights (seven input weights to the hidden layer, seven biases for it, and seven weights into the output neuron). So we are trying to find one point while tuning twenty-one variables. With a redundancy of twenty-one to one it is not surprising that the network's outputs drift around so much: the smallest difference in the initial weights leads to a significant scatter of the outputs after training. Roughly speaking, the task is too simple for the network, but since the network does not know that, it keeps looking for something that is not there. Technically this could probably be called overfitting, but in essence it is shooting sparrows with a cannon.

Strictly speaking, the task of splitting one-dimensional data into two classes is successfully performed by a single neuron with one input weight and one bias.
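To make that concrete, here is a minimal standalone sketch (the values of w and b are made up, not taken from any trained net): one logistic neuron with a single input weight and a single bias splits the interval [0;1] into two classes at the point x = b/w.

/* One-neuron classifier for one-dimensional data. */
#include <math.h>
#include <stdio.h>

static int classify( double x, double w, double b )
{
  /* logistic activation; class 1 above the split point, class -1 below it */
  double y = 1.0 / ( 1.0 + exp( -( w * x - b ) ) );
  return y >= 0.5 ? 1 : -1;
}

int main( void )
{
  double w = 10.0, b = 5.0;   /* split point at x = b/w = 0.5 */
  double x;

  for ( x = 0.0; x <= 1.0; x += 0.25 )
    printf( "x=%.2f -> class %d\n", x, classify( x, w, b ) );
  return 0;
}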

 
lasso:

One more thing. I am alarmed by the "narrowness" of the range of current network outputs. To clarify:

-- the MLP network is 1-7-1

-- Inputs to the network are evenly distributed in the range [0;1], outputs in the training examples are 1 and -1.

If after training the whole range of input values is passed through the network, we see that the outputs of the network lie in a very narrow range. For example:

opt_max_act=-0.50401336 opt_min_act=-0.50973881 step=0.0000286272901034

or even so

opt_max_real=-0.99997914 opt_min_real=-0.99999908 step=0.00000010

.............................

Is this correct or not?


It's hard to say whether that is correct... it depends on the situation.

According to your example:

This case means that the first net says "don't know" for every input, while the second one says "class -1" for the same inputs. If the data are the same and the only difference is the initialisation of the weights, the most likely explanation is a strong mixing of the classes, because of which the nets cannot make sense of the training pattern and end up acting "at random". As for how this can happen: I assume that if the network uses bias neurons (it probably does), the net simply zeroes out the weights of all informative inputs and leaves only the bias for its "analysis". The "analysis", of course, is only nominal; the network works on the ostrich principle - it simply does not see the inputs. To confirm or refute this, we need to see the weight matrices of the trained network.
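A tiny illustration of that "ostrich" behaviour (a standalone sketch with made-up numbers, not taken from any trained net): if the weight of the informative input is zeroed out, a logistic neuron's output depends on the bias alone and is the same constant for every input.

/* Neuron that "does not see" its input: zero input weight, bias only. */
#include <math.h>
#include <stdio.h>

int main( void )
{
  double w = 0.0;      /* weight of the informative input driven to zero */
  double bias = 0.1;   /* only the bias is left for "analysis" */
  double x;

  for ( x = 0.0; x <= 1.0; x += 0.5 )
    printf( "x=%.1f  out=%.6f\n", x, 1.0 / ( 1.0 + exp( -( w * x + bias ) ) ) );
  /* every line prints the same value - the input makes no difference */
  return 0;
}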

 

Here is the MLP neural network code generated by Statistica:

/* ------------------------------------------------------------------------- */


#include <stdio.h>
#include <math.h>
#include <string.h>
#include <stdlib.h>

#ifndef FALSE
#define FALSE 0
#define TRUE 1
#endif

#define MENUCODE -999


static double NNCode38Thresholds[] =
{

/* layer 1 */
-0.78576109762088242, -0.23216582173469763, -1.6708808507320108, -1.525614113040888,
1.4153558659332133, -0.77276960668316319, 2.3600992937381298, 2.473963708568014,
-0.43422405325901231, 0.68546943611132893, 0.19836417975077064, 0.26461366779934564,
-0.19131682804149783, 0.24687125804149584, -0.95588612620053504, 0.25329560565058901,
-1.0054817062488075, 1.3224622867600988, 0.88115523574528376, 0.32309684489223067,
0.52538428519764313,

/* layer 2 */
-1.8292886608617505

};

static double NNCode38Weights[] =
{

/* layer 1 */
1.8660729426318707,
1.3727568288578245,
3.1175074758006374,
3.356836518157698,
3.2574311486418068,
3.2774957848884769,
1.4284147042568165,
3.534875314491805,
2.4874577673065557,
2.1516346524000403,
1.9692127720516106,
4.3440737376517129,
2.7850179803408932,
-12.654434243399631,
2.4850018642785399,
2.1683631515554227,
1.77850226182071,
2.1342779960924272,
2.8753050022428206,
3.9464397902669828,
2.5227540467556553,

/* layer 2 */
-0.041641949353302246, -0.099151657230575702, 0.19915689162090328, -0.48586373846026099,
-0.091916813099494746, -0.16863091580772138, -0.11592356639654273, -0.55874391921850786,
0.12335845466035589, -0.022300206392803789, -0.083342117374385544, 1.550222748978116,
0.10305706982775611, 3.9280003726494575, 0.12771097131123971, -0.12144621860368633,
-0.40427171889553365, -0.072652508364580259, 0.20641498115269669, 0.1519896468808962,
0.69632055946019444

};

static double NNCode38Acts[46];

/* ---------------------------------------------------------- */
/*
  NNCode38Run - run neural network NNCode38

  Input and Output variables.
  Variable names are listed below in order, together with each
  variable's offset in the data set at the time code was
  generated (if the variable is then available).
  For nominal variables, the numeric code - class name
  conversion is shown indented below the variable name.
  To provide nominal inputs, use the corresponding numeric code.
  Input variables (Offset):
  stoch

  Output:
  res
    1=1
    2=-1

*/
/* ---------------------------------------------------------- */

void NNCode38Run( double inputs[], double outputs[], int outputType )
{
  int i, j, k, u;
  double *w = NNCode38Weights, *t = NNCode38Thresholds;

  /* Process inputs - apply pre-processing to each input in turn,
   * storing results in the neuron activations array.
   */

  /* Input 0: standard numeric pre-processing: linear shift and scale. */
  if ( inputs[0] == -9999 )
    NNCode38Acts[0] = 0.48882189239332069;
  else
    NNCode38Acts[0] = inputs[0] * 1.0204081632653061 + 0;

  /*
   * Process layer 1.
   */

  /* For each unit in turn */
  for ( u=0; u < 21; ++u )
  {
    /*
     * First, calculate post-synaptic potentials, storing
     * these in the NNCode38Acts array.
     */

    /* Initialise hidden unit activation to zero */
    NNCode38Acts[1+u] = 0.0;

    /* Accumulate weighted sum from inputs */
    for ( i=0; i < 1; ++i )
      NNCode38Acts[1+u] += *w++ * NNCode38Acts[0+i];

    /* Subtract threshold */
    NNCode38Acts[1+u] -= *t++;

    /* Now apply the logistic activation function, 1 / ( 1 + e^-x ).
     * Deal with overflow and underflow
     */
    if ( NNCode38Acts[1+u] > 100.0 )
       NNCode38Acts[1+u] = 1.0;
    else if ( NNCode38Acts[1+u] < -100.0 )
      NNCode38Acts[1+u] = 0.0;
    else
      NNCode38Acts[1+u] = 1.0 / ( 1.0 + exp( - NNCode38Acts[1+u] ) );
  }

  /*
   * Process layer 2.
   */

  /* For each unit in turn */
  for ( u=0; u < 1; ++u )
  {
    /*
     * First, calculate post-synaptic potentials, storing
     * these in the NNCode38Acts array.
     */

    /* Initialise hidden unit activation to zero */
    NNCode38Acts[22+u] = 0.0;

    /* Accumulate weighted sum from inputs */
    for ( i=0; i < 21; ++i )
      NNCode38Acts[22+u] += *w++ * NNCode38Acts[1+i];

    /* Subtract threshold */
    NNCode38Acts[22+u] -= *t++;

    /* Now calculate negative exponential of PSP
     */
    if ( NNCode38Acts[22+u] > 100.0 )
       NNCode38Acts[22+u] = 0.0;
    else
      NNCode38Acts[22+u] = exp( -NNCode38Acts[22+u] );
  }

  /* Type of output required - selected by outputType parameter */
  switch ( outputType )
  {
    /* The usual type is to generate the output variables */
    case 0:


      /* Post-process output 0, two-state nominal output */
      if ( NNCode38Acts[22] >= 0.05449452669633785 )
        outputs[0] = 2.0;
      else
        outputs[0] = 1.0;
      break;

    /* type 1 is activation of output neurons */
    case 1:
      for ( i=0; i < 1; ++i )
        outputs[i] = NNCode38Acts[22+i];
      break;

    /* type 2 is codebook vector of winning node (lowest actn) 1st hidden layer */
    case 2:
      {
        int winner=0;
        for ( i=1; i < 21; ++i )
          if ( NNCode38Acts[1+i] < NNCode38Acts[1+winner] )
            winner=i;

        for ( i=0; i < 1; ++i )
          outputs[i] = NNCode38Weights[1*winner+i];
      }
      break;

    /* type 3 indicates winning node (lowest actn) in 1st hidden layer */
    case 3:
      {
        int winner=0;
        for ( i=1; i < 21; ++i )
          if ( NNCode38Acts[1+i] < NNCode38Acts[1+winner] )
            winner=i;

        outputs[0] = winner;
      }
      break;
  }
}
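For reference, the generated function can be driven by a small test harness like the one below (my own addition, not part of the Statistica output; compile it together with the generated file). It sweeps the single input over [0;1] and prints the min/max of the raw output activation (outputType 1), i.e. the same kind of opt_min_act / opt_max_act numbers quoted earlier.

/* Test harness for NNCode38Run - not generated by Statistica. */
#include <stdio.h>

void NNCode38Run( double inputs[], double outputs[], int outputType );

int main( void )
{
  double in[1], out[1];
  double lo = 1e300, hi = -1e300;
  int i;

  for ( i = 0; i <= 1000; ++i )
  {
    in[0] = i / 1000.0;            /* inputs evenly spaced over [0;1] */
    NNCode38Run( in, out, 1 );     /* outputType 1 = raw output activation */
    if ( out[0] < lo ) lo = out[0];
    if ( out[0] > hi ) hi = out[0];
  }

  printf( "opt_min_act=%.8f opt_max_act=%.8f\n", lo, hi );
  return 0;
}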
 
alsu:

It's hard to say whether that is correct... it depends on the situation.

...I assume that if the network uses bias neurons (it probably does), the net simply zeroes out the weights of all informative inputs and leaves only the bias for its "analysis"... To confirm or refute this, we need to see the weight matrices of the trained network.

One more thing: FANN does apply a bias shift in every layer except the input one...

But I did not find anything resembling a bias in the description of the neural network package of Statistica 6.

For a neural network beginner, all these biases are really mind-boggling...

 

Yes, very similar to what I said, only the other way round. The network simply gets lost in the data. Note that from the network architecture it follows that all the first-layer weights are equivalent with respect to the input data and should, in theory, be distributed evenly around zero - but, as you can see in the picture, they were driven upwards, which pushed the hidden-layer neurons into saturation (you have a logistic activation function). The activation thresholds did not help, because they stayed around zero, and so did the threshold of the output neuron, which, as one would expect, could not make anything of what the first layer was telling it - though we have already worked out what happened to that neuron.
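To illustrate the saturation point (a standalone sketch; the two weights below are made up, the large one merely being of the same order as the biggest first-layer weight in the code above): a logistic unit with a large weight and a near-zero threshold spends most of the input range [0;1] pressed against 0 or 1 instead of working in its near-linear region.

/* Saturation of a logistic unit over the input range [0;1]. */
#include <math.h>
#include <stdio.h>

static double logistic( double x ) { return 1.0 / ( 1.0 + exp( -x ) ); }

int main( void )
{
  double w_small = 0.5;    /* weight near zero: unit stays in its linear region */
  double w_large = 12.0;   /* weight of the size actually learned: unit saturates */
  double t = 0.25;         /* threshold close to zero, as in the trained net */
  double x;

  for ( x = 0.0; x <= 1.0001; x += 0.25 )
    printf( "x=%.2f   small w: %.4f   large w: %.4f\n",
            x, logistic( w_small * x - t ), logistic( w_large * x - t ) );
  return 0;
}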


 

Great!!!

The weight and threshold values presented as a diagram - that gives a very different perspective. Thank you.

 
lasso:

Here is the MLP neural network code generated by Statistica:

Good afternoon!

Could you tell me briefly: is it possible, with a programmer's help, to learn how to compile a dll file from the C file with the neural network that Statistica generates? I mean having the procedure explained once, so that afterwards I can do it myself by the same scheme. My programming is only school-level Basic, and the neural network model has to work on forex, but I need to update the network regularly, i.e. generate a new dll each time. Correcting the MQL code by hand every time is rather complicated.

 
alexeymosc:

Good afternoon!

Is it possible, with a programmer's help, to learn how to compile a dll file from the C file with the neural network generated by Statistica?

Good night!

I think not - unless, of course, that programmer works for Statistica itself ))

alexeymosc:

In MQL code, correcting it every time is a bit complicated.

What kind of neural network do you use in Statistica?

If you correct something manually, it means there is an algorithm, so it needs to be automated....
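Regarding the dll question, one possible route (just a sketch - the file names, the build command and the MQL import lines are my assumptions, not something Statistica produces) is to keep a small wrapper file that exports the generated function, and rebuild the dll each time a fresh C file is generated:

/* nncode38_dll.c - hypothetical wrapper around the generated file,
 * assuming the Statistica output is saved next to it as nncode38.c.
 * Example build with 32-bit MinGW:  gcc -shared -o nncode38.dll nncode38_dll.c
 * Example import on the MQL4 side:
 *   #import "nncode38.dll"
 *     void NNCode38RunDll( double &inputs[], double &outputs[], int outputType );
 *   #import
 */
#include "nncode38.c"   /* pulls in NNCode38Run and its weight tables */

__declspec(dllexport) void __stdcall NNCode38RunDll( double inputs[],
                                                     double outputs[],
                                                     int outputType )
{
  NNCode38Run( inputs, outputs, outputType );
}

Once such a wrapper is in place, updating the network only means overwriting nncode38.c and rerunning the same build command.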

...............................

Above I was advised to use GA, and just today, with the help of joo's library (UGALib), I managed to get the desired, stable result.

Now I will carry this over to MT4...

My deepest gratitude to Andrei (its author). A very promising and flexible direction.

.....................

Maybe it is worth digging in this direction?