Neural networks. Questions from the experts.

 

A "small" problem has arisen.

If we feed the same training set (TS) to the same network, but initialize it with random weights before training,

then each time the network may train differently, with different results on the same test sample.

The FANN library is used.

I decided to check how Statistica 6 would behave in a similar situation.

And here, too, networks with the same architecture give different results.

For illustration I chose the two networks with the most divergent results. You can see that their performance is exactly opposite.


Graphs of the activation thresholds confirm that these two networks were trained on the same training set in completely different ways.

(Full results for all networks and data for all samples are attached)


...............................................

Any suggestions on how to achieve stability in the neural network learning process?
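Below is a minimal sketch of the effect described above. It uses Python with scikit-learn rather than FANN, and a synthetic dataset standing in for the attached TS, so the numbers are only illustrative; the point is that only the weight initialization (the seed) changes between runs.

# Minimal sketch (Python/scikit-learn, not FANN): same data, same architecture,
# different random weight initialization -> different test results.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)  # stand-in for the TS
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for seed in range(5):
    net = MLPClassifier(hidden_layer_sizes=(7,), max_iter=500, random_state=seed)
    net.fit(X_train, y_train)
    print(f"seed={seed}  test accuracy={net.score(X_test, y_test):.3f}")

# Fixing the seed makes a single run reproducible, but by itself it does not
# make the trained network any better -- it only pins down which solution you get.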

 
lasso:

A "small" problem has emerged.....

the network may train differently each time, with different results on the same test sample....

Can you tell me how to achieve stability in neural network training process?


So that is the question of questions) Many methods are used for training NNs, but, for obvious reasons, none of them is a direct exhaustive search. And they all share one essential drawback: paralysis, or getting stuck in a local extremum. There is no universal solution, other than improving the quality of the training mechanism/algorithm and increasing the training time (the number of training epochs). And in each case it is solved differently.
 
lasso:

Can you tell me how to achieve stability in the learning process of a neural network?

Use GAs.
 
lasso:


Can you tell me how to achieve stability in the learning process of a neural network?


SVM.

An example for two classes:

There are many possible separating planes.... an MLP trained with backpropagation will find any one of them and stop .....

As for the SVM:

this method will always find the one, maximum-margin, separating plane ....
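To make that contrast concrete, here is a small sketch (Python/scikit-learn, synthetic points, all names illustrative): for a linearly separable two-class set, a linear-kernel SVM recovers the same maximum-margin separating plane on every run, with no dependence on a random seed.

# Two linearly separable classes: a linear-kernel SVM finds the same
# maximum-margin separating plane every time it is trained on the same data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
class_a = rng.normal(loc=[-2, -2], scale=0.5, size=(50, 2))
class_b = rng.normal(loc=[+2, +2], scale=0.5, size=(50, 2))
X = np.vstack([class_a, class_b])
y = np.array([0] * 50 + [1] * 50)

for run in range(3):
    svm = SVC(kernel="linear", C=1.0)
    svm.fit(X, y)
    w, b = svm.coef_[0], svm.intercept_[0]
    print(f"run {run}: w={np.round(w, 4)}, b={b:.4f}")  # identical on every run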

Or GA as suggested above....

Good luck ....

 
Figar0:

So that is the question of questions) Many methods are used for training NNs, but, for obvious reasons, none of them is a direct exhaustive search. And they all share one essential drawback: paralysis, or getting stuck in a local extremum. There is no universal solution, other than improving the quality of the training mechanism/algorithm and increasing the training time (the number of training epochs). And in each case it is solved differently.

If it is about getting stuck in a local extremum, then I think the results should all be "good" and differ only within some range: "better", "worse"...

But not change the test results drastically! Do you understand?

Here are the results of the runs over a one-month test period:

-9337

+5060

+14522

+7325

+12724

-3475

+10924

-9337

+5060

-3475

-9337

-3475

................................

Here the foreign colleagues advise using committees of networks,

but I do not think that is the best solution...

Especially since, let me remind you, the training-set data proposed in the problem is quite easily separable by linear methods.

Is it really impossible to find a simple and stable solution in the form of a NN?
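For reference, this is roughly what the committee idea mentioned above amounts to (a hedged sketch in Python/scikit-learn with synthetic data, not a recommendation): train several nets from different random initializations and average their outputs, which smooths out the run-to-run scatter at the cost of extra training.

# Committee sketch: average the predictions of several independently
# initialized nets instead of trusting any single run.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

members = []
for seed in range(7):                       # 7 committee members, different initializations
    net = MLPClassifier(hidden_layer_sizes=(7,), max_iter=500, random_state=seed)
    members.append(net.fit(X_tr, y_tr))

# Average the class-1 probabilities and threshold at 0.5 (simple soft voting).
avg_proba = np.mean([m.predict_proba(X_te)[:, 1] for m in members], axis=0)
committee_pred = (avg_proba > 0.5).astype(int)
print("committee accuracy:", np.mean(committee_pred == y_te))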

 

I did not quite understand the point about the GA: what is it applied to search for?

...............

That is, should the GA be applied not in addition to the NN, but instead of it?

 
lasso:

I did not quite understand the point about the GA: what is it applied to search for?

...............

That is, should the GA be applied not in addition to the NN, but instead of it?


A GA can be used to select the NN weights, and anything at all can be used as the fitness function... Search for EMA GA, as far as I remember...
But to be honest, I don't see how a GA will help you here: it can also stop at different points, just like the NN...

And in general, honestly, this is a normal phenomenon, as long as the results don't differ too much...
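As an illustration of "a GA selects the NN weights", here is a simplified evolutionary sketch in Python/numpy (selection, uniform crossover, mutation, elitism) over the 22 weights of a 1-7-1 tanh net on toy data. The fitness function here is just the negative training MSE, but, as noted above, anything could be plugged in instead.

import numpy as np

rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 200).reshape(-1, 1)        # single input in [0;1]
y = np.where(X[:, 0] > 0.5, 1.0, -1.0)               # targets are +1 / -1

def forward(w, x):
    # unpack a flat chromosome into 1-7-1 weights: (1*7 + 7) + (7*1 + 1) = 22 genes
    w1, b1 = w[:7].reshape(1, 7), w[7:14]
    w2, b2 = w[14:21].reshape(7, 1), w[21]
    h = np.tanh(x @ w1 + b1)
    return np.tanh(h @ w2 + b2).ravel()

def fitness(w):
    return -np.mean((forward(w, X) - y) ** 2)         # higher is better

pop = rng.normal(size=(40, 22))                       # 40 random chromosomes
for gen in range(200):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-10:]]           # keep the 10 fittest (elitism)
    idx_a = rng.integers(0, 10, size=30)
    idx_b = rng.integers(0, 10, size=30)
    mask = rng.random((30, 22)) < 0.5                 # uniform crossover
    children = np.where(mask, parents[idx_a], parents[idx_b])
    children += rng.normal(scale=0.1, size=(30, 22))  # mutation
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("best training MSE:", -fitness(best))

Note that the caveat in the post above still applies: different GA runs can also end up at different points unless the seed is fixed.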

 
lasso:

If it is about getting stuck in a local extremum, then I think the results should all be "good" and differ only within some range: "better", "worse"...

But not change the test results drastically! Do you understand?


Here it is more likely that the network is not over- but under-trained. The reason, apparently, is an architecture that is not good enough.

Although overtraining is also possible: if the network structure is excessively redundant and the initial weights are random, the net may get stuck in a different extremum each time, hence the big difference in results.

 
alsu:

Here it is more likely that the network is not over- but under-trained. The reason, apparently, is an architecture that is not good enough.

Although overtraining is also possible: if the network structure is excessively redundant and the initial weights are random, the net may get stuck in a different extremum each time, hence the big difference in results.

What data or results do I need to provide so that you can determine specifically where the snag is?
 

One more thing. I am also worried by the "narrowness" of the range of the current network outputs. To clarify:

-- the network is an MLP 1-7-1;

-- the network inputs are uniformly distributed in the range [0;1]; the outputs in the training examples are given by the values 1 and -1.

If, after training, the whole range of input values is passed through the network, we see that the network outputs lie in a very narrow range. For example:

opt_max_act=-0.50401336 opt_min_act=-0.50973881 step=0.0000286272901034

or even like this:

opt_max_real=-0.99997914 opt_min_real=-0.99999908 step=0.00000010

.............................

Is this correct or not?
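To answer that one would need the actual data, but the check described above looks roughly like this (a Python/scikit-learn stand-in for the FANN net, on toy data): sweep the whole input range through the trained 1-7-1 net and look at the spread of the outputs. With targets of +1 and -1, an output band only a few thousandths wide, as in the numbers quoted, means the net is effectively producing a constant.

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train = rng.uniform(0.0, 1.0, size=(300, 1))            # inputs in [0;1]
y_train = np.where(X_train[:, 0] > 0.5, 1.0, -1.0)        # targets are +1 / -1

net = MLPRegressor(hidden_layer_sizes=(7,), activation="tanh",
                   max_iter=2000, random_state=0)
net.fit(X_train, y_train)

grid = np.linspace(0.0, 1.0, 1000).reshape(-1, 1)          # the whole input range
out = net.predict(grid)
print(f"max={out.max():.8f}  min={out.min():.8f}  spread={out.max() - out.min():.8f}")
# A usable net here should spread its outputs over most of the distance between
# the two target values; a spread of ~0.005, as in the post above, is essentially
# a constant output (an under-trained or saturated network).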