Market etiquette or good manners in a minefield

 
Neutron >> :

That's right.

But the input of each perceptron has a separate additional input for a constant +1 offset. This speeds up learning and increases the power of the Network.

Or, if the input layer is not counted, then taking into account everything said above:



If there are any mistakes, please correct me.

 
Neutron >> :

That's right.

But the input of each perceptron has a separate additional input for a constant +1 offset. This speeds up learning and increases the power of the Network.

Is it like a gimmick that replaces the neuron threshold without increasing the number of configurable parameters? Cool, first time I've seen it but I like it :)

 

And where is the constant input bias of each neuron?

paralocus wrote >>

As far as I have been able to understand you, the figure shows the optimal NN architecture for the market.

This is my understanding. Perhaps it is not true. But the results of numerical experiments confirm this statement.

The number of inputs is 12 and the number of synapses is 4, so by the formula Popt=k*w*w/d we get 144/4 = 36... Is that 36 bars? Or 36 nearest Buy/Sell situations? Have I got it right?

Consider carefully: Number of all synapses in your architecture: w=12+4=16.

Number of inputs: d=3 (not 4*3, but only 3).

The optimal length of the training sample: Popt = k*w*w/d = 4*16*16/3 ≈ 340 samples per time series (you have 4 samples at each input neuron). They can be bars or indicator values, or they can be transaction counts, and it's up to you to decide which gives better predictability... Remember that predictability enters the MTS rate of return to the 4th power! A very strong dependence (see the beginning of this thread).
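As a sanity check, here is a minimal sketch (Python, with illustrative names) of that rule of thumb; the exact value of 4*16*16/3 is about 341, which the thread rounds to roughly 340:

```python
# Hypothetical helper illustrating the rule of thumb Popt = k * w^2 / d
# discussed above. Names and sample values are for illustration only.

def optimal_training_length(w: int, d: int, k: float = 4.0) -> int:
    """w = total number of synapses, d = number of inputs,
    k = empirical 'market variability' constant (2..4 in this thread)."""
    return round(k * w * w / d)

# The architecture as counted above: w = 12 + 4 = 16 synapses, d = 3 inputs.
print(optimal_training_length(w=16, d=3, k=4))   # 341, i.e. roughly 340 samples
```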

 
paralocus wrote >>

Or, if the input layer is not counted, then taking into account everything said above:

If there are any mistakes, please correct me.

I don't get it!

Why are you not counting the input layer? Doesn't it take part in learning and prediction?

It's best to have two layers: a hidden layer (which also acts as the input layer) and an output layer. With this architecture you have w = 4*4+5 = 21, d = 4 and P = 4*21*21/4 = 441 samples.
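A short sketch of how those numbers come together for the two-layer variant, under the assumption that the hidden layer has 4 neurons with 4 inputs each and the output neuron takes the 4 hidden outputs plus a constant +1 bias:

```python
# Parameter count for the two-layer architecture described above (assumed layout).

d = 4                            # inputs per hidden neuron
hidden = 4                       # neurons in the hidden layer
w = hidden * d + (hidden + 1)    # 4*4 + 5 = 21 synapses in total
k = 4                            # empirical market-variability constant
P = round(k * w * w / d)         # 4 * 21^2 / 4 = 441 samples
print(w, P)
```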

 
TheXpert wrote >>

Is it kind of a gimmick that replaces the neuron threshold without increasing the number of parameters that can be adjusted?

FION wrote >>

I see. The constant offset simply shifts the activation point on the hypertangent curve slightly.

Generally correct, but to be precise: when another batch of data arrives at the NN input, we assume it is not centered (its expectation MO != 0). That's why we introduce an additional constant input at each neuron. During training, each neuron selects a weight for this input that compensates a possible shift of its input data. This lets learning start, statistically, from the centre of the imaginary data cloud, and therefore proceed faster.
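A minimal sketch of such a neuron (Python with NumPy; all names and values are illustrative): the constant +1 is simply appended to the inputs, and its weight is trained like any other and ends up absorbing the offset of the data.

```python
import numpy as np

# One neuron with the constant +1 bias input discussed above (illustrative only).

def neuron_output(x: np.ndarray, weights: np.ndarray) -> float:
    """x: d inputs; weights: d+1 values, the last one multiplying the constant +1."""
    x_ext = np.append(x, 1.0)          # append the constant +1 input
    return np.tanh(x_ext @ weights)    # hypertangent activation

x = np.array([0.3, -0.1, 0.7])
w = np.array([0.5, -0.2, 0.1, 0.05])   # the last weight is the learned offset
print(neuron_output(x, w))
```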

 
Neutron >> :

And where is the constant bias at the input of each neuron?

That's what I think. This may not be true. But the results of numerical experiments confirm this statement.

Let's count carefully: Number of all synapses in your architecture: w=12+4=16

Number of inputs: d=3 (not 4*3, but only 3).

The optimal length of the training sample: Popt = k*w*w/d = 4*16*16/3 ≈ 340 samples per time series (you have 4 samples at each input neuron). They can be bars or indicator values, or they can be transaction counts, and it's up to you to decide which gives better predictability... Remember that predictability enters the MTS rate of return to the 4th power! A very strong dependence (see the beginning of this thread).

Popt=k*w*w/d, where k is a dimensionless constant of order 1 and accounts for the fact that the market is volatile.

Then in this formula d is the number of inputs of one hidden-layer neuron, and k is the number of neurons in the hidden layer? Sorry, I somehow find it hard to believe that the network can learn on 340 bars. That's very little... I must have misunderstood something.

Until now I was only familiar with the simplest perceptron, "trained" in the MT4 terminal's tester with a genetic algorithm. You have to check it on at least some significant history (2 or 3 months). Of course, I understand that the genetic algorithm doesn't actually teach the perceptron anything; it just picks the most suitable coefficients, and does so with very low effectiveness since it acts blindly. Well, never mind. That was a lyrical digression.


Did I understand correctly that the unit (+1) inputs should also have their own weighting factors? And how can I "whiten" the inputs? That is, suppose I have RSI normalized by the hypertangent, with an expectation as high as 0.21, at the input. If I do the following: f(t) = th(RSI(i)*kf), where kf > 1 is a specially selected coefficient that levels the probability density function at the price of some distortion of the input signal, will that be OK or not?

What are transactional counts?
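For reference, a small sketch (Python/NumPy, with invented numbers) of the transformation paralocus describes: scaling before the hypertangent spreads the values over more of the (-1, 1) range, at the price of some distortion.

```python
import numpy as np

# Illustration of f = tanh(kf * RSI) with kf > 1, as asked above.
# The 'RSI' series here is synthetic, already rescaled to roughly [-1, 1].

rsi = np.clip(np.random.normal(loc=0.21, scale=0.25, size=10_000), -1, 1)

kf = 3.0                      # arbitrary example value for the stretching coefficient
f = np.tanh(kf * rsi)

# Compare how much of the available range each series actually uses.
print("raw  std:", round(float(rsi.std()), 3))
print("tanh std:", round(float(f.std()), 3))   # the stretched series spreads out more
```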

 
Neutron >> :


By the way, for everyone interested: the strategy of "cut losses and let profits run" or "cut profits and let losses run" (depending on whether the market is trending or flat on the chosen trading horizon) is not optimal when reinvesting capital. In that case it is more profitable to lock in the result at every step and reinvest! I.e. if we have 10 consecutive profitable transactions, it is more profitable to pay the brokerage commission on each of them and reinvest than to hold one position the whole time and save on the spread.

A paradox of sorts, which may lead us to the Bernoullization of transactions and, after that, to effective use of the basic equation of trading in analytical form (unlike Vince), without any parameterization problems.

This is not so much a paradox as a property of MM with reinvestment. The efficiency of this MM depends, among other things, on the number of trades. The return of this MM is the geometric mean of the per-trade returns raised to the power of the number of trades. With a small number of trades its profitability loses to a simple MM, but if we manage to survive over a large number of trades (play long), the return can be larger. As always, though, nothing comes for free. The price is asymmetric leverage and its consequence: a long period of low income compared to a simple MM.
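A toy comparison, under simplified assumptions and with invented per-trade returns, of the two MM variants discussed here:

```python
# Simple MM (fixed stake) vs reinvesting the whole balance on every trade.
# Per-trade returns below are invented purely for illustration.

returns = [0.01, -0.005, 0.012, 0.008, -0.004] * 40   # 200 per-trade returns

# Simple MM: every trade's profit is taken on the same fixed stake.
simple_equity = 1.0 + sum(returns)

# Reinvesting MM: equity compounds, so total growth is the product of (1 + r),
# i.e. the geometric mean per trade raised to the power of the number of trades.
reinvest_equity = 1.0
for r in returns:
    reinvest_equity *= (1.0 + r)

print(f"simple MM:   {simple_equity:.3f}")
print(f"reinvest MM: {reinvest_equity:.3f}")   # pulls ahead as trades accumulate
```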

 
paralocus wrote >>

Popt=k*w*w/d, where k is a dimensionless constant of order 1 and accounts for the fact of market variability.

Then in this formula d is the number of inputs of one neuron of the hidden layer, and k is the number of neurons in the hidden layer? Sorry, I somehow find it hard to believe that the network can learn on 340 bars. It's very small... I must have misunderstood something.

Until now I was only familiar with the simplest perceptron, "trained" in the MT4 terminal's tester with a genetic algorithm. You have to check it on at least some significant history (2 or 3 months). Of course, I understand that the genetic algorithm doesn't actually teach the perceptron anything; it just picks the most suitable coefficients, and does so with very low effectiveness since it acts blindly. Well, never mind. That was a lyrical digression.

Did I get it right that the unit (+1) inputs should also have their own weighting coefficients? And how can we "whiten" the inputs? Suppose I have a hypertangent-normalized RSI with an expectation as high as 0.21 at the input. If I do the following: f(t) = th(RSI(i)*kf), where kf > 1 is a specially selected coefficient that levels the probability density function at the price of some distortion of the input signal, will that be OK or not?

What are transactional counts?

Paralocus, are you afraid of making a mistake? Drop it! - Try it this way and that way, and see the result - everything will fall into place.

k is not a count of neurons or inputs; it is an empirical characteristic of the Market, its variability, and is chosen in the range of 2 to 4. If the Market were stationary, k could be taken as 10 or even 20, which would mean approaching the asymptotic regime of Network training. Unfortunately, the Market can be called stationary only in its non-stationarity, so this coefficient should be taken as small as the NN retraining process allows. Hence the range for k given above.

Your genetic algorithm is a kind of stochastic learning method with elements of gradient descent (if I'm not mistaken). Not a bad thing, but it loses to ORO (error backpropagation) in learning speed. Abandon the genetic algorithm in favour of error backpropagation: learning will be more efficient, and there is no limit on the number of inputs and synapses of the Network.

The unit (+1) inputs have their own coefficients, which are trained as usual and do not differ in properties from the other inputs.

Whitening of inputs is the elimination of correlation dependencies between them. Before applying this procedure, first convince yourself that such correlation actually exists.
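A minimal whitening sketch (Python/NumPy, synthetic data, illustrative names only): center the inputs, diagonalize their covariance, and rescale to unit variance.

```python
import numpy as np

def whiten(X: np.ndarray) -> np.ndarray:
    """X: samples x features. Returns decorrelated, unit-variance features."""
    Xc = X - X.mean(axis=0)                    # center each input
    cov = np.cov(Xc, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)       # eigen-decomposition of the covariance
    return Xc @ eigvec / np.sqrt(eigval + 1e-12)

X = np.random.randn(1000, 3)
X[:, 2] = 0.7 * X[:, 0] + 0.3 * X[:, 2]        # make the inputs deliberately correlated
print(np.corrcoef(whiten(X), rowvar=False).round(2))   # close to the identity matrix
```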

A transaction is the act of buying or selling an asset in the market, i.e. a deal, a "take" (not in the criminal sense :-)

 
Neutron >> :

Abandon the genetic algorithm in favour of error backpropagation: learning will be more efficient, and there is no limit on the number of inputs and synapses of the Network.


That's it, I give in. I've sat down to write the network with ORO. There may be some questions about ORO itself.
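For orientation, a minimal error-backpropagation (ORO) sketch for a two-layer tanh network with constant +1 bias inputs (Python/NumPy; layer sizes, learning rate and data are placeholders, not the values discussed in this thread):

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden = 4, 2
W1 = rng.normal(scale=0.1, size=(hidden, d + 1))   # +1 column for the bias input
W2 = rng.normal(scale=0.1, size=(hidden + 1,))     # output neuron weights (+ bias)

def forward(x):
    h = np.tanh(W1 @ np.append(x, 1.0))            # hidden layer with +1 bias
    y = np.tanh(W2 @ np.append(h, 1.0))            # output neuron
    return h, y

def train_step(x, target, lr=0.05):
    h, y = forward(x)
    err = y - target
    delta_out = err * (1.0 - y * y)                # derivative of tanh at the output
    delta_hid = delta_out * W2[:-1] * (1.0 - h * h)
    W2[:] -= lr * delta_out * np.append(h, 1.0)    # update output weights in place
    W1[:] -= lr * np.outer(delta_hid, np.append(x, 1.0))
    return err

for step in range(200):
    x = rng.normal(size=d)
    train_step(x, target=np.tanh(x.sum()))         # toy target just to exercise the loop
```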

 
paralocus wrote >>

There may be some questions about ORO itself.

No problem!

By the way, let's take a closer look at the architecture of your Network.

You have a committee of three independent two-layer networks joined by an output neuron (hence, a committee). But each network in your committee contains just one neuron in its input layer, which is wrong, because such an architecture does not differ from a single-layer perceptron in computational power. That is also why you have three inputs (4 including bias) instead of 12. Once again: you have created an analogue of a board of directors, where the Chairman (the output neuron) picks the "correct" answer by general vote, and each voter is represented by a single neuron. Such an architecture will not provide a trading advantage. To do it properly, give each committee member at least two input neurons; this will allow the nonlinearity of the activation function (FA) to be exploited fully and noticeably increase the predictive power of the committee.

You see how much AI and we have in common... In fact, voting at a Komsomol meeting is nothing more than the optimal scheme of collective behaviour in terms of reaching the goal fastest at the lowest cost!

Note that the output of the committee has no non-linear activation function; it is simply an adder, and its job is to make a decision based on the voting results. Thus, this architecture is closest to your idea and is a committee of two-layer nonlinear networks with one hidden layer. The number of neurons in the hidden layer may be increased, improving prediction accuracy, but we must remember that the required training sample length grows quadratically, and very soon a point is reached where the benefit of each additional neuron shrinks and even degrades the forecasting ability of the network. From my numerical experiments, the optimum is no more than 2-4 neurons in the hidden layer.
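A forward-pass sketch of that committee (Python/NumPy; sizes and weights are illustrative, and the Buy/Sell mapping is an assumption, not something fixed in the thread):

```python
import numpy as np

# Three independent two-layer tanh networks, each with a hidden layer and +1 bias
# inputs, joined by a plain adder whose sign is the committee's "vote".

rng = np.random.default_rng(1)
d, hidden, members = 4, 2, 3

def member_output(x, W1, w2):
    h = np.tanh(W1 @ np.append(x, 1.0))        # hidden layer with +1 bias input
    return np.tanh(w2 @ np.append(h, 1.0))     # member's nonlinear output

committee = [(rng.normal(size=(hidden, d + 1)), rng.normal(size=hidden + 1))
             for _ in range(members)]

def committee_decision(x):
    votes = [member_output(x, W1, w2) for W1, w2 in committee]
    return np.sign(sum(votes))                 # the output is a plain adder, no activation

print(committee_decision(rng.normal(size=d)))  # e.g. +1 = Buy, -1 = Sell
```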

For the given architecture, the optimal training sample length is P = 1500 samples.

P.S. It looks nice. I mean the picture. I get aesthetic pleasure!