Machine learning in trading: theory, models, practice and algo-trading - page 1980
And how is memory organized?
Depends on where.
If you've understood it right away, then I'm waiting for your explanation :)
http://peterbloem.nl/blog/transformers
Hi all, I didn't post the video directly to the forum thread, but posted it on my blog. WARNING: non-normative language. For those who are really interested in the market...
https://www.mql5.com/ru/blogs/post/739164
Before this I only wandered around in the forests, I never used NNs.....
Yeah, me neither... That's why I keep talking about a block diagram, so that we can understand how it works at least at the level of pictures.
I spent two days trying to figure out what a Kohonen layer is,
and it turned out to be just a primitive autoencoder.
Vladimir wrote about them in his articles.
What I cannot create, I do not understand. That's what Feynman said.
Multiplication is better than addition: the sign is taken into account. In general, the product of, say, argument and result gives something like a single accounting function.
It's not quite clear how queries, keys and values are organized.
The main difference is pseudo-parallel processing with access to the training data, and the scalar product of input and output vectors, which is called self-attention. The matrix of these scalar products is then used in training. And those are not weights.
I didn't find anything about long and short memory in the article.
In general, additional matrices are created that correct the result.
I don't claim to understand it correctly ))))
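To make the queries/keys/values question concrete, here is a minimal single-head self-attention sketch in the usual formulation from the linked transformers post (numpy, toy sizes; the projection names and shapes are the textbook ones, not anything specific to this thread):

```python
import numpy as np

# Minimal sketch of single-head self-attention (standard formulation,
# not the poster's exact setup).
def self_attention(x, w_q, w_k, w_v):
    """x: (t, d) sequence; w_q / w_k / w_v: (d, d) learned projections."""
    q = x @ w_q              # queries
    k = x @ w_k              # keys
    v = x @ w_v              # values
    scores = q @ k.T / np.sqrt(x.shape[1])  # t x t matrix of scalar products
    # softmax over each row turns the scores into attention weights
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v       # each output is a weighted mix of the values

rng = np.random.default_rng(0)
t, d = 5, 4
x = rng.normal(size=(t, d))
out = self_attention(x, *(rng.normal(size=(d, d)) for _ in range(3)))
print(out.shape)  # (5, 4)
```

The t x t `scores` matrix is exactly the "matrix of scalar products" being discussed: it is computed from the data at each pass, while the learned weights are the three projection matrices.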
it's a different algorithm (supposedly the coolest one now); it has no notion of long and short memory like lstm does
the long/short stuff is only needed to see how an lstm cell works
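For the "how an lstm cell works" part, a toy single-step cell with the standard gate equations makes the long/short split visible (a sketch with textbook names, nothing from this thread):

```python
import numpy as np

# Toy single-step LSTM cell (standard equations: sigmoid gates, tanh candidate).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """x: input; h: short-term (hidden) state; c: long-term (cell) state.
    W: (4n, d + n) stacked gate weights, b: (4n,) biases."""
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input / forget / output gates
    g = np.tanh(g)                                  # candidate values
    c_new = f * c + i * g        # long memory: forget a part, write a part
    h_new = o * np.tanh(c_new)   # short memory: gated read of the cell
    return h_new, c_new

rng = np.random.default_rng(1)
d, n = 3, 2                      # input dim, state dim
h, c = np.zeros(n), np.zeros(n)
W = rng.normal(size=(4 * n, d + n))
b = np.zeros(4 * n)
for _ in range(4):               # feed a short sequence step by step
    h, c = lstm_step(rng.normal(size=d), h, c, W, b)
print(h.shape, c.shape)  # (2,) (2,)
```

The cell state `c` carries information across many steps (long memory), while `h` is recomputed each step from the gated cell (short memory).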
I spent two days trying to figure out what a Kohonen layer (VQ) is,
but it turns out it's just a primitive autoencoder.
Vladimir wrote about them in his articles.
Did Vladimir write about VQ specifically? Or just in general?
And what about memory? How does it work there? Is it permanent, or does it work in a window (like an indicator)? Is it static, or is it retrained?
Is it possible to do something similar to scaffolding?
I have a million questions )))
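The vector-quantization reading of a Kohonen layer can be sketched in a few lines: encode each input as its nearest codebook vector, decode by returning that vector. This is why it can be seen as a primitive autoencoder; the codebook training itself (SOM / k-means style updates) is omitted here:

```python
import numpy as np

# Sketch: Kohonen layer as vector quantization (encode = nearest codebook
# row, decode = that row). Codebook training is left out.
def vq_encode(x, codebook):
    """Index of the nearest codebook row for each row of x."""
    d2 = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def vq_decode(idx, codebook):
    return codebook[idx]

rng = np.random.default_rng(2)
codebook = rng.normal(size=(8, 4))       # 8 "neurons", 4-dim inputs
# inputs sitting near known codebook rows, plus a little noise
x = codebook[[1, 5, 5]] + 0.01 * rng.normal(size=(3, 4))
idx = vq_encode(x, codebook)
print(idx)                               # nearest-code indices, here [1 5 5]
recon = vq_decode(idx, codebook)
print(np.abs(recon - x).max() < 0.1)     # reconstruction is close: True
```

Encode-then-decode reconstructs the input up to quantization error, which is exactly the "primitive autoencoder" behaviour described above.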
Ahh. So there's self-attention, and as I understand it, it's many times more resource-hungry. In general, scaling a network's architecture simply improves its performance up to some limit. Here, as I understand it, the network is made more complex by combining different network logics, and then that is scaled ))). And consequently...
The bottleneck in training transformers is the matrix of scalar products for self-attention. For a sequence of length t, it is a dense matrix containing t squared elements. At standard 32-bit precision and with t = 1000, a batch of 16 such matrices takes up about 250MB of memory. Since we need at least four of them (before and after softmax, plus their gradients) for a single self-attention operation, this limits us to a maximum of twelve layers on a standard GPU with 12GB.
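The quadratic growth behind that bottleneck is easy to check with quick arithmetic. The sketch below counts only the raw cost of the batch of t x t score matrices; real implementations also multiply by the number of heads and the extra copies kept around, which is how figures like the ~250MB quoted above come about:

```python
# Quick arithmetic behind the quadratic memory cost of self-attention.
# Raw cost of a batch of dense t x t float32 score matrices only; heads
# and extra copies (pre/post softmax, gradients) multiply this further.
def attention_matrix_bytes(t, batch=16, bytes_per_float=4):
    """Memory for `batch` dense t x t float matrices."""
    return batch * t * t * bytes_per_float

for t in (250, 500, 1000, 2000):
    mb = attention_matrix_bytes(t) / 1e6
    print(f"t={t:5d}: {mb:8.1f} MB")  # doubling t quadruples the memory
```

Doubling the sequence length quadruples the memory, which is why long sequences, not deep stacks, are what blow the GPU budget.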
You have to do a lot of studying and thinking before you understand...
you might have to buy brain vitamins, drink less)
I haven't figured it out yet.) But it's not as hard as it sounds.
So we're back to the usual block diagram again: you have to draw it up first, so that you understand it at least at the level of images...
like:
first the classifier (it does this and that)
then we connect the classifier to the output (it does this and that)
then we compute something (it does this and that)
the output is connected to the classifier again
etc...
If you just read some complicated nonsense where you don't even know the terms, what will you get?
So you need to grasp the basic principle of the algorithm, and grasp it at the level of a block diagram, as I pointed out. Then you'll understand what is what, and once you understand that, you'll see what can be improved and how.