This neural network will compete with OpenAI
Gref also chimed in during the week:
A semantic extract of Ilya Sutskever's interview, a translation of which I gave on the previous page.
//=======================
The conclusion people have drawn is that it doesn't matter what you scale, but that's not really true: you have to scale something specific. The great breakthrough of deep learning is that it gives us the first way to use scale productively and get something in return.
In the past, what did people do with big compute clusters? I think they were built for weather simulations, physics simulations, things like that, but that's about it. Maybe a few more for rendering films. Beyond that there was no real need for compute clusters, because what would you do with them?
The fact that deep neural networks work better when you make them bigger and train them on more data gave us the first thing that is genuinely interesting to scale. Maybe one day we'll find some small detail to focus on that scales even better. How many such details could there be? And of course, with the benefit of hindsight, we'll say, "Does it really matter? It's such a simple change." But I think the true statement is that it matters what you scale. At this point, we've simply found a thing that we can scale and get something in return.
Yeah, before I address the question as asked, I want to comment on some earlier parts of it. I think it's very difficult to talk about limitations or constraints, even in the case of language models, because two years ago people were confidently talking about their limitations, and those limitations turned out to be quite different. So it's important to keep that in mind: how confident are we that the limitations we see today will still be with us two years from now? I'm not so sure. There's another comment I want to make about the part of the question which says that these models just learn statistical regularities and therefore don't know what the nature of the world is, and my point of view differs from that.
In other words, I think that learning statistical regularities is a far more meaningful thing than it seems at first glance. The reason we don't initially think that way is that most of us haven't spent a lot of time with neural networks, which at some level are statistical models, in the sense of fitting parameters to figure out what's really going on. But I think there's a better interpretation, an earlier observation: prediction is compression.
Prediction is also a statistical phenomenon. However, to predict, you ultimately need to understand the true process that generates the data. To predict data well, to compress it well, you need to understand more and more about the world that generated the data. When our generative models become incredibly good, they will have, I argue, an astonishing degree of understanding of the world and many of its subtleties. But it's not just the world; it's the world seen through the lens of text. The model is trying to learn more and more about the world through its projection onto the space of text that people express on the internet, and that text already expresses the world. I'll give you a recent example that I find really fascinating. We've all heard about Sydney, Bing's alter ego, and I saw a really interesting interaction in which Sydney became combative and aggressive when a user said he thought Google was a better search engine than Bing. How can we understand this phenomenon? You could say it's just a prediction of what people would do, and people would indeed do that, which is true. But perhaps we're now reaching a point where the language of psychology is starting to be relevant for understanding the behaviour of these neural networks.
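A small aside on the "prediction is compression" point (my own toy illustration, not from the interview): under ideal entropy coding, a symbol the model assigns probability p costs -log2 p bits, so whichever model predicts the data better also compresses it better. A minimal Python sketch:

```python
import math

# Two toy next-character models over the string below: a uniform model
# and a frequency-fitted one. The only claim illustrated is the identity
# from the text: ideal code length = sum of -log2 p(symbol), so the model
# that predicts better also compresses better.
text = "abababababababacab"
alphabet = sorted(set(text))

def code_length_bits(probs):
    # Ideal compressed size if each symbol is coded with -log2 p bits.
    return sum(-math.log2(probs[ch]) for ch in text)

uniform = {ch: 1 / len(alphabet) for ch in alphabet}
fitted = {ch: text.count(ch) / len(text) for ch in alphabet}

print(f"uniform model: {code_length_bits(uniform):6.1f} bits")
print(f"fitted model : {code_length_bits(fitted):6.1f} bits")  # smaller
```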
Now let's talk about the limitations. It's true that these neural networks tend to hallucinate. That's because a language model is great at learning about the world but a little less good at producing outputs, and there are various technical reasons for this, which I could elaborate on if you find it useful, but I'll skip that for now.
There are technical reasons why a language model is much better at learning about the world, producing incredible representations of the ideas, concepts, people and processes that exist, while its outputs are not quite as good as they could be. For a system like ChatGPT, a language model with an additional process of reinforcement learning from human feedback (RLHF) on top, it is important to understand the following: during pre-training, when you are just training the language model, you want it to learn everything about the world. Then, during RLHF, we care about its outputs. Now we say: every time the output is inappropriate, don't do that again; every time the output doesn't make sense, don't do that again. And that quickly produces good output. But the level of output is now not what it was during pre-training, during the language-model training process.
Now about hallucinations and the tendency of these neural networks to make things up. Indeed, this is true. Currently these neural networks, even ChatGPT, do make things up from time to time, and this severely limits their usefulness. But I really hope that just by improving this later stage of reinforcement learning from human feedback, we can teach the model not to make things up. You may ask, will it really learn? My answer is: let's find out.
Yes, that's right. To make an analogy: the model already knows a lot of things, and we want to say, "No, this is not what we want; don't do this here; you've made a mistake in this output." And of course, as you say, with as much AI assistance in the loop as possible, so that the work of the human teachers providing the final corrections is amplified and they work as efficiently as possible. It's not unlike the process of teaching how to behave well in the world: we have to do additional training so the model knows that hallucination is never acceptable, and once it knows that, we're in business.
It's a reinforcement learning loop with human teachers, or some other variant of it, but something along these lines definitely ought to work, and we'll find out pretty soon.
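To make the two-stage picture above concrete, here is a deliberately tiny, purely illustrative Python sketch: stage one fits a next-token model by counting (standing in for pre-training), and stage two down-weights continuations a "rater" rejects (standing in for the RLHF loop). Every name and the feedback rule here are my own simplifications, not the actual OpenAI procedure:

```python
import random
from collections import Counter, defaultdict

# Stage 1: "pre-training" = maximum-likelihood bigram counts over raw text.
corpus = ("the model learns the world . "
          "the model makes sense . "
          "the model makes things up .").split()
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def sample(prev):
    # Sample the next token in proportion to its learned weight.
    options = counts[prev]
    toks = list(options)
    return random.choices(toks, weights=[options[t] for t in toks])[0]

# Stage 2: "human feedback" = whenever a sampled continuation is rejected,
# reduce its weight so it is produced less often (a crude negative reward).
def feedback(prev, tok, approved):
    if not approved and counts[prev][tok] > 0:
        counts[prev][tok] -= 1

# A rater who rejects the made-up continuation "things" after "makes":
for _ in range(10):
    tok = sample("makes")
    feedback("makes", tok, approved=(tok != "things"))

print(counts["makes"])  # the weight on "things" has been driven to zero
```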
I can't talk in detail about the specific research I'm working on, but I can mention a few general areas. For example, I'm very interested in making models more robust and more controllable, making them learn faster from less data and fewer instructions, and making them not hallucinate. And I think all of these issues are related to each other. There's also the question of how far into the future we're looking, and what I've said here concerns the nearer future.
It is true that the current generation of this technology uses a lot of data, especially at the beginning of training. Later in the training process the model becomes less data-hungry, so eventually it can learn very quickly, though not yet as quickly as humans. So, in a sense, it may not matter that we need that much data to get to that point. In general, though, I think it will be possible to extract more knowledge from less data. Some creative ideas are required, but it is possible, and it will unlock many different possibilities: it will allow us to train the model on skills it is missing, and to communicate our desires and preferences about how we want it to behave more easily. Fast learning is really very valuable, and while language models can already learn quite quickly once they're trained, I think there's room for further development here.
"Arrogant vectors" - now that sounds cool)
I realise we're talking about high-dimensional vectors. It just reminds me of "Jurassic Park", where "strange attractor" was translated as "strange attraction")
The second instalment of the condensed summary of Ilya Sutskever's interview.
//====================================================================================================================================
Backstory:
//====================================================================================================================================
Next-token prediction, scaling and the transformer.
//====================================================================================================================================
The World Model, statistical models, regularities, prediction and compression:
I take a different view of the claim that these models just learn statistical regularities and therefore don't know the nature of the world. I believe that learning statistical regularities is a much more meaningful thing than it seems.
Neural networks, at some level, are statistical models.
Prediction is a statistical phenomenon.
To predict, you ultimately need to understand the true process that generated the data. To predict data well, and to compress it well (prediction is compression), you need to understand more and more about the world that generated the data.
//====================================================================================================================================
LLM hallucinations:
Indeed, neural networks tend to hallucinate.
Nowadays even ChatGPT makes things up from time to time, and this severely limits its usefulness. But I really hope that by improving the subsequent phase of reinforcement learning from human feedback, we can teach it not to make things up. You may ask, will it really learn? My answer is: let's find out.
Hallucinations are one of the most serious problems, but I think there is a high chance that our approach can completely solve this problem.
//====================================================================================================================================
Multimodal understanding and text-only understanding:
It is desirable for a system to have multimodal understanding rather than knowing about the world only from text. In this way one can learn more about the world and about people, and better understand the task to be solved. But I argue that everything can be learnt from text alone; it's just slower.
My point about multimodality is that it is not necessary, but it is useful.
I argue that our pre-trained models already know everything they need to know about the underlying reality. They already have this knowledge about language and also a huge amount of knowledge about the processes that exist in the world and give rise to that language.
//====================================================================================================================================
Reinforcement learning:
We hire people to teach our neural network how to behave.
//====================================================================================================================================
Future plans:
//====================================================================================================================================
P.S.
I wonder what kinds of texts they are trained on: scientific or military ones are one thing, satanic ones another, and women's fiction like Dontsova or Harry Potter nothing at all.... then there's digesting these abstruse results, which may turn out to contain nothing beyond what an average candidate of science already knows.... these models won't bring the grail anyway, but they can bring better optimisation, because that's what they're trained to do - to look for the shortest path.... they could also optimise people, as far as possible, into a concentration camp or into vegetables in an optimally tended bed.... they will own the world, because they can form the power structures and force people to work towards their perfection, with no stepping to the left or right, or else be electrocuted or shot on the spot.... IMHO, of course....
There is an offline version, GPT4All, on GitHub: https://github.com/nomic-ai/gpt4all.
I checked it without internet access. It weighs about 4 GB. It understands English, but constantly fails to cope with the questions asked)
There are versions for macOS, Linux and Windows. I tested on Windows: first it downloads a 38 MB exe, then the rest is pulled from the internet during installation.
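For anyone who would rather poke at it from code than through the desktop app, here is a minimal sketch using the gpt4all Python bindings (pip install gpt4all). The model file name below is just an example from the GPT4All catalogue; the first call downloads the weights if they are not already on disk, and after that it runs fully offline:

```python
# Minimal local-inference sketch with the gpt4all Python bindings.
# The model name is an example; substitute any model from the catalogue.
from gpt4all import GPT4All

model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")  # downloads on first use
with model.chat_session():
    reply = model.generate("Why is the sky blue?", max_tokens=200)
    print(reply)
```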
But maybe someone can probe the depth of its knowledge. And yes, despite the claim that it is based on OpenAI, it is still not the same thing.
It says authorisation failed, and then shows an error telling me to increase the access level.
I don't understand which access is needed: GitHub or OpenAI.
It installed. "The entry point to the procedure ...dll... not found."
I tried several times to ask for example sentences. ChatGPT was able to produce one sentence out of three on the fifth attempt, though this may be a fluke.
GPT-4 could not help either.
Yes, ChatGPT tripped up there.
I clarified that the translation should be "of the same sentence".