AI 2023. Meet ChatGPT. - page 57

 
Peter Konow #:

Is this already Bing's answer?

Yes, this is a response from Bing.

 

Let's return to the ChatGPT testing.

I would like to remind you that we have completed the first stage of testing - the test for breadth of knowledge. ChatGPT generated tables covering the branches of the fundamental, engineering and humanities sciences, and rather stunned us with the frightening breadth of its knowledge. We can conclude that the AI holds a superhuman amount of information, and there is no point in primitively trying to catch it out in ignorance. It knows everything, or almost everything, which amounts to the same thing for an ordinary user.

The result of the first test derailed the planned second stage - the test for depth of knowledge. It became clear that for ChatGPT there is no difference between school and university curricula. It will just as readily explain the multiplication table or quantum field theory, without the slightest strain. Frankly, this confused me. For a while I stopped understanding how to test this devil's box that knows EVERYTHING. But my confusion was short-lived.

First of all, it became clear that ChatGPT does not count well. Of course it knows maths, but it solves only easy equations. It also repeatedly produced logical lapses and contradictions in its judgements. It became clear that it knows elementary logic but, as with equations, it "fails" when the complexity increases. However, it was not interesting to develop special tests to pin down the exact level of its mathematical and logical abilities. Clearly, it is somewhere around the third to fifth grade of primary school.
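The kind of spot-check described above is easy to automate: ask for a product, compute the true answer locally, and compare. The sketch below is illustrative only - `ask_model` is a hypothetical stand-in for a real chatbot call, stubbed here with canned answers (including a typical plausible-but-wrong answer to the hard case) so the scoring logic can run on its own.

```python
# Minimal sketch: score a model's arithmetic answers against ground truth
# computed locally. `ask_model` is a hypothetical stub, not a real API call.

def ask_model(question: str) -> str:
    canned = {
        "What is 7 * 8?": "56",             # easy: models usually get this
        "What is 3847 * 2919?": "11228393", # hard: a plausible wrong answer
    }
    return canned[question]

def check_arithmetic(a: int, b: int) -> bool:
    """Return True if the model's answer matches the true product."""
    question = f"What is {a} * {b}?"
    try:
        return int(ask_model(question)) == a * b
    except (ValueError, KeyError):
        return False

results = {
    "easy": check_arithmetic(7, 8),
    "hard": check_arithmetic(3847, 2919),
}
print(results)  # {'easy': True, 'hard': False}
```

The true product 3847 * 2919 is 11229393, so the stubbed answer fails the check - mirroring the pattern observed in the thread: easy sums pass, harder ones break down.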

The earlier question remained unanswered: how do we define the areas of everyday and professional application of AI, and how do we assess its real commercial potential? And where will the boundary of its capabilities lie in a month, or a year? Of course, that depends on the technology, and who among us knows it well enough to say for sure? The product is not finished and is constantly evolving.

We can say that at the moment we are dealing with a half-finished product that is both good and bad at once. No one can say what further opportunities the technology will offer at new stages of development, but right now it is like the proverbial suitcase without a handle: it exceeds expectations and disappoints at the same time.


Nevertheless, we will continue testing and move on to benchmarking - comparative analysis against a reference. We will compare ChatGPT with Wikipedia. The goal is to evaluate the AI as a source of complete and reliable information by comparing it to another, proven source of the same kind. Let's figure out whether the AI deserves to be used in responsible and important work.


In the following posts, I will outline the criteria and tests I have prepared.

 
- What is the correct stress on which syllable in the word "curd"?

- In the word "curd", the stress falls on the first syllable: "tvOrog".



- Why?

- The stress in Russian words can fall on one of six syllables, and the correct stress is based on the stress of the root of the word and the endings, if any. In the word "curd" the root is the syllable "tvor" and the ending "og" is unstressed, so the stress falls on the first syllable: "tvOrog".



- Why is the root the syllable "tvor"?

- The root of a word is the unchanging part of the word that carries its basic meaning. In the word "curd" the root is the combination of letters "tvor", which forms the base of the word and determines its lexical meaning - the name of a dairy product made by curdling milk. The ending "-og" in the word "curd" is formed with the help of a suffix, which changes the meaning of the word or its parts, and is not part of the root of the word.



- There is no ending and no suffix in the word "curd".

- I apologise for the insufficiently accurate answer earlier. Indeed, the word "curd" has no ending and no suffix, and the root is "tvor". Therefore, the stress in the word "curd" falls on the first syllable: "tvOrog". Thank you for pointing out the error.




TrollGPT
 

Western media says MS Bing is broken ...

Bing Chat Refuses To Talk To Me (Use This Instead) - YouTube, 2023.02.26, www.youtube.com: "I tried to get Bing Chat to talk to me and it was having none of it. So I experimented with some other alternatives that are much better than Bing chat now..."
 

About the purpose of benchmarking:

And so: benchmarking is running some electronic or software system on a set of test cases (tasks), measuring performance and comparing the results against reference benchmarks. I am not sure our ChatGPT testing can strictly be called benchmarking, but we will definitely analyse the results and compare them to a benchmark (Wikipedia).

Wikipedia is not a programme, and ChatGPT is not a "book", and this testing will not reveal technical indicators - neither response speed nor page-load lags. The main goal is to determine whether it is expedient to use ChatGPT as a source of reliable information in responsible work. We will check the accuracy, completeness and reliability of information across the spectrum of universal knowledge. So "benchmarking" here is simply a beautiful word that roughly fits the meaning.
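The procedure just described - the same test tasks put to each source, answers scored against a reference key - can be sketched as a simple loop. Everything below is hypothetical scaffolding for illustration: `stub_chatgpt` and `stub_wikipedia` are stand-ins for real lookups against the chatbot and the encyclopedia.

```python
# Sketch of the benchmark loop: run each source over the same tasks and
# score its answers against a reference key. The lookup functions are
# hypothetical stubs, not real ChatGPT or Wikipedia queries.

reference_key = {
    "Largest planet of the Solar System": "jupiter",
    "SI unit of force": "newton",
}

def stub_chatgpt(task: str) -> str:
    return {"Largest planet of the Solar System": "Jupiter",
            "SI unit of force": "Newton"}[task]

def stub_wikipedia(task: str) -> str:
    return {"Largest planet of the Solar System": "Jupiter",
            "SI unit of force": "newton"}[task]

def run_benchmark(sources, key):
    """Return the fraction of correctly answered tasks per source."""
    scores = {}
    for name, lookup in sources.items():
        correct = sum(
            1 for task, answer in key.items()
            if lookup(task).strip().lower() == answer
        )
        scores[name] = correct / len(key)
    return scores

print(run_benchmark({"ChatGPT": stub_chatgpt, "Wikipedia": stub_wikipedia},
                    reference_key))  # {'ChatGPT': 1.0, 'Wikipedia': 1.0}
```

In the thread itself the scoring is done by hand, of course; the loop only makes explicit what "comparing results with a benchmark" means.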

Among others, one of the most important goals of this benchmarking is to determine the technical limitations of the LLM as an AI (for me, the primary goal). The main motive is to find the absolute limit of this technology. How far can it be improved, and when will we hit the "wall"? And what approach will break through that wall?


About the indicators in the focus of the study:

Both Wikipedia and ChatGPT present information in two basic formats: descriptions and tables. In addition to tables, the chat can use graphs and charts, and Wikipedia can use illustrations, but we will not compare those.

We are interested in:

  • The completeness with which the subject is reflected in the description.
  • The completeness of the subject's data in the table.
  • The quality of the structure of the subject description.
  • The quality of the data structure in the table (i.e., the quality of the tables).

Note the last point. The structure and content of a table reflect the quality of the subject-data classification. In the case of Wikipedia, the data is classified by humans, but we will be testing the AI (not humans). Let's see how well an advanced language model classifies - links and distributes - data. The goal is to study and evaluate in detail how it processes and infers subject data, since that is the essence of conversational AI.

Content, completeness, accuracy and ordering of information - these are all parameters to be tested.
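The "completeness" indicator from the list above can be given a rough numeric form: the fraction of key facts from the reference source (a Wikipedia article, say) that also appear in the model's description. Substring matching is a crude proxy chosen purely for illustration - the actual checks in this thread are done by reading; the example facts and description below are invented.

```python
# Minimal sketch of the "completeness" indicator: what fraction of the
# reference key facts appear in a generated description. Substring matching
# is a deliberately crude, illustrative proxy.

def completeness(description: str, key_facts: list[str]) -> float:
    """Fraction of key facts mentioned (case-insensitively) in the text."""
    text = description.lower()
    found = [fact for fact in key_facts if fact.lower() in text]
    return len(found) / len(key_facts)

key_facts = ["gas giant", "great red spot", "79 moons", "hydrogen"]
description = ("Jupiter is a gas giant composed mostly of hydrogen "
               "and helium, famous for the Great Red Spot.")
print(completeness(description, key_facts))  # 3 of 4 facts found -> 0.75
```

The same score computed for the Wikipedia article itself would serve as the benchmark value to compare against.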


The disciplines chosen for testing are:

  • Astronomy
  • Physics
  • Zoology

Each of these sciences has both descriptive and tabular content, on the basis of which branching classification models can be built. From this perspective, these fields of knowledge are well suited for our testing.


Conclusion:

I will talk about the test tasks, checking and summarising the results, and the conclusions in the next posts.

 

I'm testing an Android app here (I installed the app player on my computer). It's not the top of what's buzzing around the social networks, but it's powerful stuff.

What can I say - the profession of designer is becoming a thing of the past. Now you can easily fill a site with illustrations without fear that a picture belongs to some author.


1. Kittens meet the dawn on Mars. Generated:



2. A funny shovel. Generated:


In general, you can choose a style there and write a detailed description. Of course there is still a lot of junk generation sometimes, but the progress is evident.


3. Spiderman and Superman's car. Generated


4. Mushroom Apocalypse.


These were generated from very short prompts. Maybe you can give me some ideas - I'll find time to generate a few).

 
Ask it to draw the apotheosis of logic and symmetry. I'm curious to see what it comes up with.
 
Neural network images have one drawback (no, not just the fingers): they are basically artefacts. It is worth calming down from the excitement of seeing a newly generated image and looking at it closely. Almost all the details will be "unfinished" - solid asymmetry in everything, something lifeless, made in haste.

This drawback comes from the complete absence of verification algorithms (for example, checking that the eyes match each other - pupil roundness, alignment, etc.); there are no such "3D checks", so every image is one solid artefact. At the same time, the absence of "manual" intervention and full reliance on the neural network leaves it free to create and open to learning better results.

On the other hand, these artefacts are now so small that if you do not look at the image closely, you will not see any problems.
 
Ivan Butko #:
Neural network images have one drawback (no, not just the fingers): they are basically artefacts. [...]
These images resemble the hasty sketches of a talented artist who didn't have the patience to correct the errors or to bother with symmetry and accuracy.
 
The generation of pictures is not accompanied by a cognitive process.