This article explains what generative artificial intelligence is!
With the popularity of AI products such as ChatGPT and ERNIE Bot, generative AI has become a hot topic of everyday conversation.
But why put the word "generative" in front of AI?
Are there other kinds of AI?
1
What is generative AI?
If we classify artificial intelligence simply by its purpose, it falls into two categories: decision-making AI and generative AI.
Decision-making AI: focuses on analyzing situations and making decisions. By evaluating the available options and their possible outcomes, it helps users or systems choose the best course of action.
For example, in a self-driving vehicle, a decision-making AI system decides when to accelerate, decelerate or change lanes.

Generative AI: focuses on creating new content. Based on the data it has learned from, it can automatically generate text, images, music and other content.
For example, you can give several papers to a generative AI, and it can produce a literature review summarizing the key ideas and main conclusions of those papers.

Now you can see why ChatGPT and ERNIE Bot count as generative AI, right?
Next, let's formally enter the world of generative AI.
2
The past and present of generative AI
In fact, generative AI was not born in just the last few years; it has gone through three stages:
In 1950, Alan Turing proposed the famous "Turing Test", a milestone for generative AI that pointed to the possibility of machines generating content.
In 1957, Lejaren Hiller and Leonard Isaacson completed the Illiac Suite, the first piece of music in history composed entirely by a computer.
From 1964 to 1966, Joseph Weizenbaum developed ELIZA, the world's first chatbot capable of human-computer dialogue, which carried out interactive tasks by scanning for keywords and recombining them.
In the 1980s, IBM built the voice-controlled typewriter "Tangora" based on the hidden Markov model.

As the Internet developed, the amount of data grew rapidly, providing massive training data for AI algorithms. However, limited by the hardware of the time, progress in this period was not fast.
In 2007, an AI system built by Ross Goodwin, an AI researcher at New York University, wrote the novel 1 the Road, regarded as the world's first novel created entirely by artificial intelligence.
In 2012, Microsoft publicly demonstrated a fully automatic simultaneous interpretation system that combined speech recognition, machine translation, speech synthesis and other technologies to turn an English speaker's words into Chinese speech automatically.

Since 2014, a large number of deep learning methods have been proposed and iteratively improved, marking a new era for generative AI.
In 2017, Microsoft's AI "girl" Xiao Bing (Xiaoice) published "Sunshine Lost Glass Window", the world's first poetry collection created by artificial intelligence.
In 2019, the Google DeepMind team released the DVD-GAN architecture for generating continuous video.
In 2020, OpenAI released GPT-3, an important milestone in natural language processing (NLP) and AIGC (AI-generated content).
In 2021, OpenAI introduced DALL-E, which is mainly used to generate images from text descriptions.
Since 2022, OpenAI has released successive versions of ChatGPT, setting off another wave of AIGC. ChatGPT can understand and generate natural language and hold complex conversations with people.

Since then, generative AI has exploded. So what principles is it based on?
3
Understanding the principles of generative AI, the easy way
From the introduction above, you should now have a rough picture of generative AI: learning knowledge + generating new content.
But how does it learn? And how does it generate?
To answer that, we need a deeper definition of generative AI:
Definition
Generative AI, represented by ChatGPT, vectorizes existing data and knowledge and summarizes the joint probability of the data, so that when generating content it can produce new content according to the user's request and the probabilities of associated words.
Feeling a bit confused?
Don't worry; this touches on the core principle of generative AI. Let me walk you through it slowly.
In fact, building a generative AI is like turning a clay figurine into a genius, which takes four steps: shaping the clay figurine → installing a brain → feeding it knowledge → producing output.

To build the "clay figurine" of generative AI, the first thing to consider is the underlying hardware, which provides its computing power and storage capacity.
Computing power: the skeleton of the clay figurine
Generative AI requires a huge amount of computation, especially when dealing with images and video. Large-scale computing depends on the following key hardware:
GPU (Graphics Processing Unit): provides powerful parallel computing. Thousands of small processing cores work in parallel, greatly improving computational efficiency.
TPU (Tensor Processing Unit): hardware designed specifically to accelerate machine learning workloads, which can significantly speed up computation and further strengthen the skeleton.
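Why does parallel hardware help so much? Much of a neural network's work is matrix multiplication, and every cell of the result is an independent dot product, so thousands of cores can compute different cells at the same time. The Python sketch below only illustrates that independence by handing each cell to a worker in a thread pool; it is a toy under that assumption and will not run faster than a plain loop in Python, let alone like a real GPU.

```python
from concurrent.futures import ThreadPoolExecutor

def dot(row, col):
    """One output cell: the dot product of a row of A and a column of B."""
    return sum(a * b for a, b in zip(row, col))

def parallel_matmul(A, B, workers=4):
    """Multiply A (n x k) by B (k x m), computing every output cell independently.

    Each cell depends only on one row of A and one column of B, so the cells
    can be handed out to workers in any order; a GPU exploits the same
    independence with thousands of hardware cores.
    """
    cols = list(zip(*B))  # columns of B
    n, m = len(A), len(cols)
    cells = [(i, j) for i in range(n) for j in range(m)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as `cells`.
        values = list(pool.map(lambda ij: dot(A[ij[0]], cols[ij[1]]), cells))
    # Reassemble the flat list of cells into an n x m matrix.
    return [values[i * m:(i + 1) * m] for i in range(n)]

print(parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```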

Storage: the blood of the clay figurine
Generative AI needs to process and store large amounts of data.
Taking GPT-3 as an example, its parameters alone number 175 billion, its training data reached 45TB, and it generated 4.5 billion words of content every day.
Storing this data depends on the following hardware:
Large-capacity RAM: when training a generative AI model, large numbers of intermediate results and model parameters must be held in memory, and large-capacity RAM significantly speeds up data processing.
SSD (Solid State Drive): a large-capacity SSD offers high-speed reading and writing, so data can be loaded and saved quickly, letting the clay figurine store information efficiently.
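To get a feel for why memory matters, here is some back-of-the-envelope arithmetic. It assumes each parameter is stored as a 16-bit (2-byte) or 32-bit (4-byte) floating-point number; the byte sizes are standard, but which precision a given system uses is an assumption. Under it, GPT-3's 175 billion parameters alone take about 350 GB or 700 GB, before counting the intermediate results that training also keeps in memory.

```python
def param_memory_gb(num_params, bytes_per_param):
    """Memory needed just to hold the weights, in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bytes_per_param / 1e9

GPT3_PARAMS = 175e9  # 175 billion parameters, as cited above

# Assumed storage formats: 16-bit floats use 2 bytes, 32-bit floats use 4 bytes.
for label, size in [("16-bit", 2), ("32-bit", 4)]:
    print(f"{label}: {param_memory_gb(GPT3_PARAMS, size):.0f} GB")
# 16-bit: 350 GB
# 32-bit: 700 GB
```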

The clay figurine has now been shaped, but for now it is just a puppet with no abilities, so we have to give it a brain.
The software architecture is the clay figurine's brain; it determines how the figurine thinks about and reasons over data.
From the perspective of bionics, people hope AI can imitate the way the human brain works and think and reason about knowledge; this is what is commonly called deep learning.

To realize deep learning, researchers have proposed a large number of neural network architectures:
Deep Neural Network (DNN) is the most common neural network architecture. However, as data and tasks place ever more complex demands on the architecture, this approach struggles more and more.
Convolutional Neural Network (CNN) is an architecture designed specifically for image data. It processes images effectively, but it requires complex preprocessing of the input data.
As task complexity increased, the Recurrent Neural Network (RNN) became a common way to process sequence data.
Because RNNs easily run into vanishing gradients and model degradation when handling long sequences, the famous Transformer architecture was proposed.
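Why do long sequences trouble RNNs? During training, the error signal is passed backwards through every time step, and at each step it is multiplied by a factor that depends on the recurrent weight. The toy below uses a one-number "RNN" with no activation function (a deliberate simplification, h_t = w * h_{t-1}), for which the gradient of the last state with respect to the first is exactly w multiplied by itself once per step. When |w| < 1 it shrinks exponentially, which is the vanishing-gradient problem in its simplest form.

```python
def gradient_through_time(w, steps):
    """d h_T / d h_0 for the toy linear recurrence h_t = w * h_{t-1}.

    By the chain rule, each step contributes a factor d h_t / d h_{t-1} = w,
    so the total gradient is the product of `steps` copies of w.
    """
    grad = 1.0
    for _ in range(steps):
        grad *= w
    return grad

for steps in (10, 50, 100):
    print(steps, gradient_through_time(0.9, steps))
```

With w = 0.9 the signal falls to about 0.35 after 10 steps and to roughly 0.00003 after 100, so the earliest words barely influence learning. Transformers sidestep much of this by letting each word look directly at every other word instead of relaying information one step at a time.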

As computing power grew, the network architectures of generative AI matured and began to specialize:
Transformer architecture: currently the mainstream architecture for text generation; large language models (LLMs) such as GPT and LLaMA 2 are all built on the Transformer and achieve excellent performance.
GAN architecture: widely used in image generation, video generation and other fields, producing high-quality images and video.
Diffusion architecture: has achieved good results in image generation and audio generation, producing high-quality, diverse content.

With the network architecture built, the brain is in place, but it is empty. So we feed knowledge to this artificial brain by training it on data.
At present, there are two main training methods: pre-training and SFT (supervised fine-tuning).
Pre-training: feeding the AI a large, general-purpose dataset as knowledge for initial learning.
A pre-trained model is called a "base model": it knows something about every field but is not an expert in any particular one.
SFT: after pre-training, the AI is further trained on a dataset for a specific task.
For example, starting from a pre-trained language model, specialized medical texts can be used to fine-tune it so that it handles medical question answering and text generation better.
But whether in pre-training or SFT, how does the AI's brain absorb this knowledge?
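The key idea of fine-tuning is that training does not start from scratch: it continues from the pre-trained weights using task-specific data. The toy below shows only that idea, with a hypothetical one-parameter model y = w * x trained by plain gradient descent on made-up data: "pre-training" on general data where y = 2x, then "fine-tuning" on specialized data where y = 2.5x. Real LLM fine-tuning updates billions of parameters on labelled prompt/response pairs, but the mechanics of continuing to update existing weights are the same in spirit.

```python
def train(w, data, lr=0.05, epochs=100):
    """Full-batch gradient descent on mean squared error for the model y = w * x."""
    for _ in range(epochs):
        # Derivative of mean((w*x - y)^2) with respect to w is mean(2*x*(w*x - y)).
        grad = sum(2 * x * (w * x - y) for x, y in data) / len(data)
        w -= lr * grad
    return w

general_data = [(x, 2.0 * x) for x in (1, 2, 3)]  # hypothetical "general" data
medical_data = [(x, 2.5 * x) for x in (1, 2, 3)]  # hypothetical "specialized" data

w_base = train(0.0, general_data)                  # pre-training from scratch
w_tuned = train(w_base, medical_data, epochs=30)   # fine-tuning starts from w_base
print(round(w_base, 3), round(w_tuned, 3))         # 2.0 2.5
```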

This involves the ability to "understand". Let's take the Transformer architecture as an example and look at how AI understands text.
For AI, understanding takes two steps: understanding words and understanding sentences.
In essence, understanding a word means classifying it. Researchers proposed a method: break words down along different dimensions in order to classify them.
Suppose there are four words: watermelon, strawberry, tomato and cherry. The AI breaks these words down along two dimensions:
Color dimension: 1 represents red and 2 represents green.
Shape dimension: 1 represents round and 2 represents oval.

Based on these dimensions, the AI scores and classifies the words:
Watermelon: color = 2 (green), shape = 1 (round)
Strawberry: color = 1 (red), shape = 2 (oval)
Tomato: color = 1 (red), shape = 1 (round)
Cherry: color = 1 (red), shape = 1 (round)

From these scores, we can see how the words are classified along different dimensions.
For example, "tomato" and "cherry" have the same scores for both color and shape, meaning they are alike along these two dimensions; "strawberry" and "watermelon" differ in both color and shape, meaning they are different along these two dimensions.
Of course, two dimensions are not the only way to tell them apart: AI can also score them along many more dimensions, such as size, sweetness and whether they have seeds, and classify them that way.
As long as there are enough dimensions and the scores are accurate, an AI model can grasp the meaning of a word more precisely.
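The scoring scheme above turns each word into a list of numbers, that is, a vector. The sketch below uses exactly the two-dimensional scores from the example and measures how far apart two words are with Euclidean distance (one common choice for illustration; real models usually compare learned vectors with thousands of dimensions, often by cosine similarity). Identical scores give distance 0, matching the observation that "tomato" and "cherry" look the same along these two dimensions.

```python
import math

# (color, shape) scores from the example: color 1 = red, 2 = green; shape 1 = round, 2 = oval
word_vectors = {
    "watermelon": (2, 1),
    "strawberry": (1, 2),
    "tomato": (1, 1),
    "cherry": (1, 1),
}

def distance(a, b):
    """Euclidean distance between two word vectors: smaller means more alike."""
    return math.dist(word_vectors[a], word_vectors[b])

print(distance("tomato", "cherry"))                # 0.0
print(round(distance("strawberry", "watermelon"), 3))  # 1.414
```

Adding a third score such as size would give "tomato" and "cherry" different vectors, which is exactly why more dimensions let a model tell words apart more precisely.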

In today's advanced AI models, the number of dimensions can run into the thousands.
Turning words into quantified scores completes only the first step; next, the AI needs to understand groups of words: sentences.
We know that the same word can mean different things in different sentences.
For example:
This is a green hat.

So-and-so is committed to building a green server room.

The word "green" means different things in these two sentences. How does the AI know that?
This is thanks to the "self-attention" mechanism of the Transformer architecture.
Put simply, when the AI processes a sentence made up of a group of words, it not only looks at each word itself but also "looks" at the words around it. How strongly a word relates to the other words in the sentence is called "attention", and it is called "self-attention" because the word is interpreted together with other words from the same sentence.
So in the Transformer architecture, understanding a sentence breaks down into two steps:
Convert each word into a vector. The vector represents the word's position in a multi-dimensional space and reflects its various features.
Use the self-attention mechanism to attend to different parts of the sentence, so that while processing each word the model takes into account information from the other words in the sentence.
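The two steps above can be sketched in a few lines. This is a deliberately simplified version of scaled dot-product self-attention: each word's vector serves directly as its query, key and value (real Transformers first multiply the vectors by learned weight matrices and use many attention heads), and the tiny 2-dimensional word vectors are made up for illustration. For every word, the code scores how well it matches each word in the sentence, turns the scores into weights with softmax, and outputs a weighted mix of the sentence's vectors, so the same word ends up with a different representation in a different context.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Simplified scaled dot-product self-attention (queries = keys = values = inputs)."""
    d = len(vectors[0])
    outputs = []
    for q in vectors:
        # How strongly this word relates to each word in the sentence (itself included).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        # The word's new representation is a weighted mix of all the word vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)])
    return outputs

# Step 1: hypothetical 2-D word vectors (real models learn thousands of dimensions).
embeddings = {"green": [1.0, 0.0], "hat": [0.9, 0.1], "server": [0.0, 1.0], "room": [0.1, 0.9]}

# Step 2: the same word "green" in two different sentences.
green_hat = self_attention([embeddings[w] for w in ["green", "hat"]])[0]
green_room = self_attention([embeddings[w] for w in ["green", "server", "room"]])[0]
print([round(x, 2) for x in green_hat])   # [0.95, 0.05]
print([round(x, 2) for x in green_room])  # [0.52, 0.48]
```

Next to "hat", "green" stays close to its original vector; next to "server room", it is pulled toward that context, which is the intuition behind telling the two meanings apart.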
Once it has understood a large number of words and sentences, the AI can generate content. How does it do that?
It comes down to probability.
Let me ask you a question:
I eat x in the restaurant.
If you had to fill in one word for x, what would it be?
Based on your past experience, you would most likely fill in "rice".
In fact, x could also be "cake", "noodles", "eggs" and so on.

Like people, generative AI assigns a probability to each of these words based on the experience it gained in the third step (feeding knowledge), then selects a high-probability word as the output. It then repeats the process, choosing the next most likely word each time, to generate more content.
But sometimes we want more varied answers. Going back to the example, what if we don't want the AI to fill in "rice" every time? What can we do?
AI models provide an adjustment parameter called temperature, usually set between 0 and 1 (some products allow higher values).
When the temperature is close to 0, the AI sticks to the most probable word. In the example above, it will almost always choose "rice".
As the temperature rises, the gap between likely and unlikely words shrinks, so less probable words get more of a chance. In the example above, the AI becomes more likely to choose something like "cake".
The higher the value, the more varied and imaginative the content.
For example, with the temperature set to 0.8, the AI might generate:
I eat a cake in the restaurant. This cake is big and round. I want to put it around my neck. ...
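The "pick the most likely next word, then repeat" loop can be sketched as follows. The probability table is entirely hypothetical: a real model computes these probabilities with its neural network from all of the preceding text, over a vocabulary of tens of thousands of tokens. The point here is only the loop itself, a strategy usually called greedy decoding.

```python
# Hypothetical next-word probabilities, keyed by the previous word.
NEXT_WORD_PROBS = {
    "I": {"eat": 0.95, "drink": 0.05},
    "eat": {"rice": 0.6, "noodles": 0.2, "cake": 0.1, "eggs": 0.1},
    "rice": {"in": 0.7, "with": 0.3},
    "in": {"the": 0.9, "a": 0.1},
    "the": {"restaurant": 0.8, "kitchen": 0.2},
}

def greedy_generate(start, max_words=10):
    """Repeatedly append the single most probable next word (greedy decoding)."""
    words = [start]
    for _ in range(max_words):
        options = NEXT_WORD_PROBS.get(words[-1])
        if not options:  # no known continuation: stop generating
            break
        words.append(max(options, key=options.get))
    return words

print(" ".join(greedy_generate("I")))  # I eat rice in the restaurant
```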
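Under the hood, temperature is commonly applied by dividing the model's raw scores (logits) by the temperature T before the softmax that turns them into probabilities, and then sampling from the result. The sketch below assumes made-up scores for four candidate words. A small T sharpens the distribution so "rice" wins almost every time; a larger T flattens it so "cake" and "eggs" get a real chance. T = 0 itself would mean dividing by zero, so this sketch treats it as "always pick the top word", which is how such a setting is commonly interpreted.

```python
import math
import random

def apply_temperature(logits, temperature):
    """Convert raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature == 0:
        # Limit case: all probability on the highest-scoring word (greedy choice).
        best = max(logits, key=logits.get)
        return {w: 1.0 if w == best else 0.0 for w in logits}
    scaled = {w: s / temperature for w, s in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - m) for w, s in scaled.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

def sample_word(logits, temperature, rng=random):
    """Draw one word at random according to the temperature-adjusted probabilities."""
    probs = apply_temperature(logits, temperature)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]

# Hypothetical raw scores for "I eat x in the restaurant."
logits = {"rice": 3.0, "noodles": 2.0, "eggs": 1.5, "cake": 1.0}

for t in (0, 0.2, 0.8, 1.5):
    probs = apply_temperature(logits, t)
    print(t, {w: round(p, 2) for w, p in probs.items()})

print(sample_word(logits, 0.8, random.Random(42)))
```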

However, most AI products offer only a single dialog box. How can we influence this behavior there?
The answer is the prompt, that is, the instructions we type into the box. A prompt does not change the temperature setting itself, but it changes the context from which the word probabilities are computed, which can have a similar effect.
If your input is "You are an expert in such-and-such field; please write a literature review on xx in a rigorous tone," the prompt steers the AI toward the most probable, conventional wording, much like a temperature close to 0.
If your input is "Please imagine the future of xx," the prompt invites less predictable word choices and unexpected content, much like a higher temperature.
Now you can see why prompts matter!
So we can think of AI generation as, in essence, a game of word chain (Words Solitaire): the AI picks the next word based on the current words, the probabilities of possible next words it recorded during training, and your expectations.
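The word-chain idea can be made concrete with the simplest possible language model, a bigram model: count how often each word follows each other word in some training text, and turn the counts into probabilities. This is a classic pre-neural technique used here only as an illustration, and the tiny corpus is invented. A Transformer conditions on the whole preceding context rather than only the last word, but it likewise ends up with a probability for every candidate next word.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Estimate P(next word | current word) by counting adjacent word pairs."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            counts[current][nxt] += 1
    # Normalize each word's follower counts into probabilities.
    return {
        word: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for word, followers in counts.items()
    }

corpus = [  # hypothetical training text
    "I eat rice in the restaurant",
    "I eat rice at home",
    "I eat noodles in the restaurant",
    "I eat cake at the party",
]
model = train_bigram(corpus)
print(model["eat"])  # {'rice': 0.5, 'noodles': 0.25, 'cake': 0.25}
```

Picking from these probabilities, either always the top word or by sampling with a temperature, and then repeating with the newly chosen word is the word-chain game in miniature.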

Of course, the inner workings of generative AI are far more complicated than described here; this is only a basic popular-science introduction.
4
Where is generative AI headed?
So will generative AI really achieve artificial general intelligence and replace human beings? At present, there are two views:
Optimists: The optimists, led by OpenAI CEO Sam Altman and NVIDIA CEO Jensen Huang (Huang Renxun), are very bullish on the future of generative AI. They have said that "in a few years, artificial intelligence will be more powerful and mature than it is now; in another ten years, it will shine brilliantly" and "AI may surpass human intelligence within five years".

Skeptics: The skeptics, led by deep learning pioneer Yann LeCun (Yang Likun), have long held that generative AI cannot lead to artificial general intelligence. He has said on many occasions that "a large language model like ChatGPT will never reach the level of human intelligence" and "artificial intelligence trained by human beings will find it hard to surpass human beings".

So how should ordinary people like us treat generative AI?
I think we might as well regard it as a tool: learn to use it to improve our work efficiency and enrich our daily lives, stay curious about the world, and fully enjoy the convenience that technology brings!
Source: ZTE document
Original title: This is the [generative AI]! !
Editor: K.Collider
Reprinted content represents only the author's views and does not represent the position of the Institute of Physics, Chinese Academy of Sciences.
If you need to reprint, please contact the original WeChat official account.