First introduction to AI large models and development opportunities mind map
This mind map builds a basic understanding of AI large models, covering the core underlying technologies and the opportunities of the era. Hope it helps everyone.
Edited at 2023-12-02 22:21:21
First introduction to AI large models and development opportunities
1. What is an AI large model?
An AI large model is short for "artificial intelligence pre-trained large model", combining the two ideas of "pre-training" and "large model". Together they produce a new kind of AI model: one that is pre-trained on large-scale data sets and can then support a wide range of applications directly, either without fine-tuning or with only a small amount of fine-tuning data.
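For intuition, here is a minimal fine-tuning sketch in Python (assuming the open-source Hugging Face `transformers` and `torch` packages and a generic pre-trained checkpoint such as `bert-base-uncased`; this illustrates the pre-train-then-fine-tune idea, not how any particular commercial model is built):

```python
# A minimal sketch: the expensive pre-training has already been done elsewhere;
# we only fine-tune a small classification head on a handful of labeled examples.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# A tiny labeled set: fine-tuning may need only a small amount of data.
texts = ["great product, works well", "broke after one day"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few gradient steps, just to show the loop
    out = model(**batch, labels=labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```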
A pre-trained large model is like a college graduate or even a doctoral student who has mastered the basic knowledge and completed a "general education", but still needs practice and feedback-driven fine-tuning to perform specific tasks well.
In addition, large AI models have advantages such as generality and the ability to be replicated at scale, and they are an important direction for realizing AGI (artificial general intelligence).
Current large AI models include natural language processing (NLP) models, computer vision (CV) models, and unified multi-modal models. For example, ChatGPT is a breakthrough innovation in natural language processing: it understands and "speaks" human language, surpasses previous NLP models, and can handle a variety of tasks including machine translation, question answering, and text generation.
To put it simply, we can think of a large model as a very large knowledge base that stores a huge amount of information and knowledge, helping the computer better understand and process input data. The neurons and parameters of the model together form a powerful network that efficiently processes and transforms the input.
At present, domestic companies such as Baidu, Alibaba, Tencent, and Huawei have developed large AI models. Each model series has its own focus, and some have already been launched with applications in production.
Baidu has been deploying AI for many years and has a certain first-mover advantage in large models. Currently, more than 65,000 companies have applied for Wenxin Yiyan's API testing service. In terms of industry-specific large models, it has implemented cases with State Grid, Shanghai Pudong Development Bank, Geely, TCL, People's Daily Online, Shanghai Dictionary Publishing House, and others.
Alibaba's Tongyi large model is strong in logical reasoning, coding, and speech processing. The group has a rich ecosystem and product line, with applications across travel, office, shopping, and daily-life scenarios.
Tencent's Hunyuan large model has been put into use in advertising and game production. The group is currently researching conversational intelligent assistants and is expected to optimize the QQ and WeChat ecosystems once they are deployed.
Huawei cooperates closely with enterprise customers, and its future applications are expected to be mainly ToB. In addition, Huawei has abundant reserves in algorithms and computing power. For example, "Pengcheng Cloud Brain II" has topped the global IO500 ranking five consecutive times, with strong AI computing power and data throughput; the Huawei Cloud ModelArts platform can process massive data efficiently, completing 40 TB of text data processing in 7 days; and the Pangu large model was officially released as early as April 2021, with its training text data now reaching 40 TB (GPT-3 used 45 TB).
2. Key technical points of AI large models
Large models usually consist of hundreds of millions to hundreds of billions of parameters and need to be trained and optimized on massive amounts of data to achieve higher prediction accuracy and generalization capability. People in the industry often say that large models are the product of combining "big data, big computing power, and strong algorithms", and the key to the industry's development also lies in these three points.
Big Data
Data is the nourishment for algorithm training. In the early stage, the model needs to be fed a large amount of data to form its understanding ability; the quality of the data fed in the middle and later stages determines the accuracy of the model.
Taking the GPT model as an example, one of the reasons ChatGPT performs well is that its unsupervised pre-training was fed with high-quality, real-world data.
For supervised training, however, the data must first be labeled manually. Labeling processes raw data into machine-recognizable information, and only with enough training data covering as many scenarios as possible can a good model be obtained.
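As a small illustration (the fields and labels below are made up for this example, not from a real dataset), labeled data is simply raw input paired with machine-readable annotations:

```python
# Toy labeled examples: labeling turns raw text into (input, label) pairs
# that a model can learn from.
labeled_examples = [
    {"text": "The delivery arrived two days late.", "label": "negative"},
    {"text": "Battery easily lasts a full day.",    "label": "positive"},
    {"text": "Does it support fast charging?",      "label": "question"},
]

# The label set should cover as many scenarios as possible so the model generalizes.
label_set = sorted({ex["label"] for ex in labeled_examples})
print(label_set)  # ['negative', 'positive', 'question']
```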
Currently, most training data comes from public sources. For example, according to an article by Dr. Alan D. Thompson (former chairman of Mensa International, AI expert and consultant), the data sets used by large models include Wikipedia, books, journals, Reddit links, Common Crawl, and other datasets.
Quantity is one side of the coin; the richness and authenticity of the data are just as crucial for training large models. In the middle and later stages of training, high-quality data improves model accuracy. For example:
More factual data improves model accuracy;
More fluent Chinese text improves the model's ability to understand Chinese;
More accurate vertical (domain-specific) data supports building models for more specialized fields.
In addition, high-quality feedback data can improve model performance. For example, ChatGPT uses reinforcement learning from human feedback (RLHF), strengthening the model's grasp of human language and intent through expert-written questions and instructions and human rankings of candidate answers.
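The core of the human-feedback step is a reward model trained on ranked answer pairs. Below is a minimal sketch of that pairwise-preference loss (assuming PyTorch; the tiny encoder and dimensions are stand-ins, not ChatGPT's actual architecture):

```python
# Human annotators rank two candidate answers; the reward model learns to
# score the preferred answer higher via a pairwise preference loss.
import torch
import torch.nn as nn

embed = nn.EmbeddingBag(num_embeddings=1000, embedding_dim=32)  # toy text encoder
reward_head = nn.Linear(32, 1)                                  # maps features to a scalar reward
params = list(embed.parameters()) + list(reward_head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

# Toy token ids for a preferred ("chosen") and a dispreferred ("rejected") answer.
chosen = torch.randint(0, 1000, (4, 16))    # batch of 4 answers, 16 tokens each
rejected = torch.randint(0, 1000, (4, 16))

r_chosen = reward_head(embed(chosen))       # scalar reward per chosen answer
r_rejected = reward_head(embed(rejected))   # scalar reward per rejected answer

# Push the chosen answer's reward above the rejected one's.
loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
optimizer.step()
```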
For domestic large models, two challenges still require effort: the quality of the domestic Internet corpus is relatively poor and high-quality Chinese annotated data sets are scarce; and labeling is still mostly manual, so annotation techniques and annotator training still need to be explored by domestic technology companies.
Big computing power
Data lays the foundation of the house; how high it can be built depends on computing power, that is, a computer system's capacity to process data and carry out computational tasks.
In AI, deep neural networks require heavy computation to train, and large-scale models and complex tasks demand even more computing power to support them.
Taking the GPT family as an example, as the parameter count from GPT to GPT-2 to GPT-3 (the current open version is GPT-3.5) grows from 117 million to 175 billion, and the pre-training data grows from 5 GB to 45 TB, the demand for computing power increases accordingly.
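To get a feel for the scale, here is a back-of-the-envelope estimate using the commonly cited approximation that training compute is roughly 6 × parameters × training tokens (the token counts below are rough public estimates, not figures from this article):

```python
# Approximate total training compute in floating-point operations.
def training_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

gpt1 = training_flops(117e6, 1e9)     # GPT:   ~117M params, ~1B tokens (rough estimate)
gpt3 = training_flops(175e9, 300e9)   # GPT-3: 175B params, ~300B tokens (reported)

print(f"GPT   ~ {gpt1:.2e} FLOPs")
print(f"GPT-3 ~ {gpt3:.2e} FLOPs ({gpt3 / gpt1:,.0f}x more compute)")
```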
Therefore, more computing power improves both training speed and efficiency, and ultimately the model's accuracy and performance.
To judge whether a leading vendor can sustain the computing power required for training and inference, two further points need to be considered: whether its funds are sufficient and for how long, and how long-term its corporate strategy is.
A long-term investment strategy and sufficient capital budget are necessary elements to reproduce ChatGPT.
Take Baidu as an example. After "All in AI" was proposed in 2017, capital expenditures fluctuated. Last year's capital expenditures (excluding iQiyi) reached 18.1 billion yuan, while operating cash flow increased by 30% to 26.17 billion yuan over the same period. As of the end of 2022, the company's balance of cash and cash equivalents available for capital expenditure was 53.16 billion yuan, enough to sustain spending for a long time.
In addition, the core of computing power infrastructure is the chip: the better the chip performance, the faster large models can be trained and served, which is why money and strategy are needed to support long-term planning.
Strong algorithms
An algorithm is a set of problem-solving steps and rules used to perform a specific calculation or operation, and it is what computer programs are typically designed and implemented around to solve various problems.
The quality of the algorithm directly affects the efficiency and performance of the program. For example, ChatGPT's algorithmic breakthrough lies more in ideas than in specific theories; it is an innovation in the "recipe" rather than the "ingredients", which is one of the difficulties in replicating it.
How do we judge the quality of an algorithm? There are three main points: time complexity, space complexity, and robustness.
Time complexity measures how long the algorithm takes to complete its task;
Space complexity refers to the memory the algorithm needs to complete the task;
Robustness refers to the algorithm's tolerance for abnormal data and noise.
Usually, the lower the time and space complexity, the more efficient the algorithm. A good algorithm should also be robust, performing its task correctly under various circumstances and producing clear output.
In practical applications, the most suitable algorithm can be selected according to specific needs and scenarios, and a balance point can be found by taking the above factors into consideration.
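As a toy illustration of time complexity (not drawn from the article), compare a linear scan with binary search over sorted data:

```python
# Linear search checks every element (O(n)); binary search on sorted data
# halves the range each step (O(log n)), so it does far fewer comparisons.
from bisect import bisect_left

def linear_search(items, target):
    """O(n): check elements one by one."""
    for i, x in enumerate(items):
        if x == target:
            return i
    return -1

def binary_search(items, target):
    """O(log n): repeatedly halve a sorted range."""
    i = bisect_left(items, target)
    return i if i < len(items) and items[i] == target else -1

data = list(range(1_000_000))         # already sorted
print(linear_search(data, 999_999))   # ~a million comparisons in the worst case
print(binary_search(data, 999_999))   # ~20 comparisons (log2 of a million)
```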
For example, GPT is built on the Transformer model. Compared with traditional recurrent neural networks (RNNs) or convolutional neural networks (CNNs), the Transformer offers better parallelism and shorter training time when processing long text, striking the right trade-off between cost, scale, and efficiency.
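The parallelism difference can be sketched in a few lines (assuming PyTorch and toy dimensions, not GPT's real configuration): the RNN must step through the tokens one by one, while self-attention computes all positions at once with matrix multiplications.

```python
import torch
import torch.nn as nn

seq = torch.randn(8, 128, 64)  # batch of 8 sequences, 128 tokens, 64 features each

# RNN: tokens are processed one after another -- 128 sequential steps per sequence.
rnn = nn.RNN(input_size=64, hidden_size=64, batch_first=True)
rnn_out, _ = rnn(seq)

# Self-attention: every token attends to every other token via matrix
# multiplications, so all 128 positions are computed in parallel.
attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
attn_out, _ = attn(seq, seq, seq)

print(rnn_out.shape, attn_out.shape)  # both (8, 128, 64), but attention has no step loop
```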
From the perspective of domestic large models, the barriers to algorithms, data, and computing power are not insurmountable. With the flow of talents, the passage of time, and research progress, the performance of large models is likely to gradually converge.
With the deepening of industrial applications and the increase in scene complexity, there will be explosive growth of data, rapid iteration of algorithms, and an exponential increase in the consumption of computing power, all of which have put forward new requirements for the development of artificial intelligence.
3. Opportunities in the era of large AI models
In the future, traditional requirements such as "mastering general knowledge and routine process work" will gradually become implicit baseline requirements, while the more explicit, higher-level requirement will be the ability to "create value and use tools efficiently to solve problems".
For ordinary people, the opportunities brought to us by large AI models can be roughly divided into two categories, one is short-term investment opportunities, and the other is long-term career opportunities.
In the short term, companies with technical reserves in the field of large models will have more advantages, such as Tencent Holdings, Alibaba, Baidu, etc. At the same time, you can pay attention to key targets that have taken the lead in video, marketing, reading and other related subdivisions, such as iFlytek, Danghong Technology, Jebsen Holdings, BlueFocus, Fengyuzhu, Zhejiang Internet, etc.
In the long run, to borrow what Lu Qi said in his speech: "This era (the era of large models) is very similar to the gold rush. If you went to California to dig for gold back then, many people died, but the people selling spoons and shovels could always make money."
Technology-driven entrepreneurial innovation can be divided into three types of opportunities: underlying technology, meeting needs, and changing the world.
The first is the bottom layer of digital technology. Digitalization is an extension of human capability, and all large-model AI released so far, including GPT, is built on this underlying technology. Chip companies such as Nvidia and Cambricon also provide the hardware for this layer. We can look for opportunities that suit us here, or work to build our skills toward such positions, e.g. front-end, back-end, hardware, chips, and so on.
The second is to use technology to meet needs. Demand falls into two directions: To C, AI can serve people's entertainment, consumption, social networking, content, and every other need that helps them live better; To B, it can help enterprises reduce costs and improve efficiency. The opportunities here lie mainly in staying close to people, understanding user needs better, and delivering better products or experiences.
The third is to change the world, for example through energy technology and energy transformation, life sciences, or new space ventures. Musk, for instance, is working on robots, brain-computer interfaces, and more, and there are even the Metaverse and Web3.
Lu Qi also shared his views on large models in his speech: larger scale and more complex model structures mean broader application fields and more opportunities, but they must be considered carefully; think first, then act.
The opportunities for ordinary people closely mirror the development of large models themselves: long-term progress must be driven by technology, but breaking down, analyzing, organizing, and mastering real needs during implementation is everything. Do what you can, and leave the rest to the future!