MindMap Gallery 【AIGC】6 AIGC application maps
As artificial intelligence technology continues to achieve breakthroughs and iterate, generative AI has repeatedly become a trending topic, and the industrial development, market response and regulatory requirements of AI-generated content (AIGC) have attracted widespread attention. Taking the content generation mode as its perspective, this map covers the technological development, key capabilities and typical application scenarios of AIGC in image generation, audio generation, video generation, three-dimensional generation, language generation, and molecular discovery and circuit design (graph generation), and introduces the challenges that different AIGC sectors in China face in the commercialization process, together with their prospects.
Edited at 2025-02-10 15:40:35
CogVideo
Implementation principle: CogVideo is a large-scale text-to-video generation model based on an autoregressive method. It applies the image generation model CogView2 to text-to-video generation for efficient learning, and produces videos by predicting each new frame and continuously splicing it onto the preceding frames. Pros and cons: Advantages: the model supports Chinese prompts, and its multi-frame-rate hierarchical training method better captures the text-video relationship, so the generated videos look more natural. Disadvantages: the length of the input sequence is restricted.
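A minimal sketch of the autoregressive idea described above (not CogVideo's actual code): each new frame is predicted from the frames generated so far and spliced onto the sequence. `predict_next_frame` is a hypothetical placeholder for the learned model.

```python
import numpy as np

def predict_next_frame(frames: np.ndarray) -> np.ndarray:
    """Placeholder for the learned model that predicts the next frame
    from the previously generated frames (here: last frame plus noise)."""
    return frames[-1] + np.random.normal(0.0, 0.01, frames[-1].shape)

def generate_video(first_frame: np.ndarray, num_frames: int) -> np.ndarray:
    """Autoregressive generation: predict a frame, splice it onto the
    sequence, and condition the next prediction on the extended sequence."""
    frames = [first_frame]
    for _ in range(num_frames - 1):
        frames.append(predict_next_frame(np.stack(frames)))
    return np.stack(frames)

video = generate_video(np.zeros((64, 64, 3)), num_frames=8)
print(video.shape)  # (8, 64, 64, 3)
```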
Product Usability Challenge
Video production speed, convenience, integration of content and interactivity
Stability and controllability challenges
Duration control, content control, making use of and training on limited data, adjustment of generation results and process
Compliance application challenges
Material copyright, privacy and security, ethics
Video style transfer
● Artistic expression of film and television works ● Advertising style conversion
● Film and television/advertising performance optimization ● Repair of old movies and precious image data ● Improved security monitoring and medical image quality
Video enhancement
● Virtual scenes, characters, and special effects generation ● Movie trailer generation ● Video ad generation ● Dynamic human body structure and disease model generation
Video generation
● Post-production editing and special effects processing of film and television ● Short video material editing and special effects addition
Video Editing
● Security monitoring and early warning, intelligent traffic management ● Marketing content label generation, sentiment analysis ● Film and television analysis
Video content recognition
● Transition effect between frames ● Continuity of action ● The smoothness of the picture ● Smooth switching of scenes
Continuity
● High resolution ● Realism of scenes and characters ● Clear and rich picture detail ● Logical coherence of the video content
Realism
● Variable and controllable video length ● Relevance to the given description ● Controllable and editable video attributes and elements
Controllability
Principles of mainstream model implementation and advantages and disadvantages
● Mainstream models:
Imagen-Video
Gen
Implementation principle: Imagen Video is a text-conditioned video model built on the Imagen model. It uses a cascade of multiple diffusion models: a base model generates an initial video from the text prompt, and subsequent models progressively increase its resolution and frame count. Pros and cons: Advantages: the generated videos have high fidelity, controllability and world knowledge; the model supports generating videos and text animations in a variety of artistic styles and shows an understanding of 3D objects. Disadvantages: the parallel training of the cascaded models requires substantial computing resources.
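A toy illustration of the cascaded-generation idea described for Imagen Video, under the assumption of placeholder models: a base model produces a low-resolution, low-frame-rate clip, and later stages increase spatial resolution and frame count. None of these functions are the real Imagen Video components.

```python
import numpy as np

def base_video_model(prompt: str) -> np.ndarray:
    """Placeholder for the base text-conditioned diffusion model:
    returns a low-resolution, low-frame-rate video (frames, H, W, C)."""
    return np.random.rand(4, 24, 48, 3)

def spatial_super_resolution(video: np.ndarray, scale: int = 2) -> np.ndarray:
    """Placeholder spatial upsampler (a real cascade would use another diffusion model)."""
    return video.repeat(scale, axis=1).repeat(scale, axis=2)

def temporal_super_resolution(video: np.ndarray, scale: int = 2) -> np.ndarray:
    """Placeholder temporal upsampler: adds frames between existing ones."""
    return video.repeat(scale, axis=0)

video = base_video_model("a cat riding a bicycle")
for _ in range(2):                      # cascade of refinement stages
    video = temporal_super_resolution(video)
    video = spatial_super_resolution(video)
print(video.shape)  # (16, 96, 192, 3)
```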
Implementation principle: The Gen model learns text-image features through a latent diffusion model. It can generate new videos from a given text prompt or reference image, or restyle existing footage by driving images with the original video. Pros and cons: Advantages: the model performs well in video rendering and style conversion, and the generated videos show strong artistic quality and preservation of image structure, so it adapts well to customization needs. Disadvantages: the stability of the generated results is still limited.
● Domestic and foreign representative models:
Model | Organization | Introduction | Open source?
Imagen Video | Google | Text-to-video generation model based on the diffusion model; fast generation, good video quality, and an ability to understand a variety of artistic styles and 3D objects | Not open source
Make-a-Video | Meta | Requires no paired text-video data; trains on text-image data to achieve video generation, improving the temporal and spatial resolution of generated videos | Not open source
NUWA-XL | Microsoft Research | Ultra-long video generation model based on a Diffusion over Diffusion architecture; good video quality and continuity, and greatly reduced inference time | Open-sourced on GitHub
CogVideo | Tsinghua & Zhiyuan (BAAI) | Large-scale text-video pre-training model; the multi-frame-rate hierarchical training strategy better aligns text and video, and large-scale training data significantly improves generated video quality | Open-sourced on HuggingFace
Autoregressive / Diffusion model stage
● Autoregressive models: frame-by-frame prediction with good coherence, but low efficiency and easily accumulated errors ● Diffusion models: migrate the text-to-image architecture to video generation, with high fidelity but high resource consumption
GAN/VAE/Flow-based generation stage
● Improve generation quality through foreground-background decoupling, motion-content separation, image translation and other methods ● Video quality is still low
Image stitching generation stage
● Static images are stitched into a video stream ● Simple and easy to use, but video quality is low and coherence is poor
Video generation
Film and television game scene production, advertising, digital people
Visual post-production effects
Film and television editing, video face change
Typical Applications
Transformer-TTS
Implementation principle: Transformer-TTS is an end-to-end speech generation model that applies the Transformer structure to a TTS system. Specifically, it builds an encoder-decoder structure with multi-head attention to improve training efficiency, takes phoneme sequences as input to generate a mel spectrogram, and outputs the waveform through a WaveNet vocoder. Pros and cons: Advantages: the Transformer-based speech model speeds up training, addressing Tacotron2's slow training and difficulty in modeling long-range dependencies; because the Transformer captures semantics and relationships, the synthesized speech also sounds more natural. Disadvantages: inference is slow, and accumulated autoregressive errors cause model drift.
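A schematic sketch of the text-to-speech pipeline described above (phonemes to mel spectrogram to vocoder). All three stages here are stand-ins, not the actual Transformer-TTS or WaveNet implementations.

```python
import numpy as np

def text_to_phonemes(text: str) -> list[str]:
    """Placeholder grapheme-to-phoneme step (a real system uses a G2P model or lexicon)."""
    return list(text.replace(" ", ""))

def acoustic_model(phonemes: list[str], n_mels: int = 80) -> np.ndarray:
    """Stand-in for the Transformer encoder-decoder that maps a phoneme
    sequence to a mel spectrogram (time x mel bins)."""
    return np.random.rand(len(phonemes) * 5, n_mels)

def vocoder(mel: np.ndarray, hop: int = 256) -> np.ndarray:
    """Stand-in for a neural vocoder (e.g. WaveNet) that converts the
    mel spectrogram into a time-domain waveform."""
    return np.random.uniform(-1, 1, mel.shape[0] * hop)

wave = vocoder(acoustic_model(text_to_phonemes("hello world")))
print(wave.shape)
```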
● Domestic and foreign representative models:
Tacotron2
Implementation principle: Tacotron2 is an end-to-end speech synthesis model that combines WaveNet and Tacotron, consisting of a spectrogram prediction network and a vocoder. The sequence-to-sequence prediction network extracts features from the input text and predicts a mel spectrogram, and the vocoder then generates a time-domain waveform from the predicted sequence. Pros and cons: Advantages: improvements to the attention mechanism mitigate the vanishing-gradient problem, the synthesized speech quality is good, and the model is robust to the input text. Disadvantages: the autoregressive RNN structure makes synthesis slow, complex words are hard to pronounce, the generated speech lacks emotional color, training on large datasets is time-consuming and costly, and the model lacks controllability.
● Mainstream models:
1 Data gap challenge
Low-resource speech synthesis training, text enhancement, construction of synthetic data, and compliant accumulation of user data
2 Multimodal fusion challenge
Convergence of perception, cognition and synthesis technologies to improve product controllability and generalization
3 Customized demand challenge
Personalized voice synthesis, professional interaction capability, customized voice engineering capability
Voice conversion and style transfer. Film, TV, animation, games and other fields: setting the voices of different characters; scenarios involving personal privacy and security: privacy processing of voices; synthetic data applications: constructing synthetic data to enlarge training sets
Voice enhancement and voice repair. Noise reduction, filtering, gain and other processing of voice signals; application scenarios: telephone recording, video conferencing and voice interaction services in public environments, to improve speech recognition and generation quality; historical audio: restoration of historical recordings and speculative synthesis of ancient pronunciations, of significant value for historical research
Music generation. Coherent music with consistent semantics and style can be generated from a prompted audio clip or text description; music, film and television: song arrangement, refinement of musical style, background music and ambient sound generation, etc.
Voice interaction. Widely used in all kinds of human-computer dialogue; enterprise services, finance and other industries: intelligent customer-service robots hold voice Q&A with customers to save labor costs; home appliances and automobiles: voice assistants complete user commands in smart home and smart car scenarios; news and media: simultaneous interpretation at international conferences, exhibitions and other events
Speech synthesis. Pan-entertainment: long-form audio production such as news broadcasting and audiobook reading; industrial manufacturing: voice navigation, traffic command and industrial automation control; cross-language synthesis: pronunciation translation and language learning; medical: wearable devices such as artificial larynxes
Speech recognition. Extracts features from input audio and converts it into text or commands, turning spoken statements or other audio content into text; consumer scenarios: voice input methods and spoken notes on smartphones; industry scenarios: archive retrieval, electronic medical record entry, film and television subtitle production
Audio generation
● Control of speech speed, rhythm and intonation ● Understanding of text and pronunciation in different language contexts ● Grasp of emotional phonetic characteristics
Control ability
Voice quality
● High accuracy ● Anti-interference ability
Generation speed
● Individual users: whether generation speed can respond to requests in real time ● Enterprise users: the impact of generation speed on business processes
● Current mainstream audio synthesis method ● Reduces the need for linguistics knowledge in training ● Natural-sounding, approaching a real human voice
● Requires only a small amount of smooth original voice data ● Considerable noise ● The voice sounds relatively mechanical
● Sound quality is better because it is based on real recordings ● Relies on a large voice database ● Transitions between spliced words are relatively stiff
End-to-end synthesis stage
Parametric synthesis stage
Concatenative (splicing) synthesis stage
Generate melodies and music
Generate voice descriptions based on visual content (image or video)
Text-to-Speech: synthesize pronunciation from text
1 Data capability: closed-loop operation of data assets
2 Productization capability: prompt understanding, use of fine-tuning tools
3 Regulatory compliance and privacy protection: copyright protection, AI governance
Image super-resolution: medical case and anatomical structure creation; astronomical observation and satellite remote sensing tone measurement
Image repair: digital restoration of historical documents; repair of old photos and old films
Image generation, image style conversion: art creation, image editing, artistic image enhancement; cartoon characters and game scene production; posters, product logos and packaging design
Image classification, image segmentation: target recognition, image retrieval, industrial design; annotation of changes in medical imaging and analysis of anatomical and pathological structures
1 Image quality: richness of picture quality and detail; realism of the image
2 Image stability: resistance to distorted, deformed and abnormal data; anti-interference ability
3 Image diversity: expression of detail and style; semantic consistency across multiple images or different styles
4 Image controllability: control over image details; subsequent adjustment
● Domestic and foreign representative models:
Model | Organization | Introduction | Open source?
Stable Diffusion | StabilityAI | Based on a latent diffusion model framework dedicated to text-to-image tasks, it lowers computing requirements and deployment thresholds and has become the basic framework of most image generation models | Open-sourced on GitHub
DALL-E 2 | OpenAI | Based on the CLIP and diffusion model frameworks; the generated images maintain good semantic consistency | Not open source
Midjourney V5 | Midjourney | An image generation model fine-tuned from a diffusion model, deployed on Discord and good at artistic-style image expression | Not open source
Wenxin ERNIE-ViLG 2.0 | Baidu | A multimodal generation model based on the diffusion model framework; proposes a mixture-of-experts approach that automatically selects the optimal generation network | Not open source
CLIP: Contrastive Language-image Pre-training
Implementation principle: CLIP is a text-image cross-modal pre-trained model based on contrastive learning. Text and images are each passed through an encoder and mapped into the same representation space, and the model is trained by computing the similarities and differences of text-image pairs, so that images matching a given text description can be generated. Pros and cons: Advantages: no data needs to be labeled in advance; it performs well in zero-shot image-text classification; it grasps text descriptions and image styles more accurately; non-essential image details can be changed without losing accuracy; and generated images show good diversity. Disadvantages: performance is limited in complex and abstract scenarios, and training depends on large-scale text-image datasets and considerable training resources.
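A compact sketch of the contrastive objective described above: image and text embeddings are normalized into a shared space and every image is scored against every text, with matching pairs on the diagonal. The embeddings here are random placeholders rather than real CLIP encoders.

```python
import numpy as np

def contrastive_logits(img_emb: np.ndarray, txt_emb: np.ndarray, temperature: float = 0.07) -> np.ndarray:
    """Core of CLIP-style contrastive learning: L2-normalize both embedding
    sets and score every image against every text in the batch."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    return img @ txt.T / temperature          # (batch, batch) similarity matrix

def symmetric_cross_entropy(logits: np.ndarray) -> float:
    """Image i should match text i: cross-entropy over rows and columns."""
    labels = np.arange(len(logits))
    def ce(l):
        p = np.exp(l - l.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        return -np.mean(np.log(p[labels, labels]))
    return (ce(logits) + ce(logits.T)) / 2

batch = 4
loss = symmetric_cross_entropy(contrastive_logits(np.random.rand(batch, 512), np.random.rand(batch, 512)))
print(loss)
```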
Diffusion Model
Implementation principle: A Markov chain of diffusion steps is defined that repeatedly adds random noise to the data until pure Gaussian noise is obtained; the reverse diffusion process is then learned, and images are generated by step-by-step denoising inference. Because the diffusion model systematically perturbs the data distribution and then restores it, the whole process has a gradual optimization character, which ensures the stability and controllability of the model. Pros and cons: Advantages: restores real data more accurately, keeps image details better, and produces more realistic images; it performs especially well in applications such as image inpainting and molecular graph generation. Disadvantages: complex computation steps, slow sampling, and weak generalization across data types.
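A minimal numerical sketch of the forward/reverse process just described, in the style of DDPM; the noise predictor is a placeholder for the trained network.

```python
import numpy as np

# Minimal DDPM-style sketch: the forward process adds Gaussian noise step by step,
# the reverse process denoises using a (placeholder) learned noise predictor.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def forward_diffuse(x0: np.ndarray, t: int) -> np.ndarray:
    """q(x_t | x_0): scale the data and add noise according to the schedule."""
    noise = np.random.randn(*x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

def predict_noise(x_t: np.ndarray, t: int) -> np.ndarray:
    """Placeholder for the trained denoising network epsilon_theta(x_t, t)."""
    return np.zeros_like(x_t)

def reverse_step(x_t: np.ndarray, t: int) -> np.ndarray:
    """One reverse (denoising) step of p_theta(x_{t-1} | x_t)."""
    eps = predict_noise(x_t, t)
    mean = (x_t - betas[t] / np.sqrt(1.0 - alphas_bar[t]) * eps) / np.sqrt(1.0 - betas[t])
    return mean if t == 0 else mean + np.sqrt(betas[t]) * np.random.randn(*x_t.shape)

x = np.random.randn(8, 8)          # start from pure Gaussian noise
for t in reversed(range(T)):
    x = reverse_step(x, t)
```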
● Mainstream models:
Diffusion model generation stage
● Current mainstream image generation approach ● The diffusion process significantly improves stability, accuracy and diversity; combined with CLIP it can be applied to cross-modal image generation tasks ● Significantly improves the speed and quality of image generation
Autoregressive generation stage
● The self-attention mechanism of the Transformer structure improves stability and rationality ● Inference speed and training cost limit applications
GAN generation stage
● Previous generation of image generation models ● Improves generation and discrimination ability through adversarial training ● Poor stability, lack of diversity, and mode collapse
True color image generation
RGB image
Uses combinations of the three RGB primary colors to represent the color value of each pixel, stored directly in the image matrix
Image generation with relatively simple color composition, such as molecular diagrams
Constructed from a two-dimensional image matrix and a color index matrix (MAP)
Indexed image
Image generation
Image-to-Image: generate new images from existing images
Image synthesis
Text-to-Image: generate semantically consistent images from text descriptions
AIGC - Audio Generation
AIGC - Video Generation
AIGC - Image Generation
definition
definition
definition
Audio generation refers to synthesizing sound waveforms from input data. It mainly includes synthesizing speech from text (Text-to-Speech), converting speech between different languages, producing spoken descriptions of visual content (images or video), and generating melodies and music.
Video generation refers to training artificial intelligence so that it can automatically generate high-fidelity video content that matches a description, based on given single-modal or multimodal data such as text, images and videos.
Image generation refers to using artificial intelligence to generate images from given data, in a single-modal or cross-modal way. Depending on the task objective and input modality, image generation mainly includes image synthesis, generating new images from existing images (Image-to-Image), and generating images that match a semantic description from text (Text-to-Image).
The main types and application areas of audio generation
The main types and application areas of video generation
Main types and application fields of image generation
Typical Applications
Characteristics
Data Type
Characteristics
Data Type
Image Type
Characteristics
Typical Applications
Information broadcasting, human-computer interaction service
Extract text information features and synthesize voice information
Text information
Edit, synthesize and splice multiple videos to generate new videos, including video attribute editing, clip editing, partial video editing, etc.
Edit Generation
Binary image
The image's two-dimensional matrix consists only of 0 (black) and 1 (white) and can be regarded as a binarized grayscale image.
Text extraction, image feature extraction
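A small illustration of the image matrices described in this branch: a grayscale matrix with values from 0 to 255, and the binary image obtained by thresholding it (an assumed threshold of 128 is used here).

```python
import numpy as np

# Grayscale values range from 0 (black) to 255 (white); a binary image is
# obtained by thresholding the grayscale matrix into 0/1 values.
gray = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)   # grayscale matrix
binary = (gray >= 128).astype(np.uint8)                         # binarization
print(gray)
print(binary)
```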
Voice editing, voice translation, music production
Edit based on a given voice segment, or convert one language to another language's voice information
Audio information
Add a variety of effects to existing videos, such as filters, light and shadow, fireworks, etc. to enhance the creativity and artistic effect of the video
Special effects generation
Grayscale
The elements of the two-dimensional matrix usually range from 0 (pure black) to 255 (pure white), with intermediate values representing transitional shades between black and white.
Medical image and remote sensing image generation
Medical wearable devices
Perceives muscle movements of the throat, face, etc., and synthesizes voice
Muscle vibration
Content generation
Generate corresponding video content based on the given text, images and other information
Identify and understand visual content such as images and videos, and generate voice information corresponding to the lip shape
Digital people
Visual content
Key stages of the technological development of image generation
Key stages of the technological development of audio generation
Key stages of technological development of video generation
Principles of mainstream model implementation and advantages and disadvantages
Image generation commercialization implementation challenge
Key factors affecting the application capabilities of models
Typical industrial application scenarios for image generation
Mainstream audio generation applications
Model | Organization | Introduction | Open source?
Tacotron2 | Google | One of the first proposed end-to-end speech synthesis models, serving as the infrastructure of multiple speech system solutions | Open-sourced on GitHub
Whisper | OpenAI | Automatic speech recognition model that improves recognition through large-scale, diversified datasets and supports speech transcription, speech translation, etc. | Open-sourced on GitHub
DeepVoice3 | Baidu | Fully convolutional sequence-to-sequence speech synthesis model; multi-speaker synthesis can be improved by extending the training dataset | Not open source
SMART-TTS | iFlytek | Industrial-grade Chinese speech pre-training model supporting multimodal speech recognition, emotion recognition, voiceprint recognition and other tasks | Not open source
Key factors affecting the application capabilities of models
Key factors affecting the application capabilities of models
The challenge of commercialization of audio generation
Typical industrial application scenarios for audio generation
Typical industrial application scenarios for video generation
Challenge for commercialization of video generation
Chip design
Food and Agriculture
energy
Materials Science
Personal Care
Artificial intelligence technology is developing rapidly, and new technologies may replace existing technologies, thus affecting the commercial value of existing technologies.
Technology development competition
● Drug development requires strict approval ● Copyright issues in integrated circuit design ● Molecular discovery models could be misused to develop banned drugs and hazardous products
Law and safety
Development and Verification Cost
Drug Design
Applicability
Generation quality
Key factors
Molecular discovery and integrated circuit design models must fit specific design purposes. Integrated circuit design models also need to be retrained, have their architecture modified, have parameters adjusted manually, and have discovery principles planned in line with the industrial design purpose in order to meet industrial requirements.
For molecular discovery and integrated circuit design models, generation quality is the core factor that determines their application capability.
Job Type | Job Objectives | Representative Models (Basic Algorithms and Models)
Layout | Machine learning layout optimization; artificial intelligence layout decision | DREAMPlace (neural network parameter optimization), PL-GNN (graph neural network), DeepPlace (graph neural network), ... ...
Routing | Artificial intelligence routing optimization; artificial intelligence routing decision | CNN for RDP (convolutional neural network), FCN for RDP (fully convolutional network), ML for RDPE (multi-layer perceptron), DLRoute (convolutional neural network), VAE for CR (variational autoencoder), MCTS for CR (Monte Carlo tree search), RL for CF (reinforcement learning), ... ...
Placement and routing | Layout decisions that consider routing; complete layout and routing design | DeepPR (convolutional neural network)
Application scenarios for molecular discovery and circuit design
Generation Method | Algorithms and Models Used | Representative Models (Generated Representation)
Combinatorial optimization methods | Reinforcement learning | MolDQN, ORGAN (one-/two-dimensional)
Combinatorial optimization methods | Genetic algorithm | GB-GA, STONED (one-/two-dimensional)
Combinatorial optimization methods | Bayesian optimization | BOKEI, BOA (three-dimensional)
Combinatorial optimization methods | Markov chain Monte Carlo | MIMOSA (two-dimensional), MARS (two-dimensional)
Deep generative methods | Autoregressive model | SF-RNN (one-dimensional), MolecularRNN (two-dimensional)
Deep generative methods | Variational autoencoder | SG-VAE (one-dimensional), CGVAE (two-dimensional)
Deep generative methods | Normalizing flow | GraphNVP (two-dimensional), MoFlow (two-dimensional)
Deep generative methods | Generative adversarial network | ORGAN, Defactor (one-/two-dimensional)
Deep generative methods | Diffusion model | ConfGF (three-dimensional), EVFN (three-dimensional)
Principle
Job Type
Layout
Given a set of integrated circuit components (standard cells, macro modules, logic gates, etc.), together with characteristic information such as their widths and heights, pin positions and the connection relationships between components, the physical position of each component is allocated based on this information so that components do not overlap.
Routing (wiring)
After layout is complete, the pin positions of the components and the connections between them are determined. Within the routing area reserved during layout, the connecting wires between components are designed, without violating the routing rules, according to the connection relationships and to requirements such as minimizing total wire length and meeting the timing relationships between components.
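To make the layout objective concrete, here is a minimal sketch of half-perimeter wirelength (HPWL), a wirelength estimate commonly minimized during placement; the net list and coordinates are made-up examples, and this is not code from any of the models listed above.

```python
def hpwl(nets: dict[str, list[str]], positions: dict[str, tuple[float, float]]) -> float:
    """Half-perimeter wirelength: for each net, the half-perimeter of the
    bounding box around the pins it connects, summed over all nets."""
    total = 0.0
    for pins in nets.values():                      # each net connects several component pins
        xs = [positions[p][0] for p in pins]
        ys = [positions[p][1] for p in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

positions = {"A": (0.0, 0.0), "B": (3.0, 1.0), "C": (1.0, 4.0)}  # hypothetical placement
nets = {"n1": ["A", "B"], "n2": ["A", "B", "C"]}                 # hypothetical connectivity
print(hpwl(nets, positions))   # 4.0 + 7.0 = 11.0
```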
Three-dimensional representation
AIGC - Molecular Discovery and Circuit Design
definition
Molecular discovery and circuit design refer to using machine learning, deep neural networks and other technologies to learn the structures, rules and properties of molecules and integrated circuits, and to generate molecules and integrated circuits that have similar structures, conform to specific rules and possess target properties.
Major types and application fields of molecular discovery and circuit design
Expression method
Principle
One-dimensional representation
Expresses a molecule as a string, encoding its atoms and structure in characters
Two-dimensional representation
Expresses a molecule as graph data, with atoms and bonds represented as the nodes and edges of the graph
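A small illustration of the one- and two-dimensional representations just described, using ethanol as an assumed example: a SMILES string for the 1D form, and an atom/bond graph with its adjacency matrix for the 2D form.

```python
# 1D: atoms and bonds encoded as a string (SMILES); ethanol used as an example.
smiles = "CCO"

# 2D: graph data with atoms as nodes and bonds as edges (single bonds here).
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

adjacency = [[0] * len(atoms) for _ in range(len(atoms))]
for i, j in bonds:
    adjacency[i][j] = adjacency[j][i] = 1
print(smiles, adjacency)
```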
Mainstream model of molecular discovery
Mainstream circuit design model
Key factors affecting the application capabilities of models
Risks of commercialization of molecular discovery and circuit design
● Development requires substantial data and talent costs ● The verification process involves great uncertainty and long cycles
03 Technology and application substitution risks
02 Information security risks
01 Misinformation and harmful information
E-commerce industry
News and Media
Education Industry
● Generate product description ● Analyze product reviews ● Generate product recommendations ● Generate an analysis report
● Generate news reports ● Carry out content creation ● Generate host broadcast scripts ● Generate advertising copy
● Generate teaching plans ● Assist in correcting homework ● Provide study tutoring
Product R&D
● Assist in the development of IT products ● Generate test cases ● Generate product manual ● Generate operation steps
Customer Service Industry
● Generate a solution ● Intelligent customer service solution ● Understand customer intentions ● Exclusive customer service for large customers
Marketing
● Generate a quote ● Generate a sales plan ● Analyze market data ● Analyze sales data
Medical industry
● Assist doctors in writing medical plans ● Assist doctors in writing medical records ● Help patients match medical resources ● Provide diagnosis and treatment guidance for patients
● Analyze a large number of financial reports ● Generate a summary of key information ● Provide investment strategy advice ● Generate a data analysis report
Financial Industry
Customization and innovation capabilities
Meet differentiated customer needs; respond to market changes through innovation
Product Operation and Customer Support
Increase user stickiness and migration costs; achieve user conversion and retention
Marketing capability
Effectively dilute costs and ensure profit margins
Extend the model's knowledge domain; discover application pain points and needs
Generation quality
Mainstream language generation applications:
Application Name | Organization | Market | Open online? | Introduction
ChatGPT | OpenAI | Overseas | Yes | Benchmark universal language generation application, achieving top results in text generation, text summarization, text revision, natural language interaction, code generation and other tasks, and cooperating with many leading enterprises and institutions to explore language generation application scenarios
BARD | Google | Overseas | Yes | Universal language generation application benchmarked against ChatGPT; connects to the Google search engine to optimize the search experience and to the Google office product ecosystem
Claude | Anthropic | Overseas | Yes | Universal language generation application benchmarked against ChatGPT; optimizes generation assistance and safety, and provides enterprise-level secure language generation services
Wenxin Yiyan (ERNIE Bot) | Baidu Smart Cloud | China | Yes | Chinese universal language generation application benchmarked against ChatGPT; connects to Baidu search, Baidu Wenku, the Xiaodu smart assistant and other applications, and has partnered with many companies and institutions to explore language generation application scenarios
Tongyi Qianwen | Alibaba Cloud | China | No | Chinese universal language generation application benchmarked against ChatGPT; supports customized models for enterprises
SenseChat | SenseTime | China | No | Chinese universal language generation application benchmarked against ChatGPT; vertical language generation applications for medical and programming scenarios will be launched
Market phases: exploration period, then market start-up period, then rapid development period
Before 2017 ● Weak language generation ability ● Applications could only complete highly patterned language generation tasks
2017: Technology development period ● The Transformer architecture is proposed, laying the technical foundation ● Only highly patterned language generation tasks can be completed
2018-2019: Model exploration period ● The language generation model paradigm becomes clear ● Lays the technical foundation for relatively simple language generation applications
2020-2021: Application exploration period ● Language generation ability meets basic application requirements ● Industry companies initially explore application scenarios for relatively simple language generation tasks
2022-present: Application acceleration period ● Generated language quality is on par with human level ● Industry companies actively explore application scenarios and methods for language generation across industries and fields
Provide interaction
Generate content
Usually written text content that is factual, functional or entertaining
Blog articles, news, emails, novels, codes
Generate fixed-format contracts, etc.
It can assist in the creation of literary content and summarize various contents.
AIGC - Language Generation
definition
Language generation refers to a semantic probability model learned by neural networks that can generate language according to task requirements; the generated languages include natural language, programming languages, logical languages, etc.
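A minimal sketch of language generation as a probability model, as defined above: at each step the model assigns probabilities to candidate next tokens and one token is sampled. The vocabulary and logits here are placeholders for a real neural network.

```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 1.0) -> int:
    """Turn the model's scores into a probability distribution over the
    vocabulary (softmax) and sample one token index from it."""
    scaled = logits / temperature
    p = np.exp(scaled - np.max(scaled))
    p /= p.sum()
    return int(np.random.choice(len(logits), p=p))

vocab = ["the", "cat", "sat", "on", "mat", "."]
tokens = ["the"]
for _ in range(5):                                   # autoregressive generation loop
    logits = np.random.randn(len(vocab))             # stand-in for a neural network forward pass
    tokens.append(vocab[sample_next_token(logits)])
print(" ".join(tokens))
```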
The main types and application areas of language generation
Data Type
Characteristics
Typical Applications
General language generation
Has broad general-domain knowledge and can complete different types of language generation tasks on demand
Vertical language generation
In addition to certain general-domain knowledge, it also possesses professional-domain knowledge; the application pattern is usually designed to better fit professional-domain requirements
Financial report writing and analysis, etc.
Key stages of technological development in language generation
Key capabilities for commercialization of language generation applications
Typical industrial application scenarios for language generation
Risk of commercialization of language generation
Because semantics is so fundamental, all kinds of applications can be decoupled and reconstructed at the semantic level. Many language generation applications may therefore be quickly replaced or substituted as technology advances and application designs iterate, making it difficult to maintain a commercial competitive advantage.
When using language generation applications, many products and services are based on public cloud services or require uploading information to the vendor's servers, so there is a risk of information leakage.
Generated misinformation and harmful information can severely damage brand reputation and product image, and therefore constitute a major risk for the commercialization of language generation applications.
Technological innovation challenges
Scenario application implementation challenge
Copyright Challenge
At present, many application scenarios in which artificial intelligence three-dimensional generation has an opportunity to be commercialized, such as film production, product concept design and game 3D asset production, still need to win user acceptance in actual applications. The reason three-dimensional generation continues to be used in these scenarios is the need to keep the picture content consistent across different viewing angles, so three-dimensional generation retains unique application value there.
At present, many artificial intelligence three-dimensional generation applications still require large amounts of text data and two-dimensional image data to train their models. If this data comes from copyrighted assets, commercial use of the resulting models is prone to copyright disputes.
03
02
01
The application scenarios of three-dimensional generation can be roughly divided into those aimed at professionals and those aimed at ordinary consumers. Professional scenarios require the AI to meet production-line-level requirements such as high generation quality and high controllability, whereas consumer scenarios have relatively low requirements for generation quality and controllability but generally demand high generation efficiency.
Challenges for commercialization of 3D generation
Virtual reality
Educational training
Use 3D generation technology to create realistic virtual worlds and characters to enhance the realism and immersion of virtual reality.
Teachers and students use three-dimensional generation technology to better understand and learn complex scientific and technical knowledge, and improve teaching effectiveness and learning efficiency.
Movie and animation production
Art Design
Use 3D generation technology to create realistic 3D scenes and characters and achieve complex visual effects, improving the quality and watchability of movies and animations.
Use three-dimensional generation technology to create digital artworks, digital sculptures and other creative works to improve the efficiency and expressiveness of creation.
Architectural Design
Healthcare
Use 3D generation technology to create architectural models and visual renderings faster, improving design efficiency and accuracy.
Use three-dimensional generation technology to create realistic human organ models and medical devices for use in areas such as medical education, surgical simulation and disease diagnosis.
Industrial Manufacturing
Use 3D generation technology to create parts and molds faster, improving production efficiency and accuracy and reducing manufacturing costs.
Use 3D generation technology to quickly create realistic 3D scenes and virtual characters to improve the realism and immersion of the game.
Game development
Typical industrial application scenarios for three-dimensional generation
Controllability
Instruction understanding needs strengthening; modeling is separated from rendering, and mesh representation is required
Generation efficiency
Heavy computation and slow generation speed; training and generation place high demands on hardware
Model fineness and accuracy; rendering resolution and precision; accuracy of material expression
Generation quality
Key factors affecting the application capabilities of models
Magic3D Model
Implementation principle: Magic3D uses a coarse-to-fine strategy: it first optimizes a low-resolution, simply rendered 3D model represented by a hash grid, and then refines it into a higher-quality, higher-resolution rendered 3D model using methods close to traditional computer graphics. Pros and cons: Advantages: the 3D models generated by Magic3D have higher resolution and better rendering quality, and generation efficiency is significantly improved. Disadvantages: Magic3D requires substantial computing resources, training takes a long time, results depend strongly on the text description, and the model relies heavily on domain-specific knowledge.
DreamFusion model
Implementation principle: DreamFusion is mainly based on diffusion model techniques from deep learning, combining neural radiance fields (NeRF) with a text-to-image diffusion model. Pros and cons: Advantages: it can generate high-quality, realistic 3D models from text descriptions and supports multi-view generation and optimization, improving the coherence and realism of 3D scenes. Disadvantages: it depends heavily on hardware resources, and the model's generalization ability needs improvement.
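A highly simplified sketch of the optimization loop behind this kind of text-to-3D approach (score-distillation style): render the current 3D representation from a random viewpoint, noise the rendering, and use the diffusion model's noise prediction as a gradient signal. The renderer and noise predictor below are placeholders, not the actual NeRF or diffusion models used by DreamFusion.

```python
import numpy as np

def render(params: np.ndarray, camera: int) -> np.ndarray:
    """Stand-in differentiable renderer for the 3D representation."""
    return np.tanh(params)

def predict_noise(image: np.ndarray, prompt: str, t: int) -> np.ndarray:
    """Stand-in for the text-conditioned diffusion model's noise prediction."""
    return image * 0.1

params = np.zeros((32, 32, 3))                     # 3D scene parameters (e.g. NeRF weights)
lr = 0.1
for step in range(100):
    camera = np.random.randint(0, 360)             # random viewpoint each step
    image = render(params, camera)
    t = np.random.randint(1, 1000)
    noise = np.random.randn(*image.shape)
    noisy = image + noise                          # simplified forward noising
    grad = predict_noise(noisy, "a chair", t) - noise   # score-distillation-style gradient signal
    params -= lr * grad                            # update the 3D representation
```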
CLIP-NeRF model
Implementation principle: The CLIP (Contrastive Language-Image Pre-training) model is introduced into the editing of NeRF (Neural Radiance Fields) to enable text- or image-guided modification of a NeRF. Pros and cons: Advantages: CLIP-NeRF focuses on adjusting the generated three-dimensional model and its rendering through natural language or two-dimensional images. Disadvantages: in terms of generation quality and commercial value, CLIP-NeRF has the same problems as the Dream Fields model.
Implementation principle: Using CLIP's ability to connect text with two-dimensional images, combined with NeRF's ability to learn three-dimensional structure and texture rendering from two-dimensional images, generation from natural language to three dimensions is achieved. Pros and cons: Advantages: the Dream Fields model proves that CLIP can be combined with NeRF and breaks through the imagination limits of previous three-dimensional generative models. Disadvantages: the three-dimensional content generated by Dream Fields is still structurally simple, the rendering quality is poor, and large-scale three-dimensional scenes cannot be generated; in addition, its generation efficiency is very low and it connects poorly with traditional three-dimensional production workflows, so it lacks commercial value.
Dream Fields Model
● Mainstream models:
Principles of mainstream model implementation and advantages and disadvantages
2D-to-3D lifting application exploration period, 2022-present
● Two-dimensional generation develops rapidly ● The 2D-to-3D lifting route becomes clear ● GAN still has applications
2D-to-3D lifting technology development period, 2020-2022
● Neural radiance fields are proposed ● Research on lifting 2D to 3D accelerates ● GAN becomes the mainstream of 3D generation
2D-to-3D lifting germination period, 2018-2020
● Neural-field 3D representations are proposed ● Research on lifting 2D to 3D develops slowly ● There is much native 3D research
The key stage of technological development in three-dimensional generation
Typical Applications
Voxel grid, point cloud and mesh
Three-dimensional scene reconstruction and rendering
Characteristics
Expresses the shape, structure and position of three-dimensional objects in an intuitive form
A three-dimensional scene expressed through neural network parameters, i.e. a neural field
Implicit representation data
Explicit representation data
Data Type
The main types and application areas of 3D generation
Three-dimensional generation (with artificial intelligence) refers to using deep neural networks to learn and generate three-dimensional models of objects or scenes, and, on the basis of the three-dimensional model, giving objects or scenes color and lighting so that the result is more realistic. In applications, generating the three-dimensional model of an object or scene is called three-dimensional modeling, and giving the model color, light and shadow is called three-dimensional rendering.
definition
AIGC - 3D generation