MindMap Gallery: Multimodal Large Model Technology System
Describes the key technologies for multimodal large models, including pre-training data collection, base model construction, self-supervised learning and model optimization training, and downstream task fine-tuning.
Edited at 2025-01-05 13:43:37
Multimodal large model technical system
Pre-training data collection
Source of data
Public datasets (e.g., Wikipedia, encyclopedias, newspapers, online forums, social platforms).
Internal enterprise datasets (e.g., internal logs, documents, databases).
Self-collected datasets (gathered via web crawlers, API interfaces, etc.).
Data cleaning
Deduplication (removing duplicate samples), denoising (filtering out meaningless data such as advertisements), format unification (converting data to a consistent format), and data repair (correcting errors in the data, such as spelling mistakes).
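The four cleaning steps above can be sketched as a toy pipeline. This is a minimal illustration, not a production cleaner; the ad keyword and the spelling fix-up table are hypothetical stand-ins for real denoising filters and spell-correction models.

```python
def clean_corpus(samples):
    """Toy data-cleaning pipeline: denoise, repair, unify format, deduplicate."""
    # Hypothetical fix-up table standing in for a real spell-correction step.
    fixes = {"teh": "the", "recieve": "receive"}
    seen, cleaned = set(), []
    for text in samples:
        text = text.strip().lower()                      # unify format
        if "buy now" in text:                            # denoise: drop ad-like lines
            continue
        text = " ".join(fixes.get(w, w) for w in text.split())  # repair
        if text and text not in seen:                    # deduplicate
            seen.add(text)
            cleaned.append(text)
    return cleaned

print(clean_corpus(["Teh cat sat.", "teh cat sat.", "Buy now!!!", "A dog ran."]))
# → ['the cat sat.', 'a dog ran.']
```

Note that deduplication runs last here, so that two samples differing only in case or spelling still collapse into one.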
Data annotation
Label types include text annotation (such as named entity recognition, sentiment analysis) and image annotation (such as object bounding boxes, image classification labels). Label quality is crucial: data is usually pre-labeled with automated tools, then manually reviewed and corrected to ensure label consistency.
Application of pre-trained models
A pre-trained model learns a general language representation by training on a large-scale text corpus. Such models can then be fine-tuned on different tasks to suit specific needs.
Network structure design
Process images and text
A Transformer or CNN is typically used to capture the complex relationships between vision and language.
Event flow
Spiking neural networks are better suited here, as they can effectively model the temporal dynamics of event-stream information.
With the language model as the core
Examples include DeepMind's Flamingo visual language model, KOSMOS-1 (which connects a Transformer to a visual perception module), and ChatBridge.
Self-supervised learning optimization
Masked Language Modeling (MLM): some words or tokens in the input sequence are replaced with a special mask token, and the pre-trained model must predict the masked words or tokens from the visible multimodal context.
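The masking step of MLM can be sketched as follows. This is a simplified version: it only shows how mask positions and prediction targets are chosen, omitting the BERT-style refinement of sometimes keeping the original token or substituting a random one.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Replace a random subset of tokens with the mask token; return the
    masked sequence and the {position: original token} prediction targets."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            targets[i] = tok          # the model must recover this token
            masked.append(mask_token)
        else:
            masked.append(tok)
    return masked, targets

masked, targets = mask_tokens("a photo of a cat on a mat".split(), mask_rate=0.3)
```

The training loss is then computed only at the masked positions, comparing the model's predicted token distribution against the stored targets.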
Masked Image Modeling (MIM): some regions of the input image are hidden or replaced with special mask tokens, and the pre-trained model must predict or restore the masked image regions from the remaining image content and other modal information such as text.
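The image-side masking can be sketched analogously on a grid of patches. This is a minimal NumPy illustration assuming a single-channel image whose side lengths divide evenly into patches; real MIM models mask learned patch embeddings rather than raw pixels.

```python
import numpy as np

def mask_patches(image, patch=4, mask_rate=0.5, seed=0):
    """Zero out a random subset of non-overlapping patches; the training
    objective would be to reconstruct the pixels under the mask."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    masked = image.copy()
    mask = np.zeros_like(image, dtype=bool)
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            if rng.random() < mask_rate:
                masked[y:y + patch, x:x + patch] = 0   # hide this patch
                mask[y:y + patch, x:x + patch] = True  # remember where
    return masked, mask
```

The returned boolean mask marks which pixels the reconstruction loss should be evaluated on, mirroring the `targets` dictionary in the MLM case.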
Image-Text Matching (ITM): achieves global alignment of images and text. Typically, a matched image-text pair serves as a positive sample and a mismatched pair as a negative sample; matching is then learned as binary classification, establishing a semantic relationship between image and text.
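Constructing the positive and negative pairs for ITM can be sketched like this. It is a toy example assuming unique captions; production systems often mine *hard* negatives (captions similar to the true one) rather than sampling uniformly.

```python
import random

def build_itm_pairs(pairs, seed=0):
    """From matched (image_id, caption) pairs, build a labeled ITM set:
    label 1 for true pairs, label 0 for captions swapped onto other images."""
    rng = random.Random(seed)
    examples = [(img, cap, 1) for img, cap in pairs]          # positives
    captions = [cap for _, cap in pairs]
    for img, cap in pairs:
        negative = rng.choice([c for c in captions if c != cap])
        examples.append((img, negative, 0))                   # mismatched pair
    return examples
```

A binary classifier head on top of the fused image-text representation is then trained to predict the third field of each tuple.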
Image-Text Contrastive learning (ITC): uses contrastive learning to pull together the vector representations of matched image-text pairs and push apart those of mismatched pairs, thereby strengthening the semantic correlation between images and text.
Downstream task fine-tuning adaptation
Task-specific model fine-tuning adaptation: the weights of the multimodal large model are used as initial parameters, and supervised fine-tuning is performed on task-specific data. Through this fine-tuning, the model learns fine-grained features and representations for the specific task, adapting to its requirements.
Prompt-learning-based model fine-tuning adaptation: design a template that matches the upstream pre-training task, tapping the potential of the pre-trained model so that it can complete downstream tasks well with little or no labeled data. Prompt learning allows a pre-trained model to be reused across different task types; adapting to a specific task only requires modifying the prompt template, saving training time and computing resources.
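A prompt template recasts a downstream task in the shape of the pre-training objective. The sketch below shows a cloze-style template for sentiment classification; the template wording and the good/bad verbalizer are hypothetical examples, not fixed conventions.

```python
def build_prompt(template, **slots):
    """Fill a prompt template so a downstream task looks like the
    pre-training objective (here: a cloze the MLM head can fill)."""
    return template.format(**slots)

template = "Review: {review} Overall, the movie was [MASK]."
prompt = build_prompt(template, review="Great acting and a gripping plot.")

# A pre-trained MLM would then score candidate fills at the [MASK] slot,
# with a verbalizer mapping them to labels, e.g.:
verbalizer = {"good": "positive", "bad": "negative"}
```

Switching the model to a different task (say, topic classification) only requires swapping the template and verbalizer, which is the reuse property described above.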
Adapter-network-based model fine-tuning adaptation: each task has its own independent adapter layers, so the model can share a common pre-trained representation across tasks while making task-specific adjustments for each one. Adapter layers usually contain few parameters, making them more efficient to train than fine-tuning the whole model. During training, the pre-trained model's parameters are frozen and only the adapter-layer parameters are updated.
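A typical adapter is a small bottleneck inserted after a frozen backbone layer: down-project, nonlinearity, up-project, plus a residual connection. The NumPy sketch below shows only the forward pass and the zero-initialization trick; dimensions are illustrative.

```python
import numpy as np

class Adapter:
    """Bottleneck adapter: h + ReLU(h @ W_down) @ W_up. Only W_down and
    W_up are trained; the frozen backbone's output h passes through with
    a learned delta added on top."""
    def __init__(self, dim, bottleneck, seed=0):
        rng = np.random.default_rng(seed)
        self.down = rng.normal(0, 0.02, (dim, bottleneck))
        # Zero-init the up-projection so the adapter starts as an identity
        # map and cannot disturb the pre-trained representation at step 0.
        self.up = np.zeros((bottleneck, dim))

    def __call__(self, h):
        return h + np.maximum(h @ self.down, 0) @ self.up  # residual + ReLU
```

Because the bottleneck is small (e.g., 8-64 units for a hidden size of hundreds), each task's adapter adds only a fraction of a percent of the backbone's parameters.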