MindMap Gallery Transformer and sequence-to-sequence (Seq2Seq)
A more detailed explanation of the Transformer and Seq2Seq. Note that it only covers the NLP Transformer, not the Vision Transformer (ViT).
Edited at 2023-07-30 10:32:52
Transformer and Sequence-to-Sequence (Seq2Seq)
sequence to sequence
Sequence-to-sequence (or Seq2Seq) is a neural network that converts a given sequence of elements (such as a sequence of words in a sentence) into another sequence
Seq2Seq models are particularly good at translation, i.e. converting a sequence of words in one language into a different sequence of words in another language
A very basic choice is to use one long short-term memory (LSTM) network for the encoder and another for the decoder
LSTM module
A recurrent network module (keep this in mind)
It processes the elements of a sequence in order, building up a meaning for the sequence while remembering the parts it deems important and forgetting the parts it does not
For example, sentences are order-dependent: the order of words is crucial to understanding them, so an LSTM is a natural choice for this type of data
The encoder takes the input sequence and maps it into a higher dimensional space (n-dimensional vectors). This abstract vector is fed into the decoder, which converts it into an output sequence. The output sequence can be another language, notation, a copy of the input, etc.
Note that "I am very happy" and "I am very happy" are both sequences. Now seq2seq needs to be converted into Chinese and English.
To put it more vividly, after the encoder encodes the sentence "I am very happy" to 1000110110, the decoder can convert this string of numbers into "I am very happy"
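A minimal sketch of such an LSTM encoder-decoder in PyTorch (the vocabulary sizes, dimensions and random token ids below are illustrative assumptions, not part of the mind map):

```python
# Minimal LSTM-based Seq2Seq sketch: an encoder LSTM compresses the source
# sentence into a state, and a decoder LSTM generates the target from it.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=1000, tgt_vocab=1000, emb=64, hidden=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encoder maps the source sequence into its final (h, c) state.
        _, state = self.encoder(self.src_emb(src_ids))
        # Decoder starts from that state and emits one logit vector per target token.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)                 # (batch, tgt_len, tgt_vocab)

model = Seq2Seq()
src = torch.randint(0, 1000, (2, 7))             # tokenized source sentences
tgt = torch.randint(0, 1000, (2, 5))             # target tokens (teacher forcing)
print(model(src, tgt).shape)                     # torch.Size([2, 5, 1000])
```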
Attention (attention mechanism)
The attention mechanism looks at the input sequence and decides at each step which other parts of the sequence are important (similar to how humans read)
For example, when reading this article you focus on the word you are currently reading, but your brain still retains the important keywords from the text to provide context
For example, "The price of a shirt is 9 yuan and 25". You will think that "shirt" and "9 yuan and 25" are more important. This is the result of paying attention to several keywords in this sentence.
For a given sequence, the attention mechanism knows which words are the key information in the sentence, then the translation becomes very simple. Even if a small part of the information is ignored, the general meaning will be the same.
Summary
Every time the LSTM (encoder) reads an input, the attention mechanism considers several other inputs at the same time and assigns them different weights depending on the situation, deciding which inputs matter
The decoder then takes as input both the encoded sentence and the weights provided by the attention mechanism
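A rough sketch of the idea in NumPy: dot-product attention scores each encoder state against the current decoder state and turns the scores into weights (all shapes and values below are made up for illustration):

```python
# Dot-product attention: weight the encoder states by their relevance
# to the current decoder state, then take the weighted sum as context.
import numpy as np

def attention(query, keys, values):
    # query: (d,) current decoder state; keys/values: (seq_len, d) encoder states
    scores = keys @ query                       # one score per input position
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()           # softmax: "how important is each word?"
    context = weights @ values                  # weighted sum of the encoder states
    return context, weights

enc_states = np.random.randn(4, 8)              # 4 input words, hidden size 8
dec_state = np.random.randn(8)
context, weights = attention(dec_state, enc_states, enc_states)
print(weights)                                  # weights sum to 1 over the 4 input words
```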
Transformer
The paper "Attention Is All You Need" describes Transformer
Compared with previous work, it does not use any recurrent network; the attention mechanism alone is enough to improve results on translation tasks
Explanation
In the architecture figure, the encoder is on the left and the decoder is on the right
Both are composed of stackable modules
Inputs first pass through the pink "Input Embedding" box, which splits the sequence into tokens (phrases) and maps each token to a number
For example, "I am a handsome guy" splits into three tokens ("I" / "am" / "handsome guy" in the original example); the tokens map to 0, 1, 2, so the input fed into the network is "0 1 2"
The white Tai Chi (yin-yang) symbol in the figure is the positional embedding: the input tokens arrive in a definite order, so each token must also be given its position information
For example, for "I am a handsome guy", the three tokens enter the network as "0 1 2", and a position code is assigned in order as well, e.g. "001 002 003", which carries the position information of the sequence
Multi-Head Attention module (the highlight)
Expanded in the figure above
The most important principle is here. Simply put, the input sequence is fed through several attention heads in parallel and their outputs are combined (concatenated and projected in the paper). It is like several people voting on the same question: "three cobblers together match one Zhuge Liang" (many heads are better than one). The original Transformer uses eight heads
The specific mathematical principles need to be learned by yourself and will not be expanded upon here.
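As a rough sketch of those mathematics: the core computation is scaled dot-product attention run in several heads whose outputs are concatenated (the projection matrices below are random stand-ins for the learned ones, and the final output projection from the paper is omitted):

```python
# Multi-head attention sketch: each head runs scaled dot-product attention
# with its own Q/K/V projections; the head outputs are concatenated.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query to each key
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)            # softmax over the keys
    return w @ V                                     # weighted sum of the values

seq_len, d_model, n_heads = 3, 512, 8
d_head = d_model // n_heads
x = np.random.randn(seq_len, d_model)

heads = []
for _ in range(n_heads):
    # Each head has its own learned Q/K/V projections (random stand-ins here).
    Wq, Wk, Wv = (np.random.randn(d_model, d_head) for _ in range(3))
    heads.append(scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv))

out = np.concatenate(heads, axis=-1)                 # heads are concatenated
print(out.shape)                                     # (3, 512)
```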
Feed Forward (feed-forward network)
This is an ordinary network structure, such as fully connected or convolutional layers; it has been covered before, so it is not described in detail here
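A minimal sketch of this position-wise feed-forward sublayer in PyTorch: two linear layers with a ReLU in between, applied to every position independently (512 and 2048 are the sizes from the original paper):

```python
# Position-wise feed-forward network: the same two-layer MLP is applied
# to the vector at every sequence position.
import torch
import torch.nn as nn

d_model, d_ff = 512, 2048
ffn = nn.Sequential(
    nn.Linear(d_model, d_ff),
    nn.ReLU(),
    nn.Linear(d_ff, d_model),
)

x = torch.randn(1, 3, d_model)      # (batch, seq_len, d_model)
y = ffn(x)                          # same shape out: (1, 3, 512)
print(y.shape)
```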