MindMap Gallery: DA-DETR (In-depth Interpretation of the Paper's Ideas)
An in-depth interpretation of the DA-DETR paper. DA-DETR uses a single-stage detector, performs inter-domain alignment with a single discriminator, and introduces a hybrid attention mechanism to explicitly locate the features that need alignment, simplifying the domain-adaptation pipeline.
Edited at 2023-07-30 10:35:14
DA-DETR
Problem
The mainstream architecture Faster R-CNN has many hyper-parameters that must be tuned manually, making its feature-alignment methods overly complicated.
What is Faster R-CNN?
Reference: https://zhuanlan.zhihu.com/p/31426458
Conv layers.
Conventional convolution, activation, and pooling operations
Region Proposal Networks.
K anchor boxes are densely generated over the feature map, and softmax classifies each anchor as foreground (containing an object) or background.
Bounding-box regression refines the foreground anchors, and the proposal layer generates proposals by selecting suitable refined anchors.
RoI Pooling.
This layer takes the feature maps and the proposals as input and extracts a fixed-size feature map for each proposal.
Each proposal region is divided into a grid of cells and max-pooled per cell, so every proposal yields a fixed-size output that is fed into the subsequent fully connected layers to determine the target category.
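The grid-and-max-pool step above can be sketched in a few lines of NumPy (a minimal illustration, not the original Caffe/PyTorch implementation; `roi_max_pool` is a hypothetical name):

```python
import numpy as np

def roi_max_pool(feat, box, out_size):
    """RoI max pooling sketch: crop `box` = (x0, y0, x1, y1) from a 2-D
    feature map `feat` and max-pool it into a fixed out_size x out_size grid."""
    x0, y0, x1, y1 = box
    region = feat[y0:y1, x0:x1]
    h, w = region.shape
    # Bin edges dividing the region into an out_size x out_size grid
    ys = np.linspace(0, h, out_size + 1).astype(int)
    xs = np.linspace(0, w, out_size + 1).astype(int)
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            out[i, j] = region[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max()
    return out

# Example: pool a 4x4 feature map into a fixed 2x2 output
feat = np.arange(16, dtype=float).reshape(4, 4)
pooled = roi_max_pool(feat, (0, 0, 4, 4), 2)
```

Whatever the proposal's size, the output is always `out_size x out_size`, which is what lets the subsequent fully connected layers accept it.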
Classification.
Uses the proposal feature maps to compute each proposal's category.
At the same time, bounding-box regression is performed again to obtain the final precise position of the detection box.
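The standard Faster R-CNN box-regression parameterization predicts deltas (tx, ty, tw, th) relative to an anchor; applying them looks roughly like this (a sketch with a hypothetical function name):

```python
import numpy as np

def apply_bbox_deltas(anchor, deltas):
    """Apply Faster R-CNN-style regression deltas (tx, ty, tw, th) to an
    anchor box given as (x_center, y_center, width, height)."""
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = deltas
    x = xa + tx * wa        # shift the center by a fraction of the anchor size
    y = ya + ty * ha
    w = wa * np.exp(tw)     # scale width/height exponentially (keeps them > 0)
    h = ha * np.exp(th)
    return np.array([x, y, w, h])

# Example: shift right by half the anchor width, double the width
anchor = np.array([10.0, 10.0, 4.0, 4.0])
refined = apply_bbox_deltas(anchor, np.array([0.5, 0.0, np.log(2.0), 0.0]))
```

Normalizing by the anchor size makes the regression targets scale-invariant, which is why the same regressor works for anchors of different sizes.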
Solution: DA-DETR
Use a single-stage detector
Perform inter-domain alignment with a single discriminator
Introduce a hybrid attention mechanism to explicitly locate the features that need alignment, simplifying the domain-adaptation pipeline
Effect
Extensive experiments show that the new architecture achieves higher accuracy
Basic process
Hybrid Attention Module (HAM)
Function: explicitly locates hard-to-align features, enabling simple and effective alignment
Components
Coordinate Attention Module (CAM)
Embeds positional information into channel attention to find hard-to-align target features
What is channel attention?
Reference: https://zhuanlan.zhihu.com/p/350953067
Squeeze-and-Excitation Networks (CVPR 2018)
The SE block applies adaptive average pooling over the spatial dimensions of the feature map (converting each channel's n×n feature map into a single 1×1 value), then learns channel attention weights through two fully connected (FC) layers, normalized with a Sigmoid. The learned weight vector is multiplied channel-wise with the original feature maps to obtain the weighted features.
In other words: given C feature maps of size H×W, adaptive average pooling produces C scalar values; two FC layers (the learnable part) followed by Sigmoid normalization turn these into C weights in (0, 1), which are multiplied back onto the corresponding H×W feature maps. The two learned FC layers are what make this "channel attention".
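The squeeze-excite-scale steps above can be sketched in NumPy (a minimal illustration with random, untrained weights; the function name and the reduction ratio are assumptions, not the paper's code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_channel_attention(x, w1, w2):
    """SE-style channel attention sketch.
    x : feature maps, shape (C, H, W)
    w1: first FC weight, shape (C // r, C)  -- reduction
    w2: second FC weight, shape (C, C // r) -- expansion
    Returns the channel-reweighted feature maps, shape (C, H, W)."""
    # Squeeze: global average pool each HxW map to one value per channel
    z = x.mean(axis=(1, 2))                   # (C,)
    # Excitation: two FC layers (ReLU, then Sigmoid) -> per-channel weights
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0))   # (C,), values in (0, 1)
    # Scale: multiply each channel's HxW map by its learned weight
    return x * s[:, None, None]

# Example with C=8 channels and reduction ratio r=4
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8))
w2 = rng.standard_normal((8, 2))
y = se_channel_attention(x, w1, w2)
```

Every spatial location within a channel is scaled by the same weight, which is exactly what distinguishes channel attention from spatial attention.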
CAM splits the backbone features into two parts and fuses them with the transformer encoder's latent features to obtain rich positional and relational information. All features are then shuffled across channels to promote cross-channel information flow.
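The cross-channel shuffling mentioned above resembles the ShuffleNet-style channel shuffle; a minimal sketch (an assumption about the mechanism, not the paper's exact operation):

```python
import numpy as np

def channel_shuffle(x, groups):
    """ShuffleNet-style channel shuffle on features of shape (C, H, W):
    reshape channels to (groups, C // groups), transpose, and flatten,
    so channels from different groups become interleaved."""
    c, h, w = x.shape
    return (x.reshape(groups, c // groups, h, w)
             .transpose(1, 0, 2, 3)
             .reshape(c, h, w))

# Example: 4 channels tagged 0..3, shuffled with 2 groups
x = np.arange(4, dtype=float).reshape(4, 1, 1)
y = channel_shuffle(x, 2)
```

After the shuffle, each group of channels contains members of every original group, which is what lets information flow across channel groups in later layers.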
Advantages
Explicitly locates the features to align and uses a single discriminator to achieve direct domain alignment
Greatly simplifies the domain-adaptive detection pipeline, producing better detection results
Level Attention Module (LAM)
Aggregates attention features across scales at the transformer level
Method
Task details
Mapping from source domain to target domain
Framework overview
DA-DETR structure
Base detector
Backbone network G
A transformer encoder-decoder T
A discriminator Cd
A hybrid attention module (HAM)
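The discriminator Cd performs adversarial feature alignment; a toy, hypothetical sketch (not the paper's implementation) using a logistic-regression discriminator and a gradient-reversal-style update that pushes source and target features toward each other:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 2-D "features" from two domains, initially well separated along x
rng = np.random.default_rng(0)
src = rng.standard_normal((64, 2)) + np.array([2.0, 0.0])   # source features
tgt = rng.standard_normal((64, 2)) + np.array([-2.0, 0.0])  # target features
w = np.zeros(2)                                             # discriminator weights
labels = np.concatenate([np.zeros(64), np.ones(64)])        # 0 = source, 1 = target
gap0 = abs(src[:, 0].mean() - tgt[:, 0].mean())             # initial domain gap

for _ in range(300):
    feats = np.vstack([src, tgt])
    p = sigmoid(feats @ w)                       # discriminator's domain prediction
    grad_w = feats.T @ (p - labels) / len(labels)
    w -= 0.2 * grad_w                            # discriminator: descend its BCE loss
    grad_f = np.outer(p - labels, w)             # BCE gradient w.r.t. the features
    src += 0.2 * grad_f[:64]                     # reversed gradient: features ascend
    tgt += 0.2 * grad_f[64:]                     #   the loss, becoming domain-confusable

gap = abs(src[:, 0].mean() - tgt[:, 0].mean())   # gap shrinks as the domains align
```

The discriminator learns to tell the domains apart while the features receive the reversed gradient, so at convergence the discriminator can no longer distinguish source from target, i.e. the domains are aligned.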