MindMap Gallery: Artificial Intelligence, Machine Learning
Machine learning essentials from the artificial intelligence basics learning series; interested readers are welcome to save the map below.
Edited at 2020-11-12 23:52:59
Zhou Zhihua·Machine Learning
1. introduction
1.1. introduction
1.1.1. basic terminology
sample
every record
train
The process of learning a model from data
data set
collection of all records
dimension
Number of features of the sample
Generalization
The ability of machine learning models to apply to new samples
Classification
Predicted values are discrete values
regression
Predicted values are continuous values
1.2. learning type
1.2.1. supervised learning
Tasks where the training data has labeled information
1.2.2. unsupervised learning
Learning tasks without labeled information in training data
1.3. hypothesis space
1.3.1. Means of reasoning
induction
broad sense
learning from examples
narrow sense
concept learning
boolean concepts
deduction
1.4. inductive preference
1.4.1. Definition: The preference of a machine learning algorithm for certain types of hypotheses during the learning process
1.4.2. Method: Occam’s razor principle
1.5. development path
1.5.1. symbolism
decision tree
Logic based learning
1.5.2. connectionism
perceptron
1.5.3. statistical learning
Support Vector Machines
kernel method
1.6. applications
1.6.1. data analysis
1.6.2. autonomous driving
1.6.3. Understanding how humans learn
2. Model selection and evaluation
2.1. Empirical error and overfitting
2.1.1. Error: the gap between the learner's actual output and the sample's true output
Training set: training error (empirical error)
New samples: generalization error (estimated with a test set)
2.1.2. Fitting
Underfitting
Learning ability is too weak
Overfitting
Learning ability is too strong
2.2. Assessment methods
2.2.1. Hold-out method
Splits the data into two mutually exclusive sets
2.2.2. Cross-validation method
k-fold cross validation
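To make k-fold cross-validation concrete, here is a minimal Python sketch. It assumes scikit-learn and a synthetic dataset purely for illustration; any learner could stand in for the logistic regression used here.

```python
# Minimal k-fold cross-validation sketch (synthetic data, hypothetical settings).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# Split the data into k = 10 mutually exclusive folds; each fold serves once
# as the test set while the remaining folds form the training set.
scores = cross_val_score(model, X, y, cv=10)
print(scores.mean())  # average accuracy across the 10 folds
```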
2.2.3. Bootstrap method
Based on bootstrap sampling (sampling with replacement)
2.2.4. Parameter tuning and the final model
2.3. Performance metrics
2.3.1. Measure a model's generalization ability
2.3.2. Common performance metrics
Error rate and accuracy
Applicable to
binary classification
multi-class classification
P: Precision rate
R: recall rate
P and R are generally in tension: as one rises the other tends to fall
F1
Harmonic mean of P and R (Fβ generalizes it to a weighted harmonic mean)
ROC
receiver operating characteristic
AUC
Area under the ROC curve
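As an illustration of these metrics, the sketch below computes precision, recall, F1, and ROC AUC with scikit-learn on a small hypothetical set of labels and scores.

```python
# Minimal sketch of precision, recall, F1 and ROC AUC (hypothetical values).
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                  # ground-truth labels
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]                  # hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1]  # predicted scores

p   = precision_score(y_true, y_pred)   # P: fraction of predicted positives that are correct
r   = recall_score(y_true, y_pred)      # R: fraction of actual positives that are found
f1  = f1_score(y_true, y_pred)          # harmonic mean of P and R
auc = roc_auc_score(y_true, y_score)    # area under the ROC curve
print(p, r, f1, auc)
```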
Cost-sensitive error rate
Cost curve
2.4. Comparison tests
2.4.1. Hypothesis testing
2.4.2. Cross-validation t-test
2.4.3. F test
2.4.4. Nemenyi test
2.5. Bias and variance
2.5.1. Bias
Reflects the model's own fitting ability
2.5.2. Variance
Reflects the impact of changes in the training data
Bias and variance are in conflict (a trade-off)
3. linear model
3.1. Basic form
3.1.1. linear model
3.2. linear regression
3.2.1. Single input attribute
Typical approach: the least squares method
Sum of squared errors between the predicted and true values
MSE (mean squared error)
Take partial derivatives, set them to zero, and solve for the fitted line
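A minimal NumPy sketch of least-squares fitting for a single input attribute follows; the data points are hypothetical, and `np.linalg.lstsq` plays the role of solving the normal equations that arise when the partial derivatives are set to zero.

```python
# Least-squares fit of a line y = w*x + b (hypothetical data).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Augment x with a column of ones so the intercept b is learned as well.
X = np.column_stack([x, np.ones_like(x)])
w, b = np.linalg.lstsq(X, y, rcond=None)[0]  # minimizes the sum of squared errors
print(w, b)  # slope and intercept of the fitted line
```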
3.2.2. Number of input attributes > 1 (multivariate linear regression)
Solved in matrix form
3.2.3. Log-linear regression
The output varies on an exponential scale
3.3. Log-odds (logistic) regression
3.3.1. The linear model's output approximates the log odds ln(y/(1-y))
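The sketch below illustrates the log-odds view with scikit-learn's LogisticRegression on synthetic data: the model's linear output w^T x + b should coincide with ln(p/(1-p)) computed from the predicted probability.

```python
# Log-odds (logistic) regression sketch: linear output vs. log odds (synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

p = clf.predict_proba(X[:1])[0, 1]        # predicted positive-class probability
linear = clf.decision_function(X[:1])[0]  # w^T x + b
print(np.log(p / (1 - p)), linear)        # the two values should agree
```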
3.4. linear discriminant analysis
3.4.1. Can be explained from the perspective of Bayesian decision theory
3.5. Multi-category learning
3.5.1. ECOC
coding
decoding
3.6. Category imbalance
3.6.1. The number of samples in different categories varies greatly
4. decision tree
4.1. Basic process
4.1.1. Features: Recursion
4.1.2. Decision-making based on tree structure
4.1.3. Purpose: To obtain a tree with strong generalization ability
4.2. Divide selection
4.2.1. information gain
Information entropy: a measure of the purity of a sample set
4.2.2. Gain rate
Optimal partitioning attributes
4.2.3. Gini index
The probability that two samples drawn at random from the set have different class labels
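To ground the split-selection criteria, here is a small NumPy sketch of information entropy, information gain, and the Gini index computed from class labels; the labels and the candidate split are hypothetical.

```python
# Purity measures for split selection (hypothetical labels and split).
import numpy as np

def entropy(labels):
    # Ent(D) = -sum_k p_k * log2(p_k)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    # Gini(D) = 1 - sum_k p_k^2: chance that two random samples differ in class
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

D = np.array([1, 1, 1, 0, 0, 1, 0, 1])
left, right = D[:4], D[4:]                 # a candidate split of D

# Information gain = Ent(D) - weighted average entropy of the subsets.
gain = entropy(D) - (len(left) / len(D)) * entropy(left) \
                  - (len(right) / len(D)) * entropy(right)
print(entropy(D), gini(D), gain)
```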
4.3. Pruning
4.3.1. pre-pruning
Addresses: overfitting
4.3.2. post-pruning
4.4. Continuous and missing values
4.4.1. Continuous value processing
Bi-partition (split a continuous attribute at a threshold)
4.4.2. Missing value handling
4.5. multivariate decision tree
4.5.1. Non-leaf nodes use linear classifiers over combinations of attributes
5. Neural Networks
5.1. Neurons
5.1.1. Basic components: neurons
5.1.2. Processing: activation function
5.2. Perceptron and multi-layer networks
5.2.1. Perceptron: two layers of neurons
Only the output layer contains functional neurons
Cannot solve non-linearly separable problems
The XOR problem requires a two-layer (hidden-layer) perceptron
5.3. Error back propagation algorithm
5.3.1. Training multi-layer networks requires a more powerful algorithm: error backpropagation (BP)
Standard BP
Updates the parameters after each individual training example
Open question: Setting the number of neurons
trial and error
Accumulated BP
Minimize cumulative error
Alleviating overfitting
Stop early
Regularization
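As a rough illustration of BP training with the two overfitting remedies above, the sketch below uses scikit-learn's MLPClassifier (which trains by backpropagation) on synthetic data; the hidden-layer size, `alpha` (L2 regularization), and `early_stopping` settings are hypothetical choices.

```python
# BP-trained multi-layer network with early stopping and regularization (synthetic data).
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

clf = MLPClassifier(
    hidden_layer_sizes=(16,),   # one hidden layer; the size is usually set by trial and error
    alpha=1e-4,                 # L2 regularization strength
    early_stopping=True,        # hold out part of the data and stop when it stops improving
    max_iter=1000,
    random_state=0,
)
clf.fit(X, y)
print(clf.score(X, y))
```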
5.4. Global minimum and local minimum
5.4.1. Global minimum
Techniques such as genetic algorithms help escape local minima
Local minimum: a point in parameter space with zero gradient whose error value is no greater than that of its neighboring points
5.5. Other common neural networks
5.5.1. RBF
Determine neuron center
Determine parameters using BP algorithm, etc.
5.5.2. ART
5.5.3. SOM
5.5.4. Cascade-Correlation network
5.5.5. Elman network
5.5.6. Boltzmann machine
5.6. deep learning
5.6.1. multilayer neural network
6. Support Vector Machines
6.1. Margins and support vectors
6.1.1. margin
The sum of distances from two heterogeneous support vectors to the hyperplane
6.1.2. support vector
Maximizing the margin yields the support vector machine
Definition: the training samples closest to the separating hyperplane
6.2. dual problem
6.2.1. A quadratic programming problem
6.2.2. Most training samples need not be retained after training; the final model depends only on the support vectors
6.3. kernel function
6.3.1. A symmetric function is a valid kernel if and only if its kernel matrix is positive semi-definite
6.4. Soft margins and regularization
6.4.1. The SVM is allowed to make errors on some samples
6.4.2. 3 alternative loss functions
hinge loss
exponential loss
logistic loss
6.4.3. Regularization
Trades off minimizing the training error against the risk of overfitting
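The following sketch shows a soft-margin SVM with an RBF kernel in scikit-learn; the regularization parameter C (a hypothetical value here) controls the trade-off between training error and margin width, and the fitted model is determined by its support vectors.

```python
# Soft-margin SVM with RBF kernel (synthetic data, hypothetical C).
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = SVC(kernel="rbf", C=1.0).fit(X, y)  # hinge-loss soft margin with the kernel trick
print(clf.support_vectors_.shape)         # the model depends only on these support vectors
```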
6.5. support vector regression
6.6. kernel method
7. Bayesian classification
7.1. Bayesian Decision Theory
7.1.1. probabilistic decision making method
Select the optimal class based on probability and misclassification loss
7.2. maximum likelihood estimation
7.2.1. Advantages: Conditional probability estimation is simple
7.2.2. Disadvantage: results depend on the assumed form of the probability distribution
7.3. Naive Bayes classifier
7.3.1. Estimate the class prior probabilities
7.3.2. Estimate the conditional probability for each attribute
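A minimal sketch of these two estimation steps, using scikit-learn's GaussianNB on synthetic data (so the per-attribute conditional distributions are assumed Gaussian):

```python
# Naive Bayes sketch: class priors plus per-attribute conditionals (synthetic data).
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=6, random_state=0)

clf = GaussianNB().fit(X, y)
print(clf.class_prior_)    # estimated class prior probabilities
print(clf.predict(X[:3]))  # classes with the largest estimated posterior
```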
7.4. Semi-Naive Bayes Classifier
7.4.1. Relax the attribute conditional independence assumption to a certain extent
7.5. Bayesian net
7.5.1. composition
structure
parameter
7.5.2. Features
Directed acyclic graph depicts dependencies
Conditional probability table: attribute probability distribution
7.5.3. Learning: scoring based on a data-compression (minimum description length) criterion
7.5.4. infer
7.5.5. EM algorithm
8. Ensemble learning
8.1. Definition: builds and combines multiple learners
8.1.1. Individual learner generation method
Strong dependencies between learners: generated sequentially
No strong dependencies: generated in parallel
8.1.2. Combination: via combining strategies
8.2. boosting
8.2.1. Boosts weak learners into a strong learner
AdaBoost (representative algorithm)
8.2.2. Purpose: Reduce bias
8.3. bagging
8.3.1. Parallel ensemble learning method
8.3.2. Goal: reduce variance
8.4. random forest
8.4.1. Basics: Decision Tree
8.4.2. Adds random attribute selection when choosing splits
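A minimal random-forest sketch with scikit-learn follows; the number of trees and the `max_features` setting (the random attribute subset considered at each split) are hypothetical choices on synthetic data.

```python
# Random forest: bagged decision trees with random attribute selection (synthetic data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

clf = RandomForestClassifier(
    n_estimators=100,      # number of base decision trees
    max_features="sqrt",   # random subset of attributes considered at each split
    random_state=0,
).fit(X, y)
print(clf.score(X, y))
```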
8.5. Combining strategies
8.5.1. advantage
Reduces the risk of poor generalization from relying on a single learner
Reduce the risk of getting stuck in bad local minima
get a good approximation
8.5.2. method
averaging method
voting method
learning method
8.6. Diversity
8.6.1. Error-ambiguity decomposition
8.6.2. Diversity measure
8.6.3. Enhancing diversity
Perturbation of
data samples
input attributes
output representations
algorithm parameters
9. clustering
9.1. definition
9.1.1. unsupervised learning
9.1.2. Reveals the intrinsic structure of the data
9.2. Performance metrics
9.2.1. external indicators
9.2.2. internal indicators
9.3. distance calculation
9.4. prototype clustering
9.4.1. Initialize the prototypes first
9.4.2. Then refine them iteratively
9.4.3. method
K-means algorithm
Learning vector quantization
Gaussian Mixture Clustering
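To illustrate prototype clustering, here is a k-means sketch with scikit-learn on synthetic blobs; the number of clusters is a hypothetical choice, and the algorithm follows the initialize-then-iterate pattern described above.

```python
# k-means: initialize prototypes, then refine them iteratively (synthetic data).
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # the learned prototype vectors
print(km.labels_[:10])      # cluster assignments of the first samples
```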
9.5. density clustering
9.5.1. Based on the density (tightness) of the sample distribution
9.6. hierarchical clustering
10. Dimensionality reduction and metric learning
10.1. k-nearest neighbor learning
10.1.1. A supervised learning method
10.1.2. Characteristic: no explicit training process (lazy learning)
10.2. low dimensional embedding
10.2.1. Motivation: the curse of dimensionality
10.3. Principal component analysis
10.3.1. Minimum reconstruction error
10.3.2. Maximum separability
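A minimal PCA sketch via the SVD of the centered data matrix follows; the data and the target dimension d' are hypothetical.

```python
# PCA via SVD: project centered data onto the top d' principal directions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # 100 hypothetical samples, 5 features

Xc = X - X.mean(axis=0)         # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

d_prime = 2
W = Vt[:d_prime].T              # projection matrix (5 x 2)
Z = Xc @ W                      # low-dimensional representation
print(Z.shape)                  # (100, 2)
```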
10.4. Kernelized Linear Dimensionality Reduction
10.5. manifold learning
10.5.1. Isometric mapping (Isomap)
Specify the number of adjacent points
10.5.2. local linear embedding
Maintain linear relationships within the local area
10.6. Metric learning
10.6.1. Purpose: directly learn a suitable distance metric
11. Feature selection and sparse learning
11.1. Subset search and evaluation
11.1.1. Features are attributes
11.1.2. Selecting a subset of features: feature selection
11.1.3. Feature selection
Data preprocessing method
11.1.4. Purpose
Avoid the curse of dimensionality
Reduce learning difficulty
11.1.5. subset search
11.1.6. Subset evaluation
11.2. Feature selection method
11.2.1. Filter
11.2.2. Embedded
11.2.3. Wrapper
11.3. Filter selection
11.3.1. Related statistics
11.3.2. Relief
Determine relevant statistics
11.4. Wrapper selection
11.5. Embedded selection and regularization
11.5.1. Feature selection during training
11.5.2. Regularization
Alleviating overfitting
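As an illustration of embedded selection with regularization, the sketch below fits an L1-regularized (Lasso) linear model with scikit-learn; features whose coefficients are driven exactly to zero are effectively discarded during training. The data and the `alpha` value are hypothetical.

```python
# Embedded feature selection via L1 regularization (synthetic regression data).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=20, n_informative=5, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
selected = np.flatnonzero(lasso.coef_)  # indices of features with non-zero weight
print(selected)
```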
11.6. sparse representation
11.6.1. Feature selection removes irrelevant columns
11.7. dictionary learning
11.7.1. sparse coding
11.8. compressed sensing
11.8.1. Exploiting signal sparsity
11.8.2. Example: inferring readers' book preferences (matrix completion)
12. computational learning theory
12.1. basic knowledge
12.1.1. Essence: analyzing the inherent difficulty of learning problems
12.2. PAC learning
12.2.1. basic concept
Probably approximately correct (PAC)
12.3. finite hypothesis space
12.3.1. Separable (consistent) case
12.3.2. Non-separable (inconsistent) case
12.3.3. VC dimension
Setting: infinite hypothesis spaces
12.3.4. Several basic concepts
growth function
Dichotomy
Shattering
12.3.5. Rademacher complexity
12.4. stability
12.4.1. According to the characteristics of the algorithm itself
12.4.2. Examines generalization error bounds
13. semi-supervised learning
13.1. Unlabeled sample
13.1.1. Strong practical demand: unlabeled samples are abundant while labeling is costly
13.2. type
13.2.1. Pure semi-supervised learning
13.2.2. Transductive learning
13.3. method
13.3.1. clustering hypothesis
13.3.2. manifold hypothesis
13.4. generative methods
13.4.1. Assumption: labeled and unlabeled data are generated by the same underlying model
13.5. Semi-supervised SVM
13.5.1. Assumption: low-density separation
13.5.2. Typical method: TSVM
designed for binary classification
13.6. Graph semi-supervised learning
13.6.1. matrix based
13.6.2. Label propagation (labels spread over the graph like diffusing color)
13.6.3. question
Difficult to scale to large graphs
New samples require rebuilding the graph
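For a concrete graph-based example, the sketch below uses scikit-learn's LabelSpreading on synthetic data, marking unlabeled samples with -1 so their labels are filled in by propagation over a similarity graph; the 80% unlabeled fraction is a hypothetical choice.

```python
# Graph semi-supervised learning with label spreading (synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import LabelSpreading

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

y_partial = y.copy()
rng = np.random.default_rng(0)
unlabeled = rng.random(len(y)) < 0.8  # hide 80% of the labels
y_partial[unlabeled] = -1             # -1 marks an unlabeled sample

model = LabelSpreading().fit(X, y_partial)
print((model.transduction_[unlabeled] == y[unlabeled]).mean())  # accuracy on the unlabeled part
```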
13.7. disagreement-based approach
13.7.1. Less affected by model assumptions, loss-function non-convexity, and data scale
13.7.2. multiple learners
13.8. semi-supervised clustering
13.8.1. Obtain supervisory information
13.8.2. Type of supervision
Must-link constraints
samples must belong to the same cluster
Cannot-link constraints
samples must not belong to the same cluster
14. Probabilistic graphical model
14.1. Hidden Markov Model
14.1.1. Dynamic Bayesian Nets
State variables
Observed variables
14.1.2. Use graphs to express probability relationships
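To make the HMM notation concrete, here is a NumPy sketch of the forward algorithm for computing the probability of an observation sequence; the initial distribution, transition matrix, emission matrix, and observations are all hypothetical.

```python
# Forward algorithm for a hidden Markov model (hypothetical parameters).
import numpy as np

pi = np.array([0.6, 0.4])        # initial state distribution
A  = np.array([[0.7, 0.3],       # state-transition probabilities
               [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1],  # emission probabilities P(observation | state)
               [0.1, 0.3, 0.6]])

obs = [0, 2, 1]                  # an observed sequence

alpha = pi * B[:, obs[0]]        # initialization
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]  # recursive forward step
print(alpha.sum())               # P(observation sequence | model)
```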
14.2. Markov random field
14.2.1. undirected graph model
Local Markov property
given its neighboring variables, a variable is independent of all other variables
Pairwise Markov property
two non-adjacent variables are conditionally independent given all other variables
14.3. conditional random field
14.3.1. A discriminative model
14.3.2. undirected graph model
14.4. Learning and inference
14.4.1. maximum likelihood estimation
14.4.2. Probability graph inference method
Accurate inference
Approximate inference
14.4.3. Variable elimination
Graphical models reduce computational effort
shortcoming
redundant calculations
14.4.4. Belief propagation
14.5. Approximate inference
14.5.1. MCMC sampling
14.5.2. variational inference
Approximate posterior distribution
Inference from known distribution
14.6. topic model
14.6.1. Generative
14.6.2. directed graph
14.6.3. Typical representative
LDA
15. Rule learning
15.1. Rule definition
15.1.1. Clear semantics
15.1.2. Describes objective regularities in the data
15.1.3. Nature
greedy search
15.2. sequential coverage
15.2.1. Goal
cover the training examples rule by rule
15.2.2. method
top down
bottom up
15.3. Pruning optimization
15.3.1. Statistical significance test
15.4. First-order rule learning
15.4.1. Uses first-order logic expressions, more expressive than propositional logic
15.5. inductive logic programming
15.5.1. Features
expressive
Generalization based on background knowledge
15.5.2. Least general generalization (LGG)
15.5.3. Inverse resolution
requires substitution and unification
16. reinforcement learning
16.1. Tasks and rewards
16.1.1. reinforcement learning
learns through repeated trials
continually summarizing experience
16.2. K-armed bandit
16.2.1. Exploration vs. exploitation
exploration: try actions to estimate each one's reward
exploitation: choose the action with the largest estimated reward
16.2.2. ε-greedy
16.2.3. softmax
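The sketch below simulates ε-greedy on a K-armed bandit in NumPy: with probability ε a random arm is explored, otherwise the arm with the best estimated reward is exploited. The arm means, ε, and the number of pulls are hypothetical.

```python
# Epsilon-greedy on a K-armed bandit (simulated, hypothetical settings).
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])  # hypothetical expected rewards of K = 3 arms
K, epsilon, T = len(true_means), 0.1, 1000

Q = np.zeros(K)       # estimated average reward per arm
counts = np.zeros(K)  # number of pulls per arm

for _ in range(T):
    if rng.random() < epsilon:
        a = int(rng.integers(K))        # explore: pick a random arm
    else:
        a = int(np.argmax(Q))           # exploit: pick the best-looking arm
    r = rng.normal(true_means[a], 1.0)  # sample a reward
    counts[a] += 1
    Q[a] += (r - Q[a]) / counts[a]      # incremental average update

print(Q, counts)
```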
16.3. Model-based learning
16.3.1. Policy evaluation
16.3.2. Policy improvement
policy iteration
value iteration
16.4. Model-free learning
16.4.1. The environment model (transitions and rewards) is unknown, so policies cannot be evaluated as in model-based learning
16.4.2. Monte Carlo reinforcement learning
Does not require knowledge of the MDP
16.4.3. temporal difference learning
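As a small illustration of temporal-difference learning, the sketch below runs Q-learning-style updates on a tiny hypothetical chain environment with three states, where reaching the terminal state yields reward 1; all settings are illustrative.

```python
# Temporal-difference (Q-learning style) updates on a toy 3-state chain.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2                 # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1

for _ in range(500):                       # episodes
    s = 0
    while s != 2:                          # state 2 is terminal
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next = min(s + 1, 2) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == 2 else 0.0
        # TD update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(Q)  # moving right should dominate in states 0 and 1
```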
16.5. value function approximation
16.5.1. Approximates the state-action value function
from which the policy is derived
16.6. imitation learning
16.6.1. direct imitation learning
16.6.2. inverse reinforcement learning
Goal: recover the reward function
assuming the expert's example trajectories are optimal