MindMap Gallery: Artificial Intelligence, Machine Learning
Machine learning essentials from the artificial intelligence basics learning series; interested readers are welcome to save the map below.
Edited at 2020-11-12 23:52:59
Zhou Zhihua·Machine Learning
1. introduction
1.1. introduction
1.1.1. basic terminology
sample
every record
train
The process of learning a model from data
data set
collection of all records
dimension
Number of features of the sample
Generalization
The ability of machine learning models to apply to new samples
Classification
Predicted values are discrete values
regression
Predicted values are continuous values
1.2. learning type
1.2.1. supervised learning
Tasks where the training data has labeled information
1.2.2. unsupervised learning
Learning tasks without labeled information in training data
1.3. hypothesis space
1.3.1. Means of reasoning
induction
broad sense
learning from examples
narrow sense
concept learning
boolean concepts
deduction
1.4. inductive preference
1.4.1. Definition: The preference of a machine learning algorithm for certain types of hypotheses during the learning process
1.4.2. Method: Occam’s razor principle
1.5. development path
1.5.1. symbolism
decision tree
Logic based learning
1.5.2. connectionism
perceptron
1.5.3. statistical learning
Support Vector Machines
kernel method
1.6. applications
1.6.1. data analysis
1.6.2. autonomous driving
1.6.3. Understanding how humans learn
2. Model selection and evaluation
2.1. Empirical error and overfitting
2.1.1. Error: the gap between the learner's actual output and the sample's true output
Training set: training error (empirical error)
New samples: generalization error (estimated with a test set)
2.1.2. Fitting
Underfitting
Learning ability is too weak
Overfitting
Learning ability is too strong
2.2. Assessment methods
2.2.1. Hold-out method
Splits the data into two mutually exclusive sets
2.2.2. Cross-validation method
k-fold cross validation
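To make k-fold cross-validation concrete, here is a minimal Python sketch. It assumes scikit-learn and a synthetic dataset purely for illustration; any learner could stand in for the logistic regression used here.

```python
# Minimal k-fold cross-validation sketch (synthetic data, hypothetical settings).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# Split the data into k = 10 mutually exclusive folds; each fold serves once
# as the test set while the remaining folds form the training set.
scores = cross_val_score(model, X, y, cv=10)
print(scores.mean())  # average accuracy across the 10 folds
```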
2.2.3. Bootstrap method
Based on bootstrap sampling (sampling with replacement)
2.2.4. Parameter tuning and the final model
2.3. Performance metrics
2.3.1. Measure a model's generalization ability
2.3.2. Common performance metrics
Error rate and accuracy
Applicable to
binary classification
multi-class classification
P: Precision rate
R: recall rate
P and R are generally in tension: as one rises the other tends to fall
F1
Harmonic mean of P and R (Fβ generalizes it to a weighted harmonic mean)
ROC
receiver operating characteristic
AUC
Area under the ROC curve
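As an illustration of these metrics, the sketch below computes precision, recall, F1, and ROC AUC with scikit-learn on a small hypothetical set of labels and scores.

```python
# Minimal sketch of precision, recall, F1 and ROC AUC (hypothetical values).
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

y_true  = [1, 0, 1, 1, 0, 0, 1, 0]                  # ground-truth labels
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]                  # hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1]  # predicted scores

p   = precision_score(y_true, y_pred)   # P: fraction of predicted positives that are correct
r   = recall_score(y_true, y_pred)      # R: fraction of actual positives that are found
f1  = f1_score(y_true, y_pred)          # harmonic mean of P and R
auc = roc_auc_score(y_true, y_score)    # area under the ROC curve
print(p, r, f1, auc)
```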
Cost-sensitive error rate
Cost curve
2.4. Comparison tests
2.4.1. Hypothesis testing
2.4.2. Cross-validation t-test
2.4.3. F test
2.4.4. Nemenyi test
2.5. Bias and variance
2.5.1. Bias
Reflects the model's own fitting ability
2.5.2. Variance
Reflects the impact of changes in the training data
Bias and variance are in conflict (a trade-off)
3. linear model
3.1. Basic form
3.1.1. linear model
3.2. linear regression
3.2.1. Single input attribute
Typical approach: the least squares method
Sum of squared errors between the predicted and true values
MSE (mean squared error)
Take partial derivatives, set them to zero, and solve for the fitted line
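A minimal NumPy sketch of least-squares fitting for a single input attribute follows; the data points are hypothetical, and `np.linalg.lstsq` plays the role of solving the normal equations that arise when the partial derivatives are set to zero.

```python
# Least-squares fit of a line y = w*x + b (hypothetical data).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Augment x with a column of ones so the intercept b is learned as well.
X = np.column_stack([x, np.ones_like(x)])
w, b = np.linalg.lstsq(X, y, rcond=None)[0]  # minimizes the sum of squared errors
print(w, b)  # slope and intercept of the fitted line
```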
3.2.2. Number of input attributes > 1 (multivariate linear regression)
Solved in matrix form
3.2.3. Log-linear regression
The output varies on an exponential scale
3.3. Log-odds (logistic) regression
3.3.1. The linear model's output approximates the log odds ln(y/(1-y))
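The sketch below illustrates the log-odds view with scikit-learn's LogisticRegression on synthetic data: the model's linear output w^T x + b should coincide with ln(p/(1-p)) computed from the predicted probability.

```python
# Log-odds (logistic) regression sketch: linear output vs. log odds (synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
clf = LogisticRegression().fit(X, y)

p = clf.predict_proba(X[:1])[0, 1]        # predicted positive-class probability
linear = clf.decision_function(X[:1])[0]  # w^T x + b
print(np.log(p / (1 - p)), linear)        # the two values should agree
```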
3.4. linear discriminant analysis
3.4.1. Can be explained from the perspective of Bayesian decision theory
3.5. Multi-category learning
3.5.1. ECOC
coding
decoding
3.6. Category imbalance
3.6.1. The number of samples in different categories varies greatly
4. decision tree
4.1. Basic process
4.1.1. Features: Recursion
4.1.2. Decision-making based on tree structure
4.1.3. Purpose: To obtain a tree with strong generalization ability
4.2. Divide selection
4.2.1. information gain
Information entropy: a measure of the purity of a sample set
4.2.2. Gain rate
Optimal partitioning attributes
4.2.3. Gini index
The probability that two samples drawn at random from the set have different class labels
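To ground the split-selection criteria, here is a small NumPy sketch of information entropy, information gain, and the Gini index computed from class labels; the labels and the candidate split are hypothetical.

```python
# Purity measures for split selection (hypothetical labels and split).
import numpy as np

def entropy(labels):
    # Ent(D) = -sum_k p_k * log2(p_k)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    # Gini(D) = 1 - sum_k p_k^2: chance that two random samples differ in class
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

D = np.array([1, 1, 1, 0, 0, 1, 0, 1])
left, right = D[:4], D[4:]                 # a candidate split of D

# Information gain = Ent(D) - weighted average entropy of the subsets.
gain = entropy(D) - (len(left) / len(D)) * entropy(left) \
                  - (len(right) / len(D)) * entropy(right)
print(entropy(D), gini(D), gain)
```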
4.3. Pruning
4.3.1. pre-pruning
Addresses: overfitting
4.3.2. post-pruning
4.4. Continuous and missing values
4.4.1. Continuous value processing
Bi-partition (split a continuous attribute at a threshold)
4.4.2. Missing value handling
4.5. multivariate decision tree
4.5.1. Non-leaf nodes use linear classifiers over combinations of attributes
5. Neural Networks
5.1. Neurons
5.1.1. Basic components: neurons
5.1.2. Processing: activation function
5.2. Perceptron and multi-layer networks
5.2.1. Perceptron: two layers of neurons
Only the output layer contains functional neurons
Cannot solve non-linearly separable problems
The XOR problem requires a two-layer (hidden-layer) perceptron
5.3. Error back propagation algorithm
5.3.1. Training multi-layer networks requires a more powerful algorithm: error backpropagation (BP)
Standard BP
Updates the parameters after each individual training example
Open question: Setting the number of neurons
trial and error
Accumulated BP
Minimize cumulative error
Alleviating overfitting
Stop early
Regularization
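As a rough illustration of BP training with the two overfitting remedies above, the sketch below uses scikit-learn's MLPClassifier (which trains by backpropagation) on synthetic data; the hidden-layer size, `alpha` (L2 regularization), and `early_stopping` settings are hypothetical choices.

```python
# BP-trained multi-layer network with early stopping and regularization (synthetic data).
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

clf = MLPClassifier(
    hidden_layer_sizes=(16,),   # one hidden layer; the size is usually set by trial and error
    alpha=1e-4,                 # L2 regularization strength
    early_stopping=True,        # hold out part of the data and stop when it stops improving
    max_iter=1000,
    random_state=0,
)
clf.fit(X, y)
print(clf.score(X, y))
```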
5.4. Global minimum and local minimum
5.4.1. Global minimum
Techniques such as genetic algorithms help escape local minima
Local minimum: a point in parameter space with zero gradient whose error value is no greater than that of its neighboring points
5.5. Other common neural networks
5.5.1. RBF
Determine neuron center
Determine parameters using BP algorithm, etc.
5.5.2. ART
5.5.3. SOM
5.5.4. Cascade-Correlation network
5.5.5. Elman network
5.5.6. Boltzmann machine
5.6. deep learning
5.6.1. multilayer neural network
6. Support Vector Machines
6.1. Margins and support vectors
6.1.1. margin
The sum of distances from two heterogeneous support vectors to the hyperplane
6.1.2. support vector
Maximizing the margin yields the support vector machine
Definition: the training samples closest to the separating hyperplane
6.2. dual problem
6.2.1. A quadratic programming problem
6.2.2. Most training samples need not be retained after training; the final model depends only on the support vectors
6.3. kernel function
6.3.1. A symmetric function is a valid kernel if and only if its kernel matrix is positive semi-definite
6.4. Soft margins and regularization
6.4.1. The SVM is allowed to make errors on some samples
6.4.2. 3 alternative loss functions
hinge loss
exponential loss
logistic loss
6.4.3. Regularization
Trades off minimizing the training error against the risk of overfitting
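The following sketch shows a soft-margin SVM with an RBF kernel in scikit-learn; the regularization parameter C (a hypothetical value here) controls the trade-off between training error and margin width, and the fitted model is determined by its support vectors.

```python
# Soft-margin SVM with RBF kernel (synthetic data, hypothetical C).
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

clf = SVC(kernel="rbf", C=1.0).fit(X, y)  # hinge-loss soft margin with the kernel trick
print(clf.support_vectors_.shape)         # the model depends only on these support vectors
```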
6.5. support vector regression
6.6. kernel method
7. Bayesian classification
7.1. Bayesian Decision Theory
7.1.1. probabilistic decision making method
Select the optimal class based on probability and misclassification loss
7.2. maximum likelihood estimation
7.2.1. Advantages: Conditional probability estimation is simple
7.2.2. Disadvantage: results depend on the assumed form of the probability distribution
7.3. Naive Bayes classifier
7.3.1. Estimate the class prior probabilities
7.3.2. Estimate the conditional probability for each attribute
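A minimal sketch of these two estimation steps, using scikit-learn's GaussianNB on synthetic data (so the per-attribute conditional distributions are assumed Gaussian):

```python
# Naive Bayes sketch: class priors plus per-attribute conditionals (synthetic data).
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=6, random_state=0)

clf = GaussianNB().fit(X, y)
print(clf.class_prior_)    # estimated class prior probabilities
print(clf.predict(X[:3]))  # classes with the largest estimated posterior
```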
7.4. Semi-Naive Bayes Classifier
7.4.1. Relax the attribute conditional independence assumption to a certain extent
7.5. Bayesian net
7.5.1. composition
structure
parameter
7.5.2. Features
Directed acyclic graph depicts dependencies
Conditional probability table: attribute probability distribution
7.5.3. Learning: scoring based on a data-compression (minimum description length) criterion
7.5.4. infer
7.5.5. EM algorithm
8. Ensemble learning
8.1. Definition: builds and combines multiple learners
8.1.1. Individual learner generation method
Strong dependencies between learners: generated sequentially
No strong dependencies: generated in parallel
8.1.2. Combination: via combining strategies
8.2. boosting
8.2.1. Boosts weak learners into a strong learner
AdaBoost (representative algorithm)
8.2.2. Purpose: Reduce bias
8.3. bagging
8.3.1. Parallel ensemble learning method
8.3.2. Goal: reduce variance
8.4. random forest
8.4.1. Basics: Decision Tree
8.4.2. Adds random attribute selection when choosing splits
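A minimal random-forest sketch with scikit-learn follows; the number of trees and the `max_features` setting (the random attribute subset considered at each split) are hypothetical choices on synthetic data.

```python
# Random forest: bagged decision trees with random attribute selection (synthetic data).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

clf = RandomForestClassifier(
    n_estimators=100,      # number of base decision trees
    max_features="sqrt",   # random subset of attributes considered at each split
    random_state=0,
).fit(X, y)
print(clf.score(X, y))
```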
8.5. Combining strategies
8.5.1. advantage
Reduces the risk of poor generalization from relying on a single learner
Reduce the risk of getting stuck in bad local minima
get a good approximation
8.5.2. method
averaging method
voting method
learning method
8.6. Diversity
8.6.1. Error-ambiguity decomposition
8.6.2. Diversity measure
8.6.3. Enhancing diversity
Perturbation of
data samples
input attributes
output representations
algorithm parameters
9. clustering
9.1. definition
9.1.1. unsupervised learning
9.1.2. Reveals the intrinsic structure of the data
9.2. Performance metrics
9.2.1. external indicators
9.2.2. internal indicators
9.3. distance calculation
9.4. prototype clustering
9.4.1. Initialize the prototypes first
9.4.2. Then refine them iteratively
9.4.3. method
K-means algorithm
Learning vector quantization
Gaussian Mixture Clustering
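To illustrate prototype clustering, here is a k-means sketch with scikit-learn on synthetic blobs; the number of clusters is a hypothetical choice, and the algorithm follows the initialize-then-iterate pattern described above.

```python
# k-means: initialize prototypes, then refine them iteratively (synthetic data).
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # the learned prototype vectors
print(km.labels_[:10])      # cluster assignments of the first samples
```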
9.5. density clustering
9.5.1. Based on the density (tightness) of the sample distribution
9.6. hierarchical clustering
10. Dimensionality reduction and metric learning
10.1. k-nearest neighbor learning
10.1.1. A supervised learning method
10.1.2. Characteristic: no explicit training process (lazy learning)
10.2. low dimensional embedding
10.2.1. Motivation: the curse of dimensionality
10.3. Principal component analysis
10.3.1. Minimum reconstruction error
10.3.2. Maximum separability
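A minimal PCA sketch via the SVD of the centered data matrix follows; the data and the target dimension d' are hypothetical.

```python
# PCA via SVD: project centered data onto the top d' principal directions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # 100 hypothetical samples, 5 features

Xc = X - X.mean(axis=0)         # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

d_prime = 2
W = Vt[:d_prime].T              # projection matrix (5 x 2)
Z = Xc @ W                      # low-dimensional representation
print(Z.shape)                  # (100, 2)
```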
10.4. Kernelized Linear Dimensionality Reduction
10.5. manifold learning
10.5.1. Isometric mapping (Isomap)
Specify the number of adjacent points
10.5.2. local linear embedding
Maintain linear relationships within the local area
10.6. Metric learning
10.6.1. Purpose: directly learn a suitable distance metric
11. Feature selection and sparse learning
11.1. Subset search and evaluation
11.1.1. Features are attributes
11.1.2. Selecting a subset of features: feature selection
11.1.3. Feature selection
Data preprocessing method
11.1.4. Purpose
Avoid the curse of dimensionality
Reduce learning difficulty
11.1.5. subset search
11.1.6. Subset evaluation
11.2. Feature selection method
11.2.1. Filter
11.2.2. Embedded
11.2.3. Wrapper
11.3. Filter selection
11.3.1. Related statistics
11.3.2. Relief
Determine relevant statistics
11.4. Wrapper selection
11.5. Embedded selection and regularization
11.5.1. Feature selection during training
11.5.2. Regularization
Alleviating overfitting
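As an illustration of embedded selection with regularization, the sketch below fits an L1-regularized (Lasso) linear model with scikit-learn; features whose coefficients are driven exactly to zero are effectively discarded during training. The data and the `alpha` value are hypothetical.

```python
# Embedded feature selection via L1 regularization (synthetic regression data).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=20, n_informative=5, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
selected = np.flatnonzero(lasso.coef_)  # indices of features with non-zero weight
print(selected)
```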
11.6. sparse representation
11.6.1. Feature selection removes irrelevant columns
11.7. dictionary learning
11.7.1. sparse coding
11.8. compressed sensing
11.8.1. Exploiting signal sparsity
11.8.2. Example: inferring readers' book preferences (matrix completion)
12. computational learning theory
12.1. basic knowledge
12.1.1. Essence: analyzing the inherent difficulty of learning problems
12.2. PAC learning
12.2.1. basic concept
Probably approximately correct (PAC)
12.3. finite hypothesis space
12.3.1. Separable (consistent) case
12.3.2. Non-separable (inconsistent) case
12.3.3. VC dimension
Setting: infinite hypothesis spaces
12.3.4. Several basic concepts
growth function
Dichotomy
Shattering
12.3.5. Rademacher complexity
12.4. stability
12.4.1. According to the characteristics of the algorithm itself
12.4.2. Examines generalization error bounds
13. semi-supervised learning
13.1. Unlabeled sample
13.1.1. Strong practical demand: unlabeled samples are abundant while labeling is costly
13.2. type
13.2.1. Pure semi-supervised learning
13.2.2. Transductive learning
13.3. method
13.3.1. clustering hypothesis
13.3.2. manifold hypothesis
13.4. generative methods
13.4.1. Assumption: labeled and unlabeled data are generated by the same underlying model
13.5. Semi-supervised SVM
13.5.1. Assumption: low-density separation
13.5.2. Typical method: TSVM
designed for binary classification
13.6. Graph semi-supervised learning
13.6.1. matrix based
13.6.2. Label propagation (labels spread over the graph like diffusing color)
13.6.3. question
Difficult to scale to large graphs
New samples require rebuilding the graph
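For a concrete graph-based example, the sketch below uses scikit-learn's LabelSpreading on synthetic data, marking unlabeled samples with -1 so their labels are filled in by propagation over a similarity graph; the 80% unlabeled fraction is a hypothetical choice.

```python
# Graph semi-supervised learning with label spreading (synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import LabelSpreading

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

y_partial = y.copy()
rng = np.random.default_rng(0)
unlabeled = rng.random(len(y)) < 0.8  # hide 80% of the labels
y_partial[unlabeled] = -1             # -1 marks an unlabeled sample

model = LabelSpreading().fit(X, y_partial)
print((model.transduction_[unlabeled] == y[unlabeled]).mean())  # accuracy on the unlabeled part
```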
13.7. disagreement-based approach
13.7.1. Less affected by model assumptions, loss-function non-convexity, and data scale
13.7.2. multiple learners
13.8. semi-supervised clustering
13.8.1. Obtain supervisory information
13.8.2. Type of supervision
Must-link constraints
samples must belong to the same cluster
Cannot-link constraints
samples must not belong to the same cluster
14. Probabilistic graphical model
14.1. Hidden Markov Model
14.1.1. Dynamic Bayesian Nets
State variables
Observed variables
14.1.2. Use graphs to express probability relationships
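To make the HMM notation concrete, here is a NumPy sketch of the forward algorithm for computing the probability of an observation sequence; the initial distribution, transition matrix, emission matrix, and observations are all hypothetical.

```python
# Forward algorithm for a hidden Markov model (hypothetical parameters).
import numpy as np

pi = np.array([0.6, 0.4])        # initial state distribution
A  = np.array([[0.7, 0.3],       # state-transition probabilities
               [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1],  # emission probabilities P(observation | state)
               [0.1, 0.3, 0.6]])

obs = [0, 2, 1]                  # an observed sequence

alpha = pi * B[:, obs[0]]        # initialization
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]  # recursive forward step
print(alpha.sum())               # P(observation sequence | model)
```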
14.2. Markov random field
14.2.1. undirected graph model
Local Markov property
given its neighboring variables, a variable is independent of all other variables
Pairwise Markov property
two non-adjacent variables are conditionally independent given all other variables
14.3. conditional random field
14.3.1. A discriminative model
14.3.2. undirected graph model
14.4. Learning and inference
14.4.1. maximum likelihood estimation
14.4.2. Probability graph inference method
Accurate inference
Approximate inference
14.4.3. Variable elimination
Graphical models reduce computational effort
shortcoming
redundant calculations
14.4.4. Belief propagation
14.5. Approximate inference
14.5.1. MCMC sampling
14.5.2. variational inference
Approximate posterior distribution
Inference from known distribution
14.6. topic model
14.6.1. Generative
14.6.2. directed graph
14.6.3. Typical representative
LDA
15. Rule learning
15.1. Rule definition
15.1.1. Clear semantics
15.1.2. Describes objective regularities in the data
15.1.3. Nature
greedy search
15.2. sequential coverage
15.2.1. Goal
cover the training examples rule by rule
15.2.2. method
top down
bottom up
15.3. Pruning optimization
15.3.1. Statistical significance test
15.4. First-order rule learning
15.4.1. Uses first-order logic expressions, more expressive than propositional logic
15.5. inductive logic programming
15.5.1. Features
expressive
Generalization based on background knowledge
15.5.2. Least general generalization (LGG)
15.5.3. Inverse resolution
requires substitution and unification
16. reinforcement learning
16.1. Tasks and rewards
16.1.1. reinforcement learning
learns through repeated trials
continually summarizing experience
16.2. K-armed bandit
16.2.1. Exploration vs. exploitation
exploration: try actions to estimate each one's reward
exploitation: choose the action with the largest estimated reward
16.2.2. ε-greedy
16.2.3. softmax
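The sketch below simulates ε-greedy on a K-armed bandit in NumPy: with probability ε a random arm is explored, otherwise the arm with the best estimated reward is exploited. The arm means, ε, and the number of pulls are hypothetical.

```python
# Epsilon-greedy on a K-armed bandit (simulated, hypothetical settings).
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.8])  # hypothetical expected rewards of K = 3 arms
K, epsilon, T = len(true_means), 0.1, 1000

Q = np.zeros(K)       # estimated average reward per arm
counts = np.zeros(K)  # number of pulls per arm

for _ in range(T):
    if rng.random() < epsilon:
        a = int(rng.integers(K))        # explore: pick a random arm
    else:
        a = int(np.argmax(Q))           # exploit: pick the best-looking arm
    r = rng.normal(true_means[a], 1.0)  # sample a reward
    counts[a] += 1
    Q[a] += (r - Q[a]) / counts[a]      # incremental average update

print(Q, counts)
```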
16.3. Model-based learning
16.3.1. Policy evaluation
16.3.2. Policy improvement
policy iteration
value iteration
16.4. Model-free learning
16.4.1. The environment model (transitions and rewards) is unknown, so policies cannot be evaluated as in model-based learning
16.4.2. Monte Carlo reinforcement learning
Does not require knowledge of the MDP
16.4.3. temporal difference learning
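As a small illustration of temporal-difference learning, the sketch below runs Q-learning-style updates on a tiny hypothetical chain environment with three states, where reaching the terminal state yields reward 1; all settings are illustrative.

```python
# Temporal-difference (Q-learning style) updates on a toy 3-state chain.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2                 # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1

for _ in range(500):                       # episodes
    s = 0
    while s != 2:                          # state 2 is terminal
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next = min(s + 1, 2) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == 2 else 0.0
        # TD update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(Q)  # moving right should dominate in states 0 and 1
```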
16.5. value function approximation
16.5.1. Approximates the state-action value function
from which the policy is derived
16.6. imitation learning
16.6.1. direct imitation learning
16.6.2. inverse reinforcement learning
Goal: recover the reward function
assuming the expert's example trajectories are optimal