MindMap Gallery CFA Level 2 Machine Learning (1)
The definition of machine learning, the classification of machine learning methods, and some concepts for evaluating the effectiveness of machine learning models.
Edited at 2020-01-05 03:14:41
Machine learning
What is machine learning
Machine learning seeks to extract knowledge from large amounts of data without a priori restrictive assumptions.
find the pattern, apply the pattern
high dimensionality, non-linearity
Classification
Supervised learning
Given a data set in which every example is labeled with the correct answer (each target Y is paired with its features X), the algorithm learns to predict the "correct answer" for each example.
Multiple linear regression is an example of supervised learning
Supervised learning applications
Regression
Regression problems predict a continuous target variable and suit large data sets containing many numerical features, many of which are correlated. For example, use historical stock market returns to predict future stock price performance, or use a company's historical financial indicators to predict the probability of bond default.
Regression focuses on making predictions of continuous target variables.
Penalized regression
Classification
Classification problems focus on sorting observations into categories. When the dependent variable (target) is categorical, the model that relates the outcome to the independent variables (features) is called a "classifier". Examples include fraud versus no fraud (two categories) and rating assignment (multiple, ordered categories).
Classification focuses on sorting observations into distinct categories
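As a rough illustration of the regression and penalized regression nodes above, the sketch below fits a one-feature line by least squares with an optional penalty on the slope. The outline does not say which penalty is meant; an L2 (ridge) penalty is assumed here only because it has a simple closed form, and the function name `fit_penalized_line`, the `penalty` parameter, and the toy numbers are hypothetical.

```python
def fit_penalized_line(xs, ys, penalty=0.0):
    """Fit y = intercept + slope * x by least squares.

    `penalty` is an assumed L2 (ridge) weight on the slope only;
    penalty=0.0 reduces to ordinary least squares.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-movement of feature and target, and spread of the feature.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # The penalty enlarges the denominator, shrinking the slope toward zero.
    slope = sxy / (sxx + penalty)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy labeled data (feature X, target Y), made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

ols_slope, ols_intercept = fit_penalized_line(xs, ys)
ridge_slope, ridge_intercept = fit_penalized_line(xs, ys, penalty=5.0)
print("unpenalized slope:", ols_slope)
print("penalized slope:  ", ridge_slope)
```

The penalized slope is smaller in magnitude than the unpenalized one, which is the general effect a penalty term has: it trades some in-sample fit for a simpler model.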
Unsupervised learning
The data set has no target, and the algorithm discovers structure in the data by itself. This suits problems where the data are too large or too complex for humans to inspect directly.
Unsupervised learning applications
Dimension reduction
Reduce the number of features while retaining the variation across observations. For example, in investment and risk management, identify the main factors affecting asset prices.
A set of techniques for reducing the number of features in a dataset while retaining variation across observations to preserve the information contained in that variation.
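One way to make "retaining variation" concrete is to ask how much of the total variance of two features lies along a single direction. The sketch below assumes principal components analysis, a common dimension-reduction technique that the outline itself does not name, and uses the closed-form eigenvalues of a 2x2 covariance matrix. The function name `first_component_share` and the toy data are hypothetical.

```python
import math


def first_component_share(xs, ys):
    """Share of total variance captured by the first principal component
    of two features (assumes the features are not both constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Eigenvalues of [[var_x, cov], [cov, var_y]] in closed form.
    mid = (var_x + var_y) / 2
    spread = math.sqrt(((var_x - var_y) / 2) ** 2 + cov ** 2)
    largest, smallest = mid + spread, mid - spread
    return largest / (largest + smallest)


# Two perfectly correlated features: one component carries all the variation.
correlated = first_component_share([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
# Two uncorrelated features with equal variance: one component carries half.
uncorrelated = first_component_share([1.0, 1.0, -1.0, -1.0], [1.0, -1.0, 1.0, -1.0])
print(correlated, uncorrelated)
```

When the share is close to 1, the two features can be replaced by one combined feature with little loss of information.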
clustering
Group observations into clusters, for example grouping companies by their financial characteristics rather than by industry or region.
Clustering has been used by asset managers to sort companies into empirically determined groupings (e.g., based on their financial statement data) rather than conventional groupings (e.g., based on sectors or countries).
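The idea of empirically determined groupings can be sketched with k-means, assumed here as one common clustering algorithm; the outline does not specify a method. The function name `k_means`, the starting centroids, and the two-feature "company" data are hypothetical.

```python
def k_means(points, centroids, iterations=10):
    """Plain k-means on tuples of floats, starting from given centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical companies described by two financial metrics.
companies = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
centroids, clusters = k_means(companies, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

No labels are supplied; the groups come only from how close the observations are to one another, which is what distinguishes this from classification.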
Deep Learning and Reinforcement Learning
Based on neural networks (NNs, or ANNs)
deep learning
image classification, face recognition, speech recognition, natural language processing
Reinforcement Learning
The computer learns by interacting with itself (or with data generated by the same algorithm).
Machine learning algorithm selection
Evaluation of model performance
Data set
training sample
validation sample
test sample
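A minimal sketch of dividing a data set into the three samples listed above. The 60/20/20 proportions, the fixed seed, and the function name `split_samples` are assumptions for illustration; the outline does not prescribe them.

```python
import random


def split_samples(observations, train_frac=0.6, validation_frac=0.2, seed=0):
    """Shuffle the observations and cut them into training, validation,
    and test samples. The fractions are illustrative assumptions."""
    shuffled = list(observations)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_train = round(len(shuffled) * train_frac)
    n_validation = round(len(shuffled) * validation_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever remains
    return train, validation, test


data = list(range(10))
train, validation, test = split_samples(data)
print(len(train), len(validation), len(test))
```

The training sample is used to fit the model, the validation sample to tune and compare models, and the test sample is held back to judge the final model on data it has not seen.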
fitting
Overfitting
The in-sample fit is good, but the model does not predict new out-of-sample data well, because it has fit noise or random fluctuations in the training data.
The evaluation of any ML algorithm focuses on the prediction error on new data rather than the goodness of fit on the data the algorithm was fit to (i.e., the training data).
Bias error
Algorithms with erroneous assumptions
underfitting, high in-sample error
Variance error
Unstable models pick up noise.
overfitting and high out-of-sample error.
Base error
Base error due to randomness in the data
underfitting
The model fails to find the relationships present in the data.
underfitting means the model does not capture the relationships in the data
robust fitting
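The difference between underfitting, overfitting, and a reasonable fit can be seen by comparing in-sample and out-of-sample error for three models of increasing flexibility. This is a sketch on made-up toy data (roughly y = 2x with noise in the training sample); the three model functions are hypothetical stand-ins, not methods named in the outline.

```python
def mse(predict, xs, ys):
    """Mean squared prediction error of `predict` on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_mean(xs, ys):
    """Too simple: ignores the feature and always predicts the mean target."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Least-squares straight line."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_memorizer(xs, ys):
    """Too flexible: returns the target of the nearest training point,
    so it reproduces the training noise exactly."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


# Toy data, invented for illustration only.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [2.5, 3.5, 6.5, 7.5, 10.5]
test_x = [1.4, 2.4, 3.4, 4.4]
test_y = [2.8, 4.8, 6.8, 8.8]

results = {}
for name, fit in [("mean", fit_mean), ("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train_x, train_y)
    results[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(name, "in-sample:", results[name][0], "out-of-sample:", results[name][1])
```

On this data the mean model has the highest in-sample error (underfitting), the memorizer has zero in-sample error but a clearly worse out-of-sample error than the line (overfitting), and the line does reasonably on both.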
Learning curves and fitting curves
learning curve
fitting curve
Prevent overfitting
Add an overfitting penalty
1) preventing the algorithm from getting too complex during selection and training, which requires estimating an overfitting penalty
Occam’s razor
cross-validation
2) proper data sampling achieved by using cross-validation, a technique for estimating out-of-sample error directly by determining the error in validation samples.
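Cross-validation as described above can be sketched as k-fold cross-validation: the data are cut into k folds, each fold serves once as the validation sample while the model is fit on the rest, and the validation errors are averaged. The k-fold variant, the interleaved fold assignment, and the helper names `k_fold_mse`, `fit_line`, and `predict_line` are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least-squares straight line; returns (slope, intercept)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_line(model, x):
    slope, intercept = model
    return intercept + slope * x


def k_fold_mse(xs, ys, k, fit, predict):
    """Estimate out-of-sample mean squared error with k folds.
    Assumes k <= len(xs) so that every fold is non-empty."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        # Every k-th observation, starting at `fold`, is held out.
        held_out = set(range(fold, n, k))
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = fit(train_x, train_y)
        errors = [(predict(model, xs[i]) - ys[i]) ** 2 for i in sorted(held_out)]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k


# Noise-free toy data on the line y = 2x + 1, invented for illustration.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x + 1.0 for x in xs]
cv_error = k_fold_mse(xs, ys, k=3, fit=fit_line, predict=predict_line)
print(cv_error)
```

Because every observation is used for validation exactly once and never in the same round it is used for fitting, the averaged error is an estimate of out-of-sample error rather than of goodness of fit on the training data.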