Machine Learning - A Probabilistic Perspective
This mind map contains reading notes on Machine Learning: A Probabilistic Perspective (Kevin P. Murphy).

Introduction
Types
Supervised Learning
Classification
binary classification
multiclass classification
Regression
Unsupervised Learning
Reinforcement Learning
Concepts
Parametric vs non-parametric models
The curse of dimensionality
Overfitting
Model selection
cross validation (CV)
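A minimal sketch of k-fold cross validation for model selection (Python; the model object with fit/predict methods is a hypothetical stand-in, not from the book):

```python
# k-fold cross validation: estimate generalization error by averaging
# the held-out error over k train/test splits of the data.
import numpy as np

def k_fold_cv(model, X, y, k=5, seed=0):
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)  # shuffle once
    folds = np.array_split(idx, k)                    # k roughly equal folds
    scores = []
    for i in range(k):
        test = folds[i]                               # held-out fold
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model.fit(X[train], y[train])                 # hypothetical interface
        mse = np.mean((model.predict(X[test]) - y[test]) ** 2)
        scores.append(mse)
    return np.mean(scores)  # average held-out error, used to compare models
```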
No free lunch theorem

Probability
Interpretations
Frequentist
probabilities represent long-run frequencies of events
Bayesian
probability is used to quantify our uncertainty about something
can also model uncertainty about one-off events that have no long-run frequencies
Concepts
Discrete random variables
Probability mass function, pmf
state space
indicator function
Fundamental rules
product rule
sum rule
Bayes rule
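In symbols (standard statements of these rules):

```latex
p(x, y) = p(x)\, p(y \mid x)            % product rule
p(x) = \sum_y p(x, y)                   % sum rule (marginalization)
p(y \mid x) = \frac{p(x \mid y)\, p(y)}{\sum_{y'} p(x \mid y')\, p(y')}  % Bayes rule
```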
Independence and conditional independence
Continuous random variables
cumulative distribution function, cdf
probability density function, pdf
Quantiles
Mean and variance
Some common discrete distributions
Binomial
Bin(n, θ)
Bernoulli
Ber(θ)
Multinomial
Mu(n, θ)
Multinoulli
Cat(θ)
The empirical distribution
Some common continuous distributions
Gaussian (normal) distribution
N(μ, σ²)
Laplace distribution
Lap(μ, b)
The gamma distribution
Ga(a,b)
gamma function, Γ(a)
The beta distribution
Beta(a, b)
Pareto distribution
Pareto(k, m)
long tails
Joint probability distributions
Covariance and correlation
Multivariate Gaussian, Multivariate Normal (MVN)
Multivariate Student t distribution
Dirichlet distribution
Dir(x|α)
Transformations of random variables
Monte Carlo approximation
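A minimal sketch of Monte Carlo approximation: to approximate the distribution of a transformed variable such as y = x², sample x and push the samples through the transformation (the uniform source distribution is an illustrative choice):

```python
# Monte Carlo approximation: represent the distribution of y = x^2 by
# samples of x pushed through the transformation.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100_000)  # samples from p(x)
y = x ** 2                                # implicitly, samples from p(y)

# Any quantity of interest becomes an empirical average over samples.
print("E[y] ~=", y.mean())                # exact value is 1/3
print("var[y] ~=", y.var())
```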
Information theory
Entropy
a measure of the random variable's uncertainty
KL divergence/Relative Entropy
a measure of the dissimilarity of two probability distributions
Cross Entropy
Mutual information
Conditional Entropy
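In symbols (standard definitions matching the descriptions above):

```latex
% Entropy of a discrete random variable
H(X) = -\sum_{k} p(X = k) \log_2 p(X = k)

% KL divergence (relative entropy) between distributions p and q
KL(p \,\|\, q) = \sum_k p_k \log \frac{p_k}{q_k}

% Cross entropy, and its relation to KL divergence
H(p, q) = -\sum_k p_k \log q_k, \qquad KL(p \,\|\, q) = H(p, q) - H(p)

% Mutual information, and its relation to conditional entropy
I(X; Y) = KL\big(p(x, y) \,\|\, p(x)\, p(y)\big) = H(X) - H(X \mid Y)
```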

Generative models for discrete data
Bayesian concept learning
Likelihood
Prior
Posterior
MLE
MAP
The beta-binomial model
The Dirichlet-multinomial model
Naive Bayes classifiers
Feature selection using mutual information
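A minimal numeric sketch of the beta-binomial model's conjugate update (the prior hyperparameters and counts are illustrative):

```python
# Beta-binomial model: conjugate updating of a coin's bias theta.
# Prior Beta(a, b); likelihood gives N1 heads and N0 tails.
a, b = 2.0, 2.0          # illustrative prior hyperparameters
N1, N0 = 17, 3           # illustrative observed counts

post_a, post_b = a + N1, b + N0   # posterior is Beta(post_a, post_b)

mle  = N1 / (N1 + N0)                         # maximum likelihood estimate
map_ = (post_a - 1) / (post_a + post_b - 2)   # posterior mode (MAP)
mean = post_a / (post_a + post_b)             # posterior mean

print(mle, map_, mean)   # MAP and mean are shrunk toward the prior
```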

Sparse linear models
feature selection / sparsity

Kernels
Introduction
it is not always clear how best to represent some kinds of objects as fixed-sized feature vectors
deep learning
define a generative model for the data, and use the inferred latent representation and/or the parameters of the model as features
kernel function
a way of measuring the similarity between objects that does not require preprocessing them into feature vector format
Support vector machines (SVMs)
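As a concrete instance, a sketch of the RBF (squared exponential) kernel, one common choice of kernel function (the bandwidth σ is an illustrative parameter):

```python
# RBF (squared exponential) kernel: similarity between two objects,
# kappa(x, x') = exp(-||x - x'||^2 / (2 * sigma^2)), in [0, 1].
import numpy as np

def rbf_kernel(x1, x2, sigma=1.0):
    d2 = np.sum((np.asarray(x1) - np.asarray(x2)) ** 2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))  # 1.0: identical objects
print(rbf_kernel([0.0, 0.0], [3.0, 4.0]))  # near 0: dissimilar objects
```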

Gaussian processes
Introduction
previously we inferred a posterior over parameters, p(θ|D), rather than over functions, p(f|D)
Bayesian inference over functions themselves
Gaussian processes or GPs
defines a prior over functions, which can be converted into a posterior over functions once we have seen some data
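A minimal sketch of GP regression under an RBF kernel prior, showing the prior-to-posterior conversion; the data, kernel bandwidth, and noise level are illustrative choices:

```python
# Gaussian process regression: posterior over function values at test
# inputs given noisy observations, under an RBF kernel prior.
import numpy as np

def K(A, B, sigma=1.0):
    # Kernel (Gram) matrix between two sets of 1-D inputs.
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

X_train = np.array([-2.0, -1.0, 0.0, 1.5])
y_train = np.sin(X_train)                  # illustrative data
X_test = np.linspace(-3, 3, 7)
noise = 1e-2                               # observation noise variance

Kxx = K(X_train, X_train) + noise * np.eye(len(X_train))
Kxs = K(X_train, X_test)
Kss = K(X_test, X_test)

alpha = np.linalg.solve(Kxx, y_train)
post_mean = Kxs.T @ alpha                           # posterior mean
post_cov = Kss - Kxs.T @ np.linalg.solve(Kxx, Kxs)  # posterior covariance
print(post_mean)
print(np.diag(post_cov))                            # pointwise uncertainty
```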

adaptive basis-function model (ABM)
dispense with kernels altogether, and try to learn useful features φ(x) directly from the input data
Boosting
Ensemble learning

Markov and hidden Markov models
probabilistic models for sequences of observations
Markov models
Hidden Markov models
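A minimal sketch of a first-order Markov model over discrete states; the transition matrix values are illustrative:

```python
# First-order Markov chain: p(x_t | x_{t-1}) given by a transition matrix.
import numpy as np

A = np.array([[0.9, 0.1],    # illustrative transition matrix:
              [0.3, 0.7]])   # row i is the distribution over next states
rng = np.random.default_rng(0)

state, path = 0, [0]
for _ in range(10):
    state = rng.choice(2, p=A[state])  # sample the next state
    path.append(state)
print(path)
```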

state space model or SSM
just like an HMM, except the hidden states are continuous

Undirected graphical models (Markov random fields)
Introduction
undirected graphical model (UGM), also called a Markov random field (MRF) or Markov network
Advantages
they are symmetric and therefore more “natural” for certain domains
discriminative UGMs, which define conditional densities of the form p(y|x), work better than discriminative DGMs
Disadvantages
the parameters are less interpretable and less modular
parameter estimation is computationally more expensive
Markov random field (MRF)
Conditional random fields (CRFs)
Structural SVMs

Exact inference for graphical models
Introduction
forwards-backwards algorithm
generalize these exact inference algorithms to arbitrary graphs
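A minimal sketch of the forwards pass of the forwards-backwards algorithm for a discrete HMM (model parameters and observations are illustrative):

```python
# Forwards pass of the forwards-backwards algorithm: compute filtered
# state probabilities p(z_t | x_{1:t}) by recursive predict-update
# steps, normalizing at each step for numerical stability.
import numpy as np

A = np.array([[0.7, 0.3], [0.4, 0.6]])   # transition p(z_t | z_{t-1})
B = np.array([[0.9, 0.1], [0.2, 0.8]])   # emission p(x_t | z_t)
pi = np.array([0.5, 0.5])                # initial state distribution
obs = [0, 0, 1, 0, 1]                    # illustrative observations

alpha = pi * B[:, obs[0]]
alpha /= alpha.sum()
for x in obs[1:]:
    alpha = (A.T @ alpha) * B[:, x]      # predict, then update
    alpha /= alpha.sum()                 # normalize
print(alpha)                             # filtered p(z_T | x_{1:T})
```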

Variational inference
Introduction
approximate inference methods
variational inference
reduces inference to an optimization problem
often gives us the speed benefits of MAP estimation but the statistical benefits of the Bayesian approach
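In symbols: variational inference maximizes the evidence lower bound (ELBO) over a tractable family of distributions q (standard formulation, not quoted from the book):

```latex
% Maximizing the ELBO L(q) is equivalent to minimizing KL(q || posterior):
\log p(\mathcal{D}) = \mathcal{L}(q) + KL\big(q(\theta)\,\|\,p(\theta \mid \mathcal{D})\big),
\qquad
\mathcal{L}(q) = \mathbb{E}_{q(\theta)}\!\big[\log p(\mathcal{D}, \theta) - \log q(\theta)\big]
```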

Monte Carlo inference
Introduction
Monte Carlo approximation
generate some (unweighted) samples from the posterior
compute any quantity of interest
non-iterative methods
iterative methods

Markov chain Monte Carlo (MCMC) inference
Gibbs sampling
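A minimal sketch of Gibbs sampling, cycling through the full conditionals of a bivariate Gaussian target (an illustrative target whose conditionals are known in closed form):

```python
# Gibbs sampling from a bivariate Gaussian with zero means, unit
# variances, and correlation rho, using the closed-form conditionals
# x1 | x2 ~ N(rho * x2, 1 - rho^2), and symmetrically for x2 | x1.
import numpy as np

rho = 0.8
rng = np.random.default_rng(0)
x1, x2 = 0.0, 0.0
samples = []
for t in range(10_000):
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho ** 2))  # sample x1 | x2
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho ** 2))  # sample x2 | x1
    if t >= 1000:                                     # discard burn-in
        samples.append((x1, x2))

samples = np.array(samples)
print(np.corrcoef(samples.T)[0, 1])  # should be close to rho
```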

Introduction
Clustering
the process of grouping similar objects together
flat clustering, also called partitional clustering
hierarchical clustering
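A minimal sketch of flat (partitional) clustering with K-means (Lloyd's algorithm); the two-blob data and K = 2 are illustrative:

```python
# K-means (Lloyd's algorithm): alternate assigning points to the
# nearest centroid and recomputing centroids, a simple flat clustering.
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # init from data
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)                      # assignment step
        for j in range(k):                             # update step
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

X = np.vstack([np.random.default_rng(1).normal(m, 0.3, (50, 2))
               for m in (0.0, 3.0)])                   # two blobs
labels, centers = kmeans(X, k=2)
print(centers)
```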

Latent variable models for discrete data
Introduction
symbols or tokens
bag of words
Distributed state LVMs for discrete data
Latent Dirichlet allocation (LDA)
Quantitatively evaluating LDA as a language model
Perplexity
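For reference, the standard definition of perplexity (the exponentiated average negative log likelihood per word; lower is better):

```latex
% Perplexity of a language model q on N held-out words w_1, ..., w_N:
\mathrm{perplexity}(q) = \exp\!\Big(-\tfrac{1}{N}\sum_{i=1}^{N}\log q(w_i \mid w_{1:i-1})\Big)
```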
Fitting using (collapsed) Gibbs sampling
Fitting using batch variational inference
Fitting using online variational inference
Determining the number of topics
Extensions of LDA
Correlated topic model
Dynamic topic model
LDA-HMM
Supervised LDA

Deep learning
Introduction
Deep generative models
Deep neural networks