# Machine Learning - A Probabilistic Perspective

This mind map collects reading notes on Machine Learning - A Probabilistic Perspective.

Introduction

Types

Supervised Learning

Classification

binary classification

multiclass classification

Regression

Unsupervised Learning

Reinforcement Learning

Concepts

Parametric vs non-parametric models

The curse of dimensionality

Overfitting

Model selection

cross validation (CV)
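As a rough illustration of cross validation, the sketch below splits N example indices into K folds and yields (train, validation) index pairs. The function name `k_fold_indices` and the contiguous, unshuffled fold layout are assumptions made for this example; the notes do not prescribe a particular splitting scheme.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k-fold cross validation.

    Folds are contiguous and unshuffled (an assumption of this sketch).
    The first n % k folds get one extra example, so every index is
    held out exactly once.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    base, extra = divmod(n, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


# 10 examples split into 3 folds.
folds = list(k_fold_indices(10, 3))
```

A model would be fit on each `train_idx` and scored on the matching `val_idx`; averaging the K validation scores gives the CV estimate used for model selection.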

No free lunch theorem

Probability

Interpretations

Frequentist

probabilities represent long run frequencies of events

Bayesian

probability is used to quantify our uncertainty about something

can model uncertainty about events that do not have long-run frequencies

Concepts

Discrete random variables

Probability mass function, pmf

state space

indicator function

Fundamental rules

product rule

sum rule

Bayes rule
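The three rules above fit together in a few lines. The following is a minimal sketch for a discrete hypothesis space; the function name `posterior` and the diagnostic-test numbers are hypothetical and chosen only for illustration.

```python
def posterior(prior, likelihood):
    """Apply Bayes rule over a discrete set of hypotheses.

    prior:      dict mapping hypothesis h -> p(h)
    likelihood: dict mapping hypothesis h -> p(data | h)
    """
    # Product rule: p(h, data) = p(data | h) * p(h)
    joint = {h: likelihood[h] * prior[h] for h in prior}
    # Sum rule: p(data) = sum over h of p(h, data)
    evidence = sum(joint.values())
    # Bayes rule: p(h | data) = p(h, data) / p(data)
    return {h: joint[h] / evidence for h in joint}


# Hypothetical numbers: 1% prevalence, 80% sensitivity, 10% false positives.
prior = {"disease": 0.01, "healthy": 0.99}
likelihood = {"disease": 0.8, "healthy": 0.1}  # p(positive test | h)
post = posterior(prior, likelihood)
```

With these assumed numbers the posterior probability of disease after a positive test is 0.008 / 0.107, roughly 7.5%, because the low prior dominates.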

Independence and conditional independence

Continuous random variables

cumulative distribution function, cdf

probability density function, pdf

Quantiles

Mean and variance

Some common discrete distributions

Binomial

Bin(n, θ)

Bernoulli

Ber(θ)

Multinomial

Mu(n, θ)

Multinoulli

Cat(θ)

The empirical distribution

Some common continuous distributions

Gaussian (normal) distribution

N(μ, σ²)

Laplace distribution

Lap(μ, b)

The gamma distribution

Ga(a,b)

gamma function, Γ(a)

The beta distribution

Beta(a, b)

Pareto distribution

Pareto(k, m)

long tails

Joint probability distributions

Covariance and correlation

Multivariate Gaussian, Multivariate Normal (MVN)

Multivariate Student t distribution

Dirichlet distribution

Dir(x|α)

Transformations of random variables

Monte Carlo approximation
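A Monte Carlo approximation replaces an expectation E[f(X)] with the average of f over samples of X. The sketch below assumes X is uniform on [-1, 1] and f(x) = x², for which the exact answer is 1/3; the helper name `monte_carlo_expectation` and the fixed seed are choices made for this example.

```python
import random


def monte_carlo_expectation(f, sampler, n_samples, seed=0):
    """Estimate E[f(X)] by averaging f over n_samples draws of X.

    sampler: function taking a random.Random instance and returning one draw.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = 0.0
    for _ in range(n_samples):
        total += f(sampler(rng))
    return total / n_samples


# E[X^2] for X ~ Uniform(-1, 1); the exact value is 1/3.
estimate = monte_carlo_expectation(
    f=lambda x: x * x,
    sampler=lambda rng: rng.uniform(-1.0, 1.0),
    n_samples=100_000,
)
```

The error of such an estimate shrinks at a rate of about 1/sqrt(n_samples), independent of the dimensionality of X.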

Information theory

Entropy

a measure of the random variable's uncertainty

KL divergence/Relative Entropy

a measure of the dissimilarity of two probability distributions

Cross Entropy

Mutual information

Conditional Entropy
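For discrete distributions these quantities are short sums. The sketch below uses base-2 logarithms (so results are in bits) and assumes q is nonzero wherever p is nonzero; the function names are illustrative.

```python
import math


def entropy(p):
    """H(p) = -sum_i p_i log2 p_i, with 0 log 0 treated as 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i log2 q_i (assumes q_i > 0 where p_i > 0)."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i log2(p_i / q_i) (same assumption on q)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_information(joint):
    """I(X; Y) = KL(p(x, y) || p(x) p(y)) for a joint table joint[x][y]."""
    px = [sum(row) for row in joint]        # marginal over rows
    py = [sum(col) for col in zip(*joint)]  # marginal over columns
    flat_joint = [pxy for row in joint for pxy in row]
    flat_indep = [a * b for a in px for b in py]
    return kl_divergence(flat_joint, flat_indep)


p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]
```

The identity H(p, q) = H(p) + KL(p || q) links the three single-variable quantities, and conditional entropy follows from H(Y|X) = H(Y) - I(X; Y).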

Generative Models for Discrete Data

Bayesian concept learning

Likelihood

Prior

Posterior

MLE

MAP
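The quantities listed above can be made concrete with a coin-flip example. This sketch assumes a Beta(a, b) prior on θ = p(heads), which is conjugate to the Bernoulli likelihood, so the posterior is Beta(a + heads, b + tails). The function name `coin_posterior` and the default prior Beta(2, 2) are assumptions of the example.

```python
def coin_posterior(n_heads, n_tails, a=2.0, b=2.0):
    """Point estimates for theta = p(heads) under a Beta(a, b) prior.

    The MAP formula below is the mode of the Beta posterior and is only
    valid when both posterior parameters exceed 1.
    """
    n = n_heads + n_tails
    post_a, post_b = a + n_heads, b + n_tails  # conjugate update
    return {
        "mle": n_heads / n,
        "map": (post_a - 1) / (post_a + post_b - 2),
        "posterior_mean": post_a / (post_a + post_b),
        "posterior": (post_a, post_b),
    }


# Three heads in three tosses.
summary = coin_posterior(n_heads=3, n_tails=0)
```

With three heads and no tails the MLE is 1.0, which predicts that tails can never occur, while the MAP estimate under the assumed prior is 0.8. This is the overfitting that the prior is meant to temper.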

The beta-binomial model

The Dirichlet-multinomial model

Naive Bayes classifiers

Feature selection using mutual information

Gaussian models

Bayesian statistics

Frequentist statistics

Linear regression

Logistic Regression

Generalized linear models and the exponential family

Directed graphical models (Bayes nets)

Mixture models and the EM algorithm

Latent linear models

Sparse linear models

feature selection / sparsity

Kernels

Introduction

not clear how to best represent some kinds of objects as fixed-sized feature vectors

deep learning

define a generative model for the data, and use the inferred latent representation and/or the parameters of the model as features

kernel function

measures the similarity between objects without requiring them to be preprocessed into feature vector format
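One way to see a kernel acting directly on variable-length objects is a string kernel. The sketch below implements a k-spectrum kernel, which scores two strings by the substrings of length k they share; the choice of this particular kernel and the name `spectrum_kernel` are assumptions of the example, not something taken from the notes.

```python
from collections import Counter


def spectrum_kernel(s, t, k=2):
    """k-spectrum kernel: sum over length-k substrings of
    (count in s) * (count in t).

    The strings may have different lengths; no fixed-size feature
    vector is ever built explicitly.
    """
    def kmers(x):
        return Counter(x[i:i + k] for i in range(len(x) - k + 1))

    counts_s, counts_t = kmers(s), kmers(t)
    # Counter returns 0 for substrings that never occur in t.
    return sum(c * counts_t[kmer] for kmer, c in counts_s.items())


similar = spectrum_kernel("banana", "bandana")
unrelated = spectrum_kernel("banana", "cherry")
```

Because the result is an inner product of substring-count vectors, the function is symmetric and positive semi-definite, which is what kernel methods such as SVMs and Gaussian processes require.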

Support vector machines (SVMs)

Gaussian processes

Introduction

previously: inferred the posterior over parameters, p(θ|D), rather than over functions, p(f|D)

Bayesian inference over functions themselves

Gaussian processes or GPs

defines a prior over functions, which can be converted into a posterior over functions once we have seen some data

Adaptive basis function models

adaptive basis-function model (ABM)

dispense with kernels altogether, and try to learn useful features φ(x) directly from the input data

Boosting

Ensemble learning

Markov and hidden Markov models

probabilistic models for sequences of observations

Markov models

Hidden Markov models

State space models

state space model or SSM

just like an HMM, except the hidden states are continuous

Undirected graphical models (Markov random fields)

Introduction

undirected graphical model (UGM), also called a Markov random field (MRF) or Markov network

Advantages

they are symmetric and therefore more “natural” for certain domains

discriminative UGMs, which define conditional densities of the form p(y|x), work better than discriminative DGMs

Disadvantages

the parameters are less interpretable and less modular

parameter estimation is computationally more expensive

Markov random field (MRF)

Conditional random fields (CRFs)

Structural SVMs

Exact inference for graphical models

Introduction

forwards-backwards algorithm

generalize these exact inference algorithms to arbitrary graphs

Variational inference

Introduction

approximate inference methods

variational inference

reduces inference to an optimization problem

often gives us the speed benefits of MAP estimation but the statistical benefits of the Bayesian approach

More variational inference

Monte Carlo inference

Introduction

Monte Carlo approximation

generate some (unweighted) samples from the posterior

compute any quantity of interest

non-iterative methods

iterative method

Markov chain Monte Carlo (MCMC) inference

Gibbs sampling
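Gibbs sampling draws each variable in turn from its conditional distribution given the current values of the others. The sketch below assumes a toy target: a zero-mean, unit-variance bivariate Gaussian with correlation rho, whose full conditionals are x | y ~ N(rho·y, 1 − rho²) and y | x ~ N(rho·x, 1 − rho²). The function names, burn-in length, and seed are choices made for this example.

```python
import math
import random


def gibbs_bivariate_gaussian(rho, n_samples, burn_in=500, seed=0):
    """Gibbs sampler for a standard bivariate Gaussian with correlation rho."""
    rng = random.Random(seed)
    cond_std = math.sqrt(1.0 - rho * rho)  # std of each full conditional
    x, y = 0.0, 0.0
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_std)  # sample x | y
        y = rng.gauss(rho * x, cond_std)  # sample y | x
        if step >= burn_in:               # discard the burn-in draws
            samples.append((x, y))
    return samples


def sample_correlation(samples):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples) / n
    var_y = sum((y - mean_y) ** 2 for _, y in samples) / n
    return cov / math.sqrt(var_x * var_y)


samples = gibbs_bivariate_gaussian(rho=0.8, n_samples=20_000)
estimated_rho = sample_correlation(samples)
```

Successive draws are correlated, so the effective sample size is smaller than the number of samples kept; the stronger the correlation rho, the slower the chain mixes.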

Clustering

Introduction

Clustering

the process of grouping similar objects together.

flat clustering, also called partitional clustering

hierarchical clustering

Graphical model structure learning

Latent variable models for discrete data

Introduction

symbols or tokens

bag of words

Distributed state LVMs for discrete data

Latent Dirichlet allocation (LDA)

Quantitatively evaluating LDA as a language model

Perplexity
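Perplexity is the exponential of the average negative log probability a model assigns to held-out tokens, so lower is better. A minimal sketch, assuming the per-token probabilities have already been computed by some language model (the name `perplexity` and the input format are illustrative):

```python
import math


def perplexity(token_probs):
    """exp(-(1/N) * sum_i log p(w_i)) over N held-out tokens.

    token_probs: the probability the model assigned to each observed
    token; every entry must be strictly positive.
    """
    n = len(token_probs)
    avg_log_prob = sum(math.log(p) for p in token_probs) / n
    return math.exp(-avg_log_prob)


# A model that is uniform over a 50-word vocabulary.
uniform_ppl = perplexity([1.0 / 50] * 200)
```

A uniform model over V words has perplexity V, so perplexity can be read as the effective number of equally likely words the model is choosing between at each position.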

Fitting using (collapsed) Gibbs sampling

Fitting using batch variational inference

Fitting using online variational inference

Determining the number of topics

Extensions of LDA

Correlated topic model

Dynamic topic model

LDA-HMM

Supervised LDA

Deep Learning

Introduction

Deep generative models

Deep neural networks