MindMap Gallery CFA Quantitative Analysis (1)
Quantitative analysis (1) covers linear regression, multiple linear regression, and time series. To be updated later.
Edited at 2019-12-30 14:46:25
Quantitative Methods
Traditional methods (Study Session 2)
Regression
Linear regression
Model
Model components
variables
cross-sectional data
time-series data
regression coefficients
Solve for b1 first, then use the sample means of X and Y to solve for b0: b0 = mean(Y) − b1 × mean(X)
estimated regression coefficient
linear least squares (least squares method)
Minimize the sum of squared errors: Σ (dependent variable − predicted value of dependent variable)²
estimated parameters or fitted parameters:
Note that we never observe the population parameter values b0 and b1 in a regression model.
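The least-squares formulas above can be sketched in a few lines of Python (an illustrative sketch; the data points are hypothetical):

```python
# Minimal OLS sketch for one independent variable.
# b1 = Cov(X, Y) / Var(X); b0 = mean(Y) - b1 * mean(X)

def ols_fit(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    b1 = cov / var
    b0 = my - b1 * mx
    return b0, b1

# Example: points lying exactly on the line y = 2x + 1
b0, b1 = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # -> 1.0 2.0
```

The returned b0 and b1 are the estimated (fitted) parameters, not the unobservable population values.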
error term ε (its sample counterpart is the residual)
Excel model
Assumptions (6 items)
1. The relation between Y and X is linear in the parameters b0 and b1 (X itself may be transformed, e.g. raised to a power).
If the model is nonlinear in the parameters, linear regression cannot be used.
If the model is nonlinear in X but linear in the parameters, linear regression analysis can still be used.
2. X is not random.
3. The error term ε is a random variable with an expected value of zero: E(ε) = 0.
This ensures that the estimates of b0 and b1 are unbiased.
4. The variance of the error term is the same for all observations (homoskedasticity assumption).
5. The error terms ε are independent (uncorrelated) across observations.
6. The error term ε is normally distributed.
This makes hypothesis tests on the estimated parameters valid.
SEE(Standard Error of Estimate)
Measures how accurately the regression model describes the relationship between the variables; it is the standard deviation of the residuals.
The coefficient of determination (R²)
Univariate
Square of correlation coefficient, r^2
Multivariate
explained variation = RSS (SSR): sum of squares of the regression
unexplained variation = SSE: sum of squared errors
Total variation (SST) = Explained variation + Unexplained variation
The larger the coefficient of determination, the better the fit.
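The variance decomposition above translates directly into code (an illustrative sketch; the fitted values here are hypothetical):

```python
# R^2 = explained variation / total variation = RSS / SST = 1 - SSE / SST

def r_squared(ys, y_hats):
    my = sum(ys) / len(ys)
    sst = sum((y - my) ** 2 for y in ys)                    # total variation
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hats))   # unexplained variation
    rss = sst - sse                                         # explained variation
    return rss / sst

ys = [3, 5, 7, 10]
y_hats = [3.1, 4.8, 7.2, 9.9]   # hypothetical fitted values
print(r_squared(ys, y_hats))    # close to 1: the fit explains almost all variation
```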
Hypothesis testing
H0: ρ = 0, H1: ρ ≠ 0 (H0: null hypothesis, H1: alternative hypothesis)
confidence interval
If the hypothesized value falls outside the confidence interval, reject H0.
t-test
reject H0
The larger the absolute t-value, the stronger the evidence against H0.
we can reject the hypothesis that the true parameter is equal to 0 at the 0.5 percent significance level (99.5 percent confidence).
p-value
The p-value is the smallest level of significance at which the null hypothesis can be rejected.
The smaller the p-value, the stronger the evidence against H0; the usual reference value is 0.05.
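The slope t-test works mechanically like this (a hedged sketch; the estimate, standard error, and the approximate large-sample critical value of 2.0 are all assumptions for illustration):

```python
# t = (estimated coefficient - hypothesized value) / standard error of the estimate

def t_stat(b1_hat, se_b1, b1_null=0.0):
    return (b1_hat - b1_null) / se_b1

t = t_stat(0.64, 0.26)     # hypothetical estimate and standard error
reject = abs(t) > 2.0      # approximate two-tailed 5% critical value for large samples
print(t, reject)           # |t| > 2, so H0: b1 = 0 is rejected at roughly the 5% level
```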
The significance level is the probability of rejecting the null hypothesis when it is in fact true, denoted α. -- Baidu Encyclopedia
error type
Analysts often choose the 0.05 level of significance, which indicates a 5 percent chance of rejecting the null hypothesis when, in fact, it is true (a Type I error)
Analysis of variance (ANOVA)
Analysis of variance (ANOVA) is a statistical procedure for dividing the total variability of a variable into components that can be attributed to different sources.
F-test
The F-statistic tests whether all the slope coefficients in a linear regression are equal to 0.
H0: b1 = 0, Ha: b1 ≠ 0
The larger the F-statistic, the stronger the evidence that at least one slope coefficient differs from zero.
Prediction Intervals
two sources of uncertainty
the error term itself contains uncertainty.
estimated parameters
limitations
parameter instability
regression relations can change over time, just as correlations can.
public knowledge
public knowledge of regression relationships may negate their future usefulness.
assumptions are violated
hypothesis tests and predictions based on linear regression will not be valid
Multiple Linear Regression
Introduction
t-test
ANOVA
two types of uncertainty:
SEE (standard error of estimate): uncertainty in the regression model itself
b0, b1: uncertainty about the estimates of the regression coefficients.
As the number of independent variables Xi increases, R² will increase even if the new variables add little explanatory power, so R² becomes less reliable. In that case, compare models using adjusted R², which penalizes extra variables.
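The penalty can be seen directly from the adjusted R² formula (an illustrative sketch; the R², n, and k values are hypothetical):

```python
# adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
# n = number of observations, k = number of independent variables

def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Adding variables raises R^2 slightly, yet adjusted R^2 can fall:
print(adjusted_r2(0.80, 30, 3))   # ~0.7769
print(adjusted_r2(0.81, 30, 6))   # ~0.7604 -- worse despite the higher R^2
```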
Dummy Variables
Dummy variables in a regression model can help analysts determine whether a particular qualitative independent variable explains the model's dependent variable.
value(0,1)
To represent n categories, n − 1 dummy variables are required.
The intercept represents the mean value of Y for the omitted category; each slope represents the incremental effect of its category on Y relative to the omitted category.
Similar to linear regression on one variable
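The n − 1 encoding can be sketched as follows (illustrative; the quarterly categories and the choice of "Q4" as the omitted baseline are assumptions):

```python
# Encode a qualitative variable with n categories as n-1 dummy columns,
# omitting one baseline category so the design matrix is not collinear.

def make_dummies(values, categories, omit):
    kept = [c for c in categories if c != omit]
    return [[1 if v == c else 0 for c in kept] for v in values]

quarters = ["Q1", "Q2", "Q4", "Q3"]
print(make_dummies(quarters, ["Q1", "Q2", "Q3", "Q4"], omit="Q4"))
# -> [[1, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 1]]
# The Q4 observation is the all-zeros row: it is captured by the intercept.
```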
Assumptions and Violations
Assumptions
A linear relation exists between the Xj and Y.
Xj are not random; no exact linear relation exists between any Xj and Xk
homoskedasticity
ε is uncorrelated across observations.
ε is normally distributed
Violations
heteroskedasticity
The variance of the errors differs across observations.
unconditional heteroskedasticity: not correlated with the independent variables; causes no serious problems
conditional heteroskedasticity: correlated with the level of the independent variables; this is the problematic case
Breusch–Pagan test
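A hedged sketch of the Breusch–Pagan idea: regress the squared residuals on the independent variable and form the statistic n × R², which is compared against a chi-square critical value (the data below are hypothetical; for real work use a statistics library):

```python
# Breusch-Pagan sketch for one regressor:
# 1. regress squared residuals on X
# 2. test statistic = n * R^2 of that auxiliary regression

def bp_statistic(xs, sq_resid):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(sq_resid) / n
    b1 = (sum((x - mx) * (e - my) for x, e in zip(xs, sq_resid))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    fitted = [b0 + b1 * x for x in xs]
    sst = sum((e - my) ** 2 for e in sq_resid)
    sse = sum((e - f) ** 2 for e, f in zip(sq_resid, fitted))
    r2 = 1 - sse / sst
    return n * r2

xs = [1, 2, 3, 4, 5]
sq_resid = [0.2, 0.5, 0.9, 1.6, 2.4]   # hypothetical squared residuals growing with X
print(bp_statistic(xs, sq_resid))      # large value suggests conditional heteroskedasticity
```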
serial correlation (autocorrelated)
Regression errors are correlated across observations.
Positive serial correlation
The estimated standard errors of the coefficients will be too small
t-statistics:inflates
F-statistic:inflates
Durbin–Watson statistic (DW)
DW ≈ 2 × (1 − r), where r is the correlation between consecutive residuals
The value of DW is between 0-4
Reference value: DW = 2 indicates no serial correlation.
If DW deviates far from 2, there is a serial correlation problem.
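The statistic itself is simple to compute from a list of residuals (illustrative sketch; the alternating residuals below are hypothetical):

```python
# Durbin-Watson statistic:
# DW = sum over t of (e_t - e_{t-1})^2 / sum over t of e_t^2
# DW near 2 -> no serial correlation; near 0 -> positive; near 4 -> negative.

def durbin_watson(resid):
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

# Residuals that flip sign every period: strong negative serial correlation
print(durbin_watson([1, -1, 1, -1, 1, -1]))  # -> 3.33..., far above 2
```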
multicollinearity
Two or more independent variables (or combinations of them) are highly correlated with each other.
though not perfectly correlated
t-statistics: not significant (small t-values)
F-statistic: significant (large F-value)
The classic symptom is this contradiction: a significant F-test together with insignificant t-tests.
The coefficient of determination R² will be high.
The estimated standard errors of the individual slope coefficients are inflated.
Model Specification and Misspecification
Model Specification
cogent economic reasoning
The model should be grounded in cogent economic reasoning
functional form (e.g., LN, logarithmic transformation)
The functional form chosen for the variables in the regression should be appropriate given the nature of the variables. (LN, logarithmic)
parsimonious (simple)
The model should be parsimonious.
A small set of X variables should do a good job of explaining Y.
assumptions violations
The model should be examined for violations of the regression assumptions before being accepted.
useful out of sample
The model should be tested and be found useful out of sample before being accepted.
Misspecification
functional form
variables could be omitted
variables may need to be transformed
pools data from different samples
X correlated with the error term
estimated regression coefficients to be biased and inconsistent
time-series misspecification
including lagged dependent variables as independent variables
including a function of dependent variable as an independent variable
independent variables that are measured with error
qualitative dependent variables
Probit models
based on the normal distribution
logit models
based on the logistic distribution
discriminant analysis
Time Series
Trend Models
linear trend
The series grows by a constant amount each period
log-linear trend
exhibits exponential growth
The series grows at a constant rate each period
predicted trend value of yt
growth rate
Linear trend regression often leaves errors that are correlated with the observations; taking logs mitigates the problem but does not fully solve it.
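The log-linear trend can be sketched as follows (illustrative; the b0 and b1 values are hypothetical):

```python
# Log-linear trend: ln(y_t) = b0 + b1 * t, so y_t = exp(b0 + b1 * t).
# The implied per-period growth rate is exp(b1) - 1.

import math

def loglinear_forecast(b0, b1, t):
    return math.exp(b0 + b1 * t)

b0, b1 = 4.0, 0.05                  # hypothetical fitted trend parameters
growth_rate = math.exp(b1) - 1
print(round(growth_rate, 4))        # ~5.13% growth per period
print(loglinear_forecast(b0, b1, 10))
```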
Testing for Correlated Errors
DW-test
H0: there is no serial correlation
The premise of using a trend model is that the series is covariance stationary; if it is not, the model will be invalid.
Autoregressive (AR) Time-Series Models
We must assume that the time series we are modeling is covariance stationary
Covariance-Stationary Series
the expected value of the time series must be constant and finite in all periods
the covariance of the time series with itself for a fixed number of periods in the past or future must be constant and finite in all periods
the variance of the time series must be constant and finite in all periods
How to check whether a series is covariance stationary? Plot the data and check whether the mean and variance appear constant over time.
The autocorrelations of the residuals should be statistically indistinguishable from 0 at all lags.
A random walk
x_t = x_{t−1} + ε_t: the previous period's value plus an unpredictable random error
not covariance stationary
If the time series is a random walk, it is not covariance stationary
Random walk with drift
A random walk with drift is a random walk with a nonzero intercept term.
Has a unit root
All random walks have unit roots.
If a time series has a unit root, it cannot be covariance stationary
Treatment of unit roots
first-difference the time series (y_t = x_t − x_{t−1}), then estimate an autoregressive model on the differenced series.
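First-differencing is a one-liner (illustrative sketch; the price levels below are hypothetical):

```python
# First-differencing a random-walk series yields a covariance-stationary series:
# y_t = x_t - x_{t-1}

def first_difference(series):
    return [series[t] - series[t - 1] for t in range(1, len(series))]

prices = [100, 102, 101, 105, 104]   # hypothetical random-walk levels
print(first_difference(prices))      # -> [2, -1, 4, -1]
```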
Moving-Average Time-Series Models
moving average
It lags behind the actual data and smooths it (e.g., smoothing out seasonal fluctuations).
Because of this lag, it has little forecasting value.
MA(1) model
MA(q):A qth order moving-average model
Its first q autocorrelations are nonzero, while autocorrelations beyond the first q are zero.
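The sample autocorrelation used to identify the MA order can be sketched like this (illustrative; the alternating series is hypothetical):

```python
# Sample autocorrelation at a given lag. For an MA(q) series, the first q
# autocorrelations are nonzero and those beyond lag q are near zero.

def autocorr(series, lag):
    n = len(series)
    m = sum(series) / n
    den = sum((x - m) ** 2 for x in series)
    num = sum((series[t] - m) * (series[t - lag] - m) for t in range(lag, n))
    return num / den

data = [1, 2, 1, 2, 1, 2, 1, 2]
print(autocorr(data, 1))  # strongly negative: the values alternate every period
```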
ARMA models
autoregressive moving average models
the parameters in ARMA models can be very unstable;
determining the AR and MA order of the model can be difficult;
ARMA models may not forecast well
ARCH
Autoregressive conditional heteroskedasticity model
If the coefficient on the squared residual is statistically significant, the time-series model has ARCH(1) errors
if a time-series model has ARCH(1) errors
Multivariate time series problem
If neither time series has a unit root, we can safely use linear regression.
If only one of the two time series has a unit root, we should not use linear regression.
If both time series have unit roots and the series are cointegrated, we may safely use linear regression.
If both time series have unit roots but the series are not cointegrated, we should not use linear regression.
(Engle–Granger) Dickey–Fuller test for cointegration
The (Engle–Granger) Dickey–Fuller test can be used to determine if time series are cointegrated
Some issues with time series
A covariance-stationary series is mean reverting.
Compare the accuracy of different regression models
The root mean squared error (RMSE): the square root of the mean of the squared forecast errors.
The smaller the RMSE, the better the forecasting model.
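The RMSE definition above, as code (illustrative sketch with hypothetical actual and forecast values):

```python
# RMSE = sqrt(mean of squared forecast errors); used to compare the
# out-of-sample accuracy of different models -- smaller is better.

import math

def rmse(actual, forecast):
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # sqrt(4/3) ~ 1.1547
```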
Time-series model parameters can be unstable; before relying on a time-series model for estimation, check whether the series is stationary.
The steps of time series forecasting
Understand your investment problem and choose an initial time series model
regression model
Use one variable to predict another variable
time-series model
Predict the same variable using previous data on the same variable
If you use a time series model, first draw a graph to see if the covariance is stationary.
A covariance-stationary series should not contain:
a linear trend
an exponential trend
seasonality
a structural shift within the sample period (a significant shift in the mean or variance)
step
Draw a graph to check whether a linear trend or an exponential trend makes the most sense
Estimate trend parameters
Calculate the residuals
Use the Durbin–Watson statistic to detect serial correlation
if no serial correlation exists
the model can be used
if serial correlation exists
use an autoregressive (AR) model
autoregressive model
Treatment of violations of covariance stationarity
a linear trend,
first-difference the time series.
exponential trend
take the natural log of the time series and then first-difference it
shifts significantly during the sample period
estimate different time-series models before and after the shift
significant seasonality
include seasonal lags
Construction of autoregressive model
Estimate an AR(1) model
Test whether the residuals from this model have significant serial correlation. If not, AR(1) can be used.
If serial correlation remains, estimate an AR(2) model and repeat the previous steps until the residuals show no serial correlation.
Check for seasonal issues
Method 1: Draw and observe
Method 2: Examine the data to see whether the seasonal autocorrelations of the residuals from an AR model are significant (for example, the fourth autocorrelation for quarterly data)
To correct for seasonality, add seasonal lags to your AR model. For example, if you are using quarterly data, you might add the fourth lag of a time series as an additional variable in an AR(1) or an AR(2) model .
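Building the regressors for an AR(1) model with a seasonal fourth lag can be sketched like this (illustrative; the quarterly data values are hypothetical):

```python
# Build (target, lag-1, seasonal-lag) rows for quarterly data:
# x_t regressed on x_{t-1} and x_{t-4}.

def ar_with_seasonal_lag(series, seasonal_lag=4):
    rows = []
    for t in range(seasonal_lag, len(series)):
        rows.append((series[t], series[t - 1], series[t - seasonal_lag]))
    return rows

data = [10, 12, 11, 13, 10.5, 12.4, 11.2, 13.6]  # hypothetical quarterly series
for y, lag1, lag4 in ar_with_seasonal_lag(data):
    print(y, lag1, lag4)
```

Each row pairs the current observation with its immediately preceding value and its value four quarters earlier, which is exactly the design matrix the seasonal AR regression needs.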
Detecting conditional heteroskedasticity
ARCH(1)
Regress the squared residuals from your time-series model on a lagged value of the squared residuals.
Test whether the coefficient on the squared lagged residual differs significantly from 0
If the coefficient on the squared lagged residual does not differ significantly from 0, the residuals do not display ARCH and you can rely on the standard errors from your time-series estimates.
If it does differ significantly from 0, use generalized least squares or other methods to correct for ARCH.
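The ARCH(1) check boils down to a slope estimate in the auxiliary regression of squared residuals on their own first lag (a hedged sketch; the squared-residual series is hypothetical, and a real test would also need the slope's standard error):

```python
# Slope of the regression of e_t^2 on e_{t-1}^2; a slope significantly
# different from 0 suggests ARCH(1) errors.

def arch1_slope(sq_resid):
    y = sq_resid[1:]       # e_t^2
    x = sq_resid[:-1]      # e_{t-1}^2
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

sq_resid = [0.4, 0.5, 0.45, 0.6, 0.55, 0.7]   # hypothetical squared residuals
print(arch1_slope(sq_resid))
```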
out-of-sample forecasting performance
FinTech (study session 3)
Machine Learning
Big Data Projects
Probabilistic Approaches
scenario analysis
Decision Trees
Simulation