CTT, reliability

Classical test theory (CTT) and measurement reliability are two core concepts that together form the basic framework of psychometrics. The outline below explains both in detail.

Edited at 2024-11-13 11:42:33

Sophiamarie_

Outline

Classical test theory (CTT), measurement reliability

Classical Test Theory (CTT)

Psychological traits and the measurability hypothesis

Psychological traits

Meaning: the unique and relatively stable behavioral characteristics that a person displays, such as intelligence, interests, attitudes, and personality.

Properties: relative stability, abstractness, implicitness, predictability

Measurability of psychological traits:

Psychological traits exist objectively.

Thorndike: whatever exists at all exists in some amount.

McCall: anything that exists in amount can be measured.

Measurement errors and their sources

Meaning of measurement error

Measurement error is the inaccuracy or inconsistency in measurement results caused, during the measurement process, by variable factors unrelated to the purpose of the measurement.

Types of measurement error

Random error: error caused by chance factors that are unrelated to the purpose of the measurement and hard to control. Both its direction and its magnitude vary randomly.

Systematic error: a constant, regular effect caused by factors unrelated to the purpose of the measurement. Its magnitude and direction remain the same across measurements.

Sources of Psychometric Error

1. Measurement tools

Psychometric scales are unstable (low reliability)

They do not really measure what we intend to measure (low validity)

2. Test subjects

The test subject's true level is not properly demonstrated

3. Testing process

Physical environment: temperature, light, sound, etc. at the measurement site

Testing time

Unexpected interference

Examiner factors

4. Raters

Experimenter effect and participant effect

Experimenter effect: the experimenter may, intentionally or unintentionally, influence participants during the experiment (for example, through facial expressions, gestures, or tone of voice) so that their responses meet the experimenter's expectations. This effect is therefore also called the Rosenthal effect or the expectancy effect.

Participant effect: also known as the Hawthorne effect, this is experimental bias caused by participants' awareness of their role as research participants and their attitudes toward it. Simply put, participants change their behavior because they receive extra attention, which raises their performance or effort.

Methods to reduce measurement errors

Measurement Tools: Improving the Reliability and Validity of Measurement Tools

Test subjects: ensure that subjects perform at their normal level

Testing process: standardize it

Raters: use unified scoring standards

True scores and related assumptions

Meaning of the true score

The value that reflects a subject's true level on a psychological trait is called the true score for that trait; the score actually obtained in measurement is called the observed score.

Mathematical model of classical test theory and its assumptions

$X = T + E$ (observed score = true score + random error score)

① $\mathcal{E}(X) = T$, or equivalently $\mathcal{E}(E) = 0$: the expected value of the observed score is the true score, and the expected value of the random error score is 0. Operational definition of the true score: the mean of the results of an infinite number of measurements.

② $\rho(T, E) = 0$: true scores and random errors are uncorrelated.

③ $\rho(E_1, E_2) = 0$: random errors on different parallel tests are uncorrelated.

Parallel tests

Two tests are parallel if they use different items to measure the same trait and are consistent in item format, number of items, difficulty, discrimination, and score distribution, so that the means and standard deviations of their scores are the same. Strictly parallel tests are difficult to construct.
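To make the model concrete, here is a small Python simulation, offered as a sketch under stated assumptions: true scores and errors are drawn from normal distributions with arbitrarily chosen standard deviations (10 and 5), and `simulate_parallel_forms` and `pearson_r` are hypothetical helper names. It generates two parallel forms sharing the same true score and checks the three assumptions empirically. It also illustrates a standard CTT consequence: the correlation between parallel forms approaches the ratio of true-score variance to observed-score variance, $\sigma_T^2 / \sigma_X^2$, which is 0.8 with these settings.

```python
import random
import statistics

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate_parallel_forms(n=20000, true_sd=10.0, error_sd=5.0, seed=1):
    """Simulate X = T + E for two parallel forms sharing the same true score T.

    Normal distributions and the SD values are illustrative assumptions.
    """
    rng = random.Random(seed)
    t = [rng.gauss(50.0, true_sd) for _ in range(n)]
    e1 = [rng.gauss(0.0, error_sd) for _ in range(n)]   # errors on form 1
    e2 = [rng.gauss(0.0, error_sd) for _ in range(n)]   # errors on form 2
    x1 = [ti + ei for ti, ei in zip(t, e1)]
    x2 = [ti + ei for ti, ei in zip(t, e2)]
    return t, e1, e2, x1, x2

t, e1, e2, x1, x2 = simulate_parallel_forms()
theoretical = 10.0 ** 2 / (10.0 ** 2 + 5.0 ** 2)   # sigma_T^2 / sigma_X^2
print(f"mean error ~ {statistics.fmean(e1):.3f}")   # assumption 1: E(E) = 0
print(f"r(T, E)    ~ {pearson_r(t, e1):.3f}")       # assumption 2
print(f"r(E1, E2)  ~ {pearson_r(e1, e2):.3f}")      # assumption 3
print(f"r(X1, X2)  ~ {pearson_r(x1, x2):.3f} (theory {theoretical:.3f})")
```

With a large simulated sample, the three sample statistics sit close to 0 and the parallel-form correlation sits close to 0.8.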

Measurement reliability

Reliability overview

Definition of reliability: reliability is the stability or consistency of measurement results, that is, the degree to which results agree when the same subjects are measured at different times with the same test (or with an equivalent alternate form).

The role of reliability

One of the important indicators for evaluating test quality

Reflects the size of the random error present in the measurement process

Helps interpret individual test scores through the standard error of measurement

Helps compare an individual's scores on different tests
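The standard error of measurement can be computed from the test's standard deviation and reliability with the usual CTT formula $SE = s_X\sqrt{1 - r_{XX}}$. The Python sketch below assumes a hypothetical scale with SD 15 and reliability 0.91, and uses ±1.96 SE for a roughly 95% band; the function names are illustrative. For comparing scores on two tests expressed on the same scale, it assumes the two tests' errors are independent, so the standard error of the difference is $\sqrt{SE_X^2 + SE_Y^2}$.

```python
import math

def standard_error_of_measurement(sd_x, reliability):
    """SEM = s_X * sqrt(1 - r_XX), the classical test theory formula."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd_x * math.sqrt(1.0 - reliability)

def score_band(observed, sd_x, reliability, z=1.96):
    """Observed score +/- z * SEM (z = 1.96 gives a roughly 95% band)."""
    sem = standard_error_of_measurement(sd_x, reliability)
    return observed - z * sem, observed + z * sem

def se_difference(sd_x, r_xx, sd_y, r_yy):
    """Standard error of the difference between two scores on the same
    scale, assuming the two tests' measurement errors are independent."""
    return math.hypot(standard_error_of_measurement(sd_x, r_xx),
                      standard_error_of_measurement(sd_y, r_yy))

# Hypothetical scale: SD = 15, reliability = 0.91, observed score 110
sem = standard_error_of_measurement(15, 0.91)
low, high = score_band(110, 15, 0.91)
print(round(sem, 2), round(low, 1), round(high, 1))  # 4.5 101.2 118.8
```

A difference between two scores is usually treated as meaningful only when it is large relative to `se_difference`, for example more than 1.96 times it.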

How to estimate reliability

Test-retest reliability

Meaning

Test-retest reliability is the degree of consistency between the results obtained when the same measurement tool is administered twice to the same group of subjects under the same conditions. It reflects how stable the tool's results are over time, so it is affected by the length of the interval between the two administrations. The most appropriate interval depends on the purpose and nature of the test and the characteristics of the test takers; generally two to four weeks is suitable, and the interval should preferably not exceed six months.

Assessment method

Test-retest reliability is indexed by the test-retest coefficient, or coefficient of stability, of the measurement tool: the Pearson product-moment correlation coefficient between the same group of subjects' scores on the two administrations.
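Since the stability coefficient is just the Pearson correlation between the two administrations, a minimal Python sketch is short. The scores for six subjects are made up for illustration, and `pearson_r` is a hypothetical helper, not a library function.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores of the same 6 subjects on two administrations
time1 = [12, 15, 9, 20, 17, 11]
time2 = [13, 14, 10, 19, 18, 10]
print(f"stability coefficient r = {pearson_r(time1, time2):.3f}")
```

The same function gives the alternate-form coefficient when the two lists hold scores on Forms A and B.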

Application conditions

1) The psychological trait measured by the tool should be relatively stable over time, as in a personality test. 2) Practice and forgetting effects on the measured trait should be negligible, or should roughly cancel each other out, as in an intelligence test (6 months). 3) No special learning or training should take place between the two administrations, so that the test-retest coefficient reflects only the influence of random factors.

Alternate-form reliability

Meaning

Alternate-form reliability is the degree of consistency between the results of two alternate (parallel) forms administered to the same group of subjects. It is computed as the Pearson product-moment correlation coefficient between the subjects' scores on the two forms, and it reflects measurement error caused by differences in item content and, when the forms are given at different times, by the time interval.

Assessment method

Coefficient of equivalence: the two alternate forms are administered consecutively, in the same session.

Coefficient of stability and equivalence (alternate-form test-retest reliability): the two alternate forms are administered on two occasions separated by a time interval.

Order effect

The effect of the order in which treatments are presented on the dependent variable: when the same subjects receive different treatments, the first treatment may affect responses to the second. The effect may be large or slight, short-lived or long-lasting.

Balanced design

An experimental design technique that arranges the order of treatments so that errors caused by treatment order cancel out.
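A simple balanced design for two alternate forms is the AB/BA scheme: half the subjects take Form A first and the other half take Form B first, so order effects are spread equally over both forms. The Python sketch below is one possible implementation under that assumption; the function name `counterbalance` and the use of a random shuffle before alternating orders are illustrative choices.

```python
import random

def counterbalance(subject_ids, forms=("A", "B"), seed=None):
    """AB/BA counterbalancing: shuffle subjects, then alternate the two
    administration orders so each order is used by (nearly) half of them."""
    orders = [tuple(forms), tuple(reversed(forms))]
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)          # random assignment to orders
    return {sid: orders[i % 2] for i, sid in enumerate(ids)}

plan = counterbalance(range(1, 9), seed=42)
for sid in sorted(plan):
    print(f"subject {sid}: " + " then ".join(plan[sid]))
```

With an odd number of subjects, one order is used exactly once more than the other.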

Application conditions

1) Construct two or more truly parallel forms (for example, Forms A and B). Alternate or parallel forms are tests that use different items to measure the same content and whose scores have the same mean and standard deviation. 2) Subjects must be able to take both tests (time, cost, etc.). 3) The results report should describe in as much detail as possible the interval between the two administrations, the order in which the forms were given, and the subjects' experience during testing.

Internal consistency reliability

Meaning

Internal consistency reliability, also called homogeneity reliability, evaluates whether the different parts of a test measure the same psychological trait, and reflects how consistently the item content samples that trait.

Estimation method

Split-half reliability

Meaning

Split-half reliability is the consistency between all subjects' scores on the two halves of a test after the test is divided into two equivalent halves. It reflects whether the items in the two halves measure the same psychological trait.

First-half/second-half split method

Odd-even split method

Assessment method

Spearman-Brown formula; Flanagan formula; Rulon formula
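Here is a Python sketch of the odd-even split with the three formulas, run on a hypothetical 6-subject, 6-item matrix of 0/1 scores. The Spearman-Brown correction $r_{XX} = \frac{2 r_{hh}}{1 + r_{hh}}$ assumes the two halves have equal variances. The Flanagan formula $2\left(1 - \frac{s_a^2 + s_b^2}{s_X^2}\right)$ and the Rulon formula $1 - \frac{s_d^2}{s_X^2}$ ($d$ = difference between half scores) do not need that assumption, and they are algebraically equivalent, which the sketch lets you check. All helper names are illustrative.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def odd_even_halves(item_scores):
    """Half totals per subject: odd-numbered items vs even-numbered items."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return odd, even

def spearman_brown(r_half):
    """Step the half-test correlation up to full length (equal half variances)."""
    return 2 * r_half / (1 + r_half)

def flanagan(a, b):
    """2 * (1 - (s_a^2 + s_b^2) / s_X^2), where X = a + b."""
    total = [x + y for x, y in zip(a, b)]
    return 2 * (1 - (statistics.pvariance(a) + statistics.pvariance(b))
                / statistics.pvariance(total))

def rulon(a, b):
    """1 - s_d^2 / s_X^2, where d = a - b and X = a + b."""
    total = [x + y for x, y in zip(a, b)]
    diff = [x - y for x, y in zip(a, b)]
    return 1 - statistics.pvariance(diff) / statistics.pvariance(total)

# Hypothetical 0/1 item scores: 6 subjects (rows) x 6 items (columns)
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 0, 1, 1, 1],
]
odd, even = odd_even_halves(scores)
r_half = pearson_r(odd, even)
print(f"half-test r = {r_half:.3f}")
print(f"Spearman-Brown = {spearman_brown(r_half):.3f}")
print(f"Flanagan = {flanagan(odd, even):.3f}, Rulon = {rulon(odd, even):.3f}")
```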

Kuder-Richardson reliability

Cronbach's alpha coefficient

Hoyt reliability
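For these coefficients, the standard formulas are KR-20 for dichotomously scored (0/1) items and Cronbach's alpha for items with any score range; for 0/1 items the two give identical values. The Python sketch below uses population variances throughout (an assumption; using sample variances consistently in alpha gives the same ratio) and a hypothetical 6 × 6 response matrix. Hoyt's reliability, derived from a subjects × items analysis of variance, is algebraically equal to alpha, so it is not coded separately. Function names are illustrative.

```python
import statistics

def cronbach_alpha(item_scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals).

    item_scores: one row per subject, one column per item.
    """
    k = len(item_scores[0])
    items = list(zip(*item_scores))                 # transpose: one tuple per item
    totals = [sum(row) for row in item_scores]
    item_var = sum(statistics.pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / statistics.pvariance(totals))

def kr20(item_scores):
    """KR-20 for items scored 0/1: k/(k-1) * (1 - sum(p*q) / variance of totals)."""
    k, n = len(item_scores[0]), len(item_scores)
    totals = [sum(row) for row in item_scores]
    pq = 0.0
    for col in zip(*item_scores):
        p = sum(col) / n          # proportion of subjects passing the item
        pq += p * (1 - p)
    return k / (k - 1) * (1 - pq / statistics.pvariance(totals))

# Hypothetical 0/1 item scores: 6 subjects (rows) x 6 items (columns)
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 0, 1, 1, 1],
]
print(f"KR-20 = {kr20(scores):.3f}, alpha = {cronbach_alpha(scores):.3f}")
```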

Inter-rater reliability

Meaning

Inter-rater reliability is the degree of consistency among multiple raters scoring the responses of the same group of people. Generally, the average agreement between pairs of trained raters must exceed 0.90 before the scoring is considered objective.

Assessment method

Two raters: calculate the correlation coefficient (Pearson product-moment correlation or Spearman rank correlation) between the scores the two raters give to the same set of answer sheets.

More than two raters: use Kendall's coefficient of concordance (W).
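Both inter-rater methods can be sketched in Python. Spearman's rho is computed as the Pearson correlation of the ranks (tied scores get the average of their ranks), and Kendall's W uses $W = \frac{12S}{m^2(n^3 - n)}$ for $m$ raters ranking $n$ answer sheets, where $S$ is the sum of squared deviations of each sheet's rank total from the mean rank total. This sketch assumes no tied ranks for W (no tie correction is applied); the rater data and function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # mean of 1-based positions i..j
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def kendalls_w(rankings):
    """Kendall's W for m raters ranking the same n objects (no tie correction).

    rankings[i][j] is the rank (1..n) that rater i gave object j.
    """
    m, n = len(rankings), len(rankings[0])
    rank_sums = [sum(r[j] for r in rankings) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((rj - mean_sum) ** 2 for rj in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical data: two raters' scores on the same 5 answer sheets
rater_a = [8, 6, 9, 4, 7]
rater_b = [7, 6, 9, 5, 8]
print(f"Spearman rho = {spearman_rho(rater_a, rater_b):.3f}")

# Hypothetical data: three raters ranking the same 4 answer sheets
ranks = [[1, 2, 3, 4], [1, 3, 2, 4], [2, 1, 3, 4]]
print(f"Kendall's W = {kendalls_w(ranks):.3f}")
```

W ranges from 0 (no agreement among raters) to 1 (all raters give identical rankings).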

Factors affecting reliability and ways to improve it

Factors affecting measurement reliability

Subject characteristics

Individual subjects

Test-taking motivation

Test anxiety

Test-taking experience

Practice effect

Response tendencies

Physiological variables

Subject group: heterogeneity of the group and its average ability level

Examiner characteristics

Test administrator

Scorer

Testing situation

Measurement tool

Test length

Test difficulty

Time interval

Ways to improve reliability

Increase the length of the test appropriately

Control the difficulty distribution of test items

Improve the discrimination of each item

Select an appropriate subject group

Standardize the testing process and unify the testing environment

Ensure that subjects have enough time to answer
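The effect of lengthening a test can be estimated with the general Spearman-Brown prophecy formula $r_{kk} = \frac{k\,r}{1 + (k-1)\,r}$, where $k$ is the factor by which test length changes. This holds only if the added items are parallel to the existing ones, which is an assumption. The Python sketch below, with illustrative function names, predicts the new reliability and solves for the length factor needed to reach a target value.

```python
def lengthened_reliability(r, k):
    """Spearman-Brown prophecy: reliability after changing length by factor k
    (k = 2 doubles the test), assuming added items parallel the old ones."""
    return k * r / (1 + (k - 1) * r)

def length_factor_needed(r, r_target):
    """Factor by which the test must be lengthened to move from r to r_target."""
    return r_target * (1 - r) / (r * (1 - r_target))

print(round(lengthened_reliability(0.6, 2), 3))   # 0.75
print(round(length_factor_needed(0.6, 0.8), 2))   # 2.67
```

So a test with reliability 0.6 would need roughly 2.7 times as many comparable items to reach 0.8, which is why lengthening is recommended only "appropriately".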
