Test validity, test item analysis

Test validity and test item analysis. Validity refers to the degree to which a test or scale actually measures the psychological trait it is intended to measure; simply put, it is the accuracy of a test.

Edited at 2024-11-27 12:35:35

Mason·Carter


Measurement validity, item analysis of tests

Test validity

Validity overview

Definition of validity

It refers to the extent to which a test or scale actually measures the psychological trait it is intended to measure; simply put, it is the accuracy of a test. In test theory, validity is defined as the ratio of the variance related to the purpose of measurement (the valid variance) to the total variance of a set of measurements: $r_{xy}^2 = S_V^2 / S_X^2$, where $S_V^2$ is the valid variance and $S_X^2$ is the total variance of the observed scores.

The relationship between reliability and validity

1. High reliability is a necessary but not sufficient condition for high validity.
2. Validity is limited by reliability: a test's validity coefficient cannot exceed the square root of its reliability coefficient.
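The ceiling that reliability places on validity can be written with the classical-test-theory bound $r_{xy} \le \sqrt{r_{xx} r_{yy}}$, where $r_{xx}$ and $r_{yy}$ are the reliabilities of the test and the criterion. A minimal sketch follows; `max_validity` is a hypothetical helper name, and the default `r_yy = 1.0` assumes a perfectly reliable criterion.

```python
import math

def max_validity(r_xx: float, r_yy: float = 1.0) -> float:
    """Upper bound on the validity coefficient r_xy: sqrt(r_xx * r_yy)."""
    if not (0.0 <= r_xx <= 1.0 and 0.0 <= r_yy <= 1.0):
        raise ValueError("reliabilities must lie in [0, 1]")
    return math.sqrt(r_xx * r_yy)

# A test with reliability 0.49 cannot correlate above 0.70 with any criterion.
print(round(max_validity(0.49), 2))
# With an imperfect criterion (reliability 0.64), the ceiling drops further.
print(round(max_validity(0.81, 0.64), 2))
```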

Assessment of Validity

Content validity

Meaning of content validity

It refers to the degree of agreement between the content a test actually measures and the content it is meant to measure, that is, whether the test is a representative sample of the behavioral domain to be measured. The content or behavioral domain to be measured is determined by the purpose of measurement and usually includes the scope of knowledge to be covered and the level of mastery required for each knowledge point within that scope.

How to assess content validity

Logical analysis/expert judgment

Relevant experts are asked to judge whether the test items are consistent with the specified content, that is, whether the items represent that content. For this reason, content validity is sometimes also called logical validity.

Statistical analysis

1. Calculate the agreement between two raters' judgments, i.e., inter-rater reliability.
2. Cronbach proposed estimating content validity from the correlation between a group of subjects' scores on two parallel forms constructed independently from the same content domain, i.e., parallel-forms reliability. (Watch out for spurious correlations!)
3. Test-retest method (testing before and after learning): administer the test before and after the relevant knowledge is taught and compare the scores.
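Agreement between two raters (point 1) is often summarized with Cohen's kappa, which corrects raw agreement for chance agreement. The outline does not name a specific index, so kappa is our choice here; the codes "R" (item represents the content domain) and "N" (does not), and the ten judgments, are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement: kappa = (p_o - p_e) / (1 - p_e)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must judge the same, non-empty set of items")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n      # observed agreement
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)  # chance agreement
    if p_e == 1.0:
        return 1.0  # convention: both raters used one identical category throughout
    return (p_o - p_e) / (1 - p_e)

expert_1 = list("RRRRRRRNNN")   # hypothetical relevance judgments for 10 items
expert_2 = list("RRRRRRNNNR")
print(round(cohens_kappa(expert_1, expert_2), 3))
```

Here raw agreement is 0.80, but chance agreement is 0.58, so kappa is only about 0.52.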

Empirical inference

That is, validity is examined through practice. Different groups of subjects differ markedly in their total scores and in their responses to each item; generally, the total score and the pass rate of each item increase with grade and age (evidence of growth and developmental change). For example, to check whether a child development scale is valid, one can test children of different ages; if pass rates increase with age, it can be inferred that the test has content validity.

Content Validity Advantages and Disadvantages

Advantages: A detailed description of the test content serves as a reference for any test development.

Disadvantages: It lacks reliable quantitative indicators, which makes comparison between tests inconvenient.

Construct validity

Meaning

Also known as structural validity or conceptual validity, it refers to the extent to which a test actually measures the theoretical construct or trait it is meant to measure, or whether the test results can confirm or explain the hypotheses, terms, or constructs of a given theory.

Basic characteristics

① The magnitude of construct validity depends on the psychological theory of the construct assumed in advance; the construct validity of tests built on different theoretical constructs cannot be compared.
② Construct validity is sometimes difficult to establish. When it cannot be confirmed, either the theory may be flawed or the experimental design may not properly test the hypothesis.
③ There is no single quantitative index of construct validity; it is evaluated by accumulating evidence from many sources.
④ For some tests, the definition of the content or behavioral domain measured is similar to the explanation of the theoretical construct, so high content validity essentially means high construct validity.

How to determine construct validity

1. Finding evidence within the test
2. Finding evidence between tests
3. Examining the test's empirical validity
4. Multitrait-multimethod matrix method
5. Experimental manipulation method

Evidence within the test

1. Content validity
2. Characteristics of subjects' responses to the items
3. Homogeneity (internal-consistency) reliability

Finding evidence between tests

Congruent validity: Examine the correlation between the newly developed test and an established test that is known to have high validity and to measure the same trait effectively. The higher the correlation, the better.

Discriminant validity: Also called divergent validity. If two tests measure different traits, the correlation between them should be low, even when they use the same measurement method.

Convergent validity: If two tests measure the same trait, the correlation between them should be high, even when they use different measurement methods.

Factor analysis method: Factor analysis of a set of tests identifies the common factors underlying them. Each test's loading on a common factor (the correlation between the test and that factor) is the test's factorial validity. These factors may be the psychological traits or constructs we intend to measure. The proportion of the total variance in test scores that comes from the relevant factors is an indicator of the test's construct validity.

Examining the test's empirical validity

The nature and types of the criteria that the test predicts are used as indicators for analyzing the test's construct validity.

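Multitrait-multimethod matrix method (MTMM)

Two or more psychological traits are measured simultaneously, and each trait is measured with two or more methods. For example, suppose traits A, B, and C are each measured by methods 1, 2, 3, and 4. Construct validity is supported when correlations between measures of the same trait obtained by different methods (convergent validity) are higher than correlations between different traits, including those measured by the same method (discriminant validity).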
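Each cell of an MTMM matrix is defined by whether the two measures share a trait and whether they share a method. The sketch below labels the cells and applies one Campbell-Fiske-style check (validity-diagonal values exceed all heterotrait values); the function names and the (trait, method) encoding are our assumptions.

```python
from itertools import combinations

def mtmm_cell(m1, m2):
    """Classify a pair of measures, each written as (trait, method)."""
    (t1, meth1), (t2, meth2) = m1, m2
    if t1 == t2 and meth1 == meth2:
        return "monotrait-monomethod"      # reliability diagonal
    if t1 == t2:
        return "monotrait-heteromethod"    # validity diagonal (convergent)
    if meth1 == meth2:
        return "heterotrait-monomethod"
    return "heterotrait-heteromethod"

def convergent_exceeds_discriminant(correlations):
    """correlations: iterable of (measure1, measure2, r).
    True if the smallest validity-diagonal r exceeds the largest heterotrait r."""
    validity, heterotrait = [], []
    for m1, m2, r in correlations:
        cell = mtmm_cell(m1, m2)
        if cell == "monotrait-heteromethod":
            validity.append(r)
        elif cell.startswith("heterotrait"):
            heterotrait.append(r)
    return bool(validity and heterotrait) and min(validity) > max(heterotrait)

# Traits A, B, C crossed with methods 1 and 2 (a smaller version of the text's example).
measures = [(t, m) for m in (1, 2) for t in "ABC"]
for a, b in combinations(measures, 2):
    print(a, b, mtmm_cell(a, b))
```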

Experimental manipulation method

Suitable for tests of traits that are sensitive to certain conditions, such as anxiety. Based on the theoretical conception of the trait, one predicts that subjects' scores will change under certain circumstances or after certain treatments; if the observed changes match the prediction, this supports the test's construct validity.

Advantages and Disadvantages of Construct Validity

Advantages: It directs researchers to the test's role in proposing and verifying hypotheses, treating it as a tool for theoretical research rather than only as an aid to decision-making, which gives testing broader prospects for development.

Disadvantages: Some constructs are vaguely conceived and lack a consensus definition, so it is impossible to specify clear operational steps or a unique, objective index for determining validity (operational difficulty).

Empirical validity

Meaning

Also known as criterion-related validity (criterion validity for short), it refers to how well a test estimates individuals' behavior in a specific situation, i.e., practical effectiveness is used as the standard for judging the test. Validity criterion: the behavior being estimated serves as the standard for testing the test's validity.

Validity criterion

Meaning

It is an external standard for judging whether a test is valid. It is independent of the test and can be obtained directly from the test takers' behavior in practice.

Characteristics

Diversity; complexity; specificity and temporality

Qualities of a good criterion

Relevance; validity; reliability; objectivity (freedom from bias); practicality; freedom from contamination

Preventing criterion contamination

(1) Evaluators should make comprehensive evaluations and refine the evaluation details or rules so as to be as objective as possible; (2) evaluators should not be told subjects' earlier test results, to prevent bias during evaluation.

How to determine empirical validity

Correlation method: Calculate the correlation coefficient between test scores and criterion measures; this coefficient is called the validity coefficient. The most commonly used is the Pearson product-moment correlation, because test scores and criterion measures are usually continuous variables.
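The validity coefficient is just the Pearson product-moment correlation between test scores and criterion scores. A hand-written sketch, with hypothetical test scores and criterion ratings:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between test scores x and criterion y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 cases")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # cross-products
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

test_scores = [1, 2, 3, 4, 5]        # hypothetical test scores
criterion = [2, 1, 4, 3, 5]          # hypothetical criterion ratings
print(pearson_r(test_scores, criterion))
```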

Group-difference (discrimination) method: This method checks whether test scores effectively differentiate between groups defined by the criterion. Subjects are divided into high and low criterion groups, and the difference between the two groups' test scores is tested for significance.

Hit rate method: When a test is used as the basis for selection decisions, the proportion of correct decisions serves as the validity indicator.
① Positive hit rate: among those selected by the test, the proportion who are actually qualified.
② Negative hit rate: among those rejected by the test, the proportion who truly should have been rejected.
③ Total hit rate: the number of correct selections plus the number of correct rejections, divided by the total number of people.
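A sketch of the group-difference method, assuming Welch's unequal-variance t statistic (the outline only says "significant", so the choice of test is ours). The score lists are hypothetical; a p-value would come from comparing t with the t distribution on the returned degrees of freedom, which the standard library does not provide.

```python
import math
import statistics

def welch_t(high, low):
    """Welch's t and Welch-Satterthwaite df for high vs low criterion groups."""
    if len(high) < 2 or len(low) < 2:
        raise ValueError("each group needs at least two scores")
    vh = statistics.variance(high) / len(high)   # squared standard error, high group
    vl = statistics.variance(low) / len(low)     # squared standard error, low group
    t = (statistics.mean(high) - statistics.mean(low)) / math.sqrt(vh + vl)
    df = (vh + vl) ** 2 / (vh ** 2 / (len(high) - 1) + vl ** 2 / (len(low) - 1))
    return t, df

high_group = [80, 85, 90, 75, 88]   # hypothetical test scores, high criterion group
low_group = [60, 65, 70, 58, 72]    # hypothetical test scores, low criterion group
t, df = welch_t(high_group, low_group)
print(f"t = {t:.2f}, df = {df:.1f}")
```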
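The three hit rates follow directly from the four selection outcomes. In this sketch the argument names and the counts are hypothetical.

```python
def hit_rates(correct_accept, false_accept, correct_reject, false_reject):
    """Positive, negative, and total hit rates from selection-outcome counts."""
    selected = correct_accept + false_accept     # chosen by the test
    rejected = correct_reject + false_reject     # eliminated by the test
    return {
        "positive": correct_accept / selected,
        "negative": correct_reject / rejected,
        "total": (correct_accept + correct_reject) / (selected + rejected),
    }

# Hypothetical: 50 selected (40 qualified), 50 rejected (35 truly unqualified).
print(hit_rates(40, 10, 35, 15))
```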

Ways to Improve Validity (factors that affect validity)

1. Composition of the test
2. Interfering factors during test administration
3. Nature of the sample group
4. Nature of the criterion
5. Reliability of measurement

Test item analysis

Difficulty of test items

Meaning: the degree of difficulty subjects encounter when completing a test item, usually denoted by P. When P is a pass rate, a larger P means an easier item.

Calculation of item difficulty

Dichotomous scoring

Pass-rate method

Extreme-group method

Non-dichotomous scoring
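The outline lists the difficulty methods without formulas. The sketch below uses the formulas commonly paired with them: $P = R/N$ for the pass rate, $P = (P_H + P_L)/2$ for extreme groups, and mean score divided by full marks for non-dichotomous items. Treat these as our assumption of what the headings refer to; the function names and data are hypothetical.

```python
def difficulty_pass_rate(num_passed, num_total):
    """Dichotomous items: P = R / N (share of examinees answering correctly)."""
    return num_passed / num_total

def difficulty_extreme_groups(p_high, p_low):
    """Dichotomous items, extreme-group method: P = (P_H + P_L) / 2."""
    return (p_high + p_low) / 2

def difficulty_non_dichotomous(scores, max_score):
    """Non-dichotomous items: P = mean item score / full marks for the item."""
    return sum(scores) / len(scores) / max_score

print(difficulty_pass_rate(30, 40))                # 30 of 40 passed
print(difficulty_extreme_groups(0.9, 0.5))         # high and low group pass rates
print(difficulty_non_dichotomous([4, 6, 8, 10], 10))
```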

Determination of test difficulty level

The main purpose of conducting item difficulty analysis is to screen items. The appropriate level of item difficulty depends on the purpose and nature of the test.

Equal-interval transformation of difficulty
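Pass rates are ordinal, not equal-interval. One widely used equal-interval transformation is the ETS delta, $\Delta = 13 + 4Z$, where Z is the standard normal deviate above which a proportion P of the area lies. We assume this is the transformation the outline means. Larger Δ means a harder item.

```python
from statistics import NormalDist

def delta_difficulty(p):
    """Equal-interval difficulty: delta = 13 + 4 * Z, with Z = inverse normal CDF of (1 - p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    z = NormalDist().inv_cdf(1 - p)   # harder items (small p) get positive z
    return 13 + 4 * z

for p in (0.1, 0.5, 0.8413):
    print(p, round(delta_difficulty(p), 2))
```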

Effect of Difficulty on Tests

1. Distribution of test scores: A test of moderate difficulty generally yields a roughly normal score distribution. A test that is too hard produces a positively skewed distribution, and one that is too easy produces a negatively skewed distribution.

2. Dispersion of test scores: When test difficulty is around 0.5, the spread of the score distribution is greatest; when difficulty is much higher or lower, the spread shrinks.

3. Reliability: When test difficulty is around 0.5, scores are most widely spread, so reliability estimates based on correlation coefficients tend to be relatively high.

4. Discrimination of test items: difficulty limits how well an item can discriminate; items of moderate difficulty have the greatest potential discrimination.
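Points 2 and 3 rest on the fact that a 0/1 item with pass rate P has variance $P(1-P)$, which peaks at P = 0.5. A quick sketch:

```python
def item_variance(p):
    """Variance of a dichotomously scored (0/1) item with pass rate p."""
    return p * (1 - p)

for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"P = {p:.1f}  variance = {item_variance(p):.2f}")
```

The printed variances rise from 0.09 to a maximum of 0.25 at P = 0.5, then fall symmetrically.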

Discrimination of test items

Meaning: Also called discriminating power, it refers to the ability of test items to distinguish differences in subjects' levels of the psychological trait, usually denoted by D.

Interpreting item discrimination

D ranges from -1 to +1. A positive D is called positive discrimination; D = +1 means the item distinguishes perfectly between subjects' levels of the trait. D = 0 is called no discrimination and means the item is unrelated to subjects' trait levels. A negative D is called negative discrimination; D = -1 means the item works in exactly the opposite direction to subjects' trait levels.

Calculation of item discrimination

Item Discrimination Index Method

Calculation of the discrimination index: $D = P_H - P_L$, where D is the item discrimination index, $P_H$ is the pass rate of the high-scoring group, and $P_L$ is the pass rate of the low-scoring group.

Division of extreme groups
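A sketch of the discrimination index with extreme groups. It assumes the common convention of taking the top and bottom 27% of examinees by total score (the outline does not specify the cut); the function name and data are hypothetical.

```python
def discrimination_index(total_scores, item_correct, fraction=0.27):
    """D = P_H - P_L using the top and bottom `fraction` of examinees by total score."""
    n = len(total_scores)
    if n != len(item_correct) or n == 0:
        raise ValueError("need one total score and one 0/1 item score per examinee")
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    k = max(1, round(n * fraction))                    # size of each extreme group
    order = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    high, low = order[:k], order[-k:]
    p_high = sum(item_correct[i] for i in high) / k   # P_H
    p_low = sum(item_correct[i] for i in low) / k     # P_L
    return p_high - p_low

totals = [95, 90, 88, 80, 75, 70, 65, 60, 55, 50]   # hypothetical total scores
item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]               # hypothetical item responses
print(round(discrimination_index(totals, item), 2))
```

With 10 examinees each group holds 3 people; the high group all pass and one of the low group passes, so D = 1 - 1/3.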

Variance method

Correlation method

1. Point-biserial correlation: used when item scores are 0 or 1 (a dichotomous variable) and criterion scores or total test scores are continuous.

2. Biserial correlation: used when both variables are continuous but one has been artificially split into two categories. Either the item scores are continuous while criterion or total scores are split into two categories (high/low, pass/fail), or the criterion or total scores are continuous while item scores are split into two categories (correct/incorrect, pass/fail).

3. φ correlation: used when both variables are dichotomous, arranged in a fourfold (2×2) table: high and low groups on the criterion or total score versus passing and failing the item.

4. Product-moment correlation: used when both variables are continuous. When the total test score is used as the criterion, the larger the correlation between item score and total score, the greater the item's discrimination.
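Two of these coefficients are easy to compute directly. The sketch uses the standard formulas $r_{pb} = \frac{M_p - M_q}{s_t}\sqrt{pq}$ (with $s_t$ the population standard deviation of total scores) and $\varphi = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}$; the data and the 2×2 cell layout are hypothetical.

```python
import math
import statistics

def point_biserial(item, total):
    """r_pb for a 0/1 item against continuous total scores."""
    passed = [t for i, t in zip(item, total) if i == 1]
    failed = [t for i, t in zip(item, total) if i == 0]
    if not passed or not failed:
        raise ValueError("need at least one pass and one fail")
    p = len(passed) / len(item)
    s_t = statistics.pstdev(total)          # population SD of total scores
    return (statistics.mean(passed) - statistics.mean(failed)) / s_t * math.sqrt(p * (1 - p))

def phi(a, b, c, d):
    """phi for a 2x2 table:        item passed  item failed
                   high group          a            b
                   low group           c            d"""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

item = [1, 1, 0, 1, 0, 0]            # hypothetical item responses
total = [10, 9, 5, 8, 4, 6]          # hypothetical total scores
print(round(point_biserial(item, total), 3), round(phi(40, 10, 15, 35), 3))
```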

Difference test method

Test whether there is a significant difference in the scores of each item between the subjects with high total test scores and the subjects with low total scores. If the difference is significant, the discrimination of the item is good.

The relationship between discrimination and difficulty

If an item's pass rate is 1 or 0, then everyone in both the high and low groups passed, or no one did, so D = 0. If an item's pass rate is 0.50, it is possible that everyone in the high group passed and no one in the low group did, giving D = 1. Thus when difficulty is near 1 or 0, the item's potential discrimination is small (close to 0); when difficulty is 0.50, its potential discrimination is largest (up to 1).

Relativity of discrimination

Different calculation methods yield different discrimination values; sample size affects correlation-based discrimination values; the grouping criterion affects the discrimination index D; and the homogeneity of the sample affects the size of discrimination values.
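The argument above generalizes to a simple ceiling. Assuming equal-sized high and low groups, so that $P = (P_H + P_L)/2$, the constraints $P_H \le 1$ and $P_L \ge 0$ give $D \le 2P$ and $D \le 2(1-P)$, hence $D_{max} = 2\min(P, 1-P)$. A sketch:

```python
def max_discrimination(p):
    """Largest attainable D = P_H - P_L when P = (P_H + P_L) / 2 (equal-size groups)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return 2 * min(p, 1 - p)

for p in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(p, round(max_discrimination(p), 2))
```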

Guessing and Differential Item Functioning Analysis

Guessing and correcting difficulty for guessing
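Two standard guessing corrections for items with k options are sketched below: the formula score $S = R - W/(k-1)$, where R is the number right and W the number wrong (omissions are not counted), and the corrected difficulty $CP = (kP - 1)/(k - 1)$. The outline names the topic but not the formulas, so treat these as the usual textbook versions rather than the author's.

```python
def corrected_score(right, wrong, k):
    """Formula score S = R - W / (k - 1) for k-option items (omissions ignored)."""
    return right - wrong / (k - 1)

def corrected_difficulty(p, k):
    """CP = (k * P - 1) / (k - 1): pass rate with chance success removed."""
    return (k * p - 1) / (k - 1)

print(corrected_score(40, 12, 4))        # 40 right, 12 wrong on 4-option items
print(corrected_difficulty(0.70, 4))     # observed pass rate 0.70
```

A pass rate of 0.25 on a 4-option item corrects to 0, i.e. no better than blind guessing.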

Item Analysis of Multiple Choice Questions

Differential item functioning (DIF)

In statistics, differential item functioning is a difference in how two groups of subjects perform on an item even after they are matched on the trait being measured. The presence of DIF means the item functions differently in different groups, i.e., it may be unfair to one of them.
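One widely used DIF procedure (not named in the outline) is the Mantel-Haenszel method: examinees are matched on total score, and a 2×2 table (group by correct/incorrect) is formed at each score level. The common odds ratio $\alpha_{MH} = \sum_k (A_k D_k / N_k) / \sum_k (B_k C_k / N_k)$ equals 1 when there is no DIF, and ETS reports it as $\text{MH D-DIF} = -2.35 \ln \alpha_{MH}$. The strata below are hypothetical.

```python
import math

def mantel_haenszel_dif(strata):
    """strata: (A, B, C, D) per matched total-score level, where
    A/B = reference group correct/incorrect, C/D = focal group correct/incorrect.
    Returns (alpha_MH, MH D-DIF)."""
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue                      # empty score level contributes nothing
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined for these tables")
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)

no_dif = [(10, 10, 10, 10), (20, 5, 20, 5)]       # same odds in both groups
favors_ref = [(20, 10, 10, 20), (20, 10, 10, 20)]  # reference group does better
print(mantel_haenszel_dif(no_dif), mantel_haenszel_dif(favors_ref))
```

A negative MH D-DIF indicates the item is harder for the focal group than for matched reference-group members.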
