Types of Reliability and Validity

Construct Validity
- used to ensure that the measure is actually assessing the intended construct, and not other variables. Using a panel of "experts" familiar with the construct is one way in which this type of validity can be assessed: the experts examine the items and decide what each specific item is intended to measure. Students can also be involved in this process to obtain their feedback.
Content Validity/Logical Validity
- an important research methodology term that refers to how well a test measures the behavior for which it is intended.
Sampling Validity
- (very similar to content validity) ensures that the measure covers the broad range of areas within the concept that is being studied. Not everything can be covered, so items need to be sampled from all of the domains. This may need to be completed using a panel of "experts" to ensure that the content area is adequately sampled. Additionally, a panel can help limit the "expert" bias (i.e. a test reflecting what an individual personally feels are the most important or relevant ideas).
Criterion-Related Validity
- used to predict future or current performance; it correlates test results with another criterion of interest.
- Predictive Validity - a form of criterion-related validity: a measurement of how well a test predicts future performance. In order for a test to have predictive validity, there must be a statistically significant correlation between test scores and the criterion being used to measure the validity.
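As a rough illustration, the sketch below correlates hypothetical test scores with a hypothetical later criterion (e.g. end-of-year grades) and checks the significance of the correlation. It assumes Python with scipy available; all scores are invented for illustration.

```python
# Hypothetical sketch of checking predictive validity: correlate test scores
# with a later criterion measure (invented data, for illustration only).
from scipy.stats import pearsonr

test_scores  = [72, 85, 90, 60, 78, 88, 95, 55, 70, 82]            # scores at admission
later_grades = [2.8, 3.4, 3.7, 2.5, 3.0, 3.5, 3.9, 2.2, 2.9, 3.3]  # criterion of interest

r, p_value = pearsonr(test_scores, later_grades)
print(f"correlation r = {r:.2f}, p = {p_value:.4f}")

# A statistically significant positive correlation (e.g. p < .05) would count
# as evidence of predictive validity with respect to this criterion.
```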
Concurrent Validity
- a measure of how well a particular test correlates with a previously validated measure. It is commonly used in the educational field.
Face Validity
- ascertains that the measure appears to be assessing the intended construct under study. Stakeholders can easily assess face validity. Although this is not a very "scientific" type of validity, it may be an essential component in enlisting the motivation of stakeholders: if the stakeholders do not believe the measure is an accurate assessment of the ability, they may become disengaged from the task.
Validity
- how well a test measures what it is purported to measure.

In order for an assessment to be "sound", it must be free of bias and distortion. Reliability and validity are two concepts that are important for defining and measuring bias and distortion.
Reliability
the degree to which an assessment tool produces stable and consistent results.
Parallel Forms Reliability
a measure of reliability obtained by administering different versions of an assessment tool (both versions must contain items that probe the same construct, skill, knowledge base, etc.) to the same group of individuals. The scores from the two versions can then be correlated in order to evaluate the consistency of results across alternate versions.
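A minimal sketch of this correlation step, assuming Python with numpy; the Form A and Form B scores are invented for illustration.

```python
# Hypothetical sketch of parallel-forms reliability: the same group takes two
# versions of a test, and the two sets of scores are correlated.
import numpy as np

form_a = np.array([12, 18, 15, 20, 9, 16, 14, 19])   # scores on version A
form_b = np.array([13, 17, 14, 19, 10, 15, 15, 18])  # scores on version B

# Pearson correlation between the two forms (off-diagonal of the 2x2 matrix).
r = np.corrcoef(form_a, form_b)[0, 1]
print(f"parallel-forms reliability estimate: r = {r:.2f}")
```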
Inter-Rater Reliability
a measure of reliability used to assess the degree to which different judges or raters agree in their assessment decisions. Inter-rater reliability is useful because human observers will not necessarily interpret answers the same way; raters may disagree as to how well certain responses or material demonstrate knowledge of the construct or skill.
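A sketch of how agreement between two raters might be quantified, assuming Python with numpy; the pass/fail ratings are invented, and Cohen's kappa (a chance-corrected agreement index not named above) is shown alongside simple percent agreement.

```python
# Hypothetical sketch of inter-rater reliability for two raters who score the
# same ten responses as "pass" (1) or "fail" (0).
import numpy as np

rater_1 = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
rater_2 = np.array([1, 0, 0, 1, 0, 1, 1, 1, 1, 1])

# Simple percent agreement.
observed_agreement = np.mean(rater_1 == rater_2)

# Cohen's kappa corrects the observed agreement for chance agreement.
p1 = rater_1.mean()          # proportion of "pass" ratings from rater 1
p2 = rater_2.mean()          # proportion of "pass" ratings from rater 2
expected_agreement = p1 * p2 + (1 - p1) * (1 - p2)
kappa = (observed_agreement - expected_agreement) / (1 - expected_agreement)

print(f"percent agreement = {observed_agreement:.2f}, kappa = {kappa:.2f}")
```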
Test-Retest Reliability
a measure of reliability obtained by administering the same test twice over a period of time to a group of individuals. The scores from Time 1 and Time 2 can then be correlated in order to evaluate the test for stability over time.
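The correlation between the two administrations can be computed the same way; a minimal sketch with invented scores, assuming Python with scipy:

```python
# Hypothetical sketch of test-retest reliability: the same test is given to
# the same group at two points in time and the scores are correlated.
from scipy.stats import pearsonr

time_1 = [34, 28, 40, 22, 31, 37, 25, 30]  # scores at the first administration
time_2 = [33, 30, 39, 24, 29, 38, 26, 31]  # scores after the retest interval

stability, _ = pearsonr(time_1, time_2)
print(f"test-retest (stability) coefficient: r = {stability:.2f}")
```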
Internal Consistency Reliability
a measure of reliability used to evaluate the degree to which different test items that probe the same construct produce similar results.
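One widely used index of internal consistency is Cronbach's alpha, which is not named above but summarizes the same idea; a minimal sketch with invented item scores, assuming Python with numpy:

```python
# Sketch of Cronbach's alpha as an internal-consistency index (invented data).
import numpy as np

# Rows are respondents, columns are items intended to probe one construct.
items = np.array([
    [3, 4, 3, 5],
    [2, 2, 3, 2],
    [4, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
])

k = items.shape[1]                              # number of items
item_variances = items.var(axis=0, ddof=1)      # variance of each item
total_variance = items.sum(axis=1).var(ddof=1)  # variance of the total scores
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")
```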
Average Inter-Item Correlation
a sub-type of internal consistency reliability; the average inter-item correlation is obtained by taking all of the items on a given test that probe the same construct (e.g. reading comprehension), determining the correlation coefficient for each pair of items, and finally taking the average of all of these correlation coefficients.
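The pair-by-pair procedure described above can be sketched as follows, assuming Python with numpy and invented item scores:

```python
# Hypothetical sketch of the average inter-item correlation: correlate every
# pair of items probing the same construct, then average those correlations.
import numpy as np
from itertools import combinations

# Rows are students, columns are items on one construct (invented data).
items = np.array([
    [4, 5, 4, 3],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 2, 1],
    [3, 4, 3, 3],
    [4, 4, 5, 4],
])

# Correlation coefficient for each pair of items, then the mean of those values.
pair_correlations = [
    np.corrcoef(items[:, i], items[:, j])[0, 1]
    for i, j in combinations(range(items.shape[1]), 2)
]
print(f"average inter-item correlation = {np.mean(pair_correlations):.2f}")
```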
Split-Half Reliability
a sub-type of internal consistency reliability; the process of obtaining split-half reliability begins by "splitting in half" all items of a test that are intended to probe the same construct (e.g. World War II) in order to form two "sets" of items. The entire test is administered to a group of individuals, the total score for each "set" is computed, and finally, the split-half reliability is obtained by determining the correlation between the two "sets" of scores.
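A minimal sketch of the split-half procedure, assuming Python with numpy; the item responses are invented, the halves are formed from odd- and even-numbered items, and the closing Spearman-Brown correction is a standard extension not mentioned above.

```python
# Hypothetical sketch of split-half reliability: split the items in half,
# total each half for every student, and correlate the two sets of totals.
import numpy as np

# Rows are students, columns are the items of the entire test (invented data).
items = np.array([
    [1, 0, 1, 1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 0, 1, 1, 0, 1, 1, 0],
])

half_1 = items[:, 0::2].sum(axis=1)   # totals for one "set" of items
half_2 = items[:, 1::2].sum(axis=1)   # totals for the other "set"

split_half_r = np.corrcoef(half_1, half_2)[0, 1]
print(f"split-half reliability: r = {split_half_r:.2f}")

# Spearman-Brown correction, often applied to estimate full-length reliability.
full_test_estimate = 2 * split_half_r / (1 + split_half_r)
print(f"Spearman-Brown corrected estimate = {full_test_estimate:.2f}")
```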