______________________________________________________

ASSESSMENT VALIDITY



The essence of measurement theory is a four-step process:
1. STIMULUS elicits a
2. RESPONSE which is compared to a
3. REFERENCE which leads to an
4. INFERENCE

_____________________________________________________________________













_____________________________________________________________________

On a test, the stimuli (test questions) call forth responses (often marks on an answer sheet), which are compared to a reference (criterion or norm). That comparison then leads to some inference about the characteristics of the test taker.

The basic process of (S)timulus, (R)esponse, (R)eference, (I)nference is the same whether the assessment is based on observation, a portfolio, or a more traditional test.
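The four steps can be sketched in a few lines of Python. This is only an illustration: the answer key, the 80% criterion cutoff, and the wording of the inference are assumptions, not part of the original material.

```python
# Hypothetical walk-through of Stimulus -> Response -> Reference -> Inference.
# The answer key, the responses, and the 0.80 cutoff are illustrative assumptions.

def score_responses(responses, answer_key):
    """Stimulus-response step: count responses that match the key."""
    return sum(1 for given, correct in zip(responses, answer_key) if given == correct)

def infer_mastery(raw_score, n_items, cutoff=0.80):
    """Reference and inference steps: compare the score to a criterion."""
    proportion = raw_score / n_items
    if proportion >= cutoff:
        return "likely has mastered the content"
    return "likely has not yet mastered the content"

answer_key = ["b", "c", "a", "d", "c"]   # correct answers to the stimuli
responses = ["b", "c", "a", "a", "c"]    # marks on the answer sheet

raw = score_responses(responses, answer_key)
inference = infer_mastery(raw, len(answer_key))
print(raw, inference)
```

Note that the code produces only a score and a statement. Whether that statement is appropriate is the validity question discussed in this section.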

_____________________________________________________________________
















_____________________________________________________________________

In the test norms section we explored issues about the "reference".

In the reliability section of this program we will examine issues regarding "stimulus-response".

Validity issues in assessment are issues related to INFERENCE.

Because validity technically has to do with the appropriateness of the inference made from the assessment data, validity is not a characteristic which can be directly associated with a test.

_____________________________________________________________________
























_____________________________________________________________________

When we look at the types of validity evidence which can be gathered about a test, we are actually not looking at "test validity".

Instead we are looking at information which suggests the probability that an appropriate inference will be made from the test.

It is ultimately the inference, not the test, which is judged valid or invalid.

Thus, when a test appears to have satisfactory content validity (to be defined shortly), what we really mean is that a valid inference regarding competence with this content can likely be made from the results of the test.

_____________________________________________________________________













_____________________________________________________________________

In effect, validity is the APPROPRIATENESS OF THE INTERPRETATION.

It is important to remember that an apparently "good test" is no guarantee of a valid interpretation.

It is also important, however, to remember that poor tests are very unlikely to result in valid interpretation, so the validity evidence for a test is an important consideration.

_____________________________________________________________________





















_____________________________________________________________________

REVIEW QUESTION

The term validity is closest in meaning to

a. consistency

b. practicality

c. relevance

_____________________________________________________________________
















_____________________________________________________________________

Validity evidence for tests in educational and psychological measurement is classified into three categories:

1. content-related evidence

2. criterion-related evidence

3. construct-related evidence

_____________________________________________________________________
























_____________________________________________________________________

CONTENT-RELATED EVIDENCE is typically the most important type of validity evidence for teacher-made tests and also for some personality inventories.

To prepare a test with satisfactory content validity, the steps are:

1. identify the domain to be tested (instructionally relevant if a classroom test)

2. prepare a table of specifications, a test blueprint (number of questions for each content category, level of difficulty, etc.)

3. use a representative sample of questions from the domain
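One possible way to carry out step 3 is to draw questions at random from an item bank, category by category, in the numbers the blueprint calls for. The sketch below assumes a hypothetical item bank of labeled question IDs and a hypothetical blueprint of five items per category. It shows the idea, not a prescribed procedure.

```python
import random

def draw_items(item_bank, blueprint, seed=0):
    """Pick the blueprint's number of questions from each content category."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    selected = []
    for category, n_needed in blueprint.items():
        pool = item_bank[category]
        if n_needed > len(pool):
            raise ValueError(f"not enough items written for {category}")
        selected.extend(rng.sample(pool, n_needed))  # sample without replacement
    return selected

# Hypothetical item bank: 12 question IDs per content category.
categories = ["percentile", "stanine", "z score", "T score"]
item_bank = {c: [f"{c}-{i}" for i in range(1, 13)] for c in categories}

# Hypothetical blueprint: number of questions wanted per category.
blueprint = {c: 5 for c in categories}

test_form = draw_items(item_bank, blueprint)
print(len(test_form))
```

Sampling within each category, rather than from the whole bank at once, keeps any one category from being over- or under-represented by chance.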

_____________________________________________________________________
























_____________________________________________________________________

Below is an example of a table of specifications for a test when the instructional domain is test score conversions.

         CONTENT        knowledge   computation
        --------------------------------------
          percentile       4            1
          stanine          5            0
          z score          3            2
          T score          4            1
                        -------       ----
                          16            4

This table indicates that the 20 items on this test will be equally balanced between percentiles, stanines, z scores, and T scores. Knowledge of the scores is weighted more heavily than computation with the scores.
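The totals behind that statement can be checked with a short Python sketch. The cell counts are copied from the table above; the dictionary layout and variable names are assumptions made for the illustration.

```python
# Table of specifications from the text: items per content area and skill.
blueprint = {
    "percentile": {"knowledge": 4, "computation": 1},
    "stanine":    {"knowledge": 5, "computation": 0},
    "z score":    {"knowledge": 3, "computation": 2},
    "T score":    {"knowledge": 4, "computation": 1},
}

# Row totals: number of items for each content area.
row_totals = {content: sum(cells.values()) for content, cells in blueprint.items()}

# Column totals: number of items for each skill.
column_totals = {}
for cells in blueprint.values():
    for skill, count in cells.items():
        column_totals[skill] = column_totals.get(skill, 0) + count

total_items = sum(row_totals.values())
knowledge_share = column_totals["knowledge"] / total_items

print(row_totals)       # 5 items in every content area
print(column_totals)    # 16 knowledge items, 4 computation items
print(total_items, knowledge_share)
```

Each content area receives 5 of the 20 items, and knowledge items make up 16 of the 20, which matches the balance and weighting described above.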

_____________________________________________________________________
























_____________________________________________________________________

There is no "number" to indicate the extent to which a test is content valid. Content-related validity evidence is the evaluator's impression of the extent to which the 3-step process for development was followed.

Content validity, however, is not the same as FACE validity. Face validity is the initial impression about whether a test appears to measure what it claims to measure. Face validity can be important in the motivation of the test taker but does not involve the analysis used in content validation.

_____________________________________________________________________
























_____________________________________________________________________

REVIEW QUESTION

A test blueprint is associated with:

a. expectancy table

b. experience table

c. table of specifications

_____________________________________________________________________



























_____________________________________________________________________

YOU ARE CORRECT.

Of the choices (consistency, practicality, or relevance), relevance comes closest to the meaning of test validity.

The term "truthfulness" is also often associated with validity.

In either case we look for validity evidence to determine if the test score will be helpful (relevant) in the inference we want to make from it.

_____________________________________________________________________






















_____________________________________________________________________

No, the correct answer is "c".

Of the choices (consistency, practicality, or relevance), relevance comes closest to the meaning of test validity.

The term "truthfulness" is also often associated with validity.

In either case we look for validity evidence to determine if the test score will be helpful (relevant) in the inference we want to make from it.

_____________________________________________________________________
























_____________________________________________________________________

GOOD WORK!

A test blueprint is another name for the table of specifications used to construct a test and used to assess the content validity of the test.

For example, if you looked at the table of specifications for a final exam and discovered that almost all of the questions came from information presented in the first week of class, the content validity of that test would be suspect.

_____________________________________________________________________




















_____________________________________________________________________

WHOOPS!

A test blueprint is another name for the table of specifications used to construct a test and used to assess the content validity of the test.

For example, if you looked at the table of specifications for a final exam and discovered that almost all of the questions came from information presented in the first week of class, the content validity of that test would be suspect.

_____________________________________________________________________
