______________________________________________________
Our last topic in this study program may be the most challenging, and it is among the most important.
Appropriate use of assessment data requires that we recognize that our observations, whether from standard tests or other techniques, are unlikely to be perfectly "reliable".
_____________________________________________________________________
back to measurement primer menu
next screen
_____________________________________________________________________
Let's begin with a review of the basic four-step process in assessment.
1. STIMULUS elicits a
2. RESPONSE which is compared to a
3. REFERENCE which leads to an
4. INFERENCE
Inference issues are concerned with validity. Reference issues are concerned with test norms.
Reliability has to do with effects of the stimulus-response combination.
The simplest definition of reliability is that it refers to the CONSISTENCY of the assessment results.
_____________________________________________________________________
You make a score of 41 on Form A of a test.
What would your score have been if you had taken Form B instead?
What would your score have been if you had taken the test at a different time of the day?
You observe a student's behavior on the playground and conclude that the student is extremely aggressive.
Would a different observer have come to the same conclusion?
Would your conclusion have been the same if you had observed the student on the day before?
These are issues of RELIABILITY.
_____________________________________________________________________
Most of the examples in this program use assessment data from traditional tests (these are the easiest to quantify).
It is important to remember, though, that concern about the effect of occasion (when did the observation occur) and concern about the effect of the sample conditions (what is the source of the behavior sample) are just as real when the data come from sources other than responses to a typical test.
_____________________________________________________________________
Classical Reliability Theory
_____________________________________________________________________
The key concept in classical reliability theory, the TRUE SCORE, is the most difficult one.
Have you ever had the experience of studying so hard for a test that you knew EVERYTHING that could possibly be asked on the test: everything, it turns out, except for the questions which the instructor chose to use on that test?
The score you actually got on the test is the OBSERVED score. The score that would have better represented your mastery of the content is the TRUE score.
_____________________________________________________________________
In classical reliability theory, the TRUE SCORE is defined as the average score a person would get from taking an infinite number of equivalent forms of a test, assuming that the person was not affected by taking tests.
You take a test with 40 questions the instructor has chosen from the unit of material. Your score from these 40 questions is your OBSERVED score.
Imagine an unlimited number of 40-question tests drawn from that same unit of material. If you took all possible combinations of 40-question tests (Form A, Form B, Form C, etc.), added the scores together, and then divided by the number of tests you took, the answer would be your TRUE score.
Obviously that is not feasible, especially with the additional assumption that you didn't learn anything while taking the tests, didn't experience fatigue from taking so many tests, and so forth.
That, however, is the classical definition of a TRUE score.
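The averaging idea can be tried out in a small simulation. The sketch below is only an illustration under made-up assumptions: a pool of 1000 possible questions, a student who knows exactly 80 percent of them, and 5000 randomly drawn 40-question forms standing in for the "unlimited number" of forms. None of these numbers come from the primer.

```python
import random
from statistics import mean

# Hypothetical setup (assumed for illustration only).
POOL_SIZE = 1000        # questions that could be asked on the unit
KNOWN_FRACTION = 0.8    # share of the pool the student can answer
FORM_LENGTH = 40        # questions on each form
N_FORMS = 5000          # finite stand-in for "an infinite number" of forms

rng = random.Random(0)  # fixed seed so the run is repeatable

# True marks a question the student knows, False one the student does not.
n_known = int(POOL_SIZE * KNOWN_FRACTION)
pool = [True] * n_known + [False] * (POOL_SIZE - n_known)

# Each form is a random draw of 40 questions; the score is the number known.
scores = [sum(rng.sample(pool, FORM_LENGTH)) for _ in range(N_FORMS)]

observed = scores[0]          # the one form the student actually took
true_estimate = mean(scores)  # average over many equivalent forms

print("observed score on one form:", observed)
print("average over all simulated forms:", round(true_estimate, 2))
```

Individual form scores bounce around from draw to draw, while the average over many forms should settle near 32 (40 questions times 80 percent). The simulated student never learns or tires, which is exactly the assumption that cannot be met with a real person.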
_____________________________________________________________________
The true score is a HYPOTHETICAL CONSTRUCT. It could never actually exist, but we can use the idea, as if it could exist, to create some useful tools to estimate the reliability of our observations.
I don't actually know if there is electricity. I don't know what it looks like. But I can "act as if" it existed and create behavior patterns (like not sticking my finger in electrical outlets) which are certainly in my long-range best interests.
(That's not a great metaphor, let me try again. . . . )
_____________________________________________________________________
In educational psychology you learn that information is stored in the nervous system in the form of schemas, mental images or codes used to organize or structure the information.
A "schema" is also a hypothetical construct. We don't really know how information is physically stored in the brain. But, even before findings in neuroscience tell us what the storage procedure actually is, we can use the construct of the schema.
People appear to store information in the brain "as if" there were such things as schemas. We thus can use the hypothetical construct to design effective instructional strategies.
In the same way, we can "use" the hypothetical construct of true score to design techniques which allow the consideration of imperfect precision in our measuring procedures.
_____________________________________________________________________
In classical measurement theory the RELIABILITY of an observation is based on the relationship between true scores and observed scores.
Since we can never actually know the true score, it follows then that we can never actually know THE reliability of an assessment.
Instead, particularly for use with the results of typical tests, we have a number of techniques for estimating the reliability.
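A simulation can make the difference between THE reliability and an estimate of it concrete. The sketch below assumes the usual classical formulation, in which an observed score is a true score plus random error and reliability is the ratio of true-score variance to observed-score variance. The means, spreads, and sample size are invented for illustration, and correlating two parallel forms is shown as just one possible estimating technique.

```python
import random
from statistics import mean, pvariance

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical values (assumed for illustration only).
N_PEOPLE = 5000
TRUE_SD = 8.0    # spread of true scores across people
ERROR_SD = 4.0   # spread of random measurement error on any one form

# In a simulation we get to know the true scores; in practice we never do.
true_scores = [rng.gauss(30, TRUE_SD) for _ in range(N_PEOPLE)]

# Two parallel forms: same true score, independent random error.
form_a = [t + rng.gauss(0, ERROR_SD) for t in true_scores]
form_b = [t + rng.gauss(0, ERROR_SD) for t in true_scores]

# Needs the true scores, so it is not computable with real test data.
from_true_scores = pvariance(true_scores) / pvariance(form_a)

# Uses observed scores only, so it is available in practice.
estimated = pearson(form_a, form_b)

print("variance ratio using true scores:", round(from_true_scores, 3))
print("correlation between Form A and Form B:", round(estimated, 3))
```

With these assumed spreads the variance ratio works out to 64 / 80 = 0.8 in theory, and both printed values should fall close to that. Only the second one relies on information a test user could actually collect.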
_____________________________________________________________________
Remember, all of the techniques we have for calculating the reliability coefficients for a test are estimates. The "true" reliability remains unknown and unknowable.
Remember also, that estimates can be quite useful!
At the time I am writing this, I don't know the exact time of day. But the fact that I can estimate that it's about 2 pm provides needed information for decision making (in this case that it's time to stop writing and get ready for my next class).
_____________________________________________________________________
REVIEW QUESTION Reliability is closest in meaning to:
a. relevance
b. objectivity
c. consistency
_____________________________________________________________________
REVIEW QUESTION Which is a hypothetical construct?
a. observed score
b. true score
_____________________________________________________________________
ANSWER (reliability question): c. consistency
Of the choices (consistency, objectivity, or relevance), consistency comes closest to the meaning of test reliability.
_____________________________________________________________________
ANSWER (hypothetical construct question): b. true score
The true score is defined as the average score a person would get from taking an infinite number of equivalent forms of a test, assuming that the person was not affected by taking tests.
Since those conditions can't actually be met, it is a hypothetical construct.
_____________________________________________________________________