STATISTICAL ANALYSIS TOOLS
____________________________________________________________________
Time Series
Autocorrelation
Chi Square
ANOVA, t test and Mann-Whitney U
Binomial Expansion
Bayesian Analysis
Finally
Time Series
The most useful tool for statistical analysis in single-case design is probably a simplified form of time series analysis. This tool is flexible, makes few assumptions about the form of the data, and is particularly appropriate for the A-B design recommended earlier in this overview.
In essence, time series analysis is about whether there is an evident trend in the sequential measurements. Trend is evaluated by two features: slope (going up, going down, or flat) and magnitude (if there is change, how rapidly is it occurring). You will typically evaluate the slope visually using the time series graph. The magnitude of change is evaluated in this program using the C statistic.
For example, 5-4-5-5-4-5 would be identified as a horizontally stable or flat pattern. There is no evident trend. Differences in this series of numbers are likely to be only a result of random variation. On the other hand, a series of: 1-2-3-4-5-6 shows a clear trend. These differences indicate a nonrandom variation through the series.
In simplest form your goal is to have horizontal stability (only random variation) in the baseline data. Data from the treatment period are then appended to the baseline data and tested to determine if the horizontal stability continued or if a trend was now evident (nonrandom variation in the combined sequence).
Assume that you have gathered a series of baseline measurements, identifying them as variable 1, and gathered a series of measurements after treatment, identifying them as variable 2. With this time series program you can determine the likelihood that the:
1. baseline was stable before treatment began (random variation).
2. treatment measures have stabilized (random variation within the treatment).
3. treatment resulted in any significant change from the baseline pattern (nonrandom variation when treatment data are appended to baseline data).
The output of this time series program for various variable combinations is:
1. C statistic
2. z (C statistic divided by standard error)
3. p value
To illustrate, assume that the measurements identified as variable 1 in the sample data (loaded when you access the statistical analysis program) represent baseline measures of a student's feelings about being successful in attaining goals. Eight baseline measures are included.
Assume that variable 2 in the sample data represents measurements of the same perception gathered after the student begins participation in a weekly group guidance activity. There are eight variable 2 measures in the sample data.
If you analyze the sample data using the time series program, the results indicate that:
1. variable 1 has horizontal stability
2. variable 2 has horizontal stability
3. there is an evident trend when variable 1 and variable 2 are combined (which is the desired outcome)
In these sample data there is thus a statistically significant change associated in time with the variable identified here as the treatment variable. As discussed in the section on research design, the data in this hypothetical example do not alone address whether the treatment was the cause of the change. This analysis also does not indicate whether the change was positive or negative. Questions about cause require other designs. The direction of change, however, is evident in visual inspection of the data and can also be identified more specifically by clicking the programs which provide the mean and the median for the variables.
A horizontally stable baseline is a desired characteristic in single-case research. The horizontal stability in the treatment variable can also be significant for the practitioner, providing some information in decisions about whether treatment can appropriately be terminated.
This simplified time series is a very flexible program. It is the method of choice for most A-B analysis, and the general concept can easily be applied with data obtained from a multiple-baseline design. For example, you could enter the data for one student (problem or setting) and complete the analysis. Then for the other student (or other problem or other setting) you would enter the data with the extended baseline first to test whether changes occurred before treatment was initiated and then to determine if there was evident trend after treatment began.
A primary advantage of the C statistic for time series analysis is the significant reduction in the number of data points required. While not a substitute for the more complex time series techniques, neither does it require the collection of 50-100 data points per phase (variable) before analysis. In fact the C statistic appears to have little loss in power to detect a trend with as few as eight data points (measurements) per variable. You can of course compute the statistic with even fewer measures, but, unless clearly evident, a trend may not be detected.
Formulae used in the time series statistical analysis program are available in Suen and Ary (1989), Tripodi (1994), and Tryon (1982). Original work on the C statistic was done by Young (1941). Additional detail about the C statistic is available in these sources cited in the references.
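The calculation can be sketched in a few lines of Python. This is a minimal illustration, assuming the form of the C statistic and its standard error as they are commonly presented in the time series literature, and assuming a one-tailed p value taken from the normal distribution. The function name c_statistic is hypothetical, and the program itself may differ in detail.

```python
import math

def c_statistic(series):
    """Return (C, z, p) for a list of repeated measurements."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three measurements")
    mean = sum(series) / n
    # Sum of squared deviations from the series mean.
    ss_total = sum((x - mean) ** 2 for x in series)
    if ss_total == 0:
        raise ValueError("C is undefined when all measurements are identical")
    # Sum of squared differences between successive measurements.
    ss_successive = sum((a - b) ** 2 for a, b in zip(series, series[1:]))
    c = 1 - ss_successive / (2 * ss_total)
    # Standard error of C (assumed form, see lead-in).
    se = math.sqrt((n - 2) / ((n - 1) * (n + 1)))
    z = c / se
    # One-tailed p from the standard normal distribution (an assumption).
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return c, z, p

flat = [5, 4, 5, 5, 4, 5]    # horizontally stable series from the text
trend = [1, 2, 3, 4, 5, 6]   # clear trend series from the text
print(c_statistic(flat))
print(c_statistic(trend))
```

For an A-B design the same function would be called three times: on the baseline list, on the treatment list, and on the baseline list with the treatment list appended (baseline + treatment).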
Autocorrelation
The autocorrelation function is closely related to the C-statistic described above. Both are intended to assess the serial dependency of time series data. When applied to the same set of data, the results are typically comparable. If the C-statistic suggests that a data set is horizontally stable, the autocorrelation r will not be significant at the .05 level.
In essence, autocorrelation is a form of Pearson's product moment correlation coefficient. It is both a simple and a complex analysis tool. In simplest form the autocorrelation analysis is a correlation of each measurement with the measurement immediately following it in the series. This form is a lag-1 correlation, and it is the form most often used in single-case studies. (Lag-2 correlates each measurement with the one which is two ahead, and so forth.)
To illustrate, assume that the repeated measures in one variable in a single-case study were 10-9-8-7-6-5-4-3. A lag-1 autocorrelation in effect creates two variables: 10-9-8-7-6-5-4 and 9-8-7-6-5-4-3 and then correlates the two simulated variables. Notice that if there are eight measures in the original variable, the lag-1 autocorrelation is between simulated variables with an n of seven.
There is, however, complexity with this tool. Some texts recommend calculation using the standard Pearson product-moment formula. Others suggest a formula specifically designed for use in time series analysis. The standard Pearson formula uses the mean of each simulated variable in the calculation. The special formula uses a common mean based on the original variable. Results of the calculations are similar but not identical.
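The difference between the two formulas can be seen in a short Python sketch. It assumes the common-mean version divides the sum of lagged cross-products by the sum of squared deviations of the full series, which is a widely used form. The function names are hypothetical and the code is an illustration only, not the program's own calculation.

```python
import math

def lag1_pearson(series):
    """Standard Pearson r between the series and itself shifted by one."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n   # a separate mean for each simulated variable
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def lag1_common_mean(series):
    """Lag-1 autocorrelation using one mean based on the original series."""
    mean = sum(series) / len(series)
    numerator = sum((a - mean) * (b - mean) for a, b in zip(series, series[1:]))
    denominator = sum((a - mean) ** 2 for a in series)
    return numerator / denominator

measures = [10, 9, 8, 7, 6, 5, 4, 3]   # the eight measures from the illustration
print(lag1_pearson(measures))
print(lag1_common_mean(measures))
```

With a series this short the two values can differ noticeably, because the common-mean formula sums only seven cross-products but divides by the squared deviations of all eight measures.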
The process is further confounded when determining whether the obtained autocorrelation r is statistically significant. There is a major problem in use of the special autocorrelation formula when the number of observations within a variable or phase is relatively small. For example, if there are fewer than twelve measurements, the autocorrelation r has to be perfect (1.0) in order to be judged as statistically significant using the recommended estimation of standard error.
The problem of determining statistical significance of the autocorrelation r is, unfortunately, not solved by using the Pearson formula. The usual criteria for evaluating a Pearson r assume a sampling distribution not typically evident in the sequential measures. As a result, the standard tables typically underestimate the statistical significance level of an autocorrelation r. And mathematical corrections to the standard tables do not work effectively when the number of measurements is small.
After weighing the pros and cons of the two calculation formulae, the decision was to use the more familiar Pearson coefficient in this program. Unless you have an especially large number of measurements within a variable, you will want to be aware that the reported significance level may be overly rigorous.
Tests of serial dependency using the C-statistic and the autocorrelation function will generally result in the same identification of significance level. However, the intended use of the two is quite different, so both are included in this statistical analysis program.
In the time series analysis with the C-statistic, the general concept was an analysis of each variable separately and then an analysis of the outcome if one variable was appended to the next (e.g. baseline, treatment, and then baseline + treatment). In contrast, the autocorrelation function is calculated only within each of the variables.
In single-case statistical analysis, assessing the extent of autocorrelation is only the first step. If you are not using the C-statistic time series for data analysis, the autocorrelation is calculated to determine what is appropriately available as an alternative. For example, if the autocorrelation is not significantly higher than zero (remember the caution above), you can proceed with a traditional test of the significance of difference (e.g. t test, ANOVA, or nonparametric Mann-Whitney U). If the autocorrelation is significant, the serial dependency in the data will likely overestimate the outcome of such analysis, causing you to infer a difference when the null hypothesis should be retained.
You will probably find the most use for the combined autocorrelation and t test, ANOVA, or nonparametric test when you have used an alternating treatments design. To illustrate, we can amplify the example used in description of the time series procedure. In that example, variable 1 in the sample data set was identified as the repeated measures in the baseline; the treatment identified as variable 2 was participation in a group guidance activity. The sample data set loaded with the analysis program also includes measurements identified as variable 3. Assume that this represents another form of treatment, perhaps individual counseling sessions, which was randomly alternated with the group guidance activity.
With these sample data the autocorrelation r's were not significantly greater than zero (.05 level) within any of the three variables. For the hypothetical illustration, there was thus no evidence of significant serial dependency in the repeated measures in the baseline data, in the group guidance data, or in the individual counseling session data. This indicates that you can proceed to test the significance of difference between these variables using either the nonparametric Mann-Whitney U or the more powerful t test or ANOVA.
Chi Square
A special application of the familiar chi square tool for single-case statistical analysis identifies "desired" and "undesired" outcomes using a celeration line for a 2 x 2 analysis. Computational details and rationale for this application are available in Bloom, Fischer, and Orme (2003) cited in the references section.
In essence the process begins with identification of a line indicating whether there is evident increase (acceleration) or decrease (deceleration) in the baseline data. The line is then extended through the treatment data. The two columns in the first row for the chi square analysis are the number of points in the baseline data that are at or above (usually the desired zone) and the number of points that are below (usually the undesired zone) the calculated celeration line. Comparable data for the treatment phase provide the second row for the 2 x 2 analysis.
If your data include at least eight baseline observations (identified in this program as variable 1) and an intervention phase (identified as variable 2), the celeration line will be automatically calculated. The points above and below the celeration line will appear for calculation of the chi square statistic.
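Once the four counts are available, the chi square itself is a short calculation. The Python sketch below uses the standard 2 x 2 formula without a continuity correction. Whether the program applies a correction is not stated here, so treat that as an assumption. The counts are hypothetical, and the celeration line itself is not computed.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Chi square for a 2 x 2 table.

    Row 1 (baseline):  a points at or above the line, b points below.
    Row 2 (treatment): c points at or above the line, d points below.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column needs at least one point")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # With one degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Hypothetical counts: baseline split 4 and 4, treatment 8 and 0.
chi2, p = chi_square_2x2(4, 4, 8, 0)
print(chi2, p)
```

With counts this small, some texts recommend a continuity correction or an exact test instead. The sketch is meant only to show how the four cells feed the statistic.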
ANOVA, t test, and Mann-Whitney U: Testing Significance of Difference
In single-case statistical analysis, these tests will generally be applicable only when you have used an alternating treatments design. Continuing the illustration above, since the autocorrelation r's were not significant, you can proceed with t-test, analysis of variance (ANOVA), or the nonparametric Mann-Whitney U. Information about how to use these tests is in Sheskin (1997) cited in the references. These procedures are also described in most basic statistics textbooks.
Your decision about which to use may be influenced by which form of risk you wish to take. In many instances the quality and form of the data would suggest that you choose the more conservative approach with the nonparametric Mann-Whitney U analysis. The disadvantage is that nonparametric tests have less "power" to detect a significant difference. You risk failing to reject a null hypothesis when the differences between variables are in fact unlikely to have occurred by chance alone.
You can instead elect to "go for the power" and test for the significance of difference between means using the standard t test (independent measures, no assumption of equal n's) or ANOVA. With that decision, the risk instead is that you will reject the null when the difference was actually the result of peculiarities in the data rather than differences in the treatment.
(Your decision about which to use may be a projective personality assessment.)
The sample data illustrate the power difference. The nonparametric Mann-Whitney U analysis finds a significant difference between baseline (variable 1) and group guidance (variable 2) and between baseline and individual counseling (variable 3). The difference between group guidance and individual counseling is not statistically significant.
Both the t test and an ANOVA with these same data again find both group guidance and individual counseling significantly different from the baseline. But with either of the more powerful tests the difference between group guidance and individual counseling is also statistically significant.
Binomial Expansion
The binomial expansion is a useful tool for analysis of behavioral data. Given x number of events, what is the probability of getting y number of successes by chance? That is the question answered by the binomial expansion.
The t-test and Mann-Whitney U test described above are concerned with "how much" difference exists between the variables. In contrast, the binomial expansion is focused on the consistency of whatever differences may occur.
To calculate the binomial, you need only count the number of events and the number of those events which you would define as successful. These two numbers are then used to calculate the likelihood that this number of successes could have occurred by chance alone. The binomial expansion can also be used in concert with the other analysis procedures to address the generalizability or external validity issue in your research.
For example, assume you have a particular favorite treatment protocol. You recommend and/or implement this treatment with a single student, analyze the data with the time series program, and it appears to have made a difference (something made a difference). You do the same thing with another student, and then another, and then another. Of the total number of students with whom this treatment has been used, how many of the results indicated a statistically significant change and how many did not? These data provide the basis for the binomial.
For example, assume you have tried this same treatment with six students. Five of the time series analyses indicated a significant change; one did not. The probability of obtaining five or more successes in six trials by chance alone (with a chance success rate of .5 on each trial) is .109.
Note that the outcome of the binomial analysis is a direct probability of occurrence. Thus, the typical standard of .05 or .01 for statistical significance does not necessarily apply.
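The .109 in the six-student example matches the probability of five or more successes in six trials when each trial has a .5 chance of success. A minimal Python sketch of that calculation follows. The function name binomial_tail and the default chance value of .5 are assumptions made for illustration.

```python
from math import comb

def binomial_tail(successes, trials, chance=0.5):
    """Probability of at least `successes` successes in `trials` trials."""
    return sum(
        comb(trials, k) * chance ** k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Five or more significant results among six students.
print(round(binomial_tail(5, 6), 3))
```

Exactly five successes in six trials has a probability of 6/64, and six successes adds 1/64, giving 7/64 for the combined tail.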
Bayesian Analysis
This is a special application of the theorem first proposed by an English clergyman, the Reverend Thomas Bayes, in the 18th century. Information about the rationale for this technique is in Phillips (1973) cited in the references.
Bayesian analysis rests on a premise that the probability of a particular outcome is based in part on the fact that some other outcome has already occurred. It provides a guide for changing existing beliefs when there is new evidence. The objective in this application, similar to that described above for the binomial analysis, is as a tool for synthesizing results of replicated single-case studies. The data for the analysis are the p values obtained in the time series analysis.
Suppose, for example, that you conduct a study with one student and find an outcome of .051 (p value) that a trend was evident when intervention data were added to the baseline observations. Even though this is very close to the criterion for rejecting the null hypothesis of no change, it is greater than .05, and you are expected to fail to reject the null hypothesis. (Some researchers are tempted to report that a finding this close to the .05 criterion "approached" statistical significance, verbiage that can cause heart palpitations in conservative statisticians.)
You then conduct a second study of the same intervention with another student. The p value result for the second study is .119. In the traditional model, you've now tried this approach with two students, and in both cases the outcome was not statistically significant.
Bayesian analysis of those same data, however, suggests a different interpretation. The .051 outcome serves as the prior belief. Given that prior information, a finding of .119 in a second study results in an updated belief of .007 when the results of the two studies are combined. The .007 then becomes the prior belief for use with data in another replication. The correct verbiage for the combined p values is that the chances are 7 in 1000 of finding this amount of variation in the data when treatment is added to baseline if the variation is only random.
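The formula behind the combined value is not given here. One simple two-hypothesis form of Bayes' theorem does reproduce the .007 figure: treat each p value as the probability of "random variation only" and its complement as the probability of the alternative. The Python sketch below rests on that assumption, and the function name update_belief is hypothetical. The program's actual calculation may differ.

```python
def update_belief(prior, p_value):
    """Combine a prior belief with a new p value (assumed two-hypothesis form)."""
    random_only = prior * p_value                 # both results reflect random variation
    real_change = (1 - prior) * (1 - p_value)     # both results reflect nonrandom variation
    return random_only / (random_only + real_change)

belief = update_belief(0.051, 0.119)
print(round(belief, 3))

# The result would then serve as the prior for a further replication, e.g.:
# belief = update_belief(belief, next_p_value)
```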
Finally
Each of the analysis techniques described above has some advantages and some limitations. Before concluding, however, it is important to emphasize that statistical analysis in single-case research is just a tool. The premise here has been that visual analysis of data alone is seldom sufficient. In essentially all cases, your conclusions will be stronger if you do a statistical analysis of your data. But, in most if not all cases, you will be best served if your overall work with the data includes both visual and statistical analysis. For example, graphing the data during the process of the study may help you identify a trend which suggests a needed change in your overall design. Single-case research allows you to make a change without necessarily losing the data already obtained.
Remember that statistical analysis is designed to identify whether a change is statistically significant and does not address whether the change is clinically important. In this form of analysis (and the same generally holds true for the group designs as well), if you gather enough data points, you can probably find a statistically significant outcome for almost anything. Common sense is still the most powerful analysis technique.
Unfortunately, common sense can at times lead one in the wrong direction. The use of the tools described and available in this program can help prevent jumping to erroneous conclusions about the effectiveness of some treatment and thus improve the overall quality of services we provide.