The following slides accompanied a presentation delivered at ASHA’s Clinical Practice Research Institute.
Data speak, not men…
“Designs have inherent rigor but not all studies using a design are rigorous” — Randy, yesterday
“Illusion of strong evidence…” — Gilbert, McPeek, & Mosteller, 1977
Effect of Interpretive Bias on Research Evidence (Kaptchuk, 2003)
- “Good science inevitably embodies a tension between the empiricism of concrete data and the rationalism of deeply held convictions.”
- “…a view that science is totally objective is mythical and ignores the human element.”
Single-Subject Designs: Introduction
- Single-subject experimental designs are among the most prevalent designs used in SLP treatment research (Kearns & Thompson, 1991; Thompson, 2006; Schlosser et al., 2004)
- Well-designed single-subject studies are now commonly published in our journals as well as in interdisciplinary specialty journals (psychology, neuropsychology, education, PT, OT…)
- Agencies, including NIH, NIDRR, etc., commonly fund conceptually salient and well-designed single-subject treatment programs (aphasia, AAC, autism…)
- Meta-analyses have been employed to examine the overall impact of single-subject studies on the efficacy and efficiency of interventions (Robey et al., 1999)
- Quality indicators for single-subject designs appear to be less well understood than for group designs (Kratochwill & Stoiber, 2002; APA Div. 12; Horner, Carr, Halle, et al, 2005).
- Common threats to internal and external validity persist in our literature despite readily available solutions (Schlosser, 2004; Thompson, 2006)
Purpose of this presentation:
- Brief introduction to single-subject designs
- Identify elements of single-subject designs that contribute to problems with internal validity/experimental control, from a reviewer’s perspective
- Discuss solutions for some of these issues; ultimately necessary for publication and external funding
Common Single-Subject Design Strategies
Obligatory Introduction
- Brief introduction to single subject designs
- Single-subject designs are experimental, not observational.
- Subjects “serve as their own controls”; receive both treatment and no-treatment conditions
- Juxtaposition of Baseline (A) phases with Treatment (B) phases provides mechanism for experimental control (internal validity)
- Control is based on within and across subject replication
Multiple Baseline: Across Behaviors
Treatment vs No-treatment comparisons
- Examine efficacy of treatment relative to no treatment
- Multiple baselines/variants; withdrawal/reversals
Component Assessment
- Relative contribution of treatment components
- Interaction Designs (variant of reversals)
Successive Level Analysis
- Examine successive levels of treatment
- Multiple Probe; Changing Criterion
Treatment – Treatment Comparisons
- Alternating Treatments (mixed with multiple baseline)
ABAB Withdrawal Design
ATD-MB comparison: Broca’s aphasia
Internal Validity
- Operational specificity; reliability of the independent and dependent variables (IV, DV); treatment integrity; appropriate design
- Artifact, Bias
- Visual analysis of ‘control’: loss of baseline (unstable or drifting trend); within- and across-phase changes in level, slope, and trend
- Replicated treatment effects: three demonstrations of the effect at three points in time
Visual-Graphic Analysis
- Within- and across-phase analysis of:
- Level (value on the ordinate, e.g., % correct)
- Slope (stable, increasing, decreasing)
- Trend over time (variable; changes with phases; overlapping)
- Overlap, immediacy of effect, similarity of effect for similar phases
- Correlation of change with phase change (a computational sketch follows below)
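These visual-analysis dimensions can also be summarized numerically alongside the graphed data. The sketch below is a minimal, hypothetical illustration (the data and function names are invented, not taken from the presentation) of computing phase level, slope, and a simple percentage-of-non-overlapping-data index for one baseline-to-treatment (A-B) comparison; it supplements, rather than replaces, visual inspection.

```python
# Minimal sketch: summarizing level, trend, and overlap for an A-B comparison.
# Data and names are hypothetical; a real analysis would cover all phases and
# be paired with visual inspection of the graphed series.
import numpy as np

def phase_summary(scores):
    """Return level (mean), slope (per-session trend), and range for one phase."""
    x = np.arange(len(scores))
    slope, intercept = np.polyfit(x, scores, 1)  # least-squares trend line
    return {"level": np.mean(scores), "slope": slope, "range": (min(scores), max(scores))}

def percent_nonoverlap(baseline, treatment):
    """Share of treatment points exceeding the highest baseline point (PND-style index)."""
    ceiling = max(baseline)
    return 100 * sum(s > ceiling for s in treatment) / len(treatment)

baseline = [20, 25, 22, 24, 23, 21, 26]    # hypothetical % correct, A phase
treatment = [30, 42, 55, 61, 70, 74, 78]   # hypothetical % correct, B phase

print(phase_summary(baseline))
print(phase_summary(treatment))
print(f"Non-overlapping treatment points: {percent_nonoverlap(baseline, treatment):.0f}%")
```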
Research on Visual Inspection of Single-Subject Data (Franklin et al, 1996; Robey et al, 1999)
- Low level of inter-rater agreement: DeProspero & Cohen (1979) reported R = .61 among behavioral journal reviewers
- Reliability and validity of visual inspection can be improved with training (Hagopian et al, 1997)
- Visual aids (trend lines) may have produced only a modest increase in reliability
- Traditional statistical analyses (e.g., the binomial test) are highly affected by serial dependence (Crosbie, 1993)
Serial Dependence/Autocorrelation
- The level of behavior at one point in time is influenced by or correlated with the level of behavior at another point in time
- Autocorrelation biases interpretation and leads to Type I errors (falsely concluding a treatment effect exists; positive autocorrelation) and Type II errors (falsely concluding no treatment effect; negative autocorrelation)
- Violates the independence assumption underlying conventional statistical tests (see the sketch below)
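As a rough check on the independence assumption, lag-1 autocorrelation can be estimated from the session-by-session data. The sketch below is a minimal illustration with invented probe scores; small-sample autocorrelation estimates are themselves unstable and should be interpreted cautiously.

```python
# Minimal sketch: estimating lag-1 autocorrelation in a short single-case series.
# Data are hypothetical; a substantial positive value warns that session-to-session
# scores are not independent, so conventional tests will overstate significance.
import numpy as np

def lag1_autocorrelation(series):
    """Correlation between the series and itself shifted by one observation."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    return float(np.sum(x[:-1] * x[1:]) / np.sum(x * x))

probe_scores = [22, 24, 27, 26, 30, 33, 35, 34, 38, 41]  # hypothetical session data
print(f"Lag-1 autocorrelation: {lag1_autocorrelation(probe_scores):.2f}")
```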
Solutions:
- ITSACORR: A statistical procedure that controls for autocorrelation (Crosbie, 1993)
- Visual Inspection and Structured Criteria (Fisher, Kelley & Lomas, 2003; JABA)
- SMA bootstrapping approach (Borckardt et al., 2008; American Psychologist) (see the simulation sketch below)
- clinicalresearcher.org
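One way to see the logic behind simulation-based solutions such as SMA is sketched below: generate many no-treatment-effect series that share the observed lag-1 autocorrelation, and ask how often a phase-mean difference as large as the observed one arises by chance. The data, the AR(1) model, and the parameter values are hypothetical illustrations, not the published SMA algorithm or the ITSACORR procedure.

```python
# Hedged sketch in the spirit of simulation-based approaches (e.g., Borckardt et al., 2008):
# simulate autocorrelated series with no treatment effect and compare the observed
# phase-mean difference to the distribution of simulated differences.
import numpy as np

rng = np.random.default_rng(0)

def simulated_p_value(baseline, treatment, ar1, n_sims=5000):
    data = np.concatenate([baseline, treatment])
    observed = np.mean(treatment) - np.mean(baseline)
    n, split = len(data), len(baseline)
    count = 0
    for _ in range(n_sims):
        noise = rng.normal(0, np.std(data), n)
        sim = np.empty(n)
        sim[0] = noise[0]
        for t in range(1, n):                       # AR(1) series with no treatment effect
            sim[t] = ar1 * sim[t - 1] + noise[t]
        diff = np.mean(sim[split:]) - np.mean(sim[:split])
        count += abs(diff) >= abs(observed)
    return count / n_sims

baseline = [20, 25, 22, 24, 23, 21, 26]    # hypothetical A-phase data
treatment = [30, 42, 55, 61, 70, 74, 78]   # hypothetical B-phase data
print(f"Simulated p-value: {simulated_p_value(baseline, treatment, ar1=0.3):.3f}")
```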
Baseline Measures
- Randomize order of stimulus sets/conditions
- All treatment stimuli need to be assessed in baseline
- Establish equivalence for subsets of stimuli used as representative
- Avoid false baselines
- A priori stability decisions greatly reduce bias (a sketch of one such rule follows below)
- At least 7 baseline probes may be needed for reliable and valid visual analysis
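A hypothetical example of an a priori stability rule is sketched below; the specific window, band, and slope criteria are invented for illustration and are not recommended standards.

```python
# Minimal sketch of an a priori baseline-stability rule, decided before the study begins.
# The criteria here (last 5 probes within +/-10 percentage points of their mean,
# trend slope under 2 points per probe) are hypothetical examples, not a standard.
import numpy as np

def baseline_is_stable(probes, window=5, band=10.0, max_slope=2.0):
    if len(probes) < window:
        return False                      # not enough probes to judge stability yet
    recent = np.asarray(probes[-window:], dtype=float)
    within_band = np.all(np.abs(recent - recent.mean()) <= band)
    slope = np.polyfit(np.arange(window), recent, 1)[0]
    return bool(within_band and abs(slope) <= max_slope)

print(baseline_is_stable([20, 35, 22, 24, 23, 21, 26]))   # hypothetical probe data
```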
S1: ITSACORR results were non-significant
S2: ITSACORR results were significant (p < .05)
Too few data points for valid analysis
Intervention
- Explicit steps and directions… a manual
- Control for order effects
- Assess integrity of intervention (see Schlosser, 2004)
- One variable rule
- Is treatment intensity sufficient? Is it typical?
- Dual criteria for termination of treatment (a sketch follows this list):
- Performance level (e.g., % correct)
- Maximum allowable length of treatment (but not equal phases)
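A minimal sketch of such a dual-criterion termination rule appears below; the 90% performance criterion and 20-session maximum are hypothetical values chosen only for illustration.

```python
# Minimal sketch of a dual-criterion rule for ending a treatment phase, specified a priori.
# Thresholds (90% correct on two consecutive probes, or a 20-session maximum) are
# hypothetical illustrations, not recommended values.
def should_terminate(probe_history, criterion=90.0, consecutive=2, max_sessions=20):
    met_level = (len(probe_history) >= consecutive and
                 all(score >= criterion for score in probe_history[-consecutive:]))
    hit_ceiling = len(probe_history) >= max_sessions
    return met_level or hit_ceiling

print(should_terminate([55, 70, 85, 92, 94]))   # True: performance criterion met
print(should_terminate([40, 45, 50]))           # False: keep treating
```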
Dependent Measures
- Use multiple measures
- Try not to collect probe data during treatment sessions
- Probe often (weekly or more)
- Pre-train assistants on the scoring code and periodically check for ‘drift’
- Are definitions specific, observable and replicable?
Reliability
- Reliability for both intervention and dependent variable
- Obtain for each phase of the study and adequately sample
- Control for sources of bias including drift and expectancy (ABCs — artifact, bias, and complexity)
- Use point to point reliability when possible
- Calculate probability of chance agreement; critical for periods of high or low responding
- Occurrence and nonoccurrence reliability (see the sketch below)
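The sketch below illustrates point-to-point agreement, occurrence and nonoccurrence agreement, and a chance-corrected index (Cohen’s kappa) on invented trial-by-trial scores; the observer data are hypothetical.

```python
# Minimal sketch of point-to-point inter-observer agreement, with occurrence and
# nonoccurrence agreement and a chance-corrected index (Cohen's kappa).
# Trial-by-trial scores are hypothetical (1 = behavior scored as occurring, 0 = not).
def agreement_summary(obs1, obs2):
    pairs = list(zip(obs1, obs2))
    total = sum(a == b for a, b in pairs) / len(pairs)
    occ = [(a, b) for a, b in pairs if a == 1 or b == 1]
    nonocc = [(a, b) for a, b in pairs if a == 0 or b == 0]
    occurrence = sum(a == b for a, b in occ) / len(occ) if occ else None
    nonoccurrence = sum(a == b for a, b in nonocc) / len(nonocc) if nonocc else None
    # Chance agreement expected from each observer's base rates, then Cohen's kappa.
    p1, p2 = sum(obs1) / len(obs1), sum(obs2) / len(obs2)
    chance = p1 * p2 + (1 - p1) * (1 - p2)
    kappa = (total - chance) / (1 - chance) if chance < 1 else None
    return {"point_to_point": total, "occurrence": occurrence,
            "nonoccurrence": nonoccurrence, "chance": chance, "kappa": kappa}

obs1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]   # hypothetical primary observer
obs2 = [1, 1, 0, 0, 0, 1, 1, 0, 1, 1]   # hypothetical reliability observer
print(agreement_summary(obs1, obs2))
```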
A priori Decisions
Failure to establish, and make explicit, criteria for guiding procedural and methodological decisions before changes are made is a serious threat to internal validity that is difficult to remedy after the fact.
- Participant selection/exclusion criteria (report attrition)
- Baseline variability, length
- Phase changes
- Clinical significance
- Generalization
Consider Clinically Meaningful Change
Clinical significance cannot be assumed from our perspective alone.
Change in level of performance on any outcome measure, even when effects are large and visually obvious or statistically significant, is an insufficient metric of the impact of experimental treatment on our participants/patients.
Minimal Clinically Important Difference (MCID): “the smallest difference in a score that is considered worthwhile or important” (Hays & Woolley, 2000)
Responsiveness of Health Measures (Husted et al., 2000)
1. Distribution-based approaches examine internal responsiveness, using the distribution/variability of initial (baseline) scores to scale differences (e.g., effect size).
2. Anchor-based approaches examine external responsiveness by comparing change detected by a dependent measure with an external criterion, for example, a level of change that meets the “minimal clinically important difference” (MCID).
Anchor-based responsiveness measures (see Beninato et al., Archives of PMR, 2006) use an external criterion as the “anchor”
- Compare the change score on an outcome measure to some other estimate of important change (a worked sketch follows this list)
- Patient/family estimates
- Clinician’s estimates
- Necessary to complete the EBP triangle?
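A worked, hypothetical contrast of the two responsiveness perspectives appears below: the effect size is distribution-based (change scaled by baseline variability), while the MCID comparison is anchor-based (change judged against an assumed external criterion). All scores and the assumed MCID value are invented for illustration.

```python
# Minimal sketch contrasting distribution-based and anchor-based responsiveness.
# Numbers are hypothetical: the effect size uses baseline variability, while the
# anchor-based judgment compares the change score to an assumed MCID.
import statistics

baseline_scores = [42, 45, 40, 44, 43, 41]   # hypothetical pre-treatment outcome scores
post_score = 58                              # hypothetical post-treatment score
assumed_mcid = 10                            # assumed minimal clinically important difference

change = post_score - statistics.mean(baseline_scores)
effect_size = change / statistics.stdev(baseline_scores)   # distribution-based (internal)
print(f"Change = {change:.1f}, effect size = {effect_size:.1f}")
print(f"Exceeds assumed MCID of {assumed_mcid}? {change >= assumed_mcid}")  # anchor-based (external)
```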
Revisiting Clinically Important Change (Social Validation)
When the perceived change is important to the patient, clinician, researcher, payor or society (Beaton et al., 2001)
Requires that we extend our conceptual frame of reference beyond typical outcome measures and distribution based measures of responsiveness
“Time will tell” — (M. Planck, 1950)
“A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die.” — (Kaptchuk, 2003)
References
Beaton, D. E., Bombardier, C., Katz, J. N., Wright, J. G., Wells, G., Boers, M., Strand, V., & Shea, B. (2001). Looking for important change/differences in studies of responsiveness. The Journal of Rheumatology, 28(2), 400–405.
Beninato, M., Gill-Body, K. M., Salles, S., Stark, P. C., Black-Schaffer, R. M., & Stein, J. (2006). Determination of the minimal clinically important difference in the FIM instrument in patients with stroke. Archives of Physical Medicine and Rehabilitation, 87(1), 32–39.
Borckardt, J. J., Nash, M. R., Murphy, M. D., Moore, M., Shaw, D., & O’Neil, P. (2008). Clinical practice as natural laboratory for psychotherapy research: A guide to case-based time-series analysis. American Psychologist, 63(2), 77.
Crosbie, J. (1993). Interrupted time-series analysis with brief single-subject data. Journal of Consulting and Clinical Psychology, 61(6), 966.
DeProspero, A., & Cohen, S. (1979). Inconsistent visual analyses of intrasubject data. Journal of Applied Behavior Analysis, 12(4), 573–579.
Fisher, W. W., Kelley, M. E., & Lomas, J. E. (2003). Visual aids and structured criteria for improving visual inspection and interpretation of single-case designs. Journal of Applied Behavior Analysis, 36(3), 387–406.
Franklin, R. D., Gorman, B. S., Beasley, T. M., & Allison, D. B. (1996). Graphical display and visual analysis. In Design and Analysis of Single-Case Research (pp. 119–158). Lawrence Erlbaum Associates.
Gilbert, J. P., McPeek, B., & Mosteller, F. (1977). Statistics and ethics in surgery and anesthesia. Science, 198, 684–689.
Hagopian, L. P., Fisher, W. W., Thompson, R. H., Owen-DeSchryver, J., Iwata, B. A., & Wacker, D. P. (1997). Toward the development of structured criteria for interpretation of functional analysis data. Journal of Applied Behavior Analysis, 30(2), 313–326.
Hays, R. D., & Woolley, J. M. (2000). The concept of clinically meaningful difference in health-related quality-of-life research. Pharmacoeconomics, 18(5), 419–423.
Horner, R. H., Carr, E. G., Halle, J., McGee, G., Odom, S., & Wolery, M. (2005). The use of single-subject research to identify evidence-based practice in special education. Exceptional Children, 71(2), 165–179.
Husted, J. A., Cook, R. J., Farewell, V. T., & Gladman, D. D. (2000). Methods for assessing responsiveness: A critical review and recommendations. Journal of Clinical Epidemiology, 53(5), 459–468.
Kaptchuk, T. J. (2003). Effect of interpretive bias on research evidence. British Medical Journal, 326(7404), 1453.
Kearns, K. P., & Thompson, C. K. (1991). Technical drift and conceptual myopia: The Merlin effect. Clinical Aphasiology, 19, 31–40.
Kratochwill, T. R., & Stoiber, K. C. (2002). Evidence-based interventions in school psychology: Conceptual foundations of the Procedural and Coding Manual of Division 16 and the Society for the Study of School Psychology Task Force. School Psychology Quarterly, 17(4), 341.
Robey, R. R., Schultz, M. C., Crawford, A. B., & Sinner, C. A. (1999). Single-subject clinical-outcome research: Designs, data, effect sizes, and analyses. Aphasiology, 13(6), 445–473.
Thompson, C. K. (2006). Single subject controlled experiments in aphasia: The science and the state of the science. Journal of Communication Disorders, 39(4), 266–291.
Thompson, C. K., Kearns, K. P., & Edmonds, L. A. (2006). An experimental analysis of acquisition, generalisation, and maintenance of naming behaviour in a patient with anomia. Aphasiology, 20(12), 1226–1244.