Lessons Learned About High Fidelity LEAP Replication

Questions to ask, strategies for identifying and measuring fidelity of key components, and considerations in setting a research agenda for scaling up and replicating an intervention.

Phillip Strain

DOI: 10.1044/cred-pvid-implscid2p5

The following is a transcript of the presentation video, edited for clarity.


Since 1985 my colleagues and I have been in the business of replicating the LEAP model of intervention for young kids with autism. We’ve done that in about 100 preschool settings across about 20 states. And I want to share some of that longitudinal experience, as well as some narrower experience related to a specific study we did a couple of years ago.


I’ve advertised some lessons learned. Are you ready for the first one? Here it is: “If you’re planning on getting your hands dirty at any point in time, you might as well go ahead and jump in the mud pie.”


That’s kind of, historically speaking, what we did with LEAP. So LEAP was built on a series of about 36 single-subject design studies, all conducted in criterion environments, with all interventions implemented by the people we ultimately wanted to impact in the long term.


I’ll assure you there are some negatives associated with that. But I’m going to quote my favorite philosopher Moms Mabley. Moms said the following, “If you always do what you always did, you’ll always get what you always got.”


Anyhow, that’s lesson number one. Take it or leave it.

Key Intervention Components

So, what is it that we have been replicating since 1985? This is listed as “key intervention components” but now I have a new term. I’m going to call these “kernels” of LEAP.

Here are some features of LEAP that distinguish it, in many cases, not all, from other early intervention services for young kids with autism.

We spend a lot of time figuring out how to teach typically developing peers to facilitate the social and communicative skills of their classmates with autism.

We spend very little time assessing kids, coming up with functional goals and objectives. We essentially ask adult family members what they would like their kids to be doing, and if at all possible we try to deliver on that.

We also embed hundreds of learning opportunities for children throughout the day, and embed those within typical preschool routines: arrival, circle, spilling juice, those typical routines.

We also make one choice in the service of generalization, not because we are committed to a particular service delivery model, but because we are committed to addressing the fundamental generalization issue that challenges all kids on the spectrum: stimulus-response generalization.

That’s why we employ a transdisciplinary model of service delivery. What does that mean on the ground? It means on the ground, in a high-fidelity LEAP program, if you walk into the program and you observe what adults did, you couldn’t tell by their interactions with each other, by their interactions with children, by their interactions with adult family members, what their discipline was. That’s, to us, an operational definition of transdisciplinary.

We also collect a good amount of data, like a lot of my colleagues, not for self-congratulatory purposes, but data that allows us to know in a timely fashion that we’ve made mistakes. Those of you in the autism business know there is a high probability that your first best idea about how to teach behavior X is not going to be correct. And if you do not have data systems in place that allow you to get immediate feedback on children’s performance, there is a high probability that you’ll be wasting your time and theirs. It is early intervention, and the window is fairly narrow. You can’t afford to continue making mistakes over the long haul.

Unlike most name-brand programs, LEAP is not a service-delivery system based upon the delivery of just one form of instruction. It never made any sense to me. The heterogeneity in this population is too great to rely on one model. These are not all of the evidence-based practices that we use, it’s just representative. So we use Picture Exchange Communication System, primarily not as a language acquisition device, but as a problem-behavior prevention one as kids are acquiring language to be able to express their wants and needs, desires, etc. We use Pivotal Response Treatment, Errorless learning, Incidental Teaching, Peer-Mediated Intervention.

How do we make a decision? Well, we try to be as data-based as possible. And if you look at the data on each of these interventions, I believe that you will come to the conclusion that they work better in some developmental domains than others. And all things being equal, that’s how we make initial clinical decisions as to what to use with individual kids.

That’s a little bit about LEAP as a preschool intervention, but that’s just half of our model. The other thing we do is we spend time, as much as family members think they need, in folks’ homes trying to make routines more manageable for them. Our goal is not to turn parents into therapists, but to make family life easier, happier, more normalized.

So we deliver a highly-structured, manualized parent skill training curriculum, but we do it in settings parents say they want help in, and we teach skills not in the order that’s in our curriculum, but in the order that’s applied to that particular setting. And the staff just hope and pray the parents don’t say, “We need help at bedtime.” But occasionally, prayer doesn’t work and people’s hours are a little bit different.

Quality Program Indicators

So, in 1985, shortly after launching efforts to replicate LEAP, we were desperately in need of a fidelity measure. We asked ourselves, “Well, what are we going to do here folks?”

Here’s the strategy we used. I don’t know if it’s good, I don’t know if it’s bad, you decide for your own purposes. But here’s what we did. We went back to all those precedent single-case design studies, mostly multiple baseline across settings and multiple baseline across participants. And we asked the question: can we turn the independent variables in those studies into readily measurable, reliable coaching system indicators that give us the best chance to say whether or not folks are at fidelity in terms of implementing the LEAP model?

And the QPI, the quality program indicators, is about a 50 item scale. It measures eight different kinds of dimensions of implementation of LEAP.

Just to give you a better idea, let’s drill down a little bit and look specifically at the promoting social interaction component. It has six indicators. And each of these indicators is measured on a five-point scale from minimally implemented to fully implemented. For example: capitalizes on the presence of typically developing peers — are you setting up a situation where peer models are readily available? We have a specific system that we call “peer buddies” and we look to see that that’s implemented across settings that it’s appropriate to be implemented in, and so on and so forth.
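
As a rough sketch of how ratings like these could be rolled up into a "percent of LEAP practices implemented" figure, the snippet below averages five-point indicator ratings and rescales them to 0 to 100. The transcript does not say how the QPI is actually scored, so the rescaling rule, the function name `percent_implemented`, and the example ratings are all assumptions for illustration.

```python
from statistics import mean

MIN_RATING = 1  # "minimally implemented"
MAX_RATING = 5  # "fully implemented"


def percent_implemented(ratings):
    """Rescale the mean of 1-to-5 indicator ratings to a 0-100 percentage.

    Assumed rule (not stated in the talk): a mean of 1 maps to 0%
    and a mean of 5 maps to 100%.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    for rating in ratings:
        if not MIN_RATING <= rating <= MAX_RATING:
            raise ValueError(f"rating {rating} is outside {MIN_RATING}-{MAX_RATING}")
    return 100 * (mean(ratings) - MIN_RATING) / (MAX_RATING - MIN_RATING)


# Hypothetical ratings for the six social-interaction indicators.
social_interaction = [4, 5, 3, 4, 5, 4]
print(round(percent_implemented(social_interaction), 1))
```

A real scoring rule might weight components differently or count only indicators rated at the top of the scale; this version simply treats every indicator equally.
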

I want to have you look at some fidelity data from the QPI. In 2007, we launched a clustered randomized controlled trial in which we did the following: We identified 50 preschool sites across the US that currently were using an inclusive model of service delivery for young kids with autism, and who also had the capacity and the willingness to implement our parent skill training component. Once we had made that selection, we randomly assigned half of those classrooms to get our intervention manuals only, and half to get our intervention manuals plus 25 days of onsite coaching to fidelity across a two-year period.


I would call your attention to this mean implementation row, across intervention classes and comparison classes. As you can see, at the start, a way to read that would be people are implementing about 25% of LEAP practices. After one year, which was 12 days of intensive coaching, the intervention classes are implementing about half of what we’d like them to implement. The comparison classes are doing a little bit more than at the start as a function of reading the manuals. Now I have to tell you that the publisher of our intervention manual was not doing backflips at this point in time, but that’s the way that goes. Not a huge surprise to most of you, I’m guessing.


At the end of Year 2, you can see the intervention classes were implementing a high level of LEAP practices — about 87% of them, comparison classes about 38%. That distinction, that Alan Kazdin has talked about and preached about for years, is what he calls treatment differentiation.


It’s seldom measured and seldom seen in treatment studies. But I would argue it is an important dimension. It’s not sufficient to say the comparison group got business as usual. But that’s often what you hear, end of story.


Let me also give you lesson two. Are you ready for this one? This surprised us: There is not a stepwise, linear relationship between getting to fidelity and child outcome growth and development. There is a threshold phenomenon. That is to say, if you’re doing 70% of LEAP, there is a high probability you won’t see differential responding by children. Or by adult family members. It takes a high level of implementation. And that has profound implications for how you nurture people in the adoption process, does it not?


Most people think, “If I start doing this, life is going to get better. I’m going to see an effect. I’m going to feel, in the short term that this is the correct investment.” Our data show that it takes a while of plugging along and a high degree of fidelity implementation to reach behavioral effects.
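
The contrast between a stepwise, linear relationship and a threshold relationship can be sketched in a few lines. Everything numeric here is hypothetical: the talk says only that 70% implementation is unlikely to show differential responding, so the 0.8 cutoff and the effect scale of 1.0 are placeholders, not study values, and a hard step is a deliberate simplification.

```python
def linear_effect(fidelity, full_effect=1.0):
    """Effect grows in direct proportion to fidelity (0.0 to 1.0)."""
    return full_effect * fidelity


def threshold_effect(fidelity, threshold, full_effect=1.0):
    """No differential effect below the threshold; full effect at or above it."""
    return full_effect if fidelity >= threshold else 0.0


HYPOTHETICAL_THRESHOLD = 0.8  # placeholder cutoff, not reported in the talk

# Fidelity levels mentioned in the talk: start, Year 1, the 70% example, Year 2.
for fidelity in (0.25, 0.50, 0.70, 0.87):
    print(
        fidelity,
        linear_effect(fidelity),
        threshold_effect(fidelity, HYPOTHETICAL_THRESHOLD),
    )
```

Under the linear model an implementer at 50% would already expect half the benefit; under the threshold model the same implementer sees nothing yet, which is the situation that calls for extra support during adoption.
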

Well, you do get to some good effects. Here’s a highly condensed version of outcomes for kids in this particular randomized trial. The deltas, essentially, represent difference scores from a pre-test to a post-test that was conducted approximately two years later. But if you look at intervention time, the distance is only 14 months on average from the pre-test to the post-test.


Take the Childhood Autism Rating Scale, which is fairly crude. We used it because we’ve used it for the past 25 years and we wanted to compare study to study, and we thought that was worth the cost of being criticized for continuing to use the CARS. A way to read that is children in the intervention classes had about 6 fewer, if you will, indicators of autism symptomology after intervention. The comparison kids, 2.8. And what we analyzed with ANOVA is the difference between those deltas. As you can see, the effect sizes range from moderate to whopping.


A way to read, for example, the PLS-4, which most of you are familiar with: kids on average in the full replication classrooms made 18.5 months of developmental gain over a 14-month period. Kids in the comparison group made 9.4 months of developmental gain over 14 months.
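
One way to make those two PLS-4 figures easier to compare is to express them as developmental months gained per month of intervention. The short sketch below does only that arithmetic with the numbers quoted above; the function name `gain_per_month` is made up for illustration.

```python
INTERVENTION_MONTHS = 14  # average time from pre-test to post-test


def gain_per_month(developmental_gain_months, elapsed_months=INTERVENTION_MONTHS):
    """Developmental months gained per calendar month of intervention."""
    return developmental_gain_months / elapsed_months


full_replication = gain_per_month(18.5)
comparison = gain_per_month(9.4)

print(f"full replication: {full_replication:.2f}")  # 1.32
print(f"comparison: {comparison:.2f}")              # 0.67
```

A rate of 1.0 would mean one month of developmental gain for each month elapsed, so the full replication group was gaining faster than the calendar and the comparison group slower.
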

So we think perhaps we have something that’s worthy of continuing to replicate. What I think is important to share with you more so than the differential effects in that study are some correlations that were significant, and some correlations that weren’t.


In this chart, what we’re showing is the correlation between our quality program indicator — our fidelity measure — at the end of Year 2, and each outcome index gain in score for both groups. If you know about correlations in the behavioral sciences, these are really big. They are not trivial. And notice for each outcome for both groups, it makes a profound difference how much of LEAP was being implemented.
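
For readers who want to run the same kind of check on their own sites, here is a minimal sketch of correlating end-of-study fidelity with gain scores. It assumes a Pearson correlation, which the talk does not specify, and the classroom values are invented placeholders, not data from the trial.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical per-classroom values: Year 2 fidelity and mean gain score.
fidelity = [0.30, 0.45, 0.60, 0.85, 0.92]
gain = [4.0, 6.5, 8.0, 15.0, 17.5]
print(round(pearson_r(fidelity, gain), 2))
```

Pooling both groups, as the chart does, treats fidelity as a continuous measure rather than relying on the original group assignment.
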


It makes me think — I hate to say this in public, too, because you’ll think I’m not a hard-nosed scientist — but it makes me rethink how we should define intervention groups. The way we usually define them is how we initially assign them. At the beginning. But they don’t always fall out that way. If you recall that chart when I showed you the fidelity data, one of the classrooms in the comparison group was at 92% fidelity. Are they a comparison now? Are they something different? I’ll leave that for you to decide and ponder late at night.

It also makes a difference whether people feel as if this process of coaching and the practices they are being coached on make a difference. In 1995, my colleague Frank Kohler and I looked at primarily the business literature, because that’s where most of the literature is on adoption. And businesses adopt all the time, and really quickly, and they rock and roll. And they go with the data, though sometimes they don’t go with the data, but they make decisions all the time.


Here’s what we found when we looked at dimensions that influenced adoption. We have adapted this for our purposes and these are data from that randomized study. So folks rate these dimensions on a 1 to 5 scale, with 1 representing “this is not applied at all” and 5 “yeah, this is great.”


And let me tell you a little bit about how we define these dimensions.


Applied means I can use it today. That was the definition. Once you tell me how to do this, there’s not a lead time between that event, my having a cognitive understanding of a procedure, and my being able to implement it.


Effective means perhaps not what you think. Effective means, in this definition, that I can apply it across contexts. So I can use this as an intervention for language acquisition at circle time, for social interaction development at free play, for developing functional use of objects at snack time.


Flexible, in this definition, applies to the range of kids that I feel like this is applicable to.


And so on and so forth.


We get really high social validity scores. But here’s lesson number three. You couldn’t have possibly guessed it because I’ve hidden the facts from you, but here’s lesson number three: Timing is everything.


If you ask people to fill out a social validity scale in Year 1, they hate everything. Pretty much. But at the end of Year 2, when they’ve reached that F-word, fidelity, they feel really different about that. The timing of these assessments is crucial for a variety of different purposes. It doesn’t mean that first assessment is invalid at all. It should dictate your level of support for people in a process that sometimes is difficult on a variety of different levels.


Especially when you think back to that first chart I showed you, people in some classes were doing 3% of LEAP when they started. We’ve essentially asked people to do everything different. Everything. It’s not like adding a strategy to your toolbox. It’s like throwing your toolbox in the shed and starting all over again for many of these folks.

Here are some other correlations you might find interesting. Teacher experience and prior training were not related to outcomes.


Child characteristics at start were not related to outcomes. Trust me, we assessed everything imaginable. The correlation approaches zero. Which means that two things are true — some kids who are doing pretty good at baseline are not good responders, and some kids who look pretty slammed developmentally do great.


Fidelity turns out to be the sole, powerful predictor in our model.

Sustainability of Fidelity

So, looking back over the last 30 years of this work, we’ve had the opportunity to do some critical informant interviews with folks. We’ve been able to tap into programs that have been running at a very high degree of effectiveness for almost three decades, and some that have not.

Here’s our best guess at what sustains fidelity — at least with this program, in these kinds of educational settings.

The first, not necessarily in order of importance, but the first piece of it is the commitment of the primary administrator — that is the person in charge of purse strings — to the model, as opposed to the next, newest, shiny thing. By that I mean that the primary administrator is presented with the next, newest, shiny thing and his or her response was something like this: “Well, I’d like to see the data on that. Oh, that data looks pretty good. Do they have any evidence that process can be replicated outside of UC Santa Barbara … or UC San Francisco … or UC Boulder … or wherever?” So it’s not that they automatically rejected the next new shiny thing, and in the autism business, that’s probably twice every hour if you’re on the Internet, in terms of the next new shiny thing. It’s not that they offhand rejected it, but they were careful consumers, if you will.

We also have identified another key factor or, if you will, a behavior of primary administrators. That is, they have used the QPI as their supervisory system. Instead of going into a classroom willy-nilly, which is how it usually happens, observing without a protocol, and then writing up impressions like, “That is a pretty good teacher today. Didn’t see anybody getting hurt. Looks good to me,” they use this data system.

Here’s what they also did — they created a personnel ladder within their organization to reinforce high fidelity use. So retention and pay increases were tied to the QPI.

Stability of staff. In the early intervention business, staff turn over really rapidly. Forget about the issue of implementation and implementation science; it’s just a horrific experience for organizations, for families, for kids. It’s not so much that programs that maintained over time didn’t experience staff turnover. It’s that they had mechanisms in place to recover. The most common one was that they assigned new staff to experienced folks in the programs with the specific goal of mentoring them on items in the QPI on which they scored low in terms of initial performance.

Finally, the other fuzzy variable. It’s hard to pin down, and there have been a number of other iterations, but it looked to us like it was pretty significant. And it’s that for programs that maintained it for many, many years, LEAP was the “headline” of the organization. That’s how they advertised themselves. They weren’t just an autism program that does some inclusionary stuff, but “this is what we do. We’re going to ride this horse. This is our banner, this is our headline.”

Here are some things we were a little surprised about. We can’t find an association between service systems that were well-resourced and maintaining fidelity over time.

We did some replications in the wealthiest school district in the United States. It’s in the northeast, that’s all I’ll tell you. They didn’t last very long. But resources, incredible. We’ve had a number of other iterations of that.

We’ve also had replications in some of the poorest, most impoverished places, and they’re doing just fine, thank you.

We thought that programs that got hit with a lot of traumatic events would lose their momentum. Traumatic events like a new administration coming into power in state government and deciding to slash all preschool services, which happens more often than you might think. But that doesn’t seem to be necessarily related to sustainability.

And the real surprise was, we couldn’t find any association between the size of organizational change and sustainability. By that I mean, we’ve worked in some school districts where we replicated LEAP in dozens of settings. You would think that would meet the permeability test, in terms of influencing the organization, but it doesn’t seem to predict for us.

Intervention Questions Potential Implementers Tend to Ask

Let me just end real quickly with some things I wish I would have known 30 years ago. And we’ve gotten this information again from key informant interviews.

I would invite you to think about whether the answer to these questions could be your research agenda. Because these are the questions that people ask us most often in their decision to adopt, as well as in the process of learning the strategies:

  • When will I see an effect?

    We have a lot of data on what the size of our effects is, but no one has ever said to us, “We love your effect sizes, they are so huge.” What they want to know is when this is going to take effect. And if you ask parents what they want to know, it’s, “When’s my kid going to get better?” It’s a when question, and it has a profound impact on the designs we use and how frequently we collect data.

  • Is the effect going to be better than business as usual? We’ve talked about that a lot over the last couple of days.

  • People want to know what the cost is at all different kinds of levels: Dollars, the degree of change that’s going to be required in my organization, how much supervision am I going to have to take on that’s different from what I’m doing now, what’s the data collection requirement that’s over and above or different from where we are now?

  • How will I know that I’m at fidelity?

  • And perhaps more important in terms of long-term sustainability, what can I do to stay there?

    Most people already know — as has been pointed out, they’ve adopted other things in the past, and drift has clearly occurred.

  • And I don’t know the answer to this, and I struggle with this mightily: People often want to know, do I start big or small?

    My organization has 35 classrooms: Do I go all in? Do I do a test, a natural experiment? Start with 5 and see how it goes, see how the consumers feel about it, see how the implementers feel about it? I don’t know the answer to that, but it’s a question people have.

  • Can I talk to other folks who are further along the path?

    Albert Bandura is one of my heroes, and I think his most amazing accomplishment is this notion of the proximal model, which is to say, if you want somebody to emulate something, pair individuals who perceive each other to be most like one another. That’s where the power is. Frankly, we’ve done a lousy job up to now making those connections for people, but I’m committed to doing better on that.

  • Next to last — I bet y’all get this a lot — our (fill in the blank — our clients, our providers, our families, our whomever) are more needy than yours. How can this work with them?

    Everybody feels like they are in the deepest hole that can possibly exist. Right? Being able to respond to that honestly is really important. And I don’t know how to do that other than having replicated LEAP in some deep, dark holes.

  • Finally: This sounds like more work, how do I get my providers to buy in?

So, my time is up, thank you very much, and I’m looking forward to hearing the rest of the panel.

Phillip Strain
University of Colorado, Denver

Presented at the Implementation Science Summit: Integrating Research Into Practice in Communication Sciences and Disorders (March 2014). Hosted by the American Speech-Language-Hearing Foundation.
Copyrighted Material. Reproduced by the American Speech-Language-Hearing Association in the Clinical Research Education Library with permission from the author or presenter.