Adaptive Trial Designs for the Development of Treatment Parameters

CREd Library, Research Design and Method

Sharon Yeatts

November, 2013

DOI: 10.1044/cred-pvd-c13001

The following is a transcript of the presentation video, edited for clarity.

My name is Sharon Yeatts, and I’m an Assistant Professor of Biostatistics with the Data Coordination Unit at MUSC. In many cases we serve as the Statistics and Data Management Center, responsible for helping to design and implement clinical trials in a number of different disease areas. There are a number of phases in clinical development, and oftentimes we find ourselves at the back end with a confirmatory clinical trial that did not work, and we are left to ask why that trial did not work.

Clinical Development

So I’d like to start by sort of stepping us through the clinical development process and point out where these adaptive designs can be particularly helpful.

In Phase 1 we typically refer to these as dose finding studies, and here we’re interested in assessing sometimes pharmacokinetics, always safety and feasibility, and this is true of any intervention, not just of a pharmacologic agent. We’re trying to establish what the dose is that gives us the most likely impact when we move to future phases.

The next step would be Phase 2, safety and efficacy, where again we’re still looking at safety and feasibility but now we’re also trying to establish that there is some evidence of therapeutic activity, either a biomarker that indicates that the agent is, or the intervention is working, or a surrogate outcome that is associated with our long-term clinical outcome. And our conclusions in Phase 2 are generally based on informal comparisons. We generally at this stage do not have enough subjects to make a definitive statement about whether or not an intervention works.

We save that for Phase 3, the confirmatory stage, where we’re trying to establish definitive evidence of efficacy based on formal comparisons which are designed to hold to strict statistical operating characteristics. As I said, it’s often the case, not only in this area but in many other areas, that we find ourselves with a randomized controlled trial that shows no difference, and we’re trying to understand why.

It’s not in this Phase 3 that I would argue that the adaptive designs are going to be most helpful, although there are statisticians who would say this is where we could use that information. We have enough subjects at this point.

But in the drug development stage, where you’re really learning about the intervention, where you’re trying to establish how long we should be working with a patient in any given session, how many swallows we need to see, it’s in establishing these sorts of treatment parameters that the adaptive designs can really be beneficial.

Dose Finding Overview

Objectives

So we’ll start with dose finding.

In dose finding, we’re trying to establish an optimal biological dose that we are going to move forward to future studies. This may involve estimation of pharmacokinetic parameters. It’s almost certainly going to involve an assessment of tolerability and feasibility, and a quantification of the toxicity profile and I’ll show you an example of what that means.

Dose-Response Curve

What I’ve shown here is a typical dose-response curve. The effect is increasing along the y-axis, and that effect could be anything. It could be a toxicity endpoint, it could be a tolerability endpoint, something like fatigue, which we don’t necessarily consider to be a toxicity event, but which has implications for whether or not an intervention could be feasibly applied in our patient population.

It could also be an efficacy response that we’re trying to assess, and we’re relating that effect to some dose, and that dose increases along the x-axis. And I’ll say here that, although most of my experience in the disease areas I work in has been with pharmacologic agents, dose doesn’t have to mean dose of a pharmacologic agent. It could mean the duration. We can increase duration. We can increase the intensity. We can increase the total exposure. We can even vary several of these components at the same time. Of course, it’s going to increase the complexity of your design, but it can be done.

And so if you think about the dose space in this sort of broad capacity, where dose could mean any number of attributes of your intervention, we can divide the dose space into three general areas.

The low dose space in many instances shows no effect at all. The body is a fabulous thing. It can respond to many insults without your even knowing that the insult took place.

You have to get beyond some dose threshold in order to see the activity in the response, and that’s where we really want to focus our attention, is in this middle section of the curve, where as your dose increases, the rate of your response changes as well. And again, I’ve shown you here, because we’re talking about Phase 1 dose finding, which is usually based on toxicity, I’ve shown you an increasing dose response curve. But we could draw this very easily the other way, where as the dose increases, the response declines. The methodology is the same.

So there’s this region of therapeutic activity, where as we increase the dose the response changes, and then we get beyond another threshold into this plateau region where essentially we can’t effect a change in the outcome anymore. Either the dose has become so toxic that all of our patients are exhausted, or we’re doing so well that there’s just no additional benefit that we can derive from providing additional dosing.
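
As a rough illustration of the three regions just described, the sketch below labels doses along a logistic curve. The logistic shape, the function names, and the midpoint, scale, and cut-off values are all assumptions chosen for illustration; none of them come from the talk.

```python
import math

def response_probability(dose, midpoint=80.0, scale=8.0):
    """Logistic dose-response curve (assumed shape, illustrative parameters)."""
    return 1.0 / (1.0 + math.exp(-(dose - midpoint) / scale))

def region(dose, low=0.05, high=0.95):
    """Label a dose by where it falls on the curve (cut-offs are hypothetical)."""
    p = response_probability(dose)
    if p < low:
        return "no effect"      # below the first threshold: response is flat
    if p > high:
        return "plateau"        # beyond the second threshold: response is flat again
    return "active"             # middle section: response changes with dose

for d in (20, 80, 140):
    print(d, round(response_probability(d), 3), region(d))
```

A decreasing curve would work the same way with the sign of the exponent flipped.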

Defining the Optimal Dose

So when we talk about defining the optimal dose, I’ll ask you to keep that dose response curve in mind, because the optimal dose can mean different things, and it may very well mean different things at different stages of study or for different interventions.

The two definitions, I guess, that I’ll focus on are the Maximum Tolerated Dose, which is the highest dose that you can administer without observing unacceptable toxicity. This is going to basically be the upper end of that therapeutic region.

The other dose that we might be interested in finding is the Minimum Effective Dose. That’s the lowest dose that you can administer that will show you some form of efficacy response, and that’s going to give you the lower bound of that therapeutic window.

And again, I’d like to point out that what we really want to do is to spend in our dose-finding studies as much time in this therapeutic window as possible. It’s fine to get data in here [the Non-toxic, Non-efficacious area], and it may be necessary to get some data in that region. We don’t want too much data in this region [the Toxic Dose area], so we really need designs that are going to help us focus our attention, focus our resources in this window.
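
To make the two definitions concrete, here is a minimal sketch that reads the Minimum Effective Dose and the Maximum Tolerated Dose off a pair of curves over a list of candidate doses. It assumes both curves are known and increasing, which is exactly what a real dose-finding study does not know in advance; the curve parameters and the 33% and 20% thresholds are hypothetical.

```python
import math

def logistic(dose, midpoint, scale):
    """Increasing logistic curve used for both hypothetical responses."""
    return 1.0 / (1.0 + math.exp(-(dose - midpoint) / scale))

def therapeutic_window(doses, tox_curve, eff_curve, max_tox=0.33, min_eff=0.20):
    """Return (MED, MTD) over the candidate doses; None where no dose qualifies."""
    tolerable = [d for d in doses if tox_curve(d) <= max_tox]
    effective = [d for d in doses if eff_curve(d) >= min_eff]
    mtd = max(tolerable) if tolerable else None   # highest dose with acceptable toxicity
    med = min(effective) if effective else None   # lowest dose showing some efficacy
    return med, mtd

doses = range(0, 151, 5)
tox = lambda d: logistic(d, midpoint=100.0, scale=10.0)   # hypothetical toxicity curve
eff = lambda d: logistic(d, midpoint=60.0, scale=10.0)    # hypothetical efficacy curve
med, mtd = therapeutic_window(doses, tox, eff)
print(med, mtd)
```

The interval between the two returned doses is the therapeutic window in which a dose-finding study would ideally spend most of its subjects.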

So why does that have to be so complicated? We all understand the concept behind a randomized controlled trial. It’s the gold standard in establishing that there is a difference between interventions. So why can’t we use that methodology in a Phase 1 design?

Well, the answer is in some cases maybe you can, but in many cases you can’t and the reason is because of ethics.

So ideally we would take our subjects as they come in and randomize them to a placebo group and one of K dose groups. But if we think, as we do in many cases, that as the dose increases, either your toxicity events or your fatigue increases as well, then you can’t ethically randomize someone to dose number 3 when you haven’t established that dose number 2 is tolerable. So this doesn’t work.

Dose-Finding Designs

But we still want to treat the dose finding in as statistically sound a manner as possible. We know we’re not going to have very many subjects to figure out what this dose is, so we need to make the best use of the information that we possibly can.

There are basically two schools of designs for dose finding. One of them I’ll refer to as rule based, and the other as model based. The rule-based, many of you might be familiar with. It’s sort of rampant in the literature, regardless of what disease area you work in, and that’s because it’s very easy to implement. In either case, the outcome is the occurrence of some target event. In many cases we refer to this as a dose-limiting toxicity, but again it doesn’t have to be a toxicity event. It could be an indication of fatigue.

The dose levels are prespecified, so the investigator starts with maybe seven dose groups where he knows I’m going to administer 7, 32, 57.

And how you escalate or de-escalate through those dose groups is predefined. I’ll show you the algorithm. It doesn’t look that easy on the page, but when I show you how it works in practice you’ll see it’s very easy to implement. You take the flow chart, as your data come in, you move through it and the trial essentially runs itself.

The stopping rule is prespecified, according to that algorithm and there are reports in the literature that say that this particular design targets a 33% rate of dose-limiting toxicities. So I’m willing to accept that up to a third of my patients will experience fatigue sufficient for them to say, I can’t do this anymore. In practice, that rate is actually a little bit less. It tends to be more in the 15 to 22% range than in the 33% range.

The model based are relatively recent developments in statistical methodology. They use the same outcome, a binary indication of whether or not a target event has occurred, but you don’t have to specify the doses. The algorithm will do that for you.

You don’t have to specify in advance how you’re going to escalate or de-escalate. Again, the algorithm will do that for you. You will specify a stopping rule, and the really nice feature of this is you can change that target probability that you’re willing to accept. So if 33% fatigue is too high, you can say, no, I’d rather do 20, or it’s not high enough, I’d rather do 50. And you can specify that in the model and the algorithm just responds.

Rule-Based: 3 + 3 Designs

So the rule-based design that is most common is the 3+3. The flow chart is really not that bad, but just on paper without actual patients it can seem a little complicated.

So you start by treating three subjects at a particular dose.

And hopefully none of those subjects experience your target event, and if that’s the case, then you are allowed to escalate to the next higher dose, whatever that may be.

If one of those initial three subjects experiences a target event, then the algorithm says, okay, now we’ve seen something, we have to be a little bit concerned about what’s happening at this dose, so I’m going to enroll another three subjects to get a better handle on what’s happening.

If you observe no toxic events in that second cohort, so you have only one out of six subjects, then the algorithm says, okay, you’re less than the 33%, you can increase.

At any point, if you experience more than one DLT (dose-limiting toxicity) out of your treated subjects at a given dose, you de-escalate.

Once six subjects have been treated at the previous dose, the study stops.
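
The per-dose decision in the flow chart just described can be sketched as a small function. The function name and the returned labels are hypothetical; the rules are the ones stated above. What happens after a de-escalation (stopping once six subjects have been treated at the lower dose) is handled by whoever calls the function and is not shown here.

```python
def three_plus_three_decision(n_treated, n_dlt):
    """Decision at the current dose once a cohort of three has been observed.

    n_treated: subjects treated so far at this dose (3 or 6)
    n_dlt:     target events (DLTs) seen so far at this dose
    """
    if n_treated not in (3, 6):
        raise ValueError("3+3 evaluates a dose after 3 or 6 subjects")
    if n_dlt >= 2:
        return "de-escalate"      # more than one DLT at this dose
    if n_dlt == 1 and n_treated == 3:
        return "expand"           # one DLT in the first three: enroll three more here
    return "escalate"             # 0 of 3, or at most 1 of 6

print(three_plus_three_decision(3, 0))   # escalate
print(three_plus_three_decision(3, 1))   # expand
print(three_plus_three_decision(6, 1))   # escalate
print(three_plus_three_decision(3, 2))   # de-escalate
```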

So again, on paper it’s a little complicated. We’re going to run through how this works in practice so you can see.

Simulated Trial

This is a simulated trial (I do apologize for that) that was put together when we were proposing to do one of the model-based designs in an ICH population. So along the x-axis is the dose of deferoxamine that we were planning to administer, and along the y-axis the probability of a dose-limiting toxicity.

On the right-hand side, the bottom means no DLT was observed for a given subject, the top means we did observe a toxicity event.

So we specified in advance that we would start treating subjects at 7 milligrams per kilogram and increase in units of 25.

So we treated the first cohort of subjects at 7 milligrams per kilogram, 3 subjects treated, no target events were observed.

So we increased to 32. We treat another three subjects, no target events are observed.

So we increase to 57. We treat another three subjects, no target events are observed.

We increase to 82. Now, here’s where it gets interesting. At 82, our fourth cohort, one subject experienced a target event, so the algorithm says, okay, now we need to slow down. I need three more subjects in order to understand what’s happening.

So we enroll another cohort of subjects at the same dose. That’s our fifth cohort, and none of those subjects experience target events, so the algorithm says, you’re good, you can escalate.

So now our sixth cohort of subjects is treated at 107, and all three of them experience target events. So now the algorithm says, you’ve overshot the mark, you need to come down to the previous dose.

But we’ve treated six subjects at that dose, so the trial is done, and this 82 milligrams per kilogram would be our recommended maximum tolerated dose for future study.
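
The walkthrough above can be replayed in code. The sketch below takes the prespecified doses and the number of target events in each successive cohort of three (0, 0, 0, 1, 0, 3 in this simulated trial) and follows the 3+3 rules. The function name and the bookkeeping variables are hypothetical, and requiring six subjects at the top dose of the grid is an assumption the talk does not address.

```python
def replay_three_plus_three(doses, cohort_dlts):
    """Replay a 3+3 trial; return (list of (dose, DLTs) per cohort, recommended dose)."""
    level = 0
    ceiling = len(doses)                 # index of the lowest dose found too toxic
    treated = [0] * len(doses)
    dlts = [0] * len(doses)
    path = []
    for count in cohort_dlts:
        treated[level] += 3
        dlts[level] += count
        path.append((doses[level], count))
        if dlts[level] >= 2:             # more than one DLT: come down a level
            ceiling = level
            if level == 0:
                return path, None        # even the lowest dose is too toxic
            level -= 1
            if treated[level] >= 6:      # six already treated there: trial is done
                return path, doses[level]
        elif treated[level] == 3 and (dlts[level] == 1 or level + 1 == ceiling):
            continue                     # need three more subjects at this dose
        elif level + 1 < ceiling:
            level += 1                   # 0 of 3, or at most 1 of 6: escalate
        else:
            return path, doses[level]    # six treated just below a too-toxic dose
    return path, None                    # outcomes ran out before a stopping rule

path, mtd = replay_three_plus_three([7, 32, 57, 82, 107], [0, 0, 0, 1, 0, 3])
print(path)
print(mtd)
```

With these inputs the replay visits 7, 32, 57, 82, 82, and 107 and recommends 82, matching the trial described above.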

So these rule-based designs do have some advantages.

In practice, they’re actually very easy to implement. As I said, the trial would basically run itself with the use of that flow chart, but I’m hoping that you can already start to think about some of the problems that are associated with this design.

So first of all, what if I missed the dose? I have to prespecify these doses, and so when the clinician and I sat down, he said, well, let’s start at 7 and go up by 25, because that seems reasonable. But what if the true dose I want to hit is at 70? I’m never going to get there. I can either underestimate it or overestimate it, but I’m almost guaranteed to do one or the other. I’m never going to hit it exactly.
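
The arithmetic behind that point is simple enough to check directly. The sketch below builds the prespecified grid (start at 7, steps of 25) and finds the two grid doses that bracket a true dose of 70. Using five levels is an assumption, and the helper names are hypothetical.

```python
def prespecified_grid(start, step, n_levels):
    """Doses available to a rule-based design with a fixed increment."""
    return [start + step * i for i in range(n_levels)]

def bracketing_doses(grid, true_dose):
    """Closest grid doses at or below and at or above the true dose."""
    below = max((d for d in grid if d <= true_dose), default=None)
    above = min((d for d in grid if d >= true_dose), default=None)
    return below, above

grid = prespecified_grid(start=7, step=25, n_levels=5)
print(grid)                         # [7, 32, 57, 82, 107]
print(bracketing_doses(grid, 70))   # (57, 82)
```

The design can only ever recommend 57 (an underestimate by 13) or 82 (an overestimate by 12), never 70 itself.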

The other issue is that we jump by the same increment regardless of what we’ve observed. Again, it might have been different if I’d specified another dose in the middle, but I didn’t. The algorithm doesn’t take into account the fact that now that we’ve started to see something happening, maybe we should move just a little bit more slowly.

Rule-based designs often have patients who were treated well below the therapeutic range. As you can see in this graph, it took us nine subjects to get to any point of activity of the agent, which, particularly in rare diseases, may not be a very good use of the patients that you do have access to.

The decision rules don’t make use of all the available data. When you’re deciding to escalate or de-escalate, you consider only what happened at the dose you’re currently administering. And that’s a real problem.

And the estimate of the optimal dose is biased, and it’s variable. It has no choice but to be, because it’s based on only three subjects, maybe six at any given dose.

Operating Characteristics

So if we wanted to refine this design, what would we want the design to look like? What characteristics would we want it to have?

From a clinical perspective, we would want to pay attention to doses around the Maximum Tolerated Dose. We don’t want to spend too much time in that subtherapeutic area, we don’t want to spend too much time in the overly toxic area.

By extension we want to minimize how many patients we’re treating at subtherapeutic levels. So keep in mind that this was developed in cancer, and in Phase 1 cancer studies these are usually folks who are basically at the end of their rope. They’ve tried everything else that is available to them, and this is their only option. So we really don’t want to have too many of them treated at doses where we know they’re not getting any efficacy at all.

But we need to obtain information on interpatient variability and cumulative toxicity.

Statistically, we wanted to have a high probability of terminating at the correct dose, or at least near the correct dose. We wanted to have a low probability of stopping before the truth, and a small probability of escalating beyond the truth.
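
Statistical operating characteristics like these are usually estimated by simulation. As a hedged sketch, the code below simulates many 3+3 trials under an assumed true toxicity curve and reports how often each dose is recommended. The true DLT probabilities, the number of simulated trials, and all function names are hypothetical; the stopping logic follows the 3+3 rules described earlier, with six subjects required at the top dose as an added assumption.

```python
import random
from collections import Counter

def simulate_three_plus_three(true_dlt_probs, rng):
    """Run one simulated 3+3 trial; return the index of the recommended dose or None."""
    level, ceiling = 0, len(true_dlt_probs)
    treated = [0] * len(true_dlt_probs)
    dlts = [0] * len(true_dlt_probs)
    while True:
        treated[level] += 3
        dlts[level] += sum(rng.random() < true_dlt_probs[level] for _ in range(3))
        if dlts[level] >= 2:                     # too toxic: de-escalate
            ceiling = level
            if level == 0:
                return None
            level -= 1
            if treated[level] >= 6:
                return level
        elif treated[level] == 3 and (dlts[level] == 1 or level + 1 == ceiling):
            continue                             # three more at the same dose
        elif level + 1 < ceiling:
            level += 1                           # escalate
        else:
            return level

def selection_probabilities(doses, true_dlt_probs, n_trials=2000, seed=1):
    """Estimate how often each dose (or no dose) is recommended."""
    rng = random.Random(seed)
    counts = Counter(simulate_three_plus_three(true_dlt_probs, rng)
                     for _ in range(n_trials))
    labels = [None] + list(range(len(doses)))
    return {("none" if i is None else doses[i]): counts[i] / n_trials for i in labels}

doses = [7, 32, 57, 82, 107]
true_probs = [0.02, 0.05, 0.10, 0.25, 0.60]      # hypothetical true DLT rates
probs = selection_probabilities(doses, true_probs)
for dose, p in probs.items():
    print(dose, round(p, 3))
```

Comparing the resulting table with the dose whose true rate is closest to the target shows how often the design stops at, below, or beyond the truth under that one assumed scenario.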

Model-Based Designs: Continual Reassessment Method

And so the statistical answer to that is the continual reassessment method. This was proposed (I’m not sure, maybe 20 years ago) by O’Quigley, and there have been a ton of papers extending it, making variations on it, and improving upon it in the last couple of decades.

The continual reassessment method allows you to use all of your data and adapt to that data as you accumulate it.

So the idea, and again I think it will be a little bit more clear when I show you how the design works in a simulated trial, is that the first cohort is treated at the Maximum Tolerated Dose that’s identified based on some hypothesized curve.

So when we sat down to implement this study in ICH, the clinician and I sat down, and he said, “I think I can make it all the way up to here before anything happens, and then I think I’m going to max it at about here.” And we played with some curves until he said, “I think it’s that one.” And we said, “Okay, now, this is where we’ll start.”

So you treat three subjects at the Maximum Tolerated Dose that you hypothesized based on that curve, you observe the outcome for those subjects, re-estimate the curve using all of the data, both your hypothesis from the beginning and the data you’ve just accumulated, re-identify the Maximum Tolerated Dose, and that next cohort of subjects is going to be treated then at the new estimate of the Maximum Tolerated Dose.

So with each new cohort of subjects, you’re treating them at your best guess of the Maximum Tolerated Dose, which is a nice feature if you believe that the Maximum Tolerated Dose is really going to be your best bet in terms of efficacy.

And you keep repeating this process until some stopping rule is achieved. That stopping rule can be based on the target sample size having been enrolled and treated at the Maximum Tolerated Dose, on the maximum sample size for the study overall having been met, or on having achieved some level of convergence or precision.
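
The three kinds of stopping rule can be written as a single check that runs after each cohort. This is only a sketch: the function name, the thresholds (six subjects at the estimated dose, thirty overall, a one-unit tolerance), and the use of the last two estimates as the convergence test are all assumptions.

```python
def should_stop(n_at_current_mtd, n_total, mtd_estimates,
                target_n_at_mtd=6, max_n=30, tolerance=1.0):
    """Return the first stopping reason that applies, or None to keep enrolling."""
    if n_at_current_mtd >= target_n_at_mtd:
        return "target sample size treated at the estimated MTD"
    if n_total >= max_n:
        return "maximum overall sample size reached"
    if len(mtd_estimates) >= 2 and abs(mtd_estimates[-1] - mtd_estimates[-2]) < tolerance:
        return "MTD estimate has converged"
    return None

print(should_stop(3, 12, [92.0, 87.0]))   # None: keep enrolling
print(should_stop(3, 24, [89.0, 89.2]))   # converged
```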

And once the trial ends, the Maximum Tolerated Dose is considered to be the dose that you would have assigned to the next subject to be enrolled. Again, I think this will be a lot easier once I show you an example.
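
The re-estimation loop described above can be sketched with a very simple Bayesian model. Everything specific here is an assumption made for illustration, not the model used in the talk: the curve is logistic with a fixed scale, the only unknown is the dose `m` at which half of subjects would have a target event, the prior on `m` is normal, and the posterior is computed on a grid. The prior was picked so that the starting estimate lands near 92 for a 40% target; later estimates are not expected to match the numbers in the simulated trial.

```python
import math

def dlt_probability(dose, m, scale=10.0):
    """Assumed logistic curve: probability of a target event at a given dose."""
    p = 1.0 / (1.0 + math.exp(-(dose - m) / scale))
    return min(max(p, 1e-12), 1.0 - 1e-12)       # keep logs finite

def crm_next_dose(observations, target=0.40, prior_mean=96.0, prior_sd=15.0, scale=10.0):
    """Re-estimate the MTD from the prior plus all data so far.

    observations: list of (dose, n_subjects, n_dlt), one entry per cohort.
    """
    grid = [g * 0.5 for g in range(0, 401)]      # candidate values of m, 0 to 200
    weights = []
    for m in grid:
        log_w = -0.5 * ((m - prior_mean) / prior_sd) ** 2    # unnormalised normal prior
        for dose, n, y in observations:
            p = dlt_probability(dose, m, scale)
            log_w += y * math.log(p) + (n - y) * math.log(1.0 - p)
        weights.append(math.exp(log_w))
    total = sum(weights)
    m_hat = sum(m * w for m, w in zip(grid, weights)) / total   # posterior mean of m
    return m_hat + scale * math.log(target / (1.0 - target))    # dose where P(DLT) = target

history = []                          # (dose, n_subjects, n_dlt) per cohort
start = crm_next_dose(history)        # estimate from the hypothesized curve alone
history.append((92.0, 3, 2))          # hypothetical cohort: 2 of 3 with a target event
after = crm_next_dose(history)
print(round(start, 1), round(after, 1))
```

Each call uses every cohort observed so far, which is the key difference from the 3+3 rules. When the trial ends, the value the function would return for the next cohort is the estimate of the Maximum Tolerated Dose.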

Variations

I’m not above saying that it had some issues when this design was initially proposed. A number of the clinical folks came back and said, “Whoa, whoa, whoa. One, we’re going to treat one subject at each dose? That’s ridiculous, no way, not going to happen.”

They also said if you don’t restrict how you move through that dose space, my hypothesis may be way wrong, and the first cohort of subjects is treated at maybe three times the true Maximum Tolerated Dose, and now I have to wait to come back into the correct dose space.

So there have been a number of variations which were put forth to improve on these, to make the design more palatable to the clinical collaborators. Probably the most widely known of which is to treat a small cohort, so instead of treating one, we treat three subjects at each dose. It’s not a magic number, it’s sort of like .05. It’s just a number. We treat three subjects at each dose, and we can restrict the escalation process so that it doesn’t move too quickly.

And there are a number of ways that you can do that. You can do that by choosing to treat your first cohort of subjects at some low dose based on conventional criteria and not what you think the Maximum Tolerated Dose is.

You can restrict the escalation by specifying that the dose is not going to be governed by the model until you’ve actually observed some toxicity in your patients. The idea behind that is you need to see some patient variability before you can trust that the model is doing what it ought to do.

And you can also restrict escalation by specifying the doses in advance and saying, I’m not going to skip over any dose that I haven’t already tried.

But all of these modifications are going to impact the statistical operating characteristics a bit. Right, so if you restrict the escalation, you’re not going to get to the Maximum Tolerated Dose as quickly as you would have otherwise, so we need to keep those things in mind.
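
Two of these restrictions, starting low and not letting the model govern until a target event has been seen, can be sketched as a thin wrapper around whatever dose the model recommends. The function name, the starting dose of 7, and the step of 25 are illustrative assumptions.

```python
def next_dose(previous_dose, model_dose, any_dlt_seen, start_dose=7.0, step=25.0):
    """Choose the next cohort's dose under restricted escalation."""
    if previous_dose is None:
        return start_dose                 # first cohort: conventional low dose
    if not any_dlt_seen:
        return previous_dose + step       # no target event yet: fixed increments
    return model_dose                     # afterwards the model governs

print(next_dose(None, 92.0, False))   # 7.0
print(next_dose(7.0, 92.0, False))    # 32.0
print(next_dose(82.0, 92.0, True))    # 92.0
```

The cost of the wrapper is the one noted above: the early cohorts are spent below the model's estimate, so the trial reaches the Maximum Tolerated Dose more slowly.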

Another popular variation is the continual reassessment method with an expansion cohort, which says that once I’ve identified what that Maximum Tolerated Dose is I’m going to enroll another 6 to 15 subjects in order to gain some additional safety data at that dose as well as to gain some efficacy information.

So what does this look like in practice? This is another simulated trial. This is the same scenario that I described before.

So some differences that I would like to point out. So here our preference in ICH is actually to target a 40% dose limiting toxicity rate, because the literature suggests that 40% of subjects untreated are going to have serious adverse events. So it doesn’t make sense for us to restrict our dose to something less than what’s available in an untreated population.

And the curve that you see here is what we hypothesized that dose toxicity curve to look like before we started the study.

So if we were to go with the strict definition of a CRM, our first cohort of subjects would have been treated somewhere out here at 92 milligrams per kilogram.

The neurologist who is the PI of the study is a very nice guy, and he is very concerned about patient safety, and there was no way this would fly. So we restricted the escalation, as I described to you. So we started at the same 7 milligrams per kilogram I showed you in the 3+3 design and we restricted escalation to 25-unit increments until a toxicity event was observed.

So we hypothesized this curve and treated our first cohort of subjects at 7 milligrams per kilogram, and there were no target events observed.

So we increased to 32. The next cohort of subjects is treated at 32 milligrams per kilogram.

And what I’ll show you here (it’s a little bit difficult to see in this cohort; it’ll become more obvious later on) is that after we get the information from a cohort of subjects, we re-estimate the curve and the curve shifts just a little bit, so you can see this curve [dashed line] is where we started our hypothesis. After we get that information, we update the curve and it shifts just a little bit [solid line]. It’s not a whole lot, because no target events means not a lot of information added to our hypothesized model.

The next cohort of subjects is going to be treated at 57.

And again, no toxicity events. So this looks exactly like the simulated trial I just showed you so far, right?

When we get to 82, again, same situation.

I have three subjects treated, one of them experiences a toxicity and now you can see that shift in the curve [from the dashed line (original) to the solid line (updated)], and now this is where you’re going to see how different this design is from the 3+3.

So now that I’ve experienced one toxicity event, I’m going to let the model tell me where my next subject should be treated. So I follow my 40% line over to the curve, I drop it down, and we think the Maximum Tolerated Dose is now at about 92.

So we treat our next cohort of subjects at 92. I treat three subjects, and two of them experience a target event. And so you can see now that the curve has shifted [from the dashed line (original) to the solid line (updated)]. We’re shifting to the left, because we have new data to suggest that maybe 92 is a little bit too much.

And again, if we follow the .40 probability over to the curve, now we have 87.

I treat three subjects at 87 milligrams per kilogram and none of them experienced a target event.

The curve shifts again and says, okay, I should go back to the 92 and see what happens there.

I treat three subjects at 92, and all of them experience toxicity events.

The model adjusts and takes us back down to about 89, so my next cohort of subjects is treated at 89 milligrams per kilogram.

One of them experiences a toxicity event, and now you’ll notice there’s no change in my curve.

So we might consider at this point that the algorithm has converged and we’re going to claim that the Maximum Tolerated Dose is this 89 milligrams per kilogram.

Advantages and Disadvantages

So what sort of advantages do we get from this approach? Well, it’s a combination of clinical judgment and statistical rigor. We can understand the operating characteristics, and we can control the target probability rate that we’re interested in. The model uses cumulative information from all of the patients, not just the three that are being treated at the current dose: it combines your initial hypothesis and all of the data that you have accumulated thus far to guide you in future steps, so you really are using that sort of adaptive frame of mind. You’re adapting to the information as it comes in to update your best guess of the Maximum Tolerated Dose.

It allows you to estimate the Maximum Tolerated Dose from a continuous spectrum of doses. I didn’t have to specify in advance what doses I was interested in. The model tells you.

And you can restrict it. If it’s too hard to come up with a dose of 89 and a dose of 88, you tell the model the doses have to be so many units apart in order for me to distinguish them.
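
One simple way to express that restriction is to snap the model's recommendation to the nearest dose that can actually be distinguished in practice. The helper below is a hypothetical sketch; the increment of 5 units is an assumption.

```python
def round_to_increment(dose, increment=5.0):
    """Snap a recommended dose to the nearest multiple of the smallest usable increment."""
    return round(dose / increment) * increment

print(round_to_increment(88.6))         # 90.0
print(round_to_increment(86.9))         # 85.0
print(round_to_increment(89.0, 1.0))    # 89.0
```

Note that Python's `round` sends exact halves to the nearest even multiple, so a design that cares about ties would need its own tie-breaking rule.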

Audience Question

In our field we don’t typically have a toxicity problem. We’re not interested in killing the patient, but we have this issue where we could get to a point where we’ve either irritated the patient so much or they are too tired — it’s not a toxicity issue, it’s a tolerability issue. Can we get them to do the extra swallows that they need in order to proceed?

There are tweaks to this model that I will talk about, so I would argue that you are in this case where you’re not interested in moving smoothly through this dose space. So if you say along the x-axis maybe that this is the number of minutes per session, that you’re not interested in starting at zero or five minutes and going in 10-minute increments through the session.

You’re interested in saying, this is what I think is the right place to start, let’s collect some data and let it guide me through. So it’s really just sort of tweaking what I’m referring to as a toxicity event really to be a tolerability event. How much can the patient stand in order to continue along this therapy that we think will help them if they just do it?

In many cases we treat dose finding as we’re trying to find the Maximum Tolerated Dose and then we’re done. In cases such as this, I don’t think that’s the answer. You want to find the Maximum Tolerated Dose and then maybe start there and work backwards to find the Minimum Effective Dose.

How little can we get away with and still have a reasonable outcome? I’m glad you brought that up.