How to Reduce Sample Size for Clinical Trials
One question you will often hear as a clinical trialist is: “Can’t we make the study smaller?” Reducing sample size without losing power can be accomplished in one of three ways. The first is to improve the signal-to-noise ratio. To do this, you can reduce the noise, strengthen the signal, or reduce variability (which will both reduce the noise and strengthen the signal). The second is to use a better statistical technique, which can extract more information out of the data. The third is to multiplex: use the same patient more than once. Below are some ways to do each of these.
Use a Continuous Variable
Using a continuous variable is one of the most effective ways to reduce sample size. A continuous variable can take on any numeric value, such as 1 mmHg, 5.23 mmHg, 0.6348 mmHg. A non-continuous variable can only take on fixed values such as spring/summer/fall/winter. In fact, the most common non-continuous variable, the binary variable, can only take on two values such as yes/no.
A measurement such as average decrease in blood pressure or percent decrease in wound size would be a continuous variable. You can turn a continuous variable into a non-continuous variable. For example, you can categorize blood pressure into “high blood pressure” and “normal blood pressure” rather than using the raw mmHg measurement.
A continuous variable has much greater information content than a non-continuous variable. When you know someone’s blood pressure is 150/100, that is more informative than only knowing that he has high blood pressure. Because of the higher information content, you will find that you can often drastically reduce sample size by using a continuous variable for an endpoint.
As an example, let’s say you are testing a drug for wound healing. You have a choice of endpoints. You can use average percent healed area (a continuous endpoint) or number of patients with healed wound (the definition of healed could be fully healed, or 75% healed, etc.). With the percent healing, you will probably need about 80 patients to reach adequate power. With number of patients healed, you will probably need about 140 patients.
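The information lost by dichotomizing can be sketched with the usual normal-approximation sample size formulas. This is only an illustration under stated assumptions, not the calculation behind the 80 and 140 figures above: the outcome is assumed to be normally distributed, the true effect is assumed to be half a standard deviation, the responder cut-off is assumed to sit at the control median, and the function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def n_continuous(d, alpha=0.05, power=0.80):
    """Patients per arm to detect a mean difference of d standard deviations."""
    z = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return ceil(2 * z**2 / d**2)

def n_binary(p1, p2, alpha=0.05, power=0.80):
    """Patients per arm to detect response rates p1 vs p2 (unpooled normal approximation)."""
    z = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return ceil(z**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2)

d = 0.5               # assumed true effect, in standard deviation units
p_control = 0.5       # assumed cut-off: the control-arm median
p_treated = Z.cdf(d)  # share of treated patients beyond that same cut-off

print("continuous endpoint, per arm:", n_continuous(d))
print("dichotomized endpoint, per arm:", n_binary(p_control, p_treated))
```

Under these assumptions the dichotomized endpoint needs noticeably more patients per arm than the raw measurement, and the penalty grows as the cut-off moves away from the middle of the distribution.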
This is often the most powerful technique for reducing sample size. Why don’t people always use a continuous endpoint? Sometimes you can’t make the endpoint continuous because of logistical issues. More commonly, for Phase III studies, FDA will often insist on a non-continuous endpoint.
Potential reduction in sample size: 10% – 40%
Use Optimal Cut-Off for Binary Endpoint
If you have to use a binary (dichotomous, proportions) variable, then you should spend some time deciding what the optimal cut-off is. For example, let’s say you’re doing a study on arthritis and the FDA insists that you use responder vs. non-responder analysis. You can choose one of the following:
- ACR20 as the cutoff (roughly equivalent to 20% improvement) to qualify as a responder, where you expect patients on your drug to have 60% response rate and the comparator to have 40%
- ACR70 where you expect patients on your drug to have 20% response rate and the comparator to have 5%
- ACR5 where you expect patients on your drug to have 95% response rate and the comparator to have 80%
Many people assume that a lower bar, such as ACR20, would give you the smallest sample size, especially since the difference you anticipate with it is 20 percentage points as opposed to 15 with ACR70.
That would be incorrect. A difference of 40% vs. 60% response rate would require a sample size of 97 per arm (at 80% power and two-sided significance of 0.05). A difference of 5% vs. 20% would require 76 per arm. And a difference of 95% vs. 80% requires exactly the same sample size (76) as 5% vs. 20%.
You might find it surprising that ACR5 and ACR70 would require the same sample size. But you shouldn’t, since 5% response is the same as 95% non-response and 20% response is the same as 80% non-response.
This is because the power doesn’t depend only on the absolute difference between the groups. It depends on the odds. I won’t go into the details here, but the main thing to remember is that when you’re using binary endpoints, lowest power is near the 50% event rate, and the highest power is at the extremes, near 0% and 100%. This is true so long as the absolute efficacy difference is constant. For example, 80% vs. 85% difference would require smaller sample size than 40% vs. 45%, but not smaller than 40% vs. 50%. You can see the power tables here.
Potential reduction in sample size: 10% – 30%
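The dependence on where the rates sit, and not just on the gap between them, can be checked with a short calculation. This is a minimal sketch using the standard pooled normal-approximation formula for two proportions, assuming 80% power and two-sided significance of 0.05; the function name is hypothetical, and the rate pairs are the ones mentioned in the discussion of extremes above, plus a mirrored pair.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_arm(p1, p2, alpha=0.05, power=0.80):
    """Per-arm sample size for comparing two proportions (pooled normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    # Variance under the null (pooled) and under the alternative (unpooled)
    num = z_a * sqrt(2 * p_bar * (1 - p_bar)) + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return ceil(num**2 / (p1 - p2) ** 2)

for p1, p2 in [(0.40, 0.45), (0.80, 0.85), (0.20, 0.15), (0.40, 0.50)]:
    print(f"{p1:.0%} vs {p2:.0%}: {n_per_arm(p1, p2)} per arm")
```

The same 5-point gap is cheaper near the extremes than near 50%, a pair of rates and its mirror image (80% vs. 85% and 20% vs. 15%) cost the same, and widening the gap to 10 points still beats moving to the extremes.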
Increase the Event Rate
Studies with very low event rates can require very large sample sizes. For example, a cardiovascular trial with mortality rates of 6% in the active arm vs. 7% in the control arm would require tens of thousands of patients.
You can sometimes reduce the sample size by increasing the event rate. There are several ways to increase the event rate, including changing the definition of the event, increasing the follow-up period, using a surrogate endpoint (discussed below and in another post), enriching the patient population (discussed below), or using a composite endpoint.
A composite endpoint combines multiple endpoints into one. Composite endpoints tend to be the most useful and clinically relevant for diseases that have multiple effects, such as diabetes or lupus. An example of a composite endpoint would be an endpoint of “death or MI.”
Composite endpoints can be tricky, because you may need to weight each component separately and the endpoint can become complex. For example, you might want to weight renal failure more than skin rash in a lupus composite endpoint.
Many clinical statisticians are not trained in designing such endpoints (statisticians from the financial industry are actually often better trained for this), so you would be well served by consulting someone with experience in designing and validating composite endpoints.
Returning to the earlier example, if the event rate for the composite endpoint of death/MI were 12% vs. 14% in the above cardiovascular trial, then the sample size would decrease.
It’s important to keep in mind that in order for a composite endpoint to result in a smaller sample size, the absolute difference between the two arms must increase as well. For example, an event rate of 12% vs. 13% would not result in a smaller study compared to 6% vs. 7%. In fact, it would result in a larger study, because a fixed absolute difference is harder to detect as the event rate moves toward 50%, as explained in the previous section.
Potential reduction in sample size: variable, 0% – 90%+
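The cardiovascular example in this section can be worked through with a rough calculation. This is a sketch under assumptions, not a definitive design calculation: it uses the simple unpooled normal approximation for two proportions at 80% power and two-sided significance of 0.05, ignores follow-up time and censoring, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(p_active, p_control, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for two event rates (unpooled normal approximation)."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    variance = p_active * (1 - p_active) + p_control * (1 - p_control)
    return ceil(z**2 * variance / (p_active - p_control) ** 2)

scenarios = {
    "death alone, 6% vs 7%": (0.06, 0.07),
    "death/MI, 12% vs 14%": (0.12, 0.14),
    "death/MI, 12% vs 13%": (0.12, 0.13),
}
for label, (p_active, p_control) in scenarios.items():
    print(f"{label}: about {n_per_arm(p_active, p_control)} patients per arm")
```

Doubling the event rate helps only when the absolute difference grows with it: the 12% vs. 14% scenario needs fewer patients than 6% vs. 7%, while 12% vs. 13% needs more.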
Use Adaptive Design
Modern adaptive designs can reduce sample size, reduce cost, and reduce risk. Non-adaptive study sample sizes are often based on guesses and estimates that are imprecise. As a result, you may end up designing a study that is larger than necessary in order to avoid the risk of underpowering the study. Adaptive designs can allow you to size the study much closer to the right size.
Potential reduction in sample size: variable, 10% – 50%
Run a Tight Clinical Trial
Sloppiness in clinical trial execution or mistakes in design can render patients unusable for analysis and increase study dropout rates. Sample sizes often need to be inflated to account for the dropouts. The difference between good clinical operation and poor can result in 5-10% difference in sample size simply from unevaluable patients.
In addition, good clinical trial hygiene, such as using a core lab for critical parameters, training the sites in how to perform the key measurements so that all the sites perform them the same way, using same raters to rate the patients every time, and other similar practices will reduce variability and sample size. For example, if a study in rheumatoid arthritis didn’t require that the same physician assess joint swelling on every visit, then the variability may increase substantially from inter-rater differences.
Potential reduction in sample size: 5-20%
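The inflation for unevaluable patients is simple arithmetic: divide the number of evaluable patients you need by the fraction you expect to remain evaluable. The sketch below assumes a hypothetical study that needs 200 evaluable patients and compares a few illustrative dropout rates; the function name and the numbers are assumptions for illustration only.

```python
from math import ceil

def enrolled_needed(n_evaluable, dropout_rate):
    """Patients to enroll so that n_evaluable remain after the expected dropout."""
    return ceil(n_evaluable / (1 - dropout_rate))

n_evaluable = 200  # hypothetical number of evaluable patients required for power
for dropout_rate in (0.05, 0.10, 0.15):
    print(f"{dropout_rate:.0%} unevaluable: enroll {enrolled_needed(n_evaluable, dropout_rate)}")
```

Note that the inflation divides by the retained fraction, which is slightly more than adding the dropout percentage on top.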
Use Pairwise Comparisons
If you can use the same patient multiple times, that will reduce the variability of the measurements and increase power. For example, rather than using average baseline blood pressure vs. average post-treatment blood pressure, use average change in blood pressure for each person.
Potential reduction in sample size: 0 – 30%
Use Crossover Studies
Crossover studies, where possible, reduce sample size both by lowering variability and by doubling the amount of information from the same patient. There are challenges to crossover studies, such as carryover effects, so this is possible only in certain circumstances.
Potential reduction in sample size: 50 – 70%
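A rough way to see where the crossover savings come from: in a two-period crossover, each patient contributes a within-patient difference whose variance shrinks as the correlation between that patient's two measurements rises. The sketch below is a textbook normal approximation that assumes no carryover or period effects; the 5 mmHg difference, 10 mmHg standard deviation, correlation values, and function names are all hypothetical.

```python
from math import ceil
from statistics import NormalDist

def z_sum(alpha=0.05, power=0.80):
    """Sum of the normal quantiles for two-sided alpha and the target power."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)

def parallel_total(delta, sd):
    """Total patients for a two-arm parallel study (both arms combined)."""
    return 2 * ceil(2 * z_sum() ** 2 * sd**2 / delta**2)

def crossover_total(delta, sd, rho):
    """Total patients for a 2x2 crossover; rho is the within-patient correlation."""
    return ceil(2 * z_sum() ** 2 * sd**2 * (1 - rho) / delta**2)

delta, sd = 5.0, 10.0  # hypothetical: 5 mmHg difference, 10 mmHg standard deviation
print("parallel:", parallel_total(delta, sd))
for rho in (0.0, 0.4):
    print(f"crossover, rho={rho}:", crossover_total(delta, sd, rho))
```

With no within-patient correlation the crossover needs about half as many patients, simply because each patient is used twice; any positive correlation reduces the total further.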
Use Factorial Designs
In some instances, you can double or triple the amount of information from one study by testing multiple interventions in the same study.
Potential reduction in sample size: 50 – 75%
Enrich the Patients
You can enrich the patient population in ways that will reduce the sample size substantially.
The first way is to make the patient population homogeneous. By making the patient population as similar to each other as possible, you will reduce the variability. For example, rather than including all patients with MIs, if you only include patients with anterior MIs, you are likely to have lower variability in outcomes. The tradeoff is that the generalizability of the study suffers. Alternatively, a compromise between power and generalizability would be to enroll all comers, but prespecify the primary endpoint as the enriched subgroup, and use either a secondary or hierarchical primary endpoint for the all-comers group.
The second is to select the patient population that is most likely to show a response or is likely to show a greater amount of response. For example, if you were performing a pain study, patients with an average pain score of 5 might be more likely to have a 3-point decrease in pain than patients with an average pain score of 3. Or patients who have had pain for a few weeks may be more likely to respond than patients with refractory pain who have had the symptoms for years.
The third is to select patients who are more likely to have more events. For example, patients with anterior MIs from the example above are more likely to die than patients with inferior MIs. If your endpoint is death, then you will have more power with anterior MI patients because there will be more events.
Potential reduction in sample size: 0 – 20%
Stratify the Patients
Similar to the above strategy, you can stratify the patients. This ensures that you minimize any potential baseline imbalance, and you can adjust your analysis to maximize the power of the study. Stratification is particularly helpful if the patient population is heterogeneous and the heterogeneity may impact the outcome significantly.
Potential reduction in sample size: 0 – 20%
Adjust for Independent Variables in the Final Analysis
An alternative to stratification is prespecified adjustment of the final analysis for imbalances. For example, you can prespecify in an MI trial that if one group has more anterior MIs than the other, an adjustment will be made to account for the imbalance (for example, the mortality rate for the group with more anterior MIs will be adjusted downward). This can reduce variability and sample size. This technique has all the typical shortcomings associated with multivariate analysis, and I am not a proponent of it.
Potential reduction in sample size: 0 – 10%
Use Sustained Response
In some diseases, such as Crohn’s disease, the natural course of the disease is highly variable and/or the measurement of outcome is inconsistent. Many patients may briefly have falsely positive responses, only to relapse. In that case, sustained response can remove some of the noise. A sustained response requires that the patient show improvement on multiple visits or over a certain minimum length of time.
Potential reduction in sample size: 0 – 25%
Use Statistical Technique that Matches the Anticipated Response Curve
Standard statistical techniques assume a standard response curve. If you anticipate that there will be a subgroup that will respond particularly well, or if you anticipate that the response over time will not be a smooth curve, or if you believe that there will be other non-standard distribution of response, then your statistician should be able to use a more suitable statistical technique and improve the power.
For example, if you anticipate that your drug will have negative effects in the short term but positive effects in the long term, your statistician might introduce a time-varying covariate to your Cox proportional hazards model. That will compensate for the fact that your drug has a different effect over time and give you more power. Or, if you anticipate that the median survival with your drug will be about the same but that the tail ends of your survival curves will be different (in other words, 10% of the patients will live for years but 90% will derive no benefit), then you might want to use a 2-year survival rate landmark analysis rather than a time-to-death survival analysis.
Potential reduction in sample size: 0 – 50%
Use Surrogate Endpoints
Surrogates are measurements (such as glucose levels or bone mineral density) that are correlated with the disease outcome (cardiovascular events, renal failure, or hip fractures) and hopefully with treatment effect. They are commonly used because they are easier to measure, have less variability, and/or occur faster than the ultimate clinical outcome. Surrogates can dramatically reduce sample size, but many people already overuse surrogate endpoints so I put this lower on this list. There are a lot of subtleties and pitfalls in using surrogate endpoints, but in general, if you can pick one that is in the causal pathway of the disease, surrogates can help reduce the sample size significantly.
Potential reduction in sample size: 0 – 70%
Use Frequency Analysis
In some instances, a patient might have multiple events (such as seizures). If the events are independent, then you can use the number of events rather than the number of patients who have an event as the endpoint. This will reduce the sample size.
Potential reduction in sample size: 0 – 40%
Rules of Thumb
By the way, here are a few rules of thumb that can be helpful when thinking about sample sizes.
- For trials using continuous variables, sample size per group is roughly equal to 16 divided by the square of the difference between the two groups. (Difference is expressed in standard deviation units, power is 80%, and significance is 0.05.) For example, if you expect the two groups to differ by half a standard deviation, then sample size is approximately 16 divided by 0.25, or 64 per group.
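- Precision, rather than power itself, goes up as a function of the square root of the sample size. For example, if you quadruple the sample size, the standard error is cut in half, so at the same power you can detect a difference half as large.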
- Sample size is inversely proportional to the square of the effect size. For example, if the difference between the two arms doubles, then sample size decreases by a factor of 4.
- For trials using continuous variables, sample size is proportional to the square of the standard deviation. For example, if the standard deviation doubles, then sample size quadruples.
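These rules of thumb can be compared against the normal-approximation formula they come from. The worked example in the first rule (16 divided by 0.25 for a half standard deviation difference) corresponds to 16 divided by the squared standardized difference, per group. The sketch below assumes 80% power and two-sided significance of 0.05; the function names are hypothetical.

```python
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Unrounded per-group sample size for a standardized difference d."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return 2 * z**2 / d**2

def rule_of_thumb(d):
    """The '16 over d squared' shortcut, per group."""
    return 16 / d**2

for d in (0.25, 0.5, 1.0):
    print(f"d={d}: formula {n_per_group(d):.1f}, rule of thumb {rule_of_thumb(d):.0f}")

# Doubling the standard deviation halves d, which quadruples the sample size.
print(n_per_group(0.25) / n_per_group(0.5))
```

The shortcut runs slightly high because 16 is a rounded-up version of twice the squared sum of the two normal quantiles, which is about 15.7 at these settings.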