Principles, Methods, and Applications of Multiple Comparisons


Jason C. Hsu, The Ohio State University
Szu-Yu Tang, Roche Tissue Diagnostics

This short course is not a bunch of formulas.

For an in-depth understanding of error rate control, we will discuss Tukey’s Per Comparison, Per Family, and Familywise error rates, as well as Benjamini and Hochberg’s False Discovery Rate, emphasizing for each its proper interpretation and application to real-life problems (such as drug development for Alzheimer’s disease, or discovering biomarkers from Genome-wide Association Studies).
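As a rough illustration of how these error rates differ, the sketch below computes the Per Family and Familywise error rates for m independent tests whose null hypotheses are all true, and applies the Benjamini and Hochberg step-up procedure to a list of p-values. The function names and the example p-values are hypothetical and chosen only for illustration; the independence assumption is a simplification.

```python
def per_family_error_rate(alpha, m):
    """Expected number of false rejections when all m nulls are true
    and each is tested at level alpha."""
    return alpha * m


def familywise_error_rate(alpha, m):
    """Probability of at least one false rejection when all m nulls are
    true, each tested at level alpha, ASSUMING independent tests."""
    return 1.0 - (1.0 - alpha) ** m


def benjamini_hochberg(pvalues, q):
    """Indices rejected by the Benjamini-Hochberg step-up procedure
    at False Discovery Rate level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose ordered p-value meets its threshold
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k = rank
    return sorted(order[:k])


# Hypothetical p-values, for illustration only.
example_p = [0.01, 0.04, 0.03, 0.005]
bh_rejected = benjamini_hochberg(example_p, q=0.05)
bonferroni_rejected = [i for i, p in enumerate(example_p)
                       if p <= 0.05 / len(example_p)]
print(per_family_error_rate(0.05, 10), familywise_error_rate(0.05, 10))
print(bh_rejected, bonferroni_rejected)
```

With ten independent tests at level 0.05, the Per Comparison rate stays at 0.05 while the Familywise rate rises to roughly 0.40. On the example p-values, the False Discovery Rate procedure rejects more hypotheses than a Bonferroni adjustment, which reflects the weaker error guarantee it provides.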

To help you understand which multiple comparison method to use, we will show how the two fundamental principles of multiple comparisons, closed testing and partitioning, underlie the myriad methods (step-up or step-down, Bonferroni-based or not, weighted or not, gatekeeping or graphical approach). This gives insight into the pros and cons of each and makes it easier for you to choose a proper method for your own application.
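One way to see a principle underlying a method is to apply closed testing by brute force, using a Bonferroni test for every intersection hypothesis, and compare the result with Holm's step-down procedure. The sketch below does this for a small set of hypothetical p-values; the function names are illustrative, and the exhaustive enumeration is practical only for a handful of hypotheses.

```python
from itertools import combinations


def closed_test_bonferroni(pvalues, alpha):
    """Closed testing: reject H_i only if every intersection hypothesis
    containing H_i is rejected by its local Bonferroni test."""
    m = len(pvalues)
    rejected = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        all_rejected = True
        for r in range(len(others) + 1):
            for subset in combinations(others, r):
                members = (i,) + subset
                # Local Bonferroni test of the intersection of `members`.
                if min(pvalues[j] for j in members) > alpha / len(members):
                    all_rejected = False
                    break
            if not all_rejected:
                break
        if all_rejected:
            rejected.append(i)
    return rejected


def holm(pvalues, alpha):
    """Holm's step-down procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected = []
    for step, idx in enumerate(order):
        if pvalues[idx] <= alpha / (m - step):
            rejected.append(idx)
        else:
            break  # stop at the first non-rejection
    return sorted(rejected)


# Hypothetical p-values, for illustration only.
example_p = [0.01, 0.04, 0.03, 0.005]
closed_result = closed_test_bonferroni(example_p, 0.05)
holm_result = holm(example_p, 0.05)
print(closed_result, holm_result)
```

Both functions return the same set of rejections here, which is consistent with the known result that Holm's procedure is a shortcut for closed testing with Bonferroni local tests.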

Science progresses, technology advances, and attitudes and problems change! While old principles endure, new principles need to be developed from time to time. Targeted medicine, such as immunotherapy for cancer, makes patient targeting a real possibility, if we can be confident of how to target. Subgroup analysis, which inherently involves multiplicity issues, brings to light that Odds Ratio and Hazard Ratio are not logic-respecting, and not even collapsible in causal inference, whereas Relative Risk and Ratio of Medians are logic-respecting. To conclude this short course, we will show an example of a newly developed principle that solves this new problem: the Subgroup Mixable Estimation principle, which ensures confident and logic-respecting targeting of patient subgroups.
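A small numerical sketch can make the logic-respecting issue concrete. It assumes two equally sized subgroups with hypothetical response probabilities under treatment and control; none of these numbers come from a real trial, and the helper names are illustrative. The efficacy measure in the combined population is computed from the mixed response probabilities and compared with the subgroup values.

```python
def odds_ratio(p_trt, p_ctl):
    """Odds ratio of response, treatment versus control."""
    return (p_trt / (1.0 - p_trt)) / (p_ctl / (1.0 - p_ctl))


def relative_risk(p_trt, p_ctl):
    """Relative risk (ratio of response probabilities)."""
    return p_trt / p_ctl


def mix(p_sub1, p_sub2, weight_sub1):
    """Response probability in the combined population."""
    return weight_sub1 * p_sub1 + (1.0 - weight_sub1) * p_sub2


# Hypothetical response probabilities (treatment, control) per subgroup.
sub1 = (0.5, 0.2)
sub2 = (0.8, 0.5)
w = 0.5  # assumed equal subgroup prevalence

p_trt_all = mix(sub1[0], sub2[0], w)
p_ctl_all = mix(sub1[1], sub2[1], w)

or_sub1, or_sub2 = odds_ratio(*sub1), odds_ratio(*sub2)
or_all = odds_ratio(p_trt_all, p_ctl_all)

rr_sub1, rr_sub2 = relative_risk(*sub1), relative_risk(*sub2)
rr_all = relative_risk(p_trt_all, p_ctl_all)

print(or_sub1, or_sub2, or_all)
print(rr_sub1, rr_sub2, rr_all)
```

In this example the Odds Ratio is about 4 in each subgroup but about 3.45 in the combined population, so the overall value falls outside the range of the subgroup values. The Relative Risk for the combined population, about 1.86, lies between the subgroup values of 2.5 and 1.6.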

Designs for Confirmatory Trials with Multiple Treatments, Multiple Endpoints and Population Enrichment


Cyrus Mehta, President and Co-Founder, Cytel Inc., Cambridge, MA 02139
Lingyun Liu, Director/Strategic Consulting, Cytel Inc., Cambridge, MA 02139

As the EMEA guidance on multiplicity adjustment points out, a clinical study that requires no adjustment of the type I error is one that consists of two treatment groups, uses a single primary variable, and has a confirmatory statistical strategy that pre-specifies a single null hypothesis relating to the primary variable, with no interim analysis. In contrast to such conventional two-arm phase 3 studies, modern clinical trials are often designed to address multiple clinical questions, which often require multiplicity adjustments to ensure strong type I error control. Commonly encountered sources of multiplicity include multiple treatments, multiple endpoints, interim analyses, and subgroup analyses. This workshop will cover the following multiplicity problems:

(1) trial design with multiple endpoints using graphical multiple comparison methods and gatekeeping procedures,

(2) adaptive multi-arm multi-stage (MAMS) group sequential designs with flexible interim analyses, at which ineffective or unsafe arms can be dropped and the sample size can be re-estimated to ensure a well-powered study while strongly controlling the type I error rate at the nominal level,

(3) trial design with adaptive population enrichment, in which a precision drug or therapy can be developed for the patient population that benefits most.

Theory and application will be well mixed. Using clinical trial examples, we will demonstrate the use of the industry-standard software package East® to assess the pros and cons of a variety of multiple comparison procedures commonly used in clinical trials. Some highlights include exact gatekeeping testing procedures for trials with small sample sizes, where asymptotic approximations break down; exact tests can then be used to account for the correlation among multiple endpoints and the exact distribution of those endpoints. For MAMS designs, two approaches will be presented. One approach monitors the trial using the unweighted cumulative test statistics, and type I error control is achieved by the conditional rejection probability principle. The other approach uses a p-value combination function in a closed testing framework to preserve the type I error in the face of interim adaptations, including treatment selection and/or sample size adaptation. The pros and cons of each approach will be discussed. Last, the adaptive enrichment design will be presented using the TAPPAS trial as a real trial example. We will show how to design such trials to meet regulatory requirements. Regulatory interaction and operational considerations for successful implementation of such trials will be discussed.
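To give a flavor of the p-value combination idea mentioned above, the following is a minimal sketch of a two-stage inverse normal combination function. The abstract does not say which combination function is used, so the inverse normal choice, the function name, and the equal stage weights are assumptions made here for illustration. The weights must be fixed before the interim analysis for the combined test to keep its level.

```python
from math import sqrt
from statistics import NormalDist


def inverse_normal_combination(p_stage1, p_stage2, weight_stage1=0.5):
    """Combine two independent one-sided stage-wise p-values.

    weight_stage1 is the prespecified share of information assigned to
    stage 1; stage 2 receives 1 - weight_stage1. The weights are applied
    on the square-root scale so the combined statistic is standard normal
    under the null hypothesis.
    """
    std_normal = NormalDist()
    z1 = std_normal.inv_cdf(1.0 - p_stage1)
    z2 = std_normal.inv_cdf(1.0 - p_stage2)
    z_combined = sqrt(weight_stage1) * z1 + sqrt(1.0 - weight_stage1) * z2
    return 1.0 - std_normal.cdf(z_combined)


# Hypothetical stage-wise p-values, for illustration only.
combined_p = inverse_normal_combination(0.05, 0.05)
print(combined_p)
```

In a closed testing framework, a combined p-value of this kind would be computed for each intersection hypothesis, and an individual hypothesis would be rejected only if all intersections containing it are rejected.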