Name: Troendle

Firstname: James

Title: Dr.

Institution: NIH

Street: None

City: Bethesda, MD

Zip-Code: 20892

Country: USA

Phone: 301-435-6952

Fax: 301-402-2084

Email: jt3t@nih.gov

Authors: Korn, E.L., Troendle, J.F., McShane, L.M., and Simon, R.

Title: Controlling the False Discovery Proportion or the Number of False Discoveries With Application to High-dimensional Genomic Data

Abstract: Microarray analysis allows simultaneous measurement of expression levels for thousands of genes on a single specimen. Frequently, an objective of such a study is to identify which genes among the thousands are differentially expressed in one group as compared to another. We propose two new statistical procedures for this problem, which control the number of spurious findings.

A simple approach to identifying differentially expressed genes is to perform a univariate analysis of group mean differences for each gene, and then select the genes that are most statistically significant. Using nominal significance levels (unadjusted for multiple comparisons) will identify many genes that are not truly differentially expressed, i.e., "false discoveries". Controlling the familywise error rate, on the other hand, is unnecessarily stringent, since the identified genes will be studied further for biologic relevance. A reasonable strategy is to allow a small number of false discoveries, or to allow a small proportion of the identified genes to be false discoveries. Although previous work has considered control of the expected proportion of false discoveries, we show that these methods may be inadequate. We propose two stepwise permutation-based procedures designed to control, with specified confidence, either the actual proportion or the actual number of false discoveries.
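To illustrate why nominal significance levels yield many false discoveries, the following minimal sketch simulates genes with no true group difference and counts how many are declared significant at an unadjusted level of 0.05. It is an illustration only and is not one of the proposed procedures. The function name, the number of genes, the group sizes, and the use of a z-test with known unit variance are all assumptions made for this example.

```python
import random
from statistics import NormalDist, fmean


def count_nominal_discoveries(n_genes=2000, n_per_group=10, alpha=0.05, seed=0):
    """Simulate genes with NO true group difference and count how many
    are declared significant at the unadjusted level alpha.

    Assumption (illustrative only): expression values are independent
    N(0, 1) draws, so a two-sided z-test with known variance is exact.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Standard error of the difference in group means when the variance is 1.
    se = (2.0 / n_per_group) ** 0.5
    discoveries = 0
    for _ in range(n_genes):
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        z = (fmean(group_a) - fmean(group_b)) / se
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            discoveries += 1
    return discoveries


# Every discovery here is false by construction, because no gene differs.
false_discoveries = count_nominal_discoveries()
print(false_discoveries)
```

Under these assumptions, roughly 5% of the 2000 null genes (about 100) would be expected to be flagged, which is the kind of spurious list that motivates controlling the number or proportion of false discoveries instead.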

The procedure that controls the false discovery proportion is shown to achieve that control asymptotically. The procedure that controls the number of false discoveries is shown to achieve that control regardless of sample size. Simulations support both claims and suggest that both procedures may in fact achieve control regardless of sample size.
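The kinds of control discussed above can be written out in one common notation. The symbols here are assumptions introduced for illustration and do not appear in the abstract: R is the number of genes identified, V is the number of those that are false discoveries, k is an allowed number of false discoveries, gamma is an allowed proportion, and 1 - alpha is the specified confidence.

```latex
% False discovery proportion (taken to be 0 when no genes are identified)
\mathrm{FDP} = \frac{V}{\max(R, 1)}

% Control of the expected proportion, as in earlier work (q is a target rate)
\mathrm{E}[\mathrm{FDP}] \le q

% Control of the actual number of false discoveries with confidence 1 - \alpha
\Pr(V > k) \le \alpha

% Control of the actual proportion of false discoveries with confidence 1 - \alpha
\Pr(\mathrm{FDP} > \gamma) \le \alpha
```

Written this way, the familywise error rate corresponds to the special case k = 0, and a bound on the expectation alone does not by itself limit how large the realized proportion can be in a given data set.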

The new methods are applied to breast tumor microarray data, where they identify many more genes than a procedure that controls the familywise error rate. In addition, simulations indicate that the methods perform well for the dimension, correlation, and sample size typically encountered in microarray analysis.

References: Benjamini, Y., and Hochberg, Y. (1995), "Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing," Journal of the Royal Statistical Society, Ser. B, 57, 289-300.

Westfall, P.H., and Young, S.S. (1993), Resampling-Based Multiple Testing, New York: Wiley.