Name: Pennello

Firstname: Gene

Title: Mathematical Statistician

Institution: FDA Division of Biostatistics

Street: 1350 Piccard Drive HFZ-542

City: Derwood MD

Zip-Code: 20850

Country: USA

Phone: 301-827-0010

Fax: 301-443-8559

Email: gxp@cdrh.fda.gov

Authors: Gene Pennello, Ph.D.

Title: Screening for Genes with Differential Expression in Microarray Experiments: A Bayesian Subset Selection Approach

Abstract: Microarray experiments are often used as exploratory tools for finding genes that are associated with disease or other characteristics when there is little prior evidence of their effects. Typically, several thousand gene variants are tested for effects, and thus many significant effects can be observed by chance. A realistic goal is to select a subset likely to contain the m < N genes with the largest effects from among the N genes spotted on the array. Further study of the N genes will be limited to the genes in the subset. In particular, this subset can be helpful when the goal is to create a specialized microarray, such as a diagnostic microarray device. We consider a Bayesian subset selection approach to the problem. The approach traces back to a thesis by RP Bland under DB Duncan (1961, Johns Hopkins U.). The null hypothesis for each gene is that its effect is larger than the mth largest effect of the remaining genes. The subset selected for further study consists of those genes for which the null hypothesis cannot be rejected. Bayesian hierarchical modeling is used to adjust the analysis for multiple testing. In the prior distribution, genes having no previous evidence of effects are considered exchangeable, so that the posterior mean effect for each gene borrows strength from the observed effects of the other genes. Thus, a large observed effect of a gene will be shrunk towards 0 in the posterior mean when the gene is exchangeable with other genes that have mostly small observed effects. Candidate genes, for which there is some prior evidence of an effect, will be treated separately. The method will be illustrated on latex allergy genomics data.
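
The shrinkage and selection ideas in the abstract can be illustrated with a minimal sketch. It is not the author's implementation. It assumes a simple normal-normal model that the abstract does not specify: observed effects y_i ~ N(theta_i, s2) with known sampling variance s2, an exchangeable prior theta_i ~ N(0, tau2), a moment estimate of tau2, and signed effects where larger means stronger. The function names, the Monte Carlo approximation, and the cutoff alpha are all hypothetical choices made for illustration.

```python
import random


def shrink(y, s2):
    """Posterior mean and variance of each effect under the assumed model
    y_i ~ N(theta_i, s2), theta_i ~ N(0, tau2), with tau2 estimated by moments."""
    tau2 = max(0.0, sum(v * v for v in y) / len(y) - s2)  # estimated prior variance
    b = tau2 / (tau2 + s2)  # shrinkage factor in [0, 1): small when most effects are small
    return [b * v for v in y], b * s2


def select_subset(y, s2, m, alpha=0.05, draws=2000, seed=0):
    """Keep gene i unless the posterior probability that its effect exceeds the
    m-th largest effect of the remaining genes falls below alpha (assumed cutoff)."""
    rng = random.Random(seed)
    post_mean, post_var = shrink(y, s2)
    sd = post_var ** 0.5
    n = len(y)
    hits = [0] * n
    for _ in range(draws):
        theta = [rng.gauss(mu, sd) for mu in post_mean]  # one joint posterior draw
        # Gene i beats the m-th largest of the others exactly when it ranks
        # among the top m of all n genes in this draw.
        for i in sorted(range(n), key=lambda k: theta[k], reverse=True)[:m]:
            hits[i] += 1
    prob = [h / draws for h in hits]
    return [i for i in range(n) if prob[i] >= alpha], prob


# Hypothetical observed effects: two large, six near zero.
y = [5.0, 4.0, 0.1, -0.2, 0.3, 0.0, -0.1, 0.2]
post_mean, post_var = shrink(y, s2=0.25)
selected, prob = select_subset(y, s2=0.25, m=2)
```

In this toy input the two large effects are pulled slightly towards 0, and they are the genes most likely to remain in the selected subset. A real analysis would also need to estimate s2, handle candidate genes separately as the abstract describes, and justify the prior and the cutoff.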