5th International Conference on Multiple Comparison Procedures

MCP 2007 Vienna
Vienna, Austria | July 8-11, 2007


The conference will be held from July 9 to July 11. On July 8, several pre-conference courses will be offered.
Accepted Posters:

Bayesian classification and label estimation via EM algorithm: a comparative study
Marilia Antunes; Lisete Sousa
Faculty of Sciences, University of Lisbon
The gene classification problem is studied by considering the ratio of gene expression levels, X, in two-channel microarrays and a non-observed categorical variable indicating how differentially expressed the gene is: not differentially expressed, down-regulated or up-regulated. Assuming X follows a mixture of Gamma distributions, two methods are proposed and their results compared. The first method is based on a hierarchical Bayesian model: the conditional probability of a gene belonging to each group is calculated and the gene is assigned to the group for which this conditional probability is highest. The second method uses the EM algorithm to estimate the most likely group label for each gene, that is, to assign the gene to the group which contains it with the highest estimated probability.
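The assignment rule of the first method can be sketched in a few lines of Python. This is a generic illustration, not the authors' code; the mixture weights, shapes and scales below are made-up values for three hypothetical groups.

```python
import math

def gamma_pdf(x, shape, scale):
    """Density of a Gamma(shape, scale) distribution at x > 0."""
    return (x ** (shape - 1) * math.exp(-x / scale)
            / (math.gamma(shape) * scale ** shape))

def classify_gene(x, components):
    """Assign x (an expression ratio) to the mixture component with the
    highest posterior probability.  `components` is a list of
    (weight, shape, scale) triples, one per group."""
    posts = [w * gamma_pdf(x, a, s) for (w, a, s) in components]
    total = sum(posts)
    posts = [p / total for p in posts]
    return max(range(len(posts)), key=posts.__getitem__), posts

# illustrative groups: down-regulated, not DE, up-regulated (invented parameters)
groups = [(0.2, 2.0, 0.25), (0.6, 9.0, 0.11), (0.2, 6.0, 0.5)]
label, posteriors = classify_gene(2.4, groups)
```

A large ratio such as 2.4 lands in the "up-regulated" component here; in practice the component parameters would be estimated from the data.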

Estimation of Parameters in Unconditional Categorical Regression
Kamal Azam; Grami A., Ph.D et al
Tehran University of Medical Sciences
In large-scale sampling we always face item non-response, unit non-response, or both. In fitting a model to the data we have two groups of variables, namely dependent and independent variables, and non-response may occur in either group. In this paper we take Y as a categorical dependent variable and Z and X as independent variables. The first two variables are fully observed, and we assume that the missingness mechanism is missing at random (MAR). To estimate the parameters, a model is devised based on the likelihood function for the whole data set, including the missing data, and the resulting estimates are compared with those obtained by statistical software such as S-Plus, which are based only on the complete observed data and ignore the missing units.
Our results show that the estimates obtained from the likelihood-based model are superior to the standard estimates produced by such software. The comparison is made on a set of health survey data on goiter disease collected in Qazvin province.

Key words: Missing At Random, Logistic Regression, Goiter Disease, Maximum Likelihood

Adjustment Method to Address Type I Error and Power Issues with Outcome Multiplicity and Correlation
Richard Blakesley; Sati Mazumdar, Patricia Houck
University of Pittsburgh
Multiple comparisons call into question the validity of individual hypothesis testing due to type I error inflation. Several adjustment methods exist in the statistical literature to protect the type I error. However, their type I error and power performance suffer with increasing outcome multiplicity and correlation. Single-step approaches (Bonferroni, Sidak) protect the type I error for independent outcomes, but become conservative with increasing correlation and suffer from lack of power. Stepwise approaches (Holm, Hochberg, Hommel) demonstrate improved power over single-step methods. Methods which use correlation components in the adjustment formulae (Dubey/Armitage-Parmar and R-Squared Adjustment) address overcorrection of the type I error to a limited extent. Resampling methods (Bootstrap MinP and Step-Down MinP) incorporate the correlation structure, but carry caveats and implementation limits. We propose combining a stepwise approach with a new correlation component to stabilize type I error protection and maintain high power to reject false null hypotheses regardless of outcome multiplicity and correlation levels.
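For readers unfamiliar with the adjustments compared above, the single-step Bonferroni and Sidak corrections and the step-down Holm procedure can be written as adjusted p-values in a few lines of Python. This is a generic sketch, not the authors' R simulation code:

```python
def bonferroni(pvals):
    """Single-step Bonferroni: multiply each p-value by m, cap at 1."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]

def sidak(pvals):
    """Single-step Sidak: exact for m independent tests."""
    m = len(pvals)
    return [1.0 - (1.0 - p) ** m for p in pvals]

def holm(pvals):
    """Step-down Holm adjustment: multiply the i-th smallest p-value by
    (m - i) and enforce monotonicity over the ordered p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[i])
        adj[i] = min(1.0, running_max)
    return adj
```

Holm rejects at least as many hypotheses as Bonferroni at the same level, which is the power gain of stepwise over single-step methods mentioned in the abstract.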

Methods: Simulations were conducted in the R statistical package. Multivariate normal datasets were simulated in each experiment under varying conditions of effect sizes (uniform, split), number of outcomes (4, 8, 12, 24), correlation structure (compound symmetry, block symmetry, decreasing dependence) and strength of outcome correlation. For each simulated dataset, two-sample t-tests were performed for each outcome, adjustment methods were applied, and type I error and three power formulations (minimal, maximal, average) were estimated. The proposed method used the Sidak form, a step-up approach, and a measure of hypothesis independence. Previously mentioned methods were included for comparison.

Results: The proposed method demonstrated stable type I error protection across the explored correlation structures. It also showed similar or greater power (all formulations) than examined methods with conservative type I error protection. These results held for increased outcome multiplicity.

Conclusion: The new method holds promise to allow high power to make inferences without concern for type I error issues regarding multiple correlated outcomes.

Funding Source: NIMH T32 MH073451, NIMH P30 MH071944

Performance of multiple testing procedures for genomic differences in groups of papillary thyroid carcinoma analysed by array CGH
Herbert Braselmann; Eva Malisch, Kristian Unger, Gerry Thomas, Horst Zitzelsberger
GSF National Research Center for Environment and Health, Institute of Molecular
Microarray-based Comparative Genomic Hybridization (array CGH) allows detection of DNA copy number differences between a reference genome and a tumour genome at thousands of chromosomal sites simultaneously. Papillary thyroid carcinomas (PTC) often carry RET/PTC rearrangements, which have been shown to be heterogeneously distributed within tumour tissues. Thus, it is likely that additional gene alterations are present in these tumours. Moreover, RET/PTC-negative tumours should exhibit alternative changes.

Therefore we have investigated 33 PTC (20 adult tumours, 13 infantile post-Chernobyl tumours) with known RET/PTC status (RET/PTC positive: 11 adult, 10 infantile cases; RET/PTC negative: 9 adult, 3 infantile cases) by array CGH to uncover such unknown gene alterations in PTC.

Endpoints are given as log2-transformed intensity ratios (log2-ratios), which are further simplified to gain or loss status variables. For the analysis of group differences at approximately 1000 preselected genomic sites, between adult and infantile or between RET/PTC-positive and RET/PTC-negative tumours, we present results of multiple t-tests for smoothed log2-ratios and of multiple Fisher's exact tests for gains or losses. When testing for age dependence, Benjamini-Hochberg's FDR procedure resulted in about 50-100 significant differences, similar to the max-T permutation procedure for the t-tests. For the comparison of the RET/PTC status groups, FDR procedures resulted in hundreds of significant differences, whilst the max-T procedure yielded 46. Fisher's exact tests for gains or losses consistently yielded a smaller number of significant differences. Typically for array CGH, a large part of the intensity ratios are positively correlated among the samples within chromosomal segments of variable length.

The results exemplify the performance of false discovery rate (FDR) and familywise error rate (FWER) p-value adjustments for one type of high-dimensional data. Results also depend on the data preprocessing methods and the chosen endpoint.

Multiplicity Adjusted Location Quotients
Gemechis Dilba; Frank Schaarschmidt, Bichaka Fayissa
Institute of Biostatistics, Leibniz University of Hannover, Germany
The location quotient is an index frequently used in geography and economics to measure the relative concentration of activities. For binomial data, the problem consists of simultaneously comparing the ratios of the individual proportions to the overall proportion. This is clearly a multiple comparison problem, and so far multiplicity-adjusted location quotients have not been addressed. In fact, there is a negative correlation between the comparisons when the proportions of the subgroups are compared with the proportion of all subgroups combined. Here, we propose adjusted location quotients based on existing probability inequalities and on directly using the asymptotic joint distribution of the associated z-statistics. A simulation study is carried out to investigate the performance of the various methods in terms of achieving a nominal simultaneous coverage probability. A simple adjustment of Fieller confidence intervals is observed to work quite well. The proposed methods are illustrated on health utilization data.
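The point estimates being compared can be sketched as follows (an illustration with invented counts; the Fieller-type simultaneous intervals proposed in the abstract are not reproduced here):

```python
def location_quotients(successes, totals):
    """Location quotient for each subgroup: the subgroup proportion
    divided by the overall proportion.  Values above 1 indicate
    over-representation of the activity in that subgroup."""
    overall = sum(successes) / sum(totals)
    return [(x / n) / overall for x, n in zip(successes, totals)]

# two hypothetical subgroups with the same size but different counts
lq = location_quotients([30, 10], [100, 100])
```

Because every subgroup proportion is compared against the pooled proportion, which contains it, the comparisons are negatively correlated; that correlation is what the adjusted intervals must account for.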

Quantile curve estimation and visualization for non-stationary time series
Dana Draghicescu; Serge Guillas, Wei Biao Wu
Hunter College, City University of New York
This talk addresses the problem of quantile curve estimation for a wide class of non-stationary and/or non-Gaussian processes. We discuss several smoothed quantile curve estimates, give asymptotic results, and introduce a data-driven procedure for selecting the optimal smoothing parameter. This methodology provides a statistically accurate and computationally efficient graphical tool that can be used for the exploration and visualization of the behavior of time-varying quantiles for time series with complex structures. A Monte Carlo simulation study and two applications to ozone time series illustrate the findings.
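As a crude illustration of a time-varying quantile (not the talk's smoothed estimators, which refine this idea with kernel weights and a data-driven bandwidth), a moving-window empirical quantile can be computed as:

```python
def moving_quantile(series, window, q):
    """Moving-window empirical quantile of a time series using the
    nearest-rank definition; the window length plays the role of the
    smoothing parameter."""
    out = []
    for t in range(len(series) - window + 1):
        chunk = sorted(series[t:t + window])
        idx = min(window - 1, int(q * window))  # nearest-rank index
        out.append(chunk[idx])
    return out

# running median (q = 0.5) of a short toy series
med = moving_quantile([3, 1, 4, 1, 5, 9, 2, 6], window=3, q=0.5)
```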

Simultaneous confidence intervals for overdispersed count data
Daniel Gerhard; Frank Schaarschmidt, Ludwig A. Hothorn
Institute of Biostatistics, Leibniz University of Hannover
The application of simultaneous confidence intervals for count data can be beneficial for various research objectives, such as observing tumor counts in clinical and non-clinical studies or comparing insect abundance in agricultural field trials. The confidence intervals considered here are constructed from parameter estimates of a generalized linear model, assuming the counts to be Poisson or, in case of overdispersion, negative-binomial distributed. Multiplicity is taken into account by a corresponding quantile of the multivariate t-distribution with an appropriate correlation matrix. In a simulation study we investigated the coverage probability of the confidence intervals for different distributional assumptions in various factorial designs and for several sample sizes. It is shown that the nominal level alpha is reached only with more than 20 observations per group and sufficiently large sample means. Simulation studies and evaluation of examples were performed in the free software environment R using the packages gamlss and multcomp.
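A simplified version of the construction can be sketched in Python. This sketch uses a Bonferroni-adjusted normal quantile as a crude stand-in for the multivariate-t quantile used in the paper, and plain Wald intervals on the log scale; the counts are invented:

```python
import math
from statistics import NormalDist

def poisson_log_mean_cis(counts_per_group, alpha=0.05):
    """Crude simultaneous Wald CIs for Poisson group means, built on the
    log scale and back-transformed.  Bonferroni adjustment replaces the
    multivariate-t quantile of the paper."""
    k = len(counts_per_group)
    z = NormalDist().inv_cdf(1 - alpha / (2 * k))  # Bonferroni-adjusted quantile
    cis = []
    for counts in counts_per_group:
        total, n = sum(counts), len(counts)
        log_mean = math.log(total / n)
        se = math.sqrt(1.0 / total)  # Var(log lambda-hat) ~ 1/(n * lambda)
        cis.append((math.exp(log_mean - z * se), math.exp(log_mean + z * se)))
    return cis

cis = poisson_log_mean_cis([[2, 3, 4], [10, 12]])
```

The small-sample warning of the abstract applies directly: with group totals this small, the normal approximation on the log scale is unreliable.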


Bretz, F., Genz, A., Hothorn, L.A. (2001): On the numerical availability of multiple comparison procedures. Biometrical Journal 43: 645-656.

McCulloch, C.E. and Searle, S.R. (2001): Generalized, linear and mixed models. John Wiley & Sons, Inc.

Rigby, R.A. and Stasinopoulos D.M. (2004): Generalized additive models for location, scale and shape. Applied Statistics, 54: 1-38.

Scale and suitable analysis
Fumihiko Hashimoto
Osaka City University
In medical papers, authors are generally quite conscious of the level of measurement scales, such as the "ordinal scale" and the "interval scale". Out of prudence they use non-parametric analysis for lower-level scales (e.g. nominal scales) and parametric or non-parametric analysis for higher-level scales (e.g. interval scales). However, this practice may sometimes produce a "false" statistical result. The author, who has worked in medical research as a statistical professional, has seen many papers that treat a higher-level scale with a lower-level analysis, following statistics textbooks. In this paper we show, using simulated data and our real data, that applying a lower-level analysis to a higher-level scale is not "prudent" but can in fact be misleading. Data measured on a given scale must be analyzed with statistics suitable for that scale.

Parametric multiple contrast tests in the presence of heteroscedasticity
Mario Hasler; Ludwig A. Hothorn
Institute of Biostatistics, Leibniz University Hannover, Germany
We describe a new method facilitating multiple contrast tests for normally distributed data in the presence of heteroscedasticity. It maintains the alpha level well, whereas readily available methods tend to yield conservative or liberal test decisions, respectively. Both differences of means and ratios of means are addressed. We compare the new method with earlier ones by alpha-simulations.
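The heteroscedastic building block underlying such methods is the Welch-type statistic with Satterthwaite's approximate degrees of freedom (both cited below). A minimal two-sample sketch, not the authors' multiple-contrast procedure:

```python
import math

def welch_t(x, y):
    """Welch's t statistic and Satterthwaite's approximate degrees of
    freedom for two samples with possibly unequal variances."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)  # sample variances
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    se2x, se2y = vx / nx, vy / ny
    t = (mx - my) / math.sqrt(se2x + se2y)
    df = (se2x + se2y) ** 2 / (se2x ** 2 / (nx - 1) + se2y ** 2 / (ny - 1))
    return t, df

t, df = welch_t([4.1, 5.0, 6.2, 5.5], [2.0, 8.5, 1.1, 9.9, 3.3])
```

A multiple contrast test plugs statistics of this form into a multivariate t-distribution with contrast-dependent degrees of freedom, which is where the conservative/liberal behaviour discussed above arises.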


• G Dilba, E Bretz, V Guiard, and L. A. Hothorn. Simultaneous confidence intervals for ratios with applications to the comparison of several treatments with a control. Methods Of Information In Medicine, 43(5):465–469, 2004.
• G Dilba, F Bretz, and V Guiard. Simultaneous confidence sets and confidence intervals for multiple ratios. Journal Of Statistical Planning And Inference, 136(8):2640–2658, August 2006.
• G Dilba and F Schaarschmidt. mratios: Inferences for ratios of coefficients in the general linear model, 2006. R package version 1.2.1.
• PA Games and JF Howell. Pairwise multiple comparison procedures with unequal n’s and/or variances: a Monte Carlo study. Journal of Educational Statistics, 1(2):113–125, 1976.
• Y Hochberg and AC Tamhane. Multiple comparison procedures. John Wiley and Sons, Inc., 1987.
• T Hothorn, F Bretz, and P Westfall. multcomp: Simultaneous Inference for General Linear Hypotheses, 2006. R package version 0.991-5.
• R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2006. ISBN 3-900051-07-0.
• FE Satterthwaite. An approximate distribution of estimates of variance components. Biometrics, 2:110–114, 1946.
• AC Tamhane and BR Logan. Finding the maximum safe dose level for heteroscedastic data. Journal of Biopharmaceutical Statistics, 14(4):843–856, 2004.
• BL Welch. The significance of the difference between two means when the population variances are unequal. Biometrika, 29:350–362, 1938.

A simulation study on the gain in power of multiple test procedures by using information on the number of true hypotheses
Claudia Hemmelmann; Andreas Ziegler, Rüdiger Vollandt
Institute of Medical Statistics, Computer Sciences and Documentation, Universit
It is known that the knowledge of the number of true hypotheses leads to increased power of some multiple test procedures. However, the number of true hypotheses is unknown in general and must be estimated. We aim at showing how the gain in power is dependent upon the accuracy of the estimation of the number of true hypotheses.
We simulate m-dimensional random vectors and employ different multiple test procedures by utilizing several upper bounds of the number of true hypotheses. We consider multiple test procedures which control the family-wise error rate (Holm method), the generalized family-wise error rate (Hommel and Hoffmann; procedure A of Korn and colleagues), the false discovery rate (Benjamini and Hochberg) and the false discovery proportion (Lehmann and Romano; procedure B of Korn and colleagues) and apply the average power and the all pairs power for the evaluation.
Clearly, the more accurate the estimate of the number of true hypotheses, the larger the gain in power. The power increases as the number of true hypotheses decreases. This increase in power also depends on the error rate and on several distribution parameters. For example, the gain in power is independent of the correlation between the vector components for the procedure of Hommel and Hoffmann, whereas it increases with increasing correlation for procedure A of Korn and colleagues.
We also compute the corresponding error rate under an underestimation of the number of true hypotheses. For some procedures and error rates no underestimation is allowed, e.g. for the Holm method and the procedure of Benjamini and Hochberg, whereas for others the number of true hypotheses can be underestimated by up to 60-70 percent, e.g. for the procedures of Hommel and Hoffmann and of Lehmann and Romano.
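The simplest way knowledge of the number of true nulls enters such procedures is by replacing m with an estimate m0 in the rejection threshold. A minimal Bonferroni-type sketch (illustrative p-values; not one of the specific procedures studied in the abstract):

```python
def adaptive_bonferroni(pvals, m0_hat, alpha=0.05):
    """Bonferroni-type rule that exploits an estimate m0_hat of the
    number of true null hypotheses: each p-value is tested against
    alpha / m0_hat instead of alpha / m."""
    return [p <= alpha / m0_hat for p in pvals]

p = [0.004, 0.02, 0.2, 0.7]
plain = adaptive_bonferroni(p, m0_hat=len(p))  # threshold 0.05/4 = 0.0125
adaptive = adaptive_bonferroni(p, m0_hat=2)    # threshold 0.05/2 = 0.025
```

With m0_hat = 2 the second hypothesis is additionally rejected, which is the power gain studied above; underestimating m0 too far inflates the error rate, as the abstract quantifies.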

On Orthogonal Series Estimation Methods
Mei Ling Hunag; Percy Brill
Department of Mathematics, Brock University, Canada
This paper discusses nonparametric orthogonal series estimation methods. The main focus is on a Hermite series density estimator and a trigonometric series density estimator. The paper gives comparisons of the properties of these two estimators with other nonparametric density estimation methods, for example, kernel density estimation and other methods. Computational simulation results are obtained. The paper also discusses several examples of applications in medical research and other fields.

Forecasting Monthly Temperature and Relative Humidity using Time series analysis
Inderjeet Kaushik; PR Maiti
Institute of Technology, Banaras Hindu University, Varanasi, India
Prediction of climatic factors like temperature and relative humidity is a stochastic process. In this paper an effort is made to model these parameters using time series analysis for forecasting monthly temperature and relative humidity. For the analysis and forecasting, monthly data from Mirzapur district for the last 12 years are used. Time series ARIMA models provide more satisfactory results than other time series models.

Study on statistical analysis for adverse drug reaction in Korea
Hyeon Jeong Kim; Eunhee Kim, Mun Sin Kim, Junghoon Jang, Bong Hyun Nam
National Institute of Toxicological Research
In developed countries, spontaneous reporting systems for adverse drug reactions and the management of their databases have been constructed systematically. However, the overall system for adverse drug reactions in Korea is insufficient compared to developed countries. In addition, reported cases of adverse drug reactions have recently increased because reporting has become mandatory, but the statistical methods for analyzing these data have not been studied sufficiently. We therefore reviewed the spontaneous reporting systems for adverse drug reactions and the statistical analysis methods of developed countries and organizations such as the USA, the UK, Australia, and the WHO, applied these statistical methods to Korean data, and compared the methods.
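One widely used signal-detection measure for spontaneous-reporting data of this kind is the proportional reporting ratio; whether it is among the methods the authors compared is not stated, so the sketch below (with invented counts) is purely illustrative:

```python
def proportional_reporting_ratio(a, b, c, d):
    """Proportional reporting ratio for a drug-event pair in a
    spontaneous-reporting database:
        a: reports of the event for the drug of interest
        b: other reports for that drug
        c: reports of the event for all other drugs
        d: other reports for all other drugs
    PRR = [a / (a + b)] / [c / (c + d)]."""
    return (a / (a + b)) / (c / (c + d))

# hypothetical counts: the event is reported 4.8 times as often
# for this drug as for the rest of the database
prr = proportional_reporting_ratio(a=20, b=380, c=100, d=9500)
```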

Testing equality of two mean vectors with uniform covariance structure when missing observations occur
Kazuyuki Koizumi; Toshiya Iwashita, Takashi Seo
Tokyo University of Science
We consider the test for equality of two mean vectors and the corresponding simultaneous confidence intervals when observations are missing at random in the intraclass correlation model. Hotelling's $T^2$ test for the equality of two mean vectors is given by an extension of Seo and Srivastava (2000) when the missing observations are of monotone type. Finally, a numerical example is presented.

Inequalities for multivariate normal probabilities of nonsymmetric rectangles
Vered Madar
Tel-Aviv University
Šidák's inequality (1967) provides a product bound for the joint normal probability of rectangles. It permits an arbitrary correlation structure and also extends to elliptically contoured distributions (Das Gupta et al., 1971). As such, it has many useful applications in multiple comparison procedures. We extend Šidák's (1967) symmetric inequality to a much stronger inequality on nonsymmetric rectangular regions, and show some applications.

Methodological issues in the design and sample size estimation of a cluster randomized trial to evaluate the effectiveness of clinical pathways
Sara Marchisio; Massimiliano Panella, Manzoli Lamberto, DiStanislao Francesco
University of Eastern Piedmont
Clinical pathways have emerged as an important tool to reduce unnecessary variation and to improve outcomes for patients. Despite enthusiasm and diffusion, the widespread acceptance of clinical pathways remains questionable because very few prospective controlled data demonstrate their effectiveness, mainly because of the complexity of the study design and management. We performed a cluster multi-centre randomized controlled clinical trial to evaluate the effect of applying clinical pathways on process and outcome indicators and on the costs sustained to assist patients with heart failure. We compared the results obtained treating patients with clinical pathways to the results obtained with usual care. Since a clinical pathway is not a single intervention to be compared with a placebo, but its eventual benefits come from a mix of complex actions implemented at the institutional level (appropriate use of practice guidelines and supplies of drugs and ancillary services, new organization and procedures, patient education, etc.), we randomly assigned hospitals, rather than individual patients, to either introduce the pathway or continue usual care. The primary outcome measure was in-hospital mortality. Since in Italy in-hospital mortality rates range from 5% to 17%, we considered a reduction of mortality to 5% under clinical pathways to be clinically relevant. Based on this goal, a sample size of 424 patients (212 in each group) was required for the study to have 80% power at the 5% significance level (two-sided). We adjusted the sample size using an inflation factor of 2.015 to account for the cluster randomization (7 clusters per trial arm, cluster size of 30 patients, ICC of 0.035).
In addition to common descriptive statistics (Fisher's exact and Kruskal-Wallis tests for categorical and continuous variables, respectively), performed at the cluster level, the differences in the rates of in-hospital deaths and unscheduled admissions across groups and according to each variable under study were evaluated using random-effects logistic regression, thus accounting for the clustering effect. Variables were included if significant at the 0.10 level (backward approach), with the exception of age, which was forced to enter. The presence of multicollinearity, interactions and higher-power terms was assessed to check the validity of the final model. A cluster randomized trial has conceptual validity and relevant advantages in terms of patient management and study expenditures. However, conducting a cluster randomized trial raises some specific ethical issues and, moreover, requires several methodological modifications in the statistical analysis and sample size estimation, as shown in this paper. The paper is therefore intended as a methodological instrument to support investigators in conducting trials that evaluate complex interventions in healthcare.
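The inflation factor quoted in the abstract is the standard design effect for cluster randomization, 1 + (m - 1) * ICC for clusters of size m; with m = 30 and ICC = 0.035 this gives exactly 2.015. A short sketch (the `inflated_sample_size` helper and the 424-patient example are illustrative arithmetic, not a restatement of the trial's final design):

```python
import math

def design_effect(cluster_size, icc):
    """Variance inflation factor for cluster randomization:
    DE = 1 + (m - 1) * ICC for clusters of size m."""
    return 1.0 + (cluster_size - 1) * icc

def inflated_sample_size(n_individual, cluster_size, icc):
    """Sample size required under cluster randomization, given the size
    needed under individual randomization."""
    return math.ceil(n_individual * design_effect(cluster_size, icc))

# the abstract's figures: clusters of 30 patients, ICC = 0.035
de = design_effect(30, 0.035)  # 1 + 29 * 0.035 = 2.015
```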

Maximum contrast tests and model selection under order restriction
Xuefei Mi; L.A. Hothorn
Biostatistics Unit, Hannover University, Germany
The use of order-restricted hypotheses is a common approach to increase power; simple order and tree order are two common types of order restrictions. In this talk we focus on how to select a suitable pattern among them. Several approaches are available for these problems, such as the max-t statistic of Hirotsu and Srivastava (2000), which can be formulated as a maximum contrast approach belonging to the broader class of multiple contrast tests (MCTs). The disadvantage of an MCT is that it can only reject the global null hypothesis; its rate of finding the true pattern of the alternative is low. Recently, Anraku (1999), Zheng and Peng (2002) and Ninomiya (2005) developed information-criterion-based log-likelihood methods for model selection under certain types of order restriction. These methods have a better rate of finding the true pattern, but do not control the alpha level: they treat the null model as one possible pattern among all the others and are not constructed as hypothesis tests rejecting the alternatives. In this talk we compare these two approaches for simple order and tree order. We also present a modification which can control the alpha level under a simple-order restriction.

Robertson, T., Wright, F.T. and Dykstra, R.L. (1988). Order restricted statistical inference. Wiley, New York.

Bretz, F. and Hothorn, L.A. (2002). Detecting dose-response using contrasts: asymptotic power and sample size determination for binomial data. Statistics in Medicine 21, 3325.

Ninomiya, Y. (2005). Information criterion for Gaussian change-point model. Statistics & Probability Letters 72, (3): 237-247

Zheng, L. and Peng, L. (2002). Model selection under order restriction. Statistics & Probability Letters 57, 301-306.

Hirotsu, C. and Srivastava, M. S. (2000). Simultaneous confidence intervals based on one-sided max t test. Statistics & Probability Letters 49, 25-37.

Adjusting for multiple testing
Mohamed Moussa
Department of Community Medicine, Faculty of Medicine, Kuwait University
Multiple hypotheses testing is a common problem in medical research. Multiple hypotheses testing theory provides a framework for defining and controlling appropriate error rates in order to protect against wrong conclusions. A one-way analysis of variance (ANOVA) is used when the effect of an explanatory factor with more than two groups on a continuous outcome variable is explored. If the ANOVA statistics show a significant difference in means between factor groups, multiple pairwise comparisons are performed to find which groups are significantly different from one another. This is done either by testing specific group differences using planned (a priori) comparisons, which are decided before the ANOVA is run, or using post-hoc (a posteriori) tests, which involve all possible comparisons between groups. Post-hoc tests are data-driven and hence inferior to thoughtful planned tests. The type 1 error increases with the number of comparisons, hence adjustments are made to preserve the overall level. If n comparisons are made and all n individual null hypotheses (H0) are true, the probability that at least one of them will be significant is 1 - (1 - alpha)^n, where alpha is the probability of falsely rejecting an individual H0. It is preferable to run a small number of planned comparisons rather than a large number of unplanned post-hoc tests. Post-hoc tests vary from conservative to liberal, the latter making no adjustment for multiple comparisons. A conservative test is one in which the actual significance level is smaller than the stated critical significance level; thus conservative tests may incorrectly fail to reject H0. The choice of post-hoc test is mainly determined by whether the variances are equal. Equal-variance post-hoc tests are either conservative (Scheffe; Tukey's honestly significant difference, HSD; Bonferroni; and Sidak) or liberal (Fisher's least significant difference, LSD; Duncan's new multiple range test; Student-Newman-Keuls, SNK).
Post-hoc tests that do not assume equal variances include the Games-Howell and Dunnett's C tests. The aim of this paper was to apply the existing multiple comparison procedures in exploring the effect of physical activity level (very light, light, moderate, and heavy) on the continuous cardiovascular risk marker total sialic acid.
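The error-inflation formula quoted above is easy to verify numerically (a generic sketch; the six-comparison example corresponds to all pairwise tests among four groups, as in this study's four activity levels):

```python
def familywise_error(alpha, n):
    """Probability of at least one false rejection among n independent
    tests when every null hypothesis is true: 1 - (1 - alpha)^n."""
    return 1.0 - (1.0 - alpha) ** n

# all 6 pairwise comparisons among 4 groups at alpha = 0.05:
# the chance of at least one spurious 'significant' result is ~26%
fwer_6 = familywise_error(0.05, 6)
```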

Our results showed that the LSD test is the most liberal post-hoc test, showing three significant comparisons (very light/light, p=0.008; very light/moderate, p=0.005; very light/heavy, p=0.012). Tukey's HSD, Bonferroni, and Sidak showed the same two significant comparisons (very light/light; very light/moderate), with p-values of 0.039 and 0.023 for Tukey's HSD, 0.047 and 0.027 for Bonferroni, and 0.046 and 0.027 for Sidak, respectively. Scheffe's test was the most conservative, showing only one significant comparison (very light/moderate, p=0.044).
Epidemiologists show less enthusiasm for formal adjustment procedures, since these increase the type 2 error and hence decrease the statistical power to detect significance. An extreme view denies the need for adjustments for multiple comparisons altogether: its proponents argue that adjustments are appropriate only if the universal H0 and omnibus HA are of interest, which in most studies they are not. Studies with a single key interest planned a priori may often generate stronger evidence on a specific hypothesis than studies with a posteriori multiple interests.

It is emphasized that adjustments for multiple testing are required in confirmatory studies whenever results from multiple tests have to be combined into one final conclusion and decision. It is suggested that multiple-comparison procedures are frequently adopted unnecessarily. Provided that a selected number of well-defined individual null hypotheses are specified a priori at the design stage, there are situations in which multiple tests of significance can be performed without adjustment of the type 1 error rate.

Biotechnology as chance for food safety
Kakha Nadiradze
Biotechnology Center of Georgia
At the beginning of the 21st century, modern bio- and microbiology techniques play a very important role in world agriculture, the environment, ecology, and scientific research. Owing to new research realities, bio- and microbiology can be considered one of the important fields of agriculture, with a number of problems that have to be resolved jointly in the interest of all countries.

Since humans began to live in settled agricultural communities, they have been involved in a constant battle to reduce the impact of pests (insects, mites, molluscs, pathogens, weeds, mammals and birds) on their crops. They have controlled these problems through manual methods, intercropping, tillage and composting, as well as more innovative methods such as the use of predatory vertebrates.

Mankind has always exploited the potential of beneficial organisms to control pests, in what we now call biological control. At its simplest, biological control or bio-control is the deliberate use of one or more organisms to control another organism that has become a pest. Within bio-control there are three different approaches:

Classical bio-control: traditionally used for permanent suppression of an alien pest through the introduction and release of co-evolved or highly specific natural enemies from the pest's area of origin.

Augmentation: the release or application of (usually indigenous) natural enemies in large numbers to control pest outbreaks.
Conservation: the promotion of practices favoring the activity of indigenous natural enemies against either native or non-native pests. There are four main categories of biological agents:
Insect parasitoids that are parasitic on other insects in early stages of development but eventually kill their hosts; most are Hymenoptera (wasps) or Diptera (flies).

Predatory invertebrates and vertebrates that eat prey species. Phytophagous (plant-eating) invertebrates associated with weeds. Microbial agents, including bacteria, viruses, fungi and nematodes. Some microbial agents or their byproducts (toxins) are formulated into bio-pesticide preparations, which are used in a similar way to chemical pesticides.

The use of biological control is not without risk; many people are aware of the disastrous impact of the cane toad, introduced in a non-scientific attempt at pest control.

Bio-control should underpin most pest management programmes to establish a sustainable balance in the environment:

It replaces or reduces the need for chemical control. It integrates readily, with little or no negative impact on the ecosystem. It is a long-term means of control. It is more cost-effective.

Statistical method for finding protein-binding sites from ChIP-chip tiling arrays
Taesung Park; Haseong Kim, Jae K. Lee
Seoul National University
Recently, high-resolution tiling chromatin-immunoprecipitation chips (ChIP-chip) have been increasingly used to find protein-binding sites, replication origins of chromosomes, and DNase hypersensitive sites. However, due to non-ignorable noise and the high resolution of tiling arrays, it is very difficult to obtain a sufficient number of biological replicates of ChIP-chip tiling arrays with high reproducibility. Further, few solid statistical methods are currently available to analyze ChIP-chip tiling arrays. We propose a new statistical method to map transcription factor IID (TFIID) binding sites using ChIP-chip tiling arrays without any replicates. The proposed method adopts a local error pooling method to control the high noise levels of tiling arrays caused by correlations between adjacent probes. Our application to real data from 38 NimbleGen ChIP-chip tiling arrays, containing a total of 14,535,659 50-mer oligonucleotides positioned at every 100 base pairs (bp) throughout the human genome, successfully identified 6,411 active promoters in human cells which are bound by the general transcription factor TFIID.

Approximate simultaneous confidence intervals for multiple contrasts of binomial proportions and poly-3-adjusted tumour rates
Frank Schaarschmidt; Martin Sill, Ludwig A. Hothorn
Institute of Biostatistics, Leibniz University Hannover
Simultaneous confidence intervals for contrasts of means in a one-way layout with k independent samples are well established for Gaussian distributed data. Procedures addressing different practical questions are available, such as all-pairs or many-to-one comparisons, comparison with the average, or different tests for order-restricted alternatives. However, if the distribution of the response is not Gaussian, corresponding methods are usually not available or not implemented. For the two cases of i) k binomial proportions (Price and Bonett, 2004) and ii) k poly-3-adjusted tumour rates (Bailer and Portier, 1988), we extended recently proposed confidence interval methods for the difference of two proportions or single contrasts to multiple contrasts by using quantiles of the multivariate normal distribution. The small sample performance of the proposed methods was investigated in simulation studies. For binomial proportions and poly-3-adjusted tumour rates, the simple adjustment of adding 2 pseudo-observations to each sample estimate leads to reasonable coverage probabilities. The methods are illustrated by the evaluation of real data examples from a clinical trial and a long-term carcinogenicity study. The proposed methods and examples are available in the R package MCPAN.

Bailer, J.A. and Portier, C.J. (1988): Effects of treatment-induced mortality and tumor-induced mortality on tests for carcinogenicity in small samples. Biometrics 44, 417-431.

Price, R.M. and Bonett, D.G. (2004): An improved confidence interval for a linear function of binomial proportions. Computational Statistics & Data Analysis 45, 449-456.
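As an illustration of the add-2 adjustment mentioned in the abstract (a minimal single-contrast sketch; the function name is hypothetical, and the multiple-contrast intervals in MCPAN use multivariate normal quantiles rather than the plain normal quantile shown here):

```python
from statistics import NormalDist

def add2_diff_ci(x1, n1, x2, n2, conf=0.95):
    """CI for p1 - p2 after adding 2 pseudo-observations (one success,
    one failure) to each sample; hypothetical name, single contrast."""
    p1 = (x1 + 1) / (n1 + 2)
    p2 = (x2 + 1) / (n2 + 2)
    se = (p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2)) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # plain normal quantile
    d = p1 - p2
    return d - z * se, d + z * se
```

The pseudo-observations shrink the estimates away from 0 and 1, which is what stabilizes the coverage in small samples.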

Let's ROC on Microarrays
Carina Silva-Fortes; Maria Antónia Amaral Turkman and Lisete Sousa
Escola Superior de Tecnologia da Saúde de Lisboa - Instituto Politécnico de Lisboa
Data from microarray experiments pose new statistical challenges, due to the exploratory nature of the experiments and the huge number of genes under investigation. Many statistical techniques are available to analyze such data, but they are sometimes too difficult to implement. We present the advantages of applying receiver operating characteristic (ROC) analysis to microarray data, in particular for the selection of genes that are differentially expressed in different known classes of tissue. We also present an example of the application of ROC analysis to select the optimal cut-off value for gene classification.

Key-words: ROC curves, microarrays, optimal cut-off, differential expression
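A minimal sketch of ROC-based cut-off selection (assuming Youden's J as the optimality criterion, which is one common choice and not necessarily the authors'; function names are hypothetical):

```python
import numpy as np

def roc_points(scores, labels):
    """Empirical (FPR, TPR) pairs, thresholding at each sorted score."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels)[np.argsort(-s)]   # labels sorted by score
    tpr = np.cumsum(y) / y.sum()
    fpr = np.cumsum(1 - y) / (1 - y).sum()
    return fpr, tpr

def youden_cutoff(scores, labels):
    """Cut-off maximising TPR - FPR (Youden's J)."""
    s = np.asarray(scores, dtype=float)
    fpr, tpr = roc_points(s, labels)
    return s[np.argsort(-s)][int(np.argmax(tpr - fpr))]
```

With gene-wise scores and known tissue classes, the returned threshold classifies a gene as differentially expressed when its score is at or above the cut-off.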

Controlling the number of false positives using the Benjamini-Hochberg Procedure
Paul Somerville; n/a
University of Central Florida
In multiple hypothesis testing, it is challenging to adequately control the rejection of true hypotheses while still maintaining reasonable power to reject false hypotheses. For very large numbers of hypotheses, using the traditional family-wise error rate (FWER) can result in very low power for testing single hypotheses. Benjamini and Hochberg (1995) proposed a powerful multiple-step procedure which controls the FDR, the “False Discovery Rate”. The procedure can, however, result in a large number of false positives.

Van der Laan, Dudoit and Pollard (2004) proposed controlling a generalized family-wise error rate k-FWER (also called gFWER(k)), defined as the probability of at least (k+1) Type I errors (k=0 for the usual FWER). Lehmann and Romano (2005) suggested new and simple methods of controlling k-FWER and the proportion of false positives (PFP) (also called False Discovery Proportion FDP).

Somerville and Hemmelmann (2006) proposed controlling k-FWER by limiting the number of steps in step-up or step-down procedures. In this paper the procedure is applied to the Benjamini-Hochberg FDR procedure. Formulas are developed and Fortran 95 programs have been written. Tables are presented giving the maximum number of steps in the Benjamini-Hochberg procedure which will assure that P(U > k) ≤ a for various values of k and a, where U is the number of false positives.
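The idea of capping the number of steps in the Benjamini-Hochberg step-up procedure can be sketched as follows (a hypothetical illustration: the paper derives the maximum step count from formulas and tables, whereas here max_steps is simply supplied by the user):

```python
def bh_limited(pvals, q=0.05, max_steps=None):
    """Benjamini-Hochberg step-up, with an optional cap on the number
    of rejections (sketch of limiting the number of steps)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    n_rej = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:   # step-up criterion p_(i) <= i*q/m
            n_rej = rank
    if max_steps is not None:
        n_rej = min(n_rej, max_steps)  # cap bounds the false positives
    return sorted(order[:n_rej])       # indices of rejected hypotheses
```

Since at most max_steps hypotheses can be rejected, the number of false positives U can never exceed that cap.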

Application of Multiple Comparison Procedures for Analysis of Naltrexone and Fluoxetine Effects for Treatment of Heroin Dependence
Elena V. Verbitskaya; Evgeny M. Krupitsky, Edwin E. Zvartau, Marina V. Tsoi-Podosenin, MD, Valentina Y
Laboratory of Biomedical Statistics, St.-Petersburg Pavlov State Medical Univers
A previous study of 52 patients randomized to naltrexone or naltrexone placebo demonstrated that naltrexone was clinically effective for preventing relapse to heroin addiction in Russia. This study was done to replicate these early results in a larger sample, and to see whether combining an SSRI antidepressant with naltrexone might alleviate the depression, anxiety and anhedonia typically associated with opioid detoxification and improve the results of naltrexone treatment. 280 heroin addicts who completed detoxification at addiction treatment units in St. Petersburg and provided informed consent were randomized to a 6-month course of biweekly drug counseling and one of four groups of 70 subjects each: Naltrexone 50 mg/day (N) + Fluoxetine 20 mg/day (F); N + Fluoxetine placebo (FP); Naltrexone placebo (NP) + F; or NP + FP. Medications were administered under double-dummy/double-blind conditions. The primary endpoint was relapse rate, and the main analysis for it was survival analysis. Several secondary endpoints assessed changes in psychometric characteristics such as craving for heroin (VASC), Global Assessment of Functioning (GAF; DSM-IV, 1994), Beck Depression Inventory (BDI; Beck et al, 1961), Brief Psychiatric Rating Scale (BPRS; Overall, Gorham, 1962), Spielberg State-Trait Anxiety Test (SSTAT; Spielberg et al, 1976), and Scale of Anhedonia Syndrome (SAS; Krupitsky et al, 1998). We encountered several problems in the statistical analysis of such data: 1) multiple secondary endpoints; 2) multiple timepoints, as all secondary endpoints were tested 3-13 times during the trial; 3) the large number of patients who relapsed or were lost to follow-up, which caused a substantial imbalance at the end of the trial and limited the use of repeated-measures MANOVA. At the end of six months, 43% of subjects in the N+F group remained in the study and had not relapsed, as compared to 36% in the N+FP group, 21% in the NP+F group, and 10% in the NP+FP group.
Combining the two samples (pilot and main) increased the sample size at the end of the 6-month period. There was a prominent effect of the drugs on characteristics of addiction (MANOVA, Games-Howell post hoc test). Repeated-measures ANOVA showed no effect of treatment on psychometric characteristics, only an effect of time; it included only those patients who stayed in the program until the end of the study. However, MANCOVA tests on the 3-month and 6-month data demonstrated an effect of naltrexone.

Optimal allocation of sample size in two-stage association studies
Shu-Hui Wen; CK Hsiao
Department of Public Health
Multiple testing occurs commonly in genome-wide association studies with dense SNP maps. With numerous SNPs, not only do genotyping cost and time increase dramatically, but most traditional family-wise error rate (FWER) controlling methods may also fail, being too conservative and losing power to detect SNPs associated with disease. Lately, more powerful two-stage strategies for multiple testing have received great attention. In this paper, we propose a grid-search algorithm for the optimal allocation of sample size under these two-stage procedures. Two types of constraints are considered: one on the overall cost and the other on the total sample size. With the proposed optimal allocation of sample size, tolerable false positive rates and greater power can be achieved within the limitations of the study design. As a general rule, the simulations indicate that allocating at least 80% of the total cost to stage one provides maximum power, as opposed to other methods. If the per-genotype cost in stage two differs from that in stage one, a lower proportion of the total cost in the earlier stage maintains good power. For a limited total sample size, evaluating all the markers on 55% of the subjects in the first stage provides maximum power while the cost reduction is approximately 43%.
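The flavour of such a grid search can be sketched as follows (a toy model, not the authors' algorithm: a single associated marker, one-sided z-tests, independent stage samples, and hypothetical function names):

```python
from statistics import NormalDist

_N = NormalDist()

def two_stage_power(effect, n1, n2, alpha1, alpha2):
    """Power for one associated marker to pass the stage-one screen
    (level alpha1) and the stage-two test (level alpha2), treating the
    two stage samples as independent one-sided z-tests (toy model)."""
    p1 = 1 - _N.cdf(_N.inv_cdf(1 - alpha1) - effect * n1 ** 0.5)
    p2 = 1 - _N.cdf(_N.inv_cdf(1 - alpha2) - effect * n2 ** 0.5)
    return p1 * p2

def best_allocation(effect, total_n, alpha1, alpha2, grid=19):
    """Grid search over the fraction of subjects used in stage one."""
    fracs = [i / (grid + 1) for i in range(1, grid + 1)]
    return max(((f, two_stage_power(effect, f * total_n,
                                    (1 - f) * total_n, alpha1, alpha2))
                for f in fracs), key=lambda t: t[1])
```

A realistic version would replace this power function with one that accounts for genotyping costs per stage and the number of markers carried forward, then search the same way.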

Nonparametric tolerance bounds for gene selection
S. Stanley Young; Gheorghe Luta
National Institute of Statistical Sciences, USA
A tolerance bound “covers” a specified proportion, P, of a distribution with a fixed level of confidence. The usual interest is in covering the central part of the distribution. For certain problems, e.g. the selection of genes from a microarray experiment for further characterization, there is a need to select a set of genes expected to contain the most extreme proportion P of the genes tested. So rather than statistically testing each gene and selecting it if some multiple-testing threshold is met, our idea is to select a set of genes that contains, with specified confidence, the most extreme genes in the set of genes tested.
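Using the standard order-statistic result that the confidence of covering an upper tail reduces to a binomial probability, the number of top-ranked genes to select can be sketched as follows (a hypothetical illustration, not necessarily the authors' construction):

```python
from math import comb

def genes_to_select(n, p_tail, conf=0.95):
    """Smallest m such that, with the given confidence, the m-th largest
    of n gene scores lies below the upper-p_tail quantile, so the top-m
    set covers the most extreme p_tail proportion (order-statistic bound)."""
    for m in range(1, n + 1):
        # confidence that the cut covers the tail = P(Bin(n, p_tail) <= m-1)
        cdf = sum(comb(n, j) * p_tail ** j * (1 - p_tail) ** (n - j)
                  for j in range(m))
        if cdf >= conf:
            return m
    return n
```

The bound is distribution-free: only the ranks of the gene scores are used, not their distribution.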


© MCP 2007 | Martin Posch & Franz König | info@mcp-conference.org | Section of Medical Statistics
Medical University of Vienna | Spitalgasse 23 | A-1090 Vienna | ++43 / (0)1 / 40400 / 7488 | updated: 10. April, 2021