This dissertation considers three topics related to extracting and merging evidence from heterogeneous sources. The problem is approached from several angles, ranging from the design of experiments to machine learning. In each area we add to the existing literature by developing novel methodology and software.
Adaptive trial designs can considerably improve upon traditional designs by modifying aspects of the ongoing trial, such as stopping early, adding or dropping doses, or changing the sample size.
We propose a two-stage Bayesian adaptive design for a Phase IIb study aimed at selecting the lowest effective dose for Phase III. In this setting, efficacy has been demonstrated for a high dose in a Phase IIa proof-of-concept study, but the existence of a lower, still effective dose is to be investigated before the scheduled Phase III starts.
In the first stage patients are randomized to placebo, maximal
tolerated dose, and one or more additional doses within the dose
range. Based on an interim analysis, the study is either stopped for
futility or success, or enters the second stage, where newly recruited
patients are allocated to placebo, a fairly high dose, and one additional dose chosen based on the interim data. At the interim analysis, criteria based on the predictive probability of success are used to decide whether to stop or continue the trial and, in the latter case, which dose to select for the second stage.
Finally, a dose is selected as the lowest effective dose for Phase III, either at the end of the first stage or at the end of the second.
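To fix ideas, the sketch below shows in R how such a predictive probability of success can be computed at an interim analysis. It assumes a single-arm binary endpoint compared with a fixed reference rate under a conjugate Beta prior; the endpoint type, the prior, the sample sizes, and the threshold are illustrative assumptions and not the design proposed here.

\begin{verbatim}
## Sketch: predictive probability of success (PPoS) at interim.
## Success = final posterior probability Pr(p > p0) > threshold.
## All numerical settings below are illustrative assumptions.
ppos <- function(y1, n1, n2, p0 = 0.3, a = 1, b = 1,
                 threshold = 0.975, nsim = 1e4) {
  ## posterior after stage 1: Beta(a + y1, b + n1 - y1)
  p_draw  <- rbeta(nsim, a + y1, b + n1 - y1)   # plausible response rates
  y2      <- rbinom(nsim, n2, p_draw)           # predictive stage-2 data
  ## for each simulated completion of the trial, check the final criterion
  success <- pbeta(p0, a + y1 + y2, b + n1 + n2 - y1 - y2,
                   lower.tail = FALSE) > threshold
  mean(success)                                 # Monte Carlo PPoS estimate
}

ppos(y1 = 14, n1 = 30, n2 = 60)  # e.g. 14 responders out of 30 at interim
\end{verbatim}

In the trial setting described above, analogous quantities would be computed per dose arm against placebo and compared with futility and success cut-offs.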
The operating characteristics of the procedure are evaluated via simulation, and results are presented for several scenarios, comparing the performance of the proposed procedure with that of a non-adaptive design.
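As an illustration of how the operating characteristics of such an interim rule can be estimated, the sketch below simulates many first stages under an assumed true response rate and applies futility and success cut-offs to the predictive probability. It reuses the ppos function from the previous sketch, and all numerical values are again illustrative rather than those of the proposed design.

\begin{verbatim}
## Sketch: operating characteristics of the interim rule by simulation.
## Reuses ppos() from the previous sketch; cut-offs are illustrative.
oc <- function(p_true, n1 = 30, n2 = 60, fut = 0.10, suc = 0.90,
               ntrial = 2000) {
  y1 <- rbinom(ntrial, n1, p_true)                 # simulated interim data
  pp <- vapply(y1, ppos, numeric(1), n1 = n1, n2 = n2)
  c(stop_futility = mean(pp < fut),                # early stop for futility
    stop_success  = mean(pp > suc),                # early stop for success
    continue      = mean(pp >= fut & pp <= suc))   # proceed to stage 2
}

oc(p_true = 0.45)   # operating characteristics under one scenario
\end{verbatim}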
The development of novel therapies in multiple sclerosis (MS) is one area where a range of surrogate
outcomes are used in various stages of clinical research. While the aim of treatments in MS is to prevent
disability, a clinical trial evaluating a drug's effect on disability progression would require a large
sample of patients with many years of follow-up. The early stage of MS is characterized by relapses. To
reduce study size and duration, clinical relapses are accepted as primary endpoints in phase III trials. For
phase II studies, the primary outcomes are typically lesion counts based on Magnetic Resonance Imaging
(MRI), as these are considerably more sensitive than clinical measures for detecting MS activity.
Recently, Sormani and colleagues \cite{sormani2010surrogate} provided a systematic review and
used weighted regression analyses to examine the role of either MRI lesions or relapses as trial-level
surrogate outcomes for disability. We build on this work by developing a Bayesian three-level model
that accommodates the two surrogates and the disability endpoint and properly takes into account that
treatment effects are estimated with error. Specifically, a combination of treatment effects based on
MRI lesion count outcomes and clinical relapses, both expressed on the log risk ratio scale, was used to
develop a study-level surrogate outcome model for the corresponding treatment effects based on
disability progression. While the primary aim in developing this model was to support decision-making
in drug development, the proposed model may also be considered for future validation.
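As a rough illustration of the modelling idea, the sketch below fits a much-simplified, two-level analogue of the proposed model in base R: estimated log risk ratios for disability are regressed on the two surrogate treatment effects, with known sampling standard errors on the outcome side plus a between-trial variance component, using a random-walk Metropolis sampler under flat priors. The full three-level model additionally treats the surrogate effects themselves as estimated with error; all variable names here are illustrative.

\begin{verbatim}
## Sketch: simplified two-level surrogate meta-regression.
## theta_hat[i] ~ N(alpha + b1*mri[i] + b2*rel[i], se[i]^2 + tau^2)
## theta_hat: estimated log risk ratios for disability progression
## mri, rel : surrogate treatment effects (assumed error-free here)
## se       : known standard errors of theta_hat
fit_surrogate <- function(theta_hat, se, mri, rel,
                          niter = 20000, step = 0.05) {
  loglik <- function(par) {             # par = (alpha, b1, b2, log tau)
    mu <- par[1] + par[2] * mri + par[3] * rel
    v  <- se^2 + exp(par[4])^2          # sampling + between-trial variance
    sum(dnorm(theta_hat, mu, sqrt(v), log = TRUE))
  }
  draws <- matrix(NA_real_, niter, 4,
                  dimnames = list(NULL, c("alpha", "b_mri",
                                          "b_rel", "log_tau")))
  cur <- c(0, 0, 0, log(0.1)); ll <- loglik(cur)
  for (i in seq_len(niter)) {
    prop <- cur + rnorm(4, 0, step)     # random-walk proposal
    llp  <- loglik(prop)                # flat priors: ratio = likelihood
    if (log(runif(1)) < llp - ll) { cur <- prop; ll <- llp }
    draws[i, ] <- cur
  }
  draws[-seq_len(niter / 2), ]          # discard burn-in
}
\end{verbatim}

Posterior summaries of b_mri and b_rel then indicate how much of the treatment effect on disability is captured by each surrogate.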
In genomics and epidemiology we often deal with a large number of features for each observation. Many well-known approaches to inference in such settings use the topology of the feature space, induced by an appropriate metric, to group observations and summarize their main characteristics, in order to reduce noise and predict an outcome of interest. In the present work we generalize this approach in the context of Loss-Based Estimation. We propose an alternative method for constructing a nonparametric multidimensional regression function, based on the simple idea of clustering data points in the feature space and then fitting a constant to the outcome within each cluster; HOPACH-PAM is used for the partitioning. The approach yields a small number of distinct, easily interpretable regions. This is illustrated through simulations, in which the method clearly outperforms CART. Pre-screening and feature selection methods are also developed to improve performance and reduce noise. Software is available in the R package HOPSLAM (HOpach-Pam Supervised Learning AlgorithM), making the methodology easily accessible.
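To make the cluster-then-average idea concrete, here is a minimal R sketch using cluster::pam as a stand-in for HOPACH-PAM; the actual implementation in HOPSLAM differs, and the function names and toy data below are illustrative.

\begin{verbatim}
## Sketch: cluster the feature space, then fit a constant per region.
## Uses cluster::pam in place of HOPACH-PAM; names are illustrative.
library(cluster)

hopslam_fit <- function(x, y, k) {
  fit <- pam(x, k)                       # partition the feature space
  list(medoids = fit$medoids,            # one representative per region
       means   = tapply(y, fit$clustering, mean))  # constant per region
}

hopslam_predict <- function(model, xnew) {
  ## assign each new point to the nearest medoid; return its region's mean
  k  <- nrow(model$medoids)
  d  <- as.matrix(dist(rbind(model$medoids, xnew)))
  id <- apply(d[-seq_len(k), seq_len(k), drop = FALSE], 1, which.min)
  unname(model$means[id])
}

## toy example: a piecewise-constant signal in a 5-dimensional space
set.seed(1)
x <- matrix(rnorm(200 * 5), 200, 5)
y <- ifelse(x[, 1] > 0, 2, -2) + rnorm(200, sd = 0.3)
m <- hopslam_fit(x[1:150, ], y[1:150], k = 4)
mean((hopslam_predict(m, x[151:200, ]) - y[151:200])^2)  # test MSE
\end{verbatim}

Unlike plain PAM with a fixed k, HOPACH-PAM builds a hierarchical tree of partitions and can choose the partition data-adaptively, which is closer to what the package implements.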