I consider the type of statistical experiment commonly referred to as an adaptive trial, in which the experimenter interacts with an individual or a set of individuals and, sequentially over time steps $t=1,\ldots,T$, observes a vector of measurements $L_1(t)$ on the individual or individuals, then assigns a treatment vector $A(t)$, and then observes a post-treatment vector of measurements $L_2(t)$. In an adaptive trial, the experimenter can update the treatment distribution at time $t$ based on the observations collected up to that point.
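One way to make this data structure explicit, with $O$, $g_t$, and $\bar{O}(t-1)$ as notation introduced here for concreteness rather than fixed above, is to write the trajectory collected on a single experimental unit as
\[
O = \big( L_1(1), A(1), L_2(1), \ldots, L_1(T), A(T), L_2(T) \big),
\]
where adaptivity amounts to allowing the treatment mechanism at time $t$ to depend on the past, that is, $A(t) \sim g_t\big(\cdot \mid L_1(t), \bar{O}(t-1)\big)$, with $\bar{O}(t-1)$ denoting all variables observed before time $t$.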
This very general formulation covers many common settings, such as dynamic treatment regimes, the stochastic contextual bandit model, and the Markov decision process model of reinforcement learning. I consider two related types of learning tasks: causal inference from data collected under an adaptive trial, and sequential decision making with the objective of either maximizing sample efficiency for an estimation task or minimizing some form of cumulative regret.
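As one concrete instance of how a familiar setting maps onto this formulation, the stochastic contextual bandit is recovered by taking $L_1(t)$ to be the context, $A(t)$ the arm pulled, and $L_2(t)$ the resulting reward. Writing $r(a, l) = \mathbb{E}\big[L_2(t) \mid A(t) = a, L_1(t) = l\big]$ (a mean-reward function introduced here for illustration, under the assumption that the reward distribution does not change over time), a standard form of cumulative regret is
\[
R_T = \sum_{t=1}^{T} \Big( \max_{a} r\big(a, L_1(t)\big) - r\big(A(t), L_1(t)\big) \Big).
\]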
My primary concern is to develop statistical methods and algorithms that are as sample efficient as possible while relying only on models that assume no more than what is known from domain knowledge, and that are therefore nonparametric.