The large scale structures (LSS) of the Universe contain a vast amount of information about the birth, evolution and composition of our Universe. To mine this information over the next decade, large scale surveys such as the Dark Energy Spectroscopic Instrument (DESI), Large Synoptic Survey Telescope (LSST), Euclid and WFIRST will probe the Universe with unprecedented precision and on the largest scales. This provides an opportunity to shed light on the longstanding mysteries regarding the true nature of dark matter and dark energy, as well as to resolve tensions between different probes of the past decade. However, to utilize the full potential of these surveys, which are no longer statistically limited, it is critical to develop analytic methods that extract the maximum amount of information across all scales. In this regard, the massive data volumes and the non-linearity and non-Gaussianity of the signal in the regimes probed by these surveys will pose outstanding challenges.
The focus of this thesis is to tackle the aforementioned challenges with a novel framework for the analysis of large scale structures, based on reconstructing cosmological fields using forward models. These forward modeling approaches are the most promising way to jointly model different cosmological observations with the requisite accuracy at all scales, while accounting for their individual systematic biases and noise. We approach reconstruction within a Bayesian formalism, optimizing the likelihood of the observed data and maximizing the corresponding posterior of the initial conditions. This requires solving optimization problems in multi-million dimensional spaces, and for this purpose we develop novel differentiable forward models as well as adiabatic optimization algorithms.
We begin by using this approach to reconstruct the initial density field from a continuous dark matter field, as well as from discrete observables like dark matter halos. We also construct a power spectrum estimator for this reconstructed Gaussian density field, which is the summary statistic for optimal analysis. With 21-cm intensity mapping, we show how this framework can instead be used for secondary analysis such as de-noising the observed data and recovering lost cosmological information. For this, we also introduce the Hidden Valley simulations, a suite of high resolution N-Body simulations, and develop models to simulate HI clustering. Having demonstrated the efficacy of this approach, we finally introduce FlowPM, a GPU-accelerated, distributed, and differentiable N-Body solver in TensorFlow that naturally interfaces with the most advanced machine learning tools. This will enable the community to efficiently solve large scale cosmological inference problems and develop differentiable forward models for the reconstruction of cosmological fields with the next generation of LSS surveys.