How neurons in the brain collectively represent stimuli is a long-standing open problem. Studies in many species, from leech and cricket to primate, show that an animal's behavior, such as gaze direction or arm movement, often correlates with a measure of neural activity termed the population vector. To construct it, one averages the preferred stimuli of the neurons, each weighted by that neuron's response. However, the population vector discards much of the information contained in the activity of the population. In the first part of this thesis we show that, for a broad class of common models of neural populations, a sufficient statistic of the population response can be constructed that is guaranteed to transmit as much information about the stimulus as the full population response. The statistic has fixed dimension, independent of the population size, and remains valid in the presence of intrinsic interneuronal correlations. It turns out to be a re-weighted version of the population vector. We validate its performance on a dataset of visual neural responses. Additionally, we show that under certain conditions this statistic can serve as a reconstruction of the stimulus itself.
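As a schematic illustration of these two objects (the notation is ours, and the exact weights depend on the response model developed in the thesis): if neuron $i$ has preferred stimulus $\theta_i$ and response $r_i$, the classical population vector and a re-weighted counterpart of the kind described above can be written as
\[
\hat{s}_{\mathrm{PV}} \;=\; \frac{\sum_{i} r_i \, \theta_i}{\sum_{i} r_i},
\qquad
T(\mathbf{r}) \;=\; \frac{\sum_{i} w_i \, r_i \, \theta_i}{\sum_{i} w_i \, r_i},
\]
where the $w_i$ are fixed, neuron-specific weights set by each cell's tuning properties (for instance, its gain and tuning width) rather than being uniform. With $w_i \equiv 1$ the statistic reduces to the ordinary population vector; in either case its dimension equals that of the stimulus space and does not grow with the number of neurons.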
Quantifying the mutual information between the inputs and outputs of a large neural circuit is an important open problem in both machine learning and neuroscience. However, evaluating the mutual information is generally intractable for large systems, owing to the exponential growth in the number of terms that must be evaluated. In the second part of this thesis we show how the information contained in the responses of large neural populations can be computed efficiently for a class of models that generalizes those considered in the first part; neural responses in this model can remain sensitive to multiple stimulus components. We show that the mutual information in this model can be effectively approximated as a sum of lower-dimensional conditional mutual information terms. The approximations become exact in the limit of large neural populations and under certain conditions on the distribution of receptive fields across the neural population. We find empirically that the approximations continue to work well even when these conditions on the receptive field distributions are not fulfilled. The computational cost of the proposed methods grows linearly in the dimension of the input and compares favorably with other approximations.
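To make the decomposition concrete (again in our own notation, as an illustrative sketch rather than the thesis's exact construction): writing the stimulus as components $\mathbf{s} = (s_1, \dots, s_D)$ and the population response as $\mathbf{r}$, the chain rule of mutual information gives the exact identity
\[
I(\mathbf{s}; \mathbf{r}) \;=\; \sum_{k=1}^{D} I\bigl(s_k;\, \mathbf{r} \mid s_1, \dots, s_{k-1}\bigr),
\]
and an approximation of the kind described above replaces each high-dimensional term on the right with a lower-dimensional conditional mutual information that can be evaluated directly, so that the total cost of the sum grows linearly in the input dimension $D$ rather than exponentially.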