This article studies the information losses that arise from the finite data sets used to train neural classifiers. It proves a relationship that ties such losses to the product of the expected total variation of the estimated neural model and the information about the feature space contained in the hidden representation of that model. It then bounds this expected total variation as a function of the size of randomly sampled data sets, in a fairly general setting and without introducing any additional dependence on model complexity. The resulting bounds on information losses are less sensitive to input compression and in general much smaller than existing bounds. The article then uses these bounds to explain some recent experimental findings of information compression in neural networks that previous work cannot explain. Finally, it shows that these bounds are not only much smaller than existing ones but also correspond well with experiments.