The human visual system efficiently represents summary statistical information (e.g. the average) along many visual dimensions. While studies have indicated that the number of items effectively integrated through this ensemble coding is approximately the square root of the set size, how those samples are determined is still unknown. Here, we report that salient items are preferentially weighted over less salient items, by demonstrating that the perceived means of spatial (i.e. size) and temporal (i.e. flicker temporal frequency (TF)) features of a group of items are increasingly positively biased as the number of items in the group increases. This illusory 'amplification effect' was the product of perceptual bias rather than decision bias. Moreover, our visual search experiments with similar stimuli suggested that this amplification effect was due to the attraction of visual attention to the salient items (i.e. large or high-TF items). These results support the idea that summary statistical information is extracted from sets with an implicit preferential weighting towards salient items. Our study suggests that this saliency-based weighting may reflect a more optimal and efficient integration strategy for the extraction of spatio-temporal statistical information from the environment, and may thus be a basic principle of ensemble coding.
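As an illustration only, the following toy simulation sketches how preferential sampling of salient items could yield a positive bias in the estimated mean that grows with set size. It is not the model or analysis used in this study. The sampling rule (selection probability proportional to item size, without replacement), the uniform size distribution, and the names `saliency_weighted_mean` and `mean_bias` are all assumptions introduced for this sketch; only the square-root sample size is taken from the text above.

```python
import math
import random
import statistics


def saliency_weighted_mean(sizes, rng):
    """Estimate a set's mean from about sqrt(N) of its items.

    ASSUMPTION (not from the study): items are drawn without replacement
    with probability proportional to their size, as a stand-in for saliency.
    """
    k = max(1, round(math.sqrt(len(sizes))))
    pool = list(sizes)
    sample = []
    for _ in range(k):
        # Pick one remaining item, favouring larger (more salient) ones.
        idx = rng.choices(range(len(pool)), weights=pool, k=1)[0]
        sample.append(pool.pop(idx))
    return statistics.fmean(sample)


def mean_bias(set_size, trials=5000, seed=0):
    """Average (estimated mean - true mean) over many random sets.

    ASSUMPTION: item sizes are uniform on [0.5, 1.5] (arbitrary units).
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        sizes = [rng.uniform(0.5, 1.5) for _ in range(set_size)]
        estimate = saliency_weighted_mean(sizes, rng)
        errors.append(estimate - statistics.fmean(sizes))
    return statistics.fmean(errors)


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(f"set size {n:2d}: mean bias = {mean_bias(n):+.3f}")
```

Under these assumptions the estimate overshoots the true mean, and the overshoot tends to increase with set size because the sampled fraction (about 1/sqrt(N)) shrinks, so the largest items make up a greater share of what is sampled. Other weighting rules would give different magnitudes.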