Climate models are indispensable tools for investigating the effect of anthropogenic emissions on current and future climate, including extremes. However, as low-dimensional approximations of the climate system, they will always exhibit biases. Several attempts have been made to correct biases that affect the prediction of extremes, predominantly by correcting the shapes of model-simulated distributions. In this study, the effectiveness of a recently published quantile-based bias correction scheme and of a new subset selection method introduced here is tested out-of-sample using model-as-truth experiments. Results show that biases in the shape of distributions tend to persist through time, so correcting for shape bias is useful for statements about the probability of extremes in a past or future period. However, for statements characterized by a ratio of the probabilities of extremes between two periods, we find that correcting for shape bias often provides no skill improvement because bias in the long-term trend dominates. Using a toy model experiment, we examine the relative importance of the shape of the distribution versus its position in response to long-term changes in radiative forcing. The experiment confirms that the relative position of the two distributions, which is set by the trend, is at least as important as their shape. We encourage the community to consider all model biases relevant to their metric of interest when using a bias correction procedure and to construct out-of-sample tests that mirror the intended application.
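
To make the distinction between shape and position concrete, the following is a minimal sketch of how a probability ratio between two periods might respond to each kind of bias. It is not the toy model used in the study: the Gaussian form, the fixed threshold, the function names `exceedance_prob` and `probability_ratio`, and every numerical value are assumptions chosen purely for illustration.

```python
import math


def exceedance_prob(threshold, mean, std):
    """P(X > threshold) for a Gaussian X with the given mean and std."""
    return 0.5 * math.erfc((threshold - mean) / (std * math.sqrt(2.0)))


def probability_ratio(threshold, trend, std):
    """Ratio of exceedance probabilities between a later and a baseline period.

    The baseline distribution is centred on 0; the later one is shifted by
    `trend`. Both share the same standard deviation (the "shape" here).
    """
    p_base = exceedance_prob(threshold, 0.0, std)
    p_later = exceedance_prob(threshold, trend, std)
    return p_later / p_base


# All values below are hypothetical, in units of the "true" standard deviation.
THRESHOLD = 2.0

# Reference ("truth"): unit spread, trend of one standard deviation.
pr_truth = probability_ratio(THRESHOLD, trend=1.0, std=1.0)

# Shape bias only: correct trend, spread too wide.
pr_shape_biased = probability_ratio(THRESHOLD, trend=1.0, std=1.3)

# Trend bias only: correct spread, trend too weak.
pr_trend_biased = probability_ratio(THRESHOLD, trend=0.5, std=1.0)

print(f"truth:        {pr_truth:.2f}")
print(f"shape-biased: {pr_shape_biased:.2f}")
print(f"trend-biased: {pr_trend_biased:.2f}")
```

In this sketch a perfect shape correction would leave `pr_trend_biased` unchanged, because the shift between the two periods is still wrong. Varying `trend` and `std` gives a rough feel for how the two sources of error trade off under these simplified assumptions.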