Two experimental protocols, pairwise rating and triplet ranking, have been commonly used for eliciting perceptual similarity judgments for faces and other objects. However, there has been little systematic comparison of the two methods. Pairwise rating has the advantage of greater precision, but triplet ranking is potentially a cognitively less taxing task and may therefore yield less noisy responses. Here, we introduce several information-theoretic measures of how useful responses from the two protocols are for the purpose of response prediction and parameter estimation. Using face similarity data collected on Amazon Mechanical Turk, we demonstrate that triplet ranking is significantly better for extracting subject-specific preferences, while the two are comparable when pooling across subjects. While the specific conclusions should be interpreted cautiously, due to the particularly simple Bayesian model for response generation utilized here, the work provides an information-theoretic framework for quantifying how repetitions within and across subjects can help to combat noise in human responses, as well as giving some insight into the nature of similarity representation and response noise in humans. More generally, this work demonstrates that substantial noise and inconsistency corrupt similarity judgments, both within and across subjects, with consequent implications for experimental design and data interpretation.