Standardized rating systems are often used to evaluate the proficiency of Motivational Interviewing (MI) counselors. The published inter-rater reliability (the degree of agreement between coders) of scores from these instruments varies widely: some studies report MI proficiency scores with only fair inter-rater reliability, and others report scores with excellent reliability. How much can we trust scores with fair versus excellent reliability? Using a Monte Carlo statistical simulation, we compared the impact of fair (0.50) versus excellent (0.90) reliability on the rates of erroneously judging a given counselor to be MI proficient or not proficient. We found that improving the inter-rater reliability of any given score from 0.50 to 0.90 would markedly reduce proficiency judgment errors, a reduction that would be critical in some MI evaluation situations. We discuss practical tradeoffs inherent in various MI evaluation situations and offer suggestions for applying findings from formal MI research to the problems faced by real-world MI evaluators, to help them minimize the MI proficiency judgment errors that carry the greatest cost.
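
The comparison described above can be illustrated with a minimal sketch. It is not the simulation used in the study; it is a simplified illustration under several assumptions. We assume a classical test theory model (observed score = true score + independent rater error), standard normal true scores, a proficiency cutoff placed at the mean of the true-score distribution, and a single score per counselor. The function name `simulate_error_rates` and its parameters are hypothetical.

```python
import random


def simulate_error_rates(reliability, cutoff=0.0, n_counselors=100_000, seed=0):
    """Estimate false-positive and false-negative proficiency judgment rates.

    Assumptions (illustrative, not taken from the study):
    - true scores are standard normal, so var(true) = 1
    - observed = true + error, with error independent of the true score
    - reliability = var(true) / var(observed)
    - a counselor is "proficient" when the score is at or above `cutoff`
    """
    rng = random.Random(seed)
    # Solve reliability = 1 / (1 + var(error)) for the error standard deviation.
    error_sd = ((1.0 - reliability) / reliability) ** 0.5

    false_pos = 0  # judged proficient, truly not proficient
    false_neg = 0  # judged not proficient, truly proficient
    for _ in range(n_counselors):
        true_score = rng.gauss(0.0, 1.0)
        observed = true_score + rng.gauss(0.0, error_sd)
        truly_proficient = true_score >= cutoff
        judged_proficient = observed >= cutoff
        if judged_proficient and not truly_proficient:
            false_pos += 1
        elif truly_proficient and not judged_proficient:
            false_neg += 1
    return false_pos / n_counselors, false_neg / n_counselors


if __name__ == "__main__":
    for r in (0.50, 0.90):
        fp, fn = simulate_error_rates(r)
        print(f"reliability={r:.2f}  false positive={fp:.3f}  false negative={fn:.3f}")
```

Under these assumptions, moving `cutoff` away from the mean makes the two error types unequal, which is one way to explore situations in which one kind of judgment error costs more than the other.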