Two probabilistic genotyping (PG) programs, STRMix™ and TrueAllele™, were used to assess the strength of the same item of DNA evidence in a federal criminal case, with strikingly different results. For STRMix, the reported likelihood ratio (LR) in favor of the non-contributor hypothesis was 24; for TrueAllele, it ranged from 1.2 million to 16.7 million, depending on the reference population. This case report seeks to explain why the two programs produced different results and to consider what the difference reveals about their reliability and trustworthiness. A locus-by-locus breakdown traces the differing results to subtle differences in modeling parameters and methods, analytic thresholds, and mixture ratios, as well as to TrueAllele's use of an ad hoc procedure for assigning LRs at some loci. These findings illustrate the extent to which PG analysis rests on a lattice of contestable assumptions and highlight the importance of rigorously validating PG programs with known-source test samples that closely replicate the characteristics of evidentiary samples. The article also points out misleading aspects of the way STRMix and TrueAllele results are routinely presented in reports and testimony, and it calls for clarification of forensic reporting standards to address those problems.