The reliability of a measurement system is studied as a precursor to establishing its accuracy. Forensic science disciplines that rely on feature-based comparisons (e.g., handwriting analysis, fingerprint analysis) have been criticized for the absence of studies demonstrating reliability and accuracy. This criticism has led to empirical evaluations through "black-box" studies. Typically, data from inter-examiner (reproducibility) studies are analyzed separately from data from intra-examiner (repeatability) studies. Motivated by these forensic studies, this dissertation develops methods to assess reliability for continuous, binary, and ordinal outcomes in forensics by combining inter-examiner and intra-examiner data for efficient estimation of reliability, while accounting for possible interactions between examiners and forensic samples. Furthermore, we propose an exploratory method to cluster raters/examiners to identify subpopulations that appear to apply similar decision-making approaches. The dissertation also includes the development of a statistical model to address measurement variability in methylomic studies.