This dissertation aims to augment current structural health monitoring (SHM) practice with an approach to modeling and quantifying uncertainty, enabling confidence-based decision-making. The SHM application domain is vibration-data-based system identification. More specifically, transmissibility and frequency response function (FRF) estimation are considered, as these are the primary forms of transfer function estimation in the frequency domain. A finite element (FE) model is established to provide a benchmark for transmissibility evaluations, and structural damage can be simulated by tuning the FE model. Two SHM features, calculated at selected arrays of points of interest, are proposed to detect and localize defects. Under realistic test conditions, all model parameters and data are subject to uncertainty from various sources. This leads to ambiguous system identification results that can cause false alarms (Type-I errors) in hypothesis testing for damage. Assuming stationary Gaussian random processes, this dissertation establishes statistical uncertainty quantification (UQ) models for different estimators, through which the uncertainties of transmissibility and FRF estimates are quantified. A perturbation approach is implemented to obtain the standard deviation and bias coefficient of transmissibility magnitude estimates. Probability density functions (PDFs) of transmissibility and FRF estimates are derived for both magnitude and phase using two methods: a chi-square approach and a bivariate Gaussian approach. The proposed statistical models are validated by Monte Carlo tests on both the FE simulation model and a real lab-scale structure. To impose a more stringent validation condition, artificial extraneous noise is added to the raw measurements.
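A minimal sketch of the estimation setting, assuming Welch-style averaging with Hann-windowed, 50%-overlapped segments. It illustrates the setting only and does not reproduce the dissertation's own UQ models. The sketch forms an H1-type transmissibility estimate T = G_xy / G_xx between a reference and a response channel. It then attaches the classical first-order normalized random error of an H1 magnitude, sqrt(1 - γ²) / (|γ| sqrt(2 n_d)), as given in standard random-data texts such as Bendat and Piersol. The function names, the synthetic two-channel signals, and the segment settings are hypothetical. The bias coefficient and the chi-square and bivariate Gaussian PDF models are not reproduced here.

```python
import numpy as np

def segment_spectra(x, nperseg, noverlap):
    """Hann-windowed, mean-removed FFTs of overlapping segments (Welch-style)."""
    step = nperseg - noverlap
    n_d = 1 + (len(x) - nperseg) // step          # number of averaged segments
    win = np.hanning(nperseg)
    segs = np.stack([x[i * step:i * step + nperseg] for i in range(n_d)])
    segs = segs - segs.mean(axis=1, keepdims=True)  # remove each segment's mean
    return np.fft.rfft(segs * win, axis=1), n_d

def transmissibility_h1(x_ref, x_resp, fs, nperseg=1024):
    """H1-type transmissibility T = G_xy / G_xx and coherence gamma^2.
    Common PSD scaling factors cancel in both ratios, so they are omitted."""
    X, n_d = segment_spectra(x_ref, nperseg, nperseg // 2)
    Y, _ = segment_spectra(x_resp, nperseg, nperseg // 2)
    gxx = np.mean(np.abs(X) ** 2, axis=0)         # auto-spectrum of reference
    gyy = np.mean(np.abs(Y) ** 2, axis=0)         # auto-spectrum of response
    gxy = np.mean(np.conj(X) * Y, axis=0)         # cross-spectrum
    f = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    coh = np.abs(gxy) ** 2 / (gxx * gyy)
    return f, gxy / gxx, coh, n_d

def magnitude_random_error(coh, n_d):
    """Classical normalized random error of an H1 magnitude estimate:
    sqrt(1 - gamma^2) / (|gamma| sqrt(2 n_d)); assumes independent segments."""
    coh = np.clip(coh, 1e-12, 1.0)
    return np.sqrt(1.0 - coh) / (np.sqrt(coh) * np.sqrt(2.0 * n_d))

# Synthetic check: the response is a scaled copy of the reference plus noise.
rng = np.random.default_rng(0)
fs, n = 1024.0, 2 ** 16
x = rng.standard_normal(n)                  # hypothetical reference channel
y = 2.0 * x + 0.1 * rng.standard_normal(n)  # hypothetical response channel
f, T, coh, n_d = transmissibility_h1(x, y, fs)
eps = magnitude_random_error(coh, n_d)
sigma_mag = eps * np.abs(T)                 # approximate std of |T| per frequency line
```

With n_d = 127 averages and a true coherence of about 0.9975, the normalized error is roughly 0.003 per frequency line. Because overlapping segments are not independent, treating n_d as the number of independent averages makes this error estimate somewhat optimistic.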
Validation results are reported as the outlier percentage relative to a preset confidence interval. This is the number of observed outliers at each frequency line, normalized by the total number of test cases. UQ results are compared across statistical models, estimators, and noise contamination levels to guide users toward the optimal estimator under given circumstances. With the statistical models available, hypothesis tests are implemented, and detection performance is compared across detectors, damage levels, and noise contamination levels. Receiver operating characteristic (ROC) curves are used to visualize these performance measures quantitatively. The area under the curve (AUC) metric is used to identify how detection rates trend as damage level and signal-to-noise conditions change, which suggests optimal frequency bands for detection. For example, even in heavily contaminated cases, detectability remains acceptable at resonances. As a decision-making problem, SHM involves making correct decisions probabilistically, with acceptable (application-dependent) Type-I error rates. At the end of this dissertation, the probability of detection for different cases and test conditions is optimized and compared under given false-alarm tolerance thresholds. With optimized detection, damage identification problems can be outlined more clearly with respect to different hypothesis designs.
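The validation and detection metrics named above can be illustrated with a short NumPy sketch. It computes the outlier percentage at each frequency line and an empirical ROC curve, built by sweeping a threshold over detector scores, together with its trapezoidal AUC. It also computes the probability of detection at a fixed false-alarm rate α, with the threshold set from the (1 - α) quantile of undamaged-state scores. The Gaussian samples stand in for Monte Carlo estimates and detector outputs and are purely hypothetical, as are all function names. The dissertation's actual detectors and confidence intervals are not reproduced.

```python
import numpy as np

def outlier_percentage(estimates, lower, upper):
    """Percentage of Monte Carlo cases outside [lower, upper] at each frequency line.
    estimates has shape (n_cases, n_freq); lower and upper have shape (n_freq,)."""
    outside = (estimates < lower) | (estimates > upper)
    return 100.0 * outside.mean(axis=0)

def empirical_roc(scores_h0, scores_h1):
    """Empirical ROC curve; damage is declared when score >= threshold."""
    thresholds = np.unique(np.concatenate([scores_h0, scores_h1]))[::-1]
    pfa = np.array([np.mean(scores_h0 >= t) for t in thresholds])
    pdet = np.array([np.mean(scores_h1 >= t) for t in thresholds])
    # Prepend the (0, 0) corner; the lowest threshold already yields (1, 1).
    return np.concatenate([[0.0], pfa]), np.concatenate([[0.0], pdet])

def area_under_curve(pfa, pdet):
    """Trapezoidal AUC over the non-decreasing false-alarm axis."""
    return float(np.sum(np.diff(pfa) * (pdet[1:] + pdet[:-1]) / 2.0))

def pd_at_pfa(scores_h0, scores_h1, alpha):
    """Probability of detection with the threshold at the (1 - alpha) quantile of H0 scores."""
    threshold = np.quantile(scores_h0, 1.0 - alpha)
    return float(np.mean(scores_h1 > threshold))

rng = np.random.default_rng(1)
# Outlier percentage for a 95% Gaussian interval (hypothetical standardized estimates).
est = rng.standard_normal((5000, 64))
pct = outlier_percentage(est, -1.96 * np.ones(64), 1.96 * np.ones(64))
# Hypothetical detector scores: undamaged (H0) vs. damaged (H1) with a unit mean shift.
h0 = rng.standard_normal(2000)
h1 = 1.0 + rng.standard_normal(2000)
pfa, pdet = empirical_roc(h0, h1)
auc = area_under_curve(pfa, pdet)
pd_5 = pd_at_pfa(h0, h1, alpha=0.05)
```

For a unit mean shift between Gaussian score distributions, the AUC should come out near Φ(1/√2) ≈ 0.76. The outlier percentage for a ±1.96σ interval should come out near 5%, which is the agreement that validation against a preset confidence interval checks for.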