Cone beam computed tomography (CBCT) is a widely available modality, but its clinical utility has been limited by low detail conspicuity and low quantitative accuracy. Post-reconstruction denoising is convenient but suffers from patterned residuals introduced by back projection, whereas joint denoising and reconstruction is typically complex and computationally expensive. In this study, we develop and evaluate a novel metric-learning guided wavelet transform reconstruction (MEGATRON) approach that enhances image-domain quality through projection-domain processing. Projection-domain processing is simple, efficient, and compatible with various reconstruction toolkits and vendor platforms. However, it typically yields inferior performance in the final reconstructed image, because the denoising goals in the projection and image domains do not necessarily align. Motivated by these observations, this work aims to translate the demand for quality enhancement from the quantitative image domain to the more easily operable projection domain. Specifically, the proposed paradigm consists of a metric learning module and a denoising network module. Via metric learning, enhancement objectives on the wavelet-encoded sinogram-domain data are defined to reflect post-reconstruction image discrepancy. The denoising network maps a measured cone-beam projection to its enhanced version, driven by the learnt objective. In doing so, the denoiser operates in a convenient sinogram-to-sinogram fashion while targeting improvement in the reconstructed image as the final goal. Implementation-wise, metric learning was formalized as optimizing the weighted fitting of wavelet subbands, and a res-Unet, a Unet structure with residual blocks, was used for denoising. To provide a quantitative reference, cone-beam projections were simulated using the X-ray based Cancer Imaging Simulation Toolkit (XCIST).
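As an illustration of what a weighted wavelet-subband objective on projection data could look like, the following is a minimal sketch and not the implementation used in this study. It assumes a single-level 2D Haar transform (the wavelet family and decomposition depth are not specified here), and the function names `haar_dwt2`, `weighted_subband_loss`, and `fit_subband_weights` are hypothetical. The weight fitting is shown as an ordinary least-squares fit of per-subband errors to image-domain errors, which is only one plausible reading of "weighted fitting of wavelet subbands".

```python
import numpy as np


def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform (assumed wavelet choice).

    Expects a 2D array with even height and width; returns four subbands.
    """
    x = np.asarray(x, dtype=np.float64)
    a = x[0::2, 0::2]  # top-left sample of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    return {
        "approx": (a + b + c + d) / 2.0,       # local average
        "col_detail": (a - b + c - d) / 2.0,   # difference between columns
        "row_detail": (a + b - c - d) / 2.0,   # difference between rows
        "diag_detail": (a - b - c + d) / 2.0,  # diagonal difference
    }


def weighted_subband_loss(pred, target, weights):
    """Sum over subbands of weight * squared error in that subband."""
    pred_bands = haar_dwt2(pred)
    target_bands = haar_dwt2(target)
    return sum(
        weights[name] * np.sum((pred_bands[name] - target_bands[name]) ** 2)
        for name in pred_bands
    )


def fit_subband_weights(subband_errors, image_errors):
    """Hypothetical weight fitting: least-squares weights w such that
    subband_errors @ w approximates the image-domain errors.

    subband_errors: (n_samples, n_subbands) per-subband squared errors.
    image_errors:   (n_samples,) post-reconstruction image discrepancies.
    """
    w, *_ = np.linalg.lstsq(subband_errors, image_errors, rcond=None)
    return w
```

With all weights equal to one, the loss reduces to the plain sum of squared errors on the projection, because the Haar transform used here is orthonormal. Non-uniform weights shift the emphasis between subbands, which is how a sinogram-domain objective of this form could be made to track image-domain discrepancy.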
For both learning modules, a data set of 123 human thoraxes from the Open-Source Imaging Consortium (OSIC) Pulmonary Fibrosis Progression challenge was used. Reconstructed CBCT thoracic images were compared against ground truth FB, and performance was assessed in terms of root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM). MEGATRON achieved an RMSE (in HU), PSNR, and SSIM of 30.97 ± 4.25, 37.45 ± 1.78, and 93.23 ± 1.62, respectively. These values are on par with reported results from sophisticated physics-driven CBCT enhancement, demonstrating the promise and utility of the proposed MEGATRON method. We have demonstrated that incorporating the proposed metric learning into sinogram denoising introduces awareness of the reconstruction goal and improves final quantitative performance. The proposed approach is compatible with a wide range of denoiser network structures and reconstruction modules, and can be adapted to suit customized needs or to further improve performance.
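For readers who want to reproduce the style of evaluation, RMSE and PSNR between a reconstruction and its reference can be computed as in the sketch below. This is a generic formulation, not the evaluation code of this study. The `data_range` argument (the intensity span used as the PSNR peak) is an assumption, since the value used here is not stated. SSIM is omitted because it requires a windowed computation; a library routine such as scikit-image's `structural_similarity` is one common option.

```python
import numpy as np


def rmse(recon, reference):
    """Root mean square error between two arrays of the same shape."""
    diff = np.asarray(recon, dtype=np.float64) - np.asarray(reference, dtype=np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))


def psnr(recon, reference, data_range):
    """Peak signal-to-noise ratio in dB.

    data_range is the assumed peak intensity span (for example, the HU
    window of interest); it is a caller-supplied assumption.
    """
    err = rmse(recon, reference)
    if err == 0.0:
        return float("inf")  # identical images
    return float(20.0 * np.log10(data_range / err))
```

Both functions operate on arrays in the same units, so RMSE is reported in HU when the inputs are HU-calibrated images.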