Machine vision-based structural health monitoring is gaining popularity because of the rich information that can be extracted from video and images. However, extracting characteristic parameters from images often requires manual intervention, which limits scalability and effectiveness. Deep learning overcomes this shortcoming because it can autonomously extract features (e.g., indicators of structural damage) from image datasets. This study therefore aims to validate the use of machine vision and deep learning for structural health monitoring by focusing on one application: detecting bolt loosening. First, a dataset of 300 images was collected, covering two bolt states: tight and loosened. Second, a Faster region-based convolutional neural network (Faster R-CNN) was trained and evaluated. On the test set, the average precision of bolt damage detection was 0.9503. Thereafter, bolts were loosened to various screw heights, and images captured from different angles and under different lighting and vibration conditions were evaluated separately. The trained model was used to confirm that bolt loosening could be detected with sufficient accuracy across these image types. Finally, the trained model was connected to a webcam to achieve real-time monitoring of bolt loosening.
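To make the reported average precision concrete, the following is a minimal sketch of how such a score is commonly computed for a single class from ranked detections. It is an illustration under stated assumptions, not the evaluation code used in this study: the abstract does not state which protocol was followed, so the sketch assumes all-point interpolation of the precision-recall curve, and it assumes each detection has already been marked as a true or false positive (for example, by matching against ground-truth boxes with an overlap threshold). The function name `average_precision` and the sample data are hypothetical.

```python
from typing import List, Tuple


def average_precision(
    detections: List[Tuple[float, bool]], num_ground_truth: int
) -> float:
    """Area under the interpolated precision-recall curve for one class.

    detections: (confidence score, is_true_positive) pairs. The matching of
        detections to ground-truth boxes is assumed to be done beforehand.
    num_ground_truth: number of ground-truth objects of this class.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")

    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    precisions: List[float] = []
    recalls: List[float] = []
    true_positives = 0
    for rank, (_, is_true_positive) in enumerate(ranked, start=1):
        if is_true_positive:
            true_positives += 1
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)

    # Replace each precision with the maximum precision at any higher recall,
    # so the curve is non-increasing (all-point interpolation).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the rectangle areas between consecutive recall values.
    ap = 0.0
    previous_recall = 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap


# Hypothetical example: three detections, two ground-truth loosened bolts.
sample = [(0.9, True), (0.8, False), (0.7, True)]
print(average_precision(sample, num_ground_truth=2))
```

A score close to 1 means that nearly all true objects are found and that confident detections are rarely wrong; a different interpolation scheme (such as 11-point sampling) would give slightly different values for the same detections.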