Imputation of Missing Traffic Flow Data by Using Denoising Autoencoders
In transportation engineering, spatio-temporal data such as traffic flow, speed, and occupancy are collected from many kinds of sensors and used by transportation engineers for analysis. Missing data, however, can significantly affect analysis and prediction results. In this thesis, Denoising Autoencoders are used to impute missing traffic flow data. First, we consider the general case and use three kinds of Denoising Autoencoders ("Vanilla", CNN, and Bi-LSTM) to impute data with a missing rate of 30%. Each model is optimized by tuning its main hyper-parameters, since tuning influences the accuracy of the final imputation result. Then, the Autoencoder models are trained and tested on data with an exceptionally high missing rate of about 80%, to test whether they remain robust under extreme loss conditions. We track how imputation accuracy changes during hyper-parameter tuning and find that, in most cases, all three models maintain good accuracy even in the worst situations. Moreover, the error patterns and trends across different sensor stations and different hours on weekdays and weekends are visualized and analyzed. Finally, based on these results, we separate the data into weekdays and weekends, train and test the models on each subset separately, and significantly improve the accuracy of the imputation result.
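As a rough illustration of the experimental setup described above, the sketch below simulates a chosen missing rate on a flow matrix and scores an imputation only on the entries that were hidden. Everything here is an assumption made for illustration rather than the thesis's own procedure: the synthetic data, the zero-filling of missing entries, the helper names `corrupt` and `masked_rmse`, and the per-station mean imputer, which merely stands in for a trained Denoising Autoencoder.

```python
import numpy as np


def corrupt(flow, missing_rate, rng):
    """Hide a random fraction of entries; return the corrupted copy and the mask."""
    mask = rng.random(flow.shape) < missing_rate  # True marks a hidden entry
    corrupted = flow.copy()
    corrupted[mask] = 0.0  # assumption: missing values are zero-filled
    return corrupted, mask


def masked_rmse(truth, imputed, mask):
    """Root-mean-square error computed on the hidden entries only."""
    diff = (truth - imputed)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))


rng = np.random.default_rng(0)
# Synthetic stand-in for real sensor data: 288 time intervals x 20 stations.
flow = rng.uniform(0.0, 500.0, size=(288, 20))

corrupted, mask = corrupt(flow, missing_rate=0.3, rng=rng)

# Placeholder imputer (NOT an autoencoder): per-station mean of observed values.
observed = np.where(mask, np.nan, flow)
station_mean = np.nanmean(observed, axis=0)
imputed = np.where(mask, station_mean, flow)

baseline_rmse = masked_rmse(flow, imputed, mask)
print(f"missing fraction: {mask.mean():.3f}, baseline RMSE: {baseline_rmse:.1f}")
```

In a setup like this, a Denoising Autoencoder would be trained to map `corrupted` back to `flow`, and changing `missing_rate` to 0.8 would reproduce the extreme-loss condition; scoring on the masked entries keeps the comparison between models and missing rates consistent.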