© 2018 Elsevier B.V. Occupancy information is crucial to building facility design, operation, and energy efficiency. Many studies propose using environmental sensors (such as carbon dioxide, air temperature, and relative humidity sensors) and radio-frequency sensors (Wi-Fi networks) to monitor, assess, and predict building occupancy. Because many methods have been developed and a variety of sensor data sources are available, selecting an appropriate model and data source is critical to the successful implementation of occupancy prediction systems. This study compared three popular machine learning algorithms, k-nearest neighbors (kNN), support vector machine (SVM), and artificial neural network (ANN), each combined with three data sources (environmental data, Wi-Fi data, and fused data), to optimize the occupancy models’ performance in various scenarios. Three error metrics, the mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean squared error (RMSE), were used to compare the models’ accuracy. In an on-site experiment, the ANN-based model with fused data performed best, while the SVM model was better suited to Wi-Fi data. The results also indicate that, compared with the independent data sources, the fused data set does not necessarily improve model accuracy but is more robust for occupancy prediction.
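
For readers unfamiliar with the three error metrics (MAE, MAPE, RMSE), the following minimal Python sketch shows their standard textbook definitions applied to occupancy counts. It is illustrative only and is not the study's code: the function names and the sample numbers are hypothetical, and skipping zero-occupancy samples in MAPE is an assumption made here to avoid division by zero, not something the abstract specifies.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average size of the errors, in occupant counts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumption: samples whose actual occupancy is zero are skipped,
    because the relative error is undefined for them.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more than MAE does."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical occupancy counts, made up for illustration only.
actual = [10, 20, 0, 40]
predicted = [12, 18, 1, 40]

print(f"MAE:  {mae(actual, predicted):.2f}")    # MAE:  1.25
print(f"MAPE: {mape(actual, predicted):.2f}")   # MAPE: 10.00
print(f"RMSE: {rmse(actual, predicted):.2f}")   # RMSE: 1.50
```

Reporting all three together is useful because they weight errors differently: MAE treats every miss equally, RMSE emphasizes occasional large misses, and MAPE expresses error relative to the actual occupancy level.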