Machine learning algorithms have been applied to shear wave velocity (Vs) inversion in surface wave tomography, where a set of starting 1-D Vs profiles and their corresponding synthetic dispersion curves are used to train a network. Previous studies showed that the performance of such a trained network depends on the diversity of the training data set, which limits its application to previously poorly understood regions. Here, we present an improved semi-supervised network that uses both model-generated and observed surface wave dispersion data in the training process. The algorithm is termed the Wasserstein cycle-consistent generative adversarial network (Wasserstein Cycle-GAN [Wcycle-GAN]). Unlike conventional supervised approaches, the GAN architecture allows unlabeled data (the observed surface wave dispersion) to be included in the training process, where they complement the model-generated data set. Cycle consistency and the Wasserstein metric significantly improve the training stability of the proposed algorithm. We benchmark the Wcycle-GAN method using 4,076 pairs of fundamental mode Rayleigh wave phase and group velocity dispersion curves measured at periods from 3 to 16 s in Southern California. The final 3-D Vs model given by the best trained network shows large-scale features consistent with the surface geology. The resulting Vs model has reasonable data misfits and provides sharper images of structures near faults in the top 15 km than those from conventional machine learning methods.
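
To make the two stabilizing ingredients concrete, the following is a minimal NumPy sketch of a Wasserstein critic loss and a cycle-consistency loss evaluated on toy data. It is an illustration under stated assumptions, not the network used in this study: the linear maps `G`, `F`, and `w`, the array sizes, the velocity ranges, and the weight `lam` are all hypothetical placeholders, and the actual architecture and loss weighting are not specified in this abstract.

```python
import numpy as np


def wasserstein_critic_loss(critic_real, critic_fake):
    """Critic objective to minimize: mean score on generated samples
    minus mean score on real samples (negative Wasserstein estimate)."""
    return float(np.mean(critic_fake) - np.mean(critic_real))


def cycle_consistency_loss(x, x_cycled):
    """Mean absolute (L1) error between a sample and its round-trip copy."""
    return float(np.mean(np.abs(x - x_cycled)))


# --- Toy setup: every size and value below is a hypothetical assumption ---
rng = np.random.default_rng(0)
n_periods, n_layers, n_samples = 14, 8, 32

# Stand-ins for the two generators and the critic (real ones would be deep nets).
G = 0.1 * rng.normal(size=(n_periods, n_layers))  # dispersion -> Vs profile
F = 0.1 * rng.normal(size=(n_layers, n_periods))  # Vs profile -> dispersion
w = rng.normal(size=n_layers)                     # linear critic on Vs profiles

# Unlabeled "observed" dispersion curves and model-generated Vs profiles (km/s).
disp_obs = rng.uniform(2.5, 3.8, size=(n_samples, n_periods))
vs_model = rng.uniform(2.0, 4.5, size=(n_samples, n_layers))

vs_fake = disp_obs @ G       # Vs profiles predicted from observed dispersion
disp_cycled = vs_fake @ F    # mapped back to dispersion (one full cycle)

critic_loss = wasserstein_critic_loss(vs_model @ w, vs_fake @ w)
cycle_loss = cycle_consistency_loss(disp_obs, disp_cycled)

lam = 10.0  # hypothetical weight on the cycle term
generator_loss = -float(np.mean(vs_fake @ w)) + lam * cycle_loss

print(f"critic loss:    {critic_loss:.4f}")
print(f"cycle loss:     {cycle_loss:.4f}")
print(f"generator loss: {generator_loss:.4f}")
```

In this sketch the observed dispersion curves enter only through the critic and cycle terms, so they need no paired Vs label, which is the sense in which unlabeled data can take part in training.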