We propose a robust multi-modal method for automatically registering a moving camera (e.g., one mounted on a robot) in a 3D scene map. Our approach takes advantage of both Global Positioning System (GPS) and visual sensors to obtain a high-precision geographic location for the moving camera at each time step. The proposed method differs from previous work in three aspects: i) we introduce a spatial pyramid mesh warping method to obtain dense correspondences between consecutive frames, which can be used to remove unexpected camera motion for robust registration; ii) we introduce a robust feature tracking method to track feature points across consecutive frames; and iii) we use a continuous polynomial function to describe camera motion with respect to time, which is solved by minimizing the errors of interpolating both the visual observations and the GPS locations. We evaluate the proposed method on a set of challenging videos for both stabilization and registration tasks. Comparisons with other popular methods show that our method achieves high-quality results under various challenges, e.g., lighting changes, motion blur, and scene noise.
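To make aspect iii) concrete, the following is a minimal sketch of fitting a polynomial trajectory to GPS and visual observations jointly. It is not the formulation used in this work. It assumes that GPS supplies absolute positions, that the visual pipeline supplies frame-to-frame displacements, and that both enter a single weighted linear least-squares problem. The function names, the weights `w_gps` and `w_vis`, and the polynomial degree are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_polynomial_trajectory(t_gps, gps_pos, t_vis, vis_disp,
                              degree=3, w_gps=1.0, w_vis=1.0):
    """Fit p(t) = sum_k c_k * t**k per coordinate by weighted least squares.

    t_gps:    (N,)   timestamps of GPS fixes
    gps_pos:  (N, D) GPS positions (assumed absolute observations)
    t_vis:    (M+1,) frame timestamps
    vis_disp: (M, D) visually estimated displacement between consecutive
              frames (assumed relative observations)
    Returns the coefficient matrix of shape (degree + 1, D).
    """
    t_gps = np.asarray(t_gps, dtype=float)
    t_vis = np.asarray(t_vis, dtype=float)

    # Rows [1, t, t^2, ...]: p(t) is linear in the coefficients.
    a_gps = np.vander(t_gps, degree + 1, increasing=True)

    # p(t_{i+1}) - p(t_i) is also linear in the coefficients.
    v = np.vander(t_vis, degree + 1, increasing=True)
    a_vis = v[1:] - v[:-1]

    # Stack both residual types into one weighted system.
    a = np.vstack([w_gps * a_gps, w_vis * a_vis])
    b = np.vstack([w_gps * np.asarray(gps_pos, dtype=float),
                   w_vis * np.asarray(vis_disp, dtype=float)])

    coeffs, *_ = np.linalg.lstsq(a, b, rcond=None)
    return coeffs


def evaluate_trajectory(coeffs, t):
    """Evaluate the fitted polynomial at one or more times t."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.vander(t, coeffs.shape[0], increasing=True) @ coeffs
```

In this sketch the relative weights trade off trust in sparse, noisy GPS fixes against dense visual displacements. The displacements alone cannot determine the constant term of the polynomial, so at least one absolute GPS observation is needed to anchor the trajectory.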