Visual odometry (VO) is the process of estimating the egomotion of an agent (e.g., a vehicle, human, or robot) using only the input of one or more attached cameras. In this thesis, we present a stereo visual odometry system that estimates the camera pose and the surrounding three-dimensional structure from successive stereo image pairs. To generate 3D-to-2D point correspondences, linear triangulation is performed within each stereo pair, and a descriptor-based method is used to track feature points between frames. From these correspondences, the camera pose is initially estimated with a perspective-three-point (P3P) algorithm, using random sample consensus (RANSAC) for outlier rejection. Finally, the local estimate of the camera trajectory is refined by windowed bundle adjustment. Our algorithm differs from most visual odometry algorithms in three key respects: (1) it makes no prior assumptions about the camera motion, (2) it performs well under both small motions (e.g., wearable devices) and large motions (e.g., vehicles), and (3) it works in both indoor and outdoor environments. We evaluate the system on 22 km of outdoor sequences and 0.9 km of indoor sequences, achieving an average translation error of 2.09% and an average rotation error of 0.0067 deg/m, evaluated over 800 m segments of travel.
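
To make the pipeline concrete, the following is a minimal sketch of the pose-estimation front end (triangulation, descriptor tracking, and P3P within RANSAC), assuming OpenCV's Python bindings. It is not the thesis implementation: the ORB descriptor, the RANSAC reprojection threshold, and all function and variable names below are illustrative assumptions, and the camera matrices K, P_left, and P_right are placeholders that must come from stereo calibration.

```python
import numpy as np
import cv2


def estimate_pose(prev_left, prev_right, cur_left, K, P_left, P_right):
    """Estimate the pose of the current left camera relative to the
    previous stereo pair. All parameter names are illustrative:
    K is the 3x3 intrinsic matrix; P_left and P_right are the 3x4
    projection matrices of the rectified stereo cameras."""
    orb = cv2.ORB_create(nfeatures=2000)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

    # Detect and describe features in all three images.
    kp_l, des_l = orb.detectAndCompute(prev_left, None)
    kp_r, des_r = orb.detectAndCompute(prev_right, None)
    kp_c, des_c = orb.detectAndCompute(cur_left, None)

    # (1) Match features across the previous stereo pair and linearly
    # triangulate the matches into 3D points (homogeneous -> Euclidean).
    stereo = matcher.match(des_l, des_r)
    pts_l = np.float32([kp_l[m.queryIdx].pt for m in stereo]).T  # 2xN
    pts_r = np.float32([kp_r[m.trainIdx].pt for m in stereo]).T  # 2xN
    X_h = cv2.triangulatePoints(P_left, P_right, pts_l, pts_r)   # 4xN
    X = (X_h[:3] / X_h[3]).T                                     # Nx3

    # (2) Track the same left-image features into the current frame by
    # descriptor matching, yielding 3D-to-2D correspondences.
    des_tracked = np.uint8([des_l[m.queryIdx] for m in stereo])
    temporal = matcher.match(des_tracked, des_c)
    obj_pts = np.float32([X[m.queryIdx] for m in temporal])
    img_pts = np.float32([kp_c[m.trainIdx].pt for m in temporal])
    if len(obj_pts) < 4:
        raise RuntimeError("too few correspondences for P3P")

    # (3) P3P pose estimation inside a RANSAC loop for outlier rejection.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj_pts, img_pts, K, distCoeffs=None,
        flags=cv2.SOLVEPNP_P3P, reprojectionError=2.0)
    if not ok:
        raise RuntimeError("P3P/RANSAC pose estimation failed")

    R, _ = cv2.Rodrigues(rvec)  # convert axis-angle rvec to a 3x3 rotation
    # (R, tvec) maps points from the previous left-camera frame into the
    # current camera frame; the camera motion itself is the inverse.
    return R, tvec, inliers
```

In the full system, the per-frame poses produced by this step would then seed the windowed bundle adjustment, which jointly refines the recent camera poses and 3D points by minimizing reprojection error with a nonlinear least-squares solver.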