Navigating through natural environments requires localizing objects along three distinct spatial axes. Information about position along the horizontal and vertical axes is available from an object's position on the retina, while position along the depth axis must be inferred based on second-order cues such as the disparity between the images cast on the two retinae. Past work has revealed that object position in two-dimensional (2D) retinotopic space is robustly represented in visual cortex and can be robustly predicted using a multivariate encoding model, in which an explicit axis is modeled for each spatial dimension. However, no study to date has used an encoding model to estimate a representation of stimulus position in depth. Here, we recorded BOLD fMRI while human subjects viewed a stereoscopic random-dot sphere at various positions along the depth (z) and the horizontal (x) axes, and the stimuli were presented across a wider range of disparities (out to ∼40 arcmin) compared to previous neuroimaging studies. In addition to performing decoding analyses for comparison to previous work, we built encoding models for depth position and for horizontal position, allowing us to directly compare encoding between these dimensions. Our results validate this method of recovering depth representations from retinotopic cortex. Furthermore, we find convergent evidence that depth is encoded most strongly in dorsal area V3A.