X-ray fluorescence tomography is based on the detection of fluorescence x-ray photons produced following x-ray absorption while a specimen is rotated; it provides information on the 3D distribution of selected elements within a sample. One limitation in the quality of sample recovery is the separation of elemental signals due to the finite energy resolution of the detector. Another limitation is the effect of self-absorption, which can lead to inaccurate results with dense samples. To recover a higher quality elemental map, we combine x-ray fluorescence detection with a second data modality: conventional x-ray transmission tomography using absorption. By using these combined signals in a nonlinear optimization-based approach, we demonstrate the benefit of our algorithm on real experimental data and obtain an improved quantitative reconstruction of the spatial distribution of dominant elements in the sample. Compared with single-modality inversion based on x-ray fluorescence alone, this joint inversion approach reduces ill-posedness and should result in improved elemental quantification and better correction of self-absorption.