Software reproducibility is important for re-usability and the
cumulative progress of research. An important manifestation of
unreproducible software is the outcome of software builds changing
over time. Although it enhances code reuse, the use of open-source
dependency packages hosted on centralized software repositories like
PyPI can have adverse effects on build reproducibility. Frequent
updates to these packages often cause their latest versions to introduce
breaking changes for applications using them. Large Python
applications risk having their historical builds become unreproducible
because of the widespread use of Python dependencies and the lack of
uniform practices for dependency version specification. Manually
fixing dependency errors requires expensive developer time and effort,
while automated approaches face challenges such as parsing
unstructured build logs, finding transitive dependencies, and dealing
with an exponential search space of dependency versions. In this
thesis, we investigate how open-source Python projects specify
dependency versions, and how their reproducibility is impacted by
dependency packages. We propose a tool, PyDFix, to detect and fix
unreproducibility in Python builds caused by dependency version
errors. The ability of PyDFix to fix unreproducible builds is evaluated
on two bug datasets, BugSwarm and BugsInPy, both of which are built
from real-world open-source projects. PyDFix analyzes a total of 2,702
builds, identifying 1,921 (71.1%) of them as unreproducible due to
dependency errors. Of these, PyDFix provides a complete fix for 859
(44.7%) builds, and partial fixes for an additional 632 (32.9%)
builds.
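
To make the notion of non-uniform dependency version specification
concrete, the following minimal sketch classifies a single
requirements.txt-style line as pinned, constrained, or unconstrained.
It is an illustration only and is not taken from PyDFix: the function
name classify_requirement and the three category labels are
hypothetical, and the sketch deliberately ignores extras, environment
markers, and URL requirements.

```python
import re

# Hypothetical helper for illustration; not part of PyDFix.
# Splits a requirement line into a package name and the remaining specifier text.
_REQUIREMENT = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(.*)$")


def classify_requirement(line: str) -> str:
    """Classify one requirement line by how strictly it fixes the version."""
    line = line.split("#", 1)[0].strip()  # drop trailing comments
    match = _REQUIREMENT.match(line)
    if match is None:
        return "invalid"  # empty line or no recognizable package name

    specifier = match.group(2).strip()
    if not specifier:
        # e.g. "requests": any future release may be installed
        return "unconstrained"

    clauses = [clause.strip() for clause in specifier.split(",")]
    if len(clauses) == 1 and clauses[0].startswith("==") and "*" not in clauses[0]:
        # e.g. "requests==2.25.1": exactly one version is allowed
        return "pinned"

    # e.g. "requests>=2.0,<3" or "numpy==1.*": a range of versions is allowed
    return "constrained"
```

Under these assumptions, only a pinned line always requests the same
version of that package; unconstrained and constrained lines can
resolve to a newer release when the same build is repeated later.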