Thesis
Peer Reviewed

Fixing Dependency Errors for Python Build Reproducibility

Mukherjee, Suchita
Advisor(s): Rubio-González, Cindy

UC Davis Electronic Theses and Dissertations (2021)

Software reproducibility is important for reusability and the cumulative progress of research. An important manifestation of unreproducible software is a software build whose outcome changes over time. Although open-source dependency packages hosted on centralized software repositories like PyPI enhance code reuse, they can have adverse effects on build reproducibility: frequent updates often introduce breaking changes in the latest versions for the applications that use them. Large Python applications risk having their historical builds become unreproducible because of the widespread use of Python dependencies and the lack of uniform practices for specifying dependency versions. Manually fixing dependency errors requires expensive developer time and effort, while automated approaches face challenges such as parsing unstructured build logs, finding transitive dependencies, and dealing with an exponential search space of dependency versions. In this thesis, we investigate how open-source Python projects specify dependency versions and how their reproducibility is affected by dependency packages. We propose a tool, PyDFix, to detect and fix unreproducibility in Python builds caused by dependency version errors. The ability of PyDFix to fix unreproducible builds is evaluated on two bug datasets, BugSwarm and BugsInPy, both of which are built from real-world open-source projects. PyDFix analyzes a total of 2,702 builds and identifies 1,921 (71.1%) of them as unreproducible due to dependency errors. Of these, PyDFix provides a complete fix for 859 (44.7%) builds and partial fixes for an additional 632 (32.9%) builds.
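
To make the point about non-uniform dependency version specification concrete, the following is a minimal sketch that classifies lines from a requirements.txt-style file as pinned, ranged, or unconstrained. It is an illustration only and is not taken from PyDFix: the function name `classify_requirement`, the three category labels, and the simplified parsing (plain regex, no full PEP 508 support) are all assumptions made for this example.

```python
import re

# Hypothetical helper for illustration; not part of PyDFix.
# Matches a package name, optional extras in brackets, then the version specifier.
_REQ = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(\[[^\]]*\])?\s*(.*)$")


def classify_requirement(line):
    """Return (name, kind) for a requirement line, or None if it is not one.

    kind is 'pinned' (exact version), 'ranged' (any other constraint),
    or 'unconstrained' (no version given, so the latest release is installed).
    """
    # Drop trailing comments and environment markers (simplified handling).
    line = line.split("#", 1)[0].split(";", 1)[0].strip()
    if not line:
        return None
    match = _REQ.match(line)
    if match is None:
        # Options such as "-r other.txt" or "-e ." are skipped.
        return None
    name, spec = match.group(1), match.group(3).strip()
    if not spec:
        return (name, "unconstrained")
    # A single "==" clause without a wildcard fixes one exact version.
    if spec.startswith("==") and "*" not in spec and "," not in spec:
        return (name, "pinned")
    return (name, "ranged")


if __name__ == "__main__":
    for req in ["requests==2.25.1", "numpy>=1.19,<2", "flask", "django==3.*"]:
        print(req, "->", classify_requirement(req))
```

Under these assumptions, only the pinned form installs the same version of that package on every build; ranged and unconstrained entries can resolve to a newer release later, which is one way a historical build can stop reproducing. Even a pinned entry does not by itself fix the versions of transitive dependencies.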
