Our thesis is that software repositories contain latent information that can be mined to enable quantitative decision making. Decision making in software development and maintenance depends mostly on the experience and intuition of software practitioners. For example, developers rely on their experience when prioritizing bug fixes, and managers allocate development and testing resources based on their intuition. These human-driven decisions often lead to wasted resources and increase the cost of building and maintaining large, complex software systems. The fundamental problem that motivates this dissertation is the lack of techniques that can automate the decision-making process in software engineering. As data mining techniques have matured, mining software repositories has emerged as a novel methodology for analyzing the massive amounts of data created during the software development process. This mining can identify significant, repeatable patterns and behaviours in software development, which are often used to predict various aspects of software development and maintenance, such as defect-prone releases, software quality, or bug-fix time.
In this dissertation, we present techniques that effectively mine software repositories, identify significant patterns in software development and maintenance, and recommend actions that help automate the decision-making process in software engineering.
We demonstrate efficient techniques that use the information stored in software repositories to produce results that guide software practitioners, so that they can depend less on intuition and experience and more on actual data.
To this end, we make several contributions.
First, we perform several empirical studies to characterize the information needs of software developers and managers in the context of decision making during software development and maintenance. For example, we study which kinds of decision-making problems matter to software practitioners on a daily basis. Second, to facilitate the analysis of various types of decision-making problems on a common platform, we design a generic mixed-graph model that captures associations among different software elements. We illustrate how different types of hyper-edges can be built on this mixed graph to quantify amorphous behaviour and dependencies among various software elements. Third, to demonstrate the effectiveness of our framework, we formalize a set of four important decision-making problems that are challenging to address with the state of the art. We show that our framework achieves high prediction accuracy for different types of decision-making problems when tested on large, widely used, long-lived, real-world software projects.