Finding and fixing bugs is a major, time- and effort-consuming task for software quality assurance in the software development process. When a bug is filed, valuable multi-dimensional information is captured by the bug report and stored in the bug tracking system. However, developers and researchers have so far used only part of this information (e.g., the detailed description of a failure and occasional hints at the location of the fault in the code), and for limited purposes, e.g., finding and fixing bugs, detecting duplicate bug reports, or improving bug triaging accuracy. We contend that this information is useful not only for software testing and debugging but also for product understanding, software evolution, and software management. This dissertation makes several advances in extracting actionable information from bug reports using data mining and natural language processing techniques. Both software developers and researchers can benefit from our approach.
We first examine differences in bugs and bug-fixing processes between desktop and smartphone applications. Specifically, we pursue two main thrusts: a quantitative analysis to discover similarities and differences between desktop and smartphone bug reports and processes, and a qualitative analysis in which we extract topics from bug reports to understand the nature and categories of bugs, as well as differences between platforms.
Next, we present an approach for understanding the differences between concurrency and non-concurrency bugs and among various concurrency bug classes, and for predicting bug quantity, type, and location from patches, bug reports, and bug-fix metrics.
In addition, we found that bugs of different severities have so far been put into the same category, even though their characteristics differ significantly. Moreover, the nature of issues with the same severity, e.g., high severity, differs markedly between desktops and smartphones. To understand these differences, we perform an empirical study on 72 Android and desktop projects. We study how severity changes, quantify the differences between severity classes in terms of bug-fixing attributes, and analyze how the topics differ across classes on each platform over time.
Finally, we propose a novel delta debugging technique that reduces the length of event traces by using a record&replay scheme. When we capture the event sequence while executing the application, an event dependency graph (EDG) is generated. We then use the EDG to guide the delta debugging algorithm by eliminating irrelevant events. Filtering out events that are irrelevant to the crash in this way can significantly improve the debugging process.
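The idea of EDG-guided trace reduction can be illustrated with a minimal sketch. It is not the dissertation's implementation: it assumes that the EDG is available as a mapping from each event to the events it directly depends on, that event identifiers are unique within a trace, and that "irrelevant" means "not reachable backwards from the crashing event". The names `relevant_events`, `ddmin`, and `crashes`, as well as the toy trace, are hypothetical; `ddmin` is a simplified variant of the classic delta debugging minimization loop.

```python
from typing import Callable, Dict, List, Set

Trace = List[str]


def relevant_events(trace: Trace, edg: Dict[str, Set[str]], target: str) -> Trace:
    """Keep only the events that `target` transitively depends on (plus `target`).

    Assumption: edg[e] is the set of events that e directly depends on.
    The original order of the trace is preserved.
    """
    needed: Set[str] = set()
    stack = [target]
    while stack:
        event = stack.pop()
        if event in needed:
            continue
        needed.add(event)
        stack.extend(edg.get(event, ()))
    return [e for e in trace if e in needed]


def ddmin(trace: Trace, fails: Callable[[Trace], bool]) -> Trace:
    """Simplified ddmin: shrink `trace` while `fails` still reports the crash."""
    n = 2  # current number of chunks
    while len(trace) >= 2:
        chunk = max(1, len(trace) // n)
        subsets = [trace[i:i + chunk] for i in range(0, len(trace), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            complement = [e for j, s in enumerate(subsets) if j != i for e in s]
            if fails(subset):            # a single chunk already reproduces the crash
                trace, n, reduced = subset, 2, True
                break
            if fails(complement):        # the chunk can be dropped
                trace, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(trace):          # finest granularity reached
                break
            n = min(len(trace), n * 2)
    return trace


# Toy example (hypothetical): the crash needs events a, c, f in this order.
def crashes(candidate: Trace) -> bool:
    it = iter(candidate)
    return all(event in it for event in ["a", "c", "f"])


trace = ["a", "b", "c", "d", "e", "f"]
edg = {"f": {"c"}, "c": {"a"}, "d": {"b"}}   # event -> events it depends on

pruned = relevant_events(trace, edg, target="f")
minimal = ddmin(pruned, crashes)
print(pruned, minimal)
```

In this sketch the EDG pruning runs before the replay-based search, so `ddmin` starts from a shorter trace and needs fewer replays of the application to reach a minimal crashing sequence.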