Analyzing Outliers in the Maven Central Repository with Object Oriented Design Metrics
Software quality is increasingly becoming a differentiator between software products. This has resulted in the development of new and improved approaches to software development, such as object-orientation, and of software metrics to better manage the development process. Analyzing a large collection of software projects that use object-oriented programming in terms of object-oriented design quality metrics that measure the size, complexity, performance and quality of software could give us a good idea of how object-oriented programming is used in practice. Analyzing software projects at the extreme ends of the distributions of these metrics could give us specific examples of coding practices that influence design quality. This analysis can be used to inform machine learning models that estimate code quality based on design metrics.
In this thesis, I generated a repository containing the latest version of 226,793 software projects taken from the Maven Central Repository. I statically analyzed each of these projects and measured 18 class-level design quality metrics for 10,608,920 classes and 8 package-level design quality metrics for 2,107,577 packages. I analyze the outlier projects in terms of object-oriented design quality metrics and evaluate their suitability for inclusion in training sets for machine learning models.
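To make the notion of an outlier project concrete, the following is a minimal sketch of one common way to flag values at the extreme ends of a metric's distribution. The abstract does not state which outlier criterion is used, so the Tukey fence rule (1.5 times the interquartile range), the function name `tukey_outliers`, and the sample numbers are all assumptions made for illustration only.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Split out values that fall outside the Tukey fences.

    Assumption: outliers are defined by the interquartile-range rule;
    the thesis may use a different criterion.
    """
    # Quartiles of the observed values (inclusive method treats the data
    # as the whole population rather than a sample).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    low = [v for v in values if v < lower]
    high = [v for v in values if v > upper]
    return low, high


# Hypothetical per-project averages of a single design metric
# (made-up numbers, not data from the thesis).
sample = [2, 3, 3, 4, 4, 5, 5, 6, 40]
low, high = tukey_outliers(sample)
print("low outliers:", low)
print("high outliers:", high)
```

In practice the same check would be repeated for each class-level and package-level metric, and a project could be treated as an outlier if it falls outside the fences for one or more of them; that aggregation rule is likewise an assumption.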