Software quality is increasingly becoming a differentiator between software products. This
resulted in the development of new and improved approaches to software development
such as object orientation, and of software metrics to better manage the
process of software development. Analyzing a large collection of software projects that use
object-oriented programming in terms of object-oriented design quality metrics that
measure the size, complexity, performance and quality of software could give us a good
idea of how object-oriented programming is used in practice. Analyzing software projects
at the extreme ends of the distributions of these metrics could give us specific examples of
coding practices that influence design quality. This analysis can be used to inform
machine learning models that estimate code quality based on design metrics.
In this thesis, I generated a repository containing the latest version of 226,793 software
projects taken from the Maven Central Repository. I statically analyzed each of these
projects and measured 18 class-level design quality metrics for 10,608,920 classes and 8
package-level design quality metrics for 2,107,577 packages. I then analyzed the outlier projects
in terms of object-oriented design quality metrics and evaluated their suitability for
inclusion in training sets for machine learning models.