Classifying Code Changes and Predicting Defects Using Change Genealogies
by Kim Herzig, Sascha Just, Andreas Rau, Andreas Zeller

November 2013. Under submission for MSR2013.

Abstract

Identifying bug fixes and using them to estimate or even predict software quality is a frequent task when mining version archives. The number of applied bug fixes serves as a code quality metric that distinguishes defect-prone from non-defect-prone code artifacts. But when is a set of applied code changes considered a bug fix, and which metrics should be used to build high-quality defect prediction models? In this work, we use change genealogy graphs to define a set of change genealogy network metrics that describe the structural dependencies of change sets. We further investigate whether change genealogy metrics can be used to identify bug-fixing change sets (without using commit messages and bug databases) and whether they are expressive enough to build effective defect prediction models that classify source files as defect-prone or not. The results show that change genealogy metrics can separate bug-fixing from feature-implementing change sets with an average precision of 72% and an average recall of 89%. Our results also show that defect prediction models based on change genealogy metrics can predict defect-prone source files with precision and recall values of up to 80%. On average, the precision of change genealogy models is 69% and the recall is 81%. Compared to prediction models based on code dependency network metrics, change-genealogy-based prediction models achieve better precision and comparable recall.

BibTeX Entry

@techreport{herzig-genealogytechreport-2011,
    title = "Classifying Code Changes and Predicting Defects Using Change Genealogies",
    author = "Kim Herzig and Sascha Just and Andreas Rau and Andreas Zeller",
    year = "2013",
    month = nov,
}
