Untangling Changes
by
Kim Herzig, Andreas Zeller
September 2011. Under submission.
Abstract
When developers commit software changes to a version control system, they often commit unrelated changes in a single transaction, simply because while, say, fixing a bug in module A, they also came across a typo in module B and updated a deprecated call in module C. When such archives are analyzed later, the changes to A, B, and C are falsely treated as related. In an evaluation of five Java projects, we found that up to 15% of all fixes consist of multiple unrelated changes, compromising the resulting analyses through noise and bias. We present the first approach to untangle such combined changes after the fact. By taking into account data dependencies, distance measures, change couplings, test impact couplings, and distances in call graphs, our approach untangles tangled changes with a mean success rate of 63–75%. We recommend that such untangling be considered a mandatory step in mining software archives.
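The abstract lists the signals the approach takes into account but not how they are combined. The Python sketch below shows one generic way such signals could be merged, under stated assumptions. Each hypothetical measure maps a pair of change operations to a distance in [0, 1], and a weighted average combines them. Greedy average-linkage clustering then merges changes until a chosen number of groups `k` remains. The `Change` model, the three measures, the equal weights, and the assumption that `k` is known in advance are all illustrative and are not the authors' algorithm. Change couplings and test impact couplings are omitted because they require project history and test data.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence, Tuple


@dataclass(frozen=True)
class Change:
    """Hypothetical model of one atomic change operation in a commit."""
    ident: str
    file: str
    line: int
    data_uses: FrozenSet[str] = frozenset()  # variables read or written
    callees: FrozenSet[str] = frozenset()    # methods called by the changed code


# A measure maps two changes to a value in [0, 1]; 0 means "closely related".
Measure = Callable[[Change, Change], float]


def file_distance(a: Change, b: Change) -> float:
    """1 for different files; otherwise the line gap, scaled and capped at 1."""
    if a.file != b.file:
        return 1.0
    return min(abs(a.line - b.line) / 100.0, 1.0)


def data_dependency_distance(a: Change, b: Change) -> float:
    """0 if both changes touch a common variable, else 1."""
    return 0.0 if a.data_uses & b.data_uses else 1.0


def call_distance(a: Change, b: Change) -> float:
    """Jaccard distance between the sets of called methods."""
    union = a.callees | b.callees
    if not union:
        return 1.0
    return 1.0 - len(a.callees & b.callees) / len(union)


# Equal weights are an arbitrary illustrative choice.
MEASURES: List[Tuple[Measure, float]] = [
    (file_distance, 1.0),
    (data_dependency_distance, 1.0),
    (call_distance, 1.0),
]


def combined_distance(a: Change, b: Change,
                      measures: Sequence[Tuple[Measure, float]]) -> float:
    """Weighted average of all distance measures."""
    total_weight = sum(w for _, w in measures)
    return sum(w * m(a, b) for m, w in measures) / total_weight


def untangle(changes: Sequence[Change], k: int,
             measures: Sequence[Tuple[Measure, float]]) -> List[List[Change]]:
    """Greedy average-linkage clustering until k groups remain."""
    clusters: List[List[Change]] = [[c] for c in changes]

    def linkage(x: List[Change], y: List[Change]) -> float:
        # Mean pairwise combined distance between two groups.
        total = sum(combined_distance(a, b, measures) for a in x for b in y)
        return total / (len(x) * len(y))

    while len(clusters) > k:
        # combinations yields i < j, so deleting index j leaves index i valid.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# The abstract's scenario: a two-part bug fix in A, a typo in B, an API update in C.
example = [
    Change("a1", "A.java", 10, frozenset({"count"}), frozenset({"validate"})),
    Change("a2", "A.java", 14, frozenset({"count"}), frozenset({"validate"})),
    Change("b1", "B.java", 200),
    Change("c1", "C.java", 50, callees=frozenset({"newApi"})),
]
groups = untangle(example, k=3, measures=MEASURES)
print([[c.ident for c in g] for g in groups])  # [['a1', 'a2'], ['b1'], ['c1']]
```

In this example the two edits in `A.java` share a variable, a callee, and a nearby location, so they merge first. The typo fix in B and the API update in C remain separate groups. A real tool would also need a way to estimate `k` rather than assume it.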
BibTeX Entry
@unpublished{herzig-tmp-2011,
  title  = "Untangling Changes",
  author = "Kim Herzig and Andreas Zeller",
  note   = "Under submission",
  year   = "2011",
  month  = sep,
}