Preprocessing CVS Data for Fine-Grained Analysis
Thomas Zimmermann · Peter Weißgerber

   Thomas Zimmermann and Peter Weißgerber. Preprocessing CVS Data for Fine-Grained Analysis. Proc. 1st International Workshop on Mining Software Repositories (MSR), Edinburgh, UK, May 2004.
Preprocessing is a prerequisite for any analysis of CVS data.

Get the paper in PDF format (5 pages, 212k).


All analyses of version archives have one phase in common: the preprocessing of data. Preprocessing has a direct impact on the quality of the results returned by an analysis. In this paper, we discuss four preprocessing tasks that are essential for a fine-grained analysis of CVS archives:
  1. data extraction,
  2. transaction recovery,
  3. mapping of changes to fine-grained entities, and
  4. data cleaning.
We formalize the concept of sliding time windows and show how commit mails can relate revisions to transactions. We also present two approaches that map changes to the affected building blocks of a file, e.g. functions or sections.
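To illustrate the difference between fixed and sliding time windows, here is a minimal Python sketch of transaction recovery. It is not the paper's implementation. The `Revision` record, the `group_transactions` function, the 200-second default, and the rule that a file appears at most once per transaction are all assumptions made for this example. Revisions are grouped when they share author and log message. With a fixed window, every revision must lie within `window` seconds of the transaction's first check-in. With a sliding window, only the gap to the previous check-in has to stay within `window` seconds.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Revision:
    """One CVS check-in of a single file (hypothetical record layout)."""
    path: str
    author: str
    message: str
    timestamp: int  # seconds since the epoch


def group_transactions(
    revisions: Iterable[Revision],
    window: int = 200,      # assumed value, chosen for illustration only
    sliding: bool = True,
) -> List[List[Revision]]:
    """Group revisions into transactions by author, message, and time."""
    open_tx: Dict[Tuple[str, str], List[Revision]] = {}
    transactions: List[List[Revision]] = []

    for rev in sorted(revisions, key=lambda r: r.timestamp):
        key = (rev.author, rev.message)
        tx = open_tx.get(key)
        if tx is not None:
            # Sliding window: compare with the most recent check-in.
            # Fixed window: compare with the first check-in.
            reference = tx[-1] if sliding else tx[0]
            in_window = rev.timestamp - reference.timestamp <= window
            # Assumption: a file occurs at most once per transaction.
            new_file = all(r.path != rev.path for r in tx)
            if in_window and new_file:
                tx.append(rev)
                continue
        # Otherwise start a new transaction for this author and message.
        tx = [rev]
        open_tx[key] = tx
        transactions.append(tx)

    return transactions


if __name__ == "__main__":
    revs = [
        Revision("a.c", "alice", "fix parser", 0),
        Revision("b.c", "alice", "fix parser", 150),
        Revision("c.c", "alice", "fix parser", 300),
        Revision("d.c", "bob", "update docs", 100),
    ]
    for mode in (True, False):
        sizes = [len(tx) for tx in group_transactions(revs, sliding=mode)]
        print("sliding" if mode else "fixed", sizes)
```

In this example the three check-ins by `alice` are 150 seconds apart, so the sliding window keeps them in one transaction, while the fixed window splits off the third check-in because it lies 300 seconds after the first.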


  1. Introduction
  2. Data Extraction
  3. Restoring Transactions
    • Fixed Time Windows
    • Sliding Time Windows
  4. Mapping Changes to Entities
  5. Data Cleaning
    • Large Transactions
    • Merge Transactions
  6. Related Work
  7. Conclusion
  8. References


