Mining Software Archives
Seminar - summer semester 2010

Software Engineering Chair (Prof. Zeller)
Saarland University – Computer Science
Campus E1 1
66123 Saarbrücken, Germany
E-mail: zeller @
Phone: +49 681 302-70970


About the Seminar

Mining software archives deals with the automated extraction, collection, and abstraction of data from the information generated during software development (e.g. source code archives and bug tracking systems). This seminar (7 CP) introduces the notion of software archives and teaches recent techniques for mining them.
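
To make the idea concrete, here is a minimal sketch of one common mining step: scanning commit messages for bug report references and counting bug-fix commits per file. The commit records, the file names, and the `#123` / `bug 123` reference convention are invented for illustration; a real archive (for example an SVN log plus a bug tracker export) needs its own parsing.

```python
import re
from collections import Counter

# Hypothetical commit records; a real study would parse them from a
# version archive log instead of hard-coding them.
COMMITS = [
    {"message": "Fix #101: crash when opening empty buffer", "files": ["Buffer.java"]},
    {"message": "Refactor menu setup", "files": ["Menu.java"]},
    {"message": "fixes bug 205 and #101 regression", "files": ["Buffer.java", "View.java"]},
]

# Assumed convention: a bug is referenced as "#<id>" or "bug <id>".
BUG_REF = re.compile(r"(?:#|bug\s+)(\d+)", re.IGNORECASE)

def bug_ids(message):
    """Return the set of bug IDs referenced in a commit message."""
    return {int(match) for match in BUG_REF.findall(message)}

def fixes_per_file(commits):
    """Count, per file, the commits that reference at least one bug."""
    counts = Counter()
    for commit in commits:
        if bug_ids(commit["message"]):
            counts.update(commit["files"])
    return counts

print(fixes_per_file(COMMITS))
```

The pattern is only a heuristic: it can miss fixes whose messages do not mention a bug ID, and it can match text that merely looks like a reference.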

Place and Time

There will be one session per week:

  • Regular sessions: Wednesday, 16:15 - 17:30, Building E1.1, Room 1.09.

Requirements for successful participation

  • Prerequisites: This seminar is suitable for all students, bachelor's or master's, who are interested in software engineering and its applications. You don't need any prior knowledge of the subject; however, participation in another course offered by the Software Engineering chair might be useful.
  • Regular attendance: Attendance is mandatory. A maximum of 2 absences is allowed per participant. An official written note of absence is required if you miss more than 2 sessions.
  • Talks: During the course of the semester, each participant is expected to give two short 6-minute presentations and one long 20-minute presentation.
  • Discussions: Each paper presented in the seminar will be discussed after its presentation. Seminar participants are supposed to read all the presented papers in advance.
  • Write-ups: Participants are expected to submit reviews of the discussed papers.
  • Project: Each participant will have to submit a programming exercise during the course of the semester in order to gain a better understanding of the issues one faces when mining software archives. Groups of up to 2 participants are allowed.
  • Grading: The final grade will be computed on the following basis: 6-minute presentations = 30% of the grade; project = 30%; 20-minute presentation = 40%.
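
As a worked example of the grading weights above, the following minimal sketch computes a final grade as a weighted average. Only the three weights come from the grading rule; the function name, the component grades, and the assumption that the two short presentations share one combined grade are illustrative.

```python
# Weights taken from the grading rule; everything else is hypothetical.
WEIGHTS = {
    "short_presentations": 0.30,  # the two 6-minute talks, combined
    "project": 0.30,
    "long_presentation": 0.40,    # the 20-minute talk
}

def final_grade(grades):
    """Return the weighted average of the component grades."""
    missing = set(WEIGHTS) - set(grades)
    if missing:
        raise ValueError(f"missing component grades: {sorted(missing)}")
    return sum(WEIGHTS[name] * grades[name] for name in WEIGHTS)

# Made-up component grades for illustration.
example = {"short_presentations": 2.0, "project": 1.0, "long_presentation": 1.5}
print(round(final_grade(example), 2))  # 0.3*2.0 + 0.3*1.0 + 0.4*1.5 = 1.5
```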

Sessions and Papers

21.04.2010: Ways to mine repositories
  1. "Populating a Release History Database from Version Control and Bug Tracking Systems" by M. Fischer, M. Pinzger, and H. Gall. ICSM'03. (Merlin) [PDF]
  2. "Mining Email Social Networks" by C. Bird, A. Gourley, P. Devanbu, M. Gertz, and A. Swaminathan. MSR'06. (Sebastian) [PDF]
28.04.2010: Recommendation Systems
  1. "Mining Usage Expertise from Version Archives" by D. Schuler and T. Zimmermann, MSR'08. (Sascha) [PDF]
  2. "Mining Version Histories to Guide Software Changes (eRose)" by T. Zimmermann, P. Weissgerber, S. Diehl and A. Zeller, ICSE'04. (Robert) [PDF]
05.05.2010: Metrics
  1. "Predicting Defects for Eclipse" by T. Zimmermann, R. Premraj and A. Zeller. Promise 2007. (Christian) [PDF]
  2. "An Empirical Study of the Factors Relating Field Failures and Dependencies" by T. Zimmermann, N. Nagappan, L. Williams, K. Herzig, and R. Premraj. To be published 2010. (Myroslav) [PDF]
12.05.2010: Discussing the MSA Project

19.05.2010: Defects Detection
  1. "A Critique of Software Defect Prediction Models" by N. E. Fenton and M. Neil, 1998. (Andrey) [PDF]
  2. "Predicting Faults from Cached History" by S. Kim, T. Zimmermann, E.J. Whitehead, and A. Zeller. ISEC 2008. (Eva) [PDF]
26.05.2010: Social Networks
  1. "Putting it all together: Using Socio-Technical Networks to Predict Failures" by C. Bird, N. Nagappan, H. Gall, B. Murphy, and P. Devanbu. ISSRE 09. (Natalia) [PDF]
  2. "Does Distributed Development Affect Software Quality? An Empirical Case Study of Windows Vista" by C. Bird, N. Nagappan, P. Devanbu, H. Gall, B. Murphy. In Communications of the ACM, August 2009. (Sebastian) [PDF]
02.06.2010: Teams Structure
  1. "The Influence of Organizational Structure on Software Quality: An Empirical Case Study" by N. Nagappan, B. Murphy and V. Basili, ICSE 08. (Merlin) [PDF]
  2. "Predicting Defects using Network Analysis on Dependency Graphs" by T. Zimmermann and N. Nagappan. ICSE 08. (Yanchuan) [PDF]
09.06.2010: IDEs
  1. "Mylar: a degree-of-interest model for IDEs" by M. Kersten and G. C. Murphy. AOSD 2005. (Girish) [PDF]
  2. "Hipikat: A Project Memory for Software Development" by D. Cubranic, Gail Murphy, J. Singer, and K. S. Booth. IEEE TSE 2005. (Yury + Sascha) [PDF]
16.06.2010: Empirical Studies
  1. "Empirical Studies of Software Engineering: A Roadmap" by D. Perry, A. Porter and L. Votta. ACM 2000. (Stephan + Natalia) [PDF]
  2. "Should Computer Scientists Experiment More?" by W. Tichy. 1997. (Andreas + Myroslav) [PDF]
23.06.2010: Surveys
  1. "Do Stack Traces Help Developers Fix Bugs?" by A. Schroeter, N. Bettenburg and R. Premraj. MSR 2010. (Robert) [PDF]
  2. "The Secret Life of Bugs: Going Past the Errors and Omissions in Software Repositories" by J. Aranda and G. Venolia. ICSE 2009. (Andrey) [PDF]
30.06.2010: Scalable Mining
  1. "Mining trends of library usage" by Y. Mileva, V. Dallmeier, M. Burger, and A. Zeller. RSSE 2009. (Stephan) [PDF]
  2. "Cross-project Defect Prediction" by T. Zimmermann, N. Nagappan, H. Gall, E. Giger, and B. Murphy, FSE 2009. (Eva) [PDF]
07.07.2010: Threats to Mining
  1. "Change Bursts as Defect Predictors" by N. Nagappan, A. Zeller, T. Zimmermann, K. Herzig, and B. Murphy. 2010. (Yury) [PDF]
  2. "Fair and Balanced? Bias in Bug-Fix Datasets" by C. Bird, A. Bachmann, E. Aune, J. Duffy, A. Bernstein, V. Filkov, and P. Devanbu. ESEC/FSE 2009. (Christian + Andreas) [PDF]
14.07.2010: Improving Repositories
  1. "Information Needs in Bug Reports: Improving Cooperation Between Developers and Users" by S. Breu, R. Premraj, J. Sillito and T. Zimmermann, CSCW 2010. (Girish) [PDF]
  2. "What makes a good bug report?" by N. Bettenburg, S. Just, A. Schroeter, C. Weiss, R. Premraj and T. Zimmermann. SIGSOFT'08/FSE-16. (Yanchuan) [PDF]

Papers for long talks (20 min)

Below are the papers for the long talks. The session takes place on August 4th, 2010, from 9am to 5pm in seminar room 1.09.
  • "An Extensive Comparison of Bug Prediction Approaches" by M. D'Ambros, M. Lanza, and R. Robbes, MSR2010. (Sebastian Hafner) [PDF]
  • "Identifying Security Bug Reports via Text Mining: An Industrial Case Study" by M. Gegick, P. Rotella, and T. Xie, MSR2010. (Natalia Prytkova) [PDF]
  • "Summarizing Software Artifacts: A Case Study of Bug Reports" by S. Rastkar, G. Murphy, and G. Murray, ICSE2010. (Robert Bleyl) [PDF]
  • "Replaying IDE Interactions to Evaluate and Improve Change Prediction Approaches" by R. Robbes, D. Pollet, and M. Lanza, MSR2010. (Yuri Pavlov) [PDF]
  • "Replicating MSR: A study of potential replicability of papers published in the Mining Software Repositories Proceedings" by G. Robles, MSR2010. (Sascha Just) [PDF]
  • "When Process Data Quality Affects the Number of Bugs: Correlations in Software Engineering Datasets" by A. Bachmann and A. Bernstein, MSR2010. (Christian Holler) [PDF]
  • "Predicting Vulnerable Software Components" by S. Neuhaus, T. Zimmermann, C. Holler, and A. Zeller, CCS2007. (Andrey Tarasevich) [PDF]
  • "DynaMine: Finding Common Error Patterns by Mining Software Revision Histories" by B. Livshits and T. Zimmermann, ESEC-FSE'05. (Andreas Rau) [PDF]
  • "Codebook: Discovering and Exploiting Relationships in Software Repositories" by A. Begel, K. Phang, and T. Zimmermann, ICSE2010. (Eva May) [PDF]
  • "Linking E-Mails and Source Code Artifacts" by A. Bacchelli, M. Lanza, and R. Robbes, ICSE2010. (Merlin Lang) [PDF]
  • "Relation of Code Clones and Change Couplings" by R. Geiger, B. Fluri, H. Gall, and M. Pinzger, Fundamental Approaches to Software Engineering, 2006. (Girish Parappa) [PDF]
  • "A Discriminative Model Approach for Accurate Duplicate Bug Report Retrieval" by C. Sun, D. Lo, X. Wang, J. Jiang, and S. Khoo, ICSE2010. (Yanchuan Li) [PDF]
  • "A Degree-of-Knowledge Model to Capture Source Code Familiarity" by T. Fritz, J. Ou, G. Murphy, and E. Murphy-Hill, ICSE2010. (Myroslav Bachynskyi) [PDF]
  • "Which Warnings Should I Fix First?" by S. Kim and M. Ernst, ESEC-FSE'07. (Stephan Max) [PDF]


Kick-off session slides: Intro and Project Slides.
Project Slides from session 4: Slides
Paper review form. Download here.
JEdit bug XHTML files for the mining project. Download here.
JEdit SVN tarball for mining project. Download here.
Sample paper reviews: Download here.
How to give a good research talk: Prof Zeller's slides.
What is a good evaluation: Check out the following links: Falsifiability, Evaluation, Empirical Research, and paper.


Updated: 2018-04-05 13:40