About the Seminar
Software archives mining deals with the automated extraction, collection, and abstraction of data from the information generated during the software development process (e.g. source code archives, bug
tracking systems, etc.). This seminar (7 CP) introduces the
notion of software archives and teaches recent software
Place and Time
There will be one session per week:
- Regular sessions: Wednesday, 16:15 - 17:30, Building
E1.1, Room 1.09.
Requirements for successful participation
- Prerequisites: This seminar is suitable for all students, bachelor's or master's, who are interested in software engineering and its applications. No prior knowledge of the subject
is required; however, participation in any other course offered by the Software Engineering chair might be useful.
- Regular attendance: Attendance is mandatory. A maximum of 2 absences is allowed per participant. An official written note of absence is required if you miss more than 2 sessions.
- Talks: During the course of the semester, each
participant is expected to give two short 6-minute presentations and one
long 20-minute presentation.
- Discussions: Each paper presented in the seminar will be discussed after its presentation. Seminar participants are expected to read all the
presented papers in advance.
- Write-ups: Participants are expected to submit reviews of the discussed papers.
- Project: Each participant will have to submit a programming exercise during the course of the semester in order to gain a better
understanding of the issues one faces when mining software archives. Groups of up to 2 participants are allowed.
- Grading: The final grade will be computed on the following basis: 6-minute presentations = 30% of the grade; project = 30%; 20-minute
presentation = 40%.
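As a rough illustration of the grading weights above, the following sketch computes a weighted final score. It assumes that the two short presentations are averaged into a single 30% component and that all parts are scored on the same numeric scale; neither detail is stated on this page, and the function name is hypothetical.

```python
def final_score(short_talks, project, long_talk):
    """Combine component scores using the weights 30% / 30% / 40%."""
    # Assumption: the two 6-minute talks are averaged into one component.
    short_component = sum(short_talks) / len(short_talks)
    return 0.30 * short_component + 0.30 * project + 0.40 * long_talk


if __name__ == "__main__":
    # Example scores on an arbitrary 0-100 scale (illustrative values only).
    print(final_score([80, 90], project=70, long_talk=85))
```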
Sessions and Papers
21.04.2010: Ways to mine repositories
28.04.2010: Recommendation Systems
- "Populating a Release History Database from Version Control and Bug Tracking Systems" by M. Fischer, M. Pinzger, and H. Gall. ICSM'03.
- "Mining Email Social Networks" by C. Bird, A. Gourley, P. Devanbu, M. Gertz, and A. Swaminathan. MSR'06. (Sebastian) [PDF]
- "Mining Usage Expertise from Version Archives" by D. Schuler and T. Zimmermann, MSR'08. (Sascha) [PDF]
- "Mining Version Histories to Guide Software Changes (eRose)" by T. Zimmermann, P. Weissgerber, S. Diehl and A. Zeller, ICSE'04. (Robert) [PDF]
12.05.2010: Discussing the MSA Project
19.05.2010: Defects Detection
- "Predicting Defects for Eclipse" by T. Zimmermann, R. Premraj and A. Zeller. Promise 2007. (Christian) [PDF]
- "An Empirical Study of the Factors Relating Field Failures and Dependencies" by T. Zimmermann, N. Nagappan, L. Williams, K. Herzig, and R. Premraj. To be published 2010. (Myroslav) [PDF]
26.05.2010: Social Networks
- "A Critique of Software Defect Prediction Models" by N. E. Fenton and M. Neil, 1998. (Andrey) [PDF]
- "Predicting Faults from Cached History" by S. Kim, T. Zimmermann, E.J. Whitehead, and A. Zeller. ISEC 2008. (Eva) [PDF]
02.06.2010: Teams Structure
- "Putting it all together: Using Socio-Technical Networks to Predict Failures" by C. Bird, N. Nagappan, H. Gall, B. Murphy, and P. Devanbu. ISSRE 09. (Natalia) [PDF]
- "Does Distributed Development Affect Software Quality? An Empirical Case Study of Windows Vista" by C. Bird, N. Nagappan, P. Devanbu, H. Gall, B. Murphy. In Communications of the ACM,
August 2009. (Sebastian) [PDF]
- "The Influence of Organizational Structure on Software Quality: An Empirical Case Study" by N. Nagappan, B. Murphy and V. Basili, ICSE 08. (Merlin) [PDF]
- "Predicting Defects using Network Analysis on Dependency Graphs" by T. Zimmermann and N. Nagappan. ICSE 08. (Yanchuan) [PDF]
16.06.2010: Empirical Studies
- "Mylar: a degree-of-interest model for IDEs" by M. Kersten and G. C. Murphy. AOSD 2005. (Girish) [PDF]
- "Hipikat: A Project Memory for Software Development" by D. Cubranic, Gail Murphy, J. Singer, and K. S. Booth. IEEE TSE 2005. (Yury +
- "Empirical Studies of Software Engineering: A Roadmap" by D. Perry, A. Porter and L. Votta. ACM 2000. (Stephan + Natalia)
- "Should Computer Scientists Experiment More?" by W. Tichy. 1997. (Andreas + Myroslav) [PDF]
30.06.2010: Scalable Mining
- "Do Stack Traces Help Developers Fix Bugs?" by A. Schroeter, N. Bettenburg and R. Premraj. MSR 2010. (Robert) [PDF]
- "The secret life of Bugs. Going Past the Errors and Omissions in Software Repositories" by J. Aranda and G. Venolia. ICSE 2009. (Andrey)
07.07.2010: Threats to Mining
- "Mining trends of library usage" by Y. Mileva, V. Dallmeier, M. Burger, and A. Zeller. RSSE 2009. (Stephan) [PDF]
- "Cross-project Defect Prediction" by T. Zimmermann, N. Nagappan, H. Gall, E. Giger, and B. Murphy, FSE 2009. (Eva) [PDF]
14.07.2010: Improving Repositories
- "Change Bursts as Defect Predictors" by N. Nagappan, A. Zeller, T. Zimmermann, K. Herzig, and B. Murphy. 2010. (Yury) [PDF]
- "Fair and Balanced? Bias in Bug-Fix Datasets" by C. Bird, A. Bachmann, E. Aune, J. Duffy, A. Bernstein, V. Filkov, and P. Devanbu. ESEC/FSE
2009. (Christian + Andreas) [PDF]
- "Information Needs in Bug Reports: Improving Cooperation Between Developers and Users" by S. Breu, R. Premraj, J. Sillito and
T. Zimmermann, CSCW 2010. (Girish) [PDF]
- "What makes a good bug report?" by N. Bettenburg, S. Just, A. Schroeter, C. Weiss, R. Premraj and T. Zimmermann. SIGSOFT'08/FSE-16.
Papers for long talks (20 min)
Below you find the papers for the long talks. The date for this session is August 4th, 2010 and it will take place between 9am and 5pm in
- "An Extensive Comparison of Bug Prediction Approaches" by M. D'Ambros, M. Lanza, and R. Robbes, MSR2010. (Sebastian Hafner) [PDF]
- "Identifying Security Bug Reports via Text Mining: An Industrial Case Study" by M. Gegick, P. Rotella, and T. Xie, MSR2010. (Natalia Prytkova) [PDF]
- "Summarizing Software Artifacts: A Case Study of Bug Reports" by S. Rastkar, G. Murphy, and G. Murray, ICSE2010. (Robert Bleyl) [PDF]
- "Replaying IDE Interactions to Evaluate and Improve Change Prediction Approaches" by R. Robbes, D. Pollet, and M. Lanza, MSR2010. (Yuri Pavlov) [PDF]
- "Replicating MSR: A study of potential replicability of papers published in the Mining Software Repositories Proceedings" by G. Robles, MSR2010. (Sascha Just) [PDF]
- "When Process Data Quality Affects the Number of Bugs: Correlations in Software Engineering Datasets" by A. Bachmann and A. Bernstein, MSR2010. (Christian Holler) [PDF]
- "Predicting Vulnerable Software Components" by S. Neuhaus, T. Zimmermann, C. Holler, and A. Zeller, CCS2007. (Andrey Tarasevich) [PDF]
- "DynaMine: Finding Common Error Patterns by Mining Software Revision Histories" by B. Livshits and T. Zimmermann, ESEC-FSE'05. (Andreas Rau) [PDF]
- "Codebook: Discovering and Exploiting Relationships in Software Repositories" by A. Begel, K. Phang, and T. Zimmermann, ICSE2010. (Eva May) [PDF]
- "Linking E-Mails and Source Code Artifacts" by A. Bacchelli, M. Lanza, and R. Robbes, ICSE2010. (Merlin Lang) [PDF]
- "Relation of Code Clones and Change Couplings" by R. Geiger, B. Fluri, H. Gall, and M. Pinzger, Fundamental Approaches to Software Engineering, 2006. (Girish Parappa) [PDF]
- "A Discriminative Model Approach for Accurate Duplicate Bug Report Retrieval" by C. Sun, D. Lo, X. Wang, J. Jiang, and S. Khoo, ICSE2010. (Yanchuan Li) [PDF]
- "A Degree-of-Knowledge Model to Capture Source Code Familiarity" by T. Fritz, J. Ou, G. Murphy, and E. Murphy-Hill, ICSE2010. (Myroslav Bachynskyi) [PDF]
- "Which Warnings Should I Fix First?" by S. Kim and M. Ernst, ESEC-FSE'07. (Stephan Max) [PDF]
Kick-off session slides: Intro and Project Slides.
Project Slides from session 4: Slides
Paper review form. Download here.
JEdit bug XHTML files for the mining project. Download here.
JEdit SVN tarball for mining project. Download here.
Sample paper reviews: Download here.
How to give a good research talk: Prof Zeller's slides.
What is a good evaluation: Check out the following links: Falsifiability, Evaluation, Empirical
Research, and paper.
<firstname.lastname@example.org> · http://www.st.cs.uni-saarland.de/edu/msa10/ · Updated: 2018-04-05 13:40