This is an archive of the former Software Engineering chair at Saarland University. It is no longer up to date.


Software Mining

Lehrstuhl für Softwaretechnik (Prof. Zeller)
Universität des Saarlandes – Informatik
Informatik Campus des Saarlandes
Campus E9 1 (CISPA)
66123 Saarbrücken
E-mail: zeller @ cs.uni-saarland.de
Phone: +49 681 302-70970

Software development results in a huge amount of data: changes are recorded in version archives, bugs are reported to issue tracking systems, and features are discussed in emails and newsgroups. This course (Spezialvorlesung, 2V+2Ü, 6 LP) will cover methods and techniques to mine this data. Additionally, it will feature guest lectures by pioneers in the field of mining programs and their history.

Specifically, the course will present recent research on software evolution, bug detection and prediction, guiding software development, code reuse and search, as well as program analysis techniques such as impact analysis and feature location. The course will also give an introduction to empirical software engineering, showing how to conduct quantitative and qualitative studies.

From this course, students will learn how to analyze programs and their history and how to run empirical studies.


Guest Lecturers


  • Here are the final grades.
  • Remember that the exercises and the project account for 20% each and the exam for 60%.

Reading List for the Exam

  1. All slides from Lecture 1 to Lecture 9, including the slides from Rainer Koschke's guest lecture and the invited talks by Andrzej Wasylkowski and Stephan Neuhaus.
  2. W. F. Tichy, "Should Computer Scientists Experiment More?", IEEE Computer, 31, pp. 32-40, May, 1998. [PDF]
  3. T. Zimmermann and P. Weissgerber, "Preprocessing CVS Data for Fine-grained Analysis," in Procs. of the 1st International Workshop on Mining Software Repositories, (Edinburgh, Scotland), 2004. [PDF]
  4. T. Zimmermann, P. Weissgerber, S. Diehl, and A. Zeller, "Mining Version Histories to Guide Software Changes," IEEE Transactions on Software Engineering, vol. 31, pp. 429-445, June 2005. [PDF]
  5. Martin P. Robillard, "Automatic generation of suggestions for program investigation." ESEC/FSE 2005. [PDF]
  6. R. Holmes and R. J. Walker, "Supporting the Investigation and Planning of Pragmatic Reuse Tasks", ICSE 2007. [PDF]
  7. Ben Liblit, Mayur Naik, Alice X. Zheng, Alexander Aiken, Michael I. Jordan: Scalable statistical bug isolation. PLDI 2005: 15-26 [PDF]
  8. Nachiappan Nagappan, Thomas Ball, Andreas Zeller: Mining metrics to predict component failures. ICSE 2006: 452-461 [PDF]
  9. Sunghun Kim, Thomas Zimmermann, E. James Whitehead Jr., Andreas Zeller: Predicting Faults from Cached History. ICSE 2007: 489-498 [PDF]
  10. No papers for the guest lectures.

Course Projects

The course projects will be carried out in teams of up to four students and focus on bug databases. The deliverables are a 15-minute presentation on 19th July and a project report (2-4 pages). Pick one of the following topics:
  1. Quality of bug reports

  2. Predicting lifetime of bug reports  ** CANCELLED **
    Related read:
    Lucas D. Panjer: Predicting Eclipse Bug Lifetimes [Mining Challenge 2007]

  3. Detecting duplicates of bug reports
    Related read:
    Per Runeson, Magnus Alexandersson and Oskar Nyholm: Detection of Duplicate Defect Reports Using Natural Language Processing [ICSE 2007]

  4. Triaging bug reports
    Related read:
    John Anvik, Lyndon Hiew and Gail C. Murphy: Who Should Fix This Bug? [ICSE 2006]

  5. Visualizing bug reports (networks)
    Related read:
    Marco D'Ambros and Michele Lanza: Software Bugs and Evolution: A Visual Approach to Uncover Their Relationships [CSMR 2006]

  6. Predicting defects with spam filters  ** CANCELLED **
    Related read:
    Osamu Mizuno, Shiro Ikami, Shuya Nakaichi, and Tohru Kikuno: Spam Filter Based Approach for Finding Fault-Prone Software Modules [MSR 2007]


All lectures will take place in Building E1 3, Hörsaal 002 on Thursdays from 14:00 to 16:00. The schedule is tentative.

19th April - [Lecture 1] Introduction and Empirical Software Engineering
26th April - [Lecture 2] Pre-Processing CVS Archives
3rd May - [Lecture 3] Guiding Software Developers
10th May - [Lecture 4] Software Navigation
17th May - *Holiday*
24th May - [Lecture 5] Code Search and Reuse
31st May - [Lecture 6] A Toolbox for Software Mining
7th June - *Holiday*
14th June - [Lecture 7] Defect Detection & Invited talk by Andrzej Wasylkowski
21st June - [Lecture 8] Defect Prediction & Invited talk by Stephan Neuhaus
28th June - [Lecture 9] Guest Lecture by Rainer Koschke on Clone Detection
3rd July - [Lecture 10] Guest Lecture by Harald Gall on Software Evolution
  (Note that the lecture is on Tuesday and starts at 4 PM in E1.3 HS 003.)
11th July - [Lecture 11] Guest Lecture by Stephan Diehl on Software Visualization
  (Note that the lecture is on Wednesday and starts at 9 AM in E1.3 Seminar room 013.)
19th July - [Lecture 12] Project presentations & Wrap-up.


Exercises will be held in the Zeichensaal, Mathematics Building every Tuesday from 14:00 to 16:00.

8th May - [Exercise 1] Introductory Data Analysis
15th May - CANCELLED
22nd May - [Exercise 2] Linear Regression
29th May - [Exercise 3] Dummy Variables and Stepwise Regression (+ Project Discussion)
5th June - [Exercise 5] Project meetings
12th June - [Exercise 6] Project meetings
19th June - [Exercise 7] Project meetings
26th June - [Exercise 8] Project meetings
10th July - [Exercise 9] Project meetings
17th July - [Exercise 10] Project meetings


  1. Foundations of Software Measurement, by Martin Shepperd, 1st edition, 1995.
  2. Software Metrics: A Rigorous and Practical Approach, by Norman E. Fenton and Shari L. Pfleeger, 2nd edition, 1998.
  3. Selected papers from software engineering, as announced on the course website.
Please do not purchase the books since they are meant for reference purposes only and are available in the library.


The final grade for this course will be computed as follows:
  1. 20% from Exercises (at least 50% points required to pass)
  2. 20% from the Group Project (the project needs to be completed and presented)
  3. 60% from the Exam (there will be only one final exam that must be passed)
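
The weighting above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: all three parts are taken to be scored as percentages from 0 to 100, the names `CourseResult`, `weighted_score` and `has_passed` are hypothetical, and the exam pass decision is passed in as a flag because this page does not state the exam's pass mark.

```python
from dataclasses import dataclass


@dataclass
class CourseResult:
    """Hypothetical record of one student's results (all scores assumed 0-100)."""
    exercises_pct: float     # share of exercise points earned
    project_pct: float       # score for the group project
    exam_pct: float          # score in the final exam
    project_presented: bool  # project was completed and presented
    exam_passed: bool        # exam pass decision; the pass mark is not stated here


def weighted_score(r: CourseResult) -> float:
    """Combine the three parts with the 20% / 20% / 60% weights."""
    return 0.2 * r.exercises_pct + 0.2 * r.project_pct + 0.6 * r.exam_pct


def has_passed(r: CourseResult) -> bool:
    """Check the three pass conditions from the list above."""
    return r.exercises_pct >= 50.0 and r.project_presented and r.exam_passed
```

For example, 80% in the exercises, 90% in the project and 70% in the exam give 0.2·80 + 0.2·90 + 0.6·70 = 76. How such a score maps onto a final grade is not specified on this page.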


You can subscribe to the lecture's calendar.


Please subscribe to the course mailing list. All important announcements will be posted there.


<premraj@cs.uni-saarland.de> · http://www.st.cs.uni-saarland.de/edu/softmine2007/ · Last updated: 2018-04-05 13:40