About DBGBENCH


How do professional software engineers debug computer programs? In an experiment with 27 real bugs in several widely used programs, we invited 12 professional software engineers, who together spent one month localizing, explaining, and fixing these bugs. This not only allowed us to study the various tools and strategies used to debug the same set of errors; we could also determine exactly which statements a developer would localize as faults, how a developer would diagnose and explain an error, and how a developer would fix an error – all of which software engineering researchers seek to automate. Until now, it has been difficult to evaluate the effectiveness and utility of automated debugging techniques without a user study. We publish the collected data, called DBGBENCH, to facilitate the effective evaluation of automated fault localization, diagnosis, and repair techniques with respect to the judgement of human experts.

Poster · Extended Abstract


Usage


DBGBENCH allows researchers to evaluate novel automated debugging and patching techniques and assistants:
  • Evaluating Fault Localization Techniques: The human-generated fault locations can be used to evaluate automated fault localization techniques. We define as important those fault locations that are independently reported by at least 50% of participants. We suggest measuring the accuracy of finding at least one important fault location as localized by our participants (a minimal sketch of this metric follows this list).
  • Evaluating Bug Diagnosis Techniques: The human-generated explanations can be used to evaluate automated bug diagnosis techniques. We suggest measuring the accuracy of finding the pertinent variable values, function calls, events, or cause-effect chains mentioned in the aggregated human-generated bug diagnosis.
  • Evaluating Automated Repair Techniques: The examples of correct and incorrect patches can be used to evaluate automated repair and code review techniques. These human-generated examples and explanations serve as ground truth for determining the correctness (not merely the plausibility) of an auto-generated patch.
  • Evaluating the Effectiveness of Debugging Assistants: The time our participants took to understand and patch each error can be used to measure how much faster developers could be when assisted by automated tools.
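
The fault-localization and speedup metrics above can be operationalized in a few lines of code. The following Python sketch is only an illustration under assumed data formats: the CSV file name and its columns (bug_id, participant_id, fault_location), the example bug ID, and the tool output are hypothetical placeholders rather than part of the DBGBENCH distribution.

# Minimal sketch of the suggested fault-localization metric: a fault location is
# "important" if it is independently reported by at least 50% of the participants
# who debugged that error, and a tool scores a hit if at least one important
# location appears among its top-N suspicious statements.
# NOTE: file name, column names, bug IDs, and tool output below are hypothetical.

import csv
from collections import defaultdict

def load_human_fault_locations(path):
    """Return {bug_id: {participant_id: set of reported fault locations}}."""
    sessions = defaultdict(lambda: defaultdict(set))
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            sessions[row["bug_id"]][row["participant_id"]].add(row["fault_location"])
    return sessions

def important_locations(sessions_for_bug, threshold=0.5):
    """Locations reported by at least `threshold` of the participants."""
    n_participants = len(sessions_for_bug)
    if n_participants == 0:
        return set()
    counts = defaultdict(int)
    for reported in sessions_for_bug.values():
        for location in reported:
            counts[location] += 1
    return {loc for loc, c in counts.items() if c / n_participants >= threshold}

def hits_important_location(ranked_locations, important, top_n=10):
    """True if the tool ranks at least one important location in its top N."""
    return any(loc in important for loc in ranked_locations[:top_n])

def speedup(human_minutes, tool_minutes):
    """Human debugging time divided by tool time (for the assistant evaluation)."""
    return human_minutes / tool_minutes

if __name__ == "__main__":
    sessions = load_human_fault_locations("fault_locations.csv")   # hypothetical export
    tool_output = {                                                 # hypothetical tool results
        "find.66c0c59": ["parser.c:1745", "pred.c:310"],
    }
    hits = sum(
        hits_important_location(ranking, important_locations(sessions[bug]))
        for bug, ranking in tool_output.items()
    )
    print(f"Accuracy: {hits / len(tool_output):.2%}")

For the assistant evaluation in the last bullet, the average debugging times reported in the benchmark summary can serve as the human baseline in the speedup computation.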

How to Cite


@inproceedings{dbgbench,
  author = {B\"{o}hme, Marcel and Soremekun, Ezekiel Olamide and Chattopadhyay, Sudipta and Ugherughe, Emamurho and Zeller, Andreas},
  title = {How Developers Debug Software -- The DBGBENCH Dataset},
  booktitle = {Proceedings of the 39th ACM SIGSOFT International Conference on Software Engineering 2017},
  series = {ICSE 2017},
  pages = {1--3},
  year = {2017}
}


Slides


Downloads


  • Download the benchmark summary, which contains, sorted by average debugging time, the complete list of errors with their average debugging time, difficulty, and patch correctness; human-generated explanations of the runtime actions leading to each error; and examples of correct and incorrect fixes.
  • Download the complete data, which contains the following for each debugging session (error, participant):
    • Bug ID & Participant ID
    • Provided Fault Locations, Bug Diagnosis/Explanations & Patches
    • Patch Plausibility, Correctness, and Category
    • Reasons for (In)correctness
    • Simplified Bug Report
  • Download an example questionnaire.
  • Download the Docker virtual infrastructure.
  • Download the tutorial material, including slides, videos, and readme files.
  • Read our extended abstract or poster to find out more about DBGBENCH.