Eclipse Burst Data!
Chair of Software Engineering (Prof. Zeller)
Universität des Saarlandes, Informatik
Saarland Informatics Campus, Campus E9 1 (CISPA)
66123 Saarbrücken
E-mail: zeller @ cs.uni-saarland.de
Phone: +49 681 302-70970
We have mined the Eclipse version databases to compute so-called change bursts. As we demonstrated in our experiments, change bursts can be used for defect prediction. The dataset is publicly available for download and use.

News
Download

Frequently Asked Questions (FAQ)

I cannot find the NumberOfDefects column in the data set

True. There is no such column in the data set. References to NumberOfDefects must be replaced by references to the column pre. This holds for the paper itself and also for the R script referenced in the paper.

What is this all about?

We have published data that identifies change bursts for each component of Eclipse. A change burst is a sequence of consecutive changes. For a more detailed and formal definition of change bursts, we refer to Section 2 of our paper Change Bursts as Defect Predictors. The paper is also included in the zipped files available above.

Figure 1: How gap size and burst size determine change burst detection from a sequence of changes.
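To make the roles of gap size and burst size more concrete, here is a minimal Python sketch of burst detection. It is not the tooling used to produce the dataset, and it rests on assumptions that are not stated on this page: changes are given as integer time-slot indices (for example, day numbers), two successive changes belong to the same burst when their indices differ by at most `gap_size`, and a group counts as a burst only if it contains at least `burst_size` changes. The function name `detect_bursts` is hypothetical. Section 2 of the paper remains the authoritative definition.

```python
from typing import List


def detect_bursts(change_slots: List[int], gap_size: int, burst_size: int) -> List[List[int]]:
    """Group time-slot indices of changes into bursts.

    Assumption (not taken from this page): successive changes whose slot
    indices differ by at most `gap_size` share a burst, and a group is kept
    only if it has at least `burst_size` changes.
    """
    bursts: List[List[int]] = []
    current: List[int] = []
    # Each slot either has a change or not, so duplicates are collapsed.
    for slot in sorted(set(change_slots)):
        if current and slot - current[-1] > gap_size:
            # The gap is too large: close the current group.
            if len(current) >= burst_size:
                bursts.append(current)
            current = []
        current.append(slot)
    # Close the last open group.
    if len(current) >= burst_size:
        bursts.append(current)
    return bursts


if __name__ == "__main__":
    # Hypothetical change history, expressed as day numbers.
    changes = [1, 2, 3, 7, 10, 11, 12, 13, 20]
    print(detect_bursts(changes, gap_size=1, burst_size=3))
    print(detect_bursts(changes, gap_size=3, burst_size=3))
```

Under these assumptions, a larger gap size merges nearby groups of changes into one burst, while a larger burst size discards short groups.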
What can I do with this data?

In our paper Change Bursts as Defect Predictors we showed that change bursts can be used to build very accurate defect prediction models.

Where do I get these Eclipse versions?

All the versions of Eclipse we analyzed (2.0, 2.1, and 3.0) can be accessed at the Eclipse project archived downloads site.

Why do you share this data?

Finding out where defects come from is a creative effort and hence better addressed by a community than by individuals. This is why we want to share this data with the research community. To our knowledge, this is the first time change bursts were used to build defect prediction models.

What is the copyright for this data?

In general, facts are free (as in freedom) and are not copyrightable. Hence, users of this archive can use the factual information contained in the bug data archives without any restriction.

How can I reference this work?

If you publish something based on this data, we would be happy if you could attribute its source.
The appropriate citation is our paper, Change Bursts as Defect Predictors:

Nachiappan Nagappan, Andreas Zeller, Thomas Zimmermann, Kim Herzig, and Brendan Murphy: Change Bursts as Defect Predictors. In Proceedings of the 2010 IEEE 21st International Symposium on Software Reliability Engineering (ISSRE '10), pages 309-318.

Here's the citation in BibTeX format:

@inproceedings{Nagappan:2010:CBD:1913797.1914387,
  author = {Nagappan, Nachiappan and Zeller, Andreas and Zimmermann, Thomas and Herzig, Kim and Murphy, Brendan},
  title = {Change Bursts as Defect Predictors},
  booktitle = {Proceedings of the 2010 IEEE 21st International Symposium on Software Reliability Engineering},
  series = {ISSRE '10},
  year = {2010},
  isbn = {978-0-7695-4255-3},
  pages = {309--318},
  numpages = {10},
  url = {http://dx.doi.org/10.1109/ISSRE.2010.25},
  doi = {10.1109/ISSRE.2010.25},
  acmid = {1914387},
  publisher = {IEEE Computer Society},
  address = {Washington, DC, USA},
}

Also, we will be happy to hear about your results using our dataset and to cite your papers on this page.

What's in the Data?

What is in the packages?

Unzip one of the archives. The resulting folder contains three subfolders: hourly, daily, and weekly.
To compute change bursts, we split the development history into a series of events, at each of which a change may or may not have occurred. The subfolders hold data sets for different definitions of these events (hourly, daily, and weekly).
On the next level you can choose among the Eclipse versions we investigated: Eclipse 2.0, Eclipse 2.1, and Eclipse 3.0. Within each granularity folder you will find the actual change burst data sets. Each filename is of the form

Eclipse<VERSION>_GAP<GAP_SIZE>_BURST<BURST_SIZE>.csv

Thus, each filename is specific to an Eclipse version <VERSION>, a gap size <GAP_SIZE>, and a burst size <BURST_SIZE> (for definitions of gap size and burst size, please read our paper). For each Eclipse version, we computed all combinations of gap and burst sizes between 1 and 10.
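When processing many of these files at once, it can help to recover the version, gap size, and burst size from each filename. The following Python sketch shows one way to do that. The exact spelling of <VERSION> inside the filenames is not given on this page, so the pattern below assumes it consists of digits and optional dots (for example `2.0`); the names `parse_burst_filename` and `BurstFile`, and the sample filenames, are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumption: <VERSION> is made of digits and dots, e.g. "2.0" or "30".
FILENAME_RE = re.compile(
    r"^Eclipse(?P<version>[0-9.]+)_GAP(?P<gap>\d+)_BURST(?P<burst>\d+)\.csv$"
)


class BurstFile(NamedTuple):
    version: str
    gap_size: int
    burst_size: int


def parse_burst_filename(name: str) -> Optional[BurstFile]:
    """Return the parameters encoded in a data set filename, or None if it does not match."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    return BurstFile(
        version=match.group("version"),
        gap_size=int(match.group("gap")),
        burst_size=int(match.group("burst")),
    )


if __name__ == "__main__":
    # Hypothetical filenames following the documented pattern.
    for name in ["Eclipse2.0_GAP3_BURST5.csv", "Eclipse3.0_GAP10_BURST1.csv", "paper.pdf"]:
        print(name, "->", parse_burst_filename(name))
```

Returning `None` for non-matching names lets a caller skip other files in the folder, such as the copy of the paper, without special handling.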
The ZIP-package also contains a copy of our paper.
What is the data format?

The provided CSV files contain change burst metrics for each Java source file of the Eclipse project. Each column in a CSV file represents one metric. A list of the metrics, with descriptions, is given in the paper.

People

<webmaster@st.cs.uni-saarland.de> · http://www.st.cs.uni-saarland.de//softevo/burst-data/eclipse/ · Last updated: 2018-04-05 13:41