Learning from 6,000 projects: Lightweight cross-project anomaly detection - ISSTA 2010
by Natalie Gruska, Andrzej Wasylkowski, Andreas Zeller

ISSTA 2010: Proceedings of the 19th international symposium on Software testing and analysis, Pages 119-130, ACM, New York, NY, July 2010.

DOI: 10.1145/1831708.1831723

Abstract

Real production code contains lots of knowledge: on the domain, on the architecture, and on the environment. How can we leverage this knowledge in new projects? Using a novel lightweight source code parser, we have mined more than 6,000 open source Linux projects (totaling 200,000,000 lines of code) to obtain 16,000,000 temporal properties reflecting normal interface usage. New projects can be checked against these rules to detect anomalies, that is, code that deviates from the wisdom of the crowds. In a sample of 20 projects, 25% of the top-ranked anomalies uncovered actual code smells or defects.
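To make the idea of checking new code against mined temporal properties concrete, here is a minimal Python sketch. It is an illustration under simplifying assumptions, not the paper's parser or ranking method. Each function body is reduced to a flat list of call names, a "temporal property" is taken to be an ordered pair (a, b) meaning a is called before b, and a rule is kept when it holds in most functions that call a. The names `temporal_pairs`, `mine_rules`, and `find_anomalies`, the thresholds, and the toy corpus are all hypothetical.

```python
from collections import Counter


def temporal_pairs(calls):
    """Return all ordered pairs (a, b) where call a occurs before call b."""
    return {(a, b) for i, a in enumerate(calls) for b in calls[i + 1:] if a != b}


def mine_rules(corpus, min_support=3, min_confidence=0.75):
    """Learn rules (a, b) -> confidence from many call sequences.

    Assumption: confidence is the share of sequences calling `a`
    in which `a` is later followed by `b`.
    """
    seen = Counter()   # number of sequences that call a given function
    pairs = Counter()  # number of sequences that exhibit a given ordered pair
    for calls in corpus:
        seen.update(set(calls))
        pairs.update(temporal_pairs(calls))
    return {
        (a, b): count / seen[a]
        for (a, b), count in pairs.items()
        if count >= min_support and count / seen[a] >= min_confidence
    }


def find_anomalies(calls, rules):
    """Report rules whose first call is present but whose ordering is missing."""
    present = temporal_pairs(calls)
    called = set(calls)
    violations = [
        (a, b, conf)
        for (a, b), conf in rules.items()
        if a in called and (a, b) not in present
    ]
    # Rank the most strongly supported violations first.
    return sorted(violations, key=lambda v: v[2], reverse=True)


# Toy corpus standing in for call sequences mined from many projects.
corpus = [
    ["fopen", "fread", "fclose"],
    ["fopen", "fwrite", "fclose"],
    ["fopen", "fread", "fclose"],
    ["fopen", "fclose"],
    ["malloc", "free"],
]
rules = mine_rules(corpus)

# A new function that opens a file but never closes it.
new_function = ["fopen", "fread"]
anomalies = find_anomalies(new_function, rules)
print(anomalies)
```

In this toy setting the only rule that survives the thresholds is "fopen is followed by fclose", and the new function is flagged because it deviates from that common usage. A flagged deviation is only a candidate for inspection, which matches the abstract's observation that a fraction of the top-ranked anomalies turned out to be actual code smells or defects.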

BibTeX Entry

@inproceedings{gruska-issta-2010,
    title = "Learning from 6,000 projects: Lightweight cross-project anomaly detection",
    author = "Natalie Gruska and Andrzej Wasylkowski and Andreas Zeller",
    year = "2010",
    month = jul,
    address = "New York, NY",
    booktitle = "ISSTA 2010: Proceedings of the 19th international symposium on Software testing and analysis",
    location = "Trento, Italy",
    pages = "119--130",
    publisher = "ACM",
    doi = "10.1145/1831708.1831723",
}
