AUTOGRAM is a novel practical method that, given a set of program runs with inputs, automatically produces a
As an example, consider the Java URL class, which parses a URL into its constituents. Given the class and three inputs, our AUTOGRAM prototype automatically produces the grammar shown below, which closely reflects the structure of the URLs processed.
Note that the grammar shown above is the raw output of the tool. It is highly readable because AUTOGRAM derives the names of nonterminals such as URL or HOST directly from the program variables and functions that process the respective input parts. The more varied the inputs that are processed, the more the grammar generalizes.
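To make the "constituents" of a URL concrete, the following minimal sketch uses only the standard java.net.URL accessors to split one URL into named parts. It does not show AUTOGRAM itself or its output: the class name UrlConstituents, the upper-case labels, and the sample URL are assumptions chosen for illustration.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;

class UrlConstituents {
    // Splits a URL string into named parts using java.net.URL's standard accessors.
    // The upper-case keys are illustrative labels, not AUTOGRAM output.
    static Map<String, String> decompose(String input) {
        URL url;
        try {
            // URI.create(...).toURL() avoids the URL(String) constructor,
            // which is deprecated in recent JDKs.
            url = URI.create(input).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + input, e);
        }
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("PROTOCOL", url.getProtocol());
        parts.put("USERINFO", url.getUserInfo());          // null if absent
        parts.put("HOST", url.getHost());
        parts.put("PORT", String.valueOf(url.getPort()));  // "-1" if absent
        parts.put("PATH", url.getPath());
        parts.put("QUERY", url.getQuery());                // null if absent
        parts.put("REF", url.getRef());                    // null if absent
        return parts;
    }

    public static void main(String[] args) {
        // Sample URL chosen for illustration only.
        decompose("http://user:pass@www.example.com:80/command?foo=bar#ref")
            .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Each accessor corresponds to one part of the input string. That correspondence between program values and input fragments is the kind of information the text above describes AUTOGRAM as using to name nonterminals.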
Being both accurate and readable, the grammars AUTOGRAM produces
- give humans immediate and detailed insights into the structure of inputs, thereby facilitating reverse engineering of input formats as well as manually writing valid test inputs.
- can immediately be used by test generators to produce high numbers of varied and valid inputs, thus facilitating automated robustness testing and fuzzing.
- vastly simplify the creation of parsing programs that decompose existing inputs into their constituents.
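As a rough illustration of how a test generator can turn a grammar into inputs, here is a minimal sketch of random grammar expansion. The grammar in it is hand-written and hypothetical, not actual AUTOGRAM output; only the nonterminal names URL and HOST are taken from the text above, and GrammarFuzzer and expand are illustrative names.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;

class GrammarFuzzer {
    // Hypothetical, hand-written grammar for illustration only.
    // Each nonterminal maps to a list of alternatives; each alternative
    // is a sequence of nonterminals and terminal strings.
    static final Map<String, List<List<String>>> GRAMMAR = Map.of(
        "URL", List.of(List.of("PROTOCOL", "://", "HOST", "PATH")),
        "PROTOCOL", List.of(List.of("http"), List.of("https"), List.of("ftp")),
        "HOST", List.of(List.of("www.example.com"), List.of("localhost")),
        "PATH", List.of(List.of("/"), List.of("/", "FILE")),
        "FILE", List.of(List.of("index.html"), List.of("command"))
    );

    // Expands a symbol by picking one alternative at random and expanding
    // its parts recursively. Symbols without a grammar entry are terminals
    // and are returned unchanged.
    static String expand(String symbol, Random rng) {
        List<List<String>> alternatives = GRAMMAR.get(symbol);
        if (alternatives == null) {
            return symbol;
        }
        List<String> chosen = alternatives.get(rng.nextInt(alternatives.size()));
        StringBuilder out = new StringBuilder();
        for (String part : chosen) {
            out.append(expand(part, rng));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Random rng = new Random();
        // Print a handful of generated inputs, starting from the URL nonterminal.
        for (int i = 0; i < 5; i++) {
            System.out.println(expand("URL", rng));
        }
    }
}
```

Because every generated string is derived from the grammar, it is syntactically valid with respect to that grammar. This toy grammar is not recursive, so expansion always terminates; a generator for recursive grammars would also need a depth or size bound.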
To the best of our knowledge, AUTOGRAM is the first to
AUTOGRAM will be presented at the Automated Software Engineering Conference 2016 in Singapore. A preprint of the paper is available.
For an explanation of how AUTOGRAM learns grammars, check out our video below or on YouTube:
To see a demonstration of AUTOGRAM, check out our video below or on YouTube:
- Download the preprint of the AUTOGRAM paper (to appear at ASE 2016).
- Data sets will be made available soon.
- Access to AUTOGRAM is available for research and evaluation purposes upon request.