Program Analysis for Software Tools and Engineering (PASTE 2004)

ACM SIGPLAN-SIGSOFT Workshop on
Program Analysis for Software Tools and Engineering
(PASTE 2004)

to be held June 7-8, 2004
Washington, DC
co-located with PLDI

SIGPLAN

New: Slides for invited talks and 5-Minute-Madnesses are now available.

Program

Monday, June 7
9:00—9:15	Welcome
9:15—10:15	Talk: Gary Leavens
10:30—11:30	Static Analysis
11:30—12:30	5-Minute Madness
2:00—3:00	Evaluation
3:15—4:15	Dynamic Analysis
4:30—5:30	5-Minute Madness
6:00—7:00	Talk: Michael Ernst

Tuesday, June 8
9:00—10:00	Talk: Patrice Godefroid
10:15—11:15	Pointer Analysis
11:30—12:30	Testing and Coverage
12:30—1:00	Final Discussion

Monday, June 7

9:00 Welcome

Cormac Flanagan and Andreas Zeller, Workshop Co-Chairs

9:15 Invited Talk: JML Framed!

Gary T. Leavens, Iowa State University

10:15 Break

10:30 Static Analysis

Session chair: Cormac Flanagan

Resolving and Applying Constraint Queries on Context-Sensitive Analyses, James Ezick, Cornell University
Validation of Assembler Programs for DSPs: A Static Analyzer, Matthieu Martel, Commissariat à l’Energie Atomique

12:30 Lunch Break

2:00 Evaluation

Session chair: Andreas Zeller

Evaluating the Imprecision of Static Analysis, Atanas Rountev and Scott Kagan and Michael Gibas, Ohio State University
An Improved Slicer for Java, Christian Hammer and Gregor Snelting, Universität Passau

3:00 Break

3:15 Dynamic Analysis

Session chair: Michael Ernst

Dynamically Inferring Temporal Properties, Jinlin Yang and David Evans, University of Virginia
The Design and Implementation of FIT: a Flexible Instrumentation Tool, Bruno De Bus and Dominique Chanet and Bjorn de Sutter and Ludo Van Put and Koen de Bosschere, Ghent University

4:15 Break

5:30 Break

6:00 Invited Talk: Static and dynamic analysis: synergy and duality

Michael Ernst, Massachusetts Institute of Technology

7:00 End

Tuesday, June 8

9:00 Invited Talk: "Model Checking" Software with VeriSoft

Patrice Godefroid, Bell Labs

10:00 Break

10:15 Pointer Analysis

Session chair: Dan Grossman

Efficient field-sensitive pointer analysis for C, David Pearce and Paul Kelly and Chris Hankin, Imperal College, London
Impact of Heap-object Specialization in Pointer Analysis, Erik Nystrom and Hong-Seok Kim and Wen-mei Hwu, University of Illinois at Urbana-Champaign

11:15 Break

11:30 Testing and Coverage

Session chair: N.N.

Automatic mock object creation for test factoring, David Saff and Michael Ernst, MIT Computer Science & Artificial Intelligence Lab
Elided Conditionals, Manos Renieris and Sebastien Chan-Tin and Steven Reiss, Brown University

12:30 Final Discussion + Closing

1:00 End

Invited Talks

JML Framed! [PDF]

Gary T. Leavens, Iowa State University

This talk will try to frame the Java Modeling Language (JML) in two senses. The first is the sense of placing JML in the context of other specification languages and tools. This context will be provided by giving a brief introduction to JML, with small examples. The different tools that work with JML also help provide context for the language design in a different way. Another aspect of this context is provided by comparing JML with other design by contract languages, such as Eiffel.
The second sense of framing in JML is the more technical one of frame axioms, also known as modifies clauses. Such frame axioms are critical for reasoning, and are difficult to check at runtime; hence static analysis is a crucial need. We will discuss the semantics of JML's frame axioms, including datagroups, both in a naive sense and with the experimental Universe type system. Problems in the semantics and checking, as well as some recent work will be presented.
Note that some of the recent work on frame axioms in JML, especially extensions with the Universe type system, is joint work with Peter Mueller (of ETH Zürich) and Arnd Poetzsch-Heffter of (U. Kaiserslautern).

Static and dynamic analysis: synergy and duality [PDF]

Michael Ernst, Massachusetts Institute of Technology

This talk presents two sets of observations relating static and dynamic analysis. The first concerns synergies between static and dynamic analysis. Wherever one is utilized, the other may also be applied, often in a complementary way, and existing analyses should inspire different approaches to the same problem. Furthermore, existing static and dynamic analyses often have very similar structure and technical approaches. The second observation is that some static and dynamic approaches are similar in that each focuses on, and generalizes from, a subset of all possible executions.
The talk concludes with a challenge to researchers to develop new analyses that complement existing ones. More importantly, researchers need to erase the boundaries between static and dynamic analysis and create unified analyses that can operate in either mode, or in a mode that blends the strengths of both approaches.

"Model Checking" Software with VeriSoft [PDF]

Patrice Godefroid, Bell Labs

VeriSoft is a tool for systematically testing concurrent reactive software systems. It explores the state space (dynamic behavior) of a system by driving and observing the execution of its components using a run-time scheduler, and by reinitializing their execution. This systematic state-space exploration is performed using model-checking algorithms and makes heavy use of so-called "partial-order reduction" techniques. By default, VeriSoft searches state spaces for violations of user-specified assertions and coordination problems (deadlocks, crashes, etc.) between concurrent components. With its first prototype developed in 1996 (published at POPL'97), VeriSoft is the first software model checker for general-purpose programming languages like C and C++.
Since made publicly available in 1999, VeriSoft has been licensed to hundreds of users in industry and academia. Inside Lucent Technologies, it was applied successfully to analyze several software products in various business units and application domains (switch maintenance, call processing, network management, etc.). Because VeriSoft can automatically generate, execute and evaluate thousands of tests per minute, it can quickly reveal behaviors that are virtually impossible to detect using conventional testing techniques.
In this talk, I will present VeriSoft, what it does, how it works, industrial applications, strengths and limitations, technology-transfer issues, and discuss the current status of this project as well as related work and future work.

5-Minute Madnesses

Here are the slides from the 5-Minute Madnesses, as handed in by their presenters:

An archive of all slides is also available.

Every participant was allowed to give a quick presentation of his current work or current ideas. Each presenter was allocated a slot of five minutes for her presentation, including a two-minute discussion. As the title suggests, this is shere madness — but a starting point for connecting during the workshop.

Here are some do's and don'ts to make your presentation enjoyable to the audience:

Do bring one single overhead slide that summarizes your work.
Empty overhead slides and pens are also available at the workshop.
Do not attempt to present using a laptop; these take too long to set up.
Do not attempt to squeeze a full presentation into three minutes — nobody will get it.
Do just give your name and the basic idea (emphasizing: "basic". Did we say "basic"? Keep in mind: "basic")
Do not exceed your talk time; talk limits will be strictly enforced (by turning off the projector, if need be).
Do take your time; three minutes is actually quite a lot for just the basic idea.

Please leave your slide with the PASTE chairs; we plan to scan them and make them available as minutes of the discussion.

Technical Papers

These are the papers selected by the PASTE program committee. All submissions underwent a rigorous reviewing process. Each paper was reviewed by three program committee members and was discussed at the program committee phone meeting. The process turned out to be highly selective: Only 10 out of 37 submitted papers were accepted.

The presentations of technical papers will be 20 minutes long, followed by 10 minutes of discussion.

Resolving and Applying Constraint Queries on Context-Sensitive Analyses

James Ezick, Cornell University

A context-sensitive analysis is an analysis in which program elements are assigned sets of properties that depend upon the context in which they occur. For analyses on imperative languages, this often refers to considering the behavior of elements in a called procedure with respect to the call-stack that generated the procedure invocation. Algorithms for performing or approximating these types of analyses make up the core of interprocedural program analysis and are pervasive; having applications in program comprehension, optimization, and verification. However, for many of these applications what is of interest is the solution to the dual problem: given a vertex and a desirable set of properties, what is the set of potential stack-contexts leading to that vertex that results in the desirable property set? Many techniques, such as procedure cloning, have been developed to approximately partition the set of stack-contexts leading to a vertex according to such a condition. This paper introduces a broad generalization of this problem referred to as a constraint query on the analysis. This generalization allows sophisticated constraints to be placed on both the desirable property set as well as the set of interesting stack-contexts. From these constraints, a novel technique based on manipulating regular languages is introduced that efficiently produces a concise representation of the exact set of stack-contexts solving this dual problem subject to the constraints. This technique is applied to a pair of prescient software engineering challenges — resolving program comprehension queries over aggregate collections of properties and statically modifying code to enforce a safety policy decidable by the analysis. Practical examples of both applications are presented along with empirical results.

Validation of Assembler Programs for DSPs: A Static Analyzer

Matthieu Martel, Commissariat à l’Energie Atomique

Digital Signal Processors are widely used in critical embedded systems to pilot low-level, often critical functionalities such as driving a device or processing numerical data. In this article we describe a static analyzer based on abstract interpretation and designed to validate industrial assembler programs for a DSP. The validation consists of guarantying the absence of runtime errors like incorrect memory accesses, type errors, etc. and of tracking the sources of inaccuracies introduced by floating point computations. Our first contribution is a new static analysis for relocatable assembler programs able to cope with dynamically computed branching addresses and recursive procedures. Our second contribution is the analyzer itself, which enables to understand the numerical inaccuracies by means of a graphical interface.

Evaluating the Imprecision of Static Analysis

Atanas Rountev and Scott Kagan and Michael Gibas, Ohio State University

This work discusses traditional approaches for evaluating the imprecision of static analysis, and proposes two alternative techniques that provide better insights into the quality of the analysis solution. We also describe our initial experience with these techniques.

An Improved Slicer for Java

Christian Hammer and Gregor Snelting, Universität Passau

We present an improved slicing algorithm for Java. The best algorithm known so far is not always precise if nested objects are used as actual parameters. The new algorithm presented in this paper always generates correct and precise slices, but is more expensive in general.
We describe the algorithms and their treatment of objects as parameters. In particular, we present a new, safe criterion for termination of unfolding nested parameter objects. We then compare the two algorithms by providing measurements for a benchmark of Java and JavaCard programs.

Dynamically Inferring Temporal Properties

Jinlin Yang and David Evans, University of Virginia

Model checking requires a specification of the target systemÕs desirable properties, some of which are temporal. Formulating a temporal property of the system based on either its abstract model or implementation requires a deep understanding of its behavior and sophisticated knowledge of the chosen formalism. We propose a dynamic approach to automatically infer a programÕs temporal properties based on a set of pre-defined property patterns. We describe a preliminary implementation of this approach, and report results from using it to discover interesting temporal properties of a small program.

The Design and Implementation of FIT: a Flexible Instrumentation Tool

Bruno De Bus and Dominique Chanet and Bjorn de Sutter and Ludo Van Put and Koen de Bosschere, Ghent University

This paper presents FIT, a Flexible open-source binary code Instrumentation Toolkit. Unlike existing tools, FIT is truly portable, with existing backends for the Alpha, x86 and ARM architectures and the Tru64Unix, Linux and ARM Firmware execution environments. This paper focuses on some of the problems that needed to be addressed for providing this degree of portability. It also discusses the trade-off between instrumentation precision and low overhead.

Efficient field-sensitive pointer analysis for C

David Pearce and Paul Kelly and Chris Hankin, Imperal College, London

The subject of this paper is flow- and context-insensitive pointer analysis. We present a novel approach for precisely modelling struct variables and indirect function calls. Our method emphasises efficiency and simplicity and consists of an extension to the existing system of set-constraints. Furthermore, we evaluate the trade-off of time versus precision of using our method, versus a less accurate analysis. Our benchmark suite consists of 7 common C programs ranging in size from 5000 to 150,000 lines of code. Our results indicate the field-sensitive analysis is more expensive to compute, but yields significantly better precision.

Impact of Heap-object Specialization in Pointer Analysis

Erik Nystrom and Hong-Seok Kim and Wen-mei Hwu, University of Illinois at Urbana-Champaign

Knowledge of memory activity is critical in any form of program understanding. For this reason, pointer analysis has become an important component of many software-engineering tools. When such tools target developer guidance, accuracy can make a big difference in their effectiveness by reducing the number of false positives.
Context sensitivity is one factor that greatly affects the accuracy of pointer analysis. The treatment of heap objects is another important factor. Ideally, heap objects would be specialized only when it is beneficial to do so. However, it is not initially clear how much specialization needs to occur to get the bulk of precision benefit nor is it clear how large the deleterious effects are from over specialization. We believe that finding a good control mechanism is critical to make the analysis time more predictable in the presence of heap specialization especially when analyzing large programs.
This paper presents and initial investigation into heap specialization with the intent and determining the ease of reaching an analysis ``sweet-spot''. Specifically, it presents a metric for evaluation of heap specialization and empirically evaluates the benefits of varying amounts of heap specialization.

Automatic mock object creation for test factoring

David Saff and Michael Ernst, MIT Computer Science & Artificial Intelligence Lab

Mock objects are a tool for writing unit tests. If a developer wishes to test an object A, which is designed to issue queries against or mutate another object B, the actual implementation of B can be replaced by a mock. The mock object has two purposes: it checks that A's calls to B are as expected, and it simulates B's behavior in response.
A mock for B is useful for writing tests for A before B is developed, for isolating bugs in A from bugs in B, and, if B is slow or expensive, for improving test performance or cost. However, a developer may create or inherit a unit or system test suite that has not been designed with mocks. Given a system test for A and B, we would like to automatically generate unit tests for A in which B is mocked. This automates one form of test factoring, the restructuring of test suites by augmenting slow system-wide tests with fast, focused tests.
We present here a summary of the challenges in automatic mock creation, and a proposal for addressing them.

Elided Conditionals

Manos Renieris and Sebastien Chan-Tin and Steven Reiss, Brown University

Many software testing and automated debugging tools rely on structural coverage techniques. Implicitly, such tools assume a relation between individual control-flow choices made during a program run and the outcome of the run. In this paper, we show experimentally that control-flow choices, viewed in isolation, often have no impact on the outcome of the program. We call such choices elided. Our experimental evidence comes from three programs of varying complexity. We discuss our implementation techniques, the impact of our findings for dynamic analyses.

PASTE 2004 is sponsored by...

SIGPLAN

Association for
Computing Machinery (ACM) ACM Special Interest Group on
Programming Languages (SIGPLAN) ACM Special Interest Group on
Software Engineering (SIGSOFT)

PASTE 2004 is supported by...