Computation of evolutionary change
Stanek, Edward Jason (2009) Computation of evolutionary change. PhD thesis, Iowa State University.
Full text available as:
A key issue in software evolution analysis is being able to compute evolutionary change accurately and with rich semantics. This dissertation describes a mathematical framework for enabling accurate computation of semantic evolutionary change. It is based on graphs for representing software semantics, graph transformations for modeling evolution, and effects of graph transformations for capturing evolutionary change. We formulate the notions of evolution sets and the evolution distance to measure evolutionary change. Then, we define an appropriate notion of optimal graph alignment to compute evolutionary change accurately. Establishing a rigorous foundation for computing evolutionary change is important for developing powerful automated tools for software evolution analysis. Cost estimation, software merging, reliability analysis, clone detection, incremental testing, validation and other software applications can benefit from precise computation of evolutionary change. A rigorous foundation also allows leveraging the extensive research on graph alignments to advance software engineering. We have created a framework for experimental evaluation of graph alignment algorithms. The framework includes a graph testbed, an accuracy metric, and a graph alignment visualization (GAV) mechanism. The framework is targeted at applications where a precise computation of evolutionary change from one system to the next is needed to reveal valuable knowledge about the system and its evolution. The accuracy metric is based on a new measure for graph difference and a new notion of optimality of graph alignment. The metric is designed to measure the degree to which an alignment is inaccurate, that is the degree to which it reports spurious differences. Such a metric is meaningful for estimating the efficiency of and resources necessary for many software evolution analyses. The Minimal Signature Tree (MST) is introduced as a new data structure that can be constructed for a graph G and can be used to design efficient heuristic graph analysis algorithms. The graphs are assumed to be attributed, directed, and acyclic. The MST is a rooted tree that stores the minimal signatures which are defined to be sequences of labels that describe special subgraphs in G. A minimal signature serves as an artifact to pinpoint a node of G. The MST generalizes the notion of suffix tree from strings to graphs. If the graph G is a string then the MST is homomorphic to the suffix tree. The MST exposes the internal structure of a graph and makes it possible to design efficient algorithms. A MST-based algorithm for differencing graphs G1 and G2 is presented. This algorithm involves constructing a combined MST, called the Co-MST, that stores minimal signatures that are common to G1 and G2. For each common minimal signature s, the node in G1 and the node in G2 pinpointed by s are aligned. The nodes not covered by common minimal signatures may be further aligned using an existing graph alignment technique. We present an experimental study which includes a comparison of a MST-based graph differencing algorithm with two other algorithms. The experiments involve large graphs extracted from Linux, synthetic graphs reaching ten thousand nodes, variations of connectivity and the number of distinct labels, and a priori known differences to verify the accuracy of differencing.
Archive Staff Only: edit this record