ISU Electrical and Computer Engineering Archives

Computation of evolutionary change

Stanek, Edward Jason (2009) Computation of evolutionary change. PhD thesis, Iowa State University.

Full text available as:

PDF - Requires Adobe Acrobat Reader or other PDF viewer.

Abstract

A key issue in software evolution analysis is being able to compute evolutionary change accurately and with rich semantics. This dissertation describes a mathematical framework for enabling accurate computation of semantic evolutionary change. It is based on graphs for representing software semantics, graph transformations for modeling evolution, and effects of graph transformations for capturing evolutionary change. We formulate the notions of evolution sets and the evolution distance to measure evolutionary change. Then, we define an appropriate notion of optimal graph alignment to compute evolutionary change accurately. Establishing a rigorous foundation for computing evolutionary change is important for developing powerful automated tools for software evolution analysis. Cost estimation, software merging, reliability analysis, clone detection, incremental testing, validation and other software applications can benefit from precise computation of evolutionary change. A rigorous foundation also allows leveraging the extensive research on graph alignments to advance software engineering. We have created a framework for experimental evaluation of graph alignment algorithms. The framework includes a graph testbed, an accuracy metric, and a graph alignment visualization (GAV) mechanism. The framework is targeted at applications where a precise computation of evolutionary change from one system to the next is needed to reveal valuable knowledge about the system and its evolution. The accuracy metric is based on a new measure for graph difference and a new notion of optimality of graph alignment. The metric is designed to measure the degree to which an alignment is inaccurate, that is the degree to which it reports spurious differences. Such a metric is meaningful for estimating the efficiency of and resources necessary for many software evolution analyses. The Minimal Signature Tree (MST) is introduced as a new data structure that can be constructed for a graph G and can be used to design efficient heuristic graph analysis algorithms. The graphs are assumed to be attributed, directed, and acyclic. The MST is a rooted tree that stores the minimal signatures which are defined to be sequences of labels that describe special subgraphs in G. A minimal signature serves as an artifact to pinpoint a node of G. The MST generalizes the notion of suffix tree from strings to graphs. If the graph G is a string then the MST is homomorphic to the suffix tree. The MST exposes the internal structure of a graph and makes it possible to design efficient algorithms. A MST-based algorithm for differencing graphs G1 and G2 is presented. This algorithm involves constructing a combined MST, called the Co-MST, that stores minimal signatures that are common to G1 and G2. For each common minimal signature s, the node in G1 and the node in G2 pinpointed by s are aligned. The nodes not covered by common minimal signatures may be further aligned using an existing graph alignment technique. We present an experimental study which includes a comparison of a MST-based graph differencing algorithm with two other algorithms. The experiments involve large graphs extracted from Linux, synthetic graphs reaching ten thousand nodes, variations of connectivity and the number of distinct labels, and a priori known differences to verify the accuracy of differencing.

EPrint Type:Thesis (PhD)
Uncontrolled Keywords:Evolution Distance, Evolutionary Change, Effect of Transformation, Graph Theory, Graph Alignment, Graph Difference, Semantic Difference, Suffix Tree, Minimal Signature, Minimal Signature Tree, Algorithm, Systems Engineering, Software Engineering, Software Evolution Analysis, Software Metrics, Cost Estimation, Software Merging, Clone Detection, Validation, Testing, Reliability
Subjects:Electrical Engineering > COMMUNICATION & SIGNAL PROCESSING > Bioinformatics
Computer Engineering > SOFTWARE SYSTEMS > Computational Biology and Computational Science
Research Excellence Awards
Computer Engineering > SOFTWARE SYSTEMS > Software Engineering
ID Code:493
Identification Number:Identification Number UNSPECIFIED
Deposited By:Jason Stanek
Deposited On:24 April 2009

Archive Staff Only: edit this record