Software Renovation by Reverse Engineering Source Code

Software renovation is a hot topic because our society depends more and more on aging software. Renovating software refreshes it to better match the current technical and business environment. In his thesis, CWI PhD candidate Davy Landman studied how reverse engineering can be used to automate software renovation.

Publication date: 02-10-2017

First, the essential concepts and abstractions used in the software are reverse engineered from the source code, and these are then used during renovation. Scaling this reverse engineering to large software systems requires automated analysis. Unfortunately, automated analysis comes at the cost of over-approximation or under-approximation (over-estimating or under-estimating what is relevant). Davy Landman of the CWI SWAT group explored the opportunities and limitations that this poses.

In his thesis, Landman explored the limits of domain model recovery by recovering domain models by hand. Comparing these models to a manually constructed reference domain model, he found that most domain information could be recovered from the source code, and with high quality. This suggests a bright future for automated reverse engineering of domain models from source code.

He also explored using two common source code metrics, Cyclomatic Complexity (CC) and Source Lines of Code (SLOC), for automating reverse engineering. Almost all of the existing literature claims a strong linear correlation between these two metrics, which is often interpreted as an indication that CC and SLOC are redundant. Contrary to that literature, Landman did not observe a strong correlation in two large corpora. He interprets this as a lack of evidence that CC is redundant to SLOC, which supports the continued use of the two metrics next to each other.
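To make "linear correlation between two metrics" concrete, the sketch below computes Pearson's correlation coefficient for per-method CC and SLOC values. It is only an illustration: the class name `MetricCorrelation`, the method `pearson`, and the sample numbers are hypothetical, and this announcement does not say which statistical procedure the thesis actually used.

```java
public class MetricCorrelation {

    /**
     * Pearson's r for two samples of equal length.
     * Values near +1 or -1 indicate a strong linear relation, values near 0 a weak one.
     */
    public static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two samples of equal length >= 2");
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;

        double covariance = 0.0;
        double varianceX = 0.0;
        double varianceY = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            covariance += dx * dy;
            varianceX += dx * dx;
            varianceY += dy * dy;
        }
        // Undefined (NaN) when one of the samples is constant.
        return covariance / Math.sqrt(varianceX * varianceY);
    }

    public static void main(String[] args) {
        // Made-up per-method measurements, purely for illustration.
        double[] sloc = {3, 10, 25, 40, 8};
        double[] cc = {1, 4, 2, 3, 6};
        System.out.println("Pearson r between SLOC and CC: " + pearson(sloc, cc));
    }
}
```

If r were consistently close to 1 across a corpus, one metric would add little information beyond the other; a weaker correlation is what leaves room for using both.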

Finally, the limits of statically analyzing the Reflection API of the Java programming language were studied. Analyzing a representative corpus revealed that 78% of all projects use Reflection. After identifying the common assumptions and limitations of relevant static analysis tools, Landman found them widely challenged in the corpus. He therefore proposes new opportunities for static analysis tools that can significantly improve the coverage of real Java programs.
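A minimal sketch of why Reflection is hard to analyze statically: in the code below, the class and method that get called are determined by strings that only exist at run time. The class `ReflectionDemo` and its helper `callStatic` are hypothetical names made up for this illustration; only the standard `Class.forName`, `Class.getMethod` and `Method.invoke` calls come from the Java Reflection API.

```java
import java.lang.reflect.Method;

public class ReflectionDemo {

    /**
     * Invokes a public static method without parameters, where both the class
     * and the method are identified only by run-time strings.
     */
    public static Object callStatic(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);  // class chosen by a string
            Method method = cls.getMethod(methodName); // method chosen by a string
            return method.invoke(null);                // null receiver: static call
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "cannot call " + className + "." + methodName, e);
        }
    }

    public static void main(String[] args) {
        // The target may come from the command line, a config file or the network,
        // so a tool that only reads the source cannot tell in general what is called.
        String className = args.length > 0 ? args[0] : "java.lang.System";
        String methodName = args.length > 1 ? args[1] : "lineSeparator";
        Object result = callStatic(className, methodName);
        System.out.println("call returned a value of type " + result.getClass().getName());
    }
}
```

A static analysis tool facing `callStatic` either has to assume that any method could be invoked (over-approximation) or ignore the call (under-approximation), unless it can determine the string values.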

Davy Landman has used empirical studies to both answer open questions and identify new opportunities in reverse engineering research and practice.

More information:

Link to thesis

SWAT group

Davy Landman

Davy Landman will defend his thesis on Thursday, October 5 at 10 a.m. at the Agnietenkapel, Oudezijds Voorburgwal 229, Amsterdam. His thesis was supervised by Paul Klint and Jurgen Vinju.