Clone detection using rolling hashing, suffix trees and dagification: a case study

Mikkel Jønsson Thomsen, Fritz Henglein

5 Citations (Scopus)

Abstract

Microsoft Dynamics NAV is a widely used enterprise resource planning system for small and medium-sized enterprises that, by design, encourages rapid customization by copy-paste programming. We report the results of analyzing clone detection for NAV using two previously published methods and one new algorithmic method: character-based sliding window sampling using Rabin-Karp hashing (MOSS), line-based sequence matching using suffix trees (CodeDup), and abstract-syntax-tree based graph sharing analysis (XMLClone). The latter is piggybacked on XMLStore, which stores XML trees as directed acyclic graphs (dags) where all isomorphic subtrees are identified and coalesced into single nodes, which can be done in linear time using multiset discrimination. This dagification discovers all well-formed Type-1 and, with suitable input normalization, Type-2 clones. We find that the subsequent dag analysis to discover Type-3 clones performs well on NAV source code, both in terms of computational complexity and precision. This suggests that efficient dagification and independently configurable dag interpretation may be valuable ingredients for modular clone detection.

Original languageEnglish
Title of host publication2012 6th International Workshop on Software Clones (IWSC)
Number of pages7
PublisherIEEE
Publication date2012
Pages22-28
ISBN (Print)978-1-4673-1794-8
DOIs
Publication statusPublished - 2012
Event6th International Workshop on Software Clones - Zürich, Switzerland
Duration: 4 Jun 20124 Jun 2012
Conference number: 6

Conference

Conference6th International Workshop on Software Clones
Number6
Country/TerritorySwitzerland
CityZürich
Period04/06/201204/06/2012

Fingerprint

Dive into the research topics of 'Clone detection using rolling hashing, suffix trees and dagification: a case study'. Together they form a unique fingerprint.

Cite this