Clone detection using rolling hashing, suffix trees and dagification: a case study

Mikkel Jønsson Thomsen, Fritz Henglein

5 Citationer (Scopus)

Abstract

Microsoft Dynamics NAV is a widely used enterprise resource planning system for small and medium-sized enterprises that, by design, encourages rapid customization by copy-paste programming. We report the results of analyzing clone detection for NAV using two previously published methods and one new algorithmic method: character-based sliding window sampling using Rabin-Karp hashing (MOSS), line-based sequence matching using suffix trees (CodeDup), and abstract-syntax-tree based graph sharing analysis (XMLClone). The latter is piggybacked on XMLStore, which stores XML trees as directed acyclic graphs (dags) where all isomorphic subtrees are identified and coalesced into single nodes, which can be done in linear time using multiset discrimination. This dagification discovers all well-formed Type-1 and, with suitable input normalization, Type-2 clones. We find that the subsequent dag analysis to discover Type-3 clones performs well on NAV source code, both in terms of computational complexity and precision. This suggests that efficient dagification and independently configurable dag interpretation may be valuable ingredients for modular clone detection.

OriginalsprogEngelsk
Titel2012 6th International Workshop on Software Clones (IWSC)
Antal sider7
ForlagIEEE
Publikationsdato2012
Sider22-28
ISBN (Trykt)978-1-4673-1794-8
DOI
StatusUdgivet - 2012
Begivenhed6th International Workshop on Software Clones - Zürich, Schweiz
Varighed: 4 jun. 20124 jun. 2012
Konferencens nummer: 6

Konference

Konference6th International Workshop on Software Clones
Nummer6
Land/OmrådeSchweiz
ByZürich
Periode04/06/201204/06/2012

Emneord

  • XML
  • cryptography
  • directed graphs
  • enterprise resource planning
  • small-to-medium enterprises
  • trees (mathematics)
  • CodeDup
  • MOSS
  • Microsoft dynamics NAV
  • Rabin-Karp hashing
  • XML trees
  • XMLClone
  • abstract-syntax-tree
  • character-based sliding window sampling
  • copy-paste programming
  • dagification
  • directed acyclic graph
  • enterprise resource planning system
  • graph sharing analysis
  • isomorphic subtrees
  • line-based sequence matching
  • linear time
  • modular clone detection
  • multiset discrimination
  • rolling hashing
  • small-and-medium-sized enterprises
  • suffix trees
  • Abstracts
  • Calibration
  • Cloning
  • Plagiarism
  • Syntactics
  • Cod-eDup
  • ERP
  • MS Dynamics NAV
  • XMLStore
  • clone
  • code
  • detection
  • discrimination
  • hashing
  • multiset
  • similarity
  • suffix
  • tree
  • winnowing

Fingeraftryk

Dyk ned i forskningsemnerne om 'Clone detection using rolling hashing, suffix trees and dagification: a case study'. Sammen danner de et unikt fingeraftryk.

Citationsformater