Bit-coded regular expression parsing

Lasse Nielsen, Fritz Henglein

10 Citations (Scopus)

Abstract

Regular expression parsing is the problem of producing a parse tree of a string for a given regular expression. We show that a compact bit representation of a parse tree can be produced efficiently, in time linear in the product of input string size and regular expression size, by simplifying the DFA-based parsing algorithm due to Dub ´e and Feeley to emit the bits of the bit representation without explicitly materializing the parse tree itself. We furthermore show that Frisch and Cardelli’s greedy regular expression parsing algorithm can be straightforwardly modified to produce bit codings directly. We implement both solutions as well as a backtracking parser and perform benchmark experiments to gauge their practical performance. We observe that our DFA-based solution can be significantly more time and space efficient than the Frisch-Cardelli algorithm due to its sharing of DFA- nodes, but that the latter may still perform better on regular expressions that are “more deterministic” from the right than the left. (Backtracking is, unsurprisingly, quite hopeless.)
Original languageEnglish
Title of host publicationLanguage and Automata Theory and Applications : 5th International Conference, LATA 2011, Tarragona, Spain, May 26-31, 2011. Proceedings
EditorsAdrian-Horia Dediu, Shunsuke Inenaga, Carlos Martín-Vide
Number of pages12
PublisherSpringer
Publication date2011
Pages402-413
ISBN (Print)978-3-642-21253-6
ISBN (Electronic)978-3-642-21254-3
DOIs
Publication statusPublished - 2011
Event5th International Conference on Language and Automata Theory and Applications - Tarragona, Spain
Duration: 26 May 201131 May 2011
Conference number: 5

Conference

Conference5th International Conference on Language and Automata Theory and Applications
Number5
Country/TerritorySpain
CityTarragona
Period26/05/201131/05/2011
SeriesLecture notes in computer science
Volume6638
ISSN0302-9743

Fingerprint

Dive into the research topics of 'Bit-coded regular expression parsing'. Together they form a unique fingerprint.

Cite this