Bit-coded regular expression parsing

Lasse Nielsen, Fritz Henglein

10 Citationer (Scopus)

Abstract

Regular expression parsing is the problem of producing a
parse tree of a string for a given regular expression. We show that a
compact bit representation of a parse tree can be produced efficiently,
in time linear in the product of input string size and regular expression
size, by simplifying the DFA-based parsing algorithm due to Dub´e and
Feeley to emit the bits of the bit representation without explicitly materializing
the parse tree itself. We furthermore show that Frisch and
Cardelli’s greedy regular expression parsing algorithm can be straightforwardly
modified to produce bit codings directly. We implement both
solutions as well as a backtracking parser and perform benchmark experiments
to gauge their practical performance. We observe that our
DFA-based solution can be significantly more time and space efficient
than the Frisch-Cardelli algorithm due to its sharing of DFA-nodes, but
that the latter may still perform better on regular expressions that are
“more deterministic” from the right than the left. (Backtracking is, unsurprisingly,
quite hopeless.)
OriginalsprogEngelsk
TitelLanguage and Automata Theory and Applications : 5th International Conference, LATA 2011, Tarragona, Spain, May 26-31, 2011. Proceedings
RedaktørerAdrian-Horia Dediu, Shunsuke Inenaga, Carlos Martín-Vide
Antal sider12
ForlagSpringer
Publikationsdato2011
Sider402-413
ISBN (Trykt)978-3-642-21253-6
ISBN (Elektronisk)978-3-642-21254-3
DOI
StatusUdgivet - 2011
Begivenhed5th International Conference on Language and Automata Theory and Applications - Tarragona, Spanien
Varighed: 26 maj 201131 maj 2011
Konferencens nummer: 5

Konference

Konference5th International Conference on Language and Automata Theory and Applications
Nummer5
Land/OmrådeSpanien
ByTarragona
Periode26/05/201131/05/2011
NavnLecture notes in computer science
Vol/bind6638
ISSN0302-9743

Fingeraftryk

Dyk ned i forskningsemnerne om 'Bit-coded regular expression parsing'. Sammen danner de et unikt fingeraftryk.

Citationsformater