Two-pass greedy regular expression parsing

Niels Bjørn Bugge Grathwohl, Fritz Henglein, Lasse Nielsen, Ulrik Terp Rasmussen

6 Citations (Scopus)

Abstract

We present new algorithms for producing greedy parses for regular expressions (REs) in a semi-streaming fashion. Our lean-log algorithm executes in time O(mn) for REs of size m and input strings of size n and outputs a compact bit-coded parse tree representation. It improves on previous algorithms by: operating in only 2 passes; using only O(m) words of random-access memory (independent of n); requiring only k n bits of sequentially written and read log storage, where k < 1/3m is the number of alternatives and Kleene stars in the RE; processing the input string as a symbol stream and not requiring it to be stored at all. Previous RE parsing algorithms do not scale linearly with input size, or require substantially more log storage and employ 3 passes where the first consists of reversing the input, or do not or are not known to produce a greedy parse. The performance of our unoptimized C-based prototype indicates that our lean-log algorithm has also in practice superior performance and is surprisingly competitive with RE tools not performing full parsing, such as Grep.

Original languageEnglish
Title of host publicationImplementation and Application of Automata : 18th International Conference, CIAA 2013, Halifax, NS, Canada, July 16-19, 2013. Proceedings
EditorsStavros Konstantinidis
Number of pages12
PublisherSpringer
Publication date2013
Pages60-71
ISBN (Print)978-3-642-39273-3
ISBN (Electronic)978-3-642-39274-0
DOIs
Publication statusPublished - 2013
Event18th International Conference on Implementation and Application of Automata - Halifax, Canada
Duration: 16 Jul 201319 Jul 2013
Conference number: 18

Conference

Conference18th International Conference on Implementation and Application of Automata
Number18
Country/TerritoryCanada
CityHalifax
Period16/07/201319/07/2013

Fingerprint

Dive into the research topics of 'Two-pass greedy regular expression parsing'. Together they form a unique fingerprint.

Cite this