Abstract
We present new algorithms for producing greedy parses for regular
expressions (REs) in a semi-streaming fashion. Our lean-log
algorithm executes in time O(mn) for REs of size m and input
strings of size n and outputs a compact bit-coded parse tree
representation. It improves on previous algorithms by:
operating in only 2 passes; using only O(m) words of
random-access memory (independent of n); requiring only kn
bits of sequentially written and read log storage, where k <
1/3 m is the number of alternatives and Kleene stars in the
RE; processing the input string as a symbol stream and not
requiring it to be stored at all. Previous RE parsing
algorithms do not scale linearly with input size, or require
substantially more log storage and employ 3 passes where the
first consists of reversing the input, or do not or are not
known to produce a greedy parse. The performance of our
unoptimized C-based prototype indicates that the superior
performance of our lean-log algorithm can also be observed in
practice; it is also surprisingly competitive with RE tools
not performing full parsing, such as Grep.
expressions (REs) in a semi-streaming fashion. Our lean-log
algorithm executes in time O(mn) for REs of size m and input
strings of size n and outputs a compact bit-coded parse tree
representation. It improves on previous algorithms by:
operating in only 2 passes; using only O(m) words of
random-access memory (independent of n); requiring only kn
bits of sequentially written and read log storage, where k <
1/3 m is the number of alternatives and Kleene stars in the
RE; processing the input string as a symbol stream and not
requiring it to be stored at all. Previous RE parsing
algorithms do not scale linearly with input size, or require
substantially more log storage and employ 3 passes where the
first consists of reversing the input, or do not or are not
known to produce a greedy parse. The performance of our
unoptimized C-based prototype indicates that the superior
performance of our lean-log algorithm can also be observed in
practice; it is also surprisingly competitive with RE tools
not performing full parsing, such as Grep.
Originalsprog | Engelsk |
---|---|
Titel | Implementation and Application of Automata : 18th International Conference, CIAA 2013, Halifax, NS, Canada, July 16-19, 2013. Proceedings |
Redaktører | Stavros Konstantinidis |
Antal sider | 12 |
Forlag | Springer |
Publikationsdato | 2013 |
Sider | 60-71 |
ISBN (Trykt) | 978-3-642-39273-3 |
ISBN (Elektronisk) | 978-3-642-39274-0 |
DOI | |
Status | Udgivet - 2013 |
Begivenhed | 18th International Conference on Implementation and Application of Automata - Halifax, Canada Varighed: 16 jul. 2013 → 19 jul. 2013 Konferencens nummer: 18 |
Konference
Konference | 18th International Conference on Implementation and Application of Automata |
---|---|
Nummer | 18 |
Land/Område | Canada |
By | Halifax |
Periode | 16/07/2013 → 19/07/2013 |