Abstract
We present new algorithms for producing greedy parses for regular expressions (REs) in a semi-streaming fashion. Our lean-log algorithm executes in time O(mn) for REs of size m and input strings of size n and outputs a compact bit-coded parse tree representation. It improves on previous algorithms by: operating in only 2 passes; using only O(m) words of random-access memory (independent of n); requiring only k n bits of sequentially written and read log storage, where k < 1/3m is the number of alternatives and Kleene stars in the RE; processing the input string as a symbol stream and not requiring it to be stored at all. Previous RE parsing algorithms do not scale linearly with input size, or require substantially more log storage and employ 3 passes where the first consists of reversing the input, or do not or are not known to produce a greedy parse. The performance of our unoptimized C-based prototype indicates that our lean-log algorithm has also in practice superior performance and is surprisingly competitive with RE tools not performing full parsing, such as Grep.
Original language | English |
---|---|
Title of host publication | Implementation and Application of Automata : 18th International Conference, CIAA 2013, Halifax, NS, Canada, July 16-19, 2013. Proceedings |
Editors | Stavros Konstantinidis |
Number of pages | 12 |
Publisher | Springer |
Publication date | 2013 |
Pages | 60-71 |
ISBN (Print) | 978-3-642-39273-3 |
ISBN (Electronic) | 978-3-642-39274-0 |
DOIs | |
Publication status | Published - 2013 |
Event | 18th International Conference on Implementation and Application of Automata - Halifax, Canada Duration: 16 Jul 2013 → 19 Jul 2013 Conference number: 18 |
Conference
Conference | 18th International Conference on Implementation and Application of Automata |
---|---|
Number | 18 |
Country/Territory | Canada |
City | Halifax |
Period | 16/07/2013 → 19/07/2013 |