Ordered context-free grammars
Context-free grammars (CFGs) provide an intuitive and powerful formalism for describing the syntactic structure of parsable input streams. Unfortunately, existing online parsing algorithms for such streams admit only a subset of possible CFG descriptions. Theoretically, it is possible to parse any deterministic context-free language (CFL) in a single pass, as long as the grammar describing the CFL belongs to the LR(k), k =/> 1 subset of CFGs. However, obtaining a suitable LR(k) description for a language is not an easy task — especially when k = 1 — and usually entails an increase in complexity of the rewritten CFG. More importantly, such rewriting inevitably obfuscates the syntactic structure of the language and complicates the placement of semantic bindings. Instead of searching for yet another subclass of CFGs amenable to parsing, we propose to augment the definition of the CFG itself by allowing associativity and precedence to be specified for each production in the grammar. We call the resulting formalism an ordered context-free grammar (OCFG). Compared to ordinary context-free grammars, OCFGs can often reduce the number of distinct derivation trees for a given sentence in a CFL; those parse trees that remain can be arranged into a strict partial order. These characteristics make it very easy to craft unambiguous descriptions for context-free languages using OCFGs. At the same time, parsers constructed from OCFGs rely on deterministic pushdown automata and are structurally identical to their CFG counterparts. For the well-known LR(k) and LALR(k) subsets of CFGs, we define analogous subsets of OCFGs, called LRP(k) and LALRP(fc), and illustrate how these may be used for defining programming language constructs much more succinctly. Finally, we briefly describe the BerthaTM parser generator designed construct bottom-up parsers from LALRP(1) grammar specifications.