I started ebnf2yacc as a personal experiment, and then ended up using it as a
tool at work (for OpenWBEM - see http://www.openwbem.org and
http://www.sourceforge.net/projects/openwbem). I finally got it into a usable
state, and decided to open source it.
The purpose of ebnf2yacc is to ease the creation of yacc parsers. Yacc input
files must be in BNF, but it is much easier to write a grammar in EBNF. This
program will take an input file in EBNF and convert it to a usable yacc file.
Caveat: Right now, it will only accept BNF input, basically the same that you
would feed to yacc. The main usefulness of ebnf2yacc right now is to create
a C++ abstract syntax tree. For a concrete example, see the WQL parser of
OpenWBEM. Support for most EBNF features is planned for the future.
ebnf2yacc generates a set of classes that represent the AST of the
grammar. These AST classes support the visitor pattern. An abstract visitor
base class is generated, as well as a sample concrete visitor that simply
traverses the tree. ebnf2yacc also generates a yacc file that can be used
(with slight modification if you need precedence or other yacc features)
to build the AST. To build a parser, you will still need to provide the
appropriate framework.
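As a rough illustration of the visitor pattern described above, here is a
minimal hand-written sketch. It is not output from ebnf2yacc: every class and
function name in it (Node, NumberNode, AddNode, Visitor, TraversalVisitor,
traverseExample) is hypothetical, and it assumes a C++11 compiler, so the code
the tool really generates will look different.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// All names below are hypothetical; they are not the classes ebnf2yacc emits.
class NumberNode;
class AddNode;

// Abstract visitor base class: one visit method per AST class.
class Visitor {
public:
    virtual ~Visitor() {}
    virtual void visit(const NumberNode& n) = 0;
    virtual void visit(const AddNode& n) = 0;
};

// Common base of all AST nodes.
class Node {
public:
    virtual ~Node() {}
    virtual void accept(Visitor& v) const = 0;
};

// Leaf node holding the token text as a string.
class NumberNode : public Node {
public:
    explicit NumberNode(const std::string& t) : text(t) {}
    void accept(Visitor& v) const override { v.visit(*this); }
    std::string text;
};

// Interior node with two children.
class AddNode : public Node {
public:
    AddNode(std::unique_ptr<Node> l, std::unique_ptr<Node> r)
        : lhs(std::move(l)), rhs(std::move(r)) {}
    void accept(Visitor& v) const override { v.visit(*this); }
    std::unique_ptr<Node> lhs;
    std::unique_ptr<Node> rhs;
};

// Concrete visitor that simply traverses the tree and records each leaf.
class TraversalVisitor : public Visitor {
public:
    void visit(const NumberNode& n) override { leaves.push_back(n.text); }
    void visit(const AddNode& n) override {
        n.lhs->accept(*this);
        n.rhs->accept(*this);
    }
    std::vector<std::string> leaves;
};

// Builds the tree for "1 + 2" by hand and returns the leaves in visit order.
std::vector<std::string> traverseExample() {
    AddNode root(std::unique_ptr<Node>(new NumberNode("1")),
                 std::unique_ptr<Node>(new NumberNode("2")));
    TraversalVisitor v;
    root.accept(v);
    return v.leaves;
}
```

To do real work, you would derive your own visitor from the abstract base
class and override only the visit methods you care about, in the same way
TraversalVisitor does here.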
Some people learn best by example. There are two examples of ebnf2yacc input
in the tests subdirectory. test1.e2y is the grammar for WQL that I created for
OpenWBEM. test2.e2y is the grammar for ebnf2yacc itself.
In order to implement certain features, ebnf2yacc makes use of certain
characteristics of the names of grammar rules.
Any token that is ALL CAPS is assumed to be a terminal, that is, a token that
comes from the lexer.
If a rule begins with "str" (e.g. strToken) or is ALL CAPS, it is stored as a
string in the AST. No AST class is generated for rules that begin with str.
You should only use this for rules that are simple alternatives of a bunch of
tokens, e.g.:
    strOp:
          PLUS
        | MINUS
        | TIMES
        | DIVIDE
        ;
If a rule begins with "opt", then code will be generated to check the AST for
null in the sample traversal visitor, e.g.:
    optSemicolon: /* EMPTY */
        | SEMICOLON
        ;
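The null check mentioned above could look something like the following sketch.
It assumes a made-up enclosing rule "statement: expr optSemicolon ;" and
assumes that the empty alternative is represented by a null pointer. The names
Statement, OptSemicolon and describe are hypothetical and are not taken from
the generated code.

```cpp
#include <memory>
#include <string>

// Hypothetical AST class for the rule optSemicolon.
struct OptSemicolon {
    std::string token;  // text of the SEMICOLON terminal, stored as a string
};

// Hypothetical AST class for an assumed rule: statement: expr optSemicolon ;
struct Statement {
    std::string expr;
    // Assumption: left null when the EMPTY alternative was matched.
    std::unique_ptr<OptSemicolon> optSemicolon;
};

// Mimics the check a traversal visitor would make before descending
// into an optional child.
std::string describe(const Statement& s) {
    std::string out = s.expr;
    if (s.optSemicolon) {  // only visit the child if it is present
        out += s.optSemicolon->token;
    }
    return out;
}
```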
If a rule ends with "List", then the AST will contain a list of the first
non-terminal of the first alternative of the rule, e.g.:
    varList:
          var
        | varList COMMA var
        ;
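For the varList rule above, the resulting AST might be shaped roughly like
this sketch: a list of var nodes that grows by one element each time the
recursive alternative is reduced. The names Var, VarList, makeVarList,
appendVar and joinNames are hypothetical, and the container the generated code
actually uses is an assumption.

```cpp
#include <list>
#include <string>

// Hypothetical AST class for the rule var.
struct Var {
    std::string name;
};

// Hypothetical AST class for the rule varList: a list of the first
// non-terminal (var) of the rule's first alternative.
struct VarList {
    std::list<Var> vars;
};

// What an action for "varList: var" might do: start a new list.
VarList makeVarList(const Var& first) {
    VarList l;
    l.vars.push_back(first);
    return l;
}

// What an action for "varList: varList COMMA var" might do: append.
VarList& appendVar(VarList& l, const Var& next) {
    l.vars.push_back(next);
    return l;
}

// Walks the list in order and joins the names with commas.
std::string joinNames(const VarList& l) {
    std::string out;
    for (std::list<Var>::const_iterator it = l.vars.begin();
         it != l.vars.end(); ++it) {
        if (it != l.vars.begin()) {
            out += ",";
        }
        out += it->name;
    }
    return out;
}
```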
There is no checking done to enforce these naming conventions, so the
"garbage in, garbage out" rule applies here.
Right now the names of the generated classes are fixed. I plan to make this
configurable in the future, but have not yet decided on a good mechanism for
that.
To build ebnf2yacc you need lex and yacc. In particular, I used flex and bison.
It has not been tested with other lex or yacc implementations. If someone
tries it with a different lex or yacc, I would like to know whether it works.
I have tried to write the code in portable C++, and have compiled it with
gcc 2.95.2.
The project uses autoconf/automake, so to build it, you can simply run:
    ./configure
    make
and then to install it:
    make install
The binary is named ebnf2yacc. The command line arguments are:
Usage:
If you find any bugs or have any suggestions for improvements and features,
I am eager to hear them. Please feel free to make use of the SourceForge
facilities at http://sourceforge.net/projects/openwbem
There is also a mailing list for ebnf2yacc, hosted at SourceForge, that you
can subscribe to.
--Dan Nuffer