DParser for Python Documentation

This page describes the Python interface to DParser. Please see the DParser manual for more detailed information on DParser.

Basic Ideas

Grammar rules are passed to DParser through Python function documentation strings. (A string literal placed as the first statement of a Python function is that function's documentation string.) To tell DParser to use a function's documentation string as part of your grammar, begin the function's name with "d_". The function then becomes an action that is executed whenever the production defined in its documentation string is reduced. For example,

def d_action1(t):
    " sentence : noun 'runs' "
    print('found a sentence')
#...

This function specifies an action, d_action1, and a production, sentence, to DParser. d_action1 will be called when DParser recognizes a sentence. The argument to d_action1, t, is an array. It holds one entry per element of the production: the return value of that element's action or, for a terminal, the string the terminal matched. In the above example, t will contain the return value of noun's action as the first element and the Python string 'runs' as the second.
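The naming convention and the contents of t can be illustrated without DParser installed. The sketch below is only an approximation of the idea, not DParser's actual implementation: collect_productions is a hypothetical helper that gathers the documentation strings of d_-prefixed functions, and the two action calls at the end simulate by hand what a parse of "dog runs" would pass in.

```python
# Illustrative sketch only; collect_productions is a hypothetical helper,
# not part of the dparser module.

def d_sentence(t):
    " sentence : noun 'runs' "
    # t[0] is whatever noun's action returned, t[1] is the matched string 'runs'
    return t[0] + ' ' + t[1]

def d_noun(t):
    " noun : 'dog' | 'cat' "
    return t[0]              # for a terminal, t[0] is the matched string

def helper(t):
    "ignored: the name does not start with d_"
    return t

def collect_productions(namespace):
    """Return (name, docstring) pairs for callables whose name starts with d_."""
    found = []
    for name in sorted(namespace):
        obj = namespace[name]
        if name.startswith('d_') and callable(obj) and obj.__doc__:
            found.append((name, obj.__doc__.strip()))
    return found

productions = collect_productions(
    {'d_sentence': d_sentence, 'd_noun': d_noun, 'helper': helper})

# Simulate the reductions for the input "dog runs" by hand.
noun_value = d_noun(['dog'])
sentence_value = d_sentence([noun_value, 'runs'])
```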

Regular expressions are specified by enclosing the regular expression in double quotes:

def d_number(t):
    ' number : "[0-9]+" '            # match a positive integer
    return int(t[0])             # turn the matched string into an integer
#...

Make sure your documentation string is a Python raw string (precede it with the letter r) if it contains any Python escape sequences.
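The difference is easy to see in plain Python. In the sketch below (the ws production is a made-up example), the ordinary string has its \t and \n converted into a real tab and a real newline before the grammar text is ever read, while the raw string keeps the backslashes intact.

```python
plain = ' ws : "[ \t\n]+" '     # Python turns \t and \n into a tab and a newline
raw = r' ws : "[ \t\n]+" '      # the backslashes survive unchanged

print(repr(plain))
print(repr(raw))
```

The raw string is two characters longer, because each escape remains a backslash followed by a letter.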

For more advanced features of productions, such as priorities and associativities, see the DParser manual.

For a simple, complete example to add integers, go back to the home page.

Arguments to actions

All actions take at least one argument, an array, as described above. Other arguments are optional. The interface recognizes which arguments you want based on the name you give the argument. Possible names are:

spec, spec_only: If an action takes spec, that action will be called for both speculative and final parses (otherwise, the action is only called for final parses). The value of spec indicates whether the parse is final or speculative (1 is speculative, 0 is final). To reject a speculative parse, return dparser.Reject. If an action takes spec_only, the action will be called only for speculative parses. The return value of the action for the final parse will be the same Python object that was returned for the speculative parse. Complete example.
g: DParser's global state. g is actually an array, the first element of which is the global state. (Using a one-element array in this manner allows the action to change the global state.)
s: contains an array (a tree, really) of the strings that make up this reduction. s is useful if the purpose of your parser is to alter some text, leaving it mostly intact. See here for a complete example.
nodes: an array of Python wrappers around the reduction's D_ParseNodes. They contain information on line numbers and such. See here for useful fields.
this: the D_ParseNode for the current production. ($$ in DParser.) Again, see this example.
parser: your parser (sometimes useful if you're dealing with multiple files).
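Selecting arguments by name can be pictured with a small sketch. This is an assumption about the general technique, not DParser's internal code: call_action and the available dictionary are hypothetical names. The example also shows why g is a one-element array: the action can replace g[0] and the caller sees the change.

```python
def call_action(action, t, available):
    """Call action with t plus whichever optional arguments it names."""
    code = action.__code__
    names = code.co_varnames[:code.co_argcount]
    # The first parameter is always the array; the rest are looked up by name.
    extras = [available[name] for name in names[1:]]
    return action(t, *extras)

def d_count(t, g):
    " item : 'x' "
    g[0] = g[0] + 1          # mutate the shared one-element list
    return t[0]

def d_plain(t):
    " other : 'y' "
    return t[0]

global_state = [0]
available = {'g': global_state, 'spec': 0, 'parser': None}

call_action(d_count, ['x'], available)
call_action(d_count, ['x'], available)
result = call_action(d_plain, ['y'], available)
```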

Arguments to dparser.Parser()

All arguments are optional.

modules: an array of modules containing the actions you want in your parser. If this argument is not specified, the calling module will be used.
file_prefix: prefix for the filename of the parse table cache and other such files. Defaults to "d_parser_mach_gen".

Arguments to dparser.Parser.parse()

The first argument to dparser.Parser.parse is always the string that is to be parsed. All other arguments are optional.

start_symbol:

the start symbol. Defaults to the topmost symbol defined.

print_debug_info:

prints a list of actions that are called if non-zero. Question marks indicate the action is speculative.

dont_fixup_internal_productions, dont_merge_epsilon_trees, commit_actions_interval, error_recovery:

correspond to the members of D_Parser (see the DParser manual)

initial_skip_space_fn:

allows user-defined whitespace handling, as the whitespace production does, in place of the built-in, C-like whitespace parser. Its argument is a d_loc_t structure. This structure's member, s, is an index into the string that is being parsed. Modify this index to skip whitespace:

def whitespace(loc):    # no d_ prefix
   while loc.s < len(loc.buf) and loc.buf[loc.s:loc.s+2] == ':)':    # make smiley face the whitespace
      loc.s = loc.s + 2
#...
Parser().parse('int:)var:)=:)2', initial_skip_space_fn = whitespace)
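The skipping logic can be checked on its own, without DParser. In the sketch below, FakeLoc is a hypothetical stand-in for d_loc_t that provides only the two members the function uses (buf and s); the real structure is supplied by DParser.

```python
class FakeLoc(object):
    """Hypothetical stand-in for d_loc_t: only buf and s are modelled."""
    def __init__(self, buf, s=0):
        self.buf = buf       # the string being parsed
        self.s = s           # current index into buf

def whitespace(loc):
    # Advance past every consecutive ':)' starting at the current index.
    while loc.s < len(loc.buf) and loc.buf[loc.s:loc.s+2] == ':)':
        loc.s = loc.s + 2

loc = FakeLoc('int:)var:)=:)2', 3)   # index 3 is just after 'int'
whitespace(loc)
print(loc.buf[loc.s:])               # the text that remains after skipping
```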

syntax_error_fn:

called on a syntax error. By default an exception is raised. It is passed a d_loc_t structure (see initial_skip_space_fn) indicating the location of the error. The function below will put '<--error' and a line break at the location of the error:

def syntax_error(loc):
    mn = max(loc.s - 10, 0)
    mx = min(loc.s + 10, len(loc.buf))
    begin = loc.buf[mn:loc.s]
    end = loc.buf[loc.s:mx]
    space = ' '*len(begin)
    print(begin + '\n' + space + '<--error' + '\n' + space + end)
#...
Parser().parse('python is bad.', syntax_error_fn = syntax_error)
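To see what the message looks like, the same formatting can be run against a stand-in location object. This is a sketch under assumptions: Loc is a hypothetical substitute for d_loc_t, the error offset of 10 is chosen arbitrarily for illustration, and the function returns the text instead of printing it so that it can be inspected.

```python
from collections import namedtuple

# Hypothetical stand-in for d_loc_t with only the members used here.
Loc = namedtuple('Loc', ['buf', 's'])

def format_syntax_error(loc):
    mn = max(loc.s - 10, 0)                 # show up to 10 characters before the error
    mx = min(loc.s + 10, len(loc.buf))      # and up to 10 characters after it
    begin = loc.buf[mn:loc.s]
    end = loc.buf[loc.s:mx]
    space = ' ' * len(begin)                # indent so the marker lines up with the error
    return begin + '\n' + space + '<--error' + '\n' + space + end

message = format_syntax_error(Loc('python is bad.', 10))
print(message)
```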

ambiguity_fn:

resolves ambiguities. It takes an array of D_ParseNodes and expects one of them to be returned. By default a dparser.AmbiguityException is raised.
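A minimal sketch of such a function follows. It simply keeps the first alternative, which is an arbitrary policy chosen for illustration; the strings stand in for D_ParseNodes, whose fields are not modelled here.

```python
def ambiguity(nodes):
    # nodes holds the competing parse nodes; exactly one must be returned.
    return nodes[0]

# With DParser installed, it would be passed like the other callbacks:
#   Parser().parse(text, ambiguity_fn = ambiguity)

candidates = ['first interpretation', 'second interpretation']  # stand-ins for D_ParseNodes
chosen = ambiguity(candidates)
```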

Pitfalls and Tips

Let me know if you run into a pitfall or have a tip, and I will put it here.

Debugging a Grammar

Pass print_debug_info=1 to Parser.parse() to see a list of the actions that are being called (pass it 2 to see only final actions).

Also, try looking at the grammar file that is created, d_parser_mach_gen.g.

Regular expressions:

DParser does not understand all of the regular expressions understood by the Python regular expression module. Make sure you are using regular expressions DParser can understand.

Also, make sure your documentation string is a Python raw string (precede it with the letter r) if it contains any Python escape sequences.

Whitespace:

By default, DParser treats tabs, spaces, newlines and #line commands as whitespace. If you want to handle any of these yourself (be especially careful with the # character), either supply an initial_skip_space_fn, as shown above, or define the special whitespace production:

def d_whitespace(t):
   r'whitespace : "[ \t\n]*" '    # raw string, because of \t and \n; treats space, tab and newline as whitespace, but treats the # character normally
   print('found whitespace:' + t[0])

DParser specifiers/declarations:

DParser can be passed declarations in documentation strings. For example,

from dparser import Parser
def d_somefunc(t):
    '${declare longest_match}'
#...

(See the DParser manual for an explanation of specifiers and declarations.)

Multiple productions per action:

You can put multiple productions (even your entire grammar) into one documentation string. Just make sure to add semicolons after each production:

from dparser import Parser

def d_grammar(t):
   '''sentence : noun verb;
      noun : 'dog' | 'cat';
      verb : 'run'
   '''
   print('this function gets called for every reduction')

Parser().parse("dog run")