This is semantic.info, produced by makeinfo version 4.2 from
semantic.texi.

START-INFO-DIR-ENTRY
* semantic: (semantic). Semantic Parsing for Emacs
END-INFO-DIR-ENTRY

File: semantic.info, Node: Top, Next: Install, Prev: (dir), Up: (dir)

Semantic is a program for Emacs which includes, at its core, a lexer
and a compiler-compiler (bovinator).  Additional tools include a
BNF-to-semantic-table converter, example tables, and a speedbar tool.

   The core utility is the "semantic bovinator", which behaves much
like yacc or bison.  Since it is not designed to be as feature rich as
those tools, it uses the term "bovine", for cow, a lesser cousin of
the yak and bison.

   To send bug reports, or to participate in discussions about
semantic, use the mailing list cedet-semantic@sourceforge.net.

* Menu:

* Install::             Installing semantic.
* Overview::            Introduce basic concepts.
* Semantic Components:: Enumerate all semantic modules.
* Lexing::              Setting up the lexer for your language.
* Bovinating::          Setting up the parser for your language.
* BNF conversion::      Using the BNF converter to make tables.
* Compiling::           Running the bovinator on a source file.
* Debugging::           Debugging bovine tables.
* Programming::         How to program with a nonterminal stream.
* Current Context::     How to get the current code context.
* Tools::               User tools which use semantic.
* Index::

File: semantic.info, Node: Install, Next: Overview, Prev: Top, Up: Top

Installation
************

To install semantic, untar the distribution into a subdirectory, such
as `/usr/share/emacs/site-lisp/semantic-#.#'.  Next, add the following
lines to your individual `.emacs' file, or to
`site-lisp/site-start.el':

     (setq semantic-load-turn-everything-on t)
     (load-file "/path/to/semantic/semantic-load.el")

   If you would like to turn individual tools on or off in your init
file, skip the first line.

File: semantic.info, Node: Overview, Next: Semantic Components, Prev: Install, Up: Top

Overview
********

Semantic is a tool primarily for the Emacs-Lisp programmer.  However,
it comes with "applications" that non-programmers might find useful.
This chapter is mostly for the benefit of those non-programmers, as it
gives brief descriptions of basic concepts such as grammars, parsers,
compiler-compilers, and parse trees.

   The grammar of a natural language defines rules by which valid
phrases and sentences can be composed using words, the fundamental
units with which all sentences are created.  In a similar fashion, a
"context-free grammar" defines the rules by which programs can be
composed using the fundamental units of the language, i.e., numbers,
symbols, punctuation, etc.

   Context-free grammars are often specified in a well-known form
called Backus-Naur Form, BNF for short.  This is a systematic way of
representing context-free grammars such that programs can read files
with grammars written in BNF and generate code for a parser of that
language.

   YACC (Yet Another Compiler Compiler) is one such program; it has
been part of UNIX operating systems since the 1970s.  YACC is
pronounced the same as "yak", the long-haired ox found in Asia.  The
parser generated by YACC is usually a C program.

   Bison (http://www.gnu.org/software/bison/bison.html) is also a
"compiler compiler" that takes BNF grammars and produces parsers in
the C language.  The difference between YACC and Bison is that Bison
is free software (http://www.gnu.org/philosophy/free-sw.html) and
upward-compatible with YACC.  It also comes with an excellent manual.

   Semantic is similar in spirit to YACC and Bison.

   Semantic, however, is referred to as a "bovinator" rather than as
a parser, because it is a lesser cousin of YACC and Bison.  It is
lesser in that it does not perform a full parse the way YACC or Bison
do.  Instead, it "bovinates".  "Bovination" refers to partial parsing
that creates "parse trees" of only the topmost expressions, rather
than parsing every nested expression.  This is sufficient for the
purposes for which semantic was designed.  Semantic is meant to be
used within Emacs to provide editor-related features such as code
browsers and translators, rather than for compiling, which requires
far more complex and complete parsers.  Semantic is not designed to
be able to create full parse trees.

   One key benefit of semantic is that it creates parse trees (perhaps
the term "bovine tree" is more accurate) with the same structure
regardless of the language involved.  Higher-level applications
written to work with bovine trees will then work with any language for
which a grammar is available.  For example, a code browser written
today that supports C, C++, and Java may work without any change on
languages that do not even exist yet.  All one has to do is write the
BNF specification for the new language.  The rest of the work is done
by semantic.

   For certain languages, it is hard if not impossible to specify the
syntax of the language in BNF form, e.g., texinfo
(http://www.texinfo.org) and other document-oriented languages.
Semantic nevertheless provides a parser for texinfo.  Instead of a BNF
grammar, texinfo files are "parsed" using *Note regular-expressions:
(emacs)Regexps.

   Semantic comes with grammars for these languages:

   * C

   * Emacs-Lisp

   * java

   * makefile

   * scheme

   Several tools employing semantic that provide user-observable
features are listed in the *Note Tools:: section.

File: semantic.info, Node: Semantic Components, Next: Lexing, Prev: Overview, Up: Top

Semantic Components
*******************

This chapter gives an overview of the major components of semantic and
how they interact with each other to do their job.

   The first step of parsing is to break up the input file into its
fundamental components.  This step is called lexing.  The output of
the lexer is a list of tokens that make up the file.

     syntax table, keywords list, and options
                       |
                       |
                       v
     input file ----> Lexer ----> token stream

   The next step is the parsing, shown below.

                    bovine table
                         |
                         v
     token stream ---> Parser ----> parse tree

   The end result, the parse tree, is created based on the "bovine
table", which is the internal representation of the BNF language
grammar used by semantic.

   The semantic database provides caching of the parse trees by
saving them into files named `semantic.cache' automatically, then
loading them when appropriate instead of re-parsing.  The reason for
this is to save the time it takes to parse a file, which could be
several seconds or more for large files.

   Finally, semantic provides an API for the Emacs-Lisp programmer to
access the information in the parse tree.

File: semantic.info, Node: Lexing, Next: Bovinating, Prev: Semantic Components, Up: Top

Preparing your language for Lexing
**********************************

In order to reduce a source file into a token list, it must first be
converted into a token stream.  Tokens are syntactic elements such as
whitespace, symbols, strings, lists, and punctuation.

   The lexer uses the major-mode's syntax table for conversion.
*Note Syntax Tables: (elisp)Syntax Tables.
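
   For example, a major mode for a hypothetical language might prepare
its syntax table and comment variables like this.  This is a minimal
sketch, not code from semantic; the mode name and the syntax choices
are invented for illustration.

     ;; Hypothetical setup so the semantic lexer can classify text.
     (defvar mylang-mode-syntax-table
       (let ((table (make-syntax-table)))
         (modify-syntax-entry ?_  "_" table)  ; _ is a symbol constituent
         (modify-syntax-entry ?#  "<" table)  ; # starts a comment
         (modify-syntax-entry ?\n ">" table)  ; newline ends a comment
         table)
       "Syntax table used in `mylang-mode' buffers.")

     (define-derived-mode mylang-mode fundamental-mode "MyLang"
       "Major mode for the hypothetical MyLang language."
       (set-syntax-table mylang-mode-syntax-table)
       (set (make-local-variable 'comment-start) "# ")
       (set (make-local-variable 'comment-start-skip) "#+\\s-*"))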

   As long as the syntax table is set up correctly (along with the
important `comment-start' and `comment-start-skip' variables), the
lexer should already work for your language.

   The primary entry point of the lexer is the `semantic-flex'
function, shown below.  Normally, you do not need to call this
function.  It is usually called by `semantic-bovinate-toplevel' for
you.

 - Function: semantic-flex start end &optional depth length
     Using the syntax table, do something roughly equivalent to flex.
     Semantically check between START and END.  Optional argument
     DEPTH indicates at what level to scan over entire lists.  The
     return value is a token stream.  Each element is a list of the
     form (symbol start-expression . end-expression).  END does not
     mark the end of the text scanned, only the end of the beginning
     of text scanned.  Thus, if a string extends past END, the end of
     the returned token will be larger than END.  To truly restrict
     scanning, use `narrow-to-region'.  The last argument, LENGTH,
     specifies that `semantic-flex' should only return LENGTH tokens.

* Menu:

* Lexer Overview::
* Lexer Output::
* Lexer Options::
* Keywords::
* Keyword Properties::

File: semantic.info, Node: Lexer Overview, Next: Lexer Output, Prev: Lexing, Up: Lexing

Lexer Overview
==============

The semantic lexer breaks up the content of an Emacs buffer into a
list of tokens.  This process is based mostly on regular expressions,
which in turn depend on the syntax table of the buffer's major mode
being set up properly.  *Note Major Modes: (emacs)Major Modes.  *Note
Syntax Tables: (elisp)Syntax Tables.  *Note Regexps: (emacs)Regexps.

   Specifically, the following regular expressions, which rely on
syntax tables, are used:

`\\s-'
     whitespace characters

`\\sw'
     word constituent

`\\s_'
     symbol constituent

`\\s.'
     punctuation character

`\\s<'
     comment starter

`\\s>'
     comment ender

`\\s\\'
     escape character

`\\s)'
     close parenthesis character

`\\s$'
     paired delimiter

`\\s\"'
     string quote

`\\s\''
     expression prefix

   In addition, Emacs' built-in features such as `comment-start-skip',
`forward-comment', `forward-list', and `forward-sexp' are employed.

File: semantic.info, Node: Lexer Output, Next: Lexer Options, Prev: Lexer Overview, Up: Lexing

Lexer Output
============

The lexer, *Note semantic-flex::, scans the content of a buffer and
returns a token list.  Let's illustrate this using a simple example.

     00: /*
     01:  * Simple program to demonstrate semantic.
     02:  */
     03:
     04: #include <stdio.h>
     05:
     06: int i_1;
     07:
     08: int
     09: main(int argc, char** argv)
     10: {
     11:   printf("Hello world.\n");
     12: }

   Evaluating `(semantic-flex (point-min) (point-max))' within the
buffer with the code above returns the following token list.  The
input line and string that produced each token is shown after each
semicolon.

     ((punctuation 52 . 53)      ; 04: #
      (INCLUDE 53 . 60)          ; 04: include
      (punctuation 61 . 62)      ; 04: <
      (symbol 62 . 67)           ; 04: stdio
      (punctuation 67 . 68)      ; 04: .
      (symbol 68 . 69)           ; 04: h
      (punctuation 69 . 70)      ; 04: >
      (INT 72 . 75)              ; 06: int
      (symbol 76 . 79)           ; 06: i_1
      (punctuation 79 . 80)      ; 06: ;
      (INT 82 . 85)              ; 08: int
      (symbol 86 . 90)           ; 08: main
      (semantic-list 90 . 113)   ; 08: (int argc, char** argv)
      (semantic-list 114 . 147)  ; 09-12: body of main function
     )

   As shown above, the token list is a list of "tokens".
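
   Before looking at the token format in detail, here is a quick way
to see the text behind each token.  This is a minimal sketch; it
assumes the example C buffer above is current.

     ;; Sketch: print each token type with the text it covers.
     (dolist (token (semantic-flex (point-min) (point-max)))
       (message "%-15s %s"
                (car token)                     ; token type symbol
                (buffer-substring (cadr token)  ; start position
                                  (cddr token)))) ; end position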

   Each token in turn is a list of the form

     (TOKEN-TYPE BEGINNING-POSITION . ENDING-POSITION)

where TOKEN-TYPE is a symbol, and the other two are integers
indicating the buffer positions that delimit the token, such that

     (buffer-substring BEGINNING-POSITION ENDING-POSITION)

would return the string form of the token.

   Note that one line (line 4 above) can produce seven tokens, while
the whole body of the function produces a single token.  This is
because the DEPTH parameter of `semantic-flex' was not specified.
Let's see the output when DEPTH is set to 1.  Evaluate `(semantic-flex
(point-min) (point-max) 1)' in the same buffer.  Note the third
argument of `1'.

     ((punctuation 52 . 53)      ; 04: #
      (INCLUDE 53 . 60)          ; 04: include
      (punctuation 61 . 62)      ; 04: <
      (symbol 62 . 67)           ; 04: stdio
      (punctuation 67 . 68)      ; 04: .
      (symbol 68 . 69)           ; 04: h
      (punctuation 69 . 70)      ; 04: >
      (INT 72 . 75)              ; 06: int
      (symbol 76 . 79)           ; 06: i_1
      (punctuation 79 . 80)      ; 06: ;
      (INT 82 . 85)              ; 08: int
      (symbol 86 . 90)           ; 08: main
      (open-paren 90 . 91)       ; 08: (
      (INT 91 . 94)              ; 08: int
      (symbol 95 . 99)           ; 08: argc
      (punctuation 99 . 100)     ; 08: ,
      (CHAR 101 . 105)           ; 08: char
      (punctuation 105 . 106)    ; 08: *
      (punctuation 106 . 107)    ; 08: *
      (symbol 108 . 112)         ; 08: argv
      (close-paren 112 . 113)    ; 08: )
      (open-paren 114 . 115)     ; 10: {
      (symbol 120 . 126)         ; 11: printf
      (semantic-list 126 . 144)  ; 11: ("Hello world.\n")
      (punctuation 144 . 145)    ; 11: ;
      (close-paren 146 . 147)    ; 12: }
     )

   The DEPTH parameter "peeled away" one more level of "list",
delimited by matching parentheses or braces.  The DEPTH parameter can
be set to any number.  However, the parser needs to be able to handle
the extra tokens.

   This is an interesting benefit of the lexer having the full
resources of Emacs at its disposal.  Skipping over matched parentheses
is achieved by simply calling the built-in functions `forward-list'
and `forward-sexp'.

   All common token symbols are enumerated below.  Additional token
symbols can be generated by the lexer if the user option
`semantic-flex-extensions' is set.  It is up to the user to add
matching extensions to the parser to deal with the lexer extensions.
An example use of `semantic-flex-extensions' is in `semantic-make.el',
where it is set to the value of `semantic-flex-make-extensions', which
may generate `shell-command' tokens.

Default syntactic tokens if the lexer is not extended
-----------------------------------------------------

`bol'
     Empty string matching a beginning of line.  This token is
     produced only if the user set `semantic-flex-enable-bol' to
     non-`nil'.

`charquote'
     String sequences that match `\\s\\+'.

`close-paren'
     Characters that match `\\s)'.  These are typically `)', `}',
     `]', etc.

`comment'
     A comment chunk.  These tokens are not produced by default.  They
     are produced only if the user set `semantic-ignore-comments' to
     `nil'.

`newline'
     Characters matching `\\s-*\\(\n\\|\\s>\\)'.  This token is
     produced only if the user set `semantic-flex-enable-newlines' to
     non-`nil'.

`open-paren'
     Characters that match `\\s('.  These are typically `(', `{',
     `[', etc.  Note that these are not usually generated unless the
     DEPTH argument to *Note semantic-flex:: is greater than 0.

`punctuation'
     Characters matching `\\(\\s.\\|\\s$\\|\\s'\\)'.

`semantic-list'
     String delimited by matching parentheses, braces, etc. that the
     lexer skipped over, because the DEPTH parameter to *Note
     semantic-flex:: was not high enough.

`string'
     Quoted strings, i.e., string sequences that start and end with
     characters matching `\\s\"'.  The lexer relies on `forward-sexp'
     to find the matching end.

`symbol'
     String sequences that match `\\(\\sw\\|\\s_\\)+'.

`whitespace'
     Characters that match the `\\s-+' regexp.  This token is produced
     only if the user set `semantic-flex-enable-whitespace' to
     non-`nil'.  If `semantic-ignore-comments' is also non-`nil',
     comments are treated as whitespace.

File: semantic.info, Node: Lexer Options, Next: Keywords, Prev: Lexer Output, Up: Lexing

Lexer Options
=============

Although most lexer functions are called for you by other semantic
functions, there are ways for you to extend or customize the lexer.
The variables shown below serve this purpose.

 - Variable: semantic-flex-unterminated-syntax-end-function
     Function called when unterminated syntax is encountered.  This
     should be set to one function.  That function should take three
     parameters: SYNTAX, the type of syntax which is unterminated;
     SYNTAX-START, where the broken syntax begins; and FLEX-END, where
     the lexical analysis was asked to end.  This function can be used
     by languages that can intelligently fix up broken syntax, or to
     exit lexical analysis via `throw' or `signal' when unterminated
     syntax is found.

 - Variable: semantic-flex-extensions
     Buffer-local extensions to the lexical analyzer.  This should
     contain an alist with a key of a regexp and a data element of a
     function.  The function should both move point and return a
     lexical token of the form

          (TYPE START . END)

     `nil' is a valid return value.  TYPE can be any type of symbol,
     as long as it doesn't occur as a nonterminal in the language
     definition.

 - Variable: semantic-flex-syntax-modifications
     Changes to the syntax table for a given buffer.  These changes
     are active only while the buffer is being flexed.  This is a list
     where each element has the form

          (CHAR CLASS)

     CHAR is the char passed to `modify-syntax-entry', and CLASS is
     the string also passed to `modify-syntax-entry' to define what
     syntax class CHAR has.  For example:

          (setq semantic-flex-syntax-modifications '((?. "_")))

     This makes the period `.' a symbol constituent.  This may be
     necessary if filenames are prevalent, such as in Makefiles.

 - Variable: semantic-flex-enable-newlines
     When flexing, report `newline' tokens as syntactic elements.
     Useful for languages where the newline is a special-case
     terminator.  Only set this on a per-mode basis, not globally.

 - Variable: semantic-flex-enable-whitespace
     When flexing, report `whitespace' tokens as syntactic elements.
     Useful for languages where the syntax is whitespace dependent.
     Only set this on a per-mode basis, not globally.

 - Variable: semantic-flex-enable-bol
     When flexing, report beginning-of-line (`bol') tokens as
     syntactic elements.  Useful for languages like Python which are
     indentation sensitive.  Only set this on a per-mode basis, not
     globally.

 - Variable: semantic-number-expression
     Regular expression for matching a number.  If this value is
     `nil', no number extraction is done during lexing.  This
     expression tries to match C- and Java-like numbers:

          DECIMAL_LITERAL:
              [1-9][0-9]*
            ;
          HEX_LITERAL:
              0[xX][0-9a-fA-F]+
            ;
          OCTAL_LITERAL:
              0[0-7]*
            ;
          INTEGER_LITERAL:
              <DECIMAL_LITERAL>[lL]?
            | <HEX_LITERAL>[lL]?
            | <OCTAL_LITERAL>[lL]?
            ;
          EXPONENT:
              [eE][+-]?[0-9]+
            ;
          FLOATING_POINT_LITERAL:
              [0-9]+[.][0-9]*<EXPONENT>?[fFdD]?
            | [.][0-9]+<EXPONENT>?[fFdD]?
            | [0-9]+<EXPONENT>[fFdD]?
            | [0-9]+<EXPONENT>?[fFdD]
            ;

File: semantic.info, Node: Keywords, Next: Keyword Properties, Prev: Lexer Options, Up: Lexing

Keywords
========

Another important piece of the lexer is the keyword table (see *Note
Settings::).  Your language will want to set up a keyword table for
fast conversion of symbol strings to language terminals.  The keyword
table can also be used to store additional information about those
keywords.
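
   For instance, once a keyword table is in place, it can be queried
and annotated like this.  This is a hypothetical sketch using the
functions documented below; it assumes the current language defines an
`if' keyword, and the `summary' property and its value are invented
for illustration.

     ;; Hypothetical sketch, assuming `if' is in the keyword table.
     (when (semantic-flex-keyword-p "if")
       ;; Attach a summary string to the keyword ...
       (semantic-flex-keyword-put "if" 'summary
                                  "if (condition) { code }")
       ;; ... and read it back later.
       (semantic-flex-keyword-get "if" 'summary))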

   The following programming functions can be useful when examining
text in a language buffer.

 - Function: semantic-flex-keyword-p text
     Return non-`nil' if TEXT is a keyword in the keyword table.

 - Function: semantic-flex-keyword-put text property value
     For keyword TEXT, set PROPERTY to VALUE.

 - Function: semantic-token-put-no-side-effect token key value
     For TOKEN, put the property KEY on it with VALUE, without side
     effects.  If VALUE is `nil', then remove the property from TOKEN.
     All cons cells in the property list are replicated so that there
     are no side effects if TOKEN is in shared lists.

 - Function: semantic-flex-keyword-get text property
     For keyword TEXT, get the value of PROPERTY.

 - Function: semantic-flex-map-keywords fun &optional property
     Call function FUN on every semantic keyword.  If optional
     PROPERTY is non-`nil', call FUN only on keywords which have a
     value for PROPERTY.  FUN receives a semantic keyword as its
     argument.

 - Function: semantic-flex-keywords &optional property
     Return a list of semantic keywords.  If optional PROPERTY is
     non-`nil', return only keywords which have PROPERTY set.

   Keyword properties can be set up in a BNF file for ease of
maintenance.  While examining the text in a language buffer, this can
provide an easy and quick way of storing details about that text.

File: semantic.info, Node: Keyword Properties, Prev: Keywords, Up: Lexing

Standard Keyword Properties
===========================

Add known properties here when they are known.

File: semantic.info, Node: Bovinating, Next: BNF conversion, Prev: Lexing, Up: Top

Preparing a bovine table for your language
******************************************

When converting a source file into a nonterminal token stream (a parse
tree), it is important to specify rules to accomplish this.  The rules
are stored in the buffer-local variable
`semantic-toplevel-bovine-table'.

   While it is certainly possible to write this table yourself, it is
most likely that you will want to use the BNF converter (see *Note BNF
conversion::), which is an easier method of specifying your rules.
You will still need to define a variable in your language for the
table, however.  A good rule of thumb is to call it
`LANGUAGE-toplevel-bovine-table' if it is part of the language, or
`semantic-toplevel-LANGUAGE-bovine-table' if you donate it to the
semantic package.

   When initializing a major mode for your language, set the variable
`semantic-toplevel-bovine-table' to the contents of your language
table.  `semantic-toplevel-bovine-table' is always buffer-local.

   Since it is important to know the format of the table when
debugging, you should still attempt to understand the basics of the
table.  Please see the documentation for the variable
`semantic-toplevel-bovine-table' for details on its format.

* add more doc here *

File: semantic.info, Node: BNF conversion, Next: Compiling, Prev: Bovinating, Up: Top

Using the BNF converter to make bovine tables
*********************************************

The BNF converter takes a file in "Bovine Normal Form", which is
similar to "Backus-Naur Form".  If you have ever used yacc or bison,
you will find it similar.  The BNF form used by semantic, however,
does not include token precedence rules and several other features
needed to build real parser generators.

   It is important to have an Emacs Lisp file with a variable ready to
take the output of your table (see *Note Bovinating::).  Also, make
sure that the file `semantic-bnf.el' is loaded.  Give your language
file the extension `.bnf' and you are ready.
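
   For orientation, a hypothetical minimal `.bnf' file might look like
this.  The file, variable, and function names are invented for
illustration, and the settings used here are described in the next
node.

     # Hypothetical skeleton for a language called "lang".
     %outputfile    semantic-lang.el
     %parsetable    semantic-toplevel-lang-bovine-table
     %keywordtable  semantic-lang-keyword-table
     %setupfunction semantic-default-lang-setup
     %languagemode  lang-mode

     %token MOOSE "moose"

     bovine-toplevel : MOOSE
                       ( $1 )
                     ;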

   The comment character is `#'.

   When you want to test your file, use the keyboard shortcut `C-c
C-c' to parse the file, generate the variable, and load the new
definition in.  It will then use the settings specified above to
determine what to do.  Use the shortcut `C-c c' to do the same thing,
but spend extra time indenting the table nicely.

   Make sure that you create the variable specified in the
`%parsetable' token before trying to convert the BNF file.  A simple
definition like this is sufficient:

     (defvar semantic-toplevel-lang-bovine-table nil
       "Table for use with semantic for parsing LANG.")

   If you use tokens (created with the `%token' specifier), also make
sure you have a keyword table available, like this:

     (defvar semantic-lang-keyword-table nil
       "Table for use with semantic for keywords.")

   Specify the name of the keyword table with the `%keywordtable'
specifier.

   The BNF file has two sections.  The first is the settings section,
and the second is the language definition, or list of semantic rules.

* Menu:

* Settings::                    Setup for a language
* Rules::                       Create rules to parse a language
* Optional Lambda Expression::  Actions to take when a rule is matched
* Examples::                    Simple Samples
* Style Guide::                 What the tokens mean, and how to use them.

File: semantic.info, Node: Settings, Next: Rules, Prev: BNF conversion, Up: BNF conversion

Settings
========

A setting is a keyword starting with a `%'.  (This syntax is taken
from yacc and bison.  *Note (bison)::.)  There are several settings
that can be made in the settings section.  They are:

 - Setting: %start SYMBOL
     Specify an alternative to `bovine-toplevel'.  (See below.)

 - Setting: %scopestart SYMBOL
     Specify an alternative to `bovine-inner-scope'.

 - Setting: %outputfile FILENAME
     Required.  Specifies the file into which this file's output is
     stored.

 - Setting: %parsetable SYMBOL
     Required.  Specifies a lisp variable into which the output is
     stored.

 - Setting: %setupfunction SYMBOL
     Required.  Name of a function into which setup code is to be
     inserted.

 - Setting: %keywordtable SYMBOL
     Required if there are `%token' keywords.  Specifies a lisp
     variable into which the output of a keyword table is stored.
     This obarray is used to turn symbols into keywords when
     applicable.

 - Setting: %token NAME "TEXT"
     Optional.  Specify a new token NAME.  This is added to a lexical
     keyword list using TEXT.  The symbol is then converted into a new
     lexical terminal.  This requires that the variable specified by
     `%keywordtable' is available in the file specified by
     `%outputfile'.

 - Setting: %token NAME TYPE "TEXT"
     Optional.  Specify a new token NAME.  It is made from an existing
     lexical token of type TYPE.  TEXT is a string which will be
     matched explicitly.  NAME can be used in match rules as though it
     were a flex token, but it is converted back to TYPE "TEXT"
     internally.

 - Setting: %put SYMBOL PROPERTY VALUE
 - Setting: %put ( SYMBOL1 SYMBOL2 ... ) PROPERTY VALUE
 - Setting: %put ( PROPERTY1 VALUE1 ... ) SYMBOL
     Tokens created without a type are considered keywords, and placed
     in a keyword table.  Use `%put' to apply properties to such a
     keyword.  (See *Note Lexing::.)

 - Setting: %languagemode MODE
 - Setting: %languagemode ( MODE1 MODE2 ... )
     Optional.  Specifies the Emacs major mode associated with the
     language being specified.  When the language is converted, all
     buffers of this mode will get the new table installed.

 - Setting: %quotemode backquote
     Optional.  Specifies how symbol quoting is handled in the
     Optional Lambda Expressions.  (See below.)

 - Setting: %( ... )%
     Specify setup code to be inserted into the `%setupfunction'.  It
     will be inserted between two specifier strings, or added to the
     end of the function.

   When working inside `%( ... )%' tokens, any lisp expression can be
entered, and it will be placed inside the setup function.  In general,
you probably want to set variables that tell Semantic and related
tools how the language works.
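
   For example, a hypothetical setup section might look like this.
This is a sketch; the values are invented for illustration, and the
variables it sets are documented below.

     %(
        ;; Hypothetical setup code placed inside the generated
        ;; setup function for this language.
        (setq semantic-symbol->name-assoc-list
              '((variable . "Variables")
                (function . "Functions")
                (include  . "Imports"))
              semantic-flex-depth 0
              semantic-number-expression nil)
     )%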

   Here are some variables that control how different programs will
work with your language.

 - Variable: semantic-flex-depth
     Default flexing depth.  This specifies how many levels of lists
     to descend into when creating tokens.

 - Variable: semantic-number-expression
     Regular expression for matching a number.  If this value is
     `nil', no number extraction is done during lexing.  Symbols which
     match this expression are returned as `number' tokens instead of
     `symbol' tokens.  The default value for this variable should work
     in most languages.

 - Variable: semantic-flex-extensions
     Buffer-local extensions to the lexical analyzer.  This should
     contain an alist with a key of a regexp and a data element of a
     function.  The function should both move point and return a
     lexical token of the form

          (TYPE START . END)

     `nil' is also a valid return value.  TYPE can be any type of
     symbol, as long as it doesn't occur as a nonterminal in the
     language definition.

 - Variable: semantic-flex-syntax-modifications
     Updates to the syntax table for this buffer.  These changes are
     active only while this file is being flexed.  This is a list
     where each element is of the form

          (CHAR CLASS)

     where CHAR is the char passed to `modify-syntax-entry', and CLASS
     is the string also passed to `modify-syntax-entry' to define what
     class of syntax CHAR is.

 - Variable: semantic-flex-enable-newlines
     When flexing, report `newline' tokens as syntactic elements.
     Useful for languages where the newline is a special-case
     terminator.  Only set this on a per-mode basis, not globally.

 - Variable: semantic-ignore-comments
     Default comment handling.  `t' means to strip comments when
     flexing; `nil' means to keep comments as part of the token
     stream.

 - Variable: semantic-symbol->name-assoc-list
     Association between the symbols returned and strings.  Each
     string is used to represent a group of objects of the given type.
     It is sometimes useful for a language to use a different string
     in place of the default, even though that language will still
     return the same symbol.  For example, Java returns `includes',
     but the string can be replaced with `Imports'.

 - Variable: semantic-case-fold
     Value for `case-fold-search' when parsing.

 - Variable: semantic-expand-nonterminal
     Function to call for each nonterminal production.  It should
     return a list of nonterminals derived from the first argument, or
     `nil' if it does not need to be expanded.

     Languages with compound definitions should use this function to
     expand one compound symbol into several.  For example, in C the
     definition

          int a, b;

     is easily parsed into one token, but represents multiple
     variables.  A function should be written which takes this
     compound token and turns it into two tokens, one for A and one
     for B.

     Within the language definition (the `.bnf' sources), it is often
     useful to set the NAME slot of a token to a list of items that
     distinguish each element in the compound definition.  This list
     can then be detected by the function set in
     `semantic-expand-nonterminal' to create multiple tokens.

     This function has the additional duty of managing the overlays
     created by semantic.  It is possible to use the single overlay in
     the compound token for all your tokens, but this can pose
     problems when identifying all tokens covering a given definition.
     Please see `semantic-java.el' for an example of managing overlays
     when expanding a token into multiple definitions.

 - Variable: semantic-override-table
     Buffer-local semantic function overrides alist.  These overrides
     provide a hook for a major mode to override specific behaviors
     with respect to generated semantic toplevel nonterminals and the
     things these nonterminals are useful for.  Each element must be
     of the form

          (SYM . FUN)

     where SYM is the symbol to override, and FUN is the function to
     override it with.

     Available override symbols, each listed with its parameters:

     find-dependency (token)
          Find the dependency file.
     find-nonterminal (token & parent)
          Find token in a buffer.
     find-documentation (token & nosnarf)
          Find doc comments.
     abbreviate-nonterminal (token & parent)
          Return summary string.
     summarize-nonterminal (token & parent)
          Return summary string.
     prototype-nonterminal (token)
          Return a prototype string.
     concise-prototype-nonterminal (token & parent color)
          Return a concise prototype string.
     uml-abbreviate-nonterminal (token & parent color)
          Return a UML standard abbreviation string.
     uml-prototype-nonterminal (token & parent color)
          Return a UML-like prototype string.
     uml-concise-prototype-nonterminal (token & parent color)
          Return a UML-like concise prototype string.
     prototype-file (buffer)
          Return a file in which prototypes are placed.
     nonterminal-children (token)
          Return first-rate children.  These are children which may
          contain overlays.
     nonterminal-external-member-parent (token)
          Parent of TOKEN.
     nonterminal-external-member-p (parent token)
          Non-`nil' if TOKEN has PARENT, but is not in PARENT.
     nonterminal-external-member-children (token & usedb)
          Get all external children of TOKEN.
     nonterminal-protection (token & parent)
          Return protection as a symbol.
     nonterminal-abstract (token & parent)
          Return non-`nil' if TOKEN is abstract.
     nonterminal-leaf (token & parent)
          Return non-`nil' if TOKEN is a leaf.
     nonterminal-static (token & parent)
          Return non-`nil' if TOKEN is static.
     beginning-of-context (& point)
          Move to the beginning of the current context.
     end-of-context (& point)
          Move to the end of the current context.
     up-context (& point)
          Move up one context level.
     get-local-variables (& point)
          Get local variables.
     get-all-local-variables (& point)
          Get all local variables.
     get-local-arguments (& point)
          Get arguments to this function.
     end-of-command
          Move to the end of the current command.
     beginning-of-command
          Move to the beginning of the current command.
     ctxt-current-symbol (& point)
          List of related symbols.
     ctxt-current-assignment (& point)
          Variable being assigned to.
     ctxt-current-function (& point)
          Function being called at point.
     ctxt-current-argument (& point)
          The index of the argument of the current function the cursor
          is in.

     The parameters mean:

    `&'
          The following parameters are optional.

    `buffer'
          The buffer in which a token was found.

    `token'
          The nonterminal token we are doing stuff with.

    `parent'
          If a TOKEN is stripped (of positional information), then
          this will be the parent token, which should have positional
          information in it.

 - Variable: semantic-type-relation-separator-character
     Character strings used to separate a parent/child relationship.
     This list of strings is used for displaying or finding separators
     in variable field dereferencing.  The first character will be
     used for display.  In C, a type field is separated like this:
     "type.field", thus the character is ".".  In C, an additional
     value of "->" would be in the list, so that "type->field" can
     also be found.

 - Variable: semantic-dependency-include-path
     Defines the include path used when searching for files.  This
     should be a list of directories to search, specific to the file
     being included.  This variable can also be set to a single
     function.  If it is a function, it will be called with one
     argument, the file to find as a string, and it should return the
     full path to that file, or `nil'.
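
   A buffer-local setting for the include path might look like this.
This is a hypothetical sketch; the directories are invented.

     ;; Hypothetical include path for C buffers in one project.
     (setq semantic-dependency-include-path
           '("/usr/include" "../include" "."))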

   The following variable configures Imenu to use semantic parsing.

 - Variable: imenu-create-index-function
     The function to use for creating a buffer index.  It should be a
     function that takes no arguments and returns an index of the
     current buffer as an alist.

     Simple elements in the alist look like `(INDEX-NAME .
     INDEX-POSITION)'.  Special elements look like `(INDEX-NAME
     INDEX-POSITION FUNCTION ARGUMENTS...)'.  A nested sub-alist
     element looks like `(INDEX-NAME SUB-ALIST)'.  The function
     `imenu--subalist-p' tests an element and returns `t' if it is a
     sub-alist.

     This function is called within a `save-excursion'.  The variable
     is buffer-local.

   These variables are specific to the document tool:

`document-comment-start'
     Comment start string.

`document-comment-line-prefix'
     Comment prefix string, used at the beginning of each line.

`document-comment-end'
     Comment end string.

File: semantic.info, Node: Rules, Next: Optional Lambda Expression, Prev: Settings, Up: BNF conversion

Rules
=====

Writing the rules should be very similar to bison for basic syntax.
Each rule is of the form

     RESULT : MATCH1 (optional-lambda-expression)
            | MATCH2 (optional-lambda-expression)
            ;

   RESULT is a nonterminal, or a token synthesized in your grammar.
MATCH is a list of elements that are to be matched if RESULT is to be
made.  The optional lambda expression is a list containing simplified
rules for concocting the parse tree.

   In bison, each time an element of a MATCH is found, it is "shifted"
onto the parser stack (the stack of matched elements).  When all of
MATCH1's elements have been matched, it is "reduced" to RESULT.  *Note
(bison)Algorithm::.

   The first RESULT written into your language specification should be
`bovine-toplevel', or the symbol specified with `%start'.  When
starting a parse for a file, this is the default token iterated over.
You can use any token you want in place of `bovine-toplevel' if you
specify what that nonterminal will be with a `%start' token in the
settings section.

   MATCH is made up of symbols and strings.  A symbol such as `foo'
means that a syntactic token of type `foo' must be matched.  A string
in the mix means that the previous symbol must have the additional
constraint of exactly matching it.  Thus, the combination

     symbol "moose"

means that a symbol must first be encountered, and then it must
`string-match "moose"'.  Be especially careful to remember that the
string is a regular expression.  The code

     punctuation "."

will match any punctuation.

   For the above example in bison, a LEX rule would be used to create
a new token MOOSE, and the MOOSE token would appear instead.  For the
bovinator, this task was mixed into the language definition to
simplify implementation, though Bison's technique is more efficient.

   To make a symbol match explicitly, for keywords for example, you
can use the `%token' command in the settings section to create new
symbols.

     %token MOOSE "moose"

     find_a_moose: MOOSE
                 ;

will match "moose" explicitly, unlike the previous example where
"moose" need only appear somewhere in the symbol.  This is because
"moose" will be converted to MOOSE in the lexical analysis stage.
Thus the symbol MOOSE won't be available any other way.

   If we specify our token in this way:

     %token MOOSE symbol "moose"

     find_a_moose: MOOSE
                 ;

then `MOOSE' will match the string "moose" explicitly, but it won't
do so at the lexical level, allowing the text "moose" to be used in
other forms of regular expressions as well.

   Non-symbol tokens are also allowed.  For example:

     %token PERIOD punctuation "."

     filename : symbol PERIOD symbol
              ;

will explicitly match one period when used in the above rule.  *Note
Default syntactic tokens::.

File: semantic.info, Node: Optional Lambda Expression, Next: Examples, Prev: Rules, Up: BNF conversion

Optional Lambda Expressions
===========================

The OLE (Optional Lambda Expression) is converted into a bovine lambda
(see *Note Bovinating::).  This lambda has special shortcuts to
simplify reading the Emacs BNF definition.  An OLE like this:

     ( $1 )

results in a lambda return which consists entirely of the string or
object found by matching the first (zeroth) element of the match.  An
OLE like this:

     ( ,(foo $1) )

executes `foo' on the first argument and then splices its return
value into the return list, whereas:

     ( (foo $1) )

executes `foo', and the result is placed in the return list.

   Here are the other things that can appear inline:

`$1'
     the first object matched.

`,$1'
     the first object matched, spliced into the list (assuming it is a
     list from a nonterminal).

`'$1'
     the first object matched, placed in a list, i.e. `( $1 )'.

`foo'
     the symbol `foo' (exactly as displayed).

`(foo)'
     a function call to `foo', whose result is stuck into the return
     list.

`,(foo)'
     a function call to `foo', whose result is spliced into the return
     list.

`'(foo)'
     a function call to `foo', whose result is stuck into the return
     list in a list.

`(EXPAND $1 nonterminal depth)'
     a list starting with EXPAND performs a recursive parse on the
     token passed to it (represented by `$1' above).  The semantic
     list is a common token to expand, as there are often interesting
     things in the list.  The NONTERMINAL is a symbol in your table
     which the bovinator will start with when parsing.  NONTERMINAL's
     definition is the same as any other nonterminal.  DEPTH should be
     at least 1 when descending into a semantic list.

`(EXPANDFULL $1 nonterminal depth)'
     is like EXPAND, except that the parser will iterate over
     NONTERMINAL until there are no more matches.  (The same way the
     parser iterates over `bovine-toplevel'.)  This lets you have much
     simpler rules in this specific case, and also gives you
     positional information in the returned tokens, and error
     skipping.

`(ASSOC symbol1 value1 symbol2 value2 ...)'
     This is used for creating an association list.  Each SYMBOL is
     included in the list only if the associated VALUE is non-`nil'.
     While the items are all listed explicitly, the created structure
     is an association list of the form:

          ((symbol1 . value1) (symbol2 . value2) ...)

   If `%quotemode backquote' is specified, then use `,@' to splice a
list in, and `,' to evaluate the expression.  This lets you send `$1'
as a symbol into a list instead of having it expanded inline.

File: semantic.info, Node: Examples, Next: Style Guide, Prev: Optional Lambda Expression, Up: BNF conversion

Examples
========

The rule:

     SYMBOL : symbol

is equivalent to

     SYMBOL : symbol
              ( $1 )

which, if it matched the string "A", would return

     ( "A" )

   If this rule were used like this:

     ASSIGN: SYMBOL punctuation "=" SYMBOL
             ( $1 $3 )

it would match "A=B" and return

     ( ("A") ("B") )

   The letters A and B come back in lists because SYMBOL is a
nonterminal, not an actual lexical element.

   To get a better result with nonterminals, use `,' to splice lists
in, like this:

     ASSIGN: SYMBOL punctuation "=" SYMBOL
             ( ,$1 ,$3 )

which would return

     ( "A" "B" )
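
   Once a table generated from rules like these is installed in a
buffer, the resulting bovine tree can be inspected directly.  This is
a minimal sketch; `semantic-bovinate-toplevel' is the entry point
mentioned in *Note Lexing::, and it assumes a bovine table is
installed for the current buffer's mode.

     ;; Sketch: parse the current buffer and look at the tree.
     (let ((tree (semantic-bovinate-toplevel)))
       ;; Each element of TREE is a nonterminal token; print the
       ;; name stored in its first slot.
       (dolist (token tree)
         (message "Found nonterminal: %s" (car token))))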