This is semantic.info, produced by makeinfo version 4.2 from
semantic.texi.

START-INFO-DIR-ENTRY
* semantic: (semantic). Semantic Parsing for Emacs
END-INFO-DIR-ENTRY

File: semantic.info, Node: Top, Next: Install, Prev: (dir), Up: (dir)

Semantic is a program for Emacs which includes, at its core, a lexer
and a compiler-compiler (bovinator).  Additional tools include a
BNF-to-semantic-table converter, example tables, and a speedbar tool.

   The core utility is the "semantic bovinator", which behaves much
like yacc or bison.  Since it is not designed to be as feature rich as
those tools, it uses the term "bovine", for cow, a lesser cousin of
the yak and bison.

   To send bug reports, or to participate in discussions about
semantic, use the mailing list cedet-semantic@sourceforge.net.

* Menu:

* Install::             Installing semantic.
* Overview::            Introduce basic concepts.
* Semantic Components:: Enumerate all semantic modules.
* Lexing::              Setting up the lexer for your language.
* Bovinating::          Setting up the parser for your language.
* BNF conversion::      Using the BNF converter to make tables.
* Compiling::           Running the bovinator on a source file.
* Debugging::           Debugging bovine tables.
* Programming::         How to program with a nonterminal stream.
* Current Context::     How to get the current code context.
* Tools::               User tools which use semantic.
* Index::

File: semantic.info, Node: Install, Next: Overview, Prev: Top, Up: Top

Installation
************

To install semantic, untar the distribution into a subdirectory, such
as `/usr/share/emacs/site-lisp/semantic-#.#'.  Next, add the following
lines to your individual `.emacs' file, or to
`site-lisp/site-start.el':

     (setq semantic-load-turn-everything-on t)
     (load-file "/path/to/semantic/semantic-load.el")

   If you would like to turn individual tools on or off in your init
file, skip the first line.

File: semantic.info, Node: Overview, Next: Semantic Components, Prev: Install, Up: Top

Overview
********

Semantic is a tool primarily for the Emacs-Lisp programmer.  However,
it comes with "applications" that non-programmers might find useful.
This chapter is mostly for the benefit of those non-programmers, as it
gives brief descriptions of basic concepts such as grammars, parsers,
compiler-compilers, and parse trees.

   The grammar of a natural language defines rules by which valid
phrases and sentences can be composed using words, the fundamental
units with which all sentences are created.  In a similar fashion, a
"context-free grammar" defines the rules by which programs can be
composed using the fundamental units of the language, i.e., numbers,
symbols, punctuation, etc.

   Context-free grammars are often specified in a well-known form
called Backus-Naur Form, BNF for short.  This is a systematic way of
representing context-free grammars such that programs can read files
with grammars written in BNF and generate code for a parser of that
language.

   YACC (Yet Another Compiler Compiler) is one such program; it has
been part of UNIX operating systems since the 1970s.  YACC is
pronounced the same as "yak", the long-haired ox found in Asia.  The
parser generated by YACC is usually a C program.

   Bison (http://www.gnu.org/software/bison/bison.html) is also a
"compiler compiler" that takes BNF grammars and produces parsers in
the C language.  The difference between YACC and Bison is that Bison
is free software (http://www.gnu.org/philosophy/free-sw.html) and
upward-compatible with YACC.  It also comes with an excellent manual.

   Semantic is similar in spirit to YACC and Bison.

   Semantic, however, is referred to as a "bovinator" rather than as
a parser, because it is a lesser cousin of YACC and Bison.  It is
lesser in that it does not perform a full parse the way YACC or Bison
do.  Instead, it "bovinates".  "Bovination" refers to partial parsing
that creates "parse trees" of only the topmost expressions, rather
than parsing every nested expression.  This is sufficient for the
purposes for which semantic was designed.  Semantic is meant to be
used within Emacs to provide editor-related features such as code
browsers and translators, rather than for compiling, which requires
far more complex and complete parsers.  Semantic is not designed to
be able to create full parse trees.

   One key benefit of semantic is that it creates parse trees (perhaps
the term "bovine tree" is more accurate) with the same structure
regardless of the language involved.  Higher-level applications
written to work with bovine trees will then work with any language for
which a grammar is available.  For example, a code browser written
today that supports C, C++, and Java may work without any change on
languages that do not even exist yet.  All one has to do is write the
BNF specification for the new language.  The rest of the work is done
by semantic.

   For certain languages, it is hard if not impossible to specify the
syntax of the language in BNF form, e.g., texinfo
(http://www.texinfo.org) and other document-oriented languages.
Semantic nevertheless provides a parser for texinfo.  Instead of a BNF
grammar, texinfo files are "parsed" using *Note regular-expressions:
(emacs)Regexps.

   Semantic comes with grammars for these languages:

   * C

   * Emacs-Lisp

   * java

   * makefile

   * scheme

   Several tools employing semantic that provide user-observable
features are listed in the *Note Tools:: section.

File: semantic.info, Node: Semantic Components, Next: Lexing, Prev: Overview, Up: Top

Semantic Components
*******************

This chapter gives an overview of the major components of semantic and
how they interact with each other to do their job.

   The first step of parsing is to break up the input file into its
fundamental components.  This step is called lexing.  The output of
the lexer is a list of tokens that make up the file.

     syntax table, keywords list, and options
                       |
                       |
                       v
     input file ----> Lexer ----> token stream

   The next step is the parsing, shown below.

                    bovine table
                         |
                         v
     token stream ---> Parser ----> parse tree

   The end result, the parse tree, is created based on the "bovine
table", which is the internal representation of the BNF language
grammar used by semantic.

   The semantic database provides caching of the parse trees by
saving them into files named `semantic.cache' automatically, then
loading them when appropriate instead of re-parsing.  The reason for
this is to save the time it takes to parse a file, which could be
several seconds or more for large files.

   Finally, semantic provides an API for the Emacs-Lisp programmer to
access the information in the parse tree.

File: semantic.info, Node: Lexing, Next: Bovinating, Prev: Semantic Components, Up: Top

Preparing your language for Lexing
**********************************

In order to reduce a source file into a token list, it must first be
converted into a token stream.  Tokens are syntactic elements such as
whitespace, symbols, strings, lists, and punctuation.

   The lexer uses the major-mode's syntax table for conversion.
*Note Syntax Tables: (elisp)Syntax Tables.
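
   For example, a major mode for a hypothetical language might prepare
its syntax table and comment variables like this.  This is a minimal
sketch, not code from semantic; the mode name and the syntax choices
are invented for illustration.

     ;; Hypothetical setup so the semantic lexer can classify text.
     (defvar mylang-mode-syntax-table
       (let ((table (make-syntax-table)))
         (modify-syntax-entry ?_  "_" table)  ; _ is a symbol constituent
         (modify-syntax-entry ?#  "<" table)  ; # starts a comment
         (modify-syntax-entry ?\n ">" table)  ; newline ends a comment
         table)
       "Syntax table used in `mylang-mode' buffers.")

     (define-derived-mode mylang-mode fundamental-mode "MyLang"
       "Major mode for the hypothetical MyLang language."
       (set-syntax-table mylang-mode-syntax-table)
       (set (make-local-variable 'comment-start) "# ")
       (set (make-local-variable 'comment-start-skip) "#+\\s-*"))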

   As long as the syntax table is set up correctly (along with the
important `comment-start' and `comment-start-skip' variables), the
lexer should already work for your language.

   The primary entry point of the lexer is the `semantic-flex'
function, shown below.  Normally, you do not need to call this
function.  It is usually called by `semantic-bovinate-toplevel' for
you.

 - Function: semantic-flex start end &optional depth length
     Using the syntax table, do something roughly equivalent to flex.
     Semantically check between START and END.  Optional argument
     DEPTH indicates at what level to scan over entire lists.  The
     return value is a token stream.  Each element is a list of the
     form (symbol start-expression . end-expression).  END does not
     mark the end of the text scanned, only the end of the beginning
     of text scanned.  Thus, if a string extends past END, the end of
     the returned token will be larger than END.  To truly restrict
     scanning, use `narrow-to-region'.  The last argument, LENGTH,
     specifies that `semantic-flex' should only return LENGTH tokens.

* Menu:

* Lexer Overview::
* Lexer Output::
* Lexer Options::
* Keywords::
* Keyword Properties::

File: semantic.info, Node: Lexer Overview, Next: Lexer Output, Prev: Lexing, Up: Lexing

Lexer Overview
==============

The semantic lexer breaks up the content of an Emacs buffer into a
list of tokens.  This process is based mostly on regular expressions,
which in turn depend on the syntax table of the buffer's major mode
being set up properly.  *Note Major Modes: (emacs)Major Modes.  *Note
Syntax Tables: (elisp)Syntax Tables.  *Note Regexps: (emacs)Regexps.

   Specifically, the following regular expressions, which rely on
syntax tables, are used:

`\\s-'
     whitespace characters

`\\sw'
     word constituent

`\\s_'
     symbol constituent

`\\s.'
     punctuation character

`\\s<'
     comment starter

`\\s>'
     comment ender

`\\s\\'
     escape character

`\\s)'
     close parenthesis character

`\\s$'
     paired delimiter

`\\s\"'
     string quote

`\\s\''
     expression prefix

   In addition, Emacs' built-in features such as `comment-start-skip',
`forward-comment', `forward-list', and `forward-sexp' are employed.

File: semantic.info, Node: Lexer Output, Next: Lexer Options, Prev: Lexer Overview, Up: Lexing

Lexer Output
============

The lexer, *Note semantic-flex::, scans the content of a buffer and
returns a token list.  Let's illustrate this using a simple example.

     00: /*
     01:  * Simple program to demonstrate semantic.
     02:  */
     03:
     04: #include <stdio.h>
     05:
     06: int i_1;
     07:
     08: int
     09: main(int argc, char** argv)
     10: {
     11:   printf("Hello world.\n");
     12: }

   Evaluating `(semantic-flex (point-min) (point-max))' within the
buffer with the code above returns the following token list.  The
input line and string that produced each token is shown after each
semicolon.

     ((punctuation 52 . 53)      ; 04: #
      (INCLUDE 53 . 60)          ; 04: include
      (punctuation 61 . 62)      ; 04: <
      (symbol 62 . 67)           ; 04: stdio
      (punctuation 67 . 68)      ; 04: .
      (symbol 68 . 69)           ; 04: h
      (punctuation 69 . 70)      ; 04: >
      (INT 72 . 75)              ; 06: int
      (symbol 76 . 79)           ; 06: i_1
      (punctuation 79 . 80)      ; 06: ;
      (INT 82 . 85)              ; 08: int
      (symbol 86 . 90)           ; 08: main
      (semantic-list 90 . 113)   ; 08: (int argc, char** argv)
      (semantic-list 114 . 147)  ; 09-12: body of main function
     )

   As shown above, the token list is a list of "tokens".
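
   Before looking at the token format in detail, here is a quick way
to see the text behind each token.  This is a minimal sketch; it
assumes the example C buffer above is current.

     ;; Sketch: print each token type with the text it covers.
     (dolist (token (semantic-flex (point-min) (point-max)))
       (message "%-15s %s"
                (car token)                     ; token type symbol
                (buffer-substring (cadr token)  ; start position
                                  (cddr token)))) ; end position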

   Each token in turn is a list of the form

     (TOKEN-TYPE BEGINNING-POSITION . ENDING-POSITION)

where TOKEN-TYPE is a symbol, and the other two are integers
indicating the buffer positions that delimit the token, such that

     (buffer-substring BEGINNING-POSITION ENDING-POSITION)

would return the string form of the token.

   Note that one line (line 4 above) can produce seven tokens, while
the whole body of the function produces a single token.  This is
because the DEPTH parameter of `semantic-flex' was not specified.
Let's see the output when DEPTH is set to 1.  Evaluate `(semantic-flex
(point-min) (point-max) 1)' in the same buffer.  Note the third
argument of `1'.

     ((punctuation 52 . 53)      ; 04: #
      (INCLUDE 53 . 60)          ; 04: include
      (punctuation 61 . 62)      ; 04: <
      (symbol 62 . 67)           ; 04: stdio
      (punctuation 67 . 68)      ; 04: .
      (symbol 68 . 69)           ; 04: h
      (punctuation 69 . 70)      ; 04: >
      (INT 72 . 75)              ; 06: int
      (symbol 76 . 79)           ; 06: i_1
      (punctuation 79 . 80)      ; 06: ;
      (INT 82 . 85)              ; 08: int
      (symbol 86 . 90)           ; 08: main
      (open-paren 90 . 91)       ; 08: (
      (INT 91 . 94)              ; 08: int
      (symbol 95 . 99)           ; 08: argc
      (punctuation 99 . 100)     ; 08: ,
      (CHAR 101 . 105)           ; 08: char
      (punctuation 105 . 106)    ; 08: *
      (punctuation 106 . 107)    ; 08: *
      (symbol 108 . 112)         ; 08: argv
      (close-paren 112 . 113)    ; 08: )
      (open-paren 114 . 115)     ; 10: {
      (symbol 120 . 126)         ; 11: printf
      (semantic-list 126 . 144)  ; 11: ("Hello world.\n")
      (punctuation 144 . 145)    ; 11: ;
      (close-paren 146 . 147)    ; 12: }
     )

   The DEPTH parameter "peeled away" one more level of "list",
delimited by matching parentheses or braces.  The DEPTH parameter can
be set to any number.  However, the parser needs to be able to handle
the extra tokens.

   This is an interesting benefit of the lexer having the full
resources of Emacs at its disposal.  Skipping over matched parentheses
is achieved by simply calling the built-in functions `forward-list'
and `forward-sexp'.

   All common token symbols are enumerated below.  Additional token
symbols can be generated by the lexer if the user option
`semantic-flex-extensions' is set.  It is up to the user to add
matching extensions to the parser to deal with the lexer extensions.
An example use of `semantic-flex-extensions' is in `semantic-make.el',
where it is set to the value of `semantic-flex-make-extensions', which
may generate `shell-command' tokens.

Default syntactic tokens if the lexer is not extended
-----------------------------------------------------

`bol'
     Empty string matching a beginning of line.  This token is
     produced only if the user set `semantic-flex-enable-bol' to
     non-`nil'.

`charquote'
     String sequences that match `\\s\\+'.

`close-paren'
     Characters that match `\\s)'.  These are typically `)', `}',
     `]', etc.

`comment'
     A comment chunk.  These tokens are not produced by default.  They
     are produced only if the user set `semantic-ignore-comments' to
     `nil'.

`newline'
     Characters matching `\\s-*\\(\n\\|\\s>\\)'.  This token is
     produced only if the user set `semantic-flex-enable-newlines' to
     non-`nil'.

`open-paren'
     Characters that match `\\s('.  These are typically `(', `{',
     `[', etc.  Note that these are not usually generated unless the
     DEPTH argument to *Note semantic-flex:: is greater than 0.

`punctuation'
     Characters matching `\\(\\s.\\|\\s$\\|\\s'\\)'.

`semantic-list'
     String delimited by matching parentheses, braces, etc. that the
     lexer skipped over, because the DEPTH parameter to *Note
     semantic-flex:: was not high enough.

`string'
     Quoted strings, i.e., string sequences that start and end with
     characters matching `\\s\"'.  The lexer relies on `forward-sexp'
     to find the matching end.

`symbol'
     String sequences that match `\\(\\sw\\|\\s_\\)+'.

`whitespace'
     Characters that match the `\\s-+' regexp.  This token is produced
     only if the user set `semantic-flex-enable-whitespace' to
     non-`nil'.  If `semantic-ignore-comments' is also non-`nil',
     comments are treated as whitespace.

File: semantic.info, Node: Lexer Options, Next: Keywords, Prev: Lexer Output, Up: Lexing

Lexer Options
=============

Although most lexer functions are called for you by other semantic
functions, there are ways for you to extend or customize the lexer.
The variables shown below serve this purpose.

 - Variable: semantic-flex-unterminated-syntax-end-function
     Function called when unterminated syntax is encountered.  This
     should be set to one function.  That function should take three
     parameters: SYNTAX, the type of syntax which is unterminated;
     SYNTAX-START, where the broken syntax begins; and FLEX-END, where
     the lexical analysis was asked to end.  This function can be used
     by languages that can intelligently fix up broken syntax, or to
     exit lexical analysis via `throw' or `signal' when unterminated
     syntax is found.

 - Variable: semantic-flex-extensions
     Buffer-local extensions to the lexical analyzer.  This should
     contain an alist with a key of a regexp and a data element of a
     function.  The function should both move point and return a
     lexical token of the form

          (TYPE START . END)

     `nil' is a valid return value.  TYPE can be any type of symbol,
     as long as it doesn't occur as a nonterminal in the language
     definition.

 - Variable: semantic-flex-syntax-modifications
     Changes to the syntax table for a given buffer.  These changes
     are active only while the buffer is being flexed.  This is a list
     where each element has the form

          (CHAR CLASS)

     CHAR is the char passed to `modify-syntax-entry', and CLASS is
     the string also passed to `modify-syntax-entry' to define what
     syntax class CHAR has.  For example:

          (setq semantic-flex-syntax-modifications '((?. "_")))

     This makes the period `.' a symbol constituent.  This may be
     necessary if filenames are prevalent, such as in Makefiles.

 - Variable: semantic-flex-enable-newlines
     When flexing, report `newline' tokens as syntactic elements.
     Useful for languages where the newline is a special-case
     terminator.  Only set this on a per-mode basis, not globally.

 - Variable: semantic-flex-enable-whitespace
     When flexing, report `whitespace' tokens as syntactic elements.
     Useful for languages where the syntax is whitespace dependent.
     Only set this on a per-mode basis, not globally.

 - Variable: semantic-flex-enable-bol
     When flexing, report beginning-of-line (`bol') tokens as
     syntactic elements.  Useful for languages like Python which are
     indentation sensitive.  Only set this on a per-mode basis, not
     globally.

 - Variable: semantic-number-expression
     Regular expression for matching a number.  If this value is
     `nil', no number extraction is done during lexing.  This
     expression tries to match C- and Java-like numbers:

          DECIMAL_LITERAL:
              [1-9][0-9]*
            ;
          HEX_LITERAL:
              0[xX][0-9a-fA-F]+
            ;
          OCTAL_LITERAL:
              0[0-7]*
            ;
          INTEGER_LITERAL:
              <DECIMAL_LITERAL>[lL]?
            | <HEX_LITERAL>[lL]?
            | <OCTAL_LITERAL>[lL]?
            ;
          EXPONENT:
              [eE][+-]?[0-9]+
            ;
          FLOATING_POINT_LITERAL:
              [0-9]+[.][0-9]*<EXPONENT>?[fFdD]?
            | [.][0-9]+<EXPONENT>?[fFdD]?
            | [0-9]+<EXPONENT>[fFdD]?
            | [0-9]+<EXPONENT>?[fFdD]
            ;

File: semantic.info, Node: Keywords, Next: Keyword Properties, Prev: Lexer Options, Up: Lexing

Keywords
========

Another important piece of the lexer is the keyword table (see *Note
Settings::).  Your language will want to set up a keyword table for
fast conversion of symbol strings to language terminals.  The keyword
table can also be used to store additional information about those
keywords.
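
   For instance, once a keyword table is in place, it can be queried
and annotated like this.  This is a hypothetical sketch using the
functions documented below; it assumes the current language defines an
`if' keyword, and the `summary' property and its value are invented
for illustration.

     ;; Hypothetical sketch, assuming `if' is in the keyword table.
     (when (semantic-flex-keyword-p "if")
       ;; Attach a summary string to the keyword ...
       (semantic-flex-keyword-put "if" 'summary
                                  "if (condition) { code }")
       ;; ... and read it back later.
       (semantic-flex-keyword-get "if" 'summary))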

   The following programming functions can be useful when examining
text in a language buffer.

 - Function: semantic-flex-keyword-p text
     Return non-`nil' if TEXT is a keyword in the keyword table.

 - Function: semantic-flex-keyword-put text property value
     For keyword TEXT, set PROPERTY to VALUE.

 - Function: semantic-token-put-no-side-effect token key value
     For TOKEN, put the property KEY on it with VALUE, without side
     effects.  If VALUE is `nil', then remove the property from TOKEN.
     All cons cells in the property list are replicated so that there
     are no side effects if TOKEN is in shared lists.

 - Function: semantic-flex-keyword-get text property
     For keyword TEXT, get the value of PROPERTY.

 - Function: semantic-flex-map-keywords fun &optional property
     Call function FUN on every semantic keyword.  If optional
     PROPERTY is non-`nil', call FUN only on keywords which have a
     value for PROPERTY.  FUN receives a semantic keyword as its
     argument.

 - Function: semantic-flex-keywords &optional property
     Return a list of semantic keywords.  If optional PROPERTY is
     non-`nil', return only keywords which have PROPERTY set.

   Keyword properties can be set up in a BNF file for ease of
maintenance.  While examining the text in a language buffer, this can
provide an easy and quick way of storing details about that text.

File: semantic.info, Node: Keyword Properties, Prev: Keywords, Up: Lexing

Standard Keyword Properties
===========================

Add known properties here when they are known.

File: semantic.info, Node: Bovinating, Next: BNF conversion, Prev: Lexing, Up: Top

Preparing a bovine table for your language
******************************************

When converting a source file into a nonterminal token stream (a parse
tree), it is important to specify rules to accomplish this.  The rules
are stored in the buffer-local variable
`semantic-toplevel-bovine-table'.

   While it is certainly possible to write this table yourself, it is
most likely that you will want to use the BNF converter (see *Note BNF
conversion::), which is an easier method of specifying your rules.
You will still need to define a variable in your language for the
table, however.  A good rule of thumb is to call it
`LANGUAGE-toplevel-bovine-table' if it is part of the language, or
`semantic-toplevel-LANGUAGE-bovine-table' if you donate it to the
semantic package.

   When initializing a major mode for your language, set the variable
`semantic-toplevel-bovine-table' to the contents of your language
table.  `semantic-toplevel-bovine-table' is always buffer-local.

   Since it is important to know the format of the table when
debugging, you should still attempt to understand the basics of the
table.  Please see the documentation for the variable
`semantic-toplevel-bovine-table' for details on its format.

* add more doc here *

File: semantic.info, Node: BNF conversion, Next: Compiling, Prev: Bovinating, Up: Top

Using the BNF converter to make bovine tables
*********************************************

The BNF converter takes a file in "Bovine Normal Form", which is
similar to "Backus-Naur Form".  If you have ever used yacc or bison,
you will find it similar.  The BNF form used by semantic, however,
does not include token precedence rules and several other features
needed to build real parser generators.

   It is important to have an Emacs Lisp file with a variable ready to
take the output of your table (see *Note Bovinating::).  Also, make
sure that the file `semantic-bnf.el' is loaded.  Give your language
file the extension `.bnf' and you are ready.
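
   For orientation, a hypothetical minimal `.bnf' file might look like
this.  The file, variable, and function names are invented for
illustration, and the settings used here are described in the next
node.

     # Hypothetical skeleton for a language called "lang".
     %outputfile    semantic-lang.el
     %parsetable    semantic-toplevel-lang-bovine-table
     %keywordtable  semantic-lang-keyword-table
     %setupfunction semantic-default-lang-setup
     %languagemode  lang-mode

     %token MOOSE "moose"

     bovine-toplevel : MOOSE
                       ( $1 )
                     ;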

   The comment character is `#'.

   When you want to test your file, use the keyboard shortcut `C-c
C-c' to parse the file, generate the variable, and load the new
definition in.  It will then use the settings specified above to
determine what to do.  Use the shortcut `C-c c' to do the same thing,
but spend extra time indenting the table nicely.

   Make sure that you create the variable specified in the
`%parsetable' token before trying to convert the BNF file.  A simple
definition like this is sufficient:

     (defvar semantic-toplevel-lang-bovine-table nil
       "Table for use with semantic for parsing LANG.")

   If you use tokens (created with the `%token' specifier), also make
sure you have a keyword table available, like this:

     (defvar semantic-lang-keyword-table nil
       "Table for use with semantic for keywords.")

   Specify the name of the keyword table with the `%keywordtable'
specifier.

   The BNF file has two sections.  The first is the settings section,
and the second is the language definition, or list of semantic rules.

* Menu:

* Settings::                    Setup for a language
* Rules::                       Create rules to parse a language
* Optional Lambda Expression::  Actions to take when a rule is matched
* Examples::                    Simple Samples
* Style Guide::                 What the tokens mean, and how to use them.

File: semantic.info, Node: Settings, Next: Rules, Prev: BNF conversion, Up: BNF conversion

Settings
========

A setting is a keyword starting with a `%'.  (This syntax is taken
from yacc and bison.  *Note (bison)::.)  There are several settings
that can be made in the settings section.  They are:

 - Setting: %start SYMBOL
     Specify an alternative to `bovine-toplevel'.  (See below.)

 - Setting: %scopestart SYMBOL
     Specify an alternative to `bovine-inner-scope'.

 - Setting: %outputfile FILENAME
     Required.  Specifies the file into which this file's output is
     stored.

 - Setting: %parsetable SYMBOL
     Required.  Specifies a lisp variable into which the output is
     stored.

 - Setting: %setupfunction SYMBOL
     Required.  Name of a function into which setup code is to be
     inserted.

 - Setting: %keywordtable SYMBOL
     Required if there are `%token' keywords.  Specifies a lisp
     variable into which the output of a keyword table is stored.
     This obarray is used to turn symbols into keywords when
     applicable.

 - Setting: %token NAME "TEXT"
     Optional.  Specify a new token NAME.  This is added to a lexical
     keyword list using TEXT.  The symbol is then converted into a new
     lexical terminal.  This requires that the variable specified by
     `%keywordtable' is available in the file specified by
     `%outputfile'.

 - Setting: %token NAME TYPE "TEXT"
     Optional.  Specify a new token NAME.  It is made from an existing
     lexical token of type TYPE.  TEXT is a string which will be
     matched explicitly.  NAME can be used in match rules as though it
     were a flex token, but it is converted back to TYPE "TEXT"
     internally.

 - Setting: %put SYMBOL PROPERTY VALUE
 - Setting: %put ( SYMBOL1 SYMBOL2 ... ) PROPERTY VALUE
 - Setting: %put ( PROPERTY1 VALUE1 ... ) SYMBOL
     Tokens created without a type are considered keywords, and placed
     in a keyword table.  Use `%put' to apply properties to such a
     keyword.  (See *Note Lexing::.)

 - Setting: %languagemode MODE
 - Setting: %languagemode ( MODE1 MODE2 ... )
     Optional.  Specifies the Emacs major mode associated with the
     language being specified.  When the language is converted, all
     buffers of this mode will get the new table installed.

 - Setting: %quotemode backquote
     Optional.  Specifies how symbol quoting is handled in the
     Optional Lambda Expressions.  (See below.)

 - Setting: %( ... )%
     Specify setup code to be inserted into the `%setupfunction'.  It
     will be inserted between two specifier strings, or added to the
     end of the function.

   When working inside `%( ... )%' tokens, any lisp expression can be
entered, and it will be placed inside the setup function.  In general,
you probably want to set variables that tell Semantic and related
tools how the language works.
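
   For example, a hypothetical setup section might look like this.
This is a sketch; the values are invented for illustration, and the
variables it sets are documented below.

     %(
        ;; Hypothetical setup code placed inside the generated
        ;; setup function for this language.
        (setq semantic-symbol->name-assoc-list
              '((variable . "Variables")
                (function . "Functions")
                (include  . "Imports"))
              semantic-flex-depth 0
              semantic-number-expression nil)
     )%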

   Here are some variables that control how different programs will
work with your language.

 - Variable: semantic-flex-depth
     Default flexing depth.  This specifies how many levels of lists
     to descend into when creating tokens.

 - Variable: semantic-number-expression
     Regular expression for matching a number.  If this value is
     `nil', no number extraction is done during lexing.  Symbols which
     match this expression are returned as `number' tokens instead of
     `symbol' tokens.  The default value for this variable should work
     in most languages.

 - Variable: semantic-flex-extensions
     Buffer-local extensions to the lexical analyzer.  This should
     contain an alist with a key of a regexp and a data element of a
     function.  The function should both move point and return a
     lexical token of the form

          (TYPE START . END)

     `nil' is also a valid return value.  TYPE can be any type of
     symbol, as long as it doesn't occur as a nonterminal in the
     language definition.

 - Variable: semantic-flex-syntax-modifications
     Updates to the syntax table for this buffer.  These changes are
     active only while this file is being flexed.  This is a list
     where each element is of the form

          (CHAR CLASS)

     where CHAR is the char passed to `modify-syntax-entry', and CLASS
     is the string also passed to `modify-syntax-entry' to define what
     class of syntax CHAR is.

 - Variable: semantic-flex-enable-newlines
     When flexing, report `newline' tokens as syntactic elements.
     Useful for languages where the newline is a special-case
     terminator.  Only set this on a per-mode basis, not globally.

 - Variable: semantic-ignore-comments
     Default comment handling.  `t' means to strip comments when
     flexing; `nil' means to keep comments as part of the token
     stream.

 - Variable: semantic-symbol->name-assoc-list
     Association between the symbols returned and strings.  Each
     string is used to represent a group of objects of the given type.
     It is sometimes useful for a language to use a different string
     in place of the default, even though that language will still
     return the same symbol.  For example, Java returns `includes',
     but the string can be replaced with `Imports'.

 - Variable: semantic-case-fold
     Value for `case-fold-search' when parsing.

 - Variable: semantic-expand-nonterminal
     Function to call for each nonterminal production.  It should
     return a list of nonterminals derived from the first argument, or
     `nil' if it does not need to be expanded.

     Languages with compound definitions should use this function to
     expand one compound symbol into several.  For example, in C the
     definition

          int a, b;

     is easily parsed into one token, but represents multiple
     variables.  A function should be written which takes this
     compound token and turns it into two tokens, one for A and one
     for B.

     Within the language definition (the `.bnf' sources), it is often
     useful to set the NAME slot of a token to a list of items that
     distinguish each element in the compound definition.  This list
     can then be detected by the function set in
     `semantic-expand-nonterminal' to create multiple tokens.

     This function has the additional duty of managing the overlays
     created by semantic.  It is possible to use the single overlay in
     the compound token for all your tokens, but this can pose
     problems when identifying all tokens covering a given definition.
     Please see `semantic-java.el' for an example of managing overlays
     when expanding a token into multiple definitions.

 - Variable: semantic-override-table
     Buffer-local semantic function overrides alist.  These overrides
     provide a hook for a major mode to override specific behaviors
     with respect to generated semantic toplevel nonterminals and the
     things these nonterminals are useful for.  Each element must be
     of the form

          (SYM . FUN)

     where SYM is the symbol to override, and FUN is the function to
     override it with.

     Available override symbols, each listed with its parameters:

     find-dependency (token)
          Find the dependency file.
     find-nonterminal (token & parent)
          Find token in a buffer.
     find-documentation (token & nosnarf)
          Find doc comments.
     abbreviate-nonterminal (token & parent)
          Return summary string.
     summarize-nonterminal (token & parent)
          Return summary string.
     prototype-nonterminal (token)
          Return a prototype string.
     concise-prototype-nonterminal (token & parent color)
          Return a concise prototype string.
     uml-abbreviate-nonterminal (token & parent color)
          Return a UML standard abbreviation string.
     uml-prototype-nonterminal (token & parent color)
          Return a UML-like prototype string.
     uml-concise-prototype-nonterminal (token & parent color)
          Return a UML-like concise prototype string.
     prototype-file (buffer)
          Return a file in which prototypes are placed.
     nonterminal-children (token)
          Return first-rate children.  These are children which may
          contain overlays.
     nonterminal-external-member-parent (token)
          Parent of TOKEN.
     nonterminal-external-member-p (parent token)
          Non-`nil' if TOKEN has PARENT, but is not in PARENT.
     nonterminal-external-member-children (token & usedb)
          Get all external children of TOKEN.
     nonterminal-protection (token & parent)
          Return protection as a symbol.
     nonterminal-abstract (token & parent)
          Return non-`nil' if TOKEN is abstract.
     nonterminal-leaf (token & parent)
          Return non-`nil' if TOKEN is a leaf.
     nonterminal-static (token & parent)
          Return non-`nil' if TOKEN is static.
     beginning-of-context (& point)
          Move to the beginning of the current context.
     end-of-context (& point)
          Move to the end of the current context.
     up-context (& point)
          Move up one context level.
     get-local-variables (& point)
          Get local variables.
     get-all-local-variables (& point)
          Get all local variables.
     get-local-arguments (& point)
          Get arguments to this function.
     end-of-command
          Move to the end of the current command.
     beginning-of-command
          Move to the beginning of the current command.
     ctxt-current-symbol (& point)
          List of related symbols.
     ctxt-current-assignment (& point)
          Variable being assigned to.
     ctxt-current-function (& point)
          Function being called at point.
     ctxt-current-argument (& point)
          The index of the argument of the current function the cursor
          is in.

     The parameters mean:

    `&'
          The following parameters are optional.

    `buffer'
          The buffer in which a token was found.

    `token'
          The nonterminal token we are doing stuff with.

    `parent'
          If a TOKEN is stripped (of positional information), then
          this will be the parent token, which should have positional
          information in it.

 - Variable: semantic-type-relation-separator-character
     Character strings used to separate a parent/child relationship.
     This list of strings is used for displaying or finding separators
     in variable field dereferencing.  The first character will be
     used for display.  In C, a type field is separated like this:
     "type.field", thus the character is ".".  In C, an additional
     value of "->" would be in the list, so that "type->field" can
     also be found.

 - Variable: semantic-dependency-include-path
     Defines the include path used when searching for files.  This
     should be a list of directories to search, specific to the file
     being included.  This variable can also be set to a single
     function.  If it is a function, it will be called with one
     argument, the file to find as a string, and it should return the
     full path to that file, or `nil'.
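
   A buffer-local setting for the include path might look like this.
This is a hypothetical sketch; the directories are invented.

     ;; Hypothetical include path for C buffers in one project.
     (setq semantic-dependency-include-path
           '("/usr/include" "../include" "."))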

   The following variable configures Imenu to use semantic parsing.

 - Variable: imenu-create-index-function
     The function to use for creating a buffer index.  It should be a
     function that takes no arguments and returns an index of the
     current buffer as an alist.

     Simple elements in the alist look like `(INDEX-NAME .
     INDEX-POSITION)'.  Special elements look like `(INDEX-NAME
     INDEX-POSITION FUNCTION ARGUMENTS...)'.  A nested sub-alist
     element looks like `(INDEX-NAME SUB-ALIST)'.  The function
     `imenu--subalist-p' tests an element and returns `t' if it is a
     sub-alist.

     This function is called within a `save-excursion'.  The variable
     is buffer-local.

   These variables are specific to the document tool:

`document-comment-start'
     Comment start string.

`document-comment-line-prefix'
     Comment prefix string, used at the beginning of each line.

`document-comment-end'
     Comment end string.

File: semantic.info, Node: Rules, Next: Optional Lambda Expression, Prev: Settings, Up: BNF conversion

Rules
=====

Writing the rules should be very similar to bison for basic syntax.
Each rule is of the form

     RESULT : MATCH1 (optional-lambda-expression)
            | MATCH2 (optional-lambda-expression)
            ;

   RESULT is a nonterminal, or a token synthesized in your grammar.
MATCH is a list of elements that are to be matched if RESULT is to be
made.  The optional lambda expression is a list containing simplified
rules for concocting the parse tree.

   In bison, each time an element of a MATCH is found, it is "shifted"
onto the parser stack (the stack of matched elements).  When all of
MATCH1's elements have been matched, it is "reduced" to RESULT.  *Note
(bison)Algorithm::.

   The first RESULT written into your language specification should be
`bovine-toplevel', or the symbol specified with `%start'.  When
starting a parse for a file, this is the default token iterated over.
You can use any token you want in place of `bovine-toplevel' if you
specify what that nonterminal will be with a `%start' token in the
settings section.

   MATCH is made up of symbols and strings.  A symbol such as `foo'
means that a syntactic token of type `foo' must be matched.  A string
in the mix means that the previous symbol must have the additional
constraint of exactly matching it.  Thus, the combination

     symbol "moose"

means that a symbol must first be encountered, and then it must
`string-match "moose"'.  Be especially careful to remember that the
string is a regular expression.  The code

     punctuation "."

will match any punctuation.

   For the above example in bison, a LEX rule would be used to create
a new token MOOSE, and the MOOSE token would appear instead.  For the
bovinator, this task was mixed into the language definition to
simplify implementation, though Bison's technique is more efficient.

   To make a symbol match explicitly, for keywords for example, you
can use the `%token' command in the settings section to create new
symbols.

     %token MOOSE "moose"

     find_a_moose: MOOSE
                 ;

will match "moose" explicitly, unlike the previous example where
"moose" need only appear somewhere in the symbol.  This is because
"moose" will be converted to MOOSE in the lexical analysis stage.
Thus the symbol MOOSE won't be available any other way.

   If we specify our token in this way:

     %token MOOSE symbol "moose"

     find_a_moose: MOOSE
                 ;

then `MOOSE' will match the string "moose" explicitly, but it won't
do so at the lexical level, allowing the text "moose" to be used in
other forms of regular expressions as well.

   Non-symbol tokens are also allowed.  For example:

     %token PERIOD punctuation "."

     filename : symbol PERIOD symbol
              ;

will explicitly match one period when used in the above rule.  *Note
Default syntactic tokens::.

File: semantic.info, Node: Optional Lambda Expression, Next: Examples, Prev: Rules, Up: BNF conversion

Optional Lambda Expressions
===========================

The OLE (Optional Lambda Expression) is converted into a bovine lambda
(see *Note Bovinating::).  This lambda has special shortcuts to
simplify reading the Emacs BNF definition.  An OLE like this:

     ( $1 )

results in a lambda return which consists entirely of the string or
object found by matching the first (zeroth) element of the match.  An
OLE like this:

     ( ,(foo $1) )

executes `foo' on the first argument and then splices its return
value into the return list, whereas:

     ( (foo $1) )

executes `foo', and the result is placed in the return list.

   Here are the other things that can appear inline:

`$1'
     the first object matched.

`,$1'
     the first object matched, spliced into the list (assuming it is a
     list from a nonterminal).

`'$1'
     the first object matched, placed in a list, i.e. `( $1 )'.

`foo'
     the symbol `foo' (exactly as displayed).

`(foo)'
     a function call to `foo', whose result is stuck into the return
     list.

`,(foo)'
     a function call to `foo', whose result is spliced into the return
     list.

`'(foo)'
     a function call to `foo', whose result is stuck into the return
     list in a list.

`(EXPAND $1 nonterminal depth)'
     a list starting with EXPAND performs a recursive parse on the
     token passed to it (represented by `$1' above).  The semantic
     list is a common token to expand, as there are often interesting
     things in the list.  The NONTERMINAL is a symbol in your table
     which the bovinator will start with when parsing.  NONTERMINAL's
     definition is the same as any other nonterminal.  DEPTH should be
     at least 1 when descending into a semantic list.

`(EXPANDFULL $1 nonterminal depth)'
     is like EXPAND, except that the parser will iterate over
     NONTERMINAL until there are no more matches.  (The same way the
     parser iterates over `bovine-toplevel'.)  This lets you have much
     simpler rules in this specific case, and also gives you
     positional information in the returned tokens, and error
     skipping.

`(ASSOC symbol1 value1 symbol2 value2 ...)'
     This is used for creating an association list.  Each SYMBOL is
     included in the list only if the associated VALUE is non-`nil'.
     While the items are all listed explicitly, the created structure
     is an association list of the form:

          ((symbol1 . value1) (symbol2 . value2) ...)

   If `%quotemode backquote' is specified, then use `,@' to splice a
list in, and `,' to evaluate the expression.  This lets you send `$1'
as a symbol into a list instead of having it expanded inline.

File: semantic.info, Node: Examples, Next: Style Guide, Prev: Optional Lambda Expression, Up: BNF conversion

Examples
========

The rule:

     SYMBOL : symbol

is equivalent to

     SYMBOL : symbol
              ( $1 )

which, if it matched the string "A", would return

     ( "A" )

   If this rule were used like this:

     ASSIGN: SYMBOL punctuation "=" SYMBOL
             ( $1 $3 )

it would match "A=B" and return

     ( ("A") ("B") )

   The letters A and B come back in lists because SYMBOL is a
nonterminal, not an actual lexical element.

   To get a better result with nonterminals, use `,' to splice lists
in, like this:

     ASSIGN: SYMBOL punctuation "=" SYMBOL
             ( ,$1 ,$3 )

which would return

     ( "A" "B" )
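
   Once a table generated from rules like these is installed in a
buffer, the resulting bovine tree can be inspected directly.  This is
a minimal sketch; `semantic-bovinate-toplevel' is the entry point
mentioned in *Note Lexing::, and it assumes a bovine table is
installed for the current buffer's mode.

     ;; Sketch: parse the current buffer and look at the tree.
     (let ((tree (semantic-bovinate-toplevel)))
       ;; Each element of TREE is a nonterminal token; print the
       ;; name stored in its first slot.
       (dolist (token tree)
         (message "Found nonterminal: %s" (car token))))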