YooLex References

References

YooLex Functions and Macros
YooLex Data Types
Command Line Options
Configurations
Input Format Overview
Section 1 Description
Section 2 Description
Section 3 Description

YooLex Functions and Macros

Note: all macros can be accessed only in the lex file.

Group	Functions/Macros	Description
Text	`YYTextType _yyText;`	`_yyText` contains the scanned text. It can be one of the following three types: `yoogroup::PseudoVector`, `std::vector<char>`, or `std::string`. Currently, YooLex does not support Unicode. I will add the feature later when I figure out how `wchar_t` works.
Text	`yyleng`	macro, same as `_yyText.size ()`. Provided for FLEX compatibility.
I/O	`void yySetInput (std::istream& is);`	sets the input stream to `is`.
	`void yySetOutput (std::ostream& os);`	sets the output stream to `os`. Only used for `ECHO` macro.
	`ECHO`	outputs _yyText to the output stream. Provided for FLEX compatibility.
Scanning Buffer	`int yyInput ();`	retrieves one character from the scanning buffer. Returns EOF at end of file.
	`template <typename ForwardIter> ForwardIter yyInput (ForwardIter begin, ForwardIter end);`	retrieves characters from the input buffer and copies them into the buffer beginning at `begin`. Returns `begin`.
	`void yyUnput (char c);`	puts a character back into the scanning buffer.
	`void yyUnput (const char *start, size_t count);`	puts `count` characters starting at `start` back into the scanning buffer.
	`void yyUnput (int size);`	resizes the internal buffer so that characters are re-scanned. Caution: for performance reasons, no bounds check is done.
	`yyless (n)`	macro, same as `yyUnput (_yyText.size () - n)`. Provided for FLEX compatibility.
States	`void yyBegin (int newState);`	sets the current state to `newState`.
	`int _yyBaseState;`	current state
	`void yyPushState (int newState);`	pushes the current state onto the state stack and begins `newState`. Caution: this function must be called before using `yyPopState ()` and `yyTopState ()`.
	`void yyPopState ();`	pops the state from the state stack and begins it.
	`int yyTopState ();`	returns the state on top of the state stack.
	`BEGIN (state)`	macro, same as `yyBegin (YYSTATE_##state)`. Provided for FLEX compatibility.
Counting	`size_t _yyCharNum;`	current character position. [1]
	`size_t yyGetCharNum () const;`	returns the current character position. [1]
	`size_t _yyLineNum;`	current line number. [2]
	`size_t yyGetLineNum () const;`	returns the current line number. [2]
BOL [3]	`bool _yyIsBOL;`	indicates if the current input is at beginning of the line.
	`bool yyIsBOL () const;`	indicates if the current input is at beginning of the line.
	`void yySetBOL ();`	sets the current input as if it is at the beginning of the line.
Misc	`yyterminate ()`	macro, same as `return YYSType (-1)`. Provided for FLEX compatibility.

[1] Only works if "char" option is used.
[2] Only works if "line" option is used.
[3] Only works if "bol" option or BOL states are used.
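
The state functions in the table above can be pictured with a small stand-alone model. This is only a sketch of the behaviour the descriptions suggest, not the YooLex implementation: the class `ToyStates` and its member names are made up for illustration, and treating an empty stack as a precondition violation is an assumption drawn from the caution on `yyPushState`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of yyBegin / yyPushState / yyPopState / yyTopState.
class ToyStates {
public:
    // Models yyBegin: switch to newState without touching the stack.
    void begin(int newState) { current_ = newState; }

    // Models yyPushState: save the current state, then begin newState.
    void pushState(int newState) {
        stack_.push_back(current_);
        current_ = newState;
    }

    // Models yyPopState: take the saved state off the stack and begin it.
    // Assumed precondition: pushState has been called at least once.
    void popState() {
        assert(!stack_.empty());
        current_ = stack_.back();
        stack_.pop_back();
    }

    // Models yyTopState: look at the saved state without removing it.
    int topState() const {
        assert(!stack_.empty());
        return stack_.back();
    }

    int current() const { return current_; }

private:
    int current_ = 0;         // plays the role of _yyBaseState
    std::vector<int> stack_;  // saved states, most recent last
};
```

In this model, pushing state 1 and then state 2 from the initial state 0 leaves 1 on top of the stack, and two pops return to 0.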

Data Types

These are the data types defined in `yoogroup::YooLex<YYS_TYPE, YYTEXT_TYPE>`, which is the parent class of all scanner classes.

YYTextType	Data Type for _yyText
YYSType	Data Type for the return value of `yyLexCase` and `yyLex ()`. Usually it should be `int`, which is the default. Changing it is not advised unless there is no alternative. YooParse requires it to be type `int`.
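
This reference describes the `fast` text container as holding only pointers into the scanner's internal buffer. The sketch below shows that general idea with a hypothetical `ToyTextView` class; it is not the real `PseudoVector`, whose exact interface is in `"yoolex.hh"`, and the helper `viewOf` is likewise made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical pointer-pair container. It owns no memory: it only remembers
// where the matched text starts and ends inside a buffer owned elsewhere.
class ToyTextView {
public:
    ToyTextView(const char* begin, const char* end) : begin_(begin), end_(end) {}

    const char* begin() const { return begin_; }
    const char* end() const { return end_; }
    std::size_t size() const { return static_cast<std::size_t>(end_ - begin_); }
    char operator[](std::size_t i) const { return begin_[i]; }

    // Copying out is the safe way to keep the text after the buffer changes.
    operator std::string() const { return std::string(begin_, end_); }
    operator std::vector<char>() const { return std::vector<char>(begin_, end_); }

private:
    const char* begin_;
    const char* end_;
};

// Returns a view of buffer[first, first + count) without copying any bytes.
inline ToyTextView viewOf(const std::string& buffer, std::size_t first,
                          std::size_t count) {
    const char* base = buffer.data() + first;
    return ToyTextView(base, base + count);
}
```

A view like this is cheap to create but becomes dangling as soon as the underlying buffer is modified or refilled, which is why converting to `std::string` or `std::vector<char>` is the safer choice when the text must outlive the current action.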

Command Line Options

Usage: yoolex [options] file
Options:
  -h, -u, -?	print this message
  -b		report any DFA backups
  -c class	specify the C++ class name
  -i		use indirect threading for the g++ compiler
  -m		generate a main function
  -t table	specify the DFA table option
	-tm	generate a maximumly compressed table
	-te	generate an equivalence class table
	-tf	generate a full table
  -w		do not generate warning messages

Note that the command line options can be overridden by the %option directive in the input file.

Example:

	yoolex -i -tf lex.l

generates the fastest scanner class for GCC compilers. If the regular expressions contain no DFA backups, the speed improvement is even greater. Most speed improvement tips for FLEX also work for YooLex.

Configurations

Configurations are specified with %option at the beginning of a line in section 1 of the input file.

`bol`	Forces a BOL state check for each pattern matched. By default, this option is off unless BOL states are used.
`ccext = "name"`	Specify the extension of the C++ source file. Default is `.cc`.
`ccfile = "name"`	Specify the output C++ source name including the extension. Default is class + ccext.
`char`	Do character counting.
`class = "name"`	Specify the C++ scanner class name. Default is `DefaultLex`.
`header`	Always generate the C++ header file. By default YooLex only generates the C++ header file if it does not exist.
`hhext = "name"`	Specify the extension of the C++ header file. Default is `.hh`.
`hhfile = "name"`	Specify the output C++ header name including the extension. Default is class + hhext. Note that even if the header is not generated, this field still determines which header the generated source file includes.
`ithread`	Use indirect threading for g++ compiler. This feature is slightly faster than the default switch/case method that is used to trigger actions on a token.
`line`	Computes line number. This option works with `yyUnput` functions and trailing contexts.
`linemode`	Uses line buffering instead of block buffering. This option is intended for interactive scanners. Do not use it otherwise, since scanning becomes very slow.
`main`	Generates a default main function.
`namespace = "name"`	Specifies the namespace of the C++ class.
`nocase`	Generates a case-insensitive scanner. Currently, it is not possible to specify this option for an individual regex.
`noheader`	Does not generate the C++ header file at all. This option is intended for use with YooParse.
`nowarn`	Does not generate warning messages.
`table = "name"`	Specifies the DFA table format. It can be one of the following: `full`: generates a full DFA table. Lookup is the fastest, at the cost of huge memory space. `ecs`: generates an equivalence class DFA table. It is quite fast and saves significant space compared to the full DFA table. `max`: generates a table that is very compact but slow in lookup. Scanning is usually not the bottleneck, though, so this option generally works well. It is also the default DFA table format.
`yystype = "name"`	Specifies the return type for `YooLex::yyLex ()`. Default is `int`.
`yytext = "name"`	Specifies the container type for `YooLex::_yyText`. This option only affects the generated header. It can be one of the following: `string`: uses `std::string` as the container type. Slow but convenient. `vector`: uses `std::vector<char>` as the container type, which is somewhat faster than `std::string` because it has much less overhead. `fast`: uses `yoolex::PseudoVector` as the container type. This container has no memory space of its own, since it just contains pointers to the internal buffer. Its purpose is to avoid array duplication, which can be costly for a large input. The downside is that it cannot be resized, so insertions and deletions are not supported. Many YooLex functions can also invalidate this container, so use it with caution. `std::string` and `std::vector<char>` cast operators are provided. Please see `"yoolex.hh"` for more information. The default is `fast`.
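
To make the difference between the `full` and `ecs` table formats more concrete, here is a sketch of the general equivalence class technique: input bytes whose columns in the full transition table are identical are merged into one class. This illustrates the idea only and is not taken from the YooLex sources; every name in it (`FullRow`, `EcsTable`, `compress`, `step`, `digitsTable`) is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

constexpr std::size_t kAlphabet = 256;
using FullRow = std::array<int, kAlphabet>;  // next state for each input byte

struct EcsTable {
    std::array<int, kAlphabet> classOf;      // input byte -> class number
    std::vector<std::vector<int>> next;      // [state][class] -> next state
};

// Bytes whose columns in the full table are identical share one class.
EcsTable compress(const std::vector<FullRow>& full) {
    EcsTable out;
    out.classOf.fill(0);
    std::map<std::vector<int>, int> seen;    // column contents -> class number
    std::vector<std::vector<int>> columns;   // one representative per class
    for (std::size_t c = 0; c < kAlphabet; ++c) {
        std::vector<int> column;
        for (const FullRow& row : full) column.push_back(row[c]);
        auto it = seen.find(column);
        if (it == seen.end()) {
            it = seen.emplace(column, static_cast<int>(columns.size())).first;
            columns.push_back(column);
        }
        out.classOf[c] = it->second;
    }
    out.next.assign(full.size(), std::vector<int>(columns.size()));
    for (std::size_t s = 0; s < full.size(); ++s)
        for (std::size_t k = 0; k < columns.size(); ++k)
            out.next[s][k] = columns[k][s];
    return out;
}

// Lookup needs one extra indirection compared with the full table.
int step(const EcsTable& t, int state, unsigned char c) {
    return t.next[static_cast<std::size_t>(state)]
                 [static_cast<std::size_t>(t.classOf[c])];
}

// Builds a two-state full table for [0-9]+ (-1 means "no transition").
std::vector<FullRow> digitsTable() {
    std::vector<FullRow> full(2);
    for (FullRow& row : full) row.fill(-1);
    for (std::size_t c = '0'; c <= '9'; ++c) {
        full[0][c] = 1;
        full[1][c] = 1;
    }
    return full;
}
```

For the two-state digits table, the full format stores 2 x 256 = 512 entries, while the compressed form stores a 256-entry class map plus a 2 x 2 transition table, because only two classes exist: digits and everything else.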

An example using one of the configurations:

%option class = "test"
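
A slightly larger configuration block might look like the following sketch. The option names and values come from the table above; the class name `MyLex` and the namespace `demo` are made up, and writing one `%option` per line, including for flag options such as `line`, is an assumption based on the single example shown.

```
%option class = "MyLex"
%option namespace = "demo"
%option table = "ecs"
%option line
%option char
```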

Overview

The YooLex input file is divided into three sections. The first section contains configurations and some C++ code. The second section contains mostly regular expressions and the corresponding actions. The third section contains any C++ code users wish to include near the end of the generated source. In all sections, line comments (//) and block comments (/* ... */) are allowed. %% is used to separate two sections.

// section 1

%%

%{
// section 2 prolog
%}

// section 2

%{
// section 2 epilog
%}

%%

// section 3

Section 1 Description:

The purpose of section 1 is to include code necessary for the scanner and to define names for some regular expression (RegEx) patterns. YooLex configurations are also specified here.

To insert code at the beginning of the C++ output file, use the following format:

%{
// code here will be echoed at the beginning of the generated
// C++ source file.
%}

Section 1 may also define names for frequently used regular expressions:

PatternName	RegEx
IDENT		[A-Za-z]+
NUMBER		[0-9]+

Finally, Section 1 may contain declarations of inclusive and exclusive start conditions. Inclusive start conditions are declared with %s:

%s CONDITION_1 CONDITION_2

Exclusive start conditions are declared with %x:

%x CONDITION_1 CONDITION_2

The difference between the two kinds of start conditions is this: a RegEx that specifies no start condition is also active in every inclusive start condition, whereas a RegEx is active in an exclusive start condition only if it names that condition explicitly.
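
The rule above can be written down as a small predicate. This is a sketch of the semantics as described in this section, not code from YooLex; the types `ToyRule` and `ToyCondition` and the function `isActive` are hypothetical, and treating the scanner's initial condition as an inclusive one is an assumption.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical description of one rule: the start conditions written in
// front of its RegEx. An empty set means none were written.
struct ToyRule {
    std::set<std::string> conditions;
};

// Hypothetical description of the scanner's current start condition.
struct ToyCondition {
    std::string name;
    bool exclusive;  // true if declared with %x, false if declared with %s
};

// Decides whether a rule takes part in matching under the given condition.
bool isActive(const ToyRule& rule, const ToyCondition& current) {
    if (rule.conditions.count(current.name) > 0) return true;  // named explicitly
    if (!rule.conditions.empty()) return false;  // names other conditions only
    return !current.exclusive;  // unnamed rule: shared by inclusive conditions
}
```

Under this model, a rule with no condition is active while the scanner is in an inclusive condition and inactive in an exclusive one, and a rule tagged with a condition is active only in that condition.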

Section 2 Description:

Section 2 contains three parts: the prolog, the regular expression actions, and the epilog.

The prolog is located at the very beginning of section 2. It starts with %{ and ends with %}:

%{
// code here will be inserted at the beginning of class::yyLexCase () function.
// It is useful to have some code that does something whenever yyLexCase () is
// called.  If indirect threading is used, all variables used must be declared in
// this section
%}

Next come the definitions of the regular expressions and the actions triggered whenever those regular expressions are matched. YooLex supports nested conditions as FLEX does. It also supports BOL (^), EOL ($), <<EOF>>, and trailing contexts in regular expressions. YooLex currently can handle trailing contexts that have either a fixed-length head, such as:

abc/d*e

or a fixed-length trail, such as:

abc*/de

Trailing contexts in which neither the head nor the trail has a fixed length are not supported, for example:

abc*/c*e
abc*/d*e
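
The four patterns above can be checked against a toy version of this restriction. The sketch assumes that "fixed" means "always matches the same number of characters" and that a trailing context is accepted when at least one side is fixed; it only understands literal characters and the operators `*`, `+` and `?`, and both function names are hypothetical. It is not how YooLex analyses patterns.

```cpp
#include <cassert>
#include <string>

// True if the toy pattern always matches the same number of characters.
// Toy subset only: literal characters plus the operators *, + and ?.
bool hasFixedLength(const std::string& pattern) {
    return pattern.find_first_of("*+?") == std::string::npos;
}

// Splits "head/trail" at the first '/' and applies the assumed rule:
// at least one of the two sides must have a fixed length.
bool trailingContextSupported(const std::string& rule) {
    const std::string::size_type slash = rule.find('/');
    if (slash == std::string::npos) return true;  // no trailing context at all
    const std::string head = rule.substr(0, slash);
    const std::string trail = rule.substr(slash + 1);
    return hasFixedLength(head) || hasFixedLength(trail);
}
```

With this check, `abc/d*e` and `abc*/de` are accepted, while `abc*/c*e` and `abc*/d*e` are rejected, which matches the examples in this section.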

The epilog is located at the very end of section 2. It starts with %{ and ends with %}:

%{
// code here will be inserted at the end of class::yyLexCase () function.
// It is useful to have some code that does some cleanup work.  The code
// here is only reachable if the user code in the action does not contain
// return or yyterminate ().
%}

Section 3 Description:

All code in this section is echoed as is at the bottom of the generated source code, so feel free to put class functions and main here.

$Id: reference.html,v 1.4 2003/01/20 09:24:14 coconut Exp $