References



YooLex Functions and Macros

Note: the macros are accessible only from within the lex file.

Group Functions/Macros Description
Text YYTextType _yyText; _yyText contains the text scanned in. It can be one of the following three types:
  • yoogroup::PseudoVector
  • std::vector<char>
  • std::string
Currently, YooLex does not support Unicode. I will add the feature later when I figure out how wchar_t works.
yyleng macro, same as _yyText.size (). Provided for FLEX compatibility.
I/O void yySetInput (std::istream& is); sets the input stream to is.
void yySetOutput (std::ostream& os); sets the output stream to os. Only used for ECHO macro.
ECHO outputs _yyText to the output stream. Provided for FLEX compatibility.
Scanning Buffer int yyInput (); retrieves one character from the scanning buffer. Returns EOF at end of input.
template <typename ForwardIter> ForwardIter yyInput (ForwardIter begin, ForwardIter end); retrieves characters from the input buffer and copies them into the range starting at begin. Returns begin.
void yyUnput (char c); puts a character back into the scanning buffer.
void yyUnput (const char *start, size_t count); puts count characters, starting at start, back into the scanning buffer.
void yyUnput (int size); resizes the internal buffer so that the last size characters are re-scanned.
Caution: no boundary check is done, for performance reasons.
yyless (n) macro, same as yyUnput (_yyText.size () - n). Provided for FLEX compatibility.
States void yyBegin (int newState); sets the current state to newState.
int _yyBaseState; current state
void yyPushState (int newState); pushes the current state onto the state stack and begins newState.
Caution: this function must be called before using yyPopState () and yyTopState ().
void yyPopState (); pops the state from the state stack and begins it.
int yyTopState (); returns the state on top of the state stack.
BEGIN (state) macro, same as yyBegin (YYSTATE_##state). Provided for FLEX compatibility.
Counting size_t _yyCharNum; current character position. (note 1)
size_t yyGetCharNum () const; returns the current character position. (note 1)
size_t _yyLineNum; current line number. (note 2)
size_t yyGetLineNum () const; returns the current line number. (note 2)
BOL (note 3) bool _yyIsBOL; indicates whether the current input is at the beginning of a line.
bool yyIsBOL () const; indicates whether the current input is at the beginning of a line.
void yySetBOL (); sets the current input as if it is at the beginning of the line.
Misc yyterminate () macro, same as return YYSType (-1). Provided for FLEX compatibility.
  1. Only works if "char" option is used.
  2. Only works if "line" option is used.
  3. Only works if "bol" option or BOL states are used.
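
The state functions in the table above can be pictured with a small standalone model. This is only a sketch of the documented behavior, not the YooLex implementation: the class name StateModel, its member names, and the initial state value 0 are all assumptions made for illustration.

```cpp
#include <cassert>
#include <stack>

// Hypothetical stand-in for the state handling of a scanner class.
// It only models the behavior described in the table above.
class StateModel {
public:
    // Mirrors yyBegin (): make newState the current state.
    void begin(int newState) { current_ = newState; }

    // Mirrors yyPushState (): save the current state, then begin newState.
    void pushState(int newState) {
        saved_.push(current_);
        begin(newState);
    }

    // Mirrors yyPopState (): begin the most recently saved state.
    // Precondition: pushState () was called and has not been undone yet.
    void popState() {
        assert(!saved_.empty());
        begin(saved_.top());
        saved_.pop();
    }

    // Mirrors yyTopState (): the state that popState () would return to.
    int topState() const {
        assert(!saved_.empty());
        return saved_.top();
    }

    int current() const { return current_; }

private:
    int current_ = 0;        // assumed initial state; 0 is a placeholder
    std::stack<int> saved_;  // states saved by pushState ()
};
```

The model makes the caution above concrete: popping or inspecting the stack before anything has been pushed has nothing to return, so the sketch asserts on it.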

Data Types

These are the data types defined in yoogroup::YooLex<YYS_TYPE, YYTEXT_TYPE>, which is the parent class of all scanner classes.

YYTextType Data Type for _yyText
YYSType Data Type for the return value of yyLexCase () and yyLex (). It defaults to int and should normally stay int; change it only if there is no alternative. YooParse requires it to be int.

Command line options

Usage: yoolex [options] file
Options:
  -h, -u, -?	print this message
  -b		report any DFA backups
  -c class	specify the C++ class name
  -i		use indirect threading for the g++ compiler
  -m		generate a main function
  -t table	specify the DFA table option
	-tm	generate a maximumly compressed table
	-te	generate an equivalence class table
	-tf	generate a full table
  -w		do not generate warning messages
Note that the command line options can be overridden by the %option directive in the input file.

Example:

	yoolex -i -tf lex.l
would generate the fastest scanner class for GCC compilers. If the regular expressions provided contain no DFA backups, the scanner would be faster still. Most speed improvement tips for FLEX also work for YooLex.

Configurations

Each configuration starts with %option at the beginning of a line in section 1 of the input file.

bol Forces the BOL state to be checked for every pattern matched. This option is off by default unless BOL states are used.
ccext = "name" Specify the extension of the C++ source file. Default is ``.cc''.
ccfile = "name" Specify the output C++ source name including the extension. Default is class + ccext.
char Do character counting.
class = "name" Specify the C++ scanner class name. Default is ``DefaultLex''.
header Always generate the C++ header file. By default YooLex only generates the C++ header file if it does not exist.
hhext = "name" Specify the extension of the C++ header file. Default is ``.hh''.
hhfile = "name" Specify the output C++ header name including the extension. Default is class + hhext.
Note: even if the header is not generated, this option still determines which header the source file includes.
ithread Use indirect threading for g++ compiler. This feature is slightly faster than the default switch/case method that is used to trigger actions on a token.
line Computes the line number. This option works with the yyUnput functions and trailing contexts.
linemode Uses line buffering instead of block buffering. This option is intended for interactive scanners; avoid it otherwise, because scanning becomes very slow.
main Generates a default main function.
namespace = "name" Specifies the namespace of the C++ class.
nocase Generates a case-insensitive scanner. Currently, it is not possible to specify this option for an individual regex.
noheader Does not generate the C++ header file at all. This option is intended for use with YooParse.
nowarn Does not generate warning messages.
table = "name" Specifies the DFA table formation. It can be one of the following:
  • full
    This option generates a full DFA table. The lookup speed is the fastest at cost of HUGE memory space.
  • ecs
    This option generates an equivalence class DFA table. It is quite fast and saves significant space compared to the full DFA table.
  • max
    This option generates a table that is very compact but slow in lookup. Scanning is usually not the bottleneck, though, so this option generally works well. It is also the default DFA table format.
yystype = "name" Specifies the return type for YooLex::yyLex (). Default is int.
yytext = "name" Specifies the container type for YooLex::_yyText. This option only affects the generated header. It can be one of the following:
  • string
    Use std::string as the container type. Slow but convenient.
  • vector
    Use std::vector<char> as the container type. std::vector<char> is somewhat faster than std::string due to much less overhead.
  • fast
    Use yoolex::PseudoVector as the container type. This container does not own any memory; it only holds pointers into the internal buffer. Its purpose is to avoid copying the matched text, which can be costly for large input. The downside is that it cannot be resized, so insertions and deletions are not supported. Many YooLex functions can also invalidate this container, so use it with caution. Conversion operators to std::string and std::vector<char> are provided. Please see "yoolex.hh" for more information.
The default is ``fast''.
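
To make the trade-off of the ``fast'' container concrete, here is a minimal standalone sketch of a non-owning view. It is not the PseudoVector class from "yoolex.hh"; the name BufferView and every member shown are hypothetical and only illustrate the idea of pointing into a buffer that someone else owns.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical non-owning view; name and members are illustrative only.
class BufferView {
public:
    BufferView(const char* first, const char* last)
        : first_(first), last_(last) {}

    const char* begin() const { return first_; }
    const char* end() const { return last_; }
    std::size_t size() const {
        return static_cast<std::size_t>(last_ - first_);
    }
    char operator[](std::size_t i) const { return first_[i]; }

    // Copy the text out when it must outlive the underlying buffer.
    std::string toString() const { return std::string(first_, last_); }
    std::vector<char> toVector() const {
        return std::vector<char>(first_, last_);
    }

private:
    const char* first_;  // points into a buffer owned elsewhere
    const char* last_;   // one past the last character of the token
};
```

Because the view holds no storage of its own, it cannot grow or shrink, and it becomes invalid as soon as the underlying buffer changes. Copying into a std::string or std::vector<char> is the safe choice whenever the text has to be kept.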

An example of using some of the configurations.

%option class = "test"
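
A slightly larger configuration block might look like the sketch below. The option names and values come from the list above, but two things are assumptions: that each option goes on its own %option line, and that flag options such as line and char are written without a value. The names CalcLex and calc are placeholders.

```lex
// Hypothetical section 1 configuration
%option class = "CalcLex"
%option namespace = "calc"
%option table = "ecs"
%option yytext = "string"
%option line
%option char
```

With these settings the generated files would be named after the class plus the default extensions, and the line and char options would enable the counting functions described earlier.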

Overview

The YooLex input file is divided into three sections. The first section contains configurations and some C++ code. The second section contains mostly regular expressions and their corresponding actions. The third section contains any C++ code the user wants included near the end of the generated source. Line comments (//) and block comments (/* ... */) are allowed in all sections. %% separates two sections.

// section 1

%%

%{
// section 2 prolog
%}

// section 2

%{
// section 2 epilog
%}

%%

// section 3

Section 1 Description:

The purpose of section 1 is to include code the scanner needs and to define names for regular expression (RegEx) patterns. YooLex configurations are also specified here.

To insert code at the beginning of the C++ output file, use the following format:

%{
// code here will be echoed at the beginning of the generated
// C++ source file.
%}

Section 1 may also define names for regular expressions:

PatternName	RegEx
IDENT		[A-Za-z]+
NUMBER		[0-9]+

Finally, Section 1 may declare inclusive and exclusive start conditions. Inclusive start conditions are declared with %s:

%s CONDITION_1 CONDITION_2

Exclusive start conditions are declared with %x:

%x CONDITION_1 CONDITION_2

The difference between the two lies in how rules without a start condition are treated. Such rules are also active in every inclusive start condition. In an exclusive start condition, only rules that explicitly name that condition are active.
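
The following fragment sketches how an exclusive start condition might be used to handle quoted text. The rule syntax (the <QUOTED> prefix and the pattern { action } layout) is assumed to follow FLEX conventions and is not confirmed by this reference; QUOTED is a placeholder name. The identifier YYSTATE_QUOTED follows from the BEGIN macro, which expands to yyBegin (YYSTATE_##state).

```lex
%x QUOTED

%%

\"              { yyPushState (YYSTATE_QUOTED); }
<QUOTED>\"      { yyPopState (); }
<QUOTED>[^"]+   { /* text between the quotes */ }
[a-z]+          { /* not active inside QUOTED, because QUOTED is exclusive */ }
```

If QUOTED were declared with %s instead, the last rule would also be active while the scanner is in the QUOTED state.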


Section 2 Description:

Section 2 contains three parts: the prolog, the regular expression actions, and the epilog.

The prolog is located at the very beginning of section 2. It starts with %{ and ends with %}:

%{
// code here will be inserted at the beginning of class::yyLexCase () function.
// It is useful to have some code that does something whenever yyLexCase () is
// called.  If indirect threading is used, all variables used must be declared in
// this section.
%}

Next come the definitions of the regular expressions and the actions triggered whenever those regular expressions are matched. YooLex supports nested conditions like FLEX does. It also supports BOL (^), EOL ($), <<EOF>>, as well as trailing contexts in regular expressions. YooLex currently handles trailing contexts in which either the head has a fixed length, like:

abc/d*e

or the trailing part has a fixed length, like:

abc*/de

The following types of regular expressions, in which both the head and the trailing part have variable length, are not supported:

abc*/c*e
abc*/d*e

The epilog is located at the very end of section 2. It starts with %{ and ends with %}:

%{
// code here will be inserted at the end of class::yyLexCase () function.
// It is useful to have some code that does some clean-up work.  The code
// here is only reachable if the user code in the action does not contain
// return or yyterminate ().
%}

Section 3 Description:

All code in this section is echoed as is at the bottom of the generated source code, so feel free to put class member functions and main here.
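
Putting the three sections together, a complete input file might look roughly like the sketch below. It is only an illustration under assumptions: the rule layout and the {WORD} syntax for referring to a named pattern follow FLEX conventions and are not confirmed by this reference, and WordLex and WORD are placeholder names.

```lex
// section 1: configuration, included code, and named patterns
%option class = "WordLex"
%option main

%{
#include <iostream>
%}

WORD    [A-Za-z]+

%%

{WORD}      { std::cout << "word of length " << yyleng << "\n"; }
.|\n        { /* ignore everything else */ }

%%

// section 3: left empty here, since the main option generates main ()
```

Running yoolex on such a file would be expected to produce a scanner class named WordLex together with a default main function.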


$Id: reference.html,v 1.4 2003/01/20 09:24:14 coconut Exp $