Note: all macros can be accessed only in the lex file.
Group | Functions/Macros | Description |
Text | YYTextType _yyText; | _yyText contains the scanned text. It can be one of the following three types: … wchar_t works. |
Text | yyleng | macro, same as _yyText.size (). Provided for FLEX compatibility. |
I/O | void yySetInput (std::istream& is); | sets the input stream to is. |
I/O | void yySetOutput (std::ostream& os); | sets the output stream to os. It is used only by the ECHO macro. |
I/O | ECHO | outputs _yyText to the output stream. Provided for FLEX compatibility. |
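Here is a rough sketch of how yySetOutput and ECHO fit together. The class ToyEcho below is hypothetical and not part of YooLex. It keeps a pointer to the chosen output stream and writes the current text to it. As a simplifying assumption, the sketch does not model the scanner's default output stream: no stream is set until yySetOutput is called.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>  // std::ostringstream is convenient for capturing ECHO output
#include <string>

// Hypothetical stand-in for a scanner's I/O members; not YooLex code.
class ToyEcho {
public:
    // Mirrors yySetOutput (std::ostream& os): remember where ECHO writes.
    void yySetOutput(std::ostream& os) { out_ = &os; }
    // Stands in for the scanner storing the latest match in _yyText.
    void setText(const std::string& text) { yyText_ = text; }
    // Plays the role of the ECHO macro: write the current text to the stream.
    void echo() const {
        assert(out_ != nullptr);  // assumption: a stream must be set first
        *out_ << yyText_;
    }

private:
    std::ostream* out_ = nullptr;
    std::string yyText_;
};
```

To use it, call yySetOutput with a std::ostringstream, then call setText and echo once per match. The stream accumulates every echoed match in order.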
Scanning Buffer | int yyInput (); | retrieves one character from the scanning buffer. Returns EOF at end of input. |
Scanning Buffer | template <typename ForwardIter> ForwardIter yyInput (ForwardIter begin, ForwardIter end); | retrieves characters from the input buffer and copies them into the buffer beginning at begin. Returns begin. |
Scanning Buffer | void yyUnput (char c); | puts the character c back into the scanning buffer. |
Scanning Buffer | void yyUnput (const char *start, size_t count); | puts the count characters beginning at start back into the scanning buffer. |
Scanning Buffer | void yyUnput (int size); | backs the internal buffer up by size characters so that the last size characters of _yyText are re-scanned. Caution: no boundary check is done, for performance reasons. |
Scanning Buffer | yyless (n) | macro, same as yyUnput (_yyText.size () - n): keeps the first n characters of _yyText and returns the rest to the input. Provided for FLEX compatibility. |
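You can check the relationship between yyless (n) and yyUnput (int size) on a small model. ToyScanBuffer and its match helper are hypothetical. The sketch assumes only what the yyless definition implies: yyUnput (size) returns the last size characters of _yyText to the input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical model of a scanning buffer; not YooLex's implementation.
class ToyScanBuffer {
public:
    explicit ToyScanBuffer(std::string input) : input_(std::move(input)) {}

    // Pretend the scanner matched `len` characters at the current position.
    void match(std::size_t len) {
        assert(pos_ + len <= input_.size());
        text_ = input_.substr(pos_, len);
        pos_ += len;
    }
    // Mirrors yyUnput (int size): the last `size` characters of the matched
    // text go back to the input and will be scanned again. As documented,
    // there is no boundary check; size must not exceed text().size().
    void yyUnput(int size) {
        const std::size_t n = static_cast<std::size_t>(size);
        pos_ -= n;
        text_.resize(text_.size() - n);
    }
    // Mirrors the yyless (n) macro: yyUnput (_yyText.size () - n).
    void yyless(std::size_t n) { yyUnput(static_cast<int>(text_.size() - n)); }

    const std::string& text() const { return text_; }        // plays _yyText
    std::string rest() const { return input_.substr(pos_); }  // not yet scanned

private:
    std::string input_;
    std::size_t pos_ = 0;
    std::string text_;
};
```

For example, after matching all of "foobar", yyless (3) leaves "foo" as the text and puts "bar" back in front of the input.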
States | void yyBegin (int newState); | sets the current state to newState. |
States | int _yyBaseState; | the current state. |
States | void yyPushState (int newState); | pushes the current state onto the state stack and begins newState. Caution: this function must be called before yyPopState () or yyTopState () is used. |
States | void yyPopState (); | pops the top state off the state stack and begins it. |
States | int yyTopState (); | returns the state on top of the state stack. |
States | BEGIN (state) | macro, same as yyBegin (YYSTATE_##state). Provided for FLEX compatibility. |
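The state-stack functions follow the same pattern as FLEX's start-condition stack. The hypothetical ToyStates class below models the rules in the table: push saves the current state and begins a new one, pop restores the saved state, and top only peeks. Using 0 as the initial state value is an assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the start-condition API; not YooLex code.
class ToyStates {
public:
    void yyBegin(int newState) { baseState_ = newState; }
    int current() const { return baseState_; }  // plays _yyBaseState

    void yyPushState(int newState) {
        stack_.push_back(baseState_);  // save the current state
        yyBegin(newState);
    }
    void yyPopState() {
        assert(!stack_.empty());  // only valid after a yyPushState ()
        yyBegin(stack_.back());   // begin the saved state again
        stack_.pop_back();
    }
    int yyTopState() const {
        assert(!stack_.empty());
        return stack_.back();     // peek without popping
    }

private:
    int baseState_ = 0;  // assumption: 0 is the initial state
    std::vector<int> stack_;
};
```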
Counting | size_t _yyCharNum; | current character position. [1] |
Counting | size_t yyGetCharNum () const; | returns the current character position. [1] |
Counting | size_t _yyLineNum; | current line number. [2] |
Counting | size_t yyGetLineNum () const; | returns the current line number. [2] |
BOL [3] | bool _yyIsBOL; | indicates whether the current input is at the beginning of a line. |
BOL [3] | bool yyIsBOL () const; | indicates whether the current input is at the beginning of a line. |
BOL [3] | void yySetBOL (); | marks the current input as if it were at the beginning of a line. |
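Below is one plausible way to maintain the counters and the BOL flag after each match. ToyCounter and consume are hypothetical names. The counting conventions are assumptions because the manual does not state them: _yyCharNum is an absolute, 0-based offset, and lines are numbered from 1.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch of counter/BOL bookkeeping; not YooLex's implementation.
struct ToyCounter {
    std::size_t _yyCharNum = 0;  // assumption: absolute, 0-based offset
    std::size_t _yyLineNum = 1;  // assumption: lines numbered from 1
    bool _yyIsBOL = true;        // the input starts at the beginning of a line

    // Update the counters for one matched piece of text.
    void consume(const std::string& yyText) {
        for (char c : yyText) {
            ++_yyCharNum;
            if (c == '\n') ++_yyLineNum;
        }
        // The next match starts a line exactly when this one ended with '\n'.
        if (!yyText.empty()) _yyIsBOL = (yyText.back() == '\n');
    }
    // Mirrors yySetBOL (): force the beginning-of-line state.
    void yySetBOL() { _yyIsBOL = true; }
};
```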
Misc | yyterminate () | macro, same as return YYSType (-1). Provided for FLEX compatibility. |
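yyterminate () simply makes the scanning function return YYSType (-1), so a caller can keep requesting tokens until it sees -1. In the sketch below, the scanner toyLex, its token codes (1 for a letter, 2 for a digit), and the driver collectTokens are all made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using YYSType = int;  // the documented default

// Hypothetical scanner: one token code per call, YYSType (-1) at end of input.
YYSType toyLex(const std::string& in, std::size_t& pos) {
    while (pos < in.size()) {
        char c = in[pos++];
        if (c >= '0' && c <= '9') return 2;                               // digit
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) return 1;  // letter
        // any other character is skipped
    }
    return YYSType(-1);  // what yyterminate () expands to
}

// Hypothetical driver: keep calling the scanner until it reports termination.
std::vector<YYSType> collectTokens(const std::string& in) {
    std::vector<YYSType> tokens;
    std::size_t pos = 0;
    for (YYSType t = toyLex(in, pos); t != -1; t = toyLex(in, pos))
        tokens.push_back(t);
    return tokens;
}
```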
These are the data types defined in yoogroup::YooLex<YYS_TYPE, YYTEXT_TYPE>, which is the parent class of all scanner classes.
YYTextType | data type of _yyText |
YYSType | data type of the return value of yyLexCase () and yyLex (). It is usually int, which is the default. Changing it is not advisable unless there is no alternative; YooParse requires it to be int. |
    Usage: yoolex [options] file
    Options:
      -h, -u, -?   print this message
      -b           report any DFA backups
      -c class     specify the C++ class name
      -i           use indirect threading for the g++ compiler
      -m           generate a main function
      -t table     specify the DFA table option
      -tm          generate a maximumly compressed table
      -te          generate an equivalence class table
      -tf          generate a full table
      -w           do not generate warning messages
Note that the %option directive in the input file can override the command line options.
Example:
    yoolex -i -tf lex.l
generates the fastest scanner class for GCC. If the regular expressions contain no DFA backups, the scanner is faster still. Most speed-improvement tips for FLEX also apply to YooLex.
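The table options differ in how transitions are indexed. A full table (-tf) indexes DFA transitions by raw character. An equivalence class table (-te) first maps each character to a class of characters that every rule treats identically, which shrinks each transition row. The sketch below shows that general technique, not YooLex's actual table builder. buildClasses, inIdent and inNumber are hypothetical names, and the two character sets come from the IDENT and NUMBER patterns defined later in this manual.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <vector>

// A character set used by some rule, given as a predicate.
using CharSet = bool (*)(unsigned char);

// Character sets from IDENT [A-Za-z]+ and NUMBER [0-9]+ (hypothetical helpers).
bool inIdent(unsigned char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }
bool inNumber(unsigned char c) { return c >= '0' && c <= '9'; }

// Characters with the same membership "signature" across all sets can share
// one column of the DFA table, so they get the same class number.
std::array<int, 256> buildClasses(const std::vector<CharSet>& sets, int& numClasses) {
    std::map<std::vector<bool>, int> signatureToClass;
    std::array<int, 256> ec{};
    for (int c = 0; c < 256; ++c) {
        std::vector<bool> sig;
        for (CharSet s : sets) sig.push_back(s(static_cast<unsigned char>(c)));
        auto it = signatureToClass.find(sig);
        if (it == signatureToClass.end())  // first character with this signature
            it = signatureToClass.emplace(sig, static_cast<int>(signatureToClass.size())).first;
        ec[c] = it->second;
    }
    numClasses = static_cast<int>(signatureToClass.size());
    return ec;
}
```

With these two sets, the 256 characters collapse into three classes: letters, digits, and everything else. A DFA row then needs 3 entries instead of 256.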
Configuration options are given with %option at the beginning of a line in section 1 of the input file.
bol | Forces a check of the BOL state for each matched pattern. By default this option is off unless BOL states are used. |
ccext = "name" | Specifies the extension of the C++ source file. Default is ".cc". |
ccfile = "name" | Specifies the output C++ source file name, including the extension. Default is class + ccext. |
char | Enables character counting. |
class = "name" | Specifies the C++ scanner class name. Default is "DefaultLex". |
header | Always generates the C++ header file. By default YooLex generates the header only if it does not already exist. |
hhext = "name" | Specifies the extension of the C++ header file. Default is ".hh". |
hhfile = "name" | Specifies the output C++ header file name, including the extension. Default is class + hhext. Even if the header is not generated, this option matters because it determines which header the generated source file includes. |
ithread | Uses indirect threading for the g++ compiler. This is slightly faster than the default switch/case method of triggering the action for a token. |
line | Enables line-number computation. This option works with the yyUnput functions and trailing contexts. |
linemode | Uses line buffering instead of block buffering. This option is intended for interactive scanners; do not use it otherwise, since scanning becomes very slow. |
main | Generates a default main function. |
namespace = "name" | Specifies the namespace of the C++ class. |
nocase | Generates a case-insensitive scanner. Currently this option cannot be applied to individual regular expressions. |
noheader | Does not generate the C++ header file at all. This option is intended for use with YooParse. |
nowarn | Does not generate warning messages. |
table = "name" | Specifies the DFA table format. It can be one of the following: |
yystype = "name" | Specifies the return type of YooLex::yyLex (). Default is int. |
yytext = "name" | Specifies the container type of YooLex::_yyText. This option only affects the generated header. It can be one of the following: … "fast". |
For example, the following line sets the scanner class name:
    %option class = "test"
The YooLex input file is divided into three sections:
- Section 1 contains configurations and some C++ code.
- Section 2 contains mostly regular expressions and their corresponding actions.
- Section 3 contains any C++ code the user wants to include near the end of the generated source.

Line comments (// ...) and block comments (/* ... */) are allowed in all sections. A %% line separates two sections.
    // section 1
    %%
    %{
    // section 2 prolog
    %}
    // section 2
    %{
    // section 2 epilog
    %}
    %%
    // section 3
Section 1 includes the code the scanner needs and defines regular expression (RegEx) pattern names. YooLex configurations are also specified here.
To insert code at the beginning of the C++ output file, use the following format:
    %{
    // code here will be echoed at the beginning of the generated
    // C++ source file.
    %}
Section 1 may also give names to regular expressions:
    PatternName   RegEx
    IDENT         [A-Za-z]+
    NUMBER        [0-9]+
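The sketch below shows what the two named patterns match. It applies them with std::regex using the usual longest-match rule, with ties going to the earlier rule as in FLEX. The function tokenize is hypothetical; a YooLex scanner compiles the patterns into a DFA instead of calling std::regex.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Illustration only: longest-match tokenizing with the IDENT and NUMBER patterns.
std::vector<std::pair<std::string, std::string>> tokenize(const std::string& in) {
    const std::vector<std::pair<std::string, std::regex>> rules = {
        {"IDENT", std::regex("[A-Za-z]+")},
        {"NUMBER", std::regex("[0-9]+")},
    };
    std::vector<std::pair<std::string, std::string>> tokens;
    std::string::const_iterator it = in.cbegin();
    while (it != in.cend()) {
        std::ptrdiff_t bestLen = 0;
        std::string bestName;
        for (const auto& rule : rules) {
            std::smatch m;
            // match_continuous: the match must start exactly at `it`.
            if (std::regex_search(it, in.cend(), m, rule.second,
                                  std::regex_constants::match_continuous) &&
                m.length(0) > bestLen) {  // strictly longer: earlier rule wins ties
                bestLen = m.length(0);
                bestName = rule.first;
            }
        }
        if (bestLen == 0) {
            ++it;  // no rule matches here: skip the character
            continue;
        }
        tokens.emplace_back(bestName, std::string(it, it + bestLen));
        it += bestLen;
    }
    return tokens;
}
```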
Finally, section 1 may declare inclusive and exclusive start conditions. Inclusive start conditions are declared with %s:
    %s CONDITION_1 CONDITION_2
Exclusive start conditions are declared with %x:
    %x CONDITION_1 CONDITION_2
The two kinds differ in how they treat RegEx actions that specify no start condition. All inclusive conditions share those actions. In an exclusive start condition, a RegEx is active only if it explicitly names that condition.
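The activation rule for start conditions can be written as a small predicate. Rule and isActive are hypothetical names. Treating INITIAL as an inclusive condition follows FLEX and is an assumption for YooLex.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical rule record: the start conditions a RegEx was written under.
// An empty set means no start condition was specified.
struct Rule {
    std::set<std::string> conditions;
};

// Is `rule` active while the scanner is in `state`?
// `exclusive` lists the conditions declared with %x; everything else,
// including INITIAL, is treated as inclusive (assumption, as in FLEX).
bool isActive(const Rule& rule, const std::string& state,
              const std::set<std::string>& exclusive) {
    if (rule.conditions.count(state)) return true;  // condition named explicitly
    if (rule.conditions.empty())                    // unconditioned rule:
        return exclusive.count(state) == 0;         // shared by inclusive states
    return false;
}
```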
Section 2 contains three parts: the prolog, the regular expression actions, and the epilog.
The prolog is located at the very beginning of section 2. It starts with %{ and ends with %}:
    %{
    // code here will be inserted at the beginning of the class::yyLexCase () function.
    // It is useful for code that must run whenever yyLexCase () is called.
    // If indirect threading is used, all variables used must be declared in
    // this section.
    %}
Next come the regular expressions and the actions triggered whenever they are matched. YooLex supports nested conditions as FLEX does. It also supports BOL (^), EOL ($), <<EOF>>, and trailing contexts in regular expressions. YooLex currently handles trailing contexts that have either a fixed-length head, such as:
    abc/d*e
or a fixed-length trail, such as:
    abc*/de
The following types of regular expressions are not supported:
    abc*/c*e
    abc*/d*e
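The sketch below shows why one side of the trailing context must have a fixed length. The scanner can match head and trail together as one pattern and then cut _yyText back, but only if the cut point is known. For abc*/c*e, the input abccce is ambiguous, because the boundary between head and trail could fall after any of the c's. matchLen, fixedHead and fixedTrail are hypothetical helpers built on std::regex, not YooLex's generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Length of the match of `full` anchored at the start of `in`, or 0 if none.
std::size_t matchLen(const std::string& in, const std::regex& full) {
    std::smatch m;
    if (std::regex_search(in.cbegin(), in.cend(), m, full,
                          std::regex_constants::match_continuous))
        return static_cast<std::size_t>(m.length(0));
    return 0;
}

// abc/d*e : match "abcd*e", then keep the fixed-length head (3 characters).
std::string fixedHead(const std::string& in) {
    std::size_t n = matchLen(in, std::regex("abcd*e"));
    return n ? in.substr(0, 3) : std::string();
}

// abc*/de : match "abc*de", then put back the fixed-length trail (2 characters).
std::string fixedTrail(const std::string& in) {
    std::size_t n = matchLen(in, std::regex("abc*de"));
    return n ? in.substr(0, n - 2) : std::string();
}
```

In both cases the characters cut from the match go back into the input, as with yyless, and are scanned again.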
The epilog is located at the very end of section 2. It starts with %{ and ends with %}:
    %{
    // code here will be inserted at the end of the class::yyLexCase () function.
    // It is useful for clean-up work. The code here is only reachable if the
    // user code in the action does not contain return or yyterminate ().
    %}
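A toy function can model how the prolog, an action, and the epilog interact inside yyLexCase (). ToyLexCase is hypothetical and does not reflect the real generated code. The actionReturns flag stands in for whether the user's action contains return.

```cpp
#include <cassert>

// Hypothetical shape of a generated yyLexCase (); not actual YooLex output.
struct ToyLexCase {
    int prologRuns = 0;
    int epilogRuns = 0;

    int yyLexCase(bool actionReturns) {
        ++prologRuns;                 // section 2 prolog: runs on every call
        if (actionReturns) return 1;  // user action returns a token
        ++epilogRuns;                 // section 2 epilog: reached only otherwise
        return 0;                     // in this toy, simply leave the function
    }
};
```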
All code in section 3 is echoed as is at the bottom of the generated source code, so this is a good place for class member functions and main.