The rflex command takes no options.
$ rflex target.l
That command generates target.rs in the same directory.
The syntax of rflex is very similar to that of flex. The first '%%' marks the beginning of the rules, and the second marks their end.
%%
%class Lexer
%result_type i32
abc println!("match abc rule"); return Ok(0i32);
[a-z]+ println!("'{}'", self.yytext());
    return Ok(10i32); /* an action can span multiple lines; continuation lines start with white space */
" " /* Skip white space. This comment cannot be omitted. */
%%
A rule consists of a pattern and an action written on the same line. The pattern is a regular expression. The action is Rust code that runs when the pattern is accepted.
In the example above, abc is a pattern and println!("match abc rule"); return Ok(0i32); is its action.
%class and %result_type are special directives that replace the default struct name and return type of the generated lexer.
The patterns abc and [a-z]+ both accept the input abc.
When patterns are ambiguous like this, rflex gives priority to the pattern that appears first.
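The priority rule can be illustrated with a small hand-written sketch. It is not code generated by rflex: match_abc, match_lower, and pick_rule are hypothetical helpers. The sketch assumes that, as in flex, the longest match is preferred and rule order only breaks ties; the text above states only the rule-order part.

```rust
// Hand-written illustration only; this is not rflex output.

/// Length of the match if `input` starts with the literal "abc".
fn match_abc(input: &str) -> Option<usize> {
    if input.starts_with("abc") {
        Some(3)
    } else {
        None
    }
}

/// Length of the longest prefix of `input` that matches [a-z]+.
fn match_lower(input: &str) -> Option<usize> {
    let n = input.bytes().take_while(|b| b.is_ascii_lowercase()).count();
    if n > 0 {
        Some(n)
    } else {
        None
    }
}

/// Returns the token of the winning rule, or None if no rule matches.
/// Assumption: the longest match wins and rule order breaks ties.
fn pick_rule(input: &str) -> Option<i32> {
    // Candidates are listed in the order the rules appear in the .l file.
    let candidates = [(match_abc(input), 0i32), (match_lower(input), 10i32)];
    let mut best: Option<(usize, i32)> = None;
    for &(len, token) in candidates.iter() {
        if let Some(len) = len {
            // Strict `>` keeps the earlier rule when the lengths are equal.
            if best.map_or(true, |(best_len, _)| len > best_len) {
                best = Some((len, token));
            }
        }
    }
    best.map(|(_, token)| token)
}
```

Under these assumptions, pick_rule("abc") yields Some(0) because the abc rule comes first, while pick_rule("xyz") yields Some(10).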
The generated scanner provides functions that can be called from actions or from the user program.
For example, the yylength function returns the length of the accepted string.
pub fn yylex(&mut self) -> Result<i32, Error>
- Return the next token as i32. The i32 type can be replaced with the %result_type directive.
pub fn is_eof(&self) -> bool
- Return whether the scanner has reached EOF.
pub fn yybegin(&mut self, new_state: usize)
- Use new_state as the lexer state in the next scan.
pub fn yystate(&self) -> usize
- Return the current lexer state.
pub fn yylength(&self) -> usize
- Return the length of the accepted string.
pub fn yytext(&self) -> String
- Return the accepted string.
pub fn yytextpos(&self) -> std::ops::Range<usize>
- Return the position of the accepted string.
pub fn yybytepos(&self) -> std::ops::Range<usize>
- Return the byte position of the accepted string. It can be used to slice a str.
pub fn yycharat(&self, pos: usize) -> Option<char>
- Return the character at the relative position (0-origin) in the accepted string.
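The difference between yytextpos and yybytepos matters because a Rust str is sliced by byte offsets. The following is a minimal sketch using only the standard library. It assumes yytextpos counts characters (the list above does not state the unit), and char_range_to_byte_range is a hypothetical helper, not part of rflex.

```rust
use std::ops::Range;

/// Converts a range counted in characters into the equivalent byte range of `s`.
/// Returns None if the range is reversed or extends past the end of `s`.
fn char_range_to_byte_range(s: &str, chars: Range<usize>) -> Option<Range<usize>> {
    if chars.start > chars.end {
        return None;
    }
    // Byte offset of every character boundary, including the end of the string.
    let boundaries: Vec<usize> = s
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(s.len()))
        .collect();
    let start = *boundaries.get(chars.start)?;
    let end = *boundaries.get(chars.end)?;
    Some(start..end)
}
```

For the input "\u{e9} abc", whose first character occupies two bytes, the character range 2..5 corresponds to the byte range 3..6, and only the byte range can be used to slice the str.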
The yylex function returns Result<any_type, Error>.
When it reaches the end of the file, yylex returns Err(Error::EOF).
It returns Err(Error::Unmatch) if the input was not accepted.
The Error enum type is defined as follows.
#[derive(Debug, PartialEq)]
pub enum Error {
    EOF,
    Unmatch,
}
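A typical caller loops on yylex until it sees Err(Error::EOF). The sketch below shows one way to write that loop. So that it compiles on its own, it uses a hand-written stand-in Lexer that mimics the example rules; the rest field and the collect_tokens function are hypothetical, and a real Lexer would come from the file rflex generates.

```rust
#[derive(Debug, PartialEq)]
pub enum Error {
    EOF,
    Unmatch,
}

// Hand-written stand-in for a generated lexer; not rflex output.
pub struct Lexer<'a> {
    rest: &'a str, // hypothetical field: the input not yet consumed
}

impl<'a> Lexer<'a> {
    pub fn new(input: &'a str) -> Lexer<'a> {
        Lexer { rest: input }
    }

    // Mimics the example rules: "abc" gives 0, [a-z]+ gives 10, " " is skipped.
    pub fn yylex(&mut self) -> Result<i32, Error> {
        let rest: &'a str = self.rest.trim_start_matches(' ');
        self.rest = rest;
        if rest.is_empty() {
            return Err(Error::EOF);
        }
        let len = rest.bytes().take_while(|b| b.is_ascii_lowercase()).count();
        if len == 0 {
            return Err(Error::Unmatch);
        }
        let (word, tail) = rest.split_at(len);
        self.rest = tail;
        Ok(if word == "abc" { 0 } else { 10 })
    }
}

/// Collects tokens until EOF; stops and reports the error on Unmatch.
pub fn collect_tokens(input: &str) -> Result<Vec<i32>, Error> {
    let mut lexer = Lexer::new(input);
    let mut tokens = Vec::new();
    loop {
        match lexer.yylex() {
            Ok(token) => tokens.push(token),
            Err(Error::EOF) => return Ok(tokens),
            Err(Error::Unmatch) => return Err(Error::Unmatch),
        }
    }
}
```

Treating EOF as the normal way out of the loop and Unmatch as a real failure keeps the two cases apart for the caller.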
See also the code of example1 and example2.
// Write your own Rust code here.
// This code will be inserted into the header of generated lexer file.
use std::io; // example
%%
%class Lexer
%result_type i32
abc println!("match abc rule"); return Ok(0i32);
[a-z]+ println!("'{}'", self.yytext()); return Ok(10i32);
%%
// Write your own Rust code that will be inserted into
// `Lexer` impl.
// So this code can be executed like `lexer.remain();`.
pub fn remain(&self) -> usize {
    self.current.clone().count()
}
See the example1 code.
rflex has a %field directive that appends arbitrary fields to the lexer struct. For example:
%field SpaceCounter space_counter
This gives the Lexer struct a space_counter field, and the generated impl looks like this:
pub fn new(input: &'a str, space_counter: SpaceCounter) -> Lexer<'a> { /* omission */ }
pub fn get_space_counter(&mut self) -> &mut SpaceCounter { &mut self.space_counter }
Then we can pass a SpaceCounter to new and access it via get_space_counter.
The %field directive can be specified more than once.
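To make the shape of a lexer with one extra field concrete, here is a hand-written sketch built around the new and get_space_counter signatures shown above. It is not rflex output: the SpaceCounter definition, the input field, and the count_spaces helper are assumptions made for illustration.

```rust
/// Hypothetical user-defined type held by the lexer.
#[derive(Debug, Default)]
pub struct SpaceCounter {
    count: usize,
}

impl SpaceCounter {
    pub fn increment(&mut self) {
        self.count += 1;
    }

    pub fn count(&self) -> usize {
        self.count
    }
}

// Hand-written stand-in for the generated struct.
pub struct Lexer<'a> {
    input: &'a str, // stand-in for the generated lexer's own state
    space_counter: SpaceCounter,
}

impl<'a> Lexer<'a> {
    pub fn new(input: &'a str, space_counter: SpaceCounter) -> Lexer<'a> {
        Lexer { input, space_counter }
    }

    pub fn get_space_counter(&mut self) -> &mut SpaceCounter {
        &mut self.space_counter
    }

    /// Hypothetical helper standing in for an action that counts spaces.
    pub fn count_spaces(&mut self) {
        let spaces = self.input.chars().filter(|c| *c == ' ').count();
        for _ in 0..spaces {
            self.space_counter.increment();
        }
    }
}
```

A caller would construct the lexer with Lexer::new("a b c", SpaceCounter::default()) and read the result back through get_space_counter after scanning.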