Tutorial

rflex command line interface

rflex takes no command-line options.

$ rflex target.l

This command generates target.rs in the same directory.

Basic

The syntax of rflex is very similar to that of flex. The first '%%' marks the beginning of the rules, and the second one marks the end.

%%
%class Lexer
%result_type i32
abc      println!("match abc rule"); return Ok(0i32);
[a-z]+   println!("'{}'", self.yytext());
         return Ok(10i32); /* an action can span multiple lines; continuation lines start with white space */
" "      /* Skip white space. This comment cannot be omitted. */
%%

Each rule consists of a pattern followed by an action. The pattern is a regular expression. The action is Rust code that is executed when the pattern is accepted. In the above example, abc is the pattern and println!("match abc rule"); return Ok(0i32); is the action.

%class and %result_type are special directives that replace the default struct name and the default return type of the generated lexer.

Precedence

The patterns abc and [a-z]+ can both accept abc. When patterns are ambiguous like this, rflex gives priority to the pattern that appears first.
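The first-rule-wins behaviour can be sketched in plain Rust. This is only an illustration of the tie-break, not how rflex is implemented: the `Rule` struct, the predicate functions standing in for regular expressions, and `resolve` are all hypothetical names introduced here.

```rust
/// One lexer rule: a predicate standing in for a regular expression,
/// plus the token value its action would return. (Hypothetical type.)
pub struct Rule {
    pub accepts: fn(&str) -> bool,
    pub token: i32,
}

// Stand-in for the pattern `abc`.
fn is_abc(s: &str) -> bool {
    s == "abc"
}

// Stand-in for the pattern `[a-z]+`.
fn is_lower_word(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_lowercase())
}

/// Return the token of the first rule, in definition order, that accepts `lexeme`.
pub fn resolve(rules: &[Rule], lexeme: &str) -> Option<i32> {
    rules
        .iter()
        .find(|rule| (rule.accepts)(lexeme))
        .map(|rule| rule.token)
}

/// Rules in the same order as the `abc` / `[a-z]+` example.
pub fn example_rules() -> Vec<Rule> {
    vec![
        Rule { accepts: is_abc, token: 0 },
        Rule { accepts: is_lower_word, token: 10 },
    ]
}

fn main() {
    let rules = example_rules();
    // Both rules accept "abc"; the one defined first decides the token.
    println!("{:?}", resolve(&rules, "abc"));
    // Only the second rule accepts "xyz".
    println!("{:?}", resolve(&rules, "xyz"));
}
```

Reversing the rule order in this sketch makes "abc" resolve to the `[a-z]+` token instead, which is why the more specific pattern is listed first in the example.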

rflex functions

The generated scanner provides functions that can be called from actions or from the user program. For example, the yylength function returns the length of the accepted string.

  • pub fn yylex(&mut self) -> Result<i32, Error>
    • Return the next token as i32. i32 can be replaced with the %result_type directive.
  • pub fn is_eof(&self) -> bool
    • Return whether the scanner has reached EOF.
  • pub fn yybegin(&mut self, new_state: usize)
    • Use new_state as the lexer state for the next scan.
  • pub fn yystate(&self) -> usize
    • Return the current lexer state.
  • pub fn yylength(&self) -> usize
    • Return the length of the accepted string.
  • pub fn yytext(&self) -> String
    • Return the accepted string.
  • pub fn yytextpos(&self) -> std::ops::Range<usize>
    • Return the position of the accepted string as a range.
  • pub fn yybytepos(&self) -> std::ops::Range<usize>
    • Return the byte position of the accepted string as a range. It can be used to slice a str.
  • pub fn yycharat(&self, pos: usize) -> Option<char>
    • Return the character at the relative position (0-origin) in the accepted string.
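The difference between a character range and a byte range matters as soon as the input contains multi-byte characters. The sketch below uses only the standard library and does not call rflex. It assumes that yytextpos counts characters (the text above does not say so explicitly), and `char_range_to_byte_range` is a hypothetical helper introduced for illustration.

```rust
use std::ops::Range;

/// Convert a range counted in characters into the equivalent byte range of `s`.
/// Returns None if the range is reversed or extends past the end of `s`.
/// (Hypothetical helper, not part of rflex.)
pub fn char_range_to_byte_range(s: &str, chars: Range<usize>) -> Option<Range<usize>> {
    if chars.start > chars.end {
        return None;
    }
    // Byte offset of every character boundary, including the end of the string.
    let boundaries: Vec<usize> = s
        .char_indices()
        .map(|(offset, _)| offset)
        .chain(std::iter::once(s.len()))
        .collect();
    let start = *boundaries.get(chars.start)?;
    let end = *boundaries.get(chars.end)?;
    Some(start..end)
}

fn main() {
    // "\u{e9}" is a single character that occupies two bytes in UTF-8.
    let input = "\u{e9} abc";
    // "abc" covers characters 2..5 but bytes 3..6.
    let bytes = char_range_to_byte_range(input, 2..5).expect("range is within the input");
    println!("byte range: {:?}", bytes);
    // Slicing a str is byte-based, so only the byte range selects "abc".
    println!("byte slice: {:?}", &input[bytes]);
    println!("char range used as bytes: {:?}", &input[2..5]);
}
```

Because str slicing in Rust is byte-based, a byte range is the one to pass to `&input[range]`; a character range used the same way can select the wrong text or panic on a non-boundary offset.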

enum Error

The yylex function returns Result<any_type, Error>. The Error enum is defined as follows. When the end of the file is reached, yylex returns Err(Error::EOF). It returns Err(Error::Unmatch) if the input was not accepted.

#[derive(Debug, PartialEq)]
pub enum Error {
    EOF,
    Unmatch,
}
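A typical driver calls yylex in a loop until it sees Err(Error::EOF). The sketch below shows that loop. So that it runs on its own, it uses a hand-written stand-in `Lexer` that only imitates the example rules (`abc`, `[a-z]+`, and the skipped space); the stand-in's internals and the `collect_tokens` helper are assumptions for illustration, not the code rflex generates.

```rust
#[derive(Debug, PartialEq)]
pub enum Error {
    EOF,
    Unmatch,
}

/// Hand-written stand-in for a generated lexer (hypothetical internals).
pub struct Lexer<'a> {
    rest: &'a str,
}

impl<'a> Lexer<'a> {
    pub fn new(input: &'a str) -> Lexer<'a> {
        Lexer { rest: input }
    }

    /// Imitates the example rules: `abc` -> 0, `[a-z]+` -> 10, `" "` is skipped.
    pub fn yylex(&mut self) -> Result<i32, Error> {
        loop {
            if self.rest.is_empty() {
                return Err(Error::EOF);
            }
            // The `" "` rule has no `return`, so scanning simply continues.
            if let Some(stripped) = self.rest.strip_prefix(' ') {
                self.rest = stripped;
                continue;
            }
            // Length of the leading run of ASCII lowercase letters.
            let len = self
                .rest
                .bytes()
                .take_while(|b| b.is_ascii_lowercase())
                .count();
            if len == 0 {
                return Err(Error::Unmatch);
            }
            let (word, tail) = self.rest.split_at(len);
            self.rest = tail;
            return Ok(if word == "abc" { 0 } else { 10 });
        }
    }
}

/// Collect tokens until EOF; any other error is passed to the caller.
pub fn collect_tokens(input: &str) -> Result<Vec<i32>, Error> {
    let mut lexer = Lexer::new(input);
    let mut tokens = Vec::new();
    loop {
        match lexer.yylex() {
            Ok(token) => tokens.push(token),
            Err(Error::EOF) => return Ok(tokens),
            Err(other) => return Err(other),
        }
    }
}

fn main() {
    println!("{:?}", collect_tokens("abc xyz"));
    println!("{:?}", collect_tokens("abc 123"));
}
```

Treating Error::EOF as normal termination and every other error as a failure keeps the loop body a single match expression.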

Embed your own Rust code in the DSL

See also the code of example1 and example2.

// Write your own Rust code here.
// This code will be inserted into the header of generated lexer file.
use std::io; // example

%%
%class Lexer
%result_type i32
abc      println!("match abc rule"); return Ok(0i32);
[a-z]+   println!("'{}'", self.yytext()); return Ok(10i32);
%%

    // Write your own Rust code that will be inserted into
    // `Lexer` impl.
    // So this code can be executed like `lexer.remain();`.
    pub fn remain(&self) -> usize {
        self.current.clone().count()
    }

Add fields to the lexer struct

See the example1 code.

%field SpaceCounter space_counter

rflex has a %field directive that appends fields to the lexer struct. The directive above gives the Lexer struct a space_counter field, and the generated impl looks like this:

pub fn new(input: &'a str, space_counter: SpaceCounter) -> Lexer<'a> { /* omission */ }
pub fn get_space_counter(&mut self) -> &mut SpaceCounter { &mut self.space_counter }

We can then pass a SpaceCounter to new and access it via get_space_counter. The %field directive can be specified multiple times.
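The following sketch shows how such a field might be used from calling code. The `Lexer` here is a hand-written stand-in that keeps only the two functions shown above. The `SpaceCounter` definition, the `scan_all` method, and the `count_spaces` helper are hypothetical and exist only to make the example runnable; in a real lexer the counter would be updated from a rule action.

```rust
/// User-defined type stored in the lexer (hypothetical definition).
#[derive(Debug, Default)]
pub struct SpaceCounter {
    count: usize,
}

impl SpaceCounter {
    pub fn increment(&mut self) {
        self.count += 1;
    }

    pub fn count(&self) -> usize {
        self.count
    }
}

/// Stand-in for the lexer struct that `%field SpaceCounter space_counter` extends.
pub struct Lexer<'a> {
    input: &'a str,
    space_counter: SpaceCounter,
}

impl<'a> Lexer<'a> {
    // Same signature as the generated `new`: the field value is an extra argument.
    pub fn new(input: &'a str, space_counter: SpaceCounter) -> Lexer<'a> {
        Lexer { input, space_counter }
    }

    // Same signature as the generated accessor.
    pub fn get_space_counter(&mut self) -> &mut SpaceCounter {
        &mut self.space_counter
    }

    // Stand-in for scanning: does what an action on a `" "` rule could do
    // with the field, namely count every space it sees.
    pub fn scan_all(&mut self) {
        for c in self.input.chars() {
            if c == ' ' {
                self.space_counter.increment();
            }
        }
    }
}

/// Build a lexer with a fresh counter, scan, and read the counter back.
pub fn count_spaces(input: &str) -> usize {
    let mut lexer = Lexer::new(input, SpaceCounter::default());
    lexer.scan_all();
    lexer.get_space_counter().count()
}

fn main() {
    println!("spaces: {}", count_spaces("a b  c"));
}
```

Passing the field value into new keeps the lexer free of any assumption about how the user type is constructed.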