Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python backend #485

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
78 changes: 78 additions & 0 deletions docs/user_guide.rst
Original file line number Diff line number Diff line change
Expand Up @@ -284,3 +284,81 @@ BNFC adds the grammar name as a file extension. So if the grammar file is
named ``Calc.cf``, the lexer will be associated to the file extension
``.calc``. To associate other file extensions to a generated lexer, you need to
modify (or subclass) the lexer.

Python Backend
===============

The BNF Converter's Python Backend generates a Python frontend, that uses
Lark, to parse input into an AST (abstract syntax tree).

Lark and Python 3.10 or higher is needed.

Example usage: ::

bnfc --python Calc.cf


.. list-table:: The result is a set of files:
:widths: 25 25
:header-rows: 1

* - Filename
- Description
* - bnfcPyGenCalc/Absyn.py
- Provides the classes for the abstract syntax.
* - bnfcPyGenCalc/ParserDefs.py
- Provides Lark with the information needed to build the lexer and parser.
* - bnfcPyGenCalc/PrettyPrinter.py
- Provides printing for both the AST and the linearized tree.
* - genTest.py
- A ready test-file, that uses the generated frontend to convert input into an AST.
* - skele.py
- Provides skeleton code to deconstruct an AST, using structural pattern matching.

Optionally one may with ``-m``` also create a makefile that contains the target
"distclean" to remove the generated files.

Testing the frontend
....................

It's possible to pipe input, like::

echo "(1 + 2) * 3" | python3 genTest.py

or::

python3 genTest.py < file.txt

and it's possible to just use an argument::

python3 genTest.py file.txt


Caveats
.......

Several entrypoints:
The testfile genTest.py only uses the first entrypoint used by default. To
use all entrypoints, set the start parameter to "start_". If the
entrypoints cause reduce/reduce conflicts, a lark GrammarError will be
produced.

Results from the parameterized tests:
While the Python backend generates working frontends for the example
grammars, five "failures" and six "errors" among the regression
tests are reported.

Skeleton code for using lists as entrypoints:
Matchers for using lists, such as [Exp], are not generated in the
skeleton code as it may confuse users if the grammar uses several different
list categories, as a user may then try to pattern match lists without
checking what type the elements have. Users are instead encouraged to use
non-list entrypoints.

Using multiple separators
Using multiple separators for the same category, such as below, generates
Python functions with overlapping names, causing runtime errors.::

separator Exp1 "," ;
separator Exp1 ";" ;

198 changes: 198 additions & 0 deletions document/BNF_Converter_Python_Mode.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,198 @@
<!DOCTYPE html>
<head>
<meta http-equiv="content-type"
content="text/html; charset=ISO-8859-1">
<title>BNF Converter Python Mode</title>
</head>
<style>
table {
font-family: arial, sans-serif;
border-collapse: collapse;
width: 100%;
}

td, th {
text-align: left;
padding: 4px;
}

</style>
<body>
<div style="text-align: center;">
<h2>BNF Converter</h2>
<h2>Python Mode</h2>
</div>
<h3>By Björn Werner</h3>

<h3>2024</h3>
<p>
The BNF Converter's Python Backend generates a Python frontend, that uses
Lark, to parse input into an AST (abstract syntax tree).
</p>
<p>
BNFC on Github:<br>
<a href="https://github.com/BNFC/bnfc">https://github.com/BNFC/bnfc</a>
</p>
<p>
Lark github:<br>
<a href="https://github.com/lark-parser/lark">https://github.com/lark-parser/lark</a>
</p>
<p>
Python 3.10 or higher is needed.
</p>
<h3>Usage</h3>
<div style="margin-left: 40px; "><big><span style="font-family: monospace; ">
bnfc --python NAME.cf</span></big><br style="font-family: monospace; ">
</div>
<p>
The result is a set of files:
</p>
<table style="padding: 1cm;">
<tr>
<th>Filename:</th><th>Description:</th>
</tr>
<tr>
<td>bnfcGenNAME/Absyn.py</td><td>Provides the classes for the abstract syntax.</td>
</tr>
<tr>
<td>bnfcGenNAME/ParserDefs.py</td><td>Provides Lark with the information needed to build the lexer and parser.</td>
</tr>
<tr>
<td>bnfcGenNAME/PrettyPrinter.py</td><td>Provides printing for both the AST and the linearized tree.</td>
</tr>
<tr>
<td>genTest.py</td><td>A ready test-file, that uses the generated frontend to convert input into an AST.</td>
</tr>
<tr>
<td>skele.py</td><td>Provides skeleton code to deconstruct an AST, using structural pattern matching.</td>
</tr>
</table>

<h3>Testing the frontend</h3>
<p>
The following example uses a frontend that is generated from a C-like grammar.
</p>
<p style="font-family: monospace;">
$ python3 genTest.py < hello.c
</p>
<p style="font-family: monospace;">
Parse Successful!<br>
<br>
[Abstract Syntax]<br>
(PDefs [(DFun Type_int "main" [] [(SExp (EApp "printString" [(EString "Hello world")])), (SReturn (EInt 0))])])<br>
<br>
[Linearized Tree]<br>
int main ()<br>
{<br>
&nbsp;printString ("Hello world");<br>
&nbsp;return 0;<br>
}<br>
</p>
<h3>The Abstract Syntax Tree</h3>
<p>
The AST is built up using instances of Python classes, using the dataclass decorator, such as:
</p>
<p style="font-family: monospace;">
@dataclass<br>
class EAdd:<br>
&nbsp;exp_1: Exp<br>
&nbsp;exp_2: Exp<br>
&nbsp;_ann_type: _AnnType = field(default_factory=_AnnType)
</p>
<p>
The "_ann_type" variable is a placeholder that can be used to store useful information,
for example type-information in order to create a type-annotated AST.
</p>
<h3>Using the skeleton file</h3>
<p>
The skeleton file serves as a template, to create an interpreter for example.
Two different types of matchers are generated: the first with all the value
categories together, and a second type where each matcher only has one
individual value category, as in the example below:
</p>
<p style="font-family: monospace;">
def matcherExp(exp_: Exp):<br>
&nbsp;match exp_:<br>
&nbsp;&nbsp;case EAdd(exp_1, exp_2, _ann_type):<br>
&nbsp;&nbsp;&nbsp;# Exp "+" Exp1<br>
&nbsp;&nbsp;&nbsp;raise Exception('EAdd not implemented')<br>
&nbsp;&nbsp;case ESub(exp_1, exp_2, _ann_type):<br>
&nbsp;&nbsp;&nbsp;...
</p>
<p>
This can be modified, in order to return the addition of each evaluated argument
category, into:
</p>
<p style="font-family: monospace;">
def matcherExp(exp_: Exp):<br>
&nbsp;match exp_:<br>
&nbsp;&nbsp;case EAdd(exp_1, exp_2, _ann_type):<br>
&nbsp;&nbsp;&nbsp;# Exp "+" Exp1<br>
&nbsp;&nbsp;&nbsp;return matcherExp(exp_1) + matcherExp(exp_2)<br>
&nbsp;&nbsp;case ESub(exp_1, exp_2, _ann_type):<br>
&nbsp;&nbsp;&nbsp;...
</p>
<p>
The function can now be imported and used in the generated test file
(similarly to how the pretty printer is imported and used):
</p>
<p style="font-family: monospace;">
from skele import matcherExp<br>
...<br>
print(matcherExp(ast))
</p>

<h3>Known issues</h3>
<h4>
Skeleton code for using lists as entrypoints:
</h4>
<p>
Matchers for using lists, such as [Exp], are not generated in the
skeleton code as it may confuse users if the grammar uses several different
list categories, as a user may then try to pattern match lists without
checking what type the elements have. Users are instead encouraged to use
non-list entrypoints.
</p>
<p>
The improper way to iterate over lists, as the value category is unknown:
</p>
<p style="font-family: monospace;">
&nbsp;case list():<br>
&nbsp;&nbsp;for ele in ast:<br>
&nbsp;&nbsp;&nbsp;...
</p>
<p>
The proper way to deconstruct lists, where we know the value category:
</p>
<p style="font-family: monospace;">
&nbsp;case RuleName(listexp_):<br>
&nbsp;&nbsp;for exp in listexp_:<br>
&nbsp;&nbsp;&nbsp;...
</p>
<h4>
Using multiple separators
</h4>
<p>
Using multiple separators for the same category, such as below, generates
Python functions with overlapping names, causing runtime errors.
</p>
<p style="font-family: monospace;">
separator Exp1 "," ;<br>
separator Exp1 ";" ;
</p>
<h4>Several entrypoints:</h4>
<p>
The testfile genTest.py only uses the first entrypoint used by default. To
use all entrypoints, set the start parameter to "start_". If the
entrypoints cause reduce/reduce conflicts, a lark GrammarError will be
produced.
</p>
<h4>
Results from the parameterized tests:
</h4>
<p>
While the Python backend generates working frontends for the example
grammars, five "failures" and six "errors" among the regression
tests are reported.
</p>

8 changes: 8 additions & 0 deletions source/BNFC.cabal
Original file line number Diff line number Diff line change
Expand Up @@ -280,6 +280,14 @@ library
BNFC.Backend.TreeSitter.CFtoTreeSitter
BNFC.Backend.TreeSitter.RegToJSReg

-- Python backend
BNFC.Backend.Python
BNFC.Backend.Python.CFtoPyAbs
BNFC.Backend.Python.CFtoPyPrettyPrinter
BNFC.Backend.Python.RegToFlex
BNFC.Backend.Python.PyHelpers
BNFC.Backend.Python.CFtoPySkele

----- Testing --------------------------------------------------------------

test-suite unit-tests
Expand Down
3 changes: 3 additions & 0 deletions source/main/Main.hs
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ import BNFC.Backend.Latex
import BNFC.Backend.OCaml
import BNFC.Backend.Pygments
import BNFC.Backend.TreeSitter
import BNFC.Backend.Python
import BNFC.CF (CF)
import BNFC.GetCF
import BNFC.Options hiding (make, Backend)
Expand Down Expand Up @@ -83,3 +84,5 @@ maketarget = \case
TargetPygments -> makePygments
TargetCheck -> error "impossible"
TargetTreeSitter -> makeTreeSitter
TargetPython -> makePython

Loading
Loading