BNFC · AiStudent · Aug 27, 2024
diff --git a/docs/user_guide.rst b/docs/user_guide.rst
@@ -284,3 +284,81 @@ BNFC adds the grammar name as a file extension. So if the grammar file is
 named ``Calc.cf``, the lexer will be associated to the file extension
 ``.calc``. To associate other file extensions to a generated lexer, you need to
 modify (or subclass) the lexer.
+
+Python Backend
+===============
+
+The BNF Converter's Python backend generates a Python frontend that uses
+Lark to parse input into an AST (abstract syntax tree).
+
+Lark and Python 3.10 or higher are required.
+
+Example usage: ::
+
+    bnfc --python Calc.cf
+
+
+.. list-table:: The result is a set of files:
+   :widths: 25 25
+   :header-rows: 1
+
+   * - Filename
+     - Description
+   * - bnfcPyGenCalc/Absyn.py
+     - Provides the classes for the abstract syntax.
+   * - bnfcPyGenCalc/ParserDefs.py
+     - Provides Lark with the information needed to build the lexer and parser.
+   * - bnfcPyGenCalc/PrettyPrinter.py
+     - Provides printing for both the AST and the linearized tree.
+   * - genTest.py
+     - A ready-made test file that uses the generated frontend to convert input into an AST.
+   * - skele.py
+     - Provides skeleton code to deconstruct an AST, using structural pattern matching.
+
+Optionally, the ``-m`` flag also creates a Makefile that contains the target
+``distclean``, which removes the generated files.
+
+Testing the frontend
+....................
+
+Input can be piped to the test file::
+
+    echo "(1 + 2) * 3" | python3 genTest.py
+
+or::
+
+    python3 genTest.py < file.txt
+
+or the input file can be passed as an argument::
+
+    python3 genTest.py file.txt
+
+
+Caveats
+.......
+
+Several entrypoints:
+  By default, the test file ``genTest.py`` uses only the first entrypoint.
+  To use all entrypoints, set the ``start`` parameter to ``"start_"``. If
+  the entrypoints cause reduce/reduce conflicts, Lark raises a
+  ``GrammarError``.
+
+Results from the parameterized tests:
+  The Python backend generates working frontends for the example
+  grammars, but the regression tests report five "failures" and six
+  "errors".
+
+Skeleton code for using lists as entrypoints:
+  The skeleton code contains no matchers for list categories such as
+  ``[Exp]``. If a grammar uses several different list categories, such
+  matchers could lead users to pattern match on lists without checking
+  the type of their elements. Users are instead encouraged to use
+  non-list entrypoints.
+
+Using multiple separators:
+  Using multiple separators for the same category, such as below, generates
+  Python functions with overlapping names, causing runtime errors::
+
+    separator Exp1 "," ;
+    separator Exp1 ";" ;
+
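The multiple-separators caveat above comes down to ordinary Python name rebinding. The following is a minimal sketch of that general behaviour only: the function name `render_list_exp1` is a hypothetical stand-in, not the name the backend actually generates, and the real generated code may fail in a different way.

```python
# Hypothetical stand-ins for two generated helpers that share one name.
# The real names produced by the backend are not shown in this patch.

def render_list_exp1(items: list[str]) -> str:
    """Imagined helper for: separator Exp1 "," ;"""
    return ", ".join(items)


def render_list_exp1(items: list[str]) -> str:  # same name: rebinds the one above
    """Imagined helper for: separator Exp1 ";" ;"""
    return "; ".join(items)


# Only the second definition is reachable; the "," variant is silently lost.
joined = render_list_exp1(["1", "2", "3"])
print(joined)  # 1; 2; 3
```

Python accepts the second `def` without complaint, so a clash of this kind would only surface when the code runs, which fits the caveat's mention of runtime errors.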
diff --git a/document/BNF_Converter_Python_Mode.html b/document/BNF_Converter_Python_Mode.html
@@ -0,0 +1,198 @@
+<!DOCTYPE html>
+<head>
+  <meta http-equiv="content-type"
+ content="text/html; charset=ISO-8859-1">
+  <title>BNF Converter Python Mode</title>
+<style>
+  table {
+    font-family: arial, sans-serif;
+    border-collapse: collapse;
+    width: 100%;
+  }
+
+  td, th {
+    text-align: left;
+    padding: 4px;
+  }
+
+  </style>
+</head>
+<body>
+<div style="text-align: center;">
+<h2>BNF Converter</h2>
+<h2>Python Mode</h2>
+</div> 
+<h3>By Björn Werner</h3>
+
+<h3>2024</h3>
+<p>
+  The BNF Converter's Python backend generates a Python frontend that uses
+  Lark to parse input into an AST (abstract syntax tree).
+</p>
+<p>
+  BNFC on GitHub:<br>
+  <a href="https://github.com/BNFC/bnfc">https://github.com/BNFC/bnfc</a>
+</p>
+<p>
+  Lark on GitHub:<br>
+  <a href="https://github.com/lark-parser/lark">https://github.com/lark-parser/lark</a>
+</p>
+<p>
+  Python 3.10 or higher is needed.
+</p>
+<h3>Usage</h3>
+<div style="margin-left: 40px; "><big><span style="font-family: monospace; ">
+    bnfc --python NAME.cf</span></big><br style="font-family: monospace; ">
+</div>
+<p>
+The result is a set of files:
+</p>
+<table style="padding: 1cm;">
+  <tr>
+    <th>Filename:</th><th>Description:</th>
+  </tr>
+  <tr>
+    <td>bnfcPyGenNAME/Absyn.py</td><td>Provides the classes for the abstract syntax.</td>
+  </tr>
+  <tr>
+    <td>bnfcPyGenNAME/ParserDefs.py</td><td>Provides Lark with the information needed to build the lexer and parser.</td>
+  </tr>
+  <tr>
+    <td>bnfcPyGenNAME/PrettyPrinter.py</td><td>Provides printing for both the AST and the linearized tree.</td>
+  </tr>
+  <tr>
+    <td>genTest.py</td><td>A ready-made test file that uses the generated frontend to convert input into an AST.</td>
+  </tr>
+  <tr>
+    <td>skele.py</td><td>Provides skeleton code to deconstruct an AST, using structural pattern matching.</td>
+  </tr>
+</table>
+
+<h3>Testing the frontend</h3>
+<p>
+  The following example uses a frontend that is generated from a C-like grammar.
+</p>
+<p style="font-family: monospace;">
+  $ python3 genTest.py &lt; hello.c
+</p>
+<p style="font-family: monospace;">
+  Parse Successful!<br>
+  <br>
+  [Abstract Syntax]<br>
+  (PDefs [(DFun Type_int "main" [] [(SExp (EApp "printString" [(EString "Hello world")])), (SReturn (EInt 0))])])<br>
+  <br>
+  [Linearized Tree]<br>
+  int main ()<br>
+  {<br>
+    &nbsp;printString ("Hello world");<br>
+    &nbsp;return 0;<br>
+  }<br>
+</p>
+<h3>The Abstract Syntax Tree</h3>
+<p>
+  The AST is built from instances of Python classes that are defined with the dataclass decorator, such as:
+</p>
+<p style="font-family: monospace;">
+@dataclass<br>
+class EAdd:<br>
+&nbsp;exp_1: Exp<br>
+&nbsp;exp_2: Exp<br>
+&nbsp;_ann_type: _AnnType = field(default_factory=_AnnType)
+</p>
+<p>
+  The "_ann_type" variable is a placeholder that can be used to store useful information,
+  for example type-information in order to create a type-annotated AST.
+</p>
+<h3>Using the skeleton file</h3>
+<p>
+  The skeleton file serves as a template, for example for writing an interpreter.
+  Two kinds of matchers are generated: one that handles all value categories
+  together, and one matcher per individual value category, as in the example
+  below:
+</p>
+<p style="font-family: monospace;">
+def matcherExp(exp_: Exp):<br>
+&nbsp;match exp_:<br>
+&nbsp;&nbsp;case EAdd(exp_1, exp_2, _ann_type):<br>
+&nbsp;&nbsp;&nbsp;# Exp "+" Exp1<br>
+&nbsp;&nbsp;&nbsp;raise Exception('EAdd not implemented')<br>
+&nbsp;&nbsp;case ESub(exp_1, exp_2, _ann_type):<br>
+&nbsp;&nbsp;&nbsp;...
+</p>
+<p>
+  To evaluate additions, this can be modified to return the sum of the two
+  evaluated arguments:
+</p>
+<p style="font-family: monospace;">
+  def matcherExp(exp_: Exp):<br>
+  &nbsp;match exp_:<br>
+  &nbsp;&nbsp;case EAdd(exp_1, exp_2, _ann_type):<br>
+  &nbsp;&nbsp;&nbsp;# Exp "+" Exp1<br>
+  &nbsp;&nbsp;&nbsp;return matcherExp(exp_1) + matcherExp(exp_2)<br>
+  &nbsp;&nbsp;case ESub(exp_1, exp_2, _ann_type):<br>
+  &nbsp;&nbsp;&nbsp;...
+</p>
+<p>
+  The function can now be imported and used in the generated test file 
+  (similarly to how the pretty printer is imported and used):
+</p>
+<p style="font-family: monospace;">
+  from skele import matcherExp<br>
+  ...<br>
+  print(matcherExp(ast))
+</p>
+
+<h3>Known issues</h3>
+<h4>
+  Skeleton code for using lists as entrypoints:
+</h4>
+<p>
+  The skeleton code contains no matchers for list categories such as [Exp].
+  If a grammar uses several different list categories, such matchers could
+  lead users to pattern match on lists without checking the type of their
+  elements. Users are instead encouraged to use
+  non-list entrypoints.
+</p>
+<p>
+  An improper way to iterate over a list, because the value category of its elements is unknown:
+</p>
+<p style="font-family: monospace;">
+  &nbsp;case list():<br>
+  &nbsp;&nbsp;for ele in ast:<br>
+  &nbsp;&nbsp;&nbsp;...
+</p>
+<p>
+  The proper way to deconstruct lists, where we know the value category:
+</p>
+<p style="font-family: monospace;">
+  &nbsp;case RuleName(listexp_):<br>
+  &nbsp;&nbsp;for exp in listexp_:<br>
+  &nbsp;&nbsp;&nbsp;...
+</p>
+<h4>
+  Using multiple separators
+</h4>
+<p>
+  Using multiple separators for the same category, such as below, generates
+  Python functions with overlapping names, causing runtime errors.
+</p>
+<p style="font-family: monospace;">
+  separator Exp1 "," ;<br>
+  separator Exp1 ";" ;
+</p>
+<h4>Several entrypoints:</h4>
+<p>
+  By default, the test file genTest.py uses only the first entrypoint. To
+  use all entrypoints, set the start parameter to "start_". If the
+  entrypoints cause reduce/reduce conflicts, Lark raises a
+  GrammarError.
+</p>
+<h4>
+Results from the parameterized tests:
+</h4>
+<p>
+  The Python backend generates working frontends for the example
+  grammars, but the regression tests report five "failures" and six
+  "errors".
+</p>
+
diff --git a/source/BNFC.cabal b/source/BNFC.cabal
@@ -280,6 +280,14 @@ library
     BNFC.Backend.TreeSitter.CFtoTreeSitter
     BNFC.Backend.TreeSitter.RegToJSReg
 
+    -- Python backend
+    BNFC.Backend.Python
+    BNFC.Backend.Python.CFtoPyAbs
+    BNFC.Backend.Python.CFtoPyPrettyPrinter
+    BNFC.Backend.Python.RegToFlex
+    BNFC.Backend.Python.PyHelpers
+    BNFC.Backend.Python.CFtoPySkele
+
 ----- Testing --------------------------------------------------------------
 
 test-suite unit-tests

diff --git a/source/main/Main.hs b/source/main/Main.hs
@@ -26,6 +26,7 @@ import BNFC.Backend.Latex
 import BNFC.Backend.OCaml
 import BNFC.Backend.Pygments
 import BNFC.Backend.TreeSitter
+import BNFC.Backend.Python
 import BNFC.CF (CF)
 import BNFC.GetCF
 import BNFC.Options hiding (make, Backend)
@@ -83,3 +84,5 @@ maketarget = \case
     TargetPygments     -> makePygments
     TargetCheck        -> error "impossible"
     TargetTreeSitter   -> makeTreeSitter
+    TargetPython       -> makePython
+