feat: Add support for 🐍 Python objects to be updated in README
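
Object lookup parses the script with Python's `ast` module and slices the
source lines between the matching node's `lineno` and `end_lineno`. The
snippet below is a minimal standalone sketch of that idea, not the code in
`src/script_content_reader.py`: the helper name `extract_object` is
hypothetical, and the `SyntaxError` guard is an assumption about how
non-Python scripts could be handled so that section extraction can still run.

```python
from __future__ import annotations

import ast


def extract_object(source: str, object_name: str) -> str | None:
    """Return the source of the first function or class named object_name."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        # Assumption: a script that is not valid Python simply has no objects,
        # so the caller can fall back to section markers.
        return None

    lines = source.split("\n")
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if node.name == object_name:
                # lineno is 1-based and end_lineno is inclusive, so
                # end_lineno can be used directly as the slice end.
                return "\n".join(lines[node.lineno - 1 : node.end_lineno])

    return None
```

The match is case-sensitive, so `extract_object(source, "person")` returns
`None` when the script defines `Person`. Comments placed above a definition
are not part of the node and are therefore not included in the result.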

kvankova · web-flow · commit 9b7eace9bcc5 · 2024-11-03T22:17:04.000+01:00
diff --git a/README.md b/README.md
@@ -5,19 +5,20 @@
 ## **Code Embedder**
 Seamlessly update code snippets in your **README** files! 🔄📝🚀
 
-[Description](#-description) • [How it works](#-how-it-works) • [Examples](#-examples) • [Setup](#-setup) • [Under the hood](#-under-the-hood)
+[Description](#-description) • [How it works](#-how-it-works) • [Setup](#-setup) • [Examples](#-examples) • [Under the hood](#-under-the-hood)
 </div>
 
 
 ## 📚 Description
 
 **Code Embedder** is a GitHub Action that automatically updates code snippets in your markdown (`README`) files. It finds code blocks in your `README` that reference specific scripts, then replaces these blocks with the current content of those scripts. This keeps your documentation in sync with your code.
 
-✨ **Key features**
+### ✨ Key features
 - 🔄 **Automatic synchronization**: Keep your `README` code examples up-to-date without manual intervention.
-- 📝 **Section support**: Update specific sections of the script in the `README`.
 - 🛠️ **Easy setup**: Simply add the action to your GitHub workflow and format your `README` code blocks.
-- 🌐 **Language agnostic**: Works with any programming language or file type.
+- 📝 **Section support**: Update only specific sections of the script in the `README`.
+- 🧩 **Object support**: Update only specific objects (functions, classes) in the `README`. *The latest version supports only 🐍 Python objects (other languages to be added soon).*
+
 
 By using **Code Embedder**, you can focus on writing and updating your actual code 💻, while letting the action take care of keeping your documentation current 📚🔄. This reduces the risk of outdated or incorrect code examples in your project documentation.
 
@@ -43,9 +44,46 @@ You must also add the following comment tags in the script file `path/to/script`
 ...
 [Comment sign] code_embedder:section_name end
 ```
-The comment sign is the one that is used in the script file, e.g. `#` for Python, or `//` for JavaScript. The `section_name` must be unique in the file, otherwise the action will not be able to identify the section.
+The comment sign is the one used in the script file, e.g. `#` for Python or `//` for JavaScript. The `section_name` must be unique in the file; otherwise, the action uses the first section found.
+
+### 🧩 **Object** updates
+In the `README` (or other markdown) file, an object from the script is referenced with the following tag:
+````md
+ ```language:path/to/script:object_name
+ ```
+````
 
+> [!Note]
+> The object name must match exactly the name of the object (function, class) in the script file. Currently, only 🐍 Python objects are supported.
 
+> [!Note]
+> If there is a section with the same name as any object, the object definition will be used in the `README` instead of the section. To avoid this, **use unique names for sections and objects!**
+
+## 🔧 Setup
+To use this action, you need to configure a YAML workflow file in the `.github/workflows` folder (e.g. `.github/workflows/code-embedder.yaml`) with the following content:
+
+```yaml
+name: Code Embedder
+
+on: pull_request
+
+permissions:
+  contents: write
+
+jobs:
+  code_embedder:
+    name: "Code embedder"
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v3
+
+      - name: Run code embedder
+        uses: kvankova/code-embedder@v0.4.0
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+
+```
 
 ## 💡 Examples
 
@@ -112,34 +150,70 @@ print("Embedding successful")
 
 With any changes to the section `A` in `main.py`, the code block section is updated in the `README` file with the next workflow run.
 
-## 🔧 Setup
-To use this action, you need to configure a yaml workflow file in `.github/workflows` folder (e.g. `.github/workflows/code-embedder.yaml`) with the following content:
+### 🧩 Object update
+The tag used for an object update follows the same convention as the tag for a section update, but you provide `object_name` instead of `section_name`. The object name can be a function name or a class name.
 
-```yaml
-name: Code Embedder
+> [!Note]
+> The `object_name` must match exactly the name of the object (function, class) in the script file, including the case. If you define class `Person` in the script, you must use `Person` as the object name in the `README`, not lowercase `person`.
 
-on: pull_request
+For example, let's say we have the following `README` file:
+````md
+# README
 
-permissions:
-  contents: write
+This is a readme.
 
-jobs:
-  code_embedder:
-    name: "Code embedder"
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v3
+Function `print_hello` is defined as follows:
+```python:main.py:print_hello
+```
 
-      - name: Run code embedder
-        uses: kvankova/code-embedder@v0.4.0
-        env:
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+Class `Person` is defined as follows:
+```python:main.py:Person
+```
+````
 
+The `main.py` file contains the following code:
+```python
+...
+def print_hello():
+    print("Hello, world!")
+...
+
+class Person:
+    def __init__(self, name):
+        self.name = name
+    def say_hello(self):
+        print(f"Hello, {self.name}!")
+...
+```
+
+Once the workflow runs, the code blocks in the `README` file are updated with the definitions of the function `print_hello` and the class `Person` from `main.py`, and the change is pushed to the repository 🚀.
+
+````md
+# README
+
+This is a readme.
+
+Function `print_hello` is defined as follows:
+```python:main.py:print_hello
+def print_hello():
+    print("Hello, world!")
 ```
 
+Class `Person` is defined as follows:
+```python:main.py:Person
+class Person:
+    def __init__(self, name):
+        self.name = name
+    def say_hello(self):
+        print(f"Hello, {self.name}!")
+```
+````
+
+Whenever the function `print_hello` or the class `Person` changes in `main.py`, the corresponding code blocks in the `README` file are updated on the next workflow run.
+
+
 ## 🔬 Under the hood
 This action performs the following steps:
-1. 🔎 Scans through the markdown (`README`) files to identify referenced script files (full script or section).
+1. 🔎 Scans through the markdown (`README`) files to identify referenced script files (full script, section or 🐍 Python object).
 1. 📝 Extracts the contents from those script files and updates the corresponding code blocks in the markdown (`README`) files.
 1. 🚀 Commits and pushes the updated documentation back to the repository.
diff --git a/src/script_content_reader.py b/src/script_content_reader.py
@@ -1,3 +1,4 @@
+import ast
 import re
 from typing import Protocol
 
@@ -16,68 +17,112 @@ def __init__(self) -> None:
         self._section_end_regex = r".*code_embedder:.*end"
 
     def read(self, scripts: list[ScriptMetadata]) -> list[ScriptMetadata]:
-        script_contents = self._read_full_script(scripts)
-        return self._process_scripts(script_contents)
+        scripts_with_full_contents = self._read_full_script(scripts)
+        return self._process_scripts(scripts_with_full_contents)
 
     def _read_full_script(self, scripts: list[ScriptMetadata]) -> list[ScriptMetadata]:
-        script_contents: list[ScriptMetadata] = []
+        scripts_with_full_contents: list[ScriptMetadata] = []
 
         for script in scripts:
             try:
                 with open(script.path) as script_file:
                     script.content = script_file.read()
 
-                script_contents.append(script)
+                scripts_with_full_contents.append(script)
 
             except FileNotFoundError:
                 logger.error(f"Error: {script.path} not found. Skipping.")
 
-        return script_contents
+        return scripts_with_full_contents
 
     def _process_scripts(self, scripts: list[ScriptMetadata]) -> list[ScriptMetadata]:
         full_scripts = [script for script in scripts if not script.extraction_part]
-        scripts_with_sections = [script for script in scripts if script.extraction_part]
+        scripts_with_extraction_part = [script for script in scripts if script.extraction_part]
 
-        if scripts_with_sections:
-            scripts_with_sections = self._read_script_section(scripts_with_sections)
+        if scripts_with_extraction_part:
+            scripts_with_extraction_part = self._update_script_content_with_extraction_part(
+                scripts_with_extraction_part
+            )
 
-        return full_scripts + scripts_with_sections
+        return full_scripts + scripts_with_extraction_part
 
-    def _read_script_section(self, scripts: list[ScriptMetadata]) -> list[ScriptMetadata]:
+    def _update_script_content_with_extraction_part(
+        self, scripts: list[ScriptMetadata]
+    ) -> list[ScriptMetadata]:
         return [
             ScriptMetadata(
                 path=script.path,
                 extraction_part=script.extraction_part,
                 readme_start=script.readme_start,
                 readme_end=script.readme_end,
-                content=self._extract_section(script),
+                content=self._extract_part(script),
             )
             for script in scripts
         ]
 
-    def _extract_section(self, script: ScriptMetadata) -> str:
+    def _extract_part(self, script: ScriptMetadata) -> str:
         lines = script.content.split("\n")
-        section_bounds = self._find_section_bounds(lines)
 
-        if not section_bounds:
-            logger.error(f"Section {script.extraction_part} not found in {script.path}")
+        # Try extracting as object first, then fall back to section
+        is_object, start, end = self._find_object_bounds(script)
+        if is_object:
+            return "\n".join(lines[start:end])
+
+        # Extract section if not an object
+        start, end = self._find_section_bounds(lines)
+        if not self._validate_section_bounds(start, end, script):
             return ""
 
-        start, end = section_bounds
         return "\n".join(lines[start:end])
 
-    def _find_section_bounds(self, lines: list[str]) -> tuple[int, int] | None:
-        section_start = None
-        section_end = None
+    def _validate_section_bounds(
+        self, start: int | None, end: int | None, script: ScriptMetadata
+    ) -> bool:
+        if start is None and end is None:
+            logger.error(
+                f"Part {script.extraction_part} not found in {script.path}. Skipping."
+            )
+            return False
+
+        if start is None:
+            logger.error(
+                f"Start of section {script.extraction_part} not found in {script.path}. "
+                "Skipping."
+            )
+            return False
+
+        if end is None:
+            logger.error(
+                f"End of section {script.extraction_part} not found in {script.path}. "
+                "Skipping."
+            )
+            return False
+
+        return True
 
+    def _find_section_bounds(self, lines: list[str]) -> tuple[int | None, int | None]:
         for i, line in enumerate(lines):
             if re.search(self._section_start_regex, line):
-                section_start = i + 1
+                start = i + 1
             elif re.search(self._section_end_regex, line):
-                section_end = i
-                break
-
-        if section_start is None or section_end is None:
-            return None
-
-        return section_start, section_end
+                return start, i
+
+        return None, None
+
+    def _find_object_bounds(
+        self, script: ScriptMetadata
+    ) -> tuple[bool, int | None, int | None]:
+        tree = ast.parse(script.content)
+
+        for node in ast.walk(tree):
+            if (
+                isinstance(node, ast.FunctionDef)
+                or isinstance(node, ast.AsyncFunctionDef)
+                or isinstance(node, ast.ClassDef)
+            ):
+                if script.extraction_part == getattr(node, "name", None):
+                    start = getattr(node, "lineno", None)
+                    end = getattr(node, "end_lineno", None)
+                    return True, start - 1 if start else None, end
+
+        return False, None, None
diff --git a/tests/data/example_python_objects.py b/tests/data/example_python_objects.py
@@ -0,0 +1,16 @@
+import re
+
+
+# Function verifying an email is valid
+def verify_email(email: str) -> bool:
+    return re.match(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$", email) is not None
+
+
+class Person:
+    def __init__(self, name: str, age: int):
+        self.name = name
+        self.age = age
+
+    # String representation of the class
+    def __str__(self) -> str:
+        return f"Person(name={self.name}, age={self.age})"
diff --git a/tests/data/expected_readme3.md b/tests/data/expected_readme3.md
@@ -0,0 +1,23 @@
+# README 3
+
+This is a test README file for testing the code embedding process.
+
+## Python objects
+
+This section contains examples of Python objects.
+
+```python:tests/data/example_python_objects.py:verify_email
+def verify_email(email: str) -> bool:
+    return re.match(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$", email) is not None
+```
+
+```python:tests/data/example_python_objects.py:Person
+class Person:
+    def __init__(self, name: str, age: int):
+        self.name = name
+        self.age = age
+
+    # String representation of the class
+    def __str__(self) -> str:
+        return f"Person(name={self.name}, age={self.age})"
+```
diff --git a/tests/data/readme3.md b/tests/data/readme3.md
@@ -0,0 +1,13 @@
+# README 3
+
+This is a test README file for testing the code embedding process.
+
+## Python objects
+
+This section contains examples of Python objects.
+
+```python:tests/data/example_python_objects.py:verify_email
+```
+
+```python:tests/data/example_python_objects.py:Person
+```
diff --git a/tests/test_code_embedding.py b/tests/test_code_embedding.py
@@ -8,11 +8,13 @@ def test_code_embedder(tmp_path) -> None:
         "tests/data/readme0.md",
         "tests/data/readme1.md",
         "tests/data/readme2.md",
+        "tests/data/readme3.md",
     ]
     expected_paths = [
         "tests/data/expected_readme0.md",
         "tests/data/expected_readme1.md",
         "tests/data/expected_readme2.md",
+        "tests/data/expected_readme3.md",
     ]
 
     # Create a temporary copy of the original file
@@ -36,4 +38,4 @@ def test_code_embedder(tmp_path) -> None:
         with open(temp_readme_path) as updated_file:
             updated_readme_content = updated_file.readlines()
 
-        assert expected_readme_content == updated_readme_content
+        assert updated_readme_content == expected_readme_content
diff --git a/tests/test_script_content_reader.py b/tests/test_script_content_reader.py