
fix: unwrap top-level paragraph HTML in ipynb markdown import #8691

Open
api2062 wants to merge 2 commits into marimo-team:main from api2062:api2062/8651


api2062 commented Mar 15, 2026

📝 Summary

Fixes #8651.

When converting some .ipynb notebooks, markdown cells stored as HTML paragraph blocks (e.g. <p>...</p>) are imported verbatim into mo.md(...). The converted notebook then contains raw HTML tags, which can interfere with normal markdown and LaTeX rendering.

This PR normalizes markdown sources that consist entirely of top-level paragraph blocks before conversion.
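To reproduce the problem without the original notebook, a minimal input can be generated by hand. The sketch below builds an nbformat v4 notebook with a single markdown cell made only of top-level <p>...</p> blocks; it approximates the shape reported in #8651 and is not taken from the issue's notebook. The helper name make_paragraph_notebook is hypothetical.

```python
import json


def make_paragraph_notebook(paragraphs: list[str]) -> str:
    """Hypothetical helper: build a minimal nbformat v4 notebook whose only
    markdown cell is stored as top-level <p>...</p> blocks."""
    source = "\n".join(f"<p>{text}</p>" for text in paragraphs)
    notebook = {
        "cells": [
            {
                "cell_type": "markdown",
                "metadata": {},
                # nbformat accepts either a string or a list of lines here
                "source": source.splitlines(keepends=True),
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,  # 4.4 does not yet require per-cell "id" fields
    }
    return json.dumps(notebook, indent=1)
```

Writing the result to a file (e.g. repro.ipynb) and converting it with marimo convert repro.ipynb -o repro.py should show whether the <p> tags survive in the generated mo.md(...) call.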

🔍 Description of Changes

  • added _normalize_paragraph_html() in marimo/_convert/common/format.py
  • applied normalization in markdown_to_marimo() before markdown code generation
  • normalization is intentionally scoped:
    • applies only when the markdown source is composed entirely of top-level <p>...</p> blocks
    • converts those blocks into plain markdown paragraphs separated by blank lines
    • leaves mixed/other HTML (e.g. <div><p>...</p></div>) unchanged
  • added regression tests:
    • tests/_convert/common/test_convert_format.py
      • test_markdown_to_marimo_unwraps_top_level_paragraph_html
      • test_markdown_to_marimo_keeps_non_paragraph_html
    • tests/_convert/ipynb/test_ipynb_to_ir.py
      • test_convert_ipynb_markdown_unwraps_top_level_paragraph_html
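The scoping rules above can be sketched with a small regex-based scanner. This is an illustration under assumptions, not the code in this PR: normalize_paragraph_html is a hypothetical stand-in for _normalize_paragraph_html(), it bails out on nested <p> tags or any non-<p> content, and it does not unescape HTML entities.

```python
import re

# One top-level <p>...</p> block (attributes allowed), capturing its inner
# content. Surrounding whitespace is consumed so blocks can sit on separate lines.
_P_BLOCK = re.compile(r"\s*<p(?:\s[^>]*)?>(.*?)</p>\s*", re.DOTALL | re.IGNORECASE)
_NESTED_P = re.compile(r"<p[\s>]", re.IGNORECASE)


def normalize_paragraph_html(source: str) -> str:
    """Hypothetical sketch: if source is made only of top-level <p> blocks,
    return their contents as plain markdown paragraphs; otherwise return
    source unchanged."""
    pos = 0
    paragraphs = []
    while pos < len(source):
        match = _P_BLOCK.match(source, pos)
        if match is None:
            return source  # other HTML or plain text present: leave as-is
        inner = match.group(1)
        if _NESTED_P.search(inner):
            return source  # nested <p> is not a simple top-level block
        paragraphs.append(inner.strip())
        pos = match.end()
    if not paragraphs:
        return source  # empty input: nothing to unwrap
    # Separate paragraphs with a blank line, as markdown expects
    return "\n\n".join(paragraphs)
```

For example, "<p>Hello $x^2$</p>\n<p>World</p>" becomes "Hello $x^2$\n\nWorld", while "<div><p>x</p></div>" is returned untouched because the first tag is not a <p>. The all-or-nothing check keeps the change conservative: a cell mixing paragraphs with other markup is never partially rewritten.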

✅ Validation

  • reproduced the issue using the exact notebook attached to it (SM_sphere_S2.ipynb)
  • before fix: converted output contained raw <p> / </p> tags
  • after fix: converted output no longer contains those tags
  • ran the targeted tests for the new behavior: passed
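The before/after check on the converted output can be automated with a crude textual scan. The sketch below (hypothetical function name find_stray_paragraph_tags) reports every line of a converted .py file that still contains <p> or </p>. Because the normalization intentionally leaves mixed HTML such as <div><p>...</p></div> alone, some hits may be expected and need a manual look.

```python
import re
import sys
from pathlib import Path

# Opening or closing paragraph tags such as <p>, <p class="x">, </p>.
# Tags like <pre> are not matched because "p" must be followed by
# whitespace or ">".
_STRAY_P = re.compile(r"</?p(?:\s[^>]*)?>", re.IGNORECASE)


def find_stray_paragraph_tags(text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that still contain <p> or </p>."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if _STRAY_P.search(line):
            hits.append((lineno, line))
    return hits


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_p_tags.py converted.py
    for path in sys.argv[1:]:
        for lineno, line in find_stray_paragraph_tags(Path(path).read_text()):
            print(f"{path}:{lineno}: {line.strip()}")
```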

📋 Checklist

  • I have read the contributor guidelines.
  • For large changes, or changes that affect the public API: this change was discussed or approved through an issue, on Discord, or the community discussions (Please provide a link if applicable).
  • Tests have been added for the changes made.
  • Documentation has been updated where applicable, including docstrings for API changes.
  • Pull request title is a good summary of the changes - it will be used in the release notes.


github-actions bot commented Mar 15, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

api2062 (author) commented Mar 15, 2026

I have read the CLA Document and I hereby sign the CLA

api2062 (author) commented Mar 15, 2026

recheck

api2062 (author) commented Mar 15, 2026

recheck

api2062 marked this pull request as ready for review March 15, 2026 22:33
api2062 requested a review from dmadisetti as a code owner March 15, 2026 22:33

Development

Successfully merging this pull request may close these issues.

Jupyter notebook importer leaves stray <p> </p> HTML tags
