Bug Description
When converting HTML tables to Markdown, if a table cell contains multiple <p>
tags, the output merges their content without spacing, resulting in incorrect values.
Example:
This HTML cell:
<td><p>3</p><p>1</p></td>
is currently converted to:
| 31 |
instead of preserving the structure like:
| 3<br>1 |
or:
| 3
1 |
Expected Behavior
- The Markdown converter should preserve the semantic line breaks or paragraph separations within a table cell.
- Multiple <p> tags should not be flattened into a single string with no delimiter.
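To make the expected output concrete, here is a minimal sketch of the joining behavior using only Python's standard-library `html.parser`. It is not Docling's implementation; `CellTextExtractor` and `cell_to_markdown` are hypothetical names invented for this illustration, and the choice of `<br>` as the separator is an assumption based on the first expected output shown above.

```python
from html.parser import HTMLParser


class CellTextExtractor(HTMLParser):
    """Hypothetical helper: collect the text of each <p> in a cell separately."""

    def __init__(self):
        super().__init__()
        self.blocks = []    # finished text blocks, one per paragraph
        self._current = []  # text fragments of the block being read

    def flush(self):
        # Close the block in progress and keep it only if it has content.
        text = "".join(self._current).strip()
        if text:
            self.blocks.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        # A new paragraph or an explicit line break starts a new block.
        if tag in ("p", "br"):
            self.flush()

    def handle_endtag(self, tag):
        if tag == "p":
            self.flush()

    def handle_data(self, data):
        self._current.append(data)


def cell_to_markdown(cell_html, separator="<br>"):
    """Join the paragraphs of one table cell with an explicit separator."""
    parser = CellTextExtractor()
    parser.feed(cell_html)
    parser.close()
    parser.flush()  # pick up trailing text that is not wrapped in <p>
    return separator.join(parser.blocks)


print("| " + cell_to_markdown("<td><p>3</p><p>1</p></td>") + " |")
```

With the cell from this report, the sketch keeps `3` and `1` as separate values instead of producing `31`. A `<br>` separator keeps the table row on one line, which is why it may be a safer choice than a literal newline inside the cell.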
Current Behavior
- The converter merges multiple <p> tags into one line, resulting in values like 31 instead of maintaining their structure or indicating separation.
- This misrepresents the actual data from the HTML source.
Environment
- Python: 3.12
- Docling version: Latest
I will attach the HTML file that reproduces the issue.
174627142405997939927c_page_111.zip
Thank you very much in advance for your time and support.