Question
- When using the following code to parse an Excel file:
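from docling.datamodel.base_models import InputFormat
from docling.document_converter import DocumentConverter, ExcelFormatOption
from docling.pipeline.simple_pipeline import SimplePipeline

# Restrict the converter to XLSX input and use the simple (non-PDF) pipeline.
converter = DocumentConverter(
    allowed_formats=[InputFormat.XLSX],
    format_options={InputFormat.XLSX: ExcelFormatOption(pipeline_cls=SimplePipeline)},
)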
If the sheet contains merged cells in the header, the parser treats the sheet as multiple tables. Is there any way to make it produce only one Markdown table per sheet?
- Also, is there a parameter to parse only a specific sheet rather than every sheet in the workbook?
- The JSON output of the `export_to_dict` function is rather complex. Could you provide a brief explanation of what each field represents? My goal is to extract tables from Excel as row-based JSON, but currently I can only achieve this by exporting to Markdown and then splitting the content on blank lines :(
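
For reference, the Markdown workaround described above can be sketched roughly as follows. This is only an illustration using the standard library: the helper name `markdown_tables_to_rows` and the sample input are hypothetical, nothing here is part of the library's API, and it assumes simple pipe tables with a header row, a separator row, and no escaped `|` characters inside cells.

```python
import json
import re


def markdown_tables_to_rows(markdown: str) -> list[list[dict[str, str]]]:
    """Turn each pipe table in a Markdown string into a list of row dicts.

    Hypothetical helper: assumes line 1 of a table is the header and
    line 2 is the separator row. Duplicate header names (common when
    merged cells are flattened) would overwrite each other in the dict.
    """
    tables = []
    # Blank lines separate blocks (tables, headings, paragraphs).
    for block in re.split(r"\n\s*\n", markdown.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip().startswith("|")]
        if len(lines) < 2:
            continue  # not a table: no header + separator pair

        def split_cells(line: str) -> list[str]:
            # Drop exactly one leading and one trailing pipe, then split.
            inner = line.removeprefix("|").removesuffix("|")
            return [cell.strip() for cell in inner.split("|")]

        header = split_cells(lines[0])
        # lines[1] is assumed to be the separator row and is skipped.
        rows = [dict(zip(header, split_cells(ln))) for ln in lines[2:]]
        tables.append(rows)
    return tables


# Made-up sample input standing in for the exported Markdown.
sample_markdown = (
    "| name | qty |\n| --- | --- |\n| apple | 3 |\n| pear | 5 |\n"
    "\n"
    "| id |\n| --- |\n| 7 |"
)
print(json.dumps(markdown_tables_to_rows(sample_markdown), indent=2))
```

This works, but it is fragile for exactly the merged-header case above: a sheet that is split into several tables also comes out as several row lists, so a direct row-based export would be preferable.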