
[Bug] Conversion of HTML manual pages to markdown fails for HTML figure code #4864

@neteler

Description


Describe the bug

I am working on the mass conversion of all HTML manual pages to markdown. For this I have written a pandoc-based converter script (see #4620), which already does most of the job.

A showstopper in the conversion of HTML manual pages to markdown is the figures: the related HTML snippets vary from manual page to manual page, even though there is a style recommendation.

For an easier discussion, I have moved the figure issue here to separate it out from #4748.

Many figures look ugly after MD conversion (the resulting MD code is partially garbage):

  • v.fill.holes.html figures
  • v.to.rast3.html figure
  • ... many more
  • often the figure captions are not properly detected: mkdocs/site/raster3dintro.html

I have written a Lua filter for pandoc (not yet submitted), but it can only convert one specific HTML pattern. With so many HTML variants, I have no idea how to cover them all.
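To make the problem concrete, here is a minimal Python sketch of the kind of normalization involved. It is not the Lua filter mentioned above, and it covers only one assumed pattern: a `<div>` that wraps an `<img>` plus some caption text. The names `FigureExtractor` and `figures_to_markdown` are hypothetical, and the sample HTML is made up for illustration.

```python
from html.parser import HTMLParser


class FigureExtractor(HTMLParser):
    """Collect (src, caption) pairs from <div> blocks that contain an <img>.

    Assumption: a figure is an outermost <div> holding one <img> and
    some caption text. Other variants are simply not detected.
    """

    def __init__(self):
        super().__init__()
        self.figures = []   # finished (src, caption) pairs
        self._depth = 0     # current <div> nesting depth
        self._src = None    # image source seen in the current outermost div
        self._text = []     # text fragments seen in the current outermost div

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self._depth += 1
        elif tag == "img" and self._depth > 0 and self._src is None:
            self._src = dict(attrs).get("src")

    def handle_endtag(self, tag):
        if tag != "div" or self._depth == 0:
            return
        self._depth -= 1
        if self._depth == 0:
            if self._src:
                # Collapse whitespace so multi-line captions become one line.
                caption = " ".join(" ".join(self._text).split())
                self.figures.append((self._src, caption))
            self._src, self._text = None, []

    def handle_data(self, data):
        if self._depth > 0:
            self._text.append(data)


def figures_to_markdown(html):
    """Return markdown image lines for every figure found in html."""
    parser = FigureExtractor()
    parser.feed(html)
    parser.close()
    lines = []
    for src, caption in parser.figures:
        lines.append(f"![{caption}]({src})")
        if caption:
            lines.append(f"*{caption}*")
        lines.append("")
    return "\n".join(lines).rstrip("\n")


if __name__ == "__main__":
    sample = ('<div align="center"><img src="d_rast.png" alt="map"><br>'
              '<i>Figure: Elevation map</i></div>')
    print(figures_to_markdown(sample))
```

Any page whose figure markup departs from this one pattern (no wrapping `<div>`, caption outside the block, several images per block) falls through, which is exactly the difficulty with a single filter.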

To reproduce

I tried to submit the converted MD files for community review but I get stuck in the pre-commit stage:

From my terminal:

markdownlint-fix.........................................................Failed
- hook id: markdownlint-fix
- exit code: 1
- files were modified by this hook
display/d.rast/d.rast.md:14:1 MD033/no-inline-html Inline HTML [Element: div]
display/d.rast/d.rast.md:16:1 MD033/no-inline-html Inline HTML [Element: img]
display/d.rast/d.rast.md:29:1 MD033/no-inline-html Inline HTML [Element: div]
display/d.rast/d.rast.md:31:1 MD033/no-inline-html Inline HTML [Element: img]
display/d.rast/d.rast.md:43:1 MD033/no-inline-html Inline HTML [Element: div]
display/d.rast/d.rast.md:45:1 MD033/no-inline-html Inline HTML [Element: img]
gui/wxpython/docs/wxGUI.toolboxes.md:180:1 MD033/no-inline-html Inline HTML [Element: img]
gui/wxpython/timeline/g.gui.timeline.md:14:1 MD033/no-inline-html Inline HTML [Element: img]
raster/r.li/r.li.cwed/r.li.cwed.md:12:6 MD033/no-inline-html Inline HTML [Element: span]
raster/r.li/r.li.cwed/r.li.cwed.md:12:26 MD033/no-inline-html Inline HTML [Element: span]
raster/r.li/r.li.cwed/r.li.cwed.md:14:6 MD033/no-inline-html Inline HTML [Element: span]
raster/r.li/r.li.cwed/r.li.cwed.md:14:26 MD033/no-inline-html Inline HTML [Element: span]
raster/r.li/r.li.cwed/r.li.cwed.md:21:1 MD033/no-inline-html Inline HTML [Element: span]
raster/r.li/r.li.mpa/r.li.mpa.md:10:6 MD033/no-inline-html Inline HTML [Element: span]
raster/r.li/r.li.mpa/r.li.mpa.md:10:26 MD033/no-inline-html Inline HTML [Element: span]
raster/r.path/r.path.md:122:1 MD033/no-inline-html Inline HTML [Element: div]
raster/r.path/r.path.md:124:2 MD033/no-inline-html Inline HTML [Element: img]
raster/r.path/r.path.md:176:1 MD033/no-inline-html Inline HTML [Element: div]
raster/r.path/r.path.md:178:2 MD033/no-inline-html Inline HTML [Element: img]
raster/r.resamp.filter/r.resamp.filter.md:98:1 MD033/no-inline-html Inline HTML [Element: div]
raster/r.resamp.filter/r.resamp.filter.md:100:1 MD033/no-inline-html Inline HTML [Element: img]
raster/r.sim/r.sim.water/r.sim.water.md:30:1 MD033/no-inline-html Inline HTML [Element: div]
raster/r.sim/r.sim.water/r.sim.water.md:32:1 MD033/no-inline-html Inline HTML [Element: img]
raster/r.sim/r.sim.water/r.sim.water.md:154:81 MD013/line-length Line length [Expected: 80; Actual: 147]
raster/r.sim/r.sim.water/r.sim.water.md:168:81 MD013/line-length Line length [Expected: 80; Actual: 95]
raster/r.sim/r.sim.water/r.sim.water.md:175:1 MD033/no-inline-html Inline HTML [Element: div]
raster/r.sim/r.sim.water/r.sim.water.md:177:1 MD033/no-inline-html Inline HTML [Element: img]
raster/r.sunmask/r.sunmask.md:89:81 MD013/line-length Line length [Expected: 80; Actual: 96]
raster/r.univar/r.univar.md:59:1 MD033/no-inline-html Inline HTML [Element: div]
raster/r.univar/r.univar.md:61:1 MD033/no-inline-html Inline HTML [Element: img]
raster/r.univar/r.univar.md:187:1 MD033/no-inline-html Inline HTML [Element: div]
...
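To see which rules and HTML elements dominate a log like the one above, the output can be tallied with a short script. This is a sketch that assumes every finding follows the `path:line:column RULE/alias message` shape visible in the log; the function name `summarize` is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
# display/d.rast/d.rast.md:14:1 MD033/no-inline-html Inline HTML [Element: div]
LINE_RE = re.compile(
    r"^(?P<path>\S+?):(?P<line>\d+)(?::(?P<col>\d+))? "
    r"(?P<rule>MD\d+)/(?P<alias>[\w-]+) (?P<message>.*)$"
)
ELEMENT_RE = re.compile(r"\[Element: (\w+)\]")


def summarize(log_text):
    """Return (findings per rule, findings per HTML element, affected files)."""
    by_rule = Counter()
    by_element = Counter()
    files = set()
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # hook header lines, "...", blank lines
        by_rule[match["rule"]] += 1
        files.add(match["path"])
        element = ELEMENT_RE.search(match["message"])
        if element:
            by_element[element.group(1)] += 1
    return by_rule, by_element, files


if __name__ == "__main__":
    log = """\
- hook id: markdownlint-fix
display/d.rast/d.rast.md:14:1 MD033/no-inline-html Inline HTML [Element: div]
display/d.rast/d.rast.md:16:1 MD033/no-inline-html Inline HTML [Element: img]
raster/r.sim/r.sim.water/r.sim.water.md:154:81 MD013/line-length Line length [Expected: 80; Actual: 147]
"""
    rules, elements, files = summarize(log)
    print(rules.most_common())
    print(elements.most_common())
    print(len(files), "files affected")
```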

Expected behavior

I wonder if we have to touch the ~170 HTML files manually to streamline the HTML figure code therein, in order to eventually develop a single pandoc Lua filter.
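Before editing files by hand, it might help to count how many distinct figure patterns actually exist. The sketch below is one possible approach, not an existing tool: it records the parent element of every `<img>` and tallies the results across pages. `ImgContext` and `count_variants` are hypothetical names, and the parent tag is only a rough proxy for a "variant".

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag and so must not go on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ImgContext(HTMLParser):
    """Record the enclosing element of each <img> as 'parent > img'."""

    def __init__(self):
        super().__init__()
        self.stack = []        # currently open elements
        self.signatures = []   # one entry per <img> found

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            parent = self.stack[-1] if self.stack else "(root)"
            self.signatures.append(f"{parent} > img")
        elif tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Drop the tag together with anything left unclosed inside it.
            while self.stack.pop() != tag:
                pass


def count_variants(pages):
    """Tally image contexts over an iterable of HTML strings."""
    counts = Counter()
    for html in pages:
        parser = ImgContext()
        parser.feed(html)
        parser.close()
        counts.update(parser.signatures)
    return counts


if __name__ == "__main__":
    pages = [
        '<div align="center"><img src="a.png"><br><i>caption</i></div>',
        '<center><img src="b.png"></center>',
        '<p><div><img src="c.png"/></div>',
    ]
    for signature, n in count_variants(pages).most_common():
        print(n, signature)
```

If a handful of signatures cover most pages, only the outliers would need manual cleanup before a single filter becomes feasible.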

Support welcome!

Labels: HTML, bug, docs, manual, markdown

Status

Done
