Two backslashes gets converted to 3 backslashes #404

Open
@tomgoddard

Description

In the current PyPI release of html2text, converting a single backslash in HTML produces a single backslash in plain text. That seems right. But converting two backslashes in HTML produces three backslashes in plain text. It seems like two backslashes in HTML should produce two in plain text. I am seeing this in HTML that shows two backslashes in some Windows file paths to indicate that the backslash is escaped. When our ChimeraX application converts that HTML to plain text for bug reporting, the file names then appear with three backslashes (https://www.rbvi.ucsf.edu/trac/ChimeraX/ticket/10252).

Note that in the Python strings in the test script below, two backslashes in a string literal mean just one backslash, since "\\" is an escape sequence denoting a single-character string containing one backslash.
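To make the escaping concrete, here is a small sketch in plain Python (no html2text needed). The variable names are only for illustration. It shows how many backslash characters the literals used in the test script actually contain.

```python
# Each pair of backslashes in a Python string literal is one character.
one = '\\'        # one backslash character
two = '\\\\'      # two backslash characters

# The two HTML inputs used in the test script.
html_one = '<p>\\</p>'
html_two = '<p>\\\\</p>'

print(len(one), len(two))                          # 1 2
print(html_one.count('\\'), html_two.count('\\'))  # 1 2
print(html_two)                                    # <p>\\</p>
```

So the second input really contains two backslashes, not four.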

  • Version by html2text --version
    2020.1.16

  • Test script

import html2text
h = html2text.HTML2Text()
h.handle('<p>\\</p>')      # returns '\\\n\n' (seems right)
h.handle('<p>\\\\</p>')    # returns '\n\n\\\\\\\n\n' (seems wrong, 3 backslashes in the output)
html2text.__version__      # (2020, 1, 16)
  • Python version python --version
    Python 3.10.9
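As a sanity check on the counts, the two outputs reported in the test script can be pasted back in as Python literals and their backslashes counted. This is only a sketch that uses the reported strings as-is. It does not call html2text, and the variable names are hypothetical.

```python
# Outputs copied from the test script, as Python literals.
out_one = '\\\n\n'            # reported result for '<p>\\</p>'
out_two = '\n\n\\\\\\\n\n'    # reported result for '<p>\\\\</p>'

print(out_one.count('\\'))    # 1
print(out_two.count('\\'))    # 3
print(out_two.strip())        # \\\
```

One backslash in gives one out, but two in gives three out, which is the behavior reported here.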
