LayoutAwareDFXPParser prints BeautifulSoup4 XMLParsedAsHTMLWarning #331

Open
@rlaphoenix

LayoutAwareDFXPParser passes the html.parser feature to BeautifulSoup instead of an XML parser such as xml or lxml, so BeautifulSoup4 prints XMLParsedAsHTMLWarning whenever it thinks the content is XML rather than HTML.

The warning:

.venv\Lib\site-packages\bs4\builder\__init__.py:545:
XMLParsedAsHTMLWarning: It looks like you're parsing an XML document using an
HTML parser. If this really is an HTML document (maybe it's XHTML?), you can
ignore or filter this warning. If it's XML, you should know that using an XML
parser will be more reliable. To parse this document as XML, make sure you have
the lxml package installed, and pass the keyword argument `features="xml"` into
the BeautifulSoup constructor.

A possible solution would be:

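import warnings
from bs4 import XMLParsedAsHTMLWarning
# Filter the category that is actually raised here (XMLParsedAsHTMLWarning);
# GuessedAtParserWarning is only emitted when no parser is specified at all.
warnings.filterwarnings('ignore', category=XMLParsedAsHTMLWarning)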
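
A global filter like the one above also silences the warning for every other BeautifulSoup user in the same process. If that is a concern, a scoped variant might look roughly like the sketch below. The helper name call_without_warning is hypothetical and not part of the library or of bs4; it only relies on the standard warnings module, and where it would be called inside LayoutAwareDFXPParser is an assumption.

```python
import warnings


def call_without_warning(func, category, *args, **kwargs):
    """Call func(*args, **kwargs) while ignoring warnings of `category`.

    Hypothetical helper: the filter is only active for the duration of
    this one call, and the previous filter state is restored afterwards.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=category)
        return func(*args, **kwargs)


# Assumed usage inside the parser (requires bs4 to be installed):
#   from bs4 import BeautifulSoup, XMLParsedAsHTMLWarning
#   soup = call_without_warning(
#       BeautifulSoup, XMLParsedAsHTMLWarning, content, "html.parser"
#   )
```

This keeps the html.parser behaviour unchanged and only hides the warning around the one constructor call, so applications that want to see XMLParsedAsHTMLWarning for their own parsing still do.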
