Using specialised class directly
To mitigate the type error, either:
- Quote all subscripted annotations as strings
- Use `from __future__ import annotations`
Starting from Python 3.7, classes support subscript notation
(PEP 560, Core support for typing module and generic types) to denote, e.g.,
a collection of some other types. Python's built-in data types
(`list`, `dict`, etc.) have supported this since 3.9, as in `list[int]`
or `dict[str, str]`.
Subscripts are used extensively in types-lxml for 'specialising' classes
in lxml, because some classes behave differently depending on their
initialisation arguments. However, when writing functions or methods
that take such classes as arguments, errors like the following are
unavoidable unless precautions are taken:
TypeError: 'type' object is not subscriptable
For example, an Element Tree can contain different kinds of elements:
- Normal XML elements (`lxml.etree._Element` and friends)
- HTML elements (`lxml.html.HtmlElement` and friends)
- Objectified elements (`lxml.objectify.ObjectifiedElement`)

This difference depends on the `parser` argument passed to certain
lxml functions that produce an Element Tree, such as:
- the `lxml.etree.ElementTree()` factory function
- the `lxml.etree.parse()` module function
In turn, this affects the type of the `.parser` property and the type of
the root element inside the tree.
However, the lxml runtime will probably never support subscripted usage: lxml is implemented in Cython and maintains compatibility with very old Python versions. This leads to a conflict when the aforementioned classes are subscripted directly in annotations, say for function arguments, as the following example illustrates:
```python
from lxml.etree import _Element, _ElementTree, XMLParser

def get_parser(tree: _ElementTree[_Element]) -> XMLParser[_Element]:
    ...
```

Using the above code leads to the `TypeError` message mentioned before,
because the lxml classes don't actually support subscripts at runtime.
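
The failure can be reproduced without lxml. The following is a minimal sketch in which `Plain` and `Box` are hypothetical stand-ins (not lxml or types-lxml classes): `Plain` behaves like a runtime class with no subscript support, while `Box` gains it through `typing.Generic`.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class Plain:
    """Hypothetical stand-in: an ordinary class without __class_getitem__."""


class Box(Generic[T]):
    """Hypothetical stand-in: Generic supplies __class_getitem__ (PEP 560)."""


def supports_subscript(cls: type) -> bool:
    """Return True if cls[int] can be evaluated at runtime."""
    try:
        cls[int]
    except TypeError:
        return False
    return True


print(supports_subscript(list))   # built-in generics, Python 3.9+
print(supports_subscript(Box))    # subscriptable thanks to Generic
print(supports_subscript(Plain))  # raises TypeError internally
```

Classes behaving like `Plain` are the ones that need quoted or postponed annotations.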
A similar problem has already been asked and answered on StackOverflow, but that question deals with native data types. Our situation is a little different (and simpler).
Modifying the above code example so that the subscripted annotations are quoted:

```python
from lxml.etree import _Element, _ElementTree, XMLParser

def get_parser(tree: "_ElementTree[_Element]") -> "XMLParser[_Element]":
    ...
```

This allows the Python interpreter to skip evaluating the annotations, while static type checkers can still understand them.
Using `from __future__ import annotations` is established in PEP 563 (Postponed Evaluation of Annotations). It is effectively the same as automatically applying method (1), quoting, to all annotations in the same file.
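
The effect of the `__future__` import can be observed without lxml. In this sketch, `Plain` and `get_value` are hypothetical names used only for illustration; `Plain` stands in for a runtime class that cannot be subscripted.

```python
from __future__ import annotations  # must be the first statement in the file


class Plain:
    """Hypothetical stand-in for a class without runtime subscript support."""


def get_value(item: Plain[int]) -> Plain[int]:
    # The annotations above are never evaluated, so no TypeError is raised.
    return item


# Annotations are stored as plain strings, as if each had been quoted by hand.
print(get_value.__annotations__)
```

Static type checkers still read `Plain[int]` as written; only runtime evaluation is skipped.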
Here are the classes that make use of subscripts in annotations:

| Class | Description |
|---|---|
| `lxml.etree._ElementTree` | As described above. Its subscript denotes the type of element contained in the ElementTree, more specifically the root node of the ElementTree. |
| `lxml.etree.XMLParser` and `HTMLParser` | Document / content parsers. They are the main factor deciding which kind of elements are produced. More on this in the next document section. |
| `lxml.builder.ElementMaker` | Element factory. Similar to parsers, the subscript denotes what kind of elements would be produced. |
| `lxml.sax.ElementTreeContentHandler` | lxml adapter for the standard Python SAX event handler. Its subscript denotes the kind of Element Tree that would be produced in the `.etree` property. |
| `lxml.etree._IDDict` | Dictionary-like class that contains a mapping of XML ID names to their corresponding elements. The subscript indicates the type of element. |
| `lxml.etree._ElementUnicodeResult` | One of the possible outputs (a string) when evaluating XPath expressions, as described in the official documentation. This string subclass has a `.getparent()` method, which gives access to the original element that produced the string. Its subscript represents the type of the original element. See Smart string usage for more info. |
| `lxml.etree.ParserTarget` | Its subscript reflects what kind of value would be returned by the parser target's `.close()` method. See the Custom target parser page for more info. |
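
To make the `ParserTarget` row more concrete, here is a minimal sketch of a parser target whose `.close()` returns an `int`. So that it runs without lxml, it uses the standard library's `xml.etree.ElementTree.XMLParser`, on whose target interface lxml's is modelled. `TagCounter` is a hypothetical class written for this illustration; under types-lxml such a target would presumably correspond to `ParserTarget[int]`.

```python
from xml.etree.ElementTree import XMLParser


class TagCounter:
    """Hypothetical parser target that counts start tags."""

    def __init__(self) -> None:
        self.count = 0

    def start(self, tag: str, attrib: dict) -> None:
        self.count += 1  # called once per opening tag

    def end(self, tag: str) -> None:
        pass

    def data(self, data: str) -> None:
        pass

    def close(self) -> int:
        # The return type of close() is what the subscript describes.
        return self.count


parser = XMLParser(target=TagCounter())
parser.feed("<root><a/><b><c/></b></root>")
result = parser.close()  # hands back whatever the target's close() returned
print(result)  # 4
```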
The table above mentions that `lxml.etree.XMLParser` and `lxml.etree.HTMLParser`
use a subscript to denote the type of element they are supposed to produce.
But that doesn't necessarily apply to all subclasses. Parsers in the `lxml.html`
submodule (`html.HTMLParser` and `html.XHTMLParser`) have no subscripts.
The `html` submodule parsers are designed to always produce `lxml.html.HtmlElement`
and friends. This can be changed with the `.set_element_class_lookup()`
method, but such a change degenerates the parser into a common XML parser, and
using the `html` submodule parsers becomes moot.
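
One way to picture a subclass that has no subscript is a subclass that pins the type parameter of its generic base. The sketch below uses hypothetical stand-in classes (`Element`, `HtmlElement`, `BaseParser`, `HtmlOnlyParser`); it is not the actual types-lxml definition, only an illustration of the idea.

```python
from typing import Generic, TypeVar


class Element:
    """Hypothetical stand-in for a plain XML element."""


class HtmlElement(Element):
    """Hypothetical stand-in for an HTML element."""


T = TypeVar("T", bound=Element)


class BaseParser(Generic[T]):
    """Stand-in for a subscripted parser; T is the element type produced."""

    def __init__(self, element_class: type[T]) -> None:
        self._element_class = element_class

    def make(self) -> T:
        return self._element_class()


class HtmlOnlyParser(BaseParser[HtmlElement]):
    """Stand-in for an html submodule parser: element type fixed, no subscript."""

    def __init__(self) -> None:
        super().__init__(HtmlElement)


xml_parser = BaseParser(Element)  # a type checker infers BaseParser[Element]
html_parser = HtmlOnlyParser()    # annotated simply as HtmlOnlyParser
print(type(xml_parser.make()).__name__, type(html_parser.make()).__name__)
```

Because the subclass fixes `T`, annotations can use `HtmlOnlyParser` without any subscript.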
As mentioned before, the `parser.set_element_class_lookup()` method allows a parser
to produce a different kind of element. This is actually done in, say, the
`ObjectifiedElement` parser. But due to limitations of Python's typing features, the
annotation can't change automatically to reflect this. It has to be modified manually:
```python
from typing import TYPE_CHECKING, cast

from lxml.etree import XMLParser
from lxml.objectify import ObjectifiedElement, ObjectifyElementClassLookup

p = XMLParser()  # type is XMLParser[_Element]
if TYPE_CHECKING:
    # Seen only by the type checker: re-annotate the parser
    p = cast('XMLParser[ObjectifiedElement]', p)
else:
    # Executed only at runtime: actually switch the element class lookup
    p.set_element_class_lookup(ObjectifyElementClassLookup())
```

Here is another example, demonstrating how to create
`lxml.html.HTMLParser` manually from `lxml.etree.HTMLParser`.
`html.HTMLParser` really is created this way; the example only shows
how type annotations fit into this scenario:
```python
from typing import TYPE_CHECKING, cast

from lxml.etree import HTMLParser, fromstring
from lxml.html import HtmlElement, HtmlElementClassLookup

# reveal_type() is interpreted by static type checkers; to run this file,
# use Python 3.11+ and add `from typing import reveal_type`.
data = "<p>Hello</p>"  # sample input, assumed here for illustration

parser = HTMLParser()
reveal_type(parser)  # HTMLParser[_Element]
result = fromstring(data, parser=parser)
reveal_type(result)  # _Element

if TYPE_CHECKING:
    # check specialization as understood by type checker
    parser = cast('HTMLParser[HtmlElement]', parser)
else:
    # at runtime, make the parser produce lxml.html elements
    parser.set_element_class_lookup(HtmlElementClassLookup())

result = fromstring(data, parser=parser)
reveal_type(result)  # now becomes HtmlElement
```