Using specialised class directly

TL;DR

To mitigate the type error, either:

Quote all subscripted annotations as strings
Use from __future__ import annotations

Description of problem, part 1

Starting from Python 3.7, classes support subscript notation (PEP 560, Core support for typing module and generic types) to denote, e.g., a collection of some other type. Python's built-in data types (list, dict, etc.) have supported this since Python 3.9, as in list[int] or dict[str, str].

It is used extensively in types-lxml to 'specialise' classes in lxml, because some classes behave differently depending on their initialisation arguments. However, when writing functions or methods that take such classes as arguments, errors like the following are unavoidable unless precautions are taken:

TypeError: 'type' object is not subscriptable
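
The error is not specific to lxml: any class that lacks runtime subscript support rejects the notation. The following is a minimal sketch using two hypothetical stand-in classes (`RuntimeOnly` and `StubLike` are illustration names, not part of lxml or types-lxml). The exact wording of the message may differ between Python versions, so the sketch only checks the exception type.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class RuntimeOnly:
    """Stand-in for a runtime class with no subscript support."""


class StubLike(Generic[T]):
    """Stand-in for how a stub file may declare the same class as generic."""


def is_subscriptable(cls: type) -> bool:
    """Return True if cls[int] can be evaluated at runtime."""
    try:
        cls[int]
    except TypeError:
        return False
    return True


print(is_subscriptable(RuntimeOnly))  # False
print(is_subscriptable(StubLike))     # True
print(is_subscriptable(list))         # True on Python 3.9+
```

The mismatch between the two stand-ins mirrors the situation described here: the type checker sees a generic class, while the interpreter sees a plain one.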

Example

For example, an element tree can contain different kinds of elements:

Normal XML elements (lxml.etree._Element and friends)
HTML elements (lxml.html.HtmlElement and friends)
Objectified elements (lxml.objectify.ObjectifiedElement)

The difference depends on the parser argument passed to the lxml functions that produce an element tree, such as:

lxml.etree.ElementTree() factory function
lxml.etree.parse() module function

In turn, this affects the type of the .parser property and the type of the root element inside the tree.
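
To illustrate how one type parameter can tie a parser to the tree it produces, here is a rough sketch built from stand-in classes. Every name in it (`Element`, `HtmlElement`, `Parser`, `Tree`, `parse`) is hypothetical and greatly simplified; it is not how lxml or the types-lxml stubs are actually written, and it assumes Python 3.9+ for `type[E]`.

```python
from typing import Generic, TypeVar


class Element:
    """Stand-in for a plain XML element."""


class HtmlElement(Element):
    """Stand-in for an HTML element."""


E = TypeVar("E", bound=Element)


class Parser(Generic[E]):
    """Stand-in parser; E is the element type it produces."""

    def __init__(self, element_class: type[E]) -> None:
        self.element_class = element_class


class Tree(Generic[E]):
    """Stand-in tree; shares its element type with its parser."""

    def __init__(self, parser: Parser[E]) -> None:
        self.parser = parser
        self._root = parser.element_class()

    def getroot(self) -> E:
        return self._root


def parse(parser: Parser[E]) -> Tree[E]:
    """The parser argument decides the element type of the returned tree."""
    return Tree(parser)


# A type checker infers Tree[HtmlElement] here.
html_tree = parse(Parser(HtmlElement))
print(type(html_tree.getroot()).__name__)  # HtmlElement
```

These stand-ins inherit from `Generic`, so they accept subscripts at runtime. The real lxml classes do not, which is where the problem in the next section comes from.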

Description of problem, part 2

However, the lxml runtime will probably never support subscripted usage, because lxml is implemented in Cython and maintains compatibility with very old Python versions. This leads to a conflict when the aforementioned classes are used directly in annotations of, say, function arguments, as illustrated in the following example:

from lxml.etree import _Element, _ElementTree, XMLParser

def get_parser(tree: _ElementTree[_Element]) -> XMLParser[_Element]:
    ...

Using the above code raises the TypeError mentioned earlier, because lxml classes do not actually support subscripts at runtime.

The fix

A similar problem has already been asked and answered on StackOverflow, but that deals with native data types. Our situation is a little different (and simpler).

1. Quote subscripted annotation as string

Modifying the above code example:

from lxml.etree import _Element, _ElementTree, XMLParser

def get_parser(tree: "_ElementTree[_Element]") -> "XMLParser[_Element]":
    ...

This allows the Python interpreter to skip evaluating the annotation, while static type checkers can still understand it.
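
The effect can be observed without lxml. The sketch below uses a hypothetical stand-in class `Plain` (not an lxml name) that cannot be subscripted, and shows that a quoted annotation is stored as a plain string rather than evaluated.

```python
class Plain:
    """Stand-in for a runtime class that cannot be subscripted."""


def get_first(items: "Plain[int]") -> "Plain[int]":
    return items


# The annotations are kept as unevaluated strings.
print(get_first.__annotations__)
# {'items': 'Plain[int]', 'return': 'Plain[int]'}

# Calling the function works, since nothing evaluates Plain[int].
obj = Plain()
print(get_first(obj) is obj)  # True
```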

2. Use `from __future__ import annotations`

This is established in PEP 563 (Postponed Evaluation of Annotations). It is effectively the same as automatically applying method (1) to all annotations in the same file.
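
A small sketch of that equivalence, again with a hypothetical stand-in class `Plain` instead of an lxml class. Since the future import must be the first statement of a module, the module body is held in a string and executed in its own namespace; in real code the import simply goes at the top of the file.

```python
import textwrap

# Source of a tiny module that starts with the future import.
module_source = textwrap.dedent(
    '''
    from __future__ import annotations


    class Plain:
        """Stand-in for a runtime class that cannot be subscripted."""


    def get_first(items: Plain[int]) -> Plain[int]:
        return items
    '''
)

namespace: dict = {}
exec(module_source, namespace)

# Unquoted in the source, yet stored as strings at runtime.
print(namespace["get_first"].__annotations__)
# {'items': 'Plain[int]', 'return': 'Plain[int]'}
```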

Scope of subscript usage

The following classes make use of subscripts in annotations:

| Class | Description |
| --- | --- |
| `lxml.etree._ElementTree` | As described above. Its subscript denotes the type of element contained in the element tree; more specifically, the root node of the tree. |
| `lxml.etree.XMLParser` and `HTMLParser` | Document / content parsers. They are the main factor deciding which kind of elements are produced. More on this in the next section. |
| `lxml.builder.ElementMaker` | Element factory. Similar to parsers, the subscript denotes what kind of elements would be produced. |
| `lxml.sax.ElementTreeContentHandler` | lxml adapter for the standard Python SAX event handler. Its subscript denotes the kind of element tree that would be produced in the `.etree` property. |
| `lxml.etree._IDDict` | Dictionary-like class that contains a mapping of `XML:ID` attribute values to the corresponding elements. The subscript indicates the type of element. |
| `lxml.etree._ElementUnicodeResult` | One of the possible outputs (a string) when evaluating XPath expressions, as described in the official documentation. This string subclass has a `.getparent()` method, which gives access to the original element that produced the string. Its subscript represents the type of that original element. See Smart string usage for more info. |
| `lxml.etree.ParserTarget` | Its subscript reflects what kind of value would be returned by the parser target's `.close()` method. See the Custom target parser page for more info. |

Caveat

Not all parsers use subscripts

The table above mentions that lxml.etree.XMLParser and lxml.etree.HTMLParser use a subscript to denote the type of element they are supposed to produce. That doesn't necessarily apply to all subclasses, though: parsers in the lxml.html submodule (html.HTMLParser and html.XHTMLParser) have no subscripts.

The html submodule parsers are designed to always produce lxml.html.HtmlElement and friends. This can be changed with the .set_element_class_lookup() method, but such a change degenerates the parser into a common XML parser, and using the html submodule parsers becomes moot.

No automatic change of subscript

As mentioned before, the parser.set_element_class_lookup() method allows producing a different kind of element. This is actually done in, say, the ObjectifiedElement parser. But due to limitations of Python's typing features, the annotation can't change automatically to reflect this. It has to be modified manually:

from typing import TYPE_CHECKING, cast
from lxml.etree import XMLParser
from lxml.objectify import ObjectifiedElement, ObjectifyElementClassLookup

p = XMLParser()  # type is XMLParser[_Element]
if TYPE_CHECKING:
    p = cast('XMLParser[ObjectifiedElement]', p)
else:
    p.set_element_class_lookup(ObjectifyElementClassLookup())

Here is another example, demonstrating how to create lxml.html.HTMLParser manually from lxml.etree.HTMLParser. html.HTMLParser is really created this way; the example only shows how type annotation fits into this scenario:

from typing import TYPE_CHECKING, cast
from lxml.etree import HTMLParser, fromstring
from lxml.html import HtmlElement, HtmlElementClassLookup

# Placeholder input for illustration; any HTML string or bytes would do
data = "<p>sample</p>"

# reveal_type() is interpreted by the type checker, not meant to be run
parser = HTMLParser()
reveal_type(parser)  # HTMLParser[_Element]
result = fromstring(data, parser=parser)
reveal_type(result)  # _Element

if TYPE_CHECKING:
    # change the specialisation as understood by the type checker
    parser = cast('HTMLParser[HtmlElement]', parser)
else:
    # change the actual runtime behaviour of the parser
    parser.set_element_class_lookup(HtmlElementClassLookup())

result = fromstring(data, parser=parser)
reveal_type(result)  # now becomes HtmlElement
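
The type passed to `cast` in both examples is quoted for the same reason annotations are: an unquoted subscript would be evaluated at runtime. A minimal sketch with a hypothetical stand-in class `Parser` (not the lxml class) shows that `typing.cast` returns its argument untouched and never evaluates the string. A type checker would reject `"Parser[int]"` for this stand-in because it is not declared generic; with the types-lxml stubs the real classes are.

```python
from typing import cast


class Parser:
    """Stand-in for a runtime parser class that cannot be subscripted."""


p = Parser()

# The string is never evaluated, so the unsubscriptable class causes no error.
q = cast("Parser[int]", p)
print(q is p)  # True

# Without quotes, the subscript is evaluated before cast() is even called.
try:
    cast(Parser[int], p)
except TypeError:
    print("unquoted subscript fails at runtime")
```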
