Custom target parser

Abel Cheung edited this page Apr 18, 2025 · 3 revisions

lxml parsers accept a custom target object through the target parameter.

Page TBD

Boilerplate code for parser target

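from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only static type checkers take this branch
    from lxml.etree import ParserTarget
else:
    # Runtime stand-in that only needs to be subscriptable
    from types import GenericAlias
    class ParserTarget:
        __class_getitem__ = classmethod(GenericAlias)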
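
As a rough illustration of why the runtime branch is sufficient, the sketch below reproduces only the else part of the boilerplate and checks that the placeholder class can be subscripted and subclassed. It assumes Python 3.9 or later, where types.GenericAlias is available. The DemoTarget name is made up for this example.

```python
from types import GenericAlias


class ParserTarget:
    # Makes ParserTarget[...] valid at runtime without defining a real generic
    __class_getitem__ = classmethod(GenericAlias)


# Subscripting returns a lightweight alias instead of raising TypeError
alias = ParserTarget[int]
print(alias.__origin__ is ParserTarget, alias.__args__)


class DemoTarget(ParserTarget[int]):  # hypothetical name, for illustration only
    def close(self) -> int:
        return 0


# In the base list, the alias resolves to the plain placeholder class
print(DemoTarget.__bases__ == (ParserTarget,))
```

The subscript carries no runtime behavior here. It exists so that the same class statement is meaningful to a type checker and harmless to the interpreter.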

Example

After including the boilerplate above, your custom target can inherit from ParserTarget. Here is a trivial example target that computes the total length of the text data it receives:

class MyTarget(ParserTarget[int]):
    def __init__(self) -> None:
        self._len = 0

    def data(self, data: str) -> None:
        self._len += len(data)

    def close(self) -> int:
        return self._len

To use your target with the standard XML parsing routines (taking etree.XML() as an example):

from typing import reveal_type  # Python 3.11+; older versions can use typing_extensions
from lxml.etree import XML, XMLParser

parser = XMLParser(target=MyTarget())
reveal_type(parser)  # Pyright: CustomParserTarget[int]; runtime: XMLParser
with open('yourfile.xml', mode='rb') as f:
    result = XML(f.read(), parser)
reveal_type(result)  # Pyright: int; runtime: int

Alternatively, here is how to use it with the feed parser interface:

parser = XMLParser(target=MyTarget())
with open('yourfile.xml', mode='rb') as f:
    parser.feed(f.read())
result = parser.close()
reveal_type(result)  # Pyright: int; runtime: int
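
For readers who want to try both interfaces without a file on disk, here is a minimal self-contained sketch. It assumes lxml is installed, repeats the boilerplate and the MyTarget class from above, and swaps yourfile.xml for a small in-memory document named SAMPLE, which is made up for this example.

```python
from typing import TYPE_CHECKING

from lxml.etree import XML, XMLParser

if TYPE_CHECKING:
    from lxml.etree import ParserTarget
else:
    from types import GenericAlias

    class ParserTarget:
        __class_getitem__ = classmethod(GenericAlias)


class MyTarget(ParserTarget[int]):
    def __init__(self) -> None:
        self._len = 0

    def data(self, data: str) -> None:
        # May be called several times for one text node, so accumulate
        self._len += len(data)

    def close(self) -> int:
        return self._len


# Hypothetical sample input standing in for the contents of a file
SAMPLE = b"<root><a>hello</a><b>world!</b></root>"

# One-shot parsing: XML() returns whatever the target's close() returns
one_shot = XML(SAMPLE, XMLParser(target=MyTarget()))

# Feed interface: the input may arrive in arbitrary chunks
feed_parser = XMLParser(target=MyTarget())
feed_parser.feed(SAMPLE[:12])
feed_parser.feed(SAMPLE[12:])
fed = feed_parser.close()

print(one_shot, fed)
```

Each parser gets a fresh MyTarget instance because the counter lives on the target object and is not reset between runs.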

Target method signatures

Of all the target methods, only .close() is mandatory. In the extreme case, an object whose only method is a no-op .close() is a valid null parser target that does nothing. In general, declare the return type in both the ParserTarget subscript and the .close() signature. For example, if the target should return a floating-point number:

class MyTarget(ParserTarget[float]):
    def close(self) -> float:
        ...
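
The two cases described above can be sketched as follows, assuming lxml is installed. NullTarget, AverageTextTarget and SAMPLE are hypothetical names chosen for illustration. The float target returns the average number of text characters per element, a statistic picked only because it yields a float.

```python
from typing import TYPE_CHECKING

from lxml.etree import XML, XMLParser

if TYPE_CHECKING:
    from lxml.etree import ParserTarget
else:
    from types import GenericAlias

    class ParserTarget:
        __class_getitem__ = classmethod(GenericAlias)


class NullTarget:
    # A target with nothing but close(): every parse event is ignored
    def close(self) -> None:
        return None


class AverageTextTarget(ParserTarget[float]):
    # The subscript and the close() return annotation both say float
    def __init__(self) -> None:
        self._elements = 0
        self._chars = 0

    def start(self, tag: str, attrib: dict[str, str]) -> None:
        self._elements += 1

    def data(self, data: str) -> None:
        self._chars += len(data)

    def close(self) -> float:
        # Guard against division by zero if no element was seen
        return self._chars / self._elements if self._elements else 0.0


SAMPLE = b"<root><a>hello</a><b>world!</b></root>"

nothing = XML(SAMPLE, XMLParser(target=NullTarget()))
average = XML(SAMPLE, XMLParser(target=AverageTextTarget()))
print(nothing, average)
```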

Here are the fully annotated target methods that are documented on the lxml website:

def start(self, tag: str, attrib: dict[str, str]) -> None: ...
def comment(self, text: str) -> None: ...
def data(self, data: str) -> None: ...
def end(self, tag: str) -> None: ...
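
To show these four methods working together, the sketch below records every event it receives and returns the log from .close(). It assumes lxml is installed. EventLogTarget and the sample document are invented for this example, and the attribute mapping is copied with dict() so that the log does not depend on the exact mapping type the parser passes in.

```python
from typing import TYPE_CHECKING

from lxml.etree import XML, XMLParser

if TYPE_CHECKING:
    from lxml.etree import ParserTarget
else:
    from types import GenericAlias

    class ParserTarget:
        __class_getitem__ = classmethod(GenericAlias)


class EventLogTarget(ParserTarget[list[tuple[object, ...]]]):
    def __init__(self) -> None:
        self._events: list[tuple[object, ...]] = []

    def start(self, tag: str, attrib: dict[str, str]) -> None:
        # Copy the attributes so the log owns a plain dict
        self._events.append(("start", tag, dict(attrib)))

    def comment(self, text: str) -> None:
        self._events.append(("comment", text))

    def data(self, data: str) -> None:
        self._events.append(("data", data))

    def end(self, tag: str) -> None:
        self._events.append(("end", tag))

    def close(self) -> list[tuple[object, ...]]:
        return self._events


events = XML(
    b'<root a="1"><!--note-->hi</root>',
    XMLParser(target=EventLogTarget()),
)
for event in events:
    print(event)
```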

The methods below correspond to SAX events that are undocumented but usable nonetheless:

def pi(self, target: str, data: str) -> None: ...
def start_ns(self, prefix: str, uri: str) -> None: ...
def end_ns(self, prefix: str) -> None: ...
def doctype(
    self,
    root_tag: str | None,
    public_id: str | None,
    system_id: str | None,
) -> None: ...
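
A small sketch of these four methods in use follows. It assumes lxml is installed and that the installed version calls all of them, which may not hold for older releases. SaxEventTarget and the sample document are hypothetical. The DOCTYPE in the sample carries no public or system identifier, which shows why those parameters are annotated as str | None.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

from lxml.etree import XML, XMLParser

if TYPE_CHECKING:
    from lxml.etree import ParserTarget
else:
    from types import GenericAlias

    class ParserTarget:
        __class_getitem__ = classmethod(GenericAlias)


class SaxEventTarget(ParserTarget[list]):
    def __init__(self) -> None:
        self._events: list[tuple[object, ...]] = []

    def pi(self, target: str, data: str) -> None:
        self._events.append(("pi", target, data))

    def start_ns(self, prefix: str, uri: str) -> None:
        self._events.append(("start_ns", prefix, uri))

    def end_ns(self, prefix: str) -> None:
        self._events.append(("end_ns", prefix))

    def doctype(
        self,
        root_tag: str | None,
        public_id: str | None,
        system_id: str | None,
    ) -> None:
        self._events.append(("doctype", root_tag, public_id, system_id))

    def close(self) -> list:
        return self._events


# Hypothetical input with a DOCTYPE, a namespace declaration and a PI
SAMPLE = (
    b"<!DOCTYPE root>"
    b'<root xmlns:x="http://example.com/x">'
    b"<?proc some data?><x:item/></root>"
)

sax_events = XML(SAMPLE, XMLParser(target=SaxEventTarget()))
for event in sax_events:
    print(event)
```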
