Custom target parser
lxml parsers accept a custom target object through the target parameter.
Page TBD
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from lxml.etree import ParserTarget
else:
    # Runtime stand-in so that ParserTarget[...] can be subscripted and subclassed
    from types import GenericAlias

    class ParserTarget:
        __class_getitem__ = classmethod(GenericAlias)

After including the boilerplate code above, your custom target can inherit from ParserTarget. Here is a contrived example target that calculates the total length of the text content in the input:
class MyTarget(ParserTarget[int]):
    def __init__(self) -> None:
        self._len = 0

    def data(self, data: str) -> None:
        self._len += len(data)

    def close(self) -> int:
        return self._len

To use your target in the standard XML parsing routines (with etree.XML() as an example):
from lxml.etree import XML, XMLParser

parser = XMLParser(target=MyTarget())
reveal_type(parser)  # reveal_type result below

| Pyright | Runtime |
|---|---|
| CustomParserTarget[int] | XMLParser |

with open('yourfile.xml', mode='rb') as f:
    result = XML(f.read(), parser)
reveal_type(result)

| Pyright | Runtime |
|---|---|
| int | int |

Alternatively, here is how to use it with the feed parser interface:
parser = XMLParser(target=MyTarget())
with open('yourfile.xml', mode='rb') as f:
    parser.feed(f.read())
result = parser.close()
reveal_type(result)

| Pyright | Runtime |
|---|---|
| int | int |

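The feed interface also allows input to be supplied piece by piece instead of in a single read. The following is a minimal sketch of that idea, not part of the original page: `LengthTarget`, `parse_in_chunks` and the chunk size are hypothetical names and values chosen for illustration, an in-memory `io.BytesIO` stands in for a real file, and the `except ImportError` fallback to the standard library parser is only an assumption so that the sketch can run where lxml is not installed.

```python
from __future__ import annotations

import io
from typing import TYPE_CHECKING, BinaryIO

try:
    from lxml.etree import XMLParser
except ImportError:
    # Assumption for illustration only: the standard library parser accepts
    # the same kind of target object, so the sketch still runs without lxml.
    from xml.etree.ElementTree import XMLParser

if TYPE_CHECKING:
    from lxml.etree import ParserTarget
else:
    from types import GenericAlias

    class ParserTarget:
        __class_getitem__ = classmethod(GenericAlias)


class LengthTarget(ParserTarget[int]):
    """Hypothetical target that sums the length of all text chunks."""

    def __init__(self) -> None:
        self._len = 0

    def data(self, data: str) -> None:
        self._len += len(data)

    def close(self) -> int:
        return self._len


def parse_in_chunks(stream: BinaryIO, chunk_size: int = 16) -> int:
    """Feed the stream to the parser in small pieces, then collect the result."""
    parser = XMLParser(target=LengthTarget())
    while chunk := stream.read(chunk_size):
        parser.feed(chunk)
    # close() hands back whatever the target's close() returns
    return parser.close()


# An in-memory stream stands in for open('yourfile.xml', mode='rb')
source = io.BytesIO(b"<root><a>hello</a><b>world!</b></root>")
result = parse_in_chunks(source)
print(result)
```

The parser may deliver text to data() in several pieces when a chunk boundary falls inside a text node, which is why the target accumulates a running total rather than assuming one call per element.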
Of all the target methods, only .close() is mandatory. In the extreme case, an object
whose only method is a no-op .close() is a valid null parser target that does nothing.
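As a rough illustration of such a null target, here is a minimal sketch. The class name `NullTarget` and the sample document are hypothetical, subscripting with `None` is assumed to be an acceptable way to annotate a target that returns nothing, and the fallback import exists only so the sketch can run without lxml installed.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

try:
    from lxml.etree import XMLParser
except ImportError:
    # Assumption for illustration only: fall back to the standard library
    # parser, which accepts the same kind of target object.
    from xml.etree.ElementTree import XMLParser

if TYPE_CHECKING:
    from lxml.etree import ParserTarget
else:
    from types import GenericAlias

    class ParserTarget:
        __class_getitem__ = classmethod(GenericAlias)


class NullTarget(ParserTarget[None]):
    """Hypothetical target that ignores every event and returns nothing."""

    def close(self) -> None:
        return None


parser = XMLParser(target=NullTarget())
parser.feed(b"<root><child>ignored</child></root>")
# The parser still checks the input, but no event reaches the target
result = parser.close()
print(result)
```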
In general, declare the return type in both the ParserTarget subscript and the
.close() signature. For example, if the target object should return a floating-point number:
class MyTarget(ParserTarget[float]):
    def close(self) -> float:
        ...

Here are the fully annotated target methods that are documented on the lxml website:
def start(self, tag: str, attrib: dict[str, str]) -> None: ...
def comment(self, text: str) -> None: ...
def data(self, data: str) -> None: ...
def end(self, tag: str) -> None: ...

The methods below correspond to undocumented SAX events but are usable nonetheless:
def pi(self, target: str, data: str) -> None: ...
def start_ns(self, prefix: str, uri: str) -> None: ...
def end_ns(self, prefix: str) -> None: ...
def doctype(
    self,
    root_tag: str | None,
    public_id: str | None,
    system_id: str | None,
) -> None: ...
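To show how the signatures above combine with a non-integer return type, here is a minimal sketch of a target that implements start(), end() and close() and returns a float. `MeanDepthTarget` and the sample document are hypothetical and serve only as illustration; the fallback import is an assumption that lets the sketch run without lxml installed.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

try:
    from lxml.etree import XMLParser
except ImportError:
    # Assumption for illustration only: fall back to the standard library
    # parser, which accepts the same kind of target object.
    from xml.etree.ElementTree import XMLParser

if TYPE_CHECKING:
    from lxml.etree import ParserTarget
else:
    from types import GenericAlias

    class ParserTarget:
        __class_getitem__ = classmethod(GenericAlias)


class MeanDepthTarget(ParserTarget[float]):
    """Hypothetical target that reports the average nesting depth of elements."""

    def __init__(self) -> None:
        self._depth = 0   # depth of the element currently open
        self._total = 0   # sum of the depths of all elements seen
        self._count = 0   # number of elements seen

    def start(self, tag: str, attrib: dict[str, str]) -> None:
        self._depth += 1
        self._total += self._depth
        self._count += 1

    def end(self, tag: str) -> None:
        self._depth -= 1

    def close(self) -> float:
        # Return type matches the ParserTarget[float] subscript
        return self._total / self._count if self._count else 0.0


parser = XMLParser(target=MeanDepthTarget())
# Element depths: a=1, b=2, b=2, c=3
parser.feed(b"<a><b/><b><c/></b></a>")
mean_depth = parser.close()
print(mean_depth)
```

Methods that the target does not define, such as data() or comment() here, are simply skipped, so a target only needs to implement the events it cares about.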