Releases: abelcheung/types-lxml
2025.08.25
Breaking or important changes
- Brings in full lxml 6.0.x support. Additional exported constants were already present in earlier types-lxml releases; this release covers the remaining features.
- No longer tested against lxml 4.9.x. This doesn't mean it will break immediately, but there is no guarantee that types-lxml completely matches the 4.9.x API over time.
Features
- Also test against lxml 5.4 and newest 5.3.x
- (#92, thanks to @macro1) Apply mypy.stubtest check to help guarantee stub implementation doesn't deviate too much from runtime signatures and types, except intentional ones. This helped find many of the bugs fixed below.
- Compatible with mypy 1.16+ and pyright 1.1.399+
- (#86) Revive custom target parser support (stub-only ParserTarget as target object, and CustomTargetParser as stub-only variant of XMLParser)
  - Functions involved: fromstring(), parse(), _ElementTree.parse(), ElementTree(), fromstringlist(), HTML(), XML()
  - Params of all target object methods are positional
  - attribute is a dict in target object .start() method
  - Leave the capability of creating custom target parser to only XMLParser and HTMLParser, and drop target= param from all parser subclasses (such as lxml.html ones)
  - C14NWriterTarget inherits from ParserTarget
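A minimal sketch of the custom target parser pattern that the revived stubs describe. The TagCollector class and the sample document are hypothetical. The fallback import of the standard library's xml.etree.ElementTree is an assumption made only so the sketch runs without lxml installed; it relies on both libraries sharing this part of the parser target interface.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: stdlib ElementTree offers the same target parser interface
    import xml.etree.ElementTree as etree


class TagCollector:
    """Hypothetical parser target that records element tags in document order."""

    def __init__(self):
        self.tags = []

    def start(self, tag, attrib):
        # attrib maps attribute names to values
        self.tags.append(tag)

    def end(self, tag):
        pass

    def data(self, data):
        pass

    def close(self):
        # Whatever close() returns becomes the result of the parsing function
        return self.tags


parser = etree.XMLParser(target=TagCollector())
tags = etree.fromstring("<root><a x='1'/><b/></root>", parser)
```

With a target parser, fromstring() hands back the value returned by the target's close() method instead of an element, which is why the return type of the listed functions depends on the parser argument.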
Bug Fixes
- Sync or add __all__ in various submodules
Fixes for lxml.etree
- (#85, thanks to @BrandonStudio) cleanup_namespaces() shouldn't warn without keep_ns_prefixes arg
- Allow specifying default value of output_parent arg for XSLTExtension.apply_template() and .process_children()
- Mark _Attrib as final
- Add missing XMLSyntaxAssertionError.__init__()
- set_default_parser() arg missing default value
- strip_elements() with_tail arg should be keyword-only
- Use original param name in tag cleanup functions
- Strip unnecessary arguments in XSLTExtension overloads
- Give users a rough idea about XSLTExtension method arguments, such as using _Element to approximately represent _ReadOnlyElementProxy. Avoids creating even more stub-only classes and requiring user to poke into them
Fixes for lxml.html
- FormElement._name is a method, not property
Fixes for lxml.isoschematron
- some Schematron variables are Literal constants
Fixes for lxml.objectify
- enable_recursive_str() arg missing default value
- parse() file parameter name was wrong
Minor changes
- Trim down canonicalize(), etree.tostring() and Extension() overloads to avoid confusion
- Implement objectify.NumberElement after all, in rare case where somebody wants to implement new type of number related to DataElement
- Move NumberElement._setValueParser() to subclasses
- (#71) Remove last traces of _AnyStr
- Reorder _ElementTree.write() overloads, with the most generic overload presented first for UX
- Fix XMLParser and HTMLParser API doc links
- Better docstring and warning for C14NWriterTarget
- Drop unused _HtmlElemParser alias
2025.03.30
Features
- (#82) Add buffer type support for upcoming lxml 6.0.
- HtmlElement.text_content() result will become plain str since lxml 6.0. This change shouldn't break much compatibility for users of previous lxml versions.
- Warn user about str input and guess_charset combo bug in html.html5parser functions
- Warn user about incorrect usage of specifying single element as .extend() argument
- lxml 6.0 exports LIBXML_COMPILED_FEATURES constant
Bug fixes
- (#84) Tag selector supports iterator but not bytearray
- A few combinations of QName construction argument were actually disallowed; second argument can't be QName or _Element if first argument is non-empty
- Multiple issues for Resolver class
  - Don't annotate opaque internal context object
  - Drop _ResolverRegistry.resolve() which can't possibly appear in user land code
  - Missing default value for Resolver.resolve_file() keyword arguments
  - Resolver.resolve() arguments can be None
- Drop unused keyword arguments from iterparse() html mode overload
- namespaces arg of .xpath() method accepts tuple form. Change for XPath classes already done earlier.
- Confine the type of public element (subclass of ElementBase) class attributes
- _Element.findtext() didn't allow default argument in certain overload form
- RelaxNG.from_rnc_string() base_url argument accepts bytes
- html.html5parser guess_charset bug revisited
  - parse() is not affected as it always opens files/URL in binary mode
  - For other functions, even guess_charset=False triggers the bug
- Some html5parser.HTMLParser initialisation arguments should be keyword only
- Corrected import of typing.Never in html module and html.html5parser submodule
- .extend() and __setitem__() of _Element and HtmlElement support iterator as value
- _Element.index() had wrong parameter name
- Continued verification of properties and arguments supporting bytearray:
  - _Element.text and .tail properties
  - Content-only elements
  - XPath input expression
  - _IDDict mixin arguments
  - xmlfile.write*() methods and encoding argument
Minor changes
- Drop _ElemClsLookupArg alias, which is almost unused
- Rename _StrictNSMap to more aptly named _StrOnlyNSMap
- Don't include superclass attributes in ParseError definition
- Continue getting rid of _AnyStr in most places
- Mark constants as Final
Tests related
- Migrate following tests to property based runtime testing:
  - All basic validators: DTD, RelaxNG, ISO Schematron (XMLSchema done in earlier release)
  - All existing _Element method / property tests and content-only elements
  - html.html5parser submodule
  - XMLID() and friends
  - QName
- For all negative tests on properties or arguments bombarded with random objects, also add iterables of correct objects to the list, to make sure an iterable of correct arguments or values is itself rejected as an incorrect argument.
Documentation
- Fill in docstring for all _Element properties and methods
2025.03.04
Features and breaking changes
- Depends on beautifulsoup4 itself because version 4.13 has bundled inline annotation. Dropping types-beautifulsoup4 dependency as result.
- Multi subclass patch includes change in CSSSelector result
- Implement ErrorTypes constants as enum
Bug fixes
- Additional type: ignore comments that improve compatibility with older versions of mypy and pyright
- For soupparser submodule input arguments, copy definition from beautifulsoup4 code directly
- html.fragment_fromstring create_parent argument can be string (#83, thanks to @sciyoshi)
- XPath namespaces argument can accept namespace tuples
- Fixes compatibility with mypy 1.14+
- bytes not allowed as html.diff.htmldiff() argument
- Parser encoding arguments do support bytearray
- _ListErrorLog.filter_from_level() supports real numbers
Minor changes and tests
- Migrate beautifulsoup and ErrorLog tests to property based
- Migrate cssselect and XMLSchema tests to runtime ones
- Add mocked HTTP response to file input fixture; introduces urllib3 and pook as test dependency
2025.02.24
Features and breaking changes
- Add basedpyright type checker support
- Incorporate changes from lxml 5.3.1 and (pending) 6.0
  - More html.builder shorthands
  - libxml feature constants
  - etree.DTD(external_id=...) supports str now
  - Deprecate some Memdebug methods
Bug fixes
- html.submit_form() always returns HTTPResponse for default handler
- Instance attributes are converted to properties because they are not deletable:
  - html.SelectElement.multiple
  - html.InputElement.type
- More function arguments support bytearray:
  - register_namespace()
  - inclusive_ns_prefixes parameter of etree.tostring()
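A small pure-Python sketch of why a non-deletable attribute is better described as a property than as a plain instance attribute. The InputLike class is a hypothetical stand-in, not lxml's actual implementation.

```python
class InputLike:
    """Hypothetical stand-in for an element exposing a read/write property."""

    def __init__(self):
        self._type = "text"

    @property
    def type(self):
        return self._type

    @type.setter
    def type(self, value):
        self._type = value

    # No deleter is defined, so `del obj.type` is rejected at runtime


field = InputLike()
field.type = "checkbox"  # setting works

try:
    del field.type
    deletable = True
except AttributeError:
    deletable = False
```

Annotating such a member as a property without a deleter lets type checkers flag the del statement before it fails at runtime.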
Minor changes
- Add docstring for some etree module function overloads
- Drop _AnyStr from etree module level functions
2024.12.13
Breaking changes and features
- bytearray accepted as tag names, attribute names and attribute values
  - Related change: create _TextArg type alias to slowly replace existing _AnyStr (#71)
- Warn IDE users via warnings.deprecated about exception upon certain argument combinations in HTML link functions
Bug fixes
- Property deleter missing for HTML elements (#73)
- etree.strip_attributes() supports bytes and QName as input
  - Completion of #64 for remaining known cases
- Corrected link replacement function return type in html.rewrite_links()
- etree.canonicalize() shouldn't accept bytes as input
Tests related
- Use hypothesis for extensive tests on function arguments, currently used in _Attrib and HTML link function tests (#75)
- reveal_type() injector has been split into its own project and pulled via dependency
Internal changes
- Folder structure changes for the whole repository (#70)
- Remove _HANDLE_FAILURES type alias and show values directly to users
- Rename type-only protocol SupportsLaxedItems to SupportsLaxItems
Full Changelog: 2024.11.08...2024.12.13
2024.11.08
Breaking and important changes
- pyright users (and IDEs that can make use of pyright) will see a warning if a single string is supplied where a collection of strings is expected (tuple, set, list etc). In terms of typing, a single str itself is valid as a Sequence, so type checkers normally would not raise alarm when using str in such function parameters, but it can induce unexpected runtime behavior. (#64)
  - _ElementTree.write(), etree.fromstringlist(), etree.tostring(), html.soupparser.fromstring(), html.soupparser.parse()
- It is possible to verify release files indeed come from GitHub and not maliciously altered. See Release file attestation for detail.
- Runtime tests support comparing with mypy results, therefore officially making static stub tests obsolete
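The single-string pitfall mentioned above can be reproduced without lxml. In this sketch, shout_lines is a hypothetical helper standing in for any function that expects a collection of strings.

```python
from typing import List, Sequence


def shout_lines(lines: Sequence[str]) -> List[str]:
    """Hypothetical helper that expects a collection of strings."""
    return [line.upper() for line in lines]


# Intended usage: a list (or tuple) of strings
good = shout_lines(["<a>", "</a>"])

# A bare str also satisfies Sequence[str], so type checkers stay silent,
# yet at runtime it is iterated character by character
bad = shout_lines("<a>")
```

Both calls pass a plain Sequence[str] check, which is why a dedicated warning for the second form is useful.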
Bug fixes
- Element tag names, attribute names and attribute values support bytearray. This is discovered via hypothesis testing, which is intended to be utilized in next release
- Compatibility with pyright ⩾ 1.1.378, which imposes additional overload warning for etree.iterparse()
- Use relative import in lxml.ElementInclude, otherwise mypy triggers --install-types behavior.
- ObjectifiedElement __getitem__() and __setitem__() should accept str as key, which behaves mostly like __getattr__() and __setattr__(). That means, elem["foo"] is equivalent to elem.foo for non-repeating subelements.
fixes for etree submodule
- _Element.tag property is not just a str. It is str after initial document or string parsing, but can be set manually to any type supported by tag name and returns the same object.
- When QName is initialized with first argument set to None, _Element can be used as second argument (which is promoted to first argument in implementation)
- Relax single argument usage in _Element.iter*() method family, doesn't need tag= keyword when argument is None
- FunctionNamespace() should generate an _XPathFunctionNamespaceRegistry object, not its superclass
- For decorator usage of _XPathFunctionNamespaceRegistry and _ClassNamespaceRegistry, decorator signature included an extraneous argument, though it doesn't affect any existing correct usage.
- indent() first parameter has wrong name
fixes for html submodule
- soupparser.parse() should accept pathlib.Path object as input
- .value property of SelectElement can't be set to bytes
- .action property of FormElement can have a value of None, and can be set to None. They have different meanings though.
Small and internal changes
- Declare python 3.13 support and perform CI tests.
- Separation of pyright and mypy ignore comments: in previous releases # type: ignore[code] was enabled in pyright settings. Now it only uses # pyright: ignore[code] so mypy comment won't affect pyright behavior.
- Add ._name property to html.FormElement for form name
- Eliminate typing.TypeAlias usage (declared obsolete, and we can do without it)
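As an illustration of the two comment styles, the sketch below silences a deliberate type error once per checker. The variable names are hypothetical, and it assumes pyright is configured not to honour # type: ignore comments (its enableTypeIgnoreComments setting turned off), so each comment affects only one tool.

```python
# Deliberate type error, silenced for mypy only (mypy error code "assignment")
count: int = "three"  # type: ignore[assignment]

# The same kind of error, silenced for pyright only (rule "reportAssignmentType")
total: int = "four"  # pyright: ignore[reportAssignmentType]
```

Under that assumption, pyright would still report the first line and mypy the second, which keeps each checker's suppressions independent.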
Test related changes
- Stub tests migration to runtime:
  - Most of remaining etree._Element methods, now only .makeelement() and .xpath() left in stub test
- Runtime test additions:
  - ElementNamespaceClassLookup()
- tox config migrated to pyproject.toml, thus requiring tox ⩾ 4.22
- Runtime tests are now executed within test-rt folder due to python/mypy#8400
- Some tests need to be performed conditionally when multi-subclass patch is applied
- Some tests or syntaxes need to be turned off to cope with mypy deficiencies
- Usage of Rust-based uv as well as related tox plugin to speed up test environment recreation
- Don't force users to install tox-gh-actions when checking out the repository, it is only useful for GitHub workflows
Docstring additions
- etree submodule: parse(), fromstringlist(), tostring(), indent(), iselement(), adopt_external_document(), DocInfo properties, QName, CData, some exception classes
- html.soupparser submodule: fromstring(), parse(), convert_tree()
2024.09.16
Bug fix and small changes
- Namespace argument in ElementPath methods should allow None (#60 thanks to @cukiernick)
Internal changes
- Perform runtime tests against lxml 5.3
2024.08.07
Breaking changes
- Multiple builds available, with the alternative build enhancing multiple XML subclassing scenario. See relevant README section for detail. Thanks to @scanny for the driving force behind #51.
- Mypy 1.11 required, which introduced backward incompatible @typing.overload changes.
- lxml.html.clean stub deprecated, lxml 5.2.0 completely removes the submodule due to multiple security issues. Corresponding code and type definitions are split into a new independent repo.
Features
- (#56) Replace typing.TypeGuard with typing.TypeIs
- Use callback protocol for more precise element and ElementMaker factory function typing
- lxml.etree.ICONV_COMPILED_VERSION exported since 5.2.2
- Special handling for ObjectifiedElement and HtmlElement in lxml.cssselect.CSSSelector and various cssselect() methods
- html.builder shorthands return more precise element type for certain HTML elements. For example, html.builder.LABEL(), corresponding to <LABEL> tag, yields LabelElement.
- More precise etree.Extension() annotation depending on supplied namespace
- Stricter namespace argument type in _Element ElementPath methods
- For lxml.builder.ElementMaker class:
  - Provide better hint in __call__() argument
  - Accepts namespace tuple in nsmap argument
  - Export private properties
- For lxml.sax module:
  - Export private properties in various classes
  - Explicitly list all inherited methods in ElementTreeContentHandler class, as method argument names are different from superclass ones
- Alert etree.HTMLParser users to remove deprecated strip_cdata argument
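To show what the TypeGuard to TypeIs switch buys, here is a sketch with a hypothetical narrowing helper, is_text, that is not part of lxml or types-lxml. The return annotation is quoted and TypeIs is imported only for type checkers, so the code runs on Python versions that predate typing.TypeIs.

```python
from typing import TYPE_CHECKING, Union

if TYPE_CHECKING:
    # Only type checkers need the real TypeIs
    from typing_extensions import TypeIs


def is_text(value: Union[str, bytes]) -> "TypeIs[str]":
    """Hypothetical narrowing helper."""
    return isinstance(value, str)


def describe(value: Union[str, bytes]) -> str:
    if is_text(value):
        # Positive branch: value is narrowed to str
        return "str of length %d" % len(value)
    # With TypeIs the negative branch is narrowed to bytes as well;
    # TypeGuard would leave it as Union[str, bytes]
    return "bytes of length %d" % len(value)
```

The practical difference is in the else branch: TypeIs narrows both outcomes of the check, whereas TypeGuard only narrows the positive one.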
Bug fix and small changes
- Some _Element related input arguments fixed to use typing.Sequence instead of Iterable, as _Element is already an Iterable itself. Supplying _Element where a proper Iterable is expected would cause problems.
- Similar situation arises for str or bytes in tag selector argument; use typing.Collection to alert user more clearly.
- None can't be used as etree.strip_*() argument
- Some etree.DocInfo read-only properties can't be None
- Fix etree.Resolver method return types
- Avoid exception raising arg combinations in html.html5parser.HTMLParser
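The Iterable issue described in this list can be sketched as follows. The collect_tags helper is hypothetical, and the fallback import of the standard library's xml.etree.ElementTree is an assumption made only so the example runs without lxml; elements iterate over their children in both libraries.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: stdlib ElementTree behaves the same for this simple case
    import xml.etree.ElementTree as etree

from typing import Iterable, List


def collect_tags(elements: Iterable) -> List[str]:
    """Hypothetical helper that expects several elements."""
    return [el.tag for el in elements]


root = etree.fromstring("<root><a/><b/></root>")

# Intended usage: an explicit collection of elements
intended = collect_tags([root])

# A single element is itself iterable, so this is accepted as well,
# but it silently walks the children instead of the element itself
accidental = collect_tags(root)
```

Because a lone element satisfies an Iterable annotation, a stricter parameter type is what lets a type checker separate these two calls.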
Internal changes
- The usual static stub to runtime test migration:
  - Part of basic _Element tests and its find*() methods
  - More extensive _Attrib tests
- Use ruff to replace black and isort as code formatter
- Migrate stub tests to support pytest-mypy-plugins ⩾ 2.0
- Use pdm-backend as build backend due to its more versatile versioning support
2024.04.14
Breaking changes
- Mypy 1.9 is required, dropping 1.5 support. 1.6 - 1.8 was never supported.
- lxml.ElementInclude completely reworked
Features
- PEP 696 support, simplifying usage of some subscripted types (#42)
  - As a convenient side effect, lxml.html parser constructor signatures can be removed
- All annotations do provide default values in their signatures now instead of ...
Bug fix and small changes
- Type of _Comment.text property (and those of similar elements) is always str (#46, thanks to @eemeli)
- Tag selector argument in element iterator methods should support keyword with a single tag (#45, thanks to @eemeli)
- html.fragments_fromstring() should receive same fix as html.html5parser.fragments_fromstring() does (#43, thanks to @Wuestengecko)
- @overload for etree.SubElement() on handling of HtmlElement and ObjectifiedElement
- Some exported constants were missing from lxml.ElementInclude stub
- html.soupparser module functions return type depends on makeelement argument
- Keyword arguments in html.soupparser module functions are explicitly listed now (instead of generic **kwargs before)
- The 2 arguments in html.diff.html_annotate() should align their annotation types
- html.submit_form() return type depends on the result of open_http function argument
- Add missing exported variable for lxml.isoschematron
lxml.isoschematron - Uppercase variants of output method arguments ("HTML", "TEXT", "XML") were dropped
Internal changes
- Usual runtime test additions: lxml.html.soupparser, lxml.ElementInclude, various exported constants
- Runtime tests also do test against lxml 5.2
2024.03.27
Breaking change
- Requires cssselect ⩾ 1.2 for annotation in lxml.cssselect, since cssselect is now inline annotated.
Bug fix and small changes
- Compatibility with pyright ⩾ 1.1.353
- In etree.clean_* functions, first argument (the Element or ElementTree to be processed) must be strictly positional
- etree._LogEntry.filename property is never empty, as it uses the value <string> as fallback
- etree._BaseErrorLog.receive() argument name was wrong
- Self brewed SupportsReadClose protocol dropped, replacing with more standardized SupportsRead
- html.html5parser.parse() should support data stream as input
- html.html5parser.fragments_fromstring() return type is dependent on no_leading_text argument
- encoding arguments in various methods / functions used to only support ASCII and UTF-8 as byte encodings, now the restriction is lifted
- Place some typing usage under python version check (if sys.version_info >= (3, x))
- etree.PyErrorLog constructor shouldn't accept 2 logger arguments simultaneously
- etree.PyErrorLog.level_map property reverted to vanilla type (int) instead of our fake enum
