Use defusedxml in XMLLoader to block entity expansion #4873

Open
biefan wants to merge 1 commit into crewAIInc:main from biefan:codex/secure-xml-loader-with-defusedxml

biefan commented Mar 14, 2026

Summary

This changes XMLLoader to use defusedxml instead of the standard library XML parser so entity expansion is rejected by default.

Problem

The current implementation uses xml.etree.ElementTree, which expands internal entities. A malicious XML payload such as <!DOCTYPE foo [<!ENTITY xxe "EXPANDED">]><root>&xxe;</root> is therefore parsed into normal output instead of being rejected.

That means the loader does not currently fail closed for unsafe XML input, which is exactly the class of issue called out in #4865.
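
The behavior described above can be reproduced with the standard library alone. This is a minimal sketch, not code from the PR; the variable names are illustrative, and the payload is the one quoted in this description.

```python
import xml.etree.ElementTree as ET

# Payload from the description: a DTD that declares one internal entity.
PAYLOAD = '<!DOCTYPE foo [<!ENTITY xxe "EXPANDED">]><root>&xxe;</root>'

root = ET.fromstring(PAYLOAD)

# The stdlib parser substitutes the entity's value instead of raising an error.
expanded_text = root.text
print(expanded_text)
```

Running it prints the entity's replacement text, which shows that the payload is treated as an ordinary document rather than rejected.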

Root Cause

XMLLoader relied on the standard parser and only handled ParseError. Once the parser was switched to defusedxml, dangerous entity payloads raised DefusedXmlException, which also needed to be handled by the loader's existing parse-error fallback path.

Fix

  • add defusedxml as a crewai-tools dependency
  • switch XMLLoader to defusedxml.ElementTree
  • catch DefusedXmlException alongside ParseError
  • add a regression test that verifies internal entities are rejected instead of expanded
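
As a rough illustration of the steps listed above, the parse-and-fallback path might look like the sketch below. It is not the actual XMLLoader code: `load_xml_text`, the returned dictionary shape, and the `root_tag` key are hypothetical stand-ins. Only `defusedxml.ElementTree.fromstring`, `ParseError`, and `DefusedXmlException` are real defusedxml names, and the sketch assumes defusedxml is installed.

```python
from defusedxml import DefusedXmlException
from defusedxml.ElementTree import ParseError, fromstring


def load_xml_text(content: str) -> dict:
    """Hypothetical stand-in for the loader's parse step (not the real implementation)."""
    try:
        root = fromstring(content)
    except (ParseError, DefusedXmlException) as exc:
        # Fail closed: return the raw content and record why parsing was refused.
        return {"content": content, "metadata": {"parse_error": str(exc)}}

    # Parsing succeeded: join the non-empty text nodes of the tree.
    text = " ".join(t.strip() for t in root.itertext() if t.strip())
    return {"content": text, "metadata": {"root_tag": root.tag}}


malicious = '<!DOCTYPE foo [<!ENTITY xxe "EXPANDED">]><root>&xxe;</root>'
rejected = load_xml_text(malicious)
accepted = load_xml_text("<root><a>hi</a></root>")
```

With defusedxml's default settings, the entity declaration in `malicious` raises a `DefusedXmlException` subclass, so `rejected` carries a `parse_error` entry and the untouched input, while `accepted` contains the extracted text.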

Validation

  • uv run pytest lib/crewai-tools/tests/rag/test_xml_loader_security.py -q
  • uv run ruff check lib/crewai-tools/src/crewai_tools/rag/loaders/xml_loader.py lib/crewai-tools/tests/rag/test_xml_loader_security.py

Closes #4865.


Note

Medium Risk
Changes XML parsing behavior by switching to defusedxml, so XML that was previously accepted (e.g., documents whose DTD declares entities) may now be rejected. The change is scoped to the RAG XML loader and improves protection against XXE and entity-expansion attacks.

Overview
Hardens XML ingestion in RAG. XMLLoader now uses defusedxml.ElementTree instead of the stdlib parser and treats DefusedXmlException the same as ParseError, falling back to returning the raw content with parse_error metadata.

Adds defusedxml as a crewai-tools dependency (and updates uv.lock) and includes a regression test ensuring internal entity declarations are rejected rather than expanded.

Written by Cursor Bugbot for commit 7880b36.

biefan changed the title from "[codex] Use defusedxml in XMLLoader to block entity expansion" to "Use defusedxml in XMLLoader to block entity expansion" on Mar 14, 2026