fix: replace xml.etree.ElementTree with defusedxml to prevent XXE attacks#4866
fix: replace xml.etree.ElementTree with defusedxml to prevent XXE attacks#4866devin-ai-integration[bot] wants to merge 3 commits intomainfrom
Conversation
…acks Addresses #4865 - The native Python xml library is vulnerable to XML External Entity (XXE) attacks that can leak confidential data and XML bombs that can cause denial of service. Changes: - Replace xml.etree.ElementTree with defusedxml.ElementTree in xml_loader.py - Replace xml.etree.ElementTree with defusedxml.ElementTree in arxiv_paper_tool.py - Add defusedxml~=0.7.1 as a dependency in crewai-tools pyproject.toml - Update arxiv_paper_tool_test.py to use defusedxml - Replace WebPageLoader tests in test_xml_loader.py with proper XMLLoader tests - Add XXE attack tests (entity expansion, billion laughs, parameter entities) - Remove noqa: S314 comments since defusedxml is safe Co-Authored-By: João <[email protected]>
|
Prompt hidden (unlisted session) |
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
Replace 'import defusedxml.ElementTree as ET' with explicit imports (fromstring, ParseError, Element) to satisfy ruff N817 rule that flags CamelCase imported as acronym. Co-Authored-By: João <[email protected]>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
| root = fromstring(content) | ||
| else: | ||
| root = parse(source_ref).getroot() # noqa: S314 | ||
| root = parse(source_ref).getroot() |
There was a problem hiding this comment.
Exception handler won't catch defusedxml security exceptions
High Severity
defusedxml raises EntitiesForbidden or DTDForbidden when blocking XXE attacks, which inherit from DefusedXmlException → ValueError. The except ParseError handler in _parse_xml won't catch these because ParseError inherits from SyntaxError — a completely separate hierarchy. Malicious XML will cause an unhandled exception crash instead of returning a graceful LoaderResult with parse_error metadata. The test comment claiming EntitiesForbidden "is a subclass of ParseError" is incorrect.
Additional Locations (1)
Co-Authored-By: João <[email protected]>


fix: replace xml.etree.ElementTree with defusedxml to prevent XXE attacks
Summary
Addresses #4865 — The native Python
xmllibrary is vulnerable to XML External Entity (XXE) attacks that can leak confidential data, and "XML bombs" (Billion Laughs) that can cause denial of service.This PR replaces all usage of
xml.etree.ElementTreewithdefusedxml.ElementTreeacross two source files:lib/crewai-tools/src/crewai_tools/rag/loaders/xml_loader.pylib/crewai-tools/src/crewai_tools/tools/arxiv_paper_tool/arxiv_paper_tool.pydefusedxml~=0.7.1is added as a core dependency tocrewai-tools. The# noqa: S314suppression comments are removed sincedefusedxmldoes not trigger the Bandit S314 rule.The test file
tests/rag/test_xml_loader.pypreviously contained misplaced WebPageLoader tests (not XMLLoader tests). It has been rewritten with actual XMLLoader tests, including XXE attack vector coverage (entity expansion, billion laughs, parameter entities, file-based XXE).Review & Testing Checklist for Human
test_xml_loader.pypreviously containedTestWebPageLoadertests that were fully replaced. Confirm these tests are duplicated in another test file, or they need to be moved/preserved.defusedxml.ElementTreeexposesET.Element—arxiv_paper_tool.pyusesET.Elementin type hints (lines 124, 128).defusedxmlwraps stdlib but confirm this still resolves correctly.uv.lockupdates are handled by CI — The lock file was not updated locally due to a pre-existing parse error inuv.lock. CI should regenerate it.defusedxmldrop-in replacement works end-to-end.Notes
Note
Medium Risk
Changes XML parsing in
XMLLoaderandArxivPaperToolto usedefusedxml, which can alter parsing/error behavior and impacts data ingestion paths. Adds a new core dependency and replaces/expands tests, so regressions would show up at runtime if XML handling differs from stdlib.Overview
Hardens XML parsing against XXE/XML-bomb attacks by switching
XMLLoaderandArxivPaperToolfromxml.etree.ElementTreetodefusedxml.ElementTree(and removing the associatedS314suppressions).Adds
defusedxml~=0.7.1as a core dependency and rewritestest_xml_loader.pyto cover realXMLLoaderbehavior, including URL/file loading, parse-error fallback, doc-id consistency, and explicit XXE/billion-laughs blocking; updates the Arxiv tool test to expectdefusedxmlParseError.Written by Cursor Bugbot for commit a74df94. This will update automatically on new commits. Configure here.