Skip to content

Conversation

josegar74
Copy link
Member

@josegar74 josegar74 commented Sep 29, 2025

XML documents that begin with a Byte Order Mark (BOM) cause the following exception:

org.jdom.JDOMException: Error occurred while trying to load an xml file: ...: Error on line 1: Content is not allowed in prolog.
at org.fao.geonet.utils.Xml.loadFile(Xml.java:232) ~[gn-common-4.2.13-0.jar:?]

This change request checks whether the XML document begins with a BOM and, if so, removes it so that the XML load works as intended.

Checklist

  • I have read the contribution guidelines
  • Pull request provided for main branch, backports managed with label
  • Good housekeeping of code, cleaning up comments, tests, and documentation
  • Clean commit history broken into understandable chucks, avoiding big commits with hundreds of files, cautious of reformatting and whitespace changes
  • Clean commit messages, longer verbose messages are encouraged
  • API Changes are identified in commit messages
  • Testing provided for features or enhancements using automatic tests
  • User documentation provided for new features or enhancements in manual
  • Build documentation provided for development instructions in README.md files
  • Library management using pom.xml dependency management. Update build documentation with intended library use and library tutorials or documentation

@josegar74 josegar74 added this to the 4.4.10 milestone Sep 29, 2025
@josegar74 josegar74 requested a review from ianwallen September 29, 2025 10:59
@CLAassistant
Copy link

CLAassistant commented Sep 29, 2025

CLA assistant check
All committers have signed the CLA.

@josegar74 josegar74 marked this pull request as ready for review September 29, 2025 13:26
SAXBuilder builder = getSAXBuilderWithPathXMLResolver(false, null); //new SAXBuilder();
builder.setFeature("http://apache.org/xml/features/validation/schema", false);
builder.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
Document jdoc = builder.build(f);

Check failure

Code scanning / CodeQL

Resolving XML external entity in user-controlled data Critical

XML parsing depends on a
user-provided value
without guarding against external entity expansion.
XML parsing depends on a
user-provided value
without guarding against external entity expansion.
XML parsing depends on a
user-provided value
without guarding against external entity expansion.
XML parsing depends on a
user-provided value
without guarding against external entity expansion.
XML parsing depends on a
user-provided value
without guarding against external entity expansion.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@josegar74 hey, it looks like this was not introduced in the PR but already present before. Do you have an idea how to mitigate this? thank you!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jahow I see in https://codeql.github.com/codeql-query-help/java/java-xxe/ that the good example uses:

factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);

If it sounds fine, maybe we can create another pull request to test this properly as I am not sure if can cause side effects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jahow I have created #9066 to solve this issue, so we can test the suggested change in https://codeql.github.com/codeql-query-help/java/java-xxe/ properly.

I've done some tests, for example using the CSW harvester and looks fine, but more testing is required.

@josegar74 josegar74 merged commit 812168c into geonetwork:main Oct 9, 2025
5 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants