fxprunayre
Member

SHACL validation uses SHACL shapes (https://www.w3.org/TR/shacl/) to validate RDF documents and can be done with the Jena library (https://jena.apache.org/documentation/shacl/), which is already used for the DCAT harvester.

SHACL rules are usually published with DCAT profiles, e.g. https://semiceu.github.io/DCAT-AP/releases/3.0.1/#validation-of-dcat-ap. SHACL is used by various tools such as https://data.europa.eu/mqa/shacl-validator-ui/data-provision or https://www.itb.ec.europa.eu/shacl/dcat-ap/upload, but it is sometimes hard to analyse the errors they report and to use them to improve DCAT compatibility. This work was also an opportunity to learn more about how RDF validation works (e.g. loading shapes, loading ontologies, inferring a model) and to check whether DCAT validation can be made similar to the validation provided by INSPIRE. SHACL validation depends heavily on the validation configuration and on the background knowledge provided, so the results vary a lot with both.

Changes:

  • API / Add endpoints to retrieve testsuites (similar to INSPIRE ones). A testsuite is a collection of shapes and ontologies.
  • API / Add endpoints to run a testsuite on a record
  • Editor / Start a validation and display a simple validation report

Notes:

  • A SHACL testsuite has to be applied to the output of a particular formatter (currently hardcoded to eu-dcat-ap-hvd).
  • SHACL reports (like XSD ones) are quite technical: they point to objects in the RDF graph and are not simple for end users to understand. To mitigate this, Schematron analysis (e.g. Standard / ISO / Schematron for DCAT HVD requirements #8555) could be a way to anticipate SHACL errors by analysing the ISO document, providing better context and hints for each type of rule.
  • It may not be necessary to merge this into main, depending on how many people are interested in the topic. It may also depend on the availability of an "official" online validator that GeoNetwork could use directly.

Checklist

  • I have read the contribution guidelines
  • Pull request provided for main branch, backports managed with label
  • Good housekeeping of code, cleaning up comments, tests, and documentation
  • Clean commit history broken into understandable chunks, avoiding big commits with hundreds of files, cautious of reformatting and whitespace changes
  • Clean commit messages, longer verbose messages are encouraged
  • API Changes are identified in commit messages
  • Testing provided for features or enhancements using automatic tests
  • User documentation provided for new features or enhancements in manual
  • Build documentation provided for development instructions in README.md files
  • Library management using pom.xml dependency management. Update build documentation with intended library use and library tutorials or documentation

Funded by Service Public de Wallonie

sonarqubecloud bot commented Sep 3, 2025

Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 80%)
C Reliability Rating on New Code (required ≥ A)


AbstractMetadata metadata = ApiUtils.canEditRecord(metadataUuid, request);

ServiceContext context = ApiUtils.createServiceContext(request);
return shaclValidationService.validate(formatter, metadata, testsuite, shapeModel, context, acceptHeader, isSavingValidationStatus);

Check warning

Code scanning / CodeQL

Information exposure through an error message Medium

Error information can be exposed to an external user.

Copilot Autofix

AI about 1 month ago

To fix this, catch all exceptions thrown from the shaclValidationService.validate(...) invocation within the controller method. Log the exception internally (with stack trace or message for auditing and debugging), but return a generic error message to the client, avoiding exposure of the original exception message or stack trace.

  • Where: Only change the controller method body in ShaclValidationApi.java, lines 142–146.
  • What:
    • Surround the body with a try-catch block.
    • On exception, log the error (using any available logger, e.g., LoggerFactory or org.apache.log4j.Logger—but only use what we see imported, or add a well-known logger import).
    • Return a generic error message, such as "Validation failed due to an internal error." Optionally set an appropriate HTTP status code (e.g., 500).
    • The controller's signature returns a String (likely serialised as content), so send the generic message as the response.
  • Extra imports: If a logger is not yet available, import org.slf4j.Logger and org.slf4j.LoggerFactory or use another suitable (already imported) logger.
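As a rough illustration of the approach described above, here is a minimal, framework-free sketch. `Response`, `Validation` and `validateSafely` are hypothetical names invented for this example and are not part of GeoNetwork or Spring; `java.util.logging` stands in for whichever logger the project actually uses.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GenericErrorSketch {
    private static final Logger LOGGER = Logger.getLogger(GenericErrorSketch.class.getName());

    /** Hypothetical response holder: an HTTP-like status code plus a body. */
    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    /** Hypothetical stand-in for the validation call; it may throw with internal details. */
    interface Validation {
        String run() throws Exception;
    }

    static Response validateSafely(Validation validation) {
        try {
            return new Response(200, validation.run());
        } catch (Exception ex) {
            // The full exception, including the stack trace, goes to the server log only.
            LOGGER.log(Level.SEVERE, "Exception during SHACL validation", ex);
            // The client receives a fixed message that contains no exception text.
            return new Response(500, "Validation failed due to an internal error.");
        }
    }

    public static void main(String[] args) {
        Response ok = validateSafely(() -> "<report/>");
        Response failed = validateSafely(() -> {
            throw new IllegalStateException("/internal/path/shapes.ttl is not readable");
        });
        System.out.println(ok.status + " " + ok.body);
        System.out.println(failed.status + " " + failed.body);
    }
}
```

Pairing the generic message with an error status lets clients distinguish a failed call from a validation report without seeing any internal detail.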

Suggested changeset 2
services/src/main/java/org/fao/geonet/api/records/ShaclValidationApi.java

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/services/src/main/java/org/fao/geonet/api/records/ShaclValidationApi.java b/services/src/main/java/org/fao/geonet/api/records/ShaclValidationApi.java
--- a/services/src/main/java/org/fao/geonet/api/records/ShaclValidationApi.java
+++ b/services/src/main/java/org/fao/geonet/api/records/ShaclValidationApi.java
@@ -42,6 +42,8 @@
 import org.springframework.beans.factory.annotation.Autowired;
 import org.springframework.http.HttpHeaders;
 import org.springframework.http.HttpStatus;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 import org.springframework.http.MediaType;
 import org.springframework.security.access.prepost.PreAuthorize;
 import org.springframework.stereotype.Controller;
@@ -55,6 +57,8 @@
 
 @RequestMapping(value = {"/{portal}/api/records"})
 @Tag(name = API_CLASS_RECORD_TAG, description = API_CLASS_RECORD_OPS)
+
+    private static final Logger logger = LoggerFactory.getLogger(ShaclValidationApi.class);
 @Controller("shaclValidationApi")
 @PreAuthorize("hasAuthority('Editor')")
 @ReadWriteController
@@ -139,10 +143,16 @@
         @Parameter(hidden = true)
         @RequestHeader(value = HttpHeaders.ACCEPT, defaultValue = MediaType.APPLICATION_XML_VALUE)
         String acceptHeader) throws Exception {
-        AbstractMetadata metadata = ApiUtils.canEditRecord(metadataUuid, request);
+        try {
+            AbstractMetadata metadata = ApiUtils.canEditRecord(metadataUuid, request);
 
-        ServiceContext context = ApiUtils.createServiceContext(request);
-        return shaclValidationService.validate(formatter, metadata, testsuite, shapeModel, context, acceptHeader, isSavingValidationStatus);
+            ServiceContext context = ApiUtils.createServiceContext(request);
+            return shaclValidationService.validate(formatter, metadata, testsuite, shapeModel, context, acceptHeader, isSavingValidationStatus);
+        } catch(Exception ex) {
+            logger.error("Exception during SHACL validation", ex);
+            // Return a generic error message, suppressing internal details
+            return "Validation failed due to an internal error.";
+        }
     }
 
 
EOF

services/pom.xml
Outside changed files

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/services/pom.xml b/services/pom.xml
--- a/services/pom.xml
+++ b/services/pom.xml
@@ -240,7 +240,12 @@
       <groupId>commons-fileupload</groupId>
       <artifactId>commons-fileupload</artifactId>
     </dependency>
-  </dependencies>
+      <dependency>
+        <groupId>org.slf4j</groupId>
+        <artifactId>slf4j-api</artifactId>
+        <version>2.1.0-alpha1</version>
+    </dependency>
+</dependencies>
   <build>
     <plugins>
 <!--
EOF
This fix introduces these dependencies:
Package | Version | Security advisories
org.slf4j:slf4j-api (maven) | 2.1.0-alpha1 | None
Copilot is powered by AI and may make mistakes. Always verify output.
AbstractMetadata metadata = ApiUtils.canEditRecord(metadataUuid, request);

ServiceContext context = ApiUtils.createServiceContext(request);
return shaclValidationService.validate(formatter, metadata, testsuite, shapeModel, context, acceptHeader, isSavingValidationStatus);

Check warning

Code scanning / CodeQL

Cross-site scripting Medium

Cross-site scripting vulnerability due to a user-provided value.

Copilot Autofix

AI about 1 month ago

To resolve this issue, the application should prevent untrusted user input from being reflected directly into output fields in a way that could cause security issues. The most robust solution is to either:

  1. Validate or restrict the possible values of the formatter (and other relevant inputs) using a server-side whitelist (i.e., only permit specific, expected string values); or
  2. If the set of allowed values can’t be strictly enumerated, ensure user-controlled string data is properly escaped when embedded in responses.

Given the context (the formatter parameter refers to a set of possible format names, often documented in the API as dcat, eu-dcat-ap, etc.), a whitelist of allowed names is best and changes very little functional behavior. This should be enforced in validateRecordUsingShacl (or inside validate). If an invalid value is supplied, return a clear error message.

Alternatively (for completeness or if new formatters may be added dynamically), any string values returned to the client (especially those embedded in error/status messages in JSON) should be escaped to prevent the formation of illegal or executable content. For JSON, this is typically handled by constructing your response as a POJO and serializing with a safe library (e.g., Jackson), or at minimum escaping quotes and special characters inside the string.

To implement the whitelist fix:

  • Import a set of allowed values at the top of the API class (or service).
  • Prior to passing formatter to shaclValidationService.validate, check if it is present in the allowed set. If not, throw an exception or return an error JSON stating the allowed values.
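The whitelist check described in these two steps could look roughly like the sketch below. The list of names is an assumption taken from the formatter names mentioned in this comment, not the project's actual formatter registry, and `requireAllowedFormatter` is a hypothetical helper.

```java
import java.util.List;

public class FormatterAllowList {
    // Assumed values, copied from the names mentioned in the comment above.
    static final List<String> ALLOWED_FORMATTERS = List.of("dcat", "eu-dcat-ap", "eu-dcat-ap-hvd");

    /** Returns the formatter unchanged when it is allowed, otherwise throws. */
    static String requireAllowedFormatter(String formatter) {
        // The null check comes first because List.of(...).contains(null) throws NullPointerException.
        if (formatter == null || !ALLOWED_FORMATTERS.contains(formatter)) {
            // The message lists the allowed values only; the rejected input is never echoed back.
            throw new IllegalArgumentException(
                "Invalid formatter provided. Allowed values: " + String.join(", ", ALLOWED_FORMATTERS));
        }
        return formatter;
    }

    public static void main(String[] args) {
        System.out.println(requireAllowedFormatter("eu-dcat-ap-hvd"));
        try {
            requireAllowedFormatter("<script>alert(1)</script>");
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

Because the error text is built only from server-side constants, no user-controlled value can reach the response through this path.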

To implement escaping as a defense-in-depth:

  • In buildStatusResponse, escape special characters in the message parameter using a library like org.apache.commons.text.StringEscapeUtils.escapeJson() before including them in the response string.
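To show what the escaping step changes, the sketch below uses a small hand-written escaper as a stand-in for a library call such as `StringEscapeUtils.escapeJson()`. It is an assumption made to keep the example dependency-free and is not equivalent to the library in every detail; only the `buildStatusResponse` format string is taken from the code under review.

```java
public class JsonStatusSketch {
    /** Minimal JSON string escaper, written for illustration only. */
    static String escapeJson(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':
                    out.append("\\\"");
                    break;
                case '\\':
                    out.append("\\\\");
                    break;
                case '\n':
                    out.append("\\n");
                    break;
                case '\r':
                    out.append("\\r");
                    break;
                case '\t':
                    out.append("\\t");
                    break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become a six-character unicode escape.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    static String buildStatusResponse(String message, boolean isValid) {
        return String.format("{\"valid\": %s, \"message\": \"%s\"}", isValid, escapeJson(message));
    }

    public static void main(String[] args) {
        // Without escaping, the quote in the message would end the JSON string early.
        System.out.println(buildStatusResponse("Unknown formatter \"x\", \"injected\": true", false));
    }
}
```

Building the response as an object and serialising it with a JSON library, as the comment also suggests, avoids hand-assembled strings altogether.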

We will implement BOTH: adding a whitelist check for formatter in the API, and ensuring messages are safely encoded in responses (using Apache Commons Text for JSON escape).


Suggested changeset 3
services/src/main/java/org/fao/geonet/api/records/ShaclValidationApi.java

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/services/src/main/java/org/fao/geonet/api/records/ShaclValidationApi.java b/services/src/main/java/org/fao/geonet/api/records/ShaclValidationApi.java
--- a/services/src/main/java/org/fao/geonet/api/records/ShaclValidationApi.java
+++ b/services/src/main/java/org/fao/geonet/api/records/ShaclValidationApi.java
@@ -63,6 +63,9 @@
     @Autowired
     LanguageUtils languageUtils;
 
+    // Allowed formatter names. Keep in sync with your supported formatters!
+    private static final List<String> ALLOWED_FORMATTERS = List.of("dcat", "eu-dcat-ap", "eu-dcat-ap-hvd");
+
     @Autowired
     ShaclValidationService shaclValidationService;
 
@@ -142,6 +145,10 @@
         AbstractMetadata metadata = ApiUtils.canEditRecord(metadataUuid, request);
 
         ServiceContext context = ApiUtils.createServiceContext(request);
+        if (!ALLOWED_FORMATTERS.contains(formatter)) {
+            // Defensive: respond with error JSON if unknown formatter.
+            return "{\"valid\": false, \"message\": \"Invalid formatter provided. Allowed values: " + String.join(", ", ALLOWED_FORMATTERS) + "\"}";
+        }
         return shaclValidationService.validate(formatter, metadata, testsuite, shapeModel, context, acceptHeader, isSavingValidationStatus);
     }
 
EOF

services/pom.xml
Outside changed files

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/services/pom.xml b/services/pom.xml
--- a/services/pom.xml
+++ b/services/pom.xml
@@ -240,7 +240,12 @@
       <groupId>commons-fileupload</groupId>
       <artifactId>commons-fileupload</artifactId>
     </dependency>
-  </dependencies>
+      <dependency>
+        <groupId>org.apache.commons</groupId>
+        <artifactId>commons-text</artifactId>
+        <version>1.14.0</version>
+    </dependency>
+</dependencies>
   <build>
     <plugins>
 <!--
EOF
services/src/main/java/org/fao/geonet/api/records/ShaclValidationService.java
Outside changed files

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/services/src/main/java/org/fao/geonet/api/records/ShaclValidationService.java b/services/src/main/java/org/fao/geonet/api/records/ShaclValidationService.java
--- a/services/src/main/java/org/fao/geonet/api/records/ShaclValidationService.java
+++ b/services/src/main/java/org/fao/geonet/api/records/ShaclValidationService.java
@@ -14,6 +14,7 @@
 import jeeves.server.context.ServiceContext;
 import org.apache.commons.codec.digest.DigestUtils;
 import org.apache.commons.lang3.StringUtils;
+import org.apache.commons.text.StringEscapeUtils;
 import org.apache.jena.graph.Node;
 import org.apache.jena.graph.compose.MultiUnion;
 import org.apache.jena.rdf.model.Model;
@@ -171,7 +172,8 @@
     }
 
     private static String buildStatusResponse(String message, boolean isValid) {
-        return String.format("{\"valid\": %s, \"message\": \"%s\"}", isValid, message);
+        // Defensive: escape message for JSON to avoid XSS
+        return String.format("{\"valid\": %s, \"message\": \"%s\"}", isValid, StringEscapeUtils.escapeJson(message));
     }
 
     private static String buildValidationReportKey(String formatter, String testsuite, List<String> shaclShapes) {
EOF
This fix introduces these dependencies:
Package | Version | Security advisories
org.apache.commons:commons-text (maven) | 1.14.0 | None
MultiUnion shapesGraph = new MultiUnion();
for (String shaclFile : shaclFiles) {
Path shaclPath = dataDirectory.getConfigDir().resolve("shacl").resolve(shaclFile);
if (!Files.exists(shaclPath)) {

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value.

Copilot Autofix

AI about 1 month ago

To fix this issue, user-supplied shape file names in shaclFiles should be strictly validated to prevent path traversal and ensure they are only file names (single path component), with no path separators or ".." references. The best fix is to check each supplied file name for forbidden components ("/", "\", "..") before using it to construct a resolved Path. Alternatively or additionally, after resolution, we can check that the canonical path of the resolved file remains within the intended shacl directory. Implement this check at the start of parseShapesFromFiles. Reject any file names failing the check with a clear error.

Changes needed:

  • In parseShapesFromFiles, validate each shaclFile in shaclFiles to ensure it:
    • Does not contain "/" or "\" or "..".
    • After resolution, remains strictly inside shaclRulesFolder.
  • If any check fails, throw an exception.

No new methods or imports required other than possibly java.nio.file.Paths and/or java.io.IOException.
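The two checks listed above can be tried in isolation with the sketch below. `resolveShapeFile` is a hypothetical helper and the `config/shacl` folder in `main` is an assumed example location, not the real data directory layout.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ShapePathSketch {
    /** Resolves a shape file name inside the rules folder, rejecting anything that could escape it. */
    static Path resolveShapeFile(Path shaclRulesFolder, String shaclFile) {
        // Check 1: the name must be a single path component.
        if (shaclFile.contains("/") || shaclFile.contains("\\") || shaclFile.contains("..")) {
            throw new IllegalArgumentException("Invalid SHACL shape file name: " + shaclFile);
        }
        Path base = shaclRulesFolder.toAbsolutePath().normalize();
        Path resolved = base.resolve(shaclFile).normalize();
        // Check 2: after normalisation the result must still sit under the rules folder.
        if (!resolved.startsWith(base)) {
            throw new IllegalArgumentException("SHACL shape file escapes rules folder: " + shaclFile);
        }
        return resolved;
    }

    public static void main(String[] args) {
        Path rules = Paths.get("config", "shacl"); // assumed location, for illustration only
        System.out.println(resolveShapeFile(rules, "dcat-ap-shapes.ttl").getFileName());
        try {
            resolveShapeFile(rules, "../../etc/passwd");
        } catch (IllegalArgumentException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```

`Path.startsWith` compares whole path elements, so a sibling folder such as `shacl-other` does not count as being inside `shacl`, which a plain string prefix test would get wrong.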

Suggested changeset 1
services/src/main/java/org/fao/geonet/api/records/ShaclValidationService.java

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/services/src/main/java/org/fao/geonet/api/records/ShaclValidationService.java b/services/src/main/java/org/fao/geonet/api/records/ShaclValidationService.java
--- a/services/src/main/java/org/fao/geonet/api/records/ShaclValidationService.java
+++ b/services/src/main/java/org/fao/geonet/api/records/ShaclValidationService.java
@@ -149,8 +149,17 @@
 
     private Shapes parseShapesFromFiles(List<String> shaclFiles) {
         MultiUnion shapesGraph = new MultiUnion();
+        Path shaclRulesFolder = dataDirectory.getConfigDir().resolve("shacl").normalize().toAbsolutePath();
         for (String shaclFile : shaclFiles) {
-            Path shaclPath = dataDirectory.getConfigDir().resolve("shacl").resolve(shaclFile);
+            // Validate shaclFile: must not contain path separators or ".."
+            if (shaclFile.contains("/") || shaclFile.contains("\\") || shaclFile.contains("..")) {
+                throw new IllegalArgumentException("Invalid SHACL shape file name: " + shaclFile);
+            }
+            Path shaclPath = shaclRulesFolder.resolve(shaclFile).normalize().toAbsolutePath();
+            // Ensure resolved file is inside shaclRulesFolder
+            if (!shaclPath.startsWith(shaclRulesFolder)) {
+                throw new IllegalArgumentException("SHACL shape file escapes rules folder: " + shaclFile);
+            }
             if (!Files.exists(shaclPath)) {
                 throw new IllegalArgumentException("SHACL shape file not found: " + shaclPath);
             }
EOF