Validation / Add Shacl validator #9006
base: main
SHACL validation uses [SHACL](https://www.w3.org/TR/shacl/) shapes to validate RDF documents and can be done with the [Jena library](https://jena.apache.org/documentation/shacl/) (the library used by the DCAT harvester). SHACL is used by various tools such as https://data.europa.eu/mqa/shacl-validator-ui/data-provision or https://www.itb.ec.europa.eu/shacl/dcat-ap/upload, but with these tools it is sometimes hard to analyse errors and easily improve DCAT compatibility. This work was also an opportunity to learn more about how RDF validation works (e.g. loading shapes, loading ontologies, inferring a model) and whether a simpler yes/no validation, like the one provided for INSPIRE, can be achieved for DCAT. SHACL validation depends heavily on the validation configuration and on the background knowledge provided, so results can vary a lot depending on them.

Changes:

* API / Add endpoints to retrieve testsuites (similar to the INSPIRE ones)
* API / Add endpoints to run a testsuite on a record
* Editor / Start a validation and display a simple validation report
```java
AbstractMetadata metadata = ApiUtils.canEditRecord(metadataUuid, request);

ServiceContext context = ApiUtils.createServiceContext(request);
return shaclValidationService.validate(formatter, metadata, testsuite, shapeModel, context, acceptHeader, isSavingValidationStatus);
```
Check warning: Code scanning / CodeQL
Information exposure through an error message (Medium)
Error information
Copilot Autofix (AI, about 1 month ago)
To fix this, catch all exceptions thrown from the `shaclValidationService.validate(...)` invocation within the controller method. Log the exception internally (with stack trace or message, for auditing and debugging), but return a generic error message to the client so that the original exception message or stack trace is not exposed.

- Where: only change the controller method body in `ShaclValidationApi.java`, lines 142–146.
- What:
  - Surround the body with a try-catch block.
  - On exception, log the error using an available logger (e.g. `LoggerFactory` or `org.apache.log4j.Logger`), but only use what is already imported, or add a well-known logger import.
  - Return a generic error message, such as "Validation failed due to an internal error.", and optionally set an appropriate HTTP status code (e.g. 500).
  - The controller's signature returns a `String` (likely serialised as the response content), so send the generic message as the response.
- Extra imports: if a logger is not yet available, import `org.slf4j.Logger` and `org.slf4j.LoggerFactory`, or use another suitable, already imported logger.
```diff
@@ -42,6 +42,8 @@
 import org.springframework.beans.factory.annotation.Autowired;
 import org.springframework.http.HttpHeaders;
 import org.springframework.http.HttpStatus;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 import org.springframework.http.MediaType;
 import org.springframework.security.access.prepost.PreAuthorize;
 import org.springframework.stereotype.Controller;
@@ -55,6 +57,8 @@
 
 @RequestMapping(value = {"/{portal}/api/records"})
 @Tag(name = API_CLASS_RECORD_TAG, description = API_CLASS_RECORD_OPS)
+
+private static final Logger logger = LoggerFactory.getLogger(ShaclValidationApi.class);
 @Controller("shaclValidationApi")
 @PreAuthorize("hasAuthority('Editor')")
 @ReadWriteController
@@ -139,10 +143,16 @@
     @Parameter(hidden = true)
     @RequestHeader(value = HttpHeaders.ACCEPT, defaultValue = MediaType.APPLICATION_XML_VALUE)
     String acceptHeader) throws Exception {
-    AbstractMetadata metadata = ApiUtils.canEditRecord(metadataUuid, request);
+    try {
+        AbstractMetadata metadata = ApiUtils.canEditRecord(metadataUuid, request);
 
-    ServiceContext context = ApiUtils.createServiceContext(request);
-    return shaclValidationService.validate(formatter, metadata, testsuite, shapeModel, context, acceptHeader, isSavingValidationStatus);
+        ServiceContext context = ApiUtils.createServiceContext(request);
+        return shaclValidationService.validate(formatter, metadata, testsuite, shapeModel, context, acceptHeader, isSavingValidationStatus);
+    } catch(Exception ex) {
+        logger.error("Exception during SHACL validation", ex);
+        // Return a generic error message, suppressing internal details
+        return "Validation failed due to an internal error.";
+    }
 }
```
```diff
@@ -240,7 +240,12 @@
         <groupId>commons-fileupload</groupId>
         <artifactId>commons-fileupload</artifactId>
     </dependency>
-</dependencies>
+    <dependency>
+        <groupId>org.slf4j</groupId>
+        <artifactId>slf4j-api</artifactId>
+        <version>2.1.0-alpha1</version>
+    </dependency>
+</dependencies>
 <build>
     <plugins>
         <!--
```
| Package | Version | Security advisories |
| --- | --- | --- |
| org.slf4j:slf4j-api (maven) | 2.1.0-alpha1 | None |
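The catch-log-and-hide pattern described above can be sketched in plain Java so that it runs without Spring or SLF4J. This is a minimal sketch under assumptions: `SafeInvoker` and `SafeResult` are hypothetical names, `java.util.logging` stands in for the SLF4J logger, and the integer status only mimics the optional HTTP 500 from the suggestion (in a Spring controller a `ResponseEntity` would be the more likely carrier). The static logger is declared inside the class body, which is where Java requires field declarations to go.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical holder for an HTTP-like status code and the response body.
record SafeResult(int status, String body) {}

// Hypothetical helper showing the catch-log-and-hide pattern from the autofix.
final class SafeInvoker {
    // Static logger field declared inside the class body.
    // java.util.logging stands in for SLF4J so the sketch has no dependencies.
    private static final Logger LOGGER = Logger.getLogger(SafeInvoker.class.getName());

    static final String GENERIC_MESSAGE = "Validation failed due to an internal error.";

    private SafeInvoker() {}

    // Runs the action and returns 200 with its output. On any exception, the full
    // stack trace is logged server-side and only a generic 500 message is returned,
    // so the exception text never reaches the client.
    static SafeResult invoke(Callable<String> action) {
        try {
            return new SafeResult(200, action.call());
        } catch (Exception ex) {
            LOGGER.log(Level.SEVERE, "Exception during SHACL validation", ex);
            return new SafeResult(500, GENERIC_MESSAGE);
        }
    }
}
```

In the controller this would wrap the existing call, e.g. `SafeInvoker.invoke(() -> shaclValidationService.validate(formatter, metadata, testsuite, shapeModel, context, acceptHeader, isSavingValidationStatus))`, keeping exception details in the server log only.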
```java
AbstractMetadata metadata = ApiUtils.canEditRecord(metadataUuid, request);

ServiceContext context = ApiUtils.createServiceContext(request);
return shaclValidationService.validate(formatter, metadata, testsuite, shapeModel, context, acceptHeader, isSavingValidationStatus);
```
Check warning: Code scanning / CodeQL
Cross-site scripting (Medium)
Cross-site scripting vulnerability due to a user-provided value
Copilot Autofix (AI, about 1 month ago)
To resolve this issue, the application should prevent untrusted user input from being reflected directly into output in a way that could cause security issues. The most robust solution is to either:

- validate or restrict the possible values of `formatter` (and other relevant inputs) using a server-side whitelist, i.e. only permit specific, expected string values; or
- if the set of allowed values cannot be strictly enumerated, ensure user-controlled string data is properly escaped when embedded in responses.

Given the context (the `formatter` parameter refers to a set of possible format names, often documented in the API as `dcat`, `eu-dcat-ap`, etc.), a whitelist of allowed names is best and changes very little functional behavior. This should be enforced in `validateRecordUsingShacl` (or inside `validate`). If an invalid value is supplied, return a clear error message.

Alternatively (for completeness, or if new formatters may be added dynamically), any string values returned to the client, especially those embedded in error or status messages in JSON, should be escaped to prevent the formation of illegal or executable content. For JSON, this is typically handled by constructing the response as a POJO and serializing it with a safe library (e.g. Jackson), or at minimum by escaping quotes and special characters inside the string.

To implement the whitelist fix:

- Define a set of allowed values at the top of the API class (or service).
- Before passing `formatter` to `shaclValidationService.validate`, check whether it is present in the allowed set. If not, throw an exception or return an error JSON stating the allowed values.

To implement escaping as defense in depth:

- In `buildStatusResponse`, escape special characters in the `message` parameter using a library such as `org.apache.commons.text.StringEscapeUtils.escapeJson()` before including them in the response string.

We will implement BOTH: adding a whitelist check for `formatter` in the API, and ensuring messages are safely encoded in responses (using Apache Commons Text for JSON escaping).
```diff
@@ -63,6 +63,9 @@
 @Autowired
 LanguageUtils languageUtils;
 
+// Allowed formatter names. Keep in sync with your supported formatters!
+private static final List<String> ALLOWED_FORMATTERS = List.of("dcat", "eu-dcat-ap", "eu-dcat-ap-hvd");
+
 @Autowired
 ShaclValidationService shaclValidationService;
 
@@ -142,6 +145,10 @@
 AbstractMetadata metadata = ApiUtils.canEditRecord(metadataUuid, request);
 
 ServiceContext context = ApiUtils.createServiceContext(request);
+if (!ALLOWED_FORMATTERS.contains(formatter)) {
+    // Defensive: respond with error JSON if unknown formatter.
+    return "{\"valid\": false, \"message\": \"Invalid formatter provided. Allowed values: " + String.join(", ", ALLOWED_FORMATTERS) + "\"}";
+}
 return shaclValidationService.validate(formatter, metadata, testsuite, shapeModel, context, acceptHeader, isSavingValidationStatus);
 }
```
```diff
@@ -240,7 +240,12 @@
         <groupId>commons-fileupload</groupId>
         <artifactId>commons-fileupload</artifactId>
     </dependency>
-</dependencies>
+    <dependency>
+        <groupId>org.apache.commons</groupId>
+        <artifactId>commons-text</artifactId>
+        <version>1.14.0</version>
+    </dependency>
+</dependencies>
 <build>
     <plugins>
         <!--
```
```diff
@@ -14,6 +14,7 @@
 import jeeves.server.context.ServiceContext;
 import org.apache.commons.codec.digest.DigestUtils;
 import org.apache.commons.lang3.StringUtils;
+import org.apache.commons.text.StringEscapeUtils;
 import org.apache.jena.graph.Node;
 import org.apache.jena.graph.compose.MultiUnion;
 import org.apache.jena.rdf.model.Model;
@@ -171,7 +172,8 @@
 }
 
 private static String buildStatusResponse(String message, boolean isValid) {
-    return String.format("{\"valid\": %s, \"message\": \"%s\"}", isValid, message);
+    // Defensive: escape message for JSON to avoid XSS
+    return String.format("{\"valid\": %s, \"message\": \"%s\"}", isValid, StringEscapeUtils.escapeJson(message));
 }
 
 private static String buildValidationReportKey(String formatter, String testsuite, List<String> shaclShapes) {
```
| Package | Version | Security advisories |
| --- | --- | --- |
| org.apache.commons:commons-text (maven) | 1.14.0 | None |
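Both defences from the suggestion, the allow-list and JSON escaping of status messages, can also be sketched together without the Commons Text dependency. This is illustrative only: `StatusResponses` is a hypothetical helper, the allow-list reuses the formatter names from the diff, and `escapeJson` here is a minimal hand-rolled escaper for quotes, backslashes and control characters (the characters RFC 8259 requires to be escaped in strings), not a reimplementation of `StringEscapeUtils.escapeJson`, which escapes some additional characters.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper combining the allow-list check and JSON-safe status messages.
final class StatusResponses {
    // Formatter names taken from the autofix diff; keep in sync with the real formatters.
    static final Set<String> ALLOWED_FORMATTERS = Set.of("dcat", "eu-dcat-ap", "eu-dcat-ap-hvd");

    private StatusResponses() {}

    // Set.of(...).contains(null) throws NullPointerException, so check for null first.
    static boolean isAllowedFormatter(String formatter) {
        return formatter != null && ALLOWED_FORMATTERS.contains(formatter);
    }

    // Minimal JSON string escaping: quotes, backslashes and control characters.
    static String escapeJson(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"' -> out.append("\\\"");
                case '\\' -> out.append("\\\\");
                case '\n' -> out.append("\\n");
                case '\r' -> out.append("\\r");
                case '\t' -> out.append("\\t");
                default -> {
                    if (c < 0x20) {
                        // Remaining control characters become a four-digit hex escape.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
                }
            }
        }
        return out.toString();
    }

    static String buildStatusResponse(String message, boolean isValid) {
        return String.format("{\"valid\": %s, \"message\": \"%s\"}", isValid, escapeJson(message));
    }

    // Allowed values are sorted so the message is stable (Set.of has no defined order).
    static String invalidFormatterResponse() {
        return buildStatusResponse(
            "Invalid formatter provided. Allowed values: " + String.join(", ", new TreeSet<>(ALLOWED_FORMATTERS)),
            false);
    }
}
```

Building the error through `buildStatusResponse` keeps a single escaping path for every message. Serializing a small POJO with Jackson, as the suggestion mentions, would remove the need for manual escaping altogether.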
```java
MultiUnion shapesGraph = new MultiUnion();
for (String shaclFile : shaclFiles) {
    Path shaclPath = dataDirectory.getConfigDir().resolve("shacl").resolve(shaclFile);
    if (!Files.exists(shaclPath)) {
```
Check failure: Code scanning / CodeQL
Uncontrolled data used in path expression (High)
User-provided value
Copilot Autofix (AI, about 1 month ago)
To fix this issue, user-supplied shape file names in `shaclFiles` should be strictly validated to prevent path traversal and to ensure they are only file names (a single path component), with no path separators or `..` references. The best fix is to check each supplied file name for forbidden components (`/`, `\`, `..`) before using it to construct a resolved `Path`. Alternatively or additionally, after resolution, we can check that the canonical path of the resolved file remains within the intended `shacl` directory. Implement this check at the start of `parseShapesFromFiles` and reject any file name that fails it with a clear error.

Changes needed:

- In `parseShapesFromFiles`, validate each `shaclFile` in `shaclFiles` to ensure it:
  - does not contain `/`, `\` or `..`;
  - after resolution, remains strictly inside `shaclRulesFolder`.
- If any check fails, throw an exception.

No new methods or imports are required, other than possibly `java.nio.file.Paths` and/or `java.io.IOException`.
```diff
@@ -149,8 +149,17 @@
 
 private Shapes parseShapesFromFiles(List<String> shaclFiles) {
     MultiUnion shapesGraph = new MultiUnion();
+    Path shaclRulesFolder = dataDirectory.getConfigDir().resolve("shacl").normalize().toAbsolutePath();
     for (String shaclFile : shaclFiles) {
-        Path shaclPath = dataDirectory.getConfigDir().resolve("shacl").resolve(shaclFile);
+        // Validate shaclFile: must not contain path separators or ".."
+        if (shaclFile.contains("/") || shaclFile.contains("\\") || shaclFile.contains("..")) {
+            throw new IllegalArgumentException("Invalid SHACL shape file name: " + shaclFile);
+        }
+        Path shaclPath = shaclRulesFolder.resolve(shaclFile).normalize().toAbsolutePath();
+        // Ensure resolved file is inside shaclRulesFolder
+        if (!shaclPath.startsWith(shaclRulesFolder)) {
+            throw new IllegalArgumentException("SHACL shape file escapes rules folder: " + shaclFile);
+        }
         if (!Files.exists(shaclPath)) {
             throw new IllegalArgumentException("SHACL shape file not found: " + shaclPath);
         }
```
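The two path checks can also be isolated in a small, testable helper. This is a sketch rather than the suggested code: `ShaclPathResolver.resolveShapeFile` is a hypothetical name, and it deviates from the diff in three deliberate ways. It makes the folder absolute before normalising it, it requires the result to be a direct child of the folder rather than merely starting with it, and its not-found message echoes only the requested name instead of the full server path.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper isolating the two path checks from the autofix.
final class ShaclPathResolver {
    private ShaclPathResolver() {}

    // Returns the absolute path of a SHACL file inside rulesFolder, or throws
    // IllegalArgumentException if the name is not a plain file name, escapes the
    // folder after normalisation, or does not point to an existing regular file.
    static Path resolveShapeFile(Path rulesFolder, String shaclFile) {
        // First check: a plain, non-empty file name without separators or "..".
        if (shaclFile == null || shaclFile.isBlank()
                || shaclFile.contains("/") || shaclFile.contains("\\") || shaclFile.contains("..")) {
            throw new IllegalArgumentException("Invalid SHACL shape file name: " + shaclFile);
        }
        // Make the folder absolute before normalising so relative ".." segments collapse too.
        Path folder = rulesFolder.toAbsolutePath().normalize();
        Path candidate = folder.resolve(shaclFile).normalize();
        // Second check: the normalised result must be a direct child of the folder.
        if (!folder.equals(candidate.getParent())) {
            throw new IllegalArgumentException("SHACL shape file escapes rules folder: " + shaclFile);
        }
        // Report only the requested name, not the full server-side path.
        if (!Files.isRegularFile(candidate)) {
            throw new IllegalArgumentException("SHACL shape file not found: " + shaclFile);
        }
        return candidate;
    }
}
```

Symbolic links inside the rules folder are not resolved here; comparing `Path.toRealPath()` results would be needed if that matters for the deployment.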
SHACL validation uses SHACL shapes to validate RDF documents and can be done with the Jena library (the library used by the DCAT harvester).

SHACL rules are usually published with DCAT profiles, e.g. https://semiceu.github.io/DCAT-AP/releases/3.0.1/#validation-of-dcat-ap. SHACL is used by various tools such as https://data.europa.eu/mqa/shacl-validator-ui/data-provision or https://www.itb.ec.europa.eu/shacl/dcat-ap/upload, but with these tools it is sometimes hard to analyse errors and easily improve DCAT compatibility. This work was also an opportunity to learn more about how RDF validation works (e.g. loading shapes, loading ontologies, inferring a model) and whether DCAT validation can work like the validation provided for INSPIRE. SHACL validation depends heavily on the validation configuration and on the background knowledge provided, so results can vary a lot depending on them.
Changes:

Notes:

… `eu-dcat-ap-hvd` …

Checklist:

- … `main` branch, backports managed with label
- … `README.md` files
- … `pom.xml` dependency management. Update build documentation with intended library use and library tutorials or documentation

Funded by Service Public de Wallonie