tylerjmchugh
Contributor

Currently, the report returned for batch validation is missing useful information and includes unused fields. This is a problem for users who want to validate multiple records, because they must open the editor for each record to see the full schematron validation results.

This PR aims to fix this issue by creating a new MetadataValidationProcessingReport class and gnBatchValidationReport directive.

The new report adds fields that were previously missing:

  • validMetadata + count
  • invalidMetadata + count
  • metadataWithWarnings + count
  • validationErrors
  • validationWarnings

The new report excludes unused ProcessingReport fields like:

  • numberOfRecordsUnchanged - Not used in validation process
  • metadataErrors + count - Replaced with validationErrors and validationWarnings
  • metadataInfos - Replaced with validMetadata
  • noProcessFoundCount - Not used in validation process

For the below samples consider the following:

  • #88994 - Valid Metadata
  • #88995 - Invalid Metadata
  • #88996 - Validation produces warnings

The current API response:

{
  "errors": [],
  "infos": [],
  "uuid": "23f7b983-a689-4272-bac4-25136140690d",
  "metadata": [
    88995,
    88994,
    88996
  ],
  "metadataErrors": {
    "88995": [
      {
        "message": "(a11fa1e5-dd91-4871-91a9-5071ab61be0c) Is invalid",
        "uuid": "a11fa1e5-dd91-4871-91a9-5071ab61be0c",
        "draft": true,
        "approved": false,
        "date": "2025-08-11T12:51:53.373Z",
        "stack": ""
      },
      {
        "message": "cvc-enumeration-valid: Value '' is not facet-valid with respect to enumeration '[farming, biota, boundaries, climatologyMeteorologyAtmosphere, economy, elevation, environment, geoscientificInformation, health, imageryBaseMapsEarthCover, intelligenceMilitary, inlandWaters, location, oceans, planningCadastre, society, structure, transportation, utilitiesCommunication]'. It must be a value from the enumeration. (Element: gmd:MD_TopicCategoryCode with parent element: gmd:topicCategory)",
        "uuid": "a11fa1e5-dd91-4871-91a9-5071ab61be0c",
        "draft": true,
        "approved": false,
        "date": "2025-08-11T12:51:53.388Z",
        "stack": ""
      },
      {
        "message": "cvc-type.3.1.3: The value '' of element 'gmd:MD_TopicCategoryCode' is not valid. (Element: gmd:MD_TopicCategoryCode with parent element: gmd:topicCategory)",
        "uuid": "a11fa1e5-dd91-4871-91a9-5071ab61be0c",
        "draft": true,
        "approved": false,
        "date": "2025-08-11T12:51:53.397Z",
        "stack": ""
      },
      {
        "message": "Value is required for Topic Category",
        "uuid": "a11fa1e5-dd91-4871-91a9-5071ab61be0c",
        "draft": true,
        "approved": false,
        "date": "2025-08-11T12:51:53.406Z",
        "stack": ""
      },
      {
        "message": "Cited Responsible Party - Electronic mail address is required in both languages",
        "uuid": "a11fa1e5-dd91-4871-91a9-5071ab61be0c",
        "draft": true,
        "approved": false,
        "date": "2025-08-11T12:51:53.415Z",
        "stack": ""
      }
    ]
  },
  "metadataInfos": {
    "88994": [
      {
        "message": "Is valid",
        "uuid": "4a577daf-1f96-4095-a08a-d1be96782e55",
        "draft": true,
        "approved": false,
        "date": "2025-08-11T12:51:51.338Z"
      }
    ],
    "88996": [
      {
        "message": "Is valid",
        "uuid": "b230177b-ce8f-4416-856b-3e97aea27cd7",
        "draft": true,
        "approved": false,
        "date": "2025-08-11T12:51:52.61Z"
      }
    ]
  },
  "numberOfRecordsUnchanged": 0,
  "numberOfNullRecords": 0,
  "numberOfRecords": 3,
  "numberOfRecordNotFound": 0,
  "numberOfRecordsNotEditable": 0,
  "numberOfRecordsProcessed": 3,
  "numberOfRecordsWithErrors": 1,
  "running": false,
  "startIsoDateTime": "2025-08-11T12:51:48.903Z",
  "endIsoDateTime": "2025-08-11T12:51:54.299Z",
  "ellapsedTimeInSeconds": 5,
  "totalTimeInSeconds": 5,
  "type": "SimpleMetadataProcessingReport"
}

The API response after these changes:

{
  "uuid": "bb8311c1-2565-4ce7-baf7-e53de1d14660",
  "metadata": [
    88995,
    88994,
    88996
  ],
  "validMetadata": {
    "88994": {
      "message": "Is valid",
      "uuid": "aac5a82d-d3ae-45b7-b0a7-c95cd39b2f00",
      "draft": true,
      "approved": false,
      "date": "2025-08-11T12:49:05.566Z"
    },
    "88996": {
      "message": "Is valid",
      "uuid": "7ef939a8-4016-490a-86f0-8bdbaabd7a45",
      "draft": true,
      "approved": false,
      "date": "2025-08-11T12:49:05.698Z"
    }
  },
  "invalidMetadata": {
    "88995": {
      "message": "Is invalid",
      "uuid": "30717ed1-f96a-4432-abd2-4610c7b3f817",
      "draft": true,
      "approved": false,
      "date": "2025-08-11T12:49:05.792Z"
    }
  },
  "metadataWithWarnings": {
    "88996": {
      "message": "Has warnings",
      "uuid": "7ef939a8-4016-490a-86f0-8bdbaabd7a45",
      "draft": true,
      "approved": false,
      "date": "2025-08-11T12:49:05.698Z"
    }
  },
  "validationErrors": {
    "88995": [
      {
        "schematron": "XSD",
        "messages": {
          "NO_PATTERN_TITLE": [
            "cvc-enumeration-valid: Value '' is not facet-valid with respect to enumeration '[farming, biota, boundaries, climatologyMeteorologyAtmosphere, economy, elevation, environment, geoscientificInformation, health, imageryBaseMapsEarthCover, intelligenceMilitary, inlandWaters, location, oceans, planningCadastre, society, structure, transportation, utilitiesCommunication]'. It must be a value from the enumeration. (Element: gmd:MD_TopicCategoryCode with parent element: gmd:topicCategory)",
            "cvc-type.3.1.3: The value '' of element 'gmd:MD_TopicCategoryCode' is not valid. (Element: gmd:MD_TopicCategoryCode with parent element: gmd:topicCategory)"
          ]
        }
      },
      {
        "schematron": "schematron-rules-common",
        "messages": {
          "Data Identification": [
            "Value is required for Topic Category"
          ]
        }
      }
    ]
  },
  "validationWarnings": {
    "88996": [
      {
        "schematron": "schematron-rules-resource-content",
        "messages": {
          "Sample_Rejected.pdf": [
            "The document must include a Title (dc:title) in its metadata so that screen-readers can announce the title to users"
          ]
        }
      }
    ]
  },
  "numberOfRecords": 3,
  "numberOfRecordsProcessed": 3,
  "numberOfValidRecords": 2,
  "numberOfInvalidRecords": 1,
  "numberOfRecordsWithValidationWarnings": 1,
  "numberOfNullRecords": 0,
  "numberOfRecordsNotEditable": 0,
  "startIsoDateTime": "2025-08-11T12:49:05.224Z",
  "endIsoDateTime": "2025-08-11T12:49:06.616Z",
  "ellapsedTimeInSeconds": 1,
  "totalTimeInSeconds": 1,
  "running": false,
  "type": "MetadataValidationProcessingReport"
}
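For readers who want to consume the new shape, here is a minimal sketch of how the nested `validationErrors` (or `validationWarnings`) structure could be flattened into one line per message. It is not part of this PR: `ValidationReportFlattener`, `SchematronMessages`, `flatten` and `sampleErrors` are hypothetical names, the data is modelled with plain Java collections rather than parsed from JSON, and the two XSD messages are shortened.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of GeoNetwork: flattens the nested
// validationErrors / validationWarnings shape into one line per message.
class ValidationReportFlattener {

    // One entry of the per-record list: a schematron name plus
    // messages grouped by pattern title.
    static final class SchematronMessages {
        final String schematron;
        final Map<String, List<String>> messages;

        SchematronMessages(String schematron, Map<String, List<String>> messages) {
            this.schematron = schematron;
            this.messages = messages;
        }
    }

    // Produces lines of the form "recordId | schematron | pattern | message".
    static List<String> flatten(Map<String, List<SchematronMessages>> byRecord) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<SchematronMessages>> record : byRecord.entrySet()) {
            for (SchematronMessages group : record.getValue()) {
                for (Map.Entry<String, List<String>> pattern : group.messages.entrySet()) {
                    for (String message : pattern.getValue()) {
                        lines.add(record.getKey() + " | " + group.schematron + " | "
                            + pattern.getKey() + " | " + message);
                    }
                }
            }
        }
        return lines;
    }

    // Rebuilds the validationErrors part of the sample response
    // (the two XSD messages are shortened here).
    static Map<String, List<SchematronMessages>> sampleErrors() {
        Map<String, List<String>> xsd = new LinkedHashMap<>();
        xsd.put("NO_PATTERN_TITLE", List.of("cvc-enumeration-valid: ...", "cvc-type.3.1.3: ..."));
        Map<String, List<String>> common = new LinkedHashMap<>();
        common.put("Data Identification", List.of("Value is required for Topic Category"));

        Map<String, List<SchematronMessages>> errors = new LinkedHashMap<>();
        errors.put("88995", List.of(
            new SchematronMessages("XSD", xsd),
            new SchematronMessages("schematron-rules-common", common)));
        return errors;
    }

    public static void main(String[] args) {
        flatten(sampleErrors()).forEach(System.out::println);
    }
}
```

Each output line carries the record id, the schematron and the pattern title, which is the origin information the old `metadataErrors` list did not expose.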

The current UI:

[Screenshot 2025-08-08 090632] [Screenshot 2025-08-08 090655]

After these changes:

[image] [image]

Most importantly, without these changes we cannot see any warning messages, and it is unclear where the error messages originate. With the proposed changes it is also easier to determine which records, and how many, are valid, invalid, or have warnings.

Checklist

  • I have read the contribution guidelines
  • Pull request provided for main branch, backports managed with label
  • Good housekeeping of code, cleaning up comments, tests, and documentation
  • Clean commit history broken into understandable chunks, avoiding big commits with hundreds of files, cautious of reformatting and whitespace changes
  • Clean commit messages, longer verbose messages are encouraged
  • API Changes are identified in commit messages
  • Testing provided for features or enhancements using automatic tests
  • User documentation provided for new features or enhancements in manual
  • Build documentation provided for development instructions in README.md files
  • Library management using pom.xml dependency management. Update build documentation with intended library use and library tutorials or documentation

ianwallen added this to the 4.4.9 milestone on Aug 11, 2025
ianwallen added the "api change" label (Indicate a change in the API) on Aug 11, 2025
namespaces.add(Namespace.getNamespace("geonet", "http://www.fao.org/geonetwork"));
namespaces.add(Namespace.getNamespace("svrl", "http://purl.oclc.org/dsdl/svrl"));

restructureReportToHavePatternRuleHierarchy(schemaTronReport);
Member

The method addAllReportsMatchingRequirement is called 2 times with the schemaTronReport.

As a result, restructureReportToHavePatternRuleHierarchy will be applied twice to the schemaTronReport, and it changes the report's structure. Have you verified that this does not cause any side effects?

Contributor Author

In the latest push I moved restructureReportToHavePatternRuleHierarchy so that it is called once, before the addAllReportsMatchingRequirement calls, to avoid restructuring twice.
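To illustrate the concern, here is a toy sketch of why a structure-changing step is safer when applied once up front than inside a helper that runs twice. It does not reproduce the GeoNetwork code: `RestructureOnce`, `Node`, `restructure`, `depth` and `countLeaves` are hypothetical stand-ins, and the restructuring shown (wrapping children in a "pattern" node) is only an assumed example of a non-idempotent transformation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not the GeoNetwork code: shows why a structure-changing step
// is applied once up front rather than inside a helper that runs twice.
class RestructureOnce {

    // Stand-in for a report node: a name plus child nodes.
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    // Hypothetical, non-idempotent restructuring: wraps the current children
    // in a new "pattern" node. Calling it twice nests the children twice.
    static void restructure(Node report) {
        Node pattern = new Node("pattern");
        pattern.children.addAll(report.children);
        report.children.clear();
        report.children.add(pattern);
    }

    // Length of the first-child chain below the given node.
    static int depth(Node node) {
        return node.children.isEmpty() ? 0 : 1 + depth(node.children.get(0));
    }

    // Stand-in for a read-only pass over the report, such as collecting
    // the results for one requirement level.
    static int countLeaves(Node node) {
        if (node.children.isEmpty()) {
            return 1;
        }
        int total = 0;
        for (Node child : node.children) {
            total += countLeaves(child);
        }
        return total;
    }

    static Node sampleReport() {
        Node report = new Node("report");
        report.children.add(new Node("failed-assert"));
        report.children.add(new Node("failed-assert"));
        return report;
    }

    public static void main(String[] args) {
        Node report = sampleReport();
        restructure(report);                  // once, before both passes
        int firstPass = countLeaves(report);  // first read-only pass
        int secondPass = countLeaves(report); // second read-only pass
        System.out.println(depth(report) + " " + firstPass + " " + secondPass);
    }
}
```

In this model both passes read the same, singly restructured tree; calling `restructure` from inside each pass would instead add one nesting level per call.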


List<?> failedAssert = Xml.selectNodes(schemaTronReport,
"geonet:report[@geonet:required = '" + SchematronRequirement.REQUIRED + "']/svrl:schematron-output/svrl:failed-assert",
// Get all the errors
Member

The file contains some unused imports. I'm not sure whether they are related to this pull request, but can you remove them?

Contributor Author

Removed in latest push

<span data-translate="">batchValidationReport</span>
</strong>
</div>
<div
Member

josegar74 commented Aug 12, 2025

I would add some styling to the report panel (the div in line 11) to display scrollbars; otherwise the dialog gets too long when there are many entries in the table. Something like:

   max-height: 400px;
   overflow: auto;

Before:
[Screenshot 2025-08-12 at 09 37 00]

After:
[Screenshot 2025-08-12 at 09 37 09]

Contributor Author

Implemented in latest push

Member

josegar74 left a comment

I have added some comments / suggestions, but other than that the code changes look fine and the reporting is much more useful.

Still to be considered: whether this should be backported to 4.2.x.

tylerjmchugh requested a review from josegar74 on August 14, 2025 13:01
Member

jahow commented Aug 27, 2025

This looks really good @tylerjmchugh, thanks! Since this contains an API change I would suggest that we keep this in 4.4.x. Would that make sense to you?

@ianwallen
Contributor

I have removed the 4.2.x backport.

Member

josegar74 left a comment

Thanks for the latest changes. I noticed an issue when changing a schematron from required to warning (or the other way around): I need to restart the application before the report shows the changed validation errors in the correct section. It seems to be some kind of cache issue.

Test:

  1. Validate the sample metadata with default schematron configuration (all required) --> some errors reported related to Datacite for ISO19139

  2. Update the validation rule to be optional

  3. Reissue the validation --> Datacite elements are still in the errors section.

  4. Restart the application and reissue the validation --> Datacite elements are displayed in the warning section.

@tylerjmchugh
Contributor Author

@josegar74

I have identified the issue. There seems to be a caching mechanism for the validation reports, so the old validation result (where the schematron is REQUIRED) is being used.

The validation report cache will also cause issues for validations that depend on external APIs. For example, we have a schematron that calls an API to validate resources associated with the record. Because the cached validation report is only deleted when the metadata itself is updated, the validation result would be stale if only the resource is updated.

To resolve these issues I have removed the validation report caching mechanism.
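The staleness described above can be reproduced in a toy model. This is a sketch under stated assumptions, not the GeoNetwork cache: `StaleValidationCache`, `ruleRequired`, `validateCached` and `onRecordUpdated` are hypothetical names, and the only invalidation hook modelled is an update to the record itself.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model, not the GeoNetwork cache: a report cache that is cleared only
// when the record changes cannot see changes to the rule configuration.
class StaleValidationCache {

    // Configuration that lives outside the record:
    // is the rule REQUIRED (error) or optional (warning)?
    boolean ruleRequired = true;

    private final Map<Integer, String> cache = new HashMap<>();

    // Fresh validation: the section a failing rule lands in depends on
    // the configuration at the time of the call.
    String validate(int recordId) {
        return ruleRequired ? "error" : "warning";
    }

    // Cached validation: reuses the stored result until the record is updated.
    String validateCached(int recordId) {
        return cache.computeIfAbsent(recordId, id -> validate(id));
    }

    // The only invalidation hook in this model: an update to the record.
    void onRecordUpdated(int recordId) {
        cache.remove(recordId);
    }

    public static void main(String[] args) {
        StaleValidationCache validator = new StaleValidationCache();
        System.out.println(validator.validateCached(1)); // computed while the rule is REQUIRED
        validator.ruleRequired = false;                  // rule switched to optional
        System.out.println(validator.validateCached(1)); // stored result is reused
        System.out.println(validator.validate(1));       // uncached validation sees the change
    }
}
```

The same reasoning applies to any input that is not part of the record, such as the result of an external API call: nothing in this model invalidates the cache when that input changes.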

jahow modified the milestones: 4.4.9, 4.4.10 on Oct 7, 2025