Formatter / schema.org / Add croissant spec 🥐 #8939
Draft
"Croissant 🥐 is a high-level format for machine learning datasets that combines metadata, resource file descriptions, data structure, and default ML semantics into a single file; it works with existing datasets to make them easier to find, use, and support with tools. Croissant builds on schema.org, and its Dataset vocabulary, a widely used format to represent datasets on the Web, and make them searchable." https://docs.mlcommons.org/croissant/
Croissant extends schema.org. This improvement reviews the current schema.org formatter to support the additional 🥐 metadata available in ISO format. The main changes are:
Refactor the JSON-LD formatter to use the same base formatter for both ISO19139 and ISO19115-3, to ease maintenance (similar to the citation and DCAT formatters).
Improve formatters producing JSON output by ensuring the output is valid JSON, formatting it, and logging any error so that errors can be tracked and poorly handled encoding can be improved.
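To illustrate the JSON validation step described above, here is a minimal sketch in Python. It is not the code from this pull request (the formatter itself lives in the application's own code base); the function name `validate_and_format`, the logger name, and the `record_id` parameter are assumptions made for this example. It parses the formatter output, logs the position of any syntax error, and otherwise returns an indented version.

```python
import json
import logging
from typing import Optional

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("jsonld_formatter_sketch")


def validate_and_format(raw: str, record_id: str) -> Optional[str]:
    """Return pretty-printed JSON, or None (after logging) if `raw` is not valid JSON."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError as err:
        # Log where parsing failed so the faulty encoding can be tracked down.
        logger.error(
            "Invalid JSON for record %s at line %d, column %d: %s",
            record_id, err.lineno, err.colno, err.msg,
        )
        return None
    # ensure_ascii=False keeps non-ASCII characters readable in the output.
    return json.dumps(parsed, indent=2, ensure_ascii=False)
```

Returning `None` on failure is one possible design; a formatter could equally return an error document, as long as the failure is logged with enough context to find the offending record.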
schema.org improvement:
inLanguage corresponds to the resource language, not the metadata language.
producer (e.g. provider, producer, copyrightHolder, publisher, author, funder)
temporalCoverage
) if no corresponding element in input document
Similar initiatives:
Checklist
main branch, backports managed with label
README.md files
pom.xml dependency management. Update build documentation with intended library use and library tutorials or documentation
Funded by BRGM.