More closely follow file system in RO-Crate metadata json #543
…otated files and folder in FileSystemTree
Little note on why I used the double type: the same object satisfies two distinct use cases, which, according to the semantics and the profiles, each require their own type:
tests/ARCtrl/ARCtrl.Tests.fs
@@ -1772,6 +1772,29 @@ let tests_ROCrate =
Expect.sequenceEqual inputCol.Cells expectedCells "First table input column should have correct cells"
/// Assays
Expect.equal arc.AssayCount 2 "ARC should contain 2 assays"
ftestCase "IncludeFilesystem" <| fun _ ->
This is still an ftestCase? I would assume this throws on CI. If not, we can pass pyxpecto an argument to throw if ftests exist.
I assume this is critical, as it blocks nearly 1900 tests from running.
Thanks for noticing! That's quite a big whoopsie on my side. And yes, could you add the argument in a commit to this PR after I fix this specific case?
connected to nfdi4plants/arc-export#54
Background
In ARC Scaffold Annotation Tables, Data objects can be used as Inputs or Outputs. In the actual ARC Filesystem, these Data annotations can refer to three different entities:
All of these make sense and there is currently no intention to change this annotation in the ARC Scaffold.
The problem arises when mapping the ARC Scaffold metadata to ARC RO-Crate metadata json. The RO-Crate is meant to be used for various tasks and should be as semantically sound as possible. This is not the case at the moment, as the Folder objects in the Annotation Table are just mapped to Files in RO-Crate, losing both the knowledge about their actual type and the files they contain (see nfdi4plants/arc-export#54 for why this is problematic).
Implemented Solution
We don't want to access the actual Filesystem when mapping the metadata from Scaffold to RO-Crate, so in this PR I implemented logic to make use of the In-Memory Filesystem stored as a field in the ARC object. From there we can check whether an object annotated as Data is actually a file or a folder and then map it accordingly.
If the object is a folder, I currently add a second type Dataset in addition to File and create and reference all subfiles it contains via the hasPart property.
E.g. when we have an annotation table that references the folder ABC.D as Input [Data], which in turn contains the two files SubFile.txt and SubFolder/SubSubFile.txt, the output RO-Crate metadata will contain the following objects:
To go full circle, the file and folder objects referenced in the ARC RO-Crate will also be put into the Filesystem of the ARC Scaffold object.
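The folder mapping described in this PR could be sketched roughly as follows. This is only an illustration, not the ARCtrl implementation: the function name map_data_annotation, the use of plain relative paths as @id values, and the flat listing of all nested files under hasPart are assumptions made for the example. The in-memory filesystem is modelled as a simple list of file paths.

```python
def map_data_annotation(annotated_path, filesystem_paths):
    """Map one Data annotation to RO-Crate-like entities (hypothetical helper).

    Only the in-memory list of file paths is consulted; the real
    filesystem is never touched.
    """
    prefix = annotated_path.rstrip("/") + "/"
    # Any known file below the annotated path means the annotation is a folder.
    children = sorted(p for p in filesystem_paths if p.startswith(prefix))

    if not children:
        # Plain file: a single File entity is enough.
        return [{"@id": annotated_path, "@type": "File"}]

    # Folder: keep File, add Dataset, and reference every contained file.
    folder = {
        "@id": annotated_path,
        "@type": ["File", "Dataset"],
        "hasPart": [{"@id": child} for child in children],
    }
    files = [{"@id": child, "@type": "File"} for child in children]
    return [folder] + files


# Example from the description: folder ABC.D with two contained files.
paths = ["ABC.D/SubFile.txt", "ABC.D/SubFolder/SubSubFile.txt", "other.txt"]
entities = map_data_annotation("ABC.D", paths)
```

Here entities holds one folder object typed as both File and Dataset, followed by one File object per contained file. An annotation that points at a plain file such as other.txt yields a single File object.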