Permit file inventory of packaged content (TAR/ZIP) - E-ARK AIP format requirement #68

Open
shsdev opened this issue Jun 2, 2021 · 3 comments

Comments

@shsdev

shsdev commented Jun 2, 2021

The following issue description is part of a requirement to define AIP storage recommendations for E-ARK AIPs.

The inventory.json could allow listing the files contained in packaged archive files, such as TAR or ZIP.

This would allow documenting the changes (updates/additions/deletions) between different versions of packaged archive files.

To give an example, the following directory contains an archive file mydataobject_v001.tar:

mydataobject/data
├── 0=ocfl_object_1.0
├── inventory.json
├── inventory.json.sha512
└── v001
    └── content
        └── mydataobject_v001.tar

The OCFL inventory would allow documenting the versions of this data object:

{
    "digestAlgorithm": "sha512",
    "fixity": {
        "md5": {
            "e5ad509db4ddb4cef0de4c1c19c7988b": [
                "v001/content/mydataobject_v001.tar"
            ]
        },
        "sha256": {
            "68a5b60ddef62758389f6894a1e7df28c1d228a5d56d2eec3ce2f74e80c27910": [
                "v001/content/mydataobject_v001.tar"
            ]
        }
    },
    "head": "v001",
    "id": "urn:uuid:1017cc9b-eaed-4064-947e-a07c752d3760",
    "manifest": {
        "24db03a2a7d9c7e2e7ea533e2ac84b7274f937eaff31e95f508cd9c5418a902adf5c18d2f67fa80aa25b7d72ce829951e79ea66210959c86aab33b5ef0c8b8bc": [
            "v001/content/mydataobject_v001.tar"
        ]
    },
    "type": "https://ocfl.io/1.0/spec/#inventory",
    "versions": {
        "v001": {
            "created": "2021-03-27T18:49:22Z",
            "message": "Initial data object",
            "state": {
                "24db03a2a7d9c7e2e7ea533e2ac84b7274f937eaff31e95f508cd9c5418a902adf5c18d2f67fa80aa25b7d72ce829951e79ea66210959c86aab33b5ef0c8b8bc": [
                    "v001/content/mydataobject_v001.tar"
                ]
            }
        }
    }
}
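For illustration, the manifest and fixity entries in an inventory like the one above could be computed roughly as follows. This is a minimal sketch, not part of any OCFL tooling: the helper names `file_digest` and `manifest_and_fixity` are made up for this example, and it only covers a single content file.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest_and_fixity(object_root: Path, content_path: str) -> dict:
    """Build manifest and fixity blocks for one content file (hypothetical helper).

    content_path is relative to the OCFL object root,
    e.g. "v001/content/mydataobject_v001.tar".
    """
    target = object_root / content_path
    return {
        "digestAlgorithm": "sha512",
        # The manifest is keyed by the digest named in digestAlgorithm.
        "manifest": {file_digest(target, "sha512"): [content_path]},
        # The fixity block carries additional digests, keyed by algorithm.
        "fixity": {
            alg: {file_digest(target, alg): [content_path]}
            for alg in ("md5", "sha256")
        },
    }
```

Note that the digests here describe the TAR file as a whole, so any change inside the archive yields a completely new entry.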

It would be desirable to have an option to create an inventory of the packaged archive file as if it would be unpackaged. Instead of the directory listing with the TAR file above, the inventory would treat the TAR file as if it were unpacked, for example as follows (in this case with a BagIt bag inside):

mydataobject/data
├── 0=ocfl_object_1.0
├── inventory.json
├── inventory.json.sha512
└── v00000
    └── content
        └── mydataobject_v001
            ├── bag-info.txt
            ├── bagit.txt
            ├── data
            │   ├── data_file1.pdf
            │   ├── data_file2.pdf
            │   └── ...
            ├── manifest-sha256.txt
            ├── manifest-sha512.txt
            ├── tagmanifest-sha256.txt
            └── tagmanifest-sha512.txt

This would allow using OCFL to document updates/additions/deletions in archived container files.
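As a rough sketch of what "looking inside" a TAR file could involve, the following uses Python's standard `tarfile` module to digest each member without unpacking to disk, then compares two versions. The function names `tar_member_digests` and `diff_members` are hypothetical, and nothing here is defined by OCFL itself.

```python
import hashlib
import tarfile


def tar_member_digests(tar_path: str, algorithm: str = "sha512") -> dict:
    """Map each regular file inside a TAR archive to its hex digest."""
    digests = {}
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue  # skip directories, links, and special files
            h = hashlib.new(algorithm)
            fh = tar.extractfile(member)
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                h.update(chunk)
            digests[member.name] = h.hexdigest()
    return digests


def diff_members(old: dict, new: dict) -> dict:
    """Classify member paths as added, deleted, or updated between versions."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "updated": sorted(
            p for p in old.keys() & new.keys() if old[p] != new[p]
        ),
    }
```

Calling `diff_members(tar_member_digests(old_tar), tar_member_digests(new_tar))` would then report which files changed between two packaged versions, which is the information the whole-file digest cannot provide.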

@shsdev shsdev changed the title Permit file inventory of packaged content (TAR/ZIP) Permit file inventory of packaged content (TAR/ZIP) - E-ARK AIP format requirement Jun 7, 2021
@zimeon
Contributor

zimeon commented Jun 10, 2021

The first two example snippets are supported by OCFL v1.0: one can have a TAR file or any other sort of file as content. I think the interesting question is whether we should support some elements that "look inside" package files. We should probably write up a use case here (some similarity with OCFL/Use-Cases#33).

@shsdev
Author

shsdev commented Jun 10, 2021

The first examples show how we are currently using OCFL.
It would have been better if we had also provided the OCFL inventory for the unpackaged version; it is only shown in the form of the last tree listing.
The request is indeed to "look inside" package files: "It would be desirable to have an option to create an inventory of the packaged archive file as if it would be unpackaged.".

@neilsjefferies
Member

Adding this to the OCFL inventory raises a number of issues about supported container file formats and ongoing compatibility. This also needs to be optional for those who do not require such functionality. Therefore we propose that this becomes an Extension which defines an OCFL-like sidecar inventory for this use case.
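To make the sidecar idea a little more concrete, here is one possible shape for such an inventory, sketched for a ZIP container with Python's standard `zipfile` module. No such extension format is defined in this thread: the `container` key, the overall layout, and the function name `zip_sidecar_inventory` are assumptions for illustration only.

```python
import hashlib
import zipfile


def zip_sidecar_inventory(zip_path: str, content_path: str) -> dict:
    """Build a hypothetical OCFL-like sidecar inventory for one ZIP container.

    content_path is the container's content path within the OCFL object.
    """
    manifest = {}
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # only files get manifest entries
            h = hashlib.sha512()
            with zf.open(info) as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    h.update(chunk)
            # Like an OCFL manifest: digest -> list of paths with that content.
            manifest.setdefault(h.hexdigest(), []).append(info.filename)
    return {
        "digestAlgorithm": "sha512",
        "container": content_path,  # hypothetical key, not from the OCFL spec
        "manifest": manifest,
    }
```

Keeping this in a separate file would leave inventory.json untouched, so implementations that do not need container inventories could ignore it.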

@rosy1280 rosy1280 transferred this issue from OCFL/spec Sep 22, 2023