Document handling of non-UTF8 paths in TOC #1611

Open
@aochagavia

When building the TOC, it is necessary to provide the full path of each entry, as mentioned in the docs:

- **`name`** *string*
This REQUIRED property contains the name of the tar entry.
This MUST be the complete path stored in the tar file.

However, on many systems paths are not guaranteed to be valid UTF-8, and copying the raw bytes into the TOC could produce invalid JSON (RFC 8259 requires JSON exchanged between systems to be encoded as UTF-8). For the sake of interoperable estargz implementations, it would be useful to document what an implementation should do when creating an estargz layer that contains non-UTF-8 file paths. The only options I can think of are:

  1. Creating non-compliant JSON anyway, assuming whoever loads the layer will be able to handle it.
  2. Using some form of escaping when encoding the paths, which readers reverse when decoding them.

Could anyone tell me what the current implementation does (I find it difficult to read the code, because I'm unfamiliar with Go)? Using that information I'd gladly come up with a PR later.

Note: this issue also applies to the link_name field of TOCEntry.
