Detect Data Package version with version() function #262

Open
@peterdesmet

To support Data Package v2 we need to be able to detect the version used by a package.

  • $schema is undefined
    • version = 1.0
    • In theory we should look at the profile property in this case (see the backward compatibility note), but frictionless-r ignores this property since it doesn't use it (it is mainly useful for validation).
  • $schema = https://datapackage.org/profiles/1.0/datapackage.json
    • version = 1.0
    • profile is ignored (since new property $schema is used)
  • $schema = https://datapackage.org/profiles/2.0/datapackage.json
    • version = 2.0
    • profile is ignored (deprecated in 2.0)
  • $schema = https://datapackage.org/profiles/2.1-rc.1/datapackage.json
    • version = 2.1-rc.1 (theoretical example)
    • profile is ignored
  • $schema = https://fiscal.datapackage.org/profiles/fiscal-data-package.json, https://raw.githubusercontent.com/tdwg/camtrap-dp/1.0.1/camtrap-dp-profile.json or any other value
    • version = >=2.0: we can't detect the exact version, but it is at least 2.0 since $schema is used.
    • profile is ignored
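
The $schema rules above could be sketched in R roughly as follows. This is only a minimal sketch: detect_version() is a hypothetical name chosen to avoid masking base R's version object, and it is not an existing frictionless-r function. It assumes the descriptor is a list read from JSON and that official profile URLs always have the form https://datapackage.org/profiles/<version>/<file>.json.

```r
# Hypothetical helper, not part of frictionless-r: detect the Data Package
# version of a descriptor (a list read from JSON) from its $schema property.
detect_version <- function(x) {
  schema <- x[["$schema"]]

  # No $schema: treat as 1.0 (the profile property is deliberately ignored)
  if (is.null(schema)) {
    return("1.0")
  }

  # Official profile URL: the version is the path segment after /profiles/
  pattern <- "^https://datapackage\\.org/profiles/([^/]+)/[^/]+\\.json$"
  if (grepl(pattern, schema)) {
    return(sub(pattern, "\\1", schema))
  }

  # Any other $schema value: exact version unknown, but at least 2.0
  ">=2.0"
}

detect_version(list(profile = "tabular-data-package"))
detect_version(list(
  `$schema` = "https://datapackage.org/profiles/2.1-rc.1/datapackage.json"
))
detect_version(list(
  `$schema` = "https://fiscal.datapackage.org/profiles/fiscal-data-package.json"
))
```

The three calls should return "1.0", "2.1-rc.1" and ">=2.0" respectively. Returning a character string rather than a number keeps values like 2.1-rc.1 and >=2.0 representable.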

Even if we did read profile, the end result would still be version = 1.0:

  • profile is undefined
  • profile = data-package
    • version = 1.0 (implied by profile use)
  • profile = tabular-data-package, fiscal-data-package, https://raw.githubusercontent.com/tdwg/camtrap-dp/1.0.1/camtrap-dp-profile.json or any other value
    • version = 1.0 (implied by profile use)

In my opinion, the best way to implement this is with a version(package) function. This would allow us to add a version()<- replacement function in the future. Alternative names:

  • get_version(): this would keep us from having a version() function in the future. I'm tempted to rename all functions that start with get_.
  • package_version(): the version logic is the same for package, resource, dialect, schema: it's just the name of the file in the URL that is different (datapackage.json, dataresource.json). I therefore think we can make one function for all of these, rather than four functions.

I think we can generalize to a version(list) function:

  • Use the logic above for any incoming list (JSON). Search for $schema and get the version from the URL if it starts with https://datapackage.org, otherwise use 1.0 (if $schema is undefined) or >=2.0.
  • Setting the version is a bit harder, since we can't always tell (especially for a value like https://raw.githubusercontent.com/tdwg/camtrap-dp/1.0.1/camtrap-dp-profile.json) whether it's a package, resource, etc. I think this can be solved with an extra argument in the set function: version(level = "schema") <- 2.0 would assign https://datapackage.org/profiles/2.0/tableschema.json
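
A possible shape for the setter with a level argument, as a rough sketch only. The argument names, the default level and the level-to-file mapping are assumptions for illustration. In particular, tabledialect.json for the dialect level is assumed and not mentioned above. R replacement functions take the object as their first argument, so the call is written version(x, level = "schema") <- "2.0".

```r
# Hypothetical sketch, not the frictionless-r API: a replacement function that
# sets $schema to the official profile URL for a given level and version.
`version<-` <- function(x, level = "package", value) {
  # File names under https://datapackage.org/profiles/<version>/
  # (the dialect file name is an assumption)
  files <- c(
    package = "datapackage.json",
    resource = "dataresource.json",
    dialect = "tabledialect.json",
    schema = "tableschema.json"
  )
  level <- match.arg(level, names(files))

  x[["$schema"]] <- paste0(
    "https://datapackage.org/profiles/", value, "/", files[[level]]
  )
  x
}

schema <- list(fields = list())
version(schema, level = "schema") <- "2.0"
schema[["$schema"]]
```

The last line should return https://datapackage.org/profiles/2.0/tableschema.json. Note that the version is passed as the string "2.0": a numeric 2.0 would be pasted into the URL as 2, so a real implementation would probably want to accept or require character input.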
