Description
I wanted to assess the complexity of converting a v1 to a v2 Data Package. Below are the steps that need to be taken. For version detection, see #262. @khusmann could you review these? There are a couple of items I'm unsure about.
Package
Add package.$schema, remove package.profile
Use package.profile, then remove it:
- NULL => https://datapackage.org/profiles/2.0/datapackage.json
- data-package (registered id) => https://datapackage.org/profiles/2.0/datapackage.json
- tabular-data-package (registered id) => https://datapackage.org/profiles/2.0/datapackage.json. This also removes the deprecated tabular-data-package.
- fiscal-data-package (registered id) => Unsure, should we use the 1.0 URL for fiscal-data-package?
- A URL => Unsure, the referenced schema will likely point to Data Package v1, making it a v1
- Any other value => Unsure, not allowed by https://specs.frictionlessdata.io/profiles/
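To make the mapping above concrete, here is a minimal sketch in Python (frictionless itself is not written in Python, this is only to illustrate the logic). The function name upgrade_package_profile and the plain-dict representation of the package are assumptions for illustration. The cases marked "Unsure" are deliberately left untouched.

```python
# Hypothetical sketch: map v1 package.profile to v2 package.$schema.
V2_PACKAGE_SCHEMA = "https://datapackage.org/profiles/2.0/datapackage.json"

# Registered ids for which the target $schema is settled in the list above.
SETTLED_PACKAGE_PROFILES = {"data-package", "tabular-data-package"}


def upgrade_package_profile(package: dict) -> dict:
    """Return a shallow copy with $schema set and profile removed.

    Only the settled cases (missing/NULL profile, data-package,
    tabular-data-package) are converted. Anything else is returned as is.
    """
    result = dict(package)
    profile = result.get("profile")
    if profile is None or profile in SETTLED_PACKAGE_PROFILES:
        result.pop("profile", None)
        result["$schema"] = V2_PACKAGE_SCHEMA
    # fiscal-data-package, URLs and other values: undecided, so no change.
    return result
```

Leaving the undecided cases unchanged means such a package keeps its profile and gets no $schema, so it would still be detected as v1.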
Add package.contributors.roles
- For each contributor, set roles (array) based on role (string), then remove role.
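A sketch of that contributor step, again in Python for illustration only. The function name upgrade_contributors is hypothetical, and keeping an already present roles array rather than overwriting it is my assumption, not something decided above.

```python
import copy


def upgrade_contributors(package: dict) -> dict:
    """Return a copy where each contributor's role string becomes a roles array."""
    result = copy.deepcopy(package)
    for contributor in result.get("contributors", []):
        if "role" in contributor:
            role = contributor.pop("role")
            # Assumption: if roles already exists, keep it and only drop role.
            contributor.setdefault("roles", [role])
    return result
```

Contributors without a role are returned unchanged, so title, givenName and familyName need no handling.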
Other changes
- package.version: documentation change, no action required
- package.contributors: no action required for title, givenName and familyName.
- package.sources: Unsure, but I think no action is required
Each resource
Add resource.$schema, remove resource.profile
Use resource.profile, then remove it:
- NULL => https://datapackage.org/profiles/2.0/dataresource.json
- data-resource (registered id) => https://datapackage.org/profiles/2.0/dataresource.json
- tabular-data-resource (registered id) => https://datapackage.org/profiles/2.0/dataresource.json (but see resource.type)
- A URL => Unsure, the referenced schema will likely point to Data Package v1, making it a v1
- Any other value => Unsure, not allowed by https://specs.frictionlessdata.io/profiles/
- There is also the edge case where $schema is already present (i.e. a v1 package with a v2 resource). => Unsure, should the present resource.$schema be left as is then?
Add resource.type
Use resource.profile:
- NULL => don't set
- tabular-data-resource => table
- Any other value or URL => don't set
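A minimal Python sketch of the resource.type rule, for illustration. The function name add_resource_type is hypothetical. Since the type is derived from resource.profile, this step has to run before resource.profile is removed.

```python
def add_resource_type(resource: dict) -> dict:
    """Return a shallow copy with type set to "table" for tabular-data-resource.

    For a missing/NULL profile, any other value or a URL, type is not set.
    The profile itself is left in place here; removing it is a separate step.
    """
    result = dict(resource)
    if result.get("profile") == "tabular-data-resource":
        result["type"] = "table"
    return result
```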
Other changes
- resource.sources: no change required
- resource.name: rules are relaxed, existing names can remain as is
- resource.path: dot-paths are now forbidden. In the edge case where such a path is provided, we should not convert it, because it is impossible to know what the correct path would be. These types of paths will be flagged when reading a resource.
- resource.encoding: allows more, no action required
For each dialect
Note that upconverting a remote dialect requires it to be downloaded and included verbosely (inline) in the resource.
Add dialect.$schema
- dialect.caseSensitiveHeader is present => https://datapackage.org/profiles/1.0/tabledialect.json
- dialect.csvddfVersion is present => https://datapackage.org/profiles/1.0/tabledialect.json
- Otherwise this can safely be set to https://datapackage.org/profiles/2.0/tabledialect.json
Unsure about this though. For example, if a dialect was absent (very often the case), one will be added with just the $schema property. The alternative is to leave all dialects as v1 (assuming a $schema that defaults to https://datapackage.org/profiles/1.0/tabledialect.json). That would also mean that remote dialects can stay remote.
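The first option (choosing dialect.$schema from the properties that are present) could look like this in Python. This is only a sketch of the rule as listed; the function name dialect_schema_url is hypothetical and it assumes the dialect has already been read into a dict.

```python
V1_DIALECT_SCHEMA = "https://datapackage.org/profiles/1.0/tabledialect.json"
V2_DIALECT_SCHEMA = "https://datapackage.org/profiles/2.0/tabledialect.json"

# Properties whose presence keeps the dialect at v1, per the list above.
V1_ONLY_PROPERTIES = ("caseSensitiveHeader", "csvddfVersion")


def dialect_schema_url(dialect: dict) -> str:
    """Pick the $schema URL for a dialect based on v1-only properties."""
    if any(key in dialect for key in V1_ONLY_PROPERTIES):
        return V1_DIALECT_SCHEMA
    return V2_DIALECT_SCHEMA
```

With this rule an empty dialect maps to the 2.0 URL, which is exactly the case questioned above: an absent dialect would end up as a dialect containing only $schema.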
Other changes
- dialect.table: new property, no action required
For each schema
Note that upconverting a remote schema requires it to be downloaded and included verbosely (inline) in the resource.
Add schema.$schema
- Set to https://datapackage.org/profiles/2.0/tableschema.json because we will update the schema to that version.
Update schema.primaryKey
- Convert from string to array.
Update schema.foreignKeys
- Convert schema.foreignKeys.fields from string to array
- Convert schema.foreignKeys.reference[x].fields from string to array
- If schema.foreignKeys.reference[x].resource = resource name => remove property
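The primaryKey and foreignKeys steps can be sketched together in Python. The function names as_list and upgrade_keys are hypothetical. I am assuming that foreignKeys is an array of objects, each with fields and a reference object, and that "resource name" means the name of the resource the schema belongs to.

```python
import copy


def as_list(value):
    """Wrap a string in a list; leave arrays (already valid in v2) as is."""
    return [value] if isinstance(value, str) else value


def upgrade_keys(schema: dict, resource_name: str) -> dict:
    """Return a copy of schema with primaryKey and foreignKeys in v2 form."""
    result = copy.deepcopy(schema)
    if "primaryKey" in result:
        result["primaryKey"] = as_list(result["primaryKey"])
    for foreign_key in result.get("foreignKeys", []):
        if "fields" in foreign_key:
            foreign_key["fields"] = as_list(foreign_key["fields"])
        reference = foreign_key.get("reference", {})
        if "fields" in reference:
            reference["fields"] = as_list(reference["fields"])
        # Assumption: a reference to the schema's own resource is a
        # self-reference, so the resource property is dropped.
        if reference.get("resource") == resource_name:
            del reference["resource"]
    return result
```

References to other resources keep their resource property, and keys that are already arrays pass through unchanged.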
No action required
- schema.missingValues: old format still valid, no action required
- schema.fieldsMatch: this is exact for all v1, but that is also the default for this property, so no need to set it
- schema.uniqueKeys: new property, no action required
For each field
Other changes
- field.categories: new property, no action => We can't assume that every field with an enum should be converted to a field with categories.
- field.categoriesOrdered: new property, no action required
- field.missingValues: new property, no action required
- integer field type: groupChar is a new property, no action required
- list field type: new type, no action required
- datetime field type: default format merely extends current one, no action required
- geopoint field type: documentation update, no action required
- any field type: no conversion needed, but frictionless needs to interpret it differently when reading, see "Do not guess type = any, potentially provide opt-in" (#168)
- min/max constraints: can now be used for duration, no action needed
- exclusiveMin/Max constraints: new property, no action required
- jsonSchema constraint: new property, no action required