Terminology around datasets, files, packages, etc. #793
-
I'm not quite sure what else you'd call it. To illustrate the distinction, let's use a simplified cpdb as an example dataset_package:

datasets:
  - name: cpdb_projects
    type: csv
  - name: cpdb_commitments
    type: csv
  - name: cpdb_projects_mapped
    type: shapefile

Here, cpdb_projects and cpdb_commitments are clearly different datasets, so maybe we'd be okay with calling this section datasets. However, the two projects outputs represent the same underlying data, just in different formats, so maybe we need a different label for them? Or is your view that, either way, "dataset" is potentially an ill-fitting choice of word?
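One way to make the distinction concrete is to record, for each output, which build table it is generated from. The sketch below is a minimal illustration, assuming a hypothetical source_table field that is not part of the config above; grouping on it separates genuinely different datasets from different formats of the same data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    """One distributed file output, mirroring an entry under `datasets:`."""

    name: str
    type: str
    # Hypothetical field (not in the config above): the build table
    # this file is generated from.
    source_table: str


# The simplified cpdb example, with the hypothetical source_table filled in.
CPDB_OUTPUTS = [
    Output("cpdb_projects", "csv", source_table="cpdb_projects"),
    Output("cpdb_commitments", "csv", source_table="cpdb_commitments"),
    Output("cpdb_projects_mapped", "shapefile", source_table="cpdb_projects"),
]


def group_by_source(outputs: list[Output]) -> dict[str, list[Output]]:
    """Group outputs by the table they come from, preserving input order."""
    groups = defaultdict(list)
    for out in outputs:
        groups[out.source_table].append(out)
    return dict(groups)


if __name__ == "__main__":
    for table, outs in group_by_source(CPDB_OUTPUTS).items():
        print(table, [(o.name, o.type) for o in outs])
```

Under this framing, each key in the result is a candidate "dataset" (two here), and each entry under a key is one file format of it.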
-
A tricky thing about tables, too, is that we very often have more than one table at the end of a build that's used to generate the files we distribute. The shorter the list of "final" tables, the better, though.
-
When @damonmcc and I were working on metadata for our datasets (in the context of the Socrata automations) we arrived at the following:
dataset_package
: the full collection of files for a product release; basically the dataset (see below) + attachments

dataset
: an output of the table, with a specific file format (ignore for now that an fgdb might have multiple tables/layers)

attachments
: pretty self-explanatory

Here's an example:
But I don't think this is quite right: I don't think the shapefile output of Pluto is really a "dataset." What is it, though?
Also happy to get input on other names. This also makes me think we should start a glossary in our repo. cc @fvankrieken @damonmcc @sf-dcp
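The three terms proposed above (dataset_package, dataset, attachments) can be sketched as a small data model. This is a minimal illustration, not an existing implementation in the repo; the class names, the version field, the uniqueness check, and the attachment file name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """An output of a table in one specific file format."""

    name: str
    file_format: str  # e.g. "csv" or "shapefile"


@dataclass
class DatasetPackage:
    """The full collection of files for a product release: datasets + attachments."""

    product: str
    version: str  # hypothetical: how a release is identified
    datasets: list[Dataset] = field(default_factory=list)
    attachments: list[str] = field(default_factory=list)  # e.g. documentation files

    def __post_init__(self) -> None:
        # Hypothetical rule: dataset names must be unique within a package.
        names = [d.name for d in self.datasets]
        if len(names) != len(set(names)):
            raise ValueError(f"duplicate dataset names in {self.product} {self.version}")

    def contents(self) -> list[str]:
        """Every item in the package: dataset names first, then attachments."""
        return [d.name for d in self.datasets] + list(self.attachments)


if __name__ == "__main__":
    package = DatasetPackage(
        product="cpdb",
        version="2024-01",  # hypothetical version label
        datasets=[
            Dataset("cpdb_projects", "csv"),
            Dataset("cpdb_commitments", "csv"),
            Dataset("cpdb_projects_mapped", "shapefile"),
        ],
        attachments=["cpdb_readme.pdf"],  # hypothetical attachment
    )
    print(package.contents())
```

Note that in this model the csv and shapefile outputs of the same table are two separate Dataset entries, which is exactly the point questioned above for the Pluto shapefile.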