-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make it possible to add non-PUDL data sources and metadata to the archiver #506
Conversation
src/pudl_archiver/frictionless.py
Outdated
"""Create a datapackage for sources that won't end up in PUDL.""" | ||
# TODO: This is a slightly ugly workaround to avoid having to add sources into | ||
# the from_id method - should I just fix this at the source? | ||
data_source = DataSource( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if you change this to call from_id
instead of dict_from_id
, that should fix the broken tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, I see now that from_id
doesn't take the sources
dict right now. It's a bit of a pain to have to do another PUDL PR, but I think the most consistent API would be to add sources
to the from_id
method, since you're basically just copying what from_id
does here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I think updating from_id
to take sources
as a parameter would be my preference to fix the test issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, got sidetracked earlier, but looks good now!
Overview
A lightweight temporary solution to #499.
What problem does this address?
In concert with catalyst-cooperative/pudl#4003, makes it possible to add non-PUDL metadata sources to the archiver directly.
What did you change in this PR?
metadata
folder that replicates the one in the PUDL repository, but only contains information about data sources that aren't integrated into PUDL (currently only MECS).from_non_pudl_metadata()
method in theDataPackage
class which calls theNON_PUDL_SOURCES
dictionary (should work after the PR in the PUDL repository merges to make this possible).README
mecs
toeiamecs
to make it more consistent with our other data sourcessources.py
in preparation for the hackathon, while I was at it!Testing
How did you make sure this worked? How can a reviewer verify this?
pudl_archiver --dataset eiamecs --sandbox --initialize
To-do list
Tasks