Archive EPA eGrid #517

Closed
@cmgosnell

Description

Motivation and context:

Briefly describe the dataset. What is it, and why do we want to archive it regularly?
Include a link to the dataset webpage and any metadata documentation.

The links to all of the files appear on the two pages above, but the URLs where the data is actually stored all seem to follow this pattern:

https://www.epa.gov/system/files/documents/{year of publication}-{month}/egrid{data year}{file name}

Note that the publication date differs from the data year. The data year is what we want to reference when we archive this data.

Also note that there are several files per year. We want to grab all of them, so you'll need to use `add_to_archive`.
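
As a rough illustration of the URL pattern and the publication-year versus data-year distinction, here is a minimal Python sketch. It does not use the pudl-archiver API; `build_egrid_url`, `group_by_data_year`, and the example file names are hypothetical, and the zero-padded two-digit month is an assumption about how EPA formats these paths.

```python
import re
from collections import defaultdict

BASE_URL = "https://www.epa.gov/system/files/documents"

# Mirrors the pattern described above. The two-digit month and the
# four-digit years are assumptions, not confirmed by the issue.
EGRID_URL_PATTERN = re.compile(
    r"/documents/(?P<pub_year>\d{4})-(?P<pub_month>\d{2})"
    r"/egrid(?P<data_year>\d{4})(?P<file_name>[^/]*)$"
)


def build_egrid_url(pub_year: int, pub_month: int, data_year: int, file_name: str) -> str:
    """Fill in the URL pattern (hypothetical helper)."""
    return f"{BASE_URL}/{pub_year}-{pub_month:02d}/egrid{data_year}{file_name}"


def group_by_data_year(urls: list[str]) -> dict[int, list[str]]:
    """Group URLs by data year, ignoring the publication date.

    URLs that do not match the pattern are skipped.
    """
    grouped: dict[int, list[str]] = defaultdict(list)
    for url in urls:
        match = EGRID_URL_PATTERN.search(url)
        if match is None:
            continue
        grouped[int(match.group("data_year"))].append(url)
    return dict(grouped)


if __name__ == "__main__":
    # File names below are made up for illustration only.
    example_urls = [
        build_egrid_url(2024, 1, 2022, "_data.xlsx"),
        build_egrid_url(2024, 1, 2022, "_summary_tables.pdf"),
    ]
    for data_year, year_urls in group_by_data_year(example_urls).items():
        print(data_year, year_urls)
```

Grouping by data year first makes it straightforward to bundle all of a year's files together before handing them to the archiver, regardless of when EPA published them.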

Requirements for archiving

To be archived on Zenodo, a dataset must be:

  • published under an open license that permits reuse and redistribution
  • less than 50 GB in size (when zipped)
  • relevant to energy modelling and research

Checklist for archive creation

Based on the README documentation on creating a new archive:

- [x] [Define the dataset's metadata](https://github.com/catalyst-cooperative/pudl-archiver#step-1-define-the-datasets-metadata)
- [ ] [Implement archiver interface](https://github.com/catalyst-cooperative/pudl-archiver#step-2-implement-archiver-interface)
- [ ] [Test archiver locally](https://github.com/catalyst-cooperative/pudl-archiver#step-3-test-archiver-locally)
- [ ] [Test uploading to Zenodo](https://github.com/catalyst-cooperative/pudl-archiver#step-4-test-uploading-to-zenodo)
- [ ] [Manually review archive before publication](https://github.com/catalyst-cooperative/pudl-archiver#step-5-manually-review-your-archive-before-publication)
- [ ] [Finalize archive](https://github.com/catalyst-cooperative/pudl-archiver#step-6-finalizing-the-archive) (only core Catalyst developers can complete this step)
- [ ] [Automate archiving](https://github.com/catalyst-cooperative/pudl-archiver#step-7-automate-archiving)

Links to published archives:

Include a link to the published sandbox archive for review.
