Finalize/clean up ingest models, add additional preprocessing #1166

Merged: 13 commits, Oct 16, 2024
1 change: 1 addition & 0 deletions dcpy/lifecycle/ingest/configure.py
@@ -132,6 +132,7 @@ def get_config(
    return Config(
        id=template.id,
        version=version,
        attributes=template.attributes,
        archival_timestamp=run_details.timestamp,
        raw_filename=filename,
        acl=template.acl,
8 changes: 8 additions & 0 deletions dcpy/lifecycle/ingest/templates/dcp_commercialoverlay.yml
@@ -1,5 +1,13 @@
id: dcp_commercialoverlay
acl: public-read

attributes:
  name: DCP NYC Commercial Overlay Districts
  description: |
    Polygon features representing the within-tax-block limits for commercial overlay districts,
    as shown on the DCP zoning maps. Commercial overlay district designations are indicated in the OVERLAY attribute.
  url: https://www1.nyc.gov/site/planning/data-maps/open-data/dwn-gis-zoning.page#metadata

target_crs: EPSG:4326
source:
  type: edm_publishing_gis_dataset
17 changes: 12 additions & 5 deletions dcpy/models/lifecycle/ingest.py
@@ -45,20 +45,24 @@ class PreprocessingStep(BaseModel):
    mode: str | None = None


class DatasetAttributes(BaseModel):
    name: str | None = None
    description: str | None = None
    url: str | None = None
    custom: dict | None = None


class Template(BaseModel, extra="forbid", arbitrary_types_allowed=True):
    """Definition of a dataset for ingestion/processing/archiving in edm-recipes"""

    id: str
    acl: recipes.ValidAclValues

    target_crs: str | None = None
    attributes: DatasetAttributes | None = None
Contributor:

Maybe make it a required attribute, to avoid the creation of new templates without the relevant dataset info?

Contributor Author:

That seems like the right choice.


    ## these two fields might merge to "source" or something equivalent at some point
    ## for now, they are distinct so that they can be worked on separately
    ## when implemented, "None" should not be valid type
    target_crs: str | None = None
    source: Source
    file_format: file.Format

    processing_steps: list[PreprocessingStep] = []

## this is the original library template, included just for reference while we build out our new templates
@@ -72,6 +76,9 @@ class Config(BaseModel, extra="forbid", arbitrary_types_allowed=True):
    """

    id: str = Field(validation_alias=AliasChoices("id", "name"))

    attributes: DatasetAttributes | None = None

    version: str
    archival_timestamp: datetime
    check_timestamps: list[datetime] = []