- No due date•4/6 issues closed
- Due by March 31, 2024•0/2 issues closed
- Due by April 8, 2024•7/7 issues closed
- No due date•14/14 issues closed
  This phase covers refining the initial approaches and extending them to other datasets.

  ## Glossary

  - `packaging` - data and metadata transformations that happen prior to a push to other data stores (e.g. Socrata)
  - `flow` - a movement of data. It could be a push or a pull.
  - `metadata`:
    - attachments (PDFs, etc.)
    - file-level (i.e. baked into shapefiles)
    - external system data (i.e. the Socrata `Legislative Compliance` section, or contact info)

- Due by July 1, 2024•2/8 issues closed

  ## Scheduled data updates

  - Monthly: ZTL, PLUTO, ZAP
  - January: CPDB, DevDB, FacDB, PFF
  - February: EDDE, CEQR Air, CEQR Schools

- Due by March 31, 2024•9/9 issues closed
- Due by March 31, 2024•1/2 issues closed
- No due date•6/6 issues closed
- Due by March 11, 2024•5/5 issues closed
- Due by March 11, 2024•6/6 issues closed
  The scope is limited to a subset of our product build infrastructure, focusing on how builds are invoked and configured.

  ## Requirements

  - All data loading uses the `recipe.yml` approach
    - Some products will be excluded from this for now, either because their source data isn't in `edm-recipes` or because they don't use a DB to build
  - All credentials are provided via 1Password

- Due by January 12, 2024•3/3 issues closed

  Copying from #250 per brainstorms on 9/21 and 9/25

  ## context

  - AE is building a POC of a new app ([planning doc in sharepoint](https://nyco365.sharepoint.com/:w:/s/NYCPLANNING/itd/labs/EfZY8LZXYW9Lob5410yHhBMB3HwdfiwZ2TCN3ZVTaYAXrQ?e=uZsojM&wdOrigin=TEAMS-WEB.teams.fileLink&wdExp=TEAMS-CONTROL&wdhostclicktime=1695222032777&web=1))
  - AE and DE would like to minimize the amount of data engineering performed by AE and implement an ideal AE tech stack

  ## details and ideas

  - DE is helping with the planning and generating of source data for the app
  - the planned source data is:
    - Mapbox Vector Tiles (MVTs) of MapPLUTO and Zoning Districts
    - a PostGIS DB with MapPLUTO
  - likely want to generate MVTs during the DE Publish stage of a product build
    - but DE doesn't build Zoning Districts, so should it be considered more of a pull from AE that generates all MVTs?

  ## needs scoping

  - which zoom levels should the POC MVTs be generated at?
  - which MapPLUTO and Zoning Districts columns should be in the MVTs?
  - what data cleaning does AE need (e.g. drop invalid geometries, simplify geometries)?

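One way to inform the zoom-level question is to estimate how many tiles each candidate zoom would produce. The sketch below uses the standard Web Mercator (XYZ) tiling formula. The bounding box is a rough, assumed extent for NYC rather than a value from this milestone, and the function names are hypothetical.

```python
import math

# Rough NYC extent as (west, south, east, north); approximate values assumed
# for illustration only, not taken from MapPLUTO.
ROUGH_NYC_BBOX = (-74.26, 40.49, -73.70, 40.92)


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the XYZ tile (x, y) containing a lon/lat point at a zoom level."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern or southern edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_count(bbox: tuple[float, float, float, float], zoom: int) -> int:
    """Count the tiles needed to cover a bounding box at a zoom level."""
    west, south, east, north = bbox
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # top-left tile
    x_max, y_max = lonlat_to_tile(east, south, zoom)  # bottom-right tile
    return (x_max - x_min + 1) * (y_max - y_min + 1)


if __name__ == "__main__":
    for zoom in range(8, 17):
        print(zoom, tile_count(ROUGH_NYC_BBOX, zoom))
```

The count grows roughly fourfold with each additional zoom level, so the upper zoom bound dominates generation time and storage.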
- Due by December 29, 2023•6/6 issues closed

  ## Project Description

  We'd like to use our persistent postgres database for all product build runs.

  ## Background

  Previously, all builds used a temporary postgres database created by the relevant build GitHub action. This worked well and kept product builds isolated from each other, but it meant that the tables created during a build could not be inspected after a run succeeded or failed.

  ## Requirements

  - All builds use the `edm-data` database cluster hosted on Digital Ocean
  - All builds use a DB specific to the data product and a DB schema specific to the GitHub branch used to run it

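The last requirement implies some mapping from a branch name to a valid schema name. A minimal sketch of one possible mapping follows. The naming convention, the `b_` prefix, and the function names are assumptions for illustration, not the team's actual implementation. The 63-byte limit is Postgres's default maximum identifier length.

```python
import re

# Postgres truncates identifiers longer than 63 bytes by default.
MAX_IDENTIFIER_LENGTH = 63


def schema_name_for_branch(branch: str) -> str:
    """Derive a Postgres-safe schema name from a git branch name (hypothetical convention)."""
    # Lowercase, then replace anything outside [a-z0-9_] with an underscore.
    name = re.sub(r"[^a-z0-9_]", "_", branch.lower())
    # Collapse runs of underscores and trim them from both ends.
    name = re.sub(r"_+", "_", name).strip("_")
    # Unquoted identifiers cannot start with a digit, so add a prefix if needed.
    if not name or name[0].isdigit():
        name = f"b_{name}"
    return name[:MAX_IDENTIFIER_LENGTH]


def build_target(product: str, branch: str) -> tuple[str, str]:
    """Return the (database, schema) pair for a build: one DB per product, one schema per branch."""
    return product, schema_name_for_branch(branch)
```

For example, `build_target("pluto", "feature/Add-MVT export")` would give `("pluto", "feature_add_mvt_export")`. Two branch names that differ only in punctuation would collide under this scheme, so a real implementation might need a uniqueness check.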
- Due by December 1, 2023•7/7 issues closed

  ## Priorities and Goals

  ### Product

  Update and release scheduled data products

  - Monthly: ZTL, ZAP, and PLUTO
  - October: CPDB
  - November: COLP, KPDB

  Enhance and create products

  - CPDB using new FISA source data
  - Create a Zoning Resolution Appendix C data pipeline (DE + Web Team + AE)
  - Create new Historical capital spending data product

  ### Operation

  - Collaborate with AE on easy projects early in Q4
  - Collaborate with GIS to improve their monthly data update processes
  - Improve DE's QA of source data and builds

  ### Ecosystem

  Standardize DE’s data dictionaries

  ### Community

  Give at least one presentation to a non-ITD audience

- Due by December 31, 2023•3/3 issues closed

  ## Priorities and Goals

  ### Product

  Update and release scheduled data products on time

  - PLUTO major and minors
  - Developments database
  - Facilities Database
  - PFF ACS DHC data
  - CEQR Air bi-annual?

  ### Operation

  Streamline build process for core data products to shorten and simplify builds

  - Have everything* building in mono repo
  - Reconfigure data library to host raw source data and subsequently redo backend of data library
  - Set up a data warehouse as the transformation engine

  ### Ecosystem

  Simplify and update internal documentation in preparation for new team member

  ### Community

  Each team member authors one blog post or gives one presentation

- Due by September 30, 2023•2/2 issues closed