Enable multi_asset subsetting #2773
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@           Coverage Diff           @@
##             dev   #2773   +/-   ##
=====================================
  Coverage   88.7%   88.7%
=====================================
  Files         90      90
  Lines      10994   11000    +6
=====================================
+ Hits        9758    9764    +6
  Misses      1236    1236
},
),
],
materialize(
Nope! This is better.
Though I think this whole script could probably be replaced with dagster asset materialize, like:
$ dagster asset materialize -m pudl.etl --select "raw_eia860__fgd_equipment"
Which works on this branch but fails with the "This AssetsDefinition does not support subsetting." error on dev.
True, but I'm not sure you can feed the CLI command a run config like you can with the materialize() function, which is helpful for running subsets of years:
Usage: dagster asset materialize [OPTIONS]

  Execute a run to materialize a selection of assets

Options:
  -a, --attribute TEXT          Attribute that is either a 1) repository or job or 2) a function that returns a
                                repository or job
  --package-name TEXT           Specify Python package where repository or job function lives
  -m, --module-name TEXT        Specify module where dagster definitions reside as top-level symbols/variables and
                                load the module as a code location in the current python environment.
  -f, --python-file PATH        Specify python file where dagster definitions reside as top-level symbols/variables
                                and load the file as a code location in the current python environment.
  -d, --working-directory TEXT  Specify working directory to use when loading the repository or job
  --select TEXT                 Asset selection to target  [required]
  --partition TEXT              Asset partition to target
  -h, --help                    Show this message and exit.

Also, can we use the VS Code debugger with the CLI command?
# "raw_emissions_control_eia923",
# "raw_eia923__emissions_control",
I think we've moved on to a new DOI by now if we want to turn this back on.
Whee, I'm glad this is finally getting into dev!
Questions I'd like to see addressed before merging (and once you address them, feel free to dismiss this review and merge!)
I think we can just get rid of materialize_asset.py completely. What do you think?
There are still lots of multi_assets that don't support subsetting on this branch - how did you choose what to change and what to not change?
Non-blocking comment:
It would also be good to follow up on Zane's comment, about the new DOI maybe fixing things.
environments/conda-lock.yml
We can exclude these conda-lock files from rendering by default by marking them as 'generated files': https://docs.github.com/en/repositories/working-with-files/managing-files/customizing-how-changed-files-appear-on-github
return (
    Output(output_name=table_name, value=df) for table_name, df in glue_dfs.items()
)
for table_name, df in glue_dfs.items():
Non-blocking - would a generator expression work here instead?
return (
Output(output_name=table_name, value=df)
for table_name, df in glue_dfs.items()
if table_name in context.selected_output_names
)
I guess that's not really much better, is it?
I'm not sure. I am just working off of the dagster docs examples. IDK if dagster cares whether it's a return or a yield statement.
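For reference, here is a minimal plain-Python sketch of the two forms discussed in this thread. It only shows that, as far as the Python language is concerned, returning a generator expression and using yield in a loop both hand the caller an iterator over the same filtered items. It does not confirm how Dagster treats either form. The Output class is a hypothetical stand-in for dagster's Output, the selected set is a hypothetical stand-in for context.selected_output_names, and the table names are made up.

```python
from typing import Any, Dict, Iterator, NamedTuple, Set


class Output(NamedTuple):
    """Hypothetical stand-in for dagster's Output, for illustration only."""

    output_name: str
    value: Any


def outputs_via_return(glue_dfs: Dict[str, Any], selected: Set[str]) -> Iterator[Output]:
    # Returns a generator expression: the function body runs at call time,
    # but the individual items are still produced lazily on iteration.
    return (
        Output(output_name=table_name, value=df)
        for table_name, df in glue_dfs.items()
        if table_name in selected
    )


def outputs_via_yield(glue_dfs: Dict[str, Any], selected: Set[str]) -> Iterator[Output]:
    # Generator function: nothing in the body runs until the caller iterates.
    for table_name, df in glue_dfs.items():
        if table_name in selected:
            yield Output(output_name=table_name, value=df)


# Made-up data; `selected` plays the role of context.selected_output_names.
glue_dfs = {"table_a": [1, 2], "table_b": [3], "table_c": []}
selected = {"table_a", "table_c"}

returned = list(outputs_via_return(glue_dfs, selected))
yielded = list(outputs_via_yield(glue_dfs, selected))
```

One practical difference in plain Python: any setup code placed before the return runs as soon as the function is called, while in the yield version it is deferred until the first item is requested.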
PR Overview
This PR enables multi_asset subsetting so we can run a single asset that has an upstream dependency created by a multi_asset. Without it enabled, running such an asset fails with an error, which makes it annoying/impossible to debug single assets using a debugger.
This PR also simplifies our debugging script, devtools/materialize_asset.py.