Chaining inputs/outputs of PODs/ "sub-PODs" #159

jfbooth · 2020-12-08T15:18:18Z

My group has developed a POD that currently works in two steps. Step 1: track extratropical cyclones in space and time. Step 2: use the cyclone tracks to extract other variable fields in the vicinity of the cyclones. The POD currently has options that let a user run (1) Step 1 and Step 2, (2) Step 1 only, or (3) Step 2 only, with the user providing a set of track data in a specific format.

However, we are now wondering whether, for the sake of debugging, it would be easier to split the two steps into two separate PODs. With two separate PODs, what would the workflow be if someone wanted to run both of them? The question generalizes to: would you prefer larger, self-contained PODs that include flags for options, or smaller PODs, some of which might depend on output from the others?

I would appreciate some feedback on this. (Also note: if this question is outside the scope of the "Issues" list, I apologize. Email me and we can discuss offline. Jimmy [email protected])

wrongkindofdoctor · 2020-12-15T16:08:28Z

@jfbooth Larger self-contained PODs with flags that allow the user to select the step(s) they want to run are preferable. The framework does not currently support intra- or inter-POD data dependencies because of the maintenance and debugging difficulties they tend to cause. It is also a good idea (though not strictly required at this point) to include unit tests for the different option combinations with your POD code.
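
As a rough illustration of the "one POD, several flags" layout, here is a minimal Python sketch. Everything in it is hypothetical: the function names (`track_cyclones`, `composite_fields`, `run_pod`), the `--steps` and `--model-data` options, and the track format are placeholders, not part of the framework or of the actual cyclone POD.

```python
import argparse


def track_cyclones(model_data):
    """Step 1 placeholder: return cyclone tracks in a made-up format."""
    return [{"track_id": i, "source": model_data} for i in range(2)]


def composite_fields(tracks, model_data):
    """Step 2 placeholder: return one composite record per track."""
    return [
        {"track_id": t["track_id"], "field": f"composite from {model_data}"}
        for t in tracks
    ]


def run_pod(steps, model_data, user_tracks=None):
    """Run the requested step(s): 'both', 'track', or 'composite'."""
    results = {}
    if steps in ("both", "track"):
        results["tracks"] = track_cyclones(model_data)
    if steps in ("both", "composite"):
        # Prefer tracks computed in this run; otherwise use the user's tracks.
        tracks = results.get("tracks", user_tracks)
        if tracks is None:
            raise ValueError("steps='composite' needs user-supplied tracks")
        results["composites"] = composite_fields(tracks, model_data)
    return results


def parse_args(argv=None):
    """Hypothetical command-line options for selecting the step(s)."""
    parser = argparse.ArgumentParser(description="Cyclone POD sketch")
    parser.add_argument(
        "--steps", choices=["both", "track", "composite"], default="both"
    )
    parser.add_argument("--model-data", default="model.nc")
    return parser.parse_args(argv)
```

Because each option combination goes through the single `run_pod` entry point, a unit test per combination only needs to call it with a different `steps` value, e.g. `run_pod("track", "model.nc")` or `run_pod("composite", "model.nc", user_tracks=my_tracks)`.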

That said, there may be a way to structure the framework to allow self-contained workflows with "chained" PODs that depend on results/output. If there is a demand for it, we can revisit the possibility of integrating it in the future.

tsjackson-noaa · 2021-03-04T18:34:56Z

Migrated this from issues, where it was posted under the title "Does the framework allow implementation of a POD that uses output from a different POD."

tsjackson-noaa · 2021-03-04T18:35:20Z

I believe we'll inevitably be drawn towards including this functionality as PODs become more complex; two self-evident use cases are:

- Including reusable/standardized "building blocks" of diagnostics, such as the cyclone tracker mentioned here;
- Performing the same analysis on multiple model variables, only some of which may be present in the data to be analyzed (e.g. the current Wheeler-Kiladis POD).

For the purposes of package development, I think this jump in complexity is the point at which the framework itself should shift to being implemented in terms of a third-party workflow engine, rather than the ad-hoc implementation of a data pipeline we've currently written ourselves.

To meet the design needs of the project, such an engine would need to 1) be embeddable, 2) run entirely in user space (i.e., not be based on a client-server architecture), and 3) preferably be Python-centric. From my notes, I've singled out luigi and its extensions sci-luigi and luigi analysis workflow as meeting these criteria, but we'll need to re-examine this when we're ready to implement this functionality.
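
To make concrete what "chained" PODs would ask of an embedded, user-space engine, here is a small sketch of dependency-ordered execution. It uses the standard library's `graphlib` (Python 3.9+) purely as a stand-in; it is not luigi's API, and the task names and `run_chain` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def run_chain(tasks, dependencies):
    """Run tasks in dependency order, passing upstream outputs downstream.

    tasks: name -> callable taking a dict of upstream outputs
    dependencies: name -> set of upstream task names
    """
    outputs = {}
    # static_order() yields every task after all of its upstream tasks.
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        upstream = {dep: outputs[dep] for dep in dependencies.get(name, ())}
        outputs[name] = tasks[name](upstream)
    return order, outputs


# Hypothetical two-POD chain: a tracker feeding a compositing POD.
tasks = {
    "cyclone_tracker": lambda inputs: ["track_0", "track_1"],
    "cyclone_composites": lambda inputs: [
        f"composite_for_{t}" for t in inputs["cyclone_tracker"]
    ],
}
dependencies = {
    "cyclone_tracker": set(),
    "cyclone_composites": {"cyclone_tracker"},
}

order, outputs = run_chain(tasks, dependencies)
```

Everything runs in a single process with no scheduler or server. A real workflow engine would add the parts this sketch leaves out, such as caching of intermediate outputs, failure handling and reruns.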
