@fmigneault
Member

@fmigneault fmigneault commented Jul 22, 2025

Description

Add utility script that parses the click definitions of relevant CLIs found under src/goldfinch/processes and generates their corresponding CWL.

Multiple options are provided to further extend the CWL metadata.

🚧 WIP 🚧

  • Find a better way to indicate outputs:

    Currently, a single output is generated as:

    outputs:
      results:
        outputBinding:
          glob: .
        type: Directory

    While this might work for local execution, it is not sufficient for Weaver (https://github.com/crim-ca/weaver). A File reference is expected instead (i.e., the file path resolving to ${outputs.results}/${inputs.output} in the example scripts).

  • Find a way to propagate media-types or other I/O-specific metadata.
    While not blocking deployment, this would help better describe the I/O and the contents they accept or produce.
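
One possible direction for the first item, sketched under assumptions: post-process the generated CWL so that the results output becomes a File whose glob is the CWL parameter reference $(inputs.output). The helper name use_file_output and the in-memory dictionary are hypothetical, click2cwl does not do this itself, and the sketch assumes the output input is always set in the job.

```python
import copy
import json


def use_file_output(cwl: dict, input_name: str = "output") -> dict:
    """Return a copy of a CommandLineTool whose 'results' output is a File."""
    if input_name not in cwl.get("inputs", {}):
        raise KeyError(f"CWL has no input named {input_name!r}")
    patched = copy.deepcopy(cwl)
    patched["outputs"]["results"] = {
        "type": "File",
        # CWL parameter reference: resolved at runtime to the value of the input.
        # Assumption: the job always provides this input (it is optional in the CWL).
        "outputBinding": {"glob": f"$(inputs.{input_name})"},
    }
    return patched


# Trimmed-down stand-in for the generated tool definition.
generated = {
    "class": "CommandLineTool",
    "inputs": {"output": {"type": "string?", "inputBinding": {"prefix": "-o"}}},
    "outputs": {"results": {"type": "Directory", "outputBinding": {"glob": "."}}},
}
print(json.dumps(use_file_output(generated)["outputs"], indent=2))
```

The original definition is left untouched, so the Directory variant remains available for local runs.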

References

@fmigneault fmigneault requested a review from huard July 22, 2025 21:27
@fmigneault fmigneault self-assigned this Jul 22, 2025
Collaborator

@huard huard left a comment

I'm missing a bit of information to get this to work, since I'm not familiar with CWL.

From the README, I understand that click2cwl can convert a CLI call to a CWL workflow.

I first generated a test input file by running

pytest src/goldfinch/processes/indicator/test.py

Now I have /tmp/pytest-of-david/pytest-0/test_hdd0/in.nc

I can create a CWL file with

click2cwl --process ./src/goldfinch/processes/indicator/hdd.py -- heating_degree_days > /tmp/hdd.cwl

but I was surprised that it didn't register the argument heating_degree_days anywhere. I tried with the -j argument

 click2cwl --process ./src/goldfinch/processes/indicator/hdd.py --job /tmp/job.yml -- -i /tmp/pytest-of-david/pytest-0/test_hdd0/in.nc  heating_degree_days

which saved the -i input, but not the process name.

In any case, I didn't find the combination of operations that would allow me to run a test computation.

Note that I'm confused by the -w vs --cwl options, and why the output of this is defined using -o, while for jobs we provide -j <filename> directly.

@fmigneault
Member Author

fmigneault commented Aug 20, 2025

@huard

Is click2cwl --process ./src/goldfinch/processes/indicator/hdd.py -- heating_degree_days [...] supposed to map roughly to python indicator/hdd.py heating_degree_days [...] ?

In other words, is the heating_degree_days expected to be a positional input?

If so, it is possible that click2cwl is having trouble dealing with the click.MultiCommand, or that the command/arguments retrieved from xclim are not detected properly. It does not appear in the generated CWL.

Note that I'm confused by the -w vs --cwl options, and why the output of this is defined using -o, while for jobs we provide -j <filename> directly.

Option -w is only a shortcut for --cwl cwl (this will make sense when using Weaver, which has the same w query parameter from OGC API Processes). Option --cwl cwl creates a CWL with class: Workflow that invokes a nested class: CommandLineTool. I've added the option because it is available, but in our case we will most typically use --cwl clt (clt=CommandLineTool) directly, notably for single-process definitions. The cwl|clt names are used since they are the same abbreviations for classes/types that click2cwl uses internally.

The -o and -j are different because you could generate both the CWL (Workflow/CommandLineTool) and the Job parameters file simultaneously.
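
To make the --cwl cwl versus --cwl clt distinction more concrete, here is a rough sketch of a class: Workflow that runs a nested class: CommandLineTool, laid out as a packed $graph document. The helper wrap_in_workflow, the workflow id and the step name are hypothetical, and the exact layout click2cwl emits may well differ.

```python
import json


def wrap_in_workflow(clt: dict, workflow_id: str = "main", step: str = "step_1") -> dict:
    """Build a packed CWL document: one Workflow with a single step running the tool."""
    # The version is declared once at the top level of a packed document.
    tool = {key: value for key, value in clt.items() if key != "cwlVersion"}
    workflow = {
        "class": "Workflow",
        "id": workflow_id,
        # Workflow inputs mirror the tool inputs (types only, no inputBinding).
        "inputs": {name: {"type": spec["type"]} for name, spec in tool["inputs"].items()},
        "outputs": {
            name: {"type": spec["type"], "outputSource": f"{step}/{name}"}
            for name, spec in tool["outputs"].items()
        },
        "steps": {
            step: {
                "run": f"#{tool['id']}",
                "in": {name: name for name in tool["inputs"]},
                "out": list(tool["outputs"]),
            }
        },
    }
    return {"cwlVersion": clt.get("cwlVersion", "v1.2"), "$graph": [workflow, tool]}


# Trimmed-down stand-in for a generated CommandLineTool.
clt = {
    "class": "CommandLineTool",
    "cwlVersion": "v1.2",
    "id": "clt",
    "inputs": {"output": {"type": "string?", "inputBinding": {"prefix": "-o"}}},
    "outputs": {"results": {"type": "Directory", "outputBinding": {"glob": "."}}},
}
packed = wrap_in_workflow(clt)
print(json.dumps(packed, indent=2))
```

For a single process, the tool alone carries the same information, which is why --cwl clt is the usual choice here.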

@huard
Collaborator

huard commented Aug 20, 2025

Yes, exactly. I can change the CLI if it makes your life easier.

Ok. One thing I want to point out is that we'll probably want to chain multiple processes "in-memory". Although I've never tried it, I understand click does support chaining commands and holding a "context" in memory between these commands. Ideally, we'd be able to use this as well. Not sure how complicated it would be on your end.

@fmigneault
Member Author

For the in-memory aspect, I think this is OK on the click side, but the generated CWL would look exactly like a normal 1-operation CLI command.
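
As an illustration of the click side only, a minimal sketch of chained subcommands sharing state through the context object. The command names load and double are hypothetical and unrelated to the actual goldfinch or xclim CLIs.

```python
import click
from click.testing import CliRunner


@click.group(chain=True)
@click.pass_context
def cli(ctx: click.Context) -> None:
    """Hypothetical chained CLI; subcommands share state through ctx.obj."""
    ctx.ensure_object(dict)


@cli.command("load")
@click.option("--value", type=int, default=1)
@click.pass_obj
def load(state: dict, value: int) -> None:
    # Keep the data in the shared context instead of writing an intermediate file.
    state["data"] = value


@cli.command("double")
@click.pass_obj
def double(state: dict) -> None:
    # Operates on whatever the previous command left in the context.
    state["data"] *= 2
    click.echo(f"result={state['data']}")


# Both subcommands run in one process from a single command line.
result = CliRunner().invoke(cli, ["load", "--value", "21", "double"])
print(result.output)
```

From the CWL point of view this is still one command line, so the chain would appear as a single tool invocation.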

I'll take a quick look at what is happening and report back.

@fmigneault
Member Author

fmigneault commented Aug 20, 2025

Invoking the command, it looks like the command: {heating_degree_days} is not passed, so it makes sense that the parser does not detect it.

The class reference (Cli) with extra commands seems to be passed, and the subcommand_metavar indicates it, but an "argument" corresponding to that command is not listed.

Any idea why that is? (I've tried with both click.MultiCommand and click.Group)
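
A likely explanation, offered as a sketch rather than a diagnosis of click2cwl: in click, subcommands are not part of a group's params. They are registered separately and are only reachable through the group's command listing, so a parser that walks params alone would never see heating_degree_days. The group and subcommand below are stand-ins, not the real hdd.py definitions.

```python
import click


@click.group()
@click.option("-i", "--input", "input_", multiple=True)
def cli(input_):
    """Stand-in for a group-style CLI such as the one discussed here."""


@cli.command("heating_degree_days")
def heating_degree_days():
    """Stand-in subcommand."""


# Only options/arguments declared on the group itself are listed as params.
param_names = [param.name for param in cli.params]

# Subcommands are registered separately and must be looked up explicitly.
ctx = click.Context(cli)
subcommands = cli.list_commands(ctx)

print("params:", param_names)
print("subcommands:", subcommands)
```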

@fmigneault
Member Author

@huard Found a workaround: 6603311

There are some fixes to apply directly in click2cwl package to support click.argument and avoid the !!python/tuple output (not sure if that could break cwltool/weaver or not).
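
For context on the !!python/tuple point: PyYAML's default dumper tags Python tuples that way, and click returns tuples for options declared with multiple=True. One way to avoid the tag, shown here as a sketch and not as the fix applied in click2cwl, is to normalize tuples to lists before dumping. The helper name and the file paths are hypothetical.

```python
from typing import Any


def tuples_to_lists(value: Any) -> Any:
    """Recursively replace tuples with lists so a YAML dump stays tag-free."""
    if isinstance(value, (tuple, list)):
        return [tuples_to_lists(item) for item in value]
    if isinstance(value, dict):
        return {key: tuples_to_lists(item) for key, item in value.items()}
    return value


# click returns a tuple for options declared with multiple=True.
job = {"input": ("/tmp/a.nc", "/tmp/b.nc"), "output": "out.nc"}
clean_job = tuples_to_lists(job)
print(clean_job)
```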

Provided that click.argument would be supported, I think the xclim.cli.cli() definition with the same click args/options could be used directly after wrapping it with the specific indicator argument.

@fmigneault
Member Author

PRs created for above issues:

Will wait a few days to integrate a newer click2cwl version if released.
Otherwise, https://github.com/crim-ca/click2cwl combining them can be pinned in dependencies.

@huard
Collaborator

huard commented Sep 2, 2025

Still having trouble getting this to work:

click2cwl --process ../xclim/src/xclim/cli.py --command cli --output /tmp/pytest-of-david/package.cwl

First it complained that there were many commands, even though I specified one. I think the logic below was flawed

-    click_functions = [
-        (name, member)
-        for name, member in
-        inspect.getmembers(cli_mod)
-        if (not name.startswith("_") or kwargs["command"] == name) and isinstance(member, click.Command)
-    ]
+    click_functions = []
+    for name, member in inspect.getmembers(cli_mod):
+        if isinstance(member, click.Command) and not name.startswith("_"):
+            if kwargs["command"] and kwargs["command"] != name:
+                continue
+            click_functions.append((name, member))
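
The difference between the two filters in the diff above can be reproduced in isolation. This sketch uses a stand-in Command class and a fake module so that it runs without click; both are assumptions made for illustration only.

```python
import inspect
import types


class Command:
    """Stand-in for click.Command so the sketch runs without click installed."""


def select_original(module, command):
    # Original filter: any public name passes, whatever command was requested.
    return [
        name
        for name, member in inspect.getmembers(module)
        if (not name.startswith("_") or command == name) and isinstance(member, Command)
    ]


def select_fixed(module, command):
    # Revised filter: public commands only, narrowed to the requested one if given.
    selected = []
    for name, member in inspect.getmembers(module):
        if isinstance(member, Command) and not name.startswith("_"):
            if command and command != name:
                continue
            selected.append(name)
    return selected


cli_mod = types.SimpleNamespace(cli=Command(), other=Command(), _private=Command(), helper=len)
print(select_original(cli_mod, "cli"))  # every public command is kept
print(select_fixed(cli_mod, "cli"))     # only the requested command is kept
```

One side effect worth noting: with the revised filter, a command whose name starts with an underscore is never selected, even when requested explicitly.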

Now it doesn't understand some of the inputs:

(goldfinch) david@it-282:~/src/goldfinch$ cwltool /tmp/pytest-of-david/package.cwl 
INFO /home/david/.conda/envs/goldfinch/bin/cwltool 3.1.20250715140722
INFO Resolved '/tmp/pytest-of-david/package.cwl' to 'file:///tmp/pytest-of-david/package.cwl'
ERROR Tool definition failed validation:
../../../../tmp/pytest-of-david/package.cwl:5:1:  checking field 'inputs'
../../../../tmp/pytest-of-david/package.cwl:16:3:   checking object
                                                    '../../../../tmp/pytest-of-david/package.cwl#clt/dask_nthreads'
                                                      Field 'type' references unknown identifier
                                                      'None', tried
                                                      file:///tmp/pytest-of-david/package.cwl#None
../../../../tmp/pytest-of-david/package.cwl:39:3:   checking object
                                                    '../../../../tmp/pytest-of-david/package.cwl#clt/verbose'
                                                      Field 'type' references unknown identifier
                                                      'None', tried
                                                      file:///tmp/pytest-of-david/package.cwl#None

Indeed, the package.cwl looks like:

class: CommandLineTool
cwlVersion: v1.2
id: clt
inputs:
  chunks:
    inputBinding:
      position: 7
      prefix: --chunks
    type: string?
  dask_maxmem:
    inputBinding:
      position: 6
      prefix: --dask-maxmem
    type: string?
  dask_nthreads:
    inputBinding:
      position: 5
      prefix: --dask-nthreads
    type: None?
  engine:
    inputBinding:
      position: 8
      prefix: --engine
    type: string?
  input:
    type:
    - 'null'
    - inputBinding:
        position: 1
        prefix: -i
      items: string
      type: array
  output:
    inputBinding:
      position: 2
      prefix: -o
    type: string?
  verbose:
    inputBinding:
      position: 3
      prefix: -v
    type: None?
  version:
    inputBinding:
      position: 4
      prefix: -V
    type: boolean?
outputs:
  results:
    outputBinding:
      glob: .
    type: Directory
requirements:
  EnvVarRequirement:
    envDef: {}
  ResourceRequirement: {}
stderr: std.err
stdout: std.out

dask_nthreads is not recognized as an int, and verbose is not recognized as a boolean flag.
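
A small check along these lines could flag such inputs before handing the file to cwltool. This is a sketch: the helper names are hypothetical, record fields are not inspected, and custom types declared through SchemaDefRequirement would be reported as unknown.

```python
CWL_BUILTIN_TYPES = {
    "null", "boolean", "int", "long", "float", "double",
    "string", "File", "Directory", "Any",
}


def unknown_types(type_spec):
    """Return the type names in a CWL type definition that are not built-ins."""
    if isinstance(type_spec, str):
        # Handle the shorthand suffixes: 'string?', 'string[]', 'string[]?'.
        base = type_spec[:-1] if type_spec.endswith("?") else type_spec
        base = base[:-2] if base.endswith("[]") else base
        return [] if base in CWL_BUILTIN_TYPES else [base]
    if isinstance(type_spec, list):
        return [name for item in type_spec for name in unknown_types(item)]
    if isinstance(type_spec, dict):
        kind = type_spec.get("type")
        if kind == "array":
            return unknown_types(type_spec.get("items", "Any"))
        if kind in ("enum", "record"):
            return []  # symbols and record fields are not inspected here
        return unknown_types(kind)
    return [repr(type_spec)]


def check_inputs(inputs):
    """Map each input name to the unknown type names found in its definition."""
    report = {}
    for name, spec in inputs.items():
        found = unknown_types(spec["type"])
        if found:
            report[name] = found
    return report


# Subset of the inputs from the package.cwl shown above.
inputs = {
    "chunks": {"type": "string?"},
    "dask_nthreads": {"type": "None?"},
    "input": {"type": ["null", {"type": "array", "items": "string"}]},
    "verbose": {"type": "None?"},
}
report = check_inputs(inputs)
print(report)
```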

@fmigneault
Member Author

First it complained that there were many commands

Yes. It seems I combined the kwargs["command"] check with "or" instead of "and" for whatever reason.
Your edit looks good.

Now it doesn't understand some of the inputs:

Will have to investigate where the None comes from. Very odd that it detects the others but not these 2 specific inputs.

@fmigneault
Member Author

fmigneault commented Sep 2, 2025

Found the problem. They are not defined!
https://github.com/Terradue/click2cwl/blob/f13937d0e1598e29016d5014007694e2b1ec6b5f/src/click2cwl/cwlparam.py#L146-L150

Will do a follow-up PR with the maintainers to add the missing types supported by CWL.
Terradue/click2cwl#11
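
To illustrate the kind of mapping involved, here is a hypothetical sketch. It is not the click2cwl implementation or the content of the PR. The keys are assumed to match the name attribute of click's built-in parameter types, and an unmapped type raises an error instead of silently producing None.

```python
# Assumed click ParamType.name values mapped to CWL primitive types.
CLICK_TO_CWL = {
    "text": "string",
    "integer": "int",
    "integer range": "int",
    "float": "float",
    "float range": "float",
    "boolean": "boolean",
}


def cwl_type(click_type_name, required=False, is_flag=False, count=False):
    """Resolve a CWL type string for a click option, failing loudly if unknown."""
    if count:
        base = "int"  # counted options such as a repeated -v
    elif is_flag:
        base = "boolean"
    else:
        try:
            base = CLICK_TO_CWL[click_type_name]
        except KeyError:
            raise ValueError(f"no CWL type known for click type {click_type_name!r}") from None
    # Optional inputs use the CWL '?' shorthand.
    return base if required else f"{base}?"


print(cwl_type("integer"))              # e.g. an option like --dask-nthreads
print(cwl_type("integer", count=True))  # e.g. a counted option like -v
```

Failing early keeps an invalid type from reaching the generated file, where it only surfaces later as a validation error.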

With the above PR fix applied:

click2cwl \
  -p src/goldfinch/processes/indicator/hdd.py \
  -m id=heating_degree_days \
  --cwl-version v1.2 \
  --docker birdhouse/goldfinch:0.1.0 \
  -m author=fmigneault \
  -e TEST=VALUE \
  -- \
  heating_degree_days
$namespaces:
  s: https://schema.org/
$schemas:
- http://schema.org/version/9.0/schemaorg-current-http.rdf
baseCommand: python -m goldfinch.processes.indicator.hdd
class: CommandLineTool
cwlVersion: v1.2
hints:
  DockerRequirement:
    dockerPull: birdhouse/goldfinch:0.1.0
id: heating_degree_days
inputs:
  chunks:
    inputBinding:
      position: 8
      prefix: --chunks
    type: string?
  dask_maxmem:
    inputBinding:
      position: 7
      prefix: --dask-maxmem
    type: string?
  dask_nthreads:
    inputBinding:
      position: 6
      prefix: --dask-nthreads
    type: int?
  engine:
    inputBinding:
      position: 9
      prefix: --engine
    type: string?
  help:
    inputBinding:
      position: 2
      prefix: -h
    type: boolean?
  indicator:
    inputBinding:
      position: 1
      prefix: --indicator
    type:
    - symbols:
      - HUMIDEX
      - HEAT_INDEX
      - TG
      - WIND_SPEED_FROM_VECTOR
      - WIND_VECTOR_FROM_SPEED
      - WIND_POWER_POTENTIAL
      - WIND_PROFILE
      - E_SAT
      - HURS_FROMDEWPOINT
      - HURS
      - HUSS
      - HUSS_FROMDEWPOINT
      - VAPOR_PRESSURE_DEFICIT
      - PRSN
      - PRLP
      - WIND_CHILL
      - POTENTIAL_EVAPOTRANSPIRATION
      - WATER_BUDGET_FROM_TAS
      - WATER_BUDGET
      - CORN_HEAT_UNITS
      - UTCI
      - MEAN_RADIANT_TEMPERATURE
      - SHORTWAVE_UPWELLING_RADIATION_FROM_NET_DOWNWELLING
      - LONGWAVE_UPWELLING_RADIATION_FROM_NET_DOWNWELLING
      - CLEARNESS_INDEX
      - RAIN_FRZGR
      - RX1DAY
      - MAX_N_DAY_PRECIPITATION_AMOUNT
      - WETDAYS
      - WETDAYS_PROP
      - DRY_DAYS
      - DRYNESS_INDEX
      - CWD
      - CDD
      - SDII
      - MAX_PR_INTENSITY
      - PRCPTOT
      - PRCPAVG
      - WET_PRCPTOT
      - LIQUIDPRCPTOT
      - LIQUIDPRCPAVG
      - SOLIDPRCPTOT
      - SOLIDPRCPAVG
      - xclim.core.indicator.SPI
      - xclim.core.indicator.SPEI
      - DC
      - DMC
      - CFFWIS
      - KBDI
      - DF
      - FFDI
      - LAST_SNOWFALL
      - FIRST_SNOWFALL
      - DAYS_WITH_SNOW
      - SNOWFALL_FREQUENCY
      - SNOWFALL_INTENSITY
      - DAYS_OVER_PRECIP_THRESH
      - DAYS_OVER_PRECIP_DOY_THRESH
      - HIGH_PRECIP_LOW_TEMP
      - FRACTION_OVER_PRECIP_DOY_THRESH
      - FRACTION_OVER_PRECIP_THRESH
      - LIQUID_PRECIP_RATIO
      - DRY_SPELL_FREQUENCY
      - DRY_SPELL_TOTAL_LENGTH
      - DRY_SPELL_MAX_LENGTH
      - WET_SPELL_FREQUENCY
      - WET_SPELL_TOTAL_LENGTH
      - WET_SPELL_MAX_LENGTH
      - RPRCTOT
      - COLD_AND_DRY_DAYS
      - WARM_AND_DRY_DAYS
      - WARM_AND_WET_DAYS
      - COLD_AND_WET_DAYS
      - RAIN_SEASON
      - WATER_CYCLE_INTENSITY
      - JETSTREAM_METRIC_WOOLLINGS
      - TN_DAYS_ABOVE
      - TN_DAYS_BELOW
      - TG_DAYS_ABOVE
      - TG_DAYS_BELOW
      - TX_DAYS_ABOVE
      - TX_DAYS_BELOW
      - TX_TN_DAYS_ABOVE
      - HEAT_WAVE_FREQUENCY
      - HOT_SPELL_MAX_MAGNITUDE
      - HEAT_WAVE_MAX_LENGTH
      - HEAT_WAVE_TOTAL_LENGTH
      - HEAT_WAVE_INDEX
      - HEAT_SPELL_FREQUENCY
      - HEAT_SPELL_MAX_LENGTH
      - HEAT_SPELL_TOTAL_LENGTH
      - HOT_SPELL_FREQUENCY
      - HOT_SPELL_MAX_LENGTH
      - HOT_SPELL_TOTAL_LENGTH
      - TG_MEAN
      - TG_MAX
      - TG_MIN
      - TX_MEAN
      - TX_MAX
      - TX_MIN
      - TN_MEAN
      - TN_MAX
      - TN_MIN
      - DTR
      - DTRMAX
      - DTRVAR
      - ETR
      - COLD_SPELL_DURATION_INDEX
      - COLD_SPELL_DAYS
      - COLD_SPELL_FREQUENCY
      - COLD_SPELL_MAX_LENGTH
      - COLD_SPELL_TOTAL_LENGTH
      - COOL_NIGHT_INDEX
      - DLYFRZTHW
      - FREEZETHAW_SPELL_FREQUENCY
      - FREEZETHAW_SPELL_MEAN_LENGTH
      - FREEZETHAW_SPELL_MAX_LENGTH
      - COOLING_DEGREE_DAYS
      - COOLING_DEGREE_DAYS_APPROXIMATION
      - HEATING_DEGREE_DAYS
      - HEATING_DEGREE_DAYS_APPROXIMATION
      - GROWING_DEGREE_DAYS
      - FREEZING_DEGREE_DAYS
      - THAWING_DEGREE_DAYS
      - FRESHET_START
      - FROST_DAYS
      - FROST_SEASON_LENGTH
      - LAST_SPRING_FROST
      - FIRST_DAY_TN_BELOW
      - FIRST_DAY_TG_BELOW
      - FIRST_DAY_TX_BELOW
      - FIRST_DAY_TN_ABOVE
      - FIRST_DAY_TG_ABOVE
      - FIRST_DAY_TX_ABOVE
      - ICE_DAYS
      - CONSECUTIVE_FROST_DAYS
      - FROST_FREE_SEASON_LENGTH
      - FROST_FREE_SEASON_START
      - FROST_FREE_SEASON_END
      - FROST_FREE_SPELL_MAX_LENGTH
      - CONSECUTIVE_FROST_FREE_DAYS
      - GROWING_SEASON_START
      - GROWING_SEASON_LENGTH
      - GROWING_SEASON_END
      - TROPICAL_NIGHTS
      - TG90P
      - TG10P
      - TX90P
      - TX10P
      - TN90P
      - TN10P
      - DEGREE_DAYS_EXCEEDANCE_DATE
      - WARM_SPELL_DURATION_INDEX
      - MAXIMUM_CONSECUTIVE_WARM_DAYS
      - FIRE_SEASON
      - HUGLIN_INDEX
      - BIOLOGICALLY_EFFECTIVE_DEGREE_DAYS
      - EFFECTIVE_GROWING_DEGREE_DAYS
      - LATITUDE_TEMPERATURE_INDEX
      - LATE_FROST_DAYS
      - AUSTRALIAN_HARDINESS_ZONES
      - USDA_HARDINESS_ZONES
      - CP
      - CU
      - CALM_DAYS
      - WINDY_DAYS
      - SFCWIND_MAX
      - SFCWIND_MEAN
      - SFCWIND_MIN
      - SFCWINDMAX_MAX
      - SFCWINDMAX_MEAN
      - SFCWINDMAX_MIN
      - FIT
      - RETURN_LEVEL
      - STATS
      - SND_SEASON_LENGTH
      - SNW_SEASON_LENGTH
      - SND_SEASON_START
      - SNW_SEASON_START
      - SND_SEASON_END
      - SNW_SEASON_END
      - SND_MAX_DOY
      - SNOW_MELT_WE_MAX
      - SNW_MAX
      - SNW_MAX_DOY
      - MELT_AND_PRECIP_MAX
      - SND_STORM_DAYS
      - SNW_STORM_DAYS
      - BLOWING_SNOW
      - SNOW_DEPTH
      - SND_TO_SNW
      - SNW_TO_SND
      - SND_DAYS_ABOVE
      - SNW_DAYS_ABOVE
      - HOLIDAY_SNOW_DAYS
      - HOLIDAY_SNOW_AND_SNOWFALL_DAYS
      - BASE_FLOW_INDEX
      - RB_FLASHINESS_INDEX
      - DOY_QMAX
      - DOY_QMIN
      - xclim.core.indicator.FLOW_INDEX
      - HIGH_FLOW_FREQUENCY
      - LOW_FLOW_FREQUENCY
      - xclim.core.indicator.SSI
      - xclim.core.indicator.SGI
      - SEA_ICE_EXTENT
      - SEA_ICE_AREA
      - icclim.TG
      - icclim.TX
      - icclim.TN
      - icclim.TG90P
      - icclim.TG10P
      - icclim.TGX
      - icclim.TGN
      - icclim.TX90P
      - icclim.TX10P
      - icclim.TXX
      - icclim.TXN
      - icclim.TN90P
      - icclim.TN10P
      - icclim.TNX
      - icclim.TNN
      - icclim.HI
      - icclim.BEDD
      - icclim.CSDI
      - icclim.WSDI
      - icclim.SU
      - icclim.CSU
      - icclim.TR
      - icclim.GD4
      - icclim.FD
      - icclim.CFD
      - icclim.GSL
      - icclim.ID
      - icclim.HD17
      - icclim.CDD
      - icclim.CWD
      - icclim.RR
      - icclim.PRCPTOT
      - icclim.SDII
      - icclim.ETR
      - icclim.DTR
      - icclim.VDTR
      - icclim.RR1
      - icclim.R10MM
      - icclim.R20MM
      - icclim.RX1DAY
      - icclim.RX5DAY
      - icclim.R75P
      - icclim.R95P
      - icclim.R99P
      - icclim.R75PTOT
      - icclim.R95PTOT
      - icclim.R99PTOT
      - icclim.SD
      - icclim.SD1
      - icclim.SD5CM
      - icclim.SD50CM
      - icclim.CD
      - icclim.WD
      - icclim.WW
      - icclim.CW
      - anuclim.P10_MEANTEMPWARMESTQUARTER
      - anuclim.P11_MEANTEMPCOLDESTQUARTER
      - anuclim.P12_ANNUALPRECIP
      - anuclim.P13_PRECIPWETTESTPERIOD
      - anuclim.P14_PRECIPDRIESTPERIOD
      - anuclim.P15_PRECIPSEASONALITY
      - anuclim.P16_PRECIPWETTESTQUARTER
      - anuclim.P17_PRECIPDRIESTQUARTER
      - anuclim.P18_PRECIPWARMESTQUARTER
      - anuclim.P19_PRECIPCOLDESTQUARTER
      - anuclim.P1_ANNMEANTEMP
      - anuclim.P2_MEANDIURNALRANGE
      - anuclim.P3_ISOTHERMALITY
      - anuclim.P4_TEMPSEASONALITY
      - anuclim.P5_MAXTEMPWARMESTPERIOD
      - anuclim.P6_MINTEMPCOLDESTPERIOD
      - anuclim.P7_TEMPANNUALRANGE
      - anuclim.P8_MEANTEMPWETTESTQUARTER
      - anuclim.P9_MEANTEMPDRIESTQUARTER
      - cf.CDD
      - cf.CDDCOLDTT
      - cf.CFD
      - cf.CSU
      - cf.CTMGETT
      - cf.CTMGTTT
      - cf.CTMLETT
      - cf.CTMLTTT
      - cf.CTNGETT
      - cf.CTNGTTT
      - cf.CTNLETT
      - cf.CTNLTTT
      - cf.CTXGETT
      - cf.CTXGTTT
      - cf.CTXLETT
      - cf.CTXLTTT
      - cf.CWD
      - cf.DDGTTT
      - cf.DDLTTT
      - cf.DTR
      - cf.ETR
      - cf.FG
      - cf.FXX
      - cf.GD4
      - cf.GDDGROWTT
      - cf.HD17
      - cf.HDDHEATTT
      - cf.MAXDTR
      - cf.PP
      - cf.RH
      - cf.SD
      - cf.SDII
      - cf.SS
      - cf.TG
      - cf.TMM
      - cf.TMMAX
      - cf.TMMEAN
      - cf.TMMIN
      - cf.TMN
      - cf.TMX
      - cf.TN
      - cf.TNM
      - cf.TNMAX
      - cf.TNMEAN
      - cf.TNMIN
      - cf.TNN
      - cf.TNX
      - cf.TX
      - cf.TXM
      - cf.TXMAX
      - cf.TXMEAN
      - cf.TXMIN
      - cf.TXN
      - cf.TXX
      - cf.VDTR
      type: enum
  input:
    type:
    - 'null'
    - inputBinding:
        position: 3
        prefix: -i
      items: string
      type: array
  output:
    inputBinding:
      position: 4
      prefix: -o
    type: string?
  verbose:
    inputBinding:
      position: 5
      prefix: -v
    type: int?
outputs:
  results:
    outputBinding:
      glob: .
    type: Directory
requirements:
  EnvVarRequirement:
    envDef:
      TEST: VALUE
  ResourceRequirement: {}
s:author:
- class: s:Person
  s:name: fmigneault
stderr: std.err
stdout: std.out
