generate reference data from CloudMicrophysics.jl and Cloudy.jl within CI build instead of storing datafiles in the repo (which do not contain instructions on how to generate these data...) #1279

slayoo · 2024-02-15T15:11:31Z

In two examples in PySDM, we store in the repo data generated with CliMA Julia tools:

- the ARG example has data generated with CloudMicrophysics.jl (https://github.com/open-atmos/PySDM/blob/main/examples/PySDM_examples/Abdul_Razzak_Ghan_2000/data_from_CloudMicrophysics_ARG.py)
- the deJong_Azimi example has data generated with Cloudy.jl (https://github.com/open-atmos/PySDM/blob/ej/collisions_only/examples/PySDM_examples/deJong_Azimi/cloudy_data_0d.py)

It would be great to generate these data on the fly in CI. For example, we could write it to JSON from Julia and then load from Python.
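To make the Python side of this concrete, here is a minimal sketch of loading such a file. The JSON layout (`package`, `version`, `data` keys) and the `load_reference` helper are assumptions made up for illustration; nothing like this exists in PySDM or in the Julia packages yet.

```python
import json
from pathlib import Path

# Hypothetical layout of a file written by the Julia side:
# {"package": "CloudMicrophysics.jl", "version": "x.y.z",
#  "data": {"<series name>": [floats, ...]}}
REQUIRED_KEYS = {"package", "version", "data"}


def load_reference(path):
    """Load reference data from a JSON file and check the assumed layout."""
    with Path(path).open(encoding="utf-8") as file:
        payload = json.load(file)
    missing = REQUIRED_KEYS - payload.keys()
    if missing:
        raise ValueError(f"{path} lacks keys: {sorted(missing)}")
    return payload
```

Recording the package name and version in the file itself would let an example or test report exactly which Julia release the reference data came from.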

I'd need your help here - we don't even have information yet on how to generate these data with the Julia tools, so this would improve reproducibility and clarity of the examples in many ways!

We can of course specify the exact version of the Julia tools to use.
We can also generate these datafiles once and make them available as artifacts for all other CI jobs - it's enough if this works on one single environment.

This should also be helpful for the Julia projects: when bumping versions, we will learn whether these outputs change or not.

Help welcome!
Thanks,
S.

claresinger · 2024-02-15T17:52:09Z

For the ARG example this should be pretty simple. The same figures from the original paper are generated in the CloudMicrophysics.jl docs on the fly.

The Julia scripts to make these plots are https://github.com/CliMA/CloudMicrophysics.jl/blob/main/docs/src/plots/ARGplots_fig1.jl and https://github.com/CliMA/CloudMicrophysics.jl/blob/main/docs/src/plots/ARGplots.jl.

trontrytel · 2024-02-16T06:36:27Z

I don't have much experience with Cloudy. That's a question for @edejong-caltech and @sajjadazimi

I can help with the ARG plots. Do you really want your CI to depend on Julia packages? Or would you rather just generate the data yourself once and then save it? Or is it enough to put a link to our CI?

slayoo · 2024-02-16T09:44:34Z

@claresinger, @trontrytel, thanks for following up

> Do you really want your CI to depend on Julia packages? Or would you rather just generate the data yourself once and then save it?

I'd vote against having the data saved, and in favor of generating it on CI. This ensures that:

- the instructions given are enough for new developers to re-generate the data (and understand what's in the data)
- we are able to check if newer versions of the Julia packages still match PySDM (e.g., if we start to diverge, it would be great to know if that applies to the current version of a Julia package, and not to an unspecified version from years ago...)
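The second point could be checked along these lines. This is only a sketch: the `find_mismatches` helper, the dictionary-of-lists layout, and the 1% relative tolerance are assumptions picked for illustration, not existing PySDM code.

```python
import math


def find_mismatches(reference, computed, rel_tol=1e-2):
    """List (name, index, reference value, computed value) for every
    point where the two datasets differ by more than rel_tol."""
    mismatches = []
    for name, ref_values in reference.items():
        values = computed[name]
        if len(values) != len(ref_values):
            raise ValueError(f"series {name!r} differs in length")
        for index, (ref, value) in enumerate(zip(ref_values, values)):
            if not math.isclose(ref, value, rel_tol=rel_tol):
                mismatches.append((name, index, ref, value))
    return mismatches
```

A CI test would then assert that the returned list is empty, so that a divergence points at a named series and index for a known version of the Julia package.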

BTW, we already depend on Julia in CI to check the README Julia snippets:

PySDM/.github/workflows/readme_snippets.yml

Lines 39 to 53 in d6b0df2

      julia:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v2
          - uses: actions/setup-python@v1
            with:
              python-version: 3.9
          - run: pip install -e .
          - run: pip install pytest-codeblocks pytest
          - run: python -c "import pytest_codeblocks; code=pytest_codeblocks.extract_from_file('README.md'); f=open('readme.jl', 'w'); f.writelines(block.code for block in code if block.syntax=='Julia'); f.close()"
          - uses: julia-actions/setup-julia@v1
          - run: cat -n readme.jl
          - run: julia readme.jl
          - run: sed -i 's/CPU/GPU/g' readme.jl
          - run: julia readme.jl

In PyMPDATA, in an analogous way, we are comparing to output from libmpdata++ generated on the fly: https://github.com/open-atmos/PyMPDATA/blob/29fb3b836f5dc730e49bc4dc2075ddc52a26c667/.github/workflows/tests%2Bpypi.yml#L176-L184

It is enough to have one job run the calculations with a Julia package and upload the generated JSON files as artifacts for other jobs to fetch. Here's an example (Julia vs. Matlab vs. Python output comparison): https://github.com/open-atmos/PyPartMC/blob/main/.github/workflows/readme_listings.yml

> Or is it enough to put a link to our CI?

I suspect that sharing artifacts across different repos and platforms would be more difficult than generating them in PySDM's CI.

trontrytel · 2024-02-16T15:32:28Z

Sounds good. I didn't remember that you already have some Julia in your CI.

Then we just have to run those two scripts from your documentation here. I can make an MWE (minimal working example) with the necessary package manager commands.

trontrytel · 2024-02-20T23:00:47Z

Hi. Here is a code snippet that should generate the data for the 5 figures from the paper. Just change the extension from txt to jl. Apologies for not opening a proper PR, but I don't know enough about your CI.

julia --project -e 'using Pkg; Pkg.add("CloudMicrophysics"); Pkg.add("CLIMAParameters"); Pkg.add("Thermodynamics"); Pkg.add("UnicodePlots"); include("arg_plots.jl")'

arg_plots.txt

Also, I'm using some simple plotting package to see the results, but obviously that dependence and the plotting should be deleted in the end.

slayoo · 2024-02-21T23:36:12Z

Thank you @trontrytel !

github-actions · 2024-05-31T13:46:43Z

Stale issue message

github-actions bot added the no-activity label May 31, 2024

slayoo removed the no-activity label Jun 2, 2024
