generate reference data from CloudMicrophysics.jl and Cloudy.jl within CI build instead of storing datafiles in the repo (which do not contain instructions on how to generate these data...) #1279

Open
slayoo opened this issue Feb 15, 2024 · 7 comments

Comments

@slayoo
Member

slayoo commented Feb 15, 2024

@trontrytel, @edejong-caltech,

In two examples in PySDM, we store in the repo data generated with CliMA Julia tools:

It would be great to generate these data on the fly in CI. For example, we could write it to JSON from Julia and then load from Python.
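A minimal sketch of the Python side of that idea, assuming the Julia script writes a flat JSON object that maps a series name to a list of numbers. The file name, the key `activated_fraction`, the sample values, and the tolerance are all hypothetical; nothing here reflects what the Julia tools actually output.

```python
import json
import math
import tempfile
from pathlib import Path


def load_reference(path):
    """Read a JSON file mapping series names to lists of floats."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def assert_matches(reference, computed, rel_tol=1e-2):
    """Fail if any computed value deviates from the reference beyond rel_tol."""
    for name, expected in reference.items():
        actual = computed[name]
        assert len(actual) == len(expected), f"length mismatch for {name}"
        for i, (a, e) in enumerate(zip(actual, expected)):
            assert math.isclose(a, e, rel_tol=rel_tol), f"{name}[{i}]: {a} vs {e}"


# Stand-in for a file that the Julia step would have written in CI.
reference_path = Path(tempfile.mkdtemp()) / "arg_reference.json"
reference_path.write_text(json.dumps({"activated_fraction": [0.10, 0.25, 0.40]}))

reference = load_reference(reference_path)
# Made-up "computed" values standing in for PySDM output.
assert_matches(reference, {"activated_fraction": [0.1001, 0.2499, 0.4002]})
```

Keeping the comparison in a small helper would let each example's test state its own tolerance.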

I'd need your help here: we don't yet even have information on how to generate these data with the Julia tools, so this would improve the reproducibility and clarity of the examples in many ways!

We can of course specify the exact version of the Julia tools to use.
We can also generate these datafiles once and make them available as artifacts for all other CI jobs; it's enough if this works in a single environment.

This should also be helpful for the Julia projects: when bumping versions, we will know whether these outputs change or not.

Help welcome!
Thanks,
S.

@claresinger
Collaborator

For the ARG example this should be pretty simple. The same figures from the original paper are generated in the CloudMicrophysics.jl docs on the fly.

The Julia scripts to make these plots are https://github.com/CliMA/CloudMicrophysics.jl/blob/main/docs/src/plots/ARGplots_fig1.jl and https://github.com/CliMA/CloudMicrophysics.jl/blob/main/docs/src/plots/ARGplots.jl.

@trontrytel
Contributor

I don't have much experience with Cloudy. That's a question for @edejong-caltech and @sajjadazimi

I can help with the ARG plots. Do you really want your CI to depend on Julia packages? Or would you rather just generate the data yourself once and then save it? Or is it enough to put a link to our CI?

@slayoo
Member Author

slayoo commented Feb 16, 2024

@claresinger, @trontrytel, thanks for following up

Do you really want your CI to depend on Julia packages? Or would you rather just generate the data yourself once and then save it?

I'd vote against having the data saved, and in favor of generating it on CI. This ensures that:

  • the instructions given are enough for new developers to re-generate the data (and understand what's in the data)
  • we are able to check whether newer versions of the Julia packages still match PySDM (e.g., if we start to diverge, it would be great to know whether that applies to the current version of a Julia package, and not to an unspecified version from years ago...)

BTW, we already depend on Julia in CI to check the README Julia snippets:

julia:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v2
    - uses: actions/setup-python@v1
      with:
        python-version: 3.9
    - run: pip install -e .
    - run: pip install pytest-codeblocks pytest
    - run: python -c "import pytest_codeblocks; code=pytest_codeblocks.extract_from_file('README.md'); f=open('readme.jl', 'w'); f.writelines(block.code for block in code if block.syntax=='Julia'); f.close()"
    - uses: julia-actions/setup-julia@v1
    - run: cat -n readme.jl
    - run: julia readme.jl
    - run: sed -i 's/CPU/GPU/g' readme.jl
    - run: julia readme.jl

In PyMPDATA, in an analogous way, we are comparing to output from libmpdata++ generated on the fly: https://github.com/open-atmos/PyMPDATA/blob/29fb3b836f5dc730e49bc4dc2075ddc52a26c667/.github/workflows/tests%2Bpypi.yml#L176-L184

It is enough to have one job executing the calculations with a Julia package; it can then upload the generated JSON files as artifacts for other jobs to fetch. Here's an example (Julia vs. Matlab vs. Python output comparison): https://github.com/open-atmos/PyPartMC/blob/main/.github/workflows/readme_listings.yml
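On the consuming side, a test job could simply read whatever JSON files the artifact download step has unpacked. A rough sketch, assuming the files land in a directory named by a hypothetical `REFDATA_DIR` environment variable (this is not an existing PySDM setting, and the file name `fig1.json` is made up):

```python
import json
import os
import tempfile
from pathlib import Path


def collect_reference_files(directory):
    """Return {file stem: parsed JSON} for every *.json file in directory."""
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(directory).glob("*.json"))
    }


# In CI the directory would be populated by the artifact download step;
# here a temporary directory with one file stands in for it.
demo_dir = tempfile.mkdtemp()
(Path(demo_dir) / "fig1.json").write_text(json.dumps({"x": [1, 2], "y": [3, 4]}))

# REFDATA_DIR is a hypothetical variable name used only for this sketch.
refdata_dir = os.environ.get("REFDATA_DIR", demo_dir)
datasets = collect_reference_files(refdata_dir)
```

Keying the datasets by file stem would let each example's test pick the figure it needs without knowing the full path.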

Or is it enough to put a link to our CI?

I suspect that sharing artifacts across different repos and platforms would be more difficult than generating them in PySDM's CI.

@trontrytel
Contributor

Sounds good. I didn't remember that you already have some Julia in your CI.

Then we just have to run those two scripts from your documentation here. I can put together an MWE with the necessary package manager commands.

@trontrytel
Contributor

trontrytel commented Feb 20, 2024

Hi. Here is a code snippet that should generate the data for the 5 figures from the paper. Just change the extension from txt to jl. Apologies for not opening a proper PR, but I don't know enough about your CI.

julia --project -e 'using Pkg; Pkg.add("CloudMicrophysics"); Pkg.add("CLIMAParameters"); Pkg.add("Thermodynamics"); Pkg.add("UnicodePlots"); include("arg_plots.jl")'

arg_plots.txt

Also, I'm using a simple plotting package to see the results, but obviously that dependency and the plotting should be deleted in the end.
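If exact package versions are to be pinned, the same invocation could be assembled programmatically. Below is a sketch in Python, assuming the plotting (and hence UnicodePlots) has already been removed from the script. The version strings are placeholders, not tested combinations; `Pkg.add(name=..., version=...)` is the keyword form of Julia's package manager call.

```python
import shlex

# Placeholder versions for illustration only; real pins would need to be chosen and tested.
PINNED = {
    "CloudMicrophysics": "0.0.0",
    "CLIMAParameters": "0.0.0",
    "Thermodynamics": "0.0.0",
}


def julia_command(script, pinned):
    """Build the argv list for a julia call that installs pinned packages, then runs script."""
    adds = "; ".join(
        f'Pkg.add(name="{name}", version="{version}")'
        for name, version in pinned.items()
    )
    program = f'using Pkg; {adds}; include("{script}")'
    return ["julia", "--project", "-e", program]


command = julia_command("arg_plots.jl", PINNED)
print(shlex.join(command))  # copy-pasteable shell form
# subprocess.run(command, check=True) would execute it where Julia is installed.
```

Building the argument list rather than a single shell string avoids quoting problems when the command is later passed to `subprocess.run`.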

@slayoo
Member Author

slayoo commented Feb 21, 2024

Thank you @trontrytel !


Stale issue message
