HI-FRIENDS-SDC2/hi-friends

Summary

This repository hosts a workflow to process HI data cubes produced by radio interferometers, in particular the large data cubes that future instruments such as the SKA will produce. It extracts radio sources and characterizes their main properties.

The workflow is managed and executed using the Snakemake workflow management system. It uses spectral-cube, which is based on the dask parallelization tool, and the astropy suite to divide the large cube into smaller pieces. On each subcube, we run SoFiA-2 to mask the subcube, find sources and characterize their properties. Finally, the individual catalogs are cleaned and concatenated into a single catalog, and duplicates from the overlapping regions are eliminated. Some diagnostic plots are produced using Jupyter notebooks.
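
The splitting step can be illustrated with a minimal sketch. It assumes the cube is tiled only along its two spatial axes into an n_side by n_side grid, with each tile padded by a fixed number of overlap pixels on every side. The helpers define_chunk_edges and define_subcubes are hypothetical, written only for this example; they are not the API of the repository's define_chunks or split_subcube modules.

```python
from typing import List, Tuple


def define_chunk_edges(n_pixels: int, n_chunks: int, overlap: int) -> List[Tuple[int, int]]:
    """Split the axis [0, n_pixels) into n_chunks pieces padded by `overlap` pixels.

    Hypothetical helper for illustration; returns half-open (start, stop) intervals.
    """
    if n_chunks < 1 or n_pixels < n_chunks:
        raise ValueError("need 1 <= n_chunks <= n_pixels")
    base = n_pixels // n_chunks
    edges = []
    for i in range(n_chunks):
        start = i * base
        # The last chunk absorbs the remainder so the whole axis is covered.
        stop = n_pixels if i == n_chunks - 1 else (i + 1) * base
        # Pad both sides by `overlap` pixels, clipped to the axis limits.
        edges.append((max(0, start - overlap), min(n_pixels, stop + overlap)))
    return edges


def define_subcubes(nx: int, ny: int, n_side: int, overlap: int) -> List[Tuple[slice, slice]]:
    """Combine per-axis edges into (x_slice, y_slice) spatial tiles (hypothetical helper)."""
    xs = define_chunk_edges(nx, n_side, overlap)
    ys = define_chunk_edges(ny, n_side, overlap)
    return [(slice(*x), slice(*y)) for y in ys for x in xs]


print(define_chunk_edges(100, 4, 5))
# [(0, 30), (20, 55), (45, 80), (70, 100)]
```

Each returned pair of slices could then be used to cut a tile with NumPy-style indexing, e.g. `cube[:, y, x]` for a spectral-cube `SpectralCube`, whose axes are ordered (spectral, y, x). The overlap is meant to let a source lying on a tile border be detected whole in at least one tile, which is why duplicates have to be removed afterwards.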

HI-FRIENDS team: participation in the SKA Data Challenge 2

This repository contains the workflow used to find and characterize the HI sources in the data cube of the SKA Data Challenge 2. It was developed by the HI-FRIENDS team. The workflow was executed on the SP-SRC cluster at the IAA-CSIC. Documentation is available in the HI-FRIENDS SDC2 Documentation (more details below).

Accessibility to the workflow

Following FAIR principles, we are trying to make the workflow as accessible as possible. The contents of this repository and our solution submitted to the SDC2 are published in this Zenodo record. The Snakemake workflow is also provided as Singularity and Docker containers, and it is published in WorkflowHub. Installation and execution instructions can be found in the online documentation developed in this repository.

Installing

For details on installing and using HI-FRIENDS, please visit the documentation: installation, execution.

License

This project is licensed under the GNU General Public License v3.0. See the full license here.

Citation

Please use this reference (it resolves to the most recent version in Zenodo): https://doi.org/10.5281/zenodo.5167659

Documentation

The repository documentation can be found on the HI-FRIENDS SDC2 webpage, where you can find details on:

  • The SKA Data Challenge 2
    • The HI-FRIENDS solution to the SDC2
    • Workflow general description
    • The HI-FRIENDS team
  • Methodology
    • Data exploration
    • Feedback from the workflow and logs
    • Configuration
    • Unit tests
    • Software management and containerization
    • Check conformance to coding standards
  • Workflow Description
    • Workflow definition diagrams
    • Workflow file structure
    • Output products
    • Snakemake execution and diagrams
  • Workflow installation
    • Dependencies
    • Installation
      1. Get conda
      2. Get the pipeline and install snakemake
    • Deploy in containers
      - Docker
      - Singularity
      - Podman
    • Use tarball of the workflow
    • Use myBinder
  • Workflow execution
    • Preparation
    • Basic usage and verification of the workflow
    • Execution on a data cube
  • SDC2 HI-FRIENDS results
    • Our solution
    • Score
  • SDC2 Reproducibility award
    • Reproducibility of the solution check list
  • Developers
    • define_chunks module
    • eliminate_duplicates module
    • filter_catalog module
    • run_sofia module
    • sofia2cat module
    • split_subcube module
  • Acknowledgments
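
The final step of the workflow, removing sources detected twice in overlapping regions (the task of the eliminate_duplicates module listed above), can be sketched as a greedy cross-match. This is an assumed approach for illustration, not necessarily the module's actual algorithm: two detections are treated as duplicates when both their sky separation and their frequency difference fall under hypothetical tolerances (max_sep_arcsec, max_dfreq_hz), and the brighter one is kept. The catalog keys ra and dec (degrees), freq (Hz) and flux are also assumptions.

```python
import math
from typing import Dict, List


def angular_sep_arcsec(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation via the haversine formula; degrees in, arcseconds out."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to guard against rounding pushing the argument of asin above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def eliminate_duplicates(sources: List[Dict[str, float]],
                         max_sep_arcsec: float,
                         max_dfreq_hz: float) -> List[Dict[str, float]]:
    """Hypothetical greedy de-duplication: keep the brightest source of each match group."""
    kept: List[Dict[str, float]] = []
    # Visit sources from brightest to faintest so the brightest duplicate survives.
    for src in sorted(sources, key=lambda s: s["flux"], reverse=True):
        is_duplicate = any(
            angular_sep_arcsec(src["ra"], src["dec"], k["ra"], k["dec"]) <= max_sep_arcsec
            and abs(src["freq"] - k["freq"]) <= max_dfreq_hz
            for k in kept
        )
        if not is_duplicate:
            kept.append(src)
    return kept
```

The pairwise comparison is O(n²). For large catalogs, astropy's `SkyCoord.match_to_catalog_sky` would be a more scalable way to find nearest neighbours on the sky before applying the frequency tolerance.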

Contributing

More details can be found in CONTRIBUTING.MD. In summary:

Coding

Nothing fancy here, just:

  1. Fork this repo
  2. Commit your code
  3. Submit a pull request. Maintainers will review it and give you feedback so you can iterate on it.

Considerations

Testing

Existing tests must pass, and new features must be tested and fully covered.
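
As an illustration of the expected style, a unit test for a new feature might look like this. The function filter_by_snr is hypothetical (it is not the repository's filter_catalog module), and the use of Python's built-in unittest framework is an assumption; check CONTRIBUTING.MD for the tools the project actually uses.

```python
import unittest


def filter_by_snr(sources, min_snr):
    """Hypothetical catalog filter: keep sources whose SNR is at least min_snr."""
    if min_snr < 0:
        raise ValueError("min_snr must be non-negative")
    return [s for s in sources if s["snr"] >= min_snr]


class TestFilterBySnr(unittest.TestCase):
    def test_keeps_sources_at_or_above_threshold(self):
        sources = [{"snr": 2.0}, {"snr": 5.0}, {"snr": 8.0}]
        self.assertEqual(filter_by_snr(sources, 5.0), [{"snr": 5.0}, {"snr": 8.0}])

    def test_empty_catalog_returns_empty_list(self):
        self.assertEqual(filter_by_snr([], 3.0), [])

    def test_negative_threshold_is_rejected(self):
        with self.assertRaises(ValueError):
            filter_by_snr([{"snr": 1.0}], -1.0)
```

Covering the normal case, an edge case and the error path is what keeps coverage complete. Such tests can be run with `python -m unittest` (pytest also collects `unittest.TestCase` classes), and coverage can be measured with coverage.py, for example `coverage run -m unittest` followed by `coverage report`.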

Documenting

Code should be self-documenting. Any code that may be hard to understand must include comments that make it easier to review and maintain later on.