This repository hosts a workflow to process HI data cubes produced by radio interferometers, in particular the large data cubes expected from future instruments like the SKA. It extracts radio sources and characterizes their main properties.
The workflow is managed and executed with the Snakemake workflow management system. It uses the spectral-cube package, built on the dask parallelization library and the astropy suite, to divide the large cube into smaller, overlapping subcubes. On each subcube we run SoFiA-2 to mask the data, find sources, and characterize their properties. Finally, the individual catalogs are cleaned and concatenated into a single catalog, and duplicates from the overlapping regions are eliminated. Diagnostic plots are produced with Jupyter notebooks.
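To illustrate the splitting step, below is a minimal sketch, not the workflow's actual code: the file names, grid size, and overlap are hypothetical, and the real logic lives in the split_subcube module.

```python
# Minimal sketch of the splitting step (hypothetical file names, grid size,
# and overlap; the real logic lives in the split_subcube module).
from spectral_cube import SpectralCube

N_SPLITS = 4   # pieces per spatial axis (assumed value)
OVERLAP = 40   # pixel overlap between neighboring subcubes (assumed value)

cube = SpectralCube.read("sky_full.fits", use_dask=True)  # lazy, dask-backed
_, ny, nx = cube.shape
step_y, step_x = ny // N_SPLITS, nx // N_SPLITS

for i in range(N_SPLITS):
    for j in range(N_SPLITS):
        # Extend each piece by OVERLAP pixels so border sources are fully
        # contained in at least one subcube.
        y0, y1 = max(0, i * step_y - OVERLAP), min(ny, (i + 1) * step_y + OVERLAP)
        x0, x1 = max(0, j * step_x - OVERLAP), min(nx, (j + 1) * step_x + OVERLAP)
        cube[:, y0:y1, x0:x1].write(f"subcube_{i}_{j}.fits", overwrite=True)
```

The overlap is what makes sources near subcube borders detectable in full, and it is also why the concatenated catalog must later be purged of duplicates.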
This repository contains the workflow used to find and characterize the HI sources in the data cube of the SKA Data Challenge 2 (SDC2). It was developed by the HI-FRIENDS team, and the workflow was executed on the SP-SRC cluster at the IAA-CSIC. Documentation can be found in the HI-FRIENDS SDC2 Documentation (more details below).
Following the FAIR principles, we aim to make the workflow as accessible as possible. The contents of this repository and the solution submitted to the SDC2 are published in this Zenodo record. The Snakemake workflow is also provided as Singularity and Docker containers, and it is published in WorkflowHub. Installation and execution instructions can be found in the online documentation developed in this repository.
For details on installing and using HI-FRIENDS, please visit the documentation: installation, execution.
This project is licensed under the GNU General Public License v3.0. See the full license here.
Please use this reference (it resolves to the most recent version in Zenodo): https://doi.org/10.5281/zenodo.5167659
The repository documentation can be found on the HI-FRIENDS SDC2 webpage, where you can find details on:
- The SKA Data Challenge 2
- The HI-FRIENDS solution to the SDC2
- Workflow general description
- The HI-FRIENDS team
- Methodology
- Data exploration
- Feedback from the workflow and logs
- Configuration
- Unit tests
- Software management and containerization
- Check conformance to coding standards
- Workflow Description
- Workflow definition diagrams
- Workflow file structure
- Output products
- Snakemake execution and diagrams
- Workflow installation
- Dependencies
- Installation
  1. Get conda
  2. Get the pipeline and install snakemake
- Deploy in containers
  - Docker
  - Singularity
  - Podman
- Use tarball of the workflow
- Use myBinder
- Workflow execution
- Preparation
- Basic usage and verification of the workflow
- Execution on a data cube
- SDC2 HI-FRIENDS results
- Our solution
- Score
- SDC2 Reproducibility award
- Reproducibility of the solution check list
- Developers
- define_chunks module
- eliminate_duplicates module
- filter_catalog module
- run_sofia module
- sofia2cat module
- split_subcube module
- Acknowledgments
More details can be found in CONTRIBUTING.md. In summary, nothing fancy, just:
- Fork this repo
- Commit your code
- Submit a pull request. It will be reviewed by the maintainers, who will give you feedback so you can iterate on it.
- Make sure existing tests pass
- Make sure your new code is properly tested and fully-covered
- Following The seven rules of a great Git commit message is highly encouraged
- When adding a new feature, branch from the master branch
As mentioned above, existing tests must pass, and new features must be properly tested and fully covered.
Code should be self-documenting, but any code that may be hard to understand must include comments to make it easier to review and maintain.
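To illustrate the testing expectation, the following is a minimal pytest-style sketch. The `remove_duplicates` function, its parameters, and the catalog columns are hypothetical stand-ins, not the actual API of the eliminate_duplicates module:

```python
# Hypothetical unit test sketch (the function and columns are assumptions,
# not the workflow's real API) illustrating the expected testing style.
import pandas as pd

def remove_duplicates(catalog, tolerance=1.0):
    """Toy stand-in: keep the first of any rows closer than `tolerance`
    in (ra, dec); the real logic lives in the eliminate_duplicates module."""
    keep = []
    for _, row in catalog.iterrows():
        if all(abs(row["ra"] - catalog.loc[k, "ra"]) > tolerance
               or abs(row["dec"] - catalog.loc[k, "dec"]) > tolerance
               for k in keep):
            keep.append(row.name)
    return catalog.loc[keep]

def test_remove_duplicates_keeps_single_entry():
    catalog = pd.DataFrame({"ra": [10.0, 10.0001, 50.0],
                            "dec": [-30.0, -30.0001, -45.0]})
    cleaned = remove_duplicates(catalog, tolerance=0.01)
    assert len(cleaned) == 2  # the two nearly identical rows collapse to one
```

Tests in this style can be run with `pytest` from the repository root.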