diff --git a/README.md b/README.md index 79f37c7c7..42c7311eb 100644 --- a/README.md +++ b/README.md @@ -1,12 +1,13 @@ # MDTF-diagnostics [![Build Status](https://travis-ci.org/tsjackson-noaa/MDTF-diagnostics.svg?branch=develop)](https://travis-ci.org/tsjackson-noaa/MDTF-diagnostics) -The MDTF diagnostics package is a portable framework for running process-oriented diagnostics (PODs) on climate model data. Each POD module targets a specific physical process or emergent behavior, with the goals of determining how accurately the model represents that process, ensuring that models produce the right answers for the right reasons, and identifying gaps in the understanding of phenomena. +The MDTF diagnostics package is a portable framework for running process-oriented diagnostics (PODs) on climate model data. Each POD script targets a specific physical process or emergent behavior, with the goals of determining how accurately the model represents that process, ensuring that models produce the right answers for the right reasons, and identifying gaps in the understanding of phenomena. The package provides an extensible, portable and reproducible means for running these diagnostics as part of the model development workflow. The framework handles software dependency and data handling tasks, meaning that POD developers can focus on science instead of “reinventing the wheel”. Development is community-driven and built on open-source technologies. Documentation for users and contributors is hosted on readthedocs.io. ![MDTF_logo](<./doc/img/CPO_MAPP_MDTF_Logo.jpg>) ## Diagnostics in Package + Follow the links in the table below to view sample output, including a brief description and a link to the full documentation for each diagnostic. @@ -24,32 +25,31 @@ and a link to the full documentation for each diagnostic. | [ENSO Moist Static Energy budget](http://www.cgd.ucar.edu/cms/bundy/Projects/diagnostics/mdtf/mdtf_figures/MDTF_CCSM4/MSE_diag/MSE_diag.html) (implementation in progress, example with CCSM4 data) | Hariharasubramanian Annamalai (U. Hawaii) | | [Warm Rain Microphysics](http://www.cgd.ucar.edu/cms/bundy/Projects/diagnostics/mdtf/mdtf_figures/MDTF_QBOi.EXP1.AMIP.001.save/warm_rain_microphysics/documentation) (implementation in progress) | Kentaroh Suzuki (AORI, U. Tokyo) | -### Sample Output Webpage -[Version 2.0 output](http://www.cgd.ucar.edu/cms/bundy/Projects/diagnostics/mdtf/mdtf_figures/MDTF_QBOi.EXP1.AMIP.001.save), based on a CESM-CAM run. +### Examples of package output +- [Historical run of NOAA-GFDL ESM4](https://extranet.gfdl.noaa.gov/~John.Krasting/mdtf/GFDL-ESM4/), 1980-2014 +- [Historical run of NOAA-GFDL CM4](https://extranet.gfdl.noaa.gov/~John.Krasting/mdtf/GFDL-CM4/), 1980-2014 +- [Historical run of NCAR CESM2/CAM4](http://www.cgd.ucar.edu/cms/bundy/Projects/diagnostics/mdtf/mdtf_figures/MDTF_QBOi.EXP1.AMIP.001.save/), 1977-1981, from an earlier version of the package. # Quickstart installation instructions -This document provides basic directions for downloading, installing and running a test of the MDTF diagnostic framework package using sample model data. See the [documentation site](https://mdtf-diagnostics.readthedocs.io/en/latest/) for all other information. The current MDTF package has been tested on UNIX/LINUX, Mac OS, and Windows Subsystem for Linux. 
- -Throughout this document, `%` indicates the UNIX/LINUX command line prompt and is followed by commands to be executed in a terminal in `fixed-width font`, and `$` indicates strings to be substituted, e.g., the string `$CODE_ROOT` in section 1.1 should be substituted by the actual path to the MDTF-diagnostics directory. +This document provides basic directions for downloading, installing and running a test of the MDTF framework using sample model data. See the [documentation site](https://mdtf-diagnostics.readthedocs.io/en/latest/) for all other information. The MDTF package has been tested on UNIX/LINUX, Mac OS, and Windows Subsystem for Linux. -### Summary of steps for running the package +Throughout this document, `%` indicates the command line prompt and is followed by commands to be executed in a terminal in `fixed-width font`. `$` indicates strings to be substituted, e.g., the string `$CODE_ROOT` in section 1.1 should be replaced by the actual path to the `MDTF-diagnostics` directory. -You will need to download a) the source code, b) digested observational data, and c) two sets of sample model data (section 1). Afterwards, we describe how to install necessary Conda environments and languages (section 2) and run the framework on the default test case (sections 3 and 4). While the package contains quite a few scripts, the most relevant for present purposes are: **Summary of steps for installing the framework** -- `conda_env_setup.sh`: automated script for installing necessary Conda environments. -- `default_tests.jsonc`: configuration file for running the framework. +You will need to download the source code, digested observational data, and sample model data (section 1). Afterwards, we describe how to install software dependencies using the [conda](https://docs.conda.io/en/latest/) package manager (sections 2 and 3) and run the framework on sample model data (sections 4 and 5). -Consult the [Getting started](https://mdtf-diagnostics.readthedocs.io/en/latest/sphinx/start_toc.html) for how to run the framework on your own data and configure general settings. +Consult the [documentation](https://mdtf-diagnostics.readthedocs.io/en/latest/sphinx/start_toc.html) for more general instructions, including how to run the framework on your own data. -## 1. Download the package code and sample data for testing +## 1. Download the framework code and supporting data ### 1.1 Obtaining the code -The official repo for the MDTF code is hosted at the GFDL [GitHub account](https://github.com/NOAA-GFDL/MDTF-diagnostics). We recommend that end users download and test the [latest official release](https://github.com/NOAA-GFDL/MDTF-diagnostics/releases/tag/v3.0-beta.1). +The official repo for the MDTF code is hosted at the NOAA-GFDL [GitHub account](https://github.com/NOAA-GFDL/MDTF-diagnostics). We recommend that end users download and test the [latest official release](https://github.com/NOAA-GFDL/MDTF-diagnostics/releases/tag/v3.0-beta.2). -To install the MDTF package on a local machine, create a directory named `mdtf`, and unzip the code downloaded from the [release page](https://github.com/NOAA-GFDL/MDTF-diagnostics/releases/tag/v3.0-beta.1) there. This will create a directory titled `MDTF-diagnostics-3.0-beta.1` containing the files listed on the GitHub page. Below we refer to this MDTF-diagnostics directory as `$CODE_ROOT`. 
It contains the following subdirectories: +To install the MDTF package on a local machine, create a directory named `mdtf` and unzip the code downloaded from the [release page](https://github.com/NOAA-GFDL/MDTF-diagnostics/releases/tag/v3.0-beta.2) there. This will create a directory titled `MDTF-diagnostics-3.0-beta.2` containing the files listed on the GitHub page. Below we refer to this MDTF-diagnostics directory as `$CODE_ROOT`. It contains the following subdirectories: - `diagnostics/`: directory containing source code and documentation of individual PODs. - `doc/`: directory containing documentation (a local mirror of the documentation site). @@ -60,12 +60,14 @@ For advanced users interested in keeping more up-to-date on project development ### 1.2 Obtaining supporting data -Supporting observational data and sample model data are available via anonymous FTP (ftp://ftp.cgd.ucar.edu/archive/mdtf). The observational data is required for the PODs’ operation, while the sample model data is provided for default test/demonstration purposes. The files most relevant for package installation and default tests are: +Supporting observational data and sample model data are available via anonymous FTP at ftp://ftp.cgd.ucar.edu/archive/mdtf. The observational data is required for the PODs’ operation, while the sample model data is provided for default test/demonstration purposes. The required files are: - Digested observational data (159 Mb): MDTF_v2.1.a.obs_data.tar (ftp://ftp.cgd.ucar.edu/archive/mdtf/MDTF_v2.1.a.obs_data.tar). - NCAR-CESM-CAM sample data (12.3 Gb): model.QBOi.EXP1.AMIP.001.tar (ftp://ftp.cgd.ucar.edu/archive/mdtf/model.QBOi.EXP1.AMIP.001.tar). - NOAA-GFDL-CM4 sample data (4.8 Gb): model.GFDL.CM4.c96L32.am4g10r8.tar (ftp://ftp.cgd.ucar.edu/archive/mdtf/model.GFDL.CM4.c96L32.am4g10r8.tar). +Note that the above paths are symlinks to the most recent versions of the data and will be reported as zero bytes in an FTP client. + Download these three files and extract the contents in the following hierarchy under the `mdtf` directory: ``` @@ -93,120 +95,87 @@ mdtf ├── (... supporting data for individual PODs ) ``` -The default test case uses the QBOi.EXP1.AMIP.001 sample. The GFDL.CM4.c96L32.am4g10r8 sample is only for testing the MJO Propagation and Amplitude POD. - -You can put the observational data and model output in different locations (e.g., for space reasons) by changing the values of `OBS_DATA_ROOT` and `MODEL_DATA_ROOT` as described below in section 3. +The default test case uses the QBOi.EXP1.AMIP.001 data. The GFDL.CM4.c96L32.am4g10r8 data is only for testing the MJO Propagation and Amplitude POD. -## 2. Install the necessary programming languages and modules +You can put the observational data and model output in different locations (e.g., for space reasons) by changing the values of `OBS_DATA_ROOT` and `MODEL_DATA_ROOT` as described below in section 4. -*For users unfamiliar with Conda, section 2.1 can be skipped if Conda has been installed, but section 2.2 CANNOT be skipped regardless.* +## 2. Install the conda package manager, if needed -The MDTF framework code is written in Python 2.7, but supports running PODs written in a variety of scripting languages and combinations of libraries. We use [Conda](https://docs.conda.io/en/latest/), a free, open-source package manager to install and manage these dependencies. 
Conda is one component of the [Miniconda](https://docs.conda.io/en/latest/miniconda.html) and [Anaconda](https://www.anaconda.com/) python distribution, so having Miniconda/Anaconda is sufficient but not necessary. +The MDTF framework code is written in Python 3, but supports running PODs written in a variety of scripting languages and combinations of libraries. We use [conda](https://docs.conda.io/en/latest/), a free, open-source package manager, to install and manage these dependencies. Conda is one component of the [Miniconda](https://docs.conda.io/en/latest/miniconda.html) and [Anaconda](https://www.anaconda.com/) Python distributions, so having Miniconda or Anaconda is sufficient but not required. -For maximum portability and ease of installation, we recommend that all users manage dependencies through Conda using the provided script `src/conda/conda_env_setup.sh`, even if they have independent installations of the required languages. A complete installation of all dependencies will take roughly 5 Gb, less if you've already installed some of the dependencies through Conda. The location of this installation can be changed with the `$CONDA_ENV_DIR` setting described below. +For maximum portability and ease of installation, we recommend that all users manage dependencies through conda, even if they have pre-existing installations of the required languages. A complete installation of all dependencies requires roughly 5 Gb, and the location of this installation can be set with the `$CONDA_ENV_DIR` setting described below. ### 2.1 Conda installation -Here we are checking that the Conda command is available on your system. We recommend doing this via Miniconda or Anaconda installation. You can proceed directly to section 2.2 if Conda is already installed. - -- To determine if Conda is installed, run `% conda --version` as the user who will be using the framework. The framework has been tested against versions of Conda >= 4.7.5. - -- If the command doesn't return anything, i.e., you do not have a pre-existing Conda on your system, we recommend using the Miniconda installer available [here](https://docs.conda.io/en/latest/miniconda.html). Any version of Miniconda/Anaconda (2 or 3) released after June 2019 will work. Installation instructions [here](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html). - -- Toward the end of the installation process, enter “yes” at “Do you wish the installer to initialize Miniconda2 by running conda init?” (or similar) prompt. This will allow the installer to add the Conda path to the user's shell login script (e.g., `~/.bashrc` or `~/.cshrc`). - -- Restart the terminal to reload the updated shell login script. +Users with an existing conda installation should skip this section and proceed to section 3. -- Mac OS users may encounter a benign Java warning pop-up: *To use the "java" command-line tool you need to install a JDK.* It's safe to ignore it. * To determine if conda is installed, run `% conda --version` as the user who will be using the framework. The framework has been tested against versions of conda >= 4.7.5. -The framework’s environments will co-exist with an existing Miniconda/Anaconda installation. *Do not* reinstall Miniconda/Anaconda if it's already installed for the user who will be running the framework: the installer will break the existing installation (if it's not managed with, eg., environment modules.) 
+ - Do not install a new copy of Miniconda/Anaconda if it's already installed for the user who will be running the framework: the installer will break the existing installation (if it's not managed with, e.g., environment modules.) The framework’s environments are designed to coexist with an existing Miniconda/Anaconda installation. -### 2.2 Framework-specific environment installation +* If you do not have a pre-existing conda installation, we recommend installing Miniconda 3.x, available [here](https://docs.conda.io/en/latest/miniconda.html). This version is not required: any version of Miniconda/Anaconda (2 or 3) released after June 2019 will work equally well. Follow the [installation instructions](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html) appropriate for your system. -Here we set up the necessary environments needed for running the framework and individual PODs via the provided script. These are sometimes referred to as "Conda environments" conventionally. ## 3. Install framework dependencies with conda -After making sure that Conda is available, run `% conda info --base` as the user who will be using the framework to determine the location of your Conda installation. This path will be referred to as `$CONDA_ROOT` below. As described above, all software dependencies for the framework and PODs are managed through conda environments. -- If this path points to `/usr/` or a subdirectory therein, we recomnend having a separate Miniconda/Anaconda installation of your own following section 2.1. Run `% conda info --base` as the user who will be using the framework to determine the location of your conda installation. This path will be referred to as `$CONDA_ROOT` below. If you don't have write access to this location (e.g., on a multi-user system), you'll need to tell conda to install files in a non-default location `$CONDA_ENV_DIR`, as described below. Next, run ``` % cd $CODE_ROOT % ./src/conda/conda_env_setup.sh --all --conda_root $CONDA_ROOT --env_dir $CONDA_ENV_DIR ``` -to install all necessary environments (and create an executable; section 4.1), which takes ~10 min (depending on machine and internet connection). The names of all framework-created environments begin with “_MDTF”, so as not to conflict with any other environments. -- Substitute the actual paths for `$CODE_ROOT`, `$CONDA_ROOT`, and `$CONDA_ENV_DIR`. - -- The `--env_dir` flag allows you to put the program files in a designated location `$CONDA_ENV_DIR` (for space reasons, or if you don’t have write access). You can omit this flag, and the environments will be installed within `$CONDA_ROOT/envs/` by default. - -- The `--all` flag makes the script install all environments prescribed by the YAML (.yml) files under `src/conda/` (one YAML for one environment). You can install the environments selectively by using the `--env` flag instead. For instance, `% ./src/conda/conda_env_setup.sh --env base --conda_root $CONDA_ROOT --env_dir $CONDA_ENV_DIR` will install the "_MDTF_base" environment prescribed by `env_base.yml`, and so on. With `--env`, the current script can install one environment at a time. Repeat the command for multiple environments. - -- Note that _MDTF_base is mandatory for the framework's operation, and the other environments are optional, see section 4.3. +to install all dependencies, which takes ~10 min (depending on machine and internet connection). 
The names of all framework-created environments begin with “_MDTF”, so as not to conflict with user-created environments in a preexisting conda installation. -After installing the framework-specific Conda environments, you shouldn't manually alter them (i.e., never run `conda update` on them). To update the environments after updating the framework code, re-run the above commands. These environments can be uninstalled by simply deleting "_MDTF" directories under `$CONDA_ENV_DIR` (or `$CONDA_ROOT/envs/` for default setting). +- Substitute the actual paths for `$CODE_ROOT`, `$CONDA_ROOT`, and `$CONDA_ENV_DIR`. +- The optional `--env_dir` flag directs conda to install framework dependencies in `$CONDA_ENV_DIR` (for space reasons, or if you don’t have write access). If this flag is omitted, the environments will be installed in `$CONDA_ROOT/envs/` by default. -## 3. Configuring package paths +After installing the framework-specific conda environments, you shouldn't manually alter them (e.g., never run `conda update` on them). To update the environments after updating the framework code, re-run the above commands. These environments can be uninstalled by simply deleting the "_MDTF" directories under `$CONDA_ENV_DIR` (or `$CONDA_ROOT/envs/` by default). -`src/default_tests.jsonc` is a template/example for configuration options that will be passed to the executable as an input. Open it in an editor (we recommend working on a copy). The following adjustments are necessary before running the framework: ## 4. Configure framework paths -- If you've saved the supporting data in the directory structure described in section 1.2, the default values for `OBS_DATA_ROOT` and `MODEL_DATA_ROOT` pointing to `mdtf/inputdata/obs_data/` and `mdtf/inputdata/model/` will be correct. If you put the data in a different location, these values should be changed accordingly. +The MDTF framework supports setting configuration options in a file as well as on the command line. An example of the configuration file format is provided at [src/default_tests.jsonc](https://github.com/NOAA-GFDL/MDTF-diagnostics/blob/main/src/default_tests.jsonc). We recommend configuring the following settings by editing a copy of this file. -- `OUTPUT_DIR` should be set to the location you want the output files to be written to (default: `mdtf/wkdir/`; will be created by the framework). The output of each run of the framework will be saved in a different subdirectory in this location. +Relative paths in the configuration file will be interpreted relative to `$CODE_ROOT`. The following settings need to be configured before running the framework: -- `conda_root` should be set to the value of `$CONDA_ROOT` used above in section 2.2. +- If you've saved the supporting data in the directory structure described in section 1.2, the default values for `OBS_DATA_ROOT` and `MODEL_DATA_ROOT` given in `src/default_tests.jsonc` (`../inputdata/obs_data` and `../inputdata/model`, respectively) will be correct. If you put the data in a different location, these paths should be changed accordingly. +- `OUTPUT_DIR` should be set to the desired location for output files. The output of each run of the framework will be saved in a different subdirectory in this location. +- `conda_root` should be set to the value of `$CONDA_ROOT` used above in section 3. +- If you specified a non-default conda environment location with `$CONDA_ENV_DIR`, set `conda_env_root` to that value; otherwise, leave it blank. 
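+For illustration, the relevant entries in an edited copy of the configuration file might look like the following sketch. The paths here are made-up examples, and this is not the complete file: `src/default_tests.jsonc` also contains other settings, such as the `pod_list` and `caselist` entries discussed below.

```jsonc
{
  // paths to the supporting data downloaded in section 1.2
  "OBS_DATA_ROOT": "../inputdata/obs_data",
  "MODEL_DATA_ROOT": "../inputdata/model",
  // output reports are written to a new subdirectory here on each run
  "OUTPUT_DIR": "../wkdir",
  // the value of $CONDA_ROOT from section 3; example path only
  "conda_root": "/home/username/miniconda3",
  // leave blank unless you installed environments with --env_dir
  "conda_env_root": ""
}
```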
-- If you specified a custom environment location with `$CONDA_ENV_DIR`, set `conda_env_root` to that value; otherwise, leave it blank. -We recommend using absolute paths in `default_tests.jsonc`, but relative paths are also allowed and should be relative to `$CODE_ROOT`. +## 5. Run the MDTF framework on sample data -## 4. Execute the MDTF package with default test settings +### 5.1 Location of the MDTF executable -### 4.1 Location of the MDTF executable +The MDTF framework is run via a wrapper script at `$CODE_ROOT/mdtf`. -The setup script (section 2.2) will have created an executable at `$CODE_ROOT/mdtf` which sets the correct Conda environments before running the framework and individual PODs. To test the installation, `% $CODE_ROOT/mdtf --help` will print help text on the command-line options. Note that, if your current working directory is `$CODE_ROOT`, you will need to run `% ./mdtf --help`. +This is created by the conda environment setup script used in section 3. The wrapper script activates the framework's conda environment before calling the framework's code (and individual PODs). To verify that the framework and environments were installed successfully, run +``` +% cd $CODE_ROOT +% ./mdtf --version +``` -For interested users, the `mdtf` executable is also a script, which calls `src/conda/conda_init.sh` and `src/mdtf.py`. +This should print the current version of the framework. -### 4.2 Run the framework on sample data +### 5.2 Run the framework on sample data -If you've installed the Conda environments using the `--all` flag (section 2.2), you can now run the framework on the CESM sample model data: +If you've downloaded the NCAR-CESM-CAM sample data (described in section 1.2 above), you can now perform a trial run of the framework: ``` % cd $CODE_ROOT % ./mdtf -f src/default_tests.jsonc ``` -Run time may be 10-20 minutes, depending on your system. - -- If you edited/renamed `default_tests.jsonc`, pass that file instead. - -- The output files for this test case will be written to `$OUTPUT_DIR/QBOi.EXP1.AMIP.001_1977_1981`. When the framework is finished, open `$OUTPUT_DIR/QBOi.EXP1.AMIP.001_1977_1981/index.html` in a web browser to view the output report. - -- The above command will execute PODs included in `pod_list` of `default_tests.jsonc`. Skipping/adding certain PODs by uncommenting/commenting out the POD names (i.e., deleting/adding `//`). Note that entries in the list must be separated by `,` properly. Check for missing or surplus `,` if you encounter an error (e.g., "ValueError: No closing quotation"). - -- Currently the framework only analyzes data from one model run at a time. To run the MJO_prop_amp POD on the GFDL.CM4.c96L32.am4g10r8 sample data, delete or comment out the section for QBOi.EXP1.AMIP.001 in "caselist" of `default_tests.jsonc`, and uncomment the section for GFDL.CM4.c96L32.am4g10r8. - -If you re-run the above command, the result will be written to another subdirectory under `$OUTPUT_DIR`, i.e., output files saved previously will not be overwritten unless you change `overwrite` in `default_tests.jsonc` to `true`. - -### 4.3 Framework interaction with Conda environments - -As just described in section 4.2, when you run the `mdtf` executable, among other things, it reads `pod_list` in `default_tests.jsonc` and executes POD codes accordingly. For a POD included in the list (referred to as $POD_NAME): -1. The framework will first try to determine whether there is a Conda environment named `_MDTF_$POD_NAME` under `$CONDA_ENV_DIR`. 
If yes, the framework will switch to this environment and run the POD. -2. If not, the framework will then look into the POD's `settings.jsonc` file in `$CODE_ROOT/diagnostics/$POD_NAME`. `runtime_requirements` in `settings.jsonc` specifies the programming language(s) adopted by the POD: - a. If purely Python, the framework will switch to `_MDTF_python3_base` and run the POD (`_MDTF_python2_base` for ealier PODs developed in Python 2.7). - b. If NCL is used, then `_MDTF_NCL_base`. -Note that for the six existing PODs depending on NCL (EOF_500hPa, MJO_prop_amp, MJO_suite, MJO_teleconnection, precip_diurnal_cycle, and Wheeler_Kiladis), Python is also used but merely as a wrapper. Thus the framework will switch to `_MDTF_NCL_base` when seeing both NCL and Python in `settings.jsonc`. -If you choose to selectively install Conda environments using the `--env` flag (section 2.2), remember to install all the environments needed for the PODs you're interested in, and that `_MDTF_base` is mandatory for the framework's operation. -- For instance, the minimal installation for running the `EOF_500hPa` and `convective_transition_diag` PODs requres `_MDTF_base` (mandatory), `_MDTF_NCL_base` (because of b), and `_MDTF_convective_transition_diag` (because of 1). These can be installed by passing `base`, `NCL_base`, and `convective_transition_diag` to the `--env` flag one at a time (section 2.2). +Run time may be 10-20 minutes, depending on your system. +- If you edited or renamed `src/default_tests.jsonc`, as recommended in the previous section, pass the path to that configuration file instead. +- The output files for this test case will be written to `$OUTPUT_DIR/MDTF_QBOi.EXP1.AMIP.001_1977_1981`. When the framework is finished, open `$OUTPUT_DIR/MDTF_QBOi.EXP1.AMIP.001_1977_1981/index.html` in a web browser to view the output report. +- The framework defaults to running all available PODs, which is overridden by the `pod_list` option in the `src/default_tests.jsonc` configuration file. Individual PODs can be specified as a comma-delimited list of POD names. +- Currently the framework only analyzes data from one model run at a time. To run the MJO_prop_amp POD on the GFDL.CM4.c96L32.am4g10r8 sample data, delete or comment out the section for QBOi.EXP1.AMIP.001 in the `caselist` section of the configuration file, and uncomment the section for GFDL.CM4.c96L32.am4g10r8. -## 5. Next steps ## 6. Next steps These quickstart installation instructions are part of the "Getting started" section of the [documentation site](https://mdtf-diagnostics.readthedocs.io/en/latest/). Consult the rest of Getting started for more detailed information, including how to run the framework on your own data and configure general settings. For users interested in contributing a POD module, see "Developer information" or [Developer's Walkthrough](https://mdtf-diagnostics.readthedocs.io/en/latest/_static/MDTF_walkthrough.pdf). diff --git a/diagnostics/convective_transition_diag/doc/convective_transition_diag.rst b/diagnostics/convective_transition_diag/doc/convective_transition_diag.rst index e5a389d42..012687651 100644 --- a/diagnostics/convective_transition_diag/doc/convective_transition_diag.rst +++ b/diagnostics/convective_transition_diag/doc/convective_transition_diag.rst @@ -37,7 +37,13 @@ Required programming language and libraries This package is written in Python 2, and requires the following Python packages: os, glob, json, Dataset, numpy, scipy, matplotlib, networkx, warnings, numba, & netcdf4. 
These Python packages are already included in the standard Anaconda installation. -The plotting functions in this package depend on an older version of matplotlib, thus an older version of the Anaconda 2 installer (ver. 5.0.1) is recommended. +Known issue with matplotlib +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The plotting scripts of this POD may not produce the desired figures with the latest version of matplotlib (because of the default size adjustment settings). The matplotlib version that comes with the Anaconda 2 installer, version 5.0.1, has been tested; users can switch to this older version. + +Depending on the platform and Linux distribution/version, a related error may occur with the error message "... ImportError: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory". One can find the missing object file ``libcrypto.so.1.0.0`` in the subdirectory ``~/anaconda2/pkgs/openssl-1.0.2l-h077ae2c_5/lib/``, where ``~/anaconda2/`` is where Anaconda 2 is installed. The precise names of the object file and the openssl folder may vary. Manually copying the object file to ``~/anaconda2/lib/`` should solve the error. + Required model output variables ------------------------------- diff --git a/doc/_static/MDTF_getting_started.pdf b/doc/_static/MDTF_getting_started.pdf index e354464a4..907fa0a6f 100644 Binary files a/doc/_static/MDTF_getting_started.pdf and b/doc/_static/MDTF_getting_started.pdf differ diff --git a/doc/_static/MDTF_walkthrough.pdf b/doc/_static/MDTF_walkthrough.pdf index d7cc97732..4bc648704 100644 Binary files a/doc/_static/MDTF_walkthrough.pdf and b/doc/_static/MDTF_walkthrough.pdf differ diff --git a/doc/conf.py b/doc/conf.py index 5a64d7355..80131636d 100644 --- a/doc/conf.py +++ b/doc/conf.py @@ -35,7 +35,7 @@ # -- Project information ----------------------------------------------------- -project = u'MDTF-diagnostics' +project = u'MDTF Diagnostics' copyright = u'2020, Model Diagnostics Task Force' author = u'Model Diagnostics Task Force' diff --git a/doc/sphinx/dev_checklist.rst b/doc/sphinx/dev_checklist.rst index 18e2879a8..b6e9bb93f 100644 --- a/doc/sphinx/dev_checklist.rst +++ b/doc/sphinx/dev_checklist.rst @@ -1,132 +1,91 @@ -POD Development Checklist -========================= +.. _ref-dev-checklist: -In this section, we compile a to-do list summarizing necessary steps for POD implementation, as well as a checklist for mandatory POD documentation and testing before submitting your POD. +POD development checklist +========================= -We recommend running the framework on the sample model data again with both ``save_ps`` and ``save_nc`` in the configuration input ``src/default_tests.jsonc`` set to ``true``. This will preserve directories and files created by individual PODs in the output directory, which could come in handy when you go through the instructions below, and help understand how a POD is expected to write output. +This section lists all the steps that need to be taken in order to submit a POD for inclusion in the MDTF framework. -Preparation for POD implementation -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Code and documentation submission --------------------------------- -We assume that, at this point, you have a set of scripts, written in :doc:`languages ` consistent with the framework's open source policy, that a) read in model data, b) perform analysis, and c) output figures. Here are 3 steps to prepare your scripts for POD implementation. 
+ The material in this section must be submitted through a `pull request `__ to the `NOAA-GFDL GitHub repo `__. This is described in :doc:`dev_git_intro`. -- Give your POD an official name (e.g., *Convective Transition*; referred to as ``long_name``) and a short name (e.g., *convective_transition_diag*). The latter will be used consistently to name the directories and files associated with your POD, so it should (1) loosely resemble the long_name, (2) avoid space bar and special characters (!@#$%^&\*), and (3) not repeat existing PODs' name (i.e., the directory names under ``diagnostics/``). Try to make your PODs name specific enough that it will be distinct from PODs contributed now or in the future by other groups working on similar phenomena. +The `example POD `__ should be used as a reference for how each component of the submission should be structured. -- If you have multiple scripts, organize them so that there is a main driver script calling the other scripts, i.e., a user only needs to execute the driver script to perform all read-in data, analysis, and plotting tasks. This driver script should be named after the POD's short name (e.g., ``convective_transition_diag.py``). +POD source code +^^^^^^^^^^^^^^^ -- You should have no problem getting scripts working as long as you have (1) the location and filenames of model data, (2) the model variable naming convention, and (3) where to output files/figures. The framework will provide these as *environment variables* that you can access (e.g., using ``os.environ`` in Python, or ``getenv`` in NCL). *DO NOT* hard code these paths/filenames/variable naming convention, etc., into your scripts. See the `complete list `__ of environment variables supplied by the framework. +All scripts should be placed in a subdirectory of ``diagnostics/``. Among the scripts, there should be 1) a main driver script, 2) a template html, and 3) a ``settings.jsonc`` file. The POD directory and html template should be named after your POD's short name. -- Your scripts should not access the internet or other networked resources. + - For instance, ``diagnostics/convective_transition_diag/`` contains its driver script ``convective_transition_diag.py``, ``convective_transition_diag.html``, and ``settings.jsonc``, etc. -An example of using framework-provided environment variables -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + - The framework will call the driver script, which calls the other scripts in the same POD directory. -The framework provides a collection of environment variables, mostly in the format of strings but also some numbers, so that you can and *MUST* use in your code and make your POD portable and reusable. + - If you need a new Conda environment, add a new .yml file to ``src/conda/``, and install the environment using the ``conda_env_setup.sh`` script as described in the :doc:`Getting Started `. -For instance, using 3 of the environment variables provided by the framework, ``CASENAME``, ``DATADIR``, and ``pr_var``, the full path to the hourly precipitation file can be expressed as +POD settings file +^^^^^^^^^^^^^^^^^ -:: - MODEL_OUTPUT_DIR = os.environ["DATADIR"]+"/1hr/" - pr_filename = os.environ["CASENAME"]+"."+os.environ["pr_var"]+".1hr.nc" - pr_filepath = MODEL_OUTPUT_DIR + pr_filename +The format of this file is described in :doc:`dev_settings_quick` and in more detail in :doc:`ref_settings`. 
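+To make the role of this file concrete, here is a minimal illustrative sketch. The field names follow the settings files of existing PODs, but this is not the authoritative schema; consult the references above for the full specification.

::

    // illustrative sketch only; see dev_settings_quick for the real schema
    {
      "settings": {
        "driver": "convective_transition_diag.py",    // main driver script
        "long_name": "Convective Transition",         // POD's official name
        "runtime_requirements": {
          "python3": ["numpy", "scipy", "matplotlib"] // language and libraries
        }
      },
      "varlist": {
        // the model output variables the POD reads
      }
    }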
-Note that in Linux shell or NCL, the values of environment variables are accessed via a ``$`` sign, e.g., ``os.environ["CASENAME"]`` in Python is equivalent to ``$CASENAME`` in Linux shell/NCL. +POD html template for output +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -.. _ref-using-env-vars: +- The html template will be copied by the framework into the output directory to display the figures generated by the POD. You should be able to create a new html template by simply copying and modifying the example templates from existing PODs even without prior knowledge about html syntax. -Relevant environment variables -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Preprocessing scripts for digested data +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The environment variables most relevant for a POD's operation are: +The "digested" supporting data policy is described in :numref:`ref-pod-digested-data`. -- ``POD_HOME``: Path to directory containing POD's scripts, e.g., ``diagnostics/convective_transition_diag/``. +For maintainability and provenance purposes, we request that you include the code used to generate your POD's "digested" data from raw data sources (any source of data that's permanently hosted). This code will not be called by the framework and will not be used by end users, so the restrictions and guidelines concerning the POD code don't apply. -- ``OBS_DATA``: Path to directory containing POD's supporting/digested observation data, e.g., ``inputdata/obs_data/convective_transition_diag/``. -- ``DATADIR``: Path to directory containing model data files for one case/experiment, e.g., ``inputdata/model/QBOi.EXP1.AMIP.001/``. +POD documentation +^^^^^^^^^^^^^^^^^ -- ``WK_DIR``: Path to directory for POD to output files. Note that **this is the only directory a POD is allowed to write its output**. E.g., ``wkdir/MDTF_QBOi.EXP1.AMIP.001_1977_1981/convective_transition_diag/``. +- The documentation for the framework is automatically generated using `sphinx `__, which works with files in `reStructured text `__ (reST, ``.rst``) format. In order to include :doc:`documentation for your POD `, we require that it be in this format. - 1. Output figures to ``$WK_DIR/obs/`` and ``$WK_DIR/model/`` respectively. + + Use the `example POD documentation `__ as a template for the information required for your POD, by modifying its .rst `source code `__. This should include a one-paragraph synopsis of the POD, developers’ contact information, required programming language and libraries, and model output variables, a brief summary of the presented diagnostics as well as references in which more in-depth discussions can be found. + + The .rst files and all linked figures should be placed in a ``doc`` subdirectory under your POD directory (e.g., ``diagnostics/convective_transition_diag/doc/``) and put the .rst file and figures inside. + + The most convenient way to write and debug reST documentation is with an online editor. We recommend `https://livesphinx.herokuapp.com/ `__ because it recognizes sphinx-specific commands as well. + + For reference, see the reStructured text `introduction `__, `quick reference `__ and `in-depth guide `__. + + Also see a reST `syntax comparison `__ to other text formats you may be familiar with. - 2. ``$WK_DIR/obs/PS`` and ``$WK_DIR/model/PS``: If a POD chooses to save vector-format figures, save them as ``EPS`` under these two directories. Files in these locations will be converted by the framework to ``PNG`` for HTML output. 
Caution: avoid using ``PS`` because of potential bugs in recent ``matplotlib`` and converting to PNG. +- For maintainability, all scripts should be self-documenting by including in-line comments. The main driver script (e.g., ``convective_transition_diag.py``) should contain a comprehensive header providing information that contains the same items as in the POD documentation, except for the "More about this diagnostic" section. - 3. ``$WK_DIR/obs/netCDF`` and ``$WK_DIR/model/netCDF``: If a POD chooses to save any digested data for later analysis/plotting, save them in two directories in ``NetCDF``. +- The one-paragraph POD synopsis (in the POD documentation) as well as a link to the full documentation should be placed at the top of the html template (e.g., ``convective_transition_diag.html``). -Note that (1) values of ``POD_HOME``, ``OBS_DATA``, and ``WK_DIR`` change when the framework executes different PODs; (2) the ``WK_DIR`` directory and subdirectories therein are automatically created by the framework. **Each POD should output files as described here** so that the framework knows where to find what, and also for the ease of code maintenance. +Preprocessing script documentation +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -More environment variables for specifying model variable naming convention can be found in the ``src/filedlist_$convention.jsonc`` files. Also see `the comprehensive list `__ of environment variables supplied by the framework. +The "digested" supporting data policy is described in :numref:`ref-pod-digested-data`. -To-do list for POD implementation -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +For maintainability purposes, include in the ``doc`` directory all information needed for a third party to reproduce your POD's digested data from its raw sources. This information is not published on the documentation website and can be in any format. In particular, please document the raw data sources used (DOIs/versioned references preferred) and the dependencies/build instructions (e.g., a conda environment) for your preprocessing script. -The following are the necessary steps for the POD module implementation and integration into the framework. You can use the PODs currently included in the code package under ``diagnostics/`` as concrete examples since they all have the same structure as described below: -1. Create your POD directory under ``diagnostics/`` and put all scripts in. Among the scripts, there should be 1) a driver script written in Python, 2) a template html, and 3) a ``settings.jsonc`` file. The POD directory, driver script, and html template should all be named after your POD's short name. +Sample and supporting data submission +------------------------------------- - - For instance, ``diagnostics/convective_transition_diag/`` contains its driver script ``convective_transition_diag.py``, ``convective_transition_diag.html``, and ``settings.jsonc``, etc. +Data hosting for the MDTF framework is currently managed manually; the data is hosted via anonymous FTP on UCAR's servers. Please contact the MDTF team leads via email to arrange a data transfer. - - The framework will call the driver script, which calls the other scripts in the same POD directory. +Digested observational or supporting data +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - - The html template will be copied by the framework into the output directory to display the figures generated by the POD. 
You can create a new html template by simply copying and modifying the example templates from existing PODs without prior knowledge about html syntax. +The "digested" supporting data policy is described in :numref:`ref-pod-digested-data`. - - ``settings.jsonc`` contains a POD's information. The framework will read this setting file to find out the driver script's name, verify the required environment and model data files are available, and prepare the necessary environment variables before executing the driver script. - -2. Create a directory under ``inputdata/obs_data/`` named after the short name, and put all your *digested* observation data in (or more generally, any quantities that are independent of the model being analyzed). +Create a directory under ``inputdata/obs_data/`` named after the short name, and put all your *digested* observation data in (or more generally, any quantities that are independent of the model being analyzed). - Digested data should be in the form of numerical data, not figures. - - - Raw data, e.g., undigested reanalysis data will be rejected. - - The data files should be small (preferably a few MB) and just enough for producing figures for model comparison. - - If you really cannot reduce the data size or require GB of space, consult with the lead team. -3. Provide the Conda environment your POD requires. Either you can use one of the Conda environments currently supplied with the framework, defined by the YAML (.yml) files in ``src/conda/``, or submit a .yml file for a new environment. - - - We recommend using existing Conda environments as much as possible. Consult with the lead team if you would like to submit a new one. - - - If you need a new Conda environment, add a new .yml file to ``src/conda/``, and install the environment using the ``conda_env_setup.sh`` script as described in the :doc:`Getting Started `. - -4. If your POD requires model data not included in the samples, prepare your own data files following instructions given in the :doc:`Getting Started `, and create a new configuration input from the template ``src/default_tests.jsonc``. - -Update ``case_list`` and ``pod_list`` in the configuration input file for your POD. Now you can try to run the framework following the :doc:`Getting Started ` and start debugging. Good luck! - -Checklist before submitting your POD -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -After getting your POD working under the framework, there are 2 additional steps regarding the mandatory POD documentation and testing before you can submit your work to the lead team. - -4. Provide documentation following the templates: - - A. Provide a comprehensive POD documentation in reStructuredText (.rst) format. This should include a one-paragraph synopsis of the POD, developers’ contact information, required programming language and libraries, and model output variables, a brief summary of the presented diagnostics as well as references in which more in-depth discussions can be found. - - - Create a ``doc`` directory under your POD directory (e.g., ``diagnostics/convective_transition_diag/doc/``) and put the .rst file and figures inside. It should be easy to copy and modify the .rst examples from existing PODs. - - B. All scripts should be self-documenting by including in-line comments. The main driver script (e.g., ``convective_transition_diag.py``) should contain a comprehensive header providing information that contains the same items as in the POD documentation, except for the "More about this diagnostic" section. - - C. 
The one-paragraph POD synopsis (in the POD documentation) as well as a link to the Full Documentation should be placed at the top of the html template (e.g., ``convective_transition_diag.html``). - -5. Test before distribution. It is important that you test your POD before sending it to the lead team contact. Please take the time to go through the following procedures: - - A. Test how the POD fails. Does it stop with clear errors if it doesn’t find the files it needs? How about if the dates requested are not presented in the model data? Can developers run it on data from another model? Have you added any code to scripts outside your own POD directory. Here are some simple tests you should try: - - - Move the ``inputdata`` directory around. Your POD should still work by simply updating the values of ``OBS_DATA_ROOT`` and ``MODEL_DATA_ROOT`` in the configuration input file. - - - Try to run your POD with a different set of model data. For POD development and testing, the MDTF-1 team produced the Timeslice Experiments output from the `NCAR CAM5 `__ and `GFDL AM4 (contact the lead team programmer for password) `__. - - - If you have problems getting another set of data, try changing the files' ``CASENAME`` and variable naming convention. The POD should work by updating ``CASENAME`` and ``convention`` in the configuration input. - - - Try your POD on a different machine. Check that your POD can work with reasonable machine configuration and computation power, e.g., can run on a machine with 32 GB memory, and can finish computation in 10 min. Will memory and run time become a problem if one tries your POD on model output of high spatial resolution and temporal frequency (e.g., avoid memory problem by reading in data in segments)? Does it depend on a particular version of a certain library? Consult the lead team if there's any unsolvable problems. - - B. After you have tested your POD thoroughly, make clean tar files for distribution. Make a tar file of your digested observational data (preserving the ``inputdata/obs_data/`` structure). Do the same for model data used for testing (if different from what is provided by the MDTF page). Upload your POD code to your :doc:`GitHub repo `. The tar files (and your GitHub repo) should not include any extraneous files (backups, ``pyc``, ``*~``, or ``#`` files). - - - Use ``tar -tf`` to see what is in the tar file. - C. β-test before distribution. Find people (β-testers) who are not involved in your POD's implementation and are willing to help. Give the tar files and point your GitHub repo to them. Ask them to try running the framework with your POD following the Getting Started instructions. Ask for comments on whether they can understand the documentation. +Sample model data +^^^^^^^^^^^^^^^^^ - - Possible β-tester candidates include nearby postdocs/grads and members from other POD-developing groups. +For PODs dealing with atmospheric phenomena, we recommend that you use sample data from the following sources, if applicable: -6. Submit your POD code through :doc:`GitHub pull request `, and share the tar files of digested observation (and model data if any) with the lead-team contact. Please also provide a list of tests you've conducted along with the machine configurations (e.g., memory size). +- A timeslice run of `NCAR CAM5 `__ +- A timeslice run of `GFDL AM4 `__ (contact the leads for password). 
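+When preparing digested data for transfer, something like the following shell sketch can be used (``my_pod`` is a placeholder POD name; preserve the ``inputdata/obs_data/`` layout described above):

::

    % cd mdtf
    # archive one POD's digested data, keeping the directory structure
    % tar -cf my_pod.obs_data.tar inputdata/obs_data/my_pod/
    # list the archive contents to verify before sending
    % tar -tf my_pod.obs_data.tar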
diff --git a/doc/sphinx/dev_coding_tips.rst b/doc/sphinx/dev_coding_tips.rst index 8b6730530..1df4f48c9 100644 --- a/doc/sphinx/dev_coding_tips.rst +++ b/doc/sphinx/dev_coding_tips.rst @@ -1,5 +1,7 @@ -Coding best practices: avoiding common issues -============================================= +.. _ref-dev-coding-tips: + +POD coding best practices +========================= In this section we describe issues we've seen in POD code that have caused problems in the form of bugs, inefficiencies, or unintended consequences. @@ -7,9 +9,9 @@ All languages ------------- - **PS vs. EPS figures**: Save vector plots as .eps (Encapsulated PostScript), not .ps (regular PostScript). - + *Why*: Postscript (.ps) is perhaps the most common vector graphics format, and almost all plotting packages are able to output postscript files. `Encapsulated Postscript `__ (.eps) includes bounding box information that describes the physical extent of the plot's contents. This is used by the framework to generate bitmap versions of the plots correctly: the framework calls `ghostscript `__ for the conversion, and if not provided with a bounding box ghostscript assumes the graphics use an entire sheet of (letter or A4) paper. This can cause plots to be cut off if they extend outside of this region. - + Note that many plotting libraries will set the format of the output file automatically from the filename extension. The framework will process both `*.ps` and `*.eps` files. Python: General @@ -17,11 +19,11 @@ Python: General - **Whitespace**: Indent python code with four spaces per indent level. - *Why*: Python uses indentation to delineate nesting and scope within a program, and intentation that's not done consistently is a syntax error. Using four spaces is not required, but is the generally accepted standard. + *Why*: Python uses indentation to delineate nesting and scope within a program, and indentation that's not done consistently is a syntax error. Using four spaces is not required, but is the generally accepted standard. Indentation can be configured in most text editors, or fixed with scripts such as ``reindent.py`` described `here `__. We recommend using a `linter `__ such as ``pylint`` to find common bugs and syntax errors. - Beyond this, we don't impose requirements on how your code is formatted, but voluntarily following standard best practices (such as descriped in `PEP8 `__ or the Google `style guide `__\) will make it easier for you and others to understand your code, find bugs, etc. + Beyond this, we don't impose requirements on how your code is formatted, but voluntarily following standard best practices (such as described in `PEP8 `__ or the Google `style guide `__\) will make it easier for you and others to understand your code, find bugs, etc. - **Filesystem commands**: Use commands in the `os `__ and `shutil `__ modules to interact with the filesystem, instead of running unix commands using ``os.system()``, ``commands`` (which is deprecated), or ``subprocess``. @@ -57,7 +59,7 @@ Python: General Python: Arrays -------------- -To obtain acceptable performance for numerical computation, people use Python interfaces to optimized, compiled code. `NumPy `__ is the standard module for manipulating numerical arrays in Python. `xarray `__ sits on top of NumPy and provides a higher-level interface to its functionality; any advice about NumPy applies to it as well. +To obtain acceptable performance for numerical computation, people use Python interfaces to optimized, compiled code. 
`NumPy `__ is the standard module for manipulating numerical arrays in Python. `xarray `__ sits on top of NumPy and provides a higher-level interface to its functionality; any advice about NumPy applies to it as well. NumPy and xarray both have extensive documentation and many tutorials, such as: @@ -75,20 +77,20 @@ NumPy and xarray both have extensive documentation and many tutorials, such as: + "`Turn your conditional loops to Numpy vectors `__," by Tirthajyoti Sarkar; + "`'Vectorized' Operations: Optimized Computations on NumPy Arrays `__", part of "`Python like you mean it `__," a free resource by Ryan Soklaski. -- **Use xarray with netCDF data**: +- **Use xarray with netCDF data**: + + *Why*: This is xarray's use case. You can think of NumPy as implementing multidimensional matrices in the fully general, mathematical sense, and xarray providing the specialization to the case where the matrix contains data on a lat-lon-time-(etc.) grid. - *Why*: This is xarray's use case. You can think of NumPy as implementing multidimensional matrices in the fully general, mathematical sense, and xarray providing the specialization to the case where the matrix contains data on a lat-lon-time-(etc.) grid. - xarray lets you refer to your data with human-readable labels such as 'latitude,' rather than having to remember that that's the second dimension of your array. This bookkeeping is essential when writing code for the MDTF framework, when your POD will be run on data from models you haven't been able to test on. In particular, xarray provides seamless support for `time axes `__, with `support `__ for all CF convention calendars through the ``cftime`` library. You can, eg, subset a range of data between two dates without having to manually convert those dates to array indices. - Again, please see the xarray tutorials linked above. + See the xarray tutorials linked above for more examples of xarray's features. - **Memory use and views vs. copies**: Use scalar indexing and `slices `__ (index specifications of the form `start_index`:`stop_index`:`stride`) to get subsets of arrays whenever possible, and only use `advanced indexing `__ features (indexing arrays with other arrays) when necessary. - *Why*: When advanced indexing is used, NumPy will need to create a new copy of the array in memory, which can hurt performance if the array contains a large amount of data. By contrast, slicing or basic indexing is done in-place, without allocating a new array: the NumPy documentation calls this a "view." + *Why*: When advanced indexing is used, NumPy will need to create a new copy of the array in memory, which can hurt performance if the array contains a large amount of data. By contrast, slicing or basic indexing is done in-place, without allocating a new array: the NumPy documentation calls this a "view." Note that array slices are native `Python objects `__, so you can define a slice in a different place from the array you intend to use it on. Both NumPy and xarray arrays recognize slice objects. @@ -96,25 +98,25 @@ NumPy and xarray both have extensive documentation and many tutorials, such as: See the following references for more information: - + The numpy `documentation `__ on indexing; + + The NumPy `documentation `__ on indexing; + "`Numpy Views vs Copies: Avoiding Costly Mistakes `__," by Jessica Yung; + "`How can I tell if NumPy creates a view or a copy? `__" on stackoverflow. 
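  To make the view/copy distinction concrete, here is a minimal sketch (plain NumPy, runnable as-is):

  ::

      import numpy as np

      a = np.arange(10)
      view = a[2:5]          # basic slice: a view that shares a's memory
      view[0] = 99           # also changes a[2], since no copy was made
      print(np.shares_memory(a, view))    # True

      fancy = a[[2, 3, 4]]   # advanced (array) indexing: allocates a new copy
      fancy[0] = -1          # leaves a unchanged
      print(np.shares_memory(a, fancy))   # False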
- **MaskedArrays instead of NaNs or sentinel values**: Use NumPy's `MaskedArrays `__ for data that may contain missing or invalid values, instead of setting those entries to NaN or a sentinel value. - + *Why*: One sometimes encounters code which sets array entries to fixed "sentinel values" (such as 1.0e+20 or `NaN `__\) to indicate missing or invalid data. This is a dangerous and error-prone practice, since it's frequently not possible to detect if the invalid entries are being used by mistake. For example, computing the variance of a timeseries with missing elements set to 1e+20 will either result in a floating-point overflow, or return zero. - NumPy provides a better solution in the form of `MaskedArrays `__, which behave identically to regular arrays but carry an extra boolean mask to indicate valid/invalid status. All the NumPy mathematical functions will automatically use this mask for error propagation. For `example `__, trying to an array element by zero or taking the square root of a negative element will mask it off, indicating that the value is invalid: you don't need to remember to do these sorts of checks explicitly. + NumPy provides a better solution in the form of `MaskedArrays `__, which behave identically to regular arrays but carry an extra boolean mask to indicate valid/invalid status. All the NumPy mathematical functions will automatically use this mask for error propagation. For `example `__, trying to divide an array element by zero or taking the square root of a negative element will mask it off, indicating that the value is invalid: you don't need to remember to do these sorts of checks explicitly. Python: Plotting ---------------- - **Use the 'Agg' backend when testing your POD**: For reproducibility, set the shell environment variable ``MPLBACKEND`` to ``Agg`` when testing your POD outside of the framework. - - *Why*: Matplotlib can use a variety of `backends `__\: interfaces to low-level graphics libraries. Some of these are platform-dependent, or require additional libraries that the MDTF framework doesn't install. In order to achieve cross-platform portability and reproducibility, the framework specifies the ``'Agg'`` non-interactive (ie, writing files only) backend for all PODs, by setting the ``MPLBACKEND`` environment variable. - + + *Why*: Matplotlib can use a variety of `backends `__\: interfaces to low-level graphics libraries. Some of these are platform-dependent, or require additional libraries that the MDTF framework doesn't install. In order to achieve cross-platform portability and reproducibility, the framework specifies the ``'Agg'`` non-interactive (ie, writing files only) backend for all PODs, by setting the ``MPLBACKEND`` environment variable. + When developing your POD, you'll want an interactive backend -- for example, this is automatically set up for you in a Jupyter notebook. When it comes to testing your POD outside of the framework, however, you should be aware of this backend difference. @@ -122,7 +124,7 @@ NCL --- - **Deprecated calendar functions**: Check the `function reference `__ to verify that the functions you use are not deprecated in the current version of `NCL `__. This is especially necessary for `date/calendar functions `__. - + *Why*: The framework uses a current version of `NCL `__ (6.6.x), to avoid plotting bugs that were present in earlier versions. 
This is especially relevant for calendar functions: the ``ut_*`` set of functions have been deprecated in favor of counterparts beginning with ``cd_`` that take identical arguments (so code can be updated using find/replace). For example, use `cd_calendar `__ instead of the deprecated `ut_calendar `__. - + This change is necessary because only the ``cd_*`` functions support all calendars defined in the CF conventions, which is needed to process data from some models (eg, weather or seasonal models are typically run with a Julian calendar.) diff --git a/doc/sphinx/dev_extra_tips.rst b/doc/sphinx/dev_extra_tips.rst deleted file mode 100644 index 0e9661cbb..000000000 --- a/doc/sphinx/dev_extra_tips.rst +++ /dev/null @@ -1,46 +0,0 @@ -Extra tips for POD implementation -================================= - -Scope of your POD’s code ------------------------- - -As described above, your POD should accept model data as input and express the results of its analysis in a series of figures, which are presented to the user in a web page. Input model data will be in the form of one netCDF file (with accompanying dimension information) per variable, as requested in your POD’s :doc:`settings file `. Because your POD may be run on the output of any model, you should be careful about the assumptions your code makes about the layout of these files. Supporting data may be in any format and will not be modified by the framework. - -The above data sources are your POD’s only input: you may provide options in the settings file for the user to configure when the POD is installed, but these cannot be changed each time the POD is run. Furthermore, your POD should not access the internet or other networked resources. - -The output of your POD should be a series of figures in vector format (.eps or .ps), written to a specific working directory (described below). Optionally, we encourage POD developers to also save relevant output data (eg, the output data being plotted) as netcdf files, to give users the ability to take the POD’s output and perform further analysis on it. - -Observational and supporting data; code organization. ------------------------------------------------------ - -.. figure:: ../img/dev_obs_data.jpg - :align: center - :width: 100 % - -In order to make your code run faster for the users, we request that you separate any calculations that don’t depend on the model data (eg. pre-processing of observational data), and instead save the end result of these calculations in data files for your POD to read when it is run. We refer to this as “digested observational data,” but it refers to any quantities that are independent of the model being analyzed. For purposes of data provenance, reproducibility, and code maintenance, we request that you include all the pre-processing/data reduction scripts used to create the digested data in your POD’s code base, along with references to the sources of raw data these scripts take as input (yellow box in the figure). - -Digested data should be in the form of numerical data, not figures, even if the only thing the POD does with the data is produce an unchanging reference plot. We encourage developers to separate their “number-crunching code” and plotting code in order to give end users the ability to customize output plots if needed. In order to keep the amount of supporting data needed by the framework manageable, we request that you limit the total amount of digested data you supply to no more than a few gigabytes. 
- -In collaboration with PCMDI, a framework is being advanced that can help systematize the provenance of observational data used for POD development. Some frequently used datasets have been prepared with this framework, known as PCMDIobs. Please check to see if the data you require is available via PCMDIobs. If it is, we encourage you to use it, otherwise proceed as described above. - -Other tips on implementation: ------------------------------ - -#. Structure of the code package: Implementing the constituent PODs in accordance with the structure described in sections 2 and 3 makes it easy to pass the package (or just part of it) to other groups. - -#. Robustness to model file/variable names: Each POD should be robust to modest changes in the file/variable names of the model output; see section 5 regarding the model output filename structure, and section 6 regarding using the environment variables and robustness tests. Also, it would be easier to apply the code package to a broader range of model output. - -#. Save intermediate output: Can be used, e.g. to save time when there is a substantial computation that can be re-used when re-running or re-plotting diagnostics. See section 3.I regarding where to save the output. - -#. Self-documenting: For maintenance and adaptation, to provide references on the scientific underpinnings, and for the code package to work out of the box without support. See step 5 in section 2. - -#. Handle large model data: The spatial resolution and temporal frequency of climate model output have increased in recent years. As such, developers should take into account the size of model data compared with the available memory. For instance, the example POD precip_diurnal_cycle and Wheeler_Kiladis only analyze part of the available model output for a period specified by the environment variables ``FIRSTYR`` and ``LASTYR``, and the convective_transition_diag module reads in data in segments. - -#. Basic vs. advanced diagnostics (within a POD): Separate parts of diagnostics, e.g, those might need adjustment when model performance out of obs range. - -#. Avoid special characters (``!@#$%^&*``) in file/script names. - - -See section 3 of the Getting Started for more details on how the package is called. See the :doc:`command line reference ` for documentation on command line options (or run ``mdtf --help``). - -Avoid making assumptions about the machine on which the framework will run beyond what’s listed here; a development priority is to interface the framework with cluster and cloud job schedulers to enable individual PODs to run in a concurrent, distributed manner. diff --git a/doc/sphinx/dev_general.rst b/doc/sphinx/dev_general.rst deleted file mode 100644 index ec127bc35..000000000 --- a/doc/sphinx/dev_general.rst +++ /dev/null @@ -1,27 +0,0 @@ -General developer resources -=========================== - -The following links to third-party pages contain information that may be helpful. - -Git tutorials/references ------------------------- - -- The official git `tutorial `__. -- A more verbose `introduction `__ to the ideas behind git and version control. -- A still more detailed `walkthrough `__ which assumes no prior knowledge. - -Python coding style -------------------- - -- `PEP8 `__, the officially recognized Python style guide. -- Google's `Python style guide `__. - -Code documentation ------------------- - -Documentation for the framework's code is managed using `sphinx `__, which works with files in `reStructured text `__ (reST, ``.rst``) format. 
The framework uses Google style conventions for python docstrings. - -- Sphinx `quickstart `__. -- reStructured text `introduction `__, `quick reference `__ and `in-depth guide `__. -- reST `syntax comparison `__ to other text formats you may be familiar with. -- Style guide for google-style python `docstrings `__ and quick `examples `__. diff --git a/doc/sphinx/dev_git_intro.rst b/doc/sphinx/dev_git_intro.rst index f65a6d677..fa13aad39 100644 --- a/doc/sphinx/dev_git_intro.rst +++ b/doc/sphinx/dev_git_intro.rst @@ -1,7 +1,7 @@ -.. _ref-dev-git: +.. _ref-dev-git-intro: -Developing for MDTF Diagnostics through GitHub website and git command -====================================================================== +Git-based development workflow +============================== We recommend that developers manage the MDTF package using the GitHub webpage interface: diff --git a/doc/sphinx/dev_guidelines.rst b/doc/sphinx/dev_guidelines.rst new file mode 100644 index 000000000..93c53eff6 --- /dev/null +++ b/doc/sphinx/dev_guidelines.rst @@ -0,0 +1,116 @@ +.. _ref-dev-guidelines: + +POD development guidelines +========================== + +Admissible languages +-------------------- + +The framework itself is written in Python, and can call PODs written in any scripting language. However, Python support by the lead team will be “first among equals” in terms of priority for allocating developer resources, etc. + +- To achieve portability, the MDTF **cannot** accept PODs written in closed-source languages (e.g., MATLAB and IDL; try `Octave `__ and `GDL `__ if possible). We also **cannot** accept PODs written in compiled languages (e.g., C or Fortran): installation would rapidly become impractical if users had to check compilation options for each POD. + +- Python is strongly encouraged for new PODs; PODs funded through the CPO grant are requested to be developed in Python. Python version >= 3.6 is required. Official support for Python 2 was discontinued as of January 2020. + +- If your POD was previously developed in NCL or R (and development is *not* funded through a CPO grant), you do not need to re-write existing scripts in Python 3 if doing so is likely to introduce new bugs into stable code, especially if you’re unfamiliar with Python. + +- If scripts were written in closed-source languages, translation to Python 3.6 or above is required. + +Preparation for POD implementation +---------------------------------- + +We assume that, at this point, you have a set of scripts, written in languages consistent with the framework's open source policy, that a) read in model data, b) perform analysis, and c) output figures. Here are the steps to prepare your scripts for POD implementation. + +We recommend running the framework on the sample model data again with both ``save_ps`` and ``save_nc`` in the configuration input ``src/default_tests.jsonc`` set to ``true``. This will preserve directories and files created by individual PODs in the output directory, which could come in handy when you go through the instructions below, and help you understand how a POD is expected to write output. + +- Give your POD an official name (e.g., *Convective Transition*; referred to as ``long_name``) and a short name (e.g., *convective_transition_diag*). 
The latter will be used consistently to name the directories and files associated with your POD, so it should (1) loosely resemble the long_name, (2) avoid spaces and special characters (!@#$%^&\*), and (3) not repeat an existing POD's name (i.e., the directory names under ``diagnostics/``). Try to make your POD's name specific enough that it will be distinct from PODs contributed now or in the future by other groups working on similar phenomena. + +- If you have multiple scripts, organize them so that there is a main driver script calling the other scripts, i.e., a user only needs to execute the driver script to perform all data read-in, analysis, and plotting tasks. This driver script should be named after the POD's short name (e.g., ``convective_transition_diag.py``). + +- You should have no problem getting your scripts working as long as you know (1) the location and filenames of the model data, (2) the model variable naming convention, and (3) where to output files/figures. The framework will provide these as *environment variables* that you can access (e.g., using ``os.environ`` in Python, or ``getenv`` in NCL). *DO NOT* hard code these paths/filenames/variable naming conventions, etc., into your scripts. See the `complete list `__ of environment variables supplied by the framework. + +- Your scripts should not access the internet or other networked resources. + +.. _ref-example-env-vars: + +An example of using framework-provided environment variables +------------------------------------------------------------ + +The framework provides a collection of environment variables, mostly strings but also some numbers, which you can and *MUST* use in your code to make your POD portable and reusable. + +For instance, using 3 of the environment variables provided by the framework, ``CASENAME``, ``DATADIR``, and ``pr_var``, the full path to the hourly precipitation file can be expressed as
+
+::
+
+    import os  # standard library; provides access to the framework's environment variables
+
+    MODEL_OUTPUT_DIR = os.environ["DATADIR"] + "/1hr/"
+    pr_filename = os.environ["CASENAME"] + "." + os.environ["pr_var"] + ".1hr.nc"
+    pr_filepath = MODEL_OUTPUT_DIR + pr_filename
+
+You can then use ``pr_filepath`` in your code to load the precipitation data. + +Note that in Linux shell or NCL, the values of environment variables are accessed via a ``$`` sign, e.g., ``os.environ["CASENAME"]`` in Python is equivalent to ``$CASENAME`` in Linux shell/NCL. + +.. _ref-using-env-vars: + +Relevant environment variables +------------------------------ + +The environment variables most relevant for a POD's operation are: + +- ``POD_HOME``: Path to directory containing POD's scripts, e.g., ``diagnostics/convective_transition_diag/``. + +- ``OBS_DATA``: Path to directory containing POD's supporting/digested observation data, e.g., ``inputdata/obs_data/convective_transition_diag/``. + +- ``DATADIR``: Path to directory containing model data files for one case/experiment, e.g., ``inputdata/model/QBOi.EXP1.AMIP.001/``. + +- ``WK_DIR``: Path to the directory where the POD outputs files. Note that **this is the only directory a POD is allowed to write its output to**. E.g., ``wkdir/MDTF_QBOi.EXP1.AMIP.001_1977_1981/convective_transition_diag/``. + + 1. Output figures to ``$WK_DIR/obs/`` and ``$WK_DIR/model/`` respectively. + + 2. ``$WK_DIR/obs/PS/`` and ``$WK_DIR/model/PS/``: If a POD chooses to save vector-format figures, save them as ``EPS`` under these two directories. Files in these locations will be converted by the framework to ``PNG`` for HTML output. 
Caution: avoid using ``PS`` because of potential bugs in recent versions of ``matplotlib`` when converting to ``PNG``. + + 3. ``$WK_DIR/obs/netCDF/`` and ``$WK_DIR/model/netCDF/``: If a POD chooses to save digested data for later analysis/plotting, save them in these two directories as ``NetCDF`` files. + +Note that (1) values of ``POD_HOME``, ``OBS_DATA``, and ``WK_DIR`` change when the framework executes different PODs; (2) the ``WK_DIR`` directory and subdirectories therein are automatically created by the framework. **Each POD should output files as described here** so that the framework knows where to find each file, and also for ease of code maintenance. + +More environment variables for specifying model variable naming conventions can be found in the ``src/fieldlist_$convention.jsonc`` files. Also see the `list `__ of environment variables supplied by the framework. + + +Guidelines for testing your POD +------------------------------- + +Test your POD before distribution. Find people (eg, nearby postdocs/grads and members from other POD-developing groups) who are not involved in your POD's implementation and are willing to help. Give them the tar files and point them to your GitHub repo. Ask them to try running the framework with your POD following the Getting Started instructions. Ask for comments on whether they can understand the documentation. + +Test how the POD fails. Does it stop with clear errors if it doesn’t find the files it needs? How about if the dates requested are not present in the model data? Can developers run it on data from another model? Here are some simple tests you should try: + + - Move the ``inputdata`` directory around. Your POD should still work by simply updating the values of ``OBS_DATA_ROOT`` and ``MODEL_DATA_ROOT`` in the configuration input file. + + - Try to run your POD with a different set of model data. + + - If you have problems getting another set of data, try changing the files' ``CASENAME`` and variable naming convention. The POD should work after updating ``CASENAME`` and ``convention`` in the configuration input. + + - Try your POD on a different machine. Check that your POD works with a reasonable machine configuration and amount of computing power, e.g., that it can run on a machine with 32 GB of memory and finish its computation in 10 minutes. Will memory and run time become a problem if one tries your POD on model output of high spatial resolution and temporal frequency (e.g., can you avoid memory problems by reading in data in segments)? Does it depend on a particular version of a certain library? Consult the lead team if you run into problems you can't solve. + + +Other tips on implementation +---------------------------- + +#. Structure of the code package: Implementing the constituent PODs in accordance with the structure described in earlier sections makes it easy to pass the package (or just part of it) to other groups. + +#. Robustness to model file/variable names: Each POD should be robust to modest changes in the file/variable names of the model output; see :doc:`Getting Started ` regarding the model data filename structure, :ref:`ref-example-env-vars` and :ref:`ref-dev-checklist` regarding using the environment variables and robustness tests. This also makes it easier to apply the code package to a broader range of model output. + +#. Save digested data after analysis: This can save time, e.g., when there is a substantial computation whose results can be re-used when re-running or re-plotting diagnostics. See :ref:`ref-output-cleanup` regarding where to save the output. + +#. 
Self-documenting: Document your code for ease of maintenance and adaptation, to provide references on the scientific underpinnings, and so that the code package works out of the box without support. See :ref:`ref-dev-checklist`. + +#. Handle large model data: The spatial resolution and temporal frequency of climate model output have increased in recent years. As such, developers should take into account the size of model data compared with the available memory. For instance, the example PODs precip_diurnal_cycle and Wheeler_Kiladis only analyze part of the available model output for a period specified by the environment variables ``FIRSTYR`` and ``LASTYR``, and the convective_transition_diag module reads in data in segments. + +#. Basic vs. advanced diagnostics (within a POD): Separate the basic and advanced parts of your diagnostics, e.g., those that might need adjustment when model performance is outside the observed range. + +#. Avoid special characters (``!@#$%^&*``) in file/script names. + + +See :ref:`ref-execute` and :doc:` framework operation walkthrough ` for details on how the package is called. See the :doc:`command line reference ` for documentation on command line options (or run ``mdtf --help``). + +Avoid making assumptions about the machine on which the framework will run beyond what’s listed here; a development priority is to interface the framework with cluster and cloud job schedulers to enable individual PODs to run in a concurrent, distributed manner. + diff --git a/doc/sphinx/dev_instruct.rst b/doc/sphinx/dev_instruct.rst deleted file mode 100644 index bdaf75711..000000000 --- a/doc/sphinx/dev_instruct.rst +++ /dev/null @@ -1,101 +0,0 @@ -Language choice and managing library dependencies -================================================= - -In this section, we discuss restrictions on coding languages and how to manage library dependencies. These are important points to be aware of when developing your POD, and may require you to modify existing code. - -You **must** manage your POD's language/library dependencies through `Conda `__, since the dependencies of the framework are so managed by design, and this is also how the end-users are instructed to set up and manage their own environments for the framework. Note that Conda is not Python-specific, but allows coexisting versioned environments of most scripting languages, including, `R `__, `NCL `__, `Ruby `__, `PyFerret `__, and more. - -To prevent the proliferation of dependencies, we suggest that new POD development use existing Conda environments whenever possible, e.g., `python3_base `__, `NCL_base `__, and `R_base `__ for Python, NCL, and R, respectively. - -Choice of language(s) --------------------- - -The framework itself is written in Python, and can call PODs written in any scripting language. However, Python support by the lead team will be “first among equals” in terms of priority for allocating developer resources, etc. - -- To achieve portability, the MDTF **cannot** accept PODs written in closed-source languages (e.g., MATLAB and IDL; try `Octave `__ and `GDL `__ if possible). We also **cannot** accept PODs written in compiled languages (e.g., C or Fortran): installation would rapidly become impractical if users had to check compilation options for each POD. - -- Python is strongly encouraged for new PODs; PODs funded through the CPO grant are requested to be developed in Python. Python version >= 3.6 is required (official support for Python 2 was discontinued as of January 2020). 
- -- If your POD was previously developed in NCL or R (and development is *not* funded through a CPO grant), you do not need to re-write existing scripts in Python 3 if doing so is likely to introduce new bugs into stable code, especially if you’re unfamiliar with Python. - -- If scripts were written in closed-source languages, translation to Python 3.6 or above is required. - -We do not allow new PODs using Python 2 in principle. However, for a POD primarily coded in NCL and R, and uses Python only for the main driver script, an exception can be made on the basis of better managing existing Conda environments, after consulting with the lead team. - -POD development using exiting Conda environment ------------------------------------------------ - -We assume that you've followed the :ref:`instructions ` in the Getting Started to set up the Conda environments for the framework. We recommend developing POD and managing POD's dependencies following the same approach. - -Developers working with Python -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The framework provides the `_MDTF_python3_base `__ Conda environment (recall the ``_MDTF`` prefix for framework-specific environment) as the generic Python environment, which you can install following the :ref:`instructions `. You can then activate this environment by running in a terminal: - -:: - -% source activate $CONDA_ENV_DIR/_MDTF_python3_base - -where ``$CONDA_ENV_DIR`` is the path you used to install the Conda environments. - -- For developers' convenience, `JupyterLab `__ (including `Jupyter Notebook `__) has been included in python3_base. Run ``% jupyter lab`` or ``% jupyter notebook``, and you can start working on development. - -- If there are any `commonly used Python libraries `__ that you'd like to add to python3_base, e.g., ``jupyterlab``, run ``% conda install -c conda-forge jupyterlab``. - - a. Only add libraries when necessary. We'd like to keep the environment small. - - b. Include the ``-c`` flag and specify using the `conda-forge `__ channel as the library source. Combining packages from different channels (in particular, conda-forge and anaconda's channel) may create incompatibilities. Consult with the lead team if encounter any problem. - - c. After installation, run ``% conda clean --a`` to clear cache. - - d. Don't forget to update ``src/conda/env_python3_base.yml`` accordingly. - -After you've finished working under this environment, run ``% conda deactivate`` or simply close the terminal. - -In case you need any exotic third-party libraries, e.g., a storm tracker, consult with the lead team and create your own Conda environment following :ref:`instructions ` below. - -Developers working with NCL or R -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The framework also provides the `_MDTF_NCL_base `__ and `_MDTF_R_base `__ Conda environments as the generic NCL and R environments. You can install, activate/deactivate or add common NCL-/R-related libraries (or ``jupyterlab``) to them using commands similar to those listed above. - -.. _ref-create-conda-env: - -Create a new Conda environment ------------------------------- - -If your POD requires languages that aren't available in an existing environment or third-party libraries unavailable through the common `conda-forge `__ and `anaconda `__ channels, we ask that you notify us (since this situation may be relevant to other developers) and submit a `YAML (.yml) file `__ that creates the environment needed for your POD. 
- -- The new YAML file should be added to ``src/conda/``, where you can find templates for existing environments from which you can create your own. - -- The YAML filename should be ``env_$your_POD_short_name.yml``. - -- The first entry of the YAML file, name of the environment, should be ``_MDTF_$your_POD_short_name``. - -- We recommend listing conda-forge as the first channel to search, as it's entirely open source and has the largest range of packages. Note that combining packages from different channels (in particular, conda-forge and anaconda channels) may create incompatibilities. - -- We recommend constructing the list of packages manually, by simply searching your POD's code for ``import`` statements referencing third-party libraries. Please do *not* exporting your development environment with ``% conda env export``, which gives platform-specific version information and will not be fully portable in all cases; it also does so for every package in the environment, not just the "top-level" ones you directly requested. - -- We recommend specifying versions as little as possible, out of consideration for end-users: if each POD specifies exact versions of all its dependencies, conda will need to install multiple versions of the same libraries. In general, specifying a version should only be needed in cases where backward compatibility was broken (e.g., Python 2 vs. 3) or a bug affecting your POD was fixed (e.g., postscript font rendering on MacOS with older NCL). Conda installs the latest version of each package that's consistent with all other dependencies. - -Testing with new Conda environment ----------------------------------- - -If you've updated an existing environment or created a new environment (with corresponding changes to the YAML file), verify that your POD works. - -Recall :ref:`how ` the framework finds a proper Conda environment for a POD. First, it searches for an environment matching the POD's short name. If this fails, it then looks into the POD's ``settings.jsonc`` and prepares a generic environment depending on the language(s). Therefore, no additional steps are needed to specify the environment if your new YAML file follows the naming conventions above (in case of a new environment) or your ``settings.jsonc`` correctly lists the language(s) (in case of updating an existing environment). - -- For an updated environment, first, uninstall it by deleting the corresponding directory under ``$CONDA_ENV_DIR``. - -- Re-install the environment using the ``conda_env_setup.sh`` script as described in the :ref:`installation instructions `, or create the new environment for you POD: - - :: - - % cd $CODE_ROOT - % ./src/conda/conda_env_setup.sh --env $your_POD_short_name --conda_root $CONDA_ROOT --env_dir $CONDA_ENV_DIR - -- Have the framework run your POD on suitable test data. - - 1. Add your POD's short name to the ``pod_list`` section of the configuration input file (template: ``src/default_tests.jsonc``). - - 2. Prepare the test data as described in :doc:`start_config`. diff --git a/doc/sphinx/dev_migration.rst b/doc/sphinx/dev_migration.rst index e4b2f4d59..c3ba7e42d 100644 --- a/doc/sphinx/dev_migration.rst +++ b/doc/sphinx/dev_migration.rst @@ -1,7 +1,9 @@ +.. _ref-dev-migration: + Migration from framework v2.0 ============================= -In this section we summarize issues to be aware of for developers familiar with the organization of version 2.0 of the framework. New developers can skip this section, as the rest of this documentation is self-contained. 
+In this section we describe the major changes made from v2.0 to v3.0 of the framework that are relevant for POD developers. The scope of the framework has expanded in version 3.0, which required changes in the way the PODs and framework interact. New developers can skip this section, as the rest of this documentation is self-contained. Getting Started and Developer's Walkthrough ------------------------------------------- @@ -11,17 +13,18 @@ A main source of documentation for v2.0 of the framework were the "Getting Start - `Getting Started v3.0 (PDF) `__ - `Developer's Walkthrough v3.0 (PDF) `__ -**Note**: these documents contain a *subset* of information available on this website, rather than new material: the text is reorganized to be placed in the same order as the v2.0 documents, for ease of comparison. +.. note:: + These documents contain a subset of information available on this website, rather than new material: the text is reorganized to be placed in the same order as the v2.0 documents, for ease of comparison. Checklist for migrating a POD from v2.0 --------------------------------------- -Here we list the broad set of tasks needed to update a diagnostic written for v2.0 of the framework to v3.0. +Here we list the broad set of tasks needed to update a POD written for v2.0 of the framework to v3.0. -- **Update settings and varlist files**: In v3.0 these have been combined into a single ``settings.jsonc`` file. See the settings file :doc:`format guide <./dev_settings_quick>`, example POD, or :doc:`reference documentation <./ref_settings>` for a description of the new format. -- **Update references to framework environment variables**: See the table below for an overview, and the :doc:`reference documentation <./ref_envvars>` for complete information on what environment variables the framework sets. *Note* that your diagnostic should not use any hard-coded paths or variable names, but should read this information in from the framework's environment variables. -- **Resubmit digested observational data**: To minimize the size of supporting data users need to download, we ask that you only supply observational data specifically needed for plotting, as well as any code used to perform that data reduction from raw sources. -- **Remove HTML templating code**: Version 2.0 of the framework required that your POD's top-level driver script take particular steps to assemble its HTML file. In v3.0 these tasks are done by the framework: all that your POD needs to do is generate figures of the appropriate names in the specified folders, and the framework will convert and link them appropriately. +- **Update settings and varlist files**: In v3.0 these have been combined into a single ``settings.jsonc`` file. See the settings file :doc:`guide <./dev_settings_quick>`, :doc:`reference <./ref_settings>`, and `example `__ for descriptions of the new format. +- **Update references to framework environment variables**: See the table below for an overview, and the :doc:`reference <./ref_envvars>` for complete information on what environment variables the framework sets. *Note* that your POD should not use any hard-coded paths or variable names, but should read this information in from the framework's environment variables. 
+- **Resubmit digested observational data**: To minimize the size of supporting data users need to download, we ask that you only supply observational data specifically needed for plotting (preferably no more than a few MB), as well as any code used to perform that data reduction from raw sources. +- **Remove HTML templating code**: Version 2.0 of the framework required that your POD's top-level driver script take particular steps to assemble its HTML file. In v3.0 these tasks are done by the framework: all that your POD needs to do is generate figures of the appropriate formats and names in the specified folders, and the framework will convert and link them appropriately. Conversion from v2.0 environment variables ------------------------------------------ @@ -46,7 +49,7 @@ In v3.0, the paths referred to by the framework's environment variables have bee * - POD's working directory - ``$variab_dir``/ - ``$WK_DIR`` - * - Path to requested netcdf data file for at date frequency + * - Path to requested NetCDF data file for at date frequency - Currently unchanged: ``$DATADIR``//``$CASENAME``...nc - * - Other v2.0 paths diff --git a/doc/sphinx/dev_overview.rst b/doc/sphinx/dev_overview.rst new file mode 100644 index 000000000..b6261dc58 --- /dev/null +++ b/doc/sphinx/dev_overview.rst @@ -0,0 +1,41 @@ +Introduction for POD developers +=============================== + +This walkthrough contains information for developers wanting to contribute a process-oriented diagnostic (POD) module to the MDTF framework. There are two tracks through the material: one for developers who have an existing analysis script they want to adapt for use in the framework, and one for developers who are writing a POD from scratch. + +:numref:`ref-dev-start` provides instructions for setting up POD development, in particular managing language and library dependencies through conda. For developers already familiar with version 2.0 of the framework, :numref:`ref-dev-migration` summarizes changes from v2.0 to facilitate migration to v3.0. New developers can skip this section, as the rest of this walkthrough is self-contained. + +:numref:`ref-dev-checklist` provides a list of instructions for submitting a POD for inclusion in the framework. We require developers to submit PODs through `GitHub `__. See :numref:`ref-dev-git-intro` for how to manage code through the GitHub website. + +:numref:`ref-dev-guidelines` provides overall guidelines for POD development. :numref:`ref-dev-settings-quick` is a reference for the POD's settings file format. In :numref:`ref-dev-walkthrough`, we walk developers through the workflow of the framework, focusing on aspects that are relevant for the operation of individual PODs, and using the `Example Diagnostic POD `__ as a concrete example to illustrate how a POD works under the framework. :numref:`ref-dev-coding-tips` provides coding best practices to address common issues encountered in submitted PODs. + + + Scope of a process-oriented diagnostic -------------------------------------- + The MDTF framework imposes requirements on the types of data your POD outputs and takes as input. In addition to the scientific scope of process-oriented diagnostics, the analysis that you intend to do needs to fit the following model: + +Your POD should accept model data as input and express the results of its analysis in a series of figures, which are presented to the user in a web page. 
Input model data will be in the form of one NetCDF file (with accompanying dimension information) per variable, as requested in your POD’s :doc:`settings file `. Because your POD may be run on the output of any model, you should be careful about the assumptions your code makes about the layout of these files (eg, the range of longitude or the `positive `__ convention for vertical coordinates). Supporting data may be in any format and will not be modified by the framework (see next section). + +The above data sources are your POD’s only input: your POD should not access the internet or other networked resources. You may provide options in the settings file for the user to configure when the POD is installed, but these cannot be changed each time the POD is run. + +To achieve portability, the MDTF cannot accept PODs written in closed-source languages (eg, MATLAB or IDL). We also cannot accept PODs written in compiled languages (eg, C or Fortran): installation would rapidly become impractical if users had to check compilation options for each POD. + +The output of your POD should be a series of figures in vector format (.eps or .ps). Optionally, we encourage POD developers to also save relevant output data (e.g., the output data being plotted) as NetCDF files, to give users the ability to take the POD’s output and perform further analysis on it. + +.. _ref-pod-digested-data: + +POD code organization and supporting data ----------------------------------------- + +.. figure:: ../img/dev_obs_data.jpg :align: center :width: 100 % + +In order to make your code run faster for users, we request that you separate any calculations that don’t depend on the model data (e.g., pre-processing of observational data), and instead save the end result of these calculations in data files for your POD to read when it is run. We refer to this as “digested observational data,” but it refers to any quantities that are independent of the model being analyzed. For purposes of data provenance, reproducibility, and code maintenance, we request that you include all the pre-processing/data reduction scripts used to create the digested data in your POD’s code base, along with references to the sources of raw data these scripts take as input (yellow box in the figure). + +Digested data should be in the form of numerical data, not figures, even if the only thing the POD does with the data is produce an unchanging reference plot. We encourage developers to separate their “number-crunching code” and plotting code in order to give end users the ability to customize output plots if needed. In order to keep the amount of supporting data needed by the framework manageable, we request that you limit the total amount of digested data you supply to no more than a few gigabytes. + +In collaboration with PCMDI, a framework is being developed that can help systematize the provenance of observational data used for POD development. This section will be updated when this data source is ready for public use. + diff --git a/doc/sphinx/dev_quickstart_guide.rst b/doc/sphinx/dev_quickstart_guide.rst deleted file mode 100644 index a6a9ecd79..000000000 --- a/doc/sphinx/dev_quickstart_guide.rst +++ /dev/null @@ -1,19 +0,0 @@ -Developer's quickstart guide -============================ - -This walkthrough contains information for developers wanting to contribute a process-oriented diagnostic (POD) module to the MDTF framework. 
We assume that you have read the `Getting Started `__, and have followed the instructions therein for installing and testing the MDTF package, thus having some idea about the package structure and how it works. We further recommend running the framework on the sample model data with both ``save_ps`` and ``save_nc`` in the configuration input ``src/default_tests.jsonc`` set to ``true``. This will preserve directories and files created by individual PODs, and help you understand how a POD is expected to write output. - -For developers already familiar with version 2.0 of the framework, :doc:`section 2 ` concisely summarizes changes from v2.0 to facilitate migration to v3.0. New developers can skip this section, as the rest of this walkthrough is self-contained. - -For new developers, :doc:`section 3 ` provides a to-do list of steps for implementing and integrating a POD into the framework, with more technical details in subsequent sections. :doc:`Section 4 ` discusses the choice of programming languages, managing language and library dependencies through Conda, how to make use of and extend an existing Conda environment for POD development, and create a new Conda environment if necessary. In :doc:`section 5 `, we walk the developers through the workflow of the framework, focusing on aspects that are relevant for the operation of individual PODs, and using the `Example Diagnostic POD `__ as a concrete example to illustrate how a POD works under the framework. - -We require developers to manage POD codes and submit them through `GitHub `__. See :doc:`section 8 ` for how to manage code through the GitHub website and, for motivated developers, how to manage using the ``git`` command. - -[@@@Moved from instruct: - -Scope of the analysis your POD conducts ---------------------------------------- - -See the `BAMS article `__ describing version 2.0 of the framework for a description of the project’s scientific goals and what we mean by a “process oriented diagnostic” (POD). We encourage PODs to have a specific, focused scope. - -PODs should be relatively lightweight in terms of computation and memory requirements (eg, run time measured in minutes, not hours): this is to enable rapid feedback and iteration cycles to assist users in model development. Bear in mind that your POD may be run on model output of potentially any date range and spatial resolution. Your POD should not require strong assumptions about these quantities, or other details of the model’s operation. @@@] diff --git a/doc/sphinx/dev_settings_quick.rst b/doc/sphinx/dev_settings_quick.rst index 67ecb299f..3df055ac4 100644 --- a/doc/sphinx/dev_settings_quick.rst +++ b/doc/sphinx/dev_settings_quick.rst @@ -1,7 +1,9 @@ -Diagnostic settings file quickstart -=================================== +.. _ref-dev-settings-quick: -This page gives a quick introduction to how to write the settings file for your diagnostic. See the full :doc:`documentation <./ref_settings>` on this file format for a complete list of all the options you can specify. +POD settings file summary +========================= + +This page gives a quick introduction to how to write the settings file for your POD. See the full :doc:`documentation <./ref_settings>` on this file format for a complete list of all the options you can specify. Overview -------- @@ -9,7 +11,7 @@ Overview The MDTF framework can be viewed as a "wrapper" for your code that handles data fetching and munging. 
Your code communicates with this wrapper in two ways: - The *settings file* is where your code talks to the framework: when you write your code, you document what model data your code uses and what format it expects it in. When the framework is run, it will fulfill the requests you make here (or tell the user what went wrong). -- When your code is run, the framework talks to it by setting :doc:`environment variables ` containing paths to the data files and other information specific to the run (not covered on this page, follow the link for details). +- When your code is run, the framework talks to it by setting :doc:`environment variables ` containing paths to the data files and other information specific to the run. In the settings file, you specify what model data your diagnostic uses in a vocabulary you're already familiar with: diff --git a/doc/sphinx/dev_start.rst b/doc/sphinx/dev_start.rst new file mode 100644 index 000000000..04496eced --- /dev/null +++ b/doc/sphinx/dev_start.rst @@ -0,0 +1,117 @@ +.. _ref-dev-start: + +Developer quickstart guide +========================== + +This section contains instructions for beginning to develop a POD for the framework. + +Developer installation instructions +----------------------------------- + +To download and install the framework for development, follow the instructions for end users given in :doc:`start_install`, with the following developer-specific modifications: + +Obtaining the source code +^^^^^^^^^^^^^^^^^^^^^^^^^ + +POD developers should work from the `develop branch `__ of the framework code. This is the "beta test" version, used for testing changes before releasing them to end users. + +Developers may download the code from GitHub as described in :ref:`ref-download`, but we strongly recommend that you clone the repo in order to keep up with changes in the develop branch, and to simplify submitting pull requests with your POD's code. Instructions for how to do this are given in :doc:`dev_git_intro`. + +Installing dependencies via conda
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Regardless of development language, we strongly recommend that developers use conda to manage their language and library versions. Note that Conda is not Python-specific, but allows coexisting versioned environments of most scripting languages, including `R `__, `NCL `__, `Ruby `__, `PyFerret `__, and more. + + +We recommend that new PODs be written in Python 3. We provide a developer version of the python3_base environment (described below) that includes Jupyter and other developer-specific tools. This is not installed by default, and must be requested by passing the ``--all-dev`` flag to the conda setup script:
+
+::
+
+% cd $CODE_ROOT
+% ./src/conda/conda_env_setup.sh --all-dev --conda_root $CONDA_ROOT --env_dir $CONDA_ENV_DIR
+
+
+POD development using existing Conda environments +------------------------------------------------- + +To prevent the proliferation of dependencies, we suggest that new POD development use existing Conda environments whenever possible, e.g., `python3_base `__, `NCL_base `__, and `R_base `__ for Python, NCL, and R, respectively. + +In case you need any exotic third-party libraries, e.g., a storm tracker, consult with the lead team and create your own Conda environment following :ref:`instructions ` below. + +Python +^^^^^^ + +The framework provides the `_MDTF_python3_base `__ Conda environment (recall the ``_MDTF`` prefix for framework-specific environments) as the generic Python environment, which you can install following the :ref:`instructions `. 
You can then activate this environment by running in a terminal:
+
+::
+
+% source activate $CONDA_ENV_DIR/_MDTF_python3_base
+
+where ``$CONDA_ENV_DIR`` is the path you used to install the Conda environments. After you've finished working under this environment, run ``% conda deactivate`` or simply close the terminal. + +Other languages +^^^^^^^^^^^^^^^ + +The framework also provides the `_MDTF_NCL_base `__ and `_MDTF_R_base `__ Conda environments as the generic NCL and R environments. + +.. _ref-create-conda-env: + +POD development using a new Conda environment +--------------------------------------------- + +If your POD requires languages that aren't available in an existing environment or third-party libraries unavailable through the common `conda-forge `__ and `anaconda `__ channels, we ask that you notify us (since this situation may be relevant to other developers) and submit a `YAML (.yml) file `__ that creates the environment needed for your POD. + +- The new YAML file should be added to ``src/conda/``, where you can find templates for existing environments from which you can create your own. + +- The YAML filename should be ``env_$your_POD_short_name.yml``. + +- The first entry of the YAML file, the name of the environment, should be ``_MDTF_$your_POD_short_name``. + +- We recommend listing conda-forge as the first channel to search, as it's entirely open source and has the largest range of packages. Note that combining packages from different channels (in particular, conda-forge and anaconda channels) may create incompatibilities. + +- We recommend constructing the list of packages manually, by simply searching your POD's code for ``import`` statements referencing third-party libraries. Please do *not* export your development environment with ``% conda env export``, which gives platform-specific version information and will not be fully portable in all cases; it also does so for every package in the environment, not just the "top-level" ones you directly requested. + +- We recommend specifying versions as little as possible, out of consideration for end-users: if each POD specifies exact versions of all its dependencies, conda will need to install multiple versions of the same libraries. In general, specifying a version should only be needed in cases where backward compatibility was broken (e.g., Python 2 vs. 3) or a bug affecting your POD was fixed (e.g., postscript font rendering on Mac OS with older NCL). Conda installs the latest version of each package that's consistent with all other dependencies. + +Framework interaction with conda environments +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +As described in :ref:`ref-execute`, when you run the ``mdtf`` executable, among other things, it reads ``pod_list`` in ``default_tests.jsonc`` and executes PODs accordingly. For a POD included in the list (referred to as $POD_NAME): + +1. The framework will first look for the YAML file ``src/conda/env_$POD_NAME.yml``. If it exists, the framework will assume that the corresponding conda environment ``_MDTF_$POD_NAME`` has been installed under ``$CONDA_ENV_DIR``, and will switch to this environment and run the POD. + +2. If not, the framework will then look into the POD's ``settings.jsonc`` file in ``$CODE_ROOT/diagnostics/$POD_NAME/``. The ``runtime_requirements`` section in ``settings.jsonc`` specifies the programming language(s) adopted by the POD: + + a). 
If purely Python 3, the framework will look for ``src/conda/env_python3_base.yml`` and check its content to determine whether the POD's requirements are met, and then switch to ``_MDTF_python3_base`` and run the POD. + + b). Similarly, if NCL or R is used, the framework will do the same with ``NCL_base`` or ``R_base``. + +Note that for the six existing PODs that depend on NCL (EOF_500hPa, MJO_prop_amp, MJO_suite, MJO_teleconnection, precip_diurnal_cycle, and Wheeler_Kiladis), Python is also used but merely as a wrapper. Thus the framework will switch to ``_MDTF_NCL_base`` when seeing both NCL and Python in ``settings.jsonc``. + +The framework verifies PODs' requirements by looking for the YAML files and checking their contents. Thus if you choose to selectively install conda environments using the ``--env`` flag (:ref:`ref-conda-env-install`), remember to install all the environments needed for the PODs you're interested in, and that ``_MDTF_base`` is mandatory for the framework's operation. + +- For instance, the minimal installation for running the ``EOF_500hPa`` and ``convective_transition_diag`` PODs requires ``_MDTF_base`` (mandatory), ``_MDTF_NCL_base`` (because of b), and ``_MDTF_convective_transition_diag`` (because of 1). These can be installed by passing ``base``, ``NCL_base``, and ``convective_transition_diag`` to the ``--env`` flag one at a time (:ref:`ref-conda-env-install`). + + +Testing with a new Conda environment +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +If you've updated an existing environment or created a new environment (with corresponding changes to the YAML file), verify that your POD works. + +Recall how the framework finds a proper Conda environment for a POD. First, it searches for an environment matching the POD's short name. If this fails, it then looks into the POD's ``settings.jsonc`` and prepares a generic environment depending on the language(s). Therefore, no additional steps are needed to specify the environment if your new YAML file follows the naming conventions above (in case of a new environment) or your ``settings.jsonc`` correctly lists the language(s) (in case of updating an existing environment). + +- For an updated environment, first, uninstall it by deleting the corresponding directory under ``$CONDA_ENV_DIR``. + +- Re-install the environment using the ``conda_env_setup.sh`` script as described in the :ref:`installation instructions `, or create the new environment for your POD:
+
+  ::
+
+  % cd $CODE_ROOT
+  % ./src/conda/conda_env_setup.sh --env $your_POD_short_name --conda_root $CONDA_ROOT --env_dir $CONDA_ENV_DIR
+
+- Have the framework run your POD on suitable test data. + + 1. Add your POD's short name to the ``pod_list`` section of the configuration input file (template: ``src/default_tests.jsonc``). + + 2. Prepare the test data as described in :doc:`start_config`.
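To summarize, the environment-lookup procedure described above can be expressed as a short sketch. The following is only an illustration of the decision logic, not the framework's actual implementation; the function and argument names are hypothetical, and ``languages`` stands in for the language(s) listed under ``runtime_requirements`` in the POD's ``settings.jsonc``:

::

    import os

    def resolve_conda_env(pod_name, code_root, languages):
        # Hypothetical helper illustrating the lookup order only.
        pod_yml = os.path.join(code_root, "src", "conda",
                               "env_" + pod_name + ".yml")
        if os.path.isfile(pod_yml):
            # 1. A POD-specific YAML file takes precedence; the matching
            # environment is assumed to be installed under $CONDA_ENV_DIR.
            return "_MDTF_" + pod_name
        # 2. Otherwise, fall back to a generic environment based on the POD's
        # language(s); NCL-based PODs that use Python only as a wrapper are
        # assigned to the NCL environment.
        if "ncl" in languages:
            return "_MDTF_NCL_base"
        if "r" in languages:
            return "_MDTF_R_base"
        return "_MDTF_python3_base"

+ diff --git a/doc/sphinx/dev_toc.rst b/doc/sphinx/dev_toc.rst index d6c2c4e49..2afe201bd 100644 --- a/doc/sphinx/dev_toc.rst +++ b/doc/sphinx/dev_toc.rst @@ -5,14 +5,12 @@ Developer information :maxdepth: 1 :numbered: 2 - dev_quickstart_guide + dev_overview dev_migration dev_checklist - dev_instruct + dev_start + dev_guidelines + dev_settings_quick dev_walkthrough dev_coding_tips - dev_extra_tips - dev_extra_tips - dev_settings_quick - dev_general dev_git_intro diff --git a/doc/sphinx/dev_walkthrough.rst b/doc/sphinx/dev_walkthrough.rst index 99467e192..a982528e4 100644 --- a/doc/sphinx/dev_walkthrough.rst +++ b/doc/sphinx/dev_walkthrough.rst @@ -1,7 +1,9 @@ +.. 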
_ref-dev-walkthrough: + Walkthrough of framework operation ================================== -We now describe in greater detail the actions that are taken when the framework is run, focusing on aspects that are relevant for the operation of individual PODs. The `Example Diagnostic POD `__ (short name: ``example``) is used as a concrete example here to illustrate how a POD is implemented and integrated into the framework. +In this section, we describe the actions that are taken when the framework is run, focusing on aspects that are relevant for the operation of individual PODs. The `Example Diagnostic POD `__ (short name: ``example``) is used as a concrete example here to illustrate how a POD is implemented and integrated into the framework. .. figure:: ../img/dev_flowchart.jpg :align: center @@ -12,7 +14,6 @@ We begin with a reminder that there are 2 essential files for the operation of t - ``src/default_tests.jsonc``: configuration input for the framework. - ``diagnostics/example/settings.jsonc``: settings file for the example POD. -To setup for running the example POD, (1) download the necessary supporting and NCAR-CAM5.timeslice sample data @@@hyperlinks required@@@ and unzip them under ``inputdata/``, and (2) open ``default_tests.jsonc``, uncomment the whole ``NCAR-CAM5.timeslice`` section in ``case_list``, and comment out the other cases in the list. We also recommend setting both ``save_ps`` and ``save_nc`` to ``true``. Step 1: Framework invocation ---------------------------- @@ -39,7 +40,7 @@ Each POD describes the model data it requires as input in the ``varlist`` sectio - The most important features of ``settings.jsonc`` are described in the :doc:`settings documentation ` and full detail on the :doc:`reference page `. -- Variables are specified in ``varlist`` following `CF convention `__ wherever possible. If your POD requires derived quantities that are not part of the standard model output (e.g., column weighted averages), incorporate necessary preprocessings for computing these from standard output variables into your code. PODs are allowed to request variables outside of the CF conventions (by requiring an exact match on the variable name), but this will severely limit the POD's application. +- Variables are specified in ``varlist`` following `CF convention `__ wherever possible. If your POD requires derived quantities that are not part of the standard model output (e.g., column weighted averages), incorporate necessary preprocessing for computing these from standard output variables into your code. PODs are allowed to request variables outside of the CF conventions (by requiring an exact match on the variable name), but this will severely limit the POD's application. - Some of the requested variables may be unavailable or without the requested characteristics (e.g., frequency). You can specify a *backup plan* for this situation by designating sets of variables as *alternates* if feasible: when the framework is unable to obtain a variable that has the ``alternates`` attribute in ``varlist``, it will then (and only then) query the model data source for the variables named as alternates. @@ -52,7 +53,7 @@ Once the framework has determined which PODs are able to run given the model dat Example diagnostic ^^^^^^^^^^^^^^^^^^ -The example POD uses only one model variable in its `varlist `__: surface air temperature, recorded at monthly frequency. +The example POD uses only one model variable in its `varlist `__: surface air temperature, recorded at monthly frequency. 
- At the beginning of ``example.log``, the framework reports finding the requested model data file under ``Found files``. @@ -61,7 +62,7 @@ The example POD uses only one model variable in its `varlist -In its ``settings.jsonc``, the example POD lists its `requirements `__: Python 3, and the matplotlib, xarray and netCDF4 third-party libraries for Python. In this case, the framework assigns the POD to run in the generic `python3_base `__ environment provided by the framework. +In its ``settings.jsonc``, the example POD lists its `requirements `__: Python 3, and the matplotlib, xarray and netCDF4 third-party libraries for Python. In this case, the framework assigns the POD to run in the generic `python3_base `__ environment provided by the framework. -- In ``example.log``, under ``Env vars:`` is a comprehensive list of environment variables prepared for the POD by the framework. A great part of them are defined as in ``src/filedlist_$convention.jsonc`` via ``convention`` in ``default_tests``. Some of the environment variables are POD-specific as defined under ''pod_env_vars'' in the POD's ``settings.jsonc``, e.g., ``EXAMPLE_FAV_COLOR``. +- In ``example.log``, under ``Env vars:`` is a comprehensive list of environment variables prepared for the POD by the framework. Most of them are defined in `src/fieldlist_CMIP.jsonc `__ via setting ``convention`` in ``default_tests.jsonc`` to ``CMIP``. Some of the environment variables are POD-specific as defined under `pod_env_vars `__ in the POD's ``settings.jsonc``, e.g., ``EXAMPLE_FAV_COLOR``. -- In ``example.log``, after ``--- MDTF.py calling POD example``, the framework verifies the Conda-related paths, and makes sure that the ``runtime_requirements`` in ``settings.jsonc`` are met by the Conda environment assigned to the POD. +- In ``example.log``, after ``--- MDTF.py calling POD example``, the framework verifies the Conda-related paths, and makes sure that the ``runtime_requirements`` in ``settings.jsonc`` are met by the python3_base environment by checking `env_python3_base.yml `__. Step 4: POD execution --------------------- @@ -99,9 +106,9 @@ At this point, your POD’s requirements have been met, and the environment vari - The framework contains additional exception handling so that if a POD experiences a fatal or unrecoverable error, the rest of the tasks and POD-calls by the framework can continue. The error messages, if any, will be included in the POD's log file. -In case your POD requires derived quantities that are not part of the standard model output, and you've incorporated necessary preprocessings into your code (e.g., compute column average temperature from a vertically-resolved temperature field), one might be interested in saving these derived quantities as intermediate output for later use, and you may include this functionality in your code. +In case your POD requires derived quantities that are not part of the standard model output, and you've incorporated the necessary preprocessing into your code (e.g., computing column average temperature from a vertically-resolved temperature field), you might want to save these derived quantities as intermediate output for later use, and you may include this functionality in your code. -- Here we are referring to derived quantities similarly gridded as model output, instead of highly-digested data that is just enough for making figures. +- Here we are referring to derived quantities gridded in a similar way to model output, instead of highly-digested data that is just enough for making figures. 
- Save these as NetCDF files to the same directory as the original model files, one variable per file, following the filename convention spelled out in :doc:`Getting Started `.

@@ -130,6 +137,8 @@ In code block 7) of ``example-diag.py``, we include an example of exception hand

- The last few lines of ``example.log`` demonstrate that the script is able to finish execution despite an error having occurred; exception handling makes the code robust.

+.. _ref-output-cleanup:
+
 Step 5: Output and cleanup
 --------------------------

diff --git a/doc/sphinx/pod_summary.rst b/doc/sphinx/pod_summary.rst
index efb096875..1c6657393 100644
--- a/doc/sphinx/pod_summary.rst
+++ b/doc/sphinx/pod_summary.rst
@@ -1,15 +1,9 @@
 Summary of MDTF process-oriented diagnostics
-==========================================================
+============================================

-The MDTF diagnostic package is portable, extensible, usable, and open for contribution from the community. A goal is to allow diagnostics to be repeatable inside, or outside, of modeling center workflows. These are diagnostics focused on model improvement, and as such a slightly different focus from other efforts. The code runs on CESM model output, as well as on GFDL and CF-compliant model output.
+The MDTF diagnostics package is a portable framework for running process-oriented diagnostics (PODs) on climate model data. Each POD script targets a specific physical process or emergent behavior, with the goals of determining how accurately the model represents that process, ensuring that models produce the right answers for the right reasons, and identifying gaps in the understanding of phenomena.

-The MDTF Diagnostic Framework consists of multiple modules, each of which is developed by an individual research group or user. Modules are independent of each other, each module:
-
-- Produces its own html file (webpage) as the final product
-
-- Consists of a set of process-oriented diagnostics
-
-- Produces a figures or multiple figures that can be displayed by the html in a browser
+The scientific motivation and content behind the framework are described in E. D. Maloney et al. (2019): Process-Oriented Evaluation of Climate and Weather Forecasting Models. *BAMS*, **100** (9), 1665–1686, `doi:10.1175/BAMS-D-18-0042.1 `__.

 Convective Transition Diagnostics
 ---------------------------------

diff --git a/doc/sphinx/ref_cli.rst b/doc/sphinx/ref_cli.rst
index 933ba66e3..ed0b90ff5 100644
--- a/doc/sphinx/ref_cli.rst
+++ b/doc/sphinx/ref_cli.rst
@@ -32,7 +32,7 @@ Paths

Parent directories of input and output data. **Note** that all the paths below should be on a *local* filesystem. Environment variables in paths (e.g., ``$HOME``) are resolved according to the shell context ``mdtf`` was called from. Relative paths are resolved relative to the repo directory.

* ``--MODEL-DATA-ROOT, --MODEL_DATA_ROOT ``: Directory to store input data from different models. Depending on the choice of ``data_manager`` (see below), input model data will typically be copied from a remote filesystem to this location.
-* ``--OBS-DATA-ROOT, --OBS_DATA_ROOT ``: Directory containing observational data used by individual PODs. Currently, this must be downloaded manually as part of the framework installation. See :numref:`ref-supporting-data` of the :doc:`installation guide ` for instructions.
+* ``--OBS-DATA-ROOT, --OBS_DATA_ROOT ``: Directory containing observational data used by individual PODs. Currently, this must be downloaded manually as part of the framework installation. See :numref:`ref-download` of the :doc:`installation guide ` for instructions.
* ``--WORKING-DIR, --WORKING_DIR ``: Working directory.
* ``--OUTPUT-DIR, --OUTPUT_DIR, -o ``: Destination for output files. Currently this must be on the same filesystem as ``WORKING_DIR``.

diff --git a/doc/sphinx/start_config.rst b/doc/sphinx/start_config.rst
index 9c99c0a16..cce5cf0ac 100644
--- a/doc/sphinx/start_config.rst
+++ b/doc/sphinx/start_config.rst
@@ -1,24 +1,23 @@
-Using customized model data & framework configuration
-=====================================================
+Framework configuration for user model data
+===========================================

In this section we describe how to run the framework on your own model data, using more configuration options than the test case described in :doc:`start_install`. The complete set of configuration options is described in :doc:`ref_cli`, or by running ``% ./mdtf --help``. All options can be specified as a command-line flag (e.g., ``--OUTPUT_DIR``) or in a JSON configuration input file of the form provided in `src/default_tests.jsonc `__. We recommend using this file as a template, making copies and customizing them as needed.

-Options given on the command line always take precedence over the input file. This is so you can store options that don't frequently change in the file (e.g., the input/output data paths) and use command-line flags to only set the options you want to change from run to run (e.g., the analysis period start and end years). In all cases, the complete set of option values used in each run of the framework will be included in the log file as part of the output, for reproducibility and provenance.
+Options given on the command line always take precedence over the input file. This is so you can store options that don't frequently change in the file (e.g., the input/output data paths) and use command-line flags to set only those options you want to change from run to run (e.g., the analysis period start and end years). In all cases, the complete set of option values used in each run of the framework will be included in the log file as part of the output, for reproducibility and provenance.

-Summary of steps for using customized model data
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+**Summary of steps for running the framework on user data**

-1. Have your data files ready in NetCDF format.
-2. Save the files following the specified directory hierarchy and filename convention.
-3. Check the variable name convention.
-4. Edit the configuration input file accordingly, then run the framework.
+1. Save or link model data files following the framework's filename convention.
+2. Select the variable name convention used by the model.
+3. Edit the configuration input file accordingly, then
+4. Run the framework.

 Adding your model data
 ----------------------

-Currently the framework is only able to run on model data in the form of NetCDF files on a locally mounted disk following a specific directory hierarchy and filename convention. We hope to offer more flexibility in this area in the near future.
+Currently the framework is only able to run on model data in the form of NetCDF files on a locally mounted disk following a specific directory hierarchy and filename convention, with one variable per file. We hope to offer more flexibility in this area in the near future.

The directory/filename convention we use is

@@ -55,7 +54,7 @@ As an example, here's how the sample model data is organized:

 └── obs_data ( = $OBS_DATA_ROOT)
     ├── (... supporting data for individual PODs )
-If your model data is available on a locally mounted disk, you can make `symlinks `__ that have the needed filenames and point to the data, rather than making copies of the files. For example,
+If your model data is available on a locally mounted disk, we recommend creating `symlinks `__ that have the needed filenames and point to the data, rather than making copies of the files. For example,

::

@@ -74,39 +73,38 @@ will create a link to the file in the first argument that can be accessed normal

 │   └── day
 │       └── my_new_experiment.pr.day.nc

-As implicitly demonstrated by the examples above, the current framework does not support having multiple variables in one file.
+Select the model's variable name convention
+-------------------------------------------

-Check variable name convention
-------------------------------

-The variable names have to be consistent with the convention expected by the framework, as defined in the JSON files ``src/fieldlist_$convention.jsonc``.
+The framework requires specifying a convention for variable names used in the model data. Currently recognized conventions are

+- ``CMIP``, for CF-compliant output produced as part of CMIP6;
+- ``CESM``, for the NCAR `Community Earth System Model `__;
+- ``AM4``, for the NOAA-GFDL `atmosphere model `__;
+- ``SPEAR``, for the NOAA-GFDL `seasonal model `__.

-Currently, the convention templates provided by the framework include ``CMIP``, for CF-compliant output produced as part of CMIP6 (e.g., by post-processing with `CMOR `__) and ``CESM``, ``AM4`` and ``SPEAR``. We hope to offer support for the native variable naming conventions of a wider range of models in the future.
+We hope to offer support for the variable naming conventions of a wider range of models in the future. For the time being, please process the output of models not on this list with `CMOR `__ to make it CF-compliant.

-- For instance, ``src/fieldlist_CESM.jsonc`` specifies the convention adopted by the NCAR CESM. Open this file, the line ``"pr_var" : "PRECT",`` means that total precipitation rate (*pr* for CF-compliant output) should be saved as *PRECT* (case sensitive). In addition, ``"pr_conversion_factor" : 1000,`` makes the units of precipitation CF-compliant.
-
-- You can either change the NetCDF variable/file names following the provided ``fieldlist_$convention.jsonc`` files, or edit and rename the files to fit your model data.
-
-Note that entries in the JSON files must be properly separated by ``,``. Check for missing or surplus `,` if you encounter an error
+Alternatively, the framework will load any lookup tables of the form ``src/fieldlist_$convention.jsonc`` and use them for variable name conversion. Users can add new files in this format to specify new conventions. For example, in ``src/fieldlist_CESM.jsonc`` the line ``"pr_var" : "PRECT"`` means that the CESM name for the precipitation rate is PRECT (case sensitive). In addition, ``"pr_conversion_factor" : 1000`` specifies the conversion factor to CF standard units for this variable.

 Running the code on your data
 -----------------------------

-After adding your model data to the directory hierarchy as described above, you can run the framework on that data using the following options. These can either be set in the "caselist" section of the configuration input file (see `src/default_tests.jsonc `__ for an example/template), or individually as command-line flags (e.g., ``--CASENAME my_new_experiment``). Required settings are:
+After adding your model data to the directory hierarchy as described above, you can run the framework on that data using the following options. These can either be set in the ``caselist`` section of the configuration input file (see `src/default_tests.jsonc `__ for an example/template), or individually as command-line flags (e.g., ``--CASENAME my_new_experiment``). Required settings are:

-- ``CASENAME`` should be the same string used to label your model run,
-- ``convention`` describes the variable naming convention your model uses. With the string specified here (referred to as $convention), the framework will look for the corresponding ``src/fieldlist_$convention.jsonc``
+- ``CASENAME`` should be the same string used to label your model run.
+- ``convention`` describes the variable naming convention your model uses, determined in the previous section.
- ``FIRSTYR`` and ``LASTYR`` specify the analysis period.
- ``model`` and ``experiment`` are recorded if given, but not currently used.

-When the framework is run, it determines if the variables each POD analyzes are present in the experiment data. Currently, the framework doesn't have the ability to transform data (e.g., to average daily data to monthly frequency), so the match between your model data and each POD's requirements will need to be exact in order for the POD to run (see `Diagnostics reference `__ for variables required by each POD). If the framework can't find data requested by a POD, an error message will be logged in place of that POD's output that should help you diagnose the problem.
-
+When the framework is run, it determines whether the data each POD needs is present in the model data provided. Specifically, the model must provide all variables needed by a POD at the required frequency. Consult the :doc:`documentation ` for a POD to determine the data it requires.
+If the framework can't find the data requested by a POD, an error message that should help you diagnose the problem will be logged in place of that POD's output. We hope to add the ability to transform data (e.g., to average daily data to monthly frequency) in order to simplify this process.

 Other framework settings
 ------------------------

-The paths to input and output data described in :ref:`ref-configure` only need to be modified if the corresponding data is moved (or if you'd like to send output to a new location). Note that the framework doesn't retain default values for paths, so if you run it without an input file, all required paths will need to be given explicitly on the command line.
+The paths to input and output data (described in :ref:`ref-configure`) only need to be modified if the corresponding data is moved, or if you'd like to send output to a new location. Note that the framework doesn't retain default values for paths, so if you don't specify a configuration file, all required paths will need to be given explicitly on the command line.

Other relevant flags controlling the framework's output are:

@@ -120,4 +118,4 @@ These can be set as command-line flags each time the framework is run (e.g.,. ``

 Modifying POD settings
 ----------------------

-Individual PODs may provide user-configurable options in their ``settings.jsonc`` file (under ``$CODE_ROOT/diagnostics/$POD_NAME/``), in the ``"pod_env_vars"`` section. These only need to be changed in rare or specific cases. Consult the POD's :doc:`documentation ` for details.
+Individual PODs may provide user-configurable options in the ``"pod_env_vars"`` section of their ``settings.jsonc`` file, which is located in each POD's source code directory under ``diagnostics/``. These only need to be changed in rare or specific cases. Consult the POD's :doc:`documentation ` for details.

diff --git a/doc/sphinx/start_install.rst b/doc/sphinx/start_install.rst
index 6c1842fdd..83aff421e 100644
--- a/doc/sphinx/start_install.rst
+++ b/doc/sphinx/start_install.rst
@@ -1,29 +1,25 @@
 Quickstart installation instructions
 ====================================

-This section provides basic directions for downloading, installing and running a test of the MDTF diagnostic framework package using sample model data. The current MDTF package has been tested on UNIX/LINUX, Mac OS, and Windows Subsystem for Linux.
+This section provides instructions for downloading, installing and running a test of the MDTF framework using sample model data. The MDTF framework has been tested on UNIX/LINUX, Mac OS, and Windows Subsystem for Linux.

-Throughout this document, ``%`` indicates the UNIX/LINUX command line prompt and is followed by commands to be executed in a terminal in ``fixed-width font``, and ``$`` indicates strings to be substituted, e.g., the string ``$CODE_ROOT`` below should be substituted by the actual path to the MDTF-diagnostics directory. While the package contains quite a few scripts, the most relevant for present purposes are:
+Throughout this document, ``%`` indicates the command line prompt and is followed by commands to be executed in a terminal in ``fixed-width font``. ``$`` indicates strings to be substituted, e.g., the string ``$CODE_ROOT`` below should be replaced by the actual path to the ``MDTF-diagnostics`` directory.

-- ``conda_env_setup.sh``: automated script for installing necessary Conda environments.
-- ``default_tests.jsonc``: configuration file for running the framework.
+**Summary of steps for installing the framework**

-Summary of steps for running the package
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-You will need to download a) the source code, b) digested observational data, and c) two sets of sample model data (:numref:`ref-download`). Afterwards, we describe how to install necessary Conda environments and languages (:numref:`ref-install`) and run the framework on the default test case (:numref:`ref-configure` and :numref:`ref-execute`).
+You will need to download the source code, digested observational data, and sample model data (:numref:`ref-download`). Afterwards, we describe how to install software dependencies using the `conda `__ package manager (:numref:`ref-install`, :numref:`ref-conda-env-install`) and run the framework on sample model data (:numref:`ref-configure` and :numref:`ref-execute`).

.. _ref-download:

-Download the package code and sample data for testing
------------------------------------------------------
+Download the framework code and supporting data
+-----------------------------------------------

 Obtaining the code
 ^^^^^^^^^^^^^^^^^^

-The official repo for the MDTF code is hosted at the GFDL `GitHub account `__. We recommend that end users download and test the `latest official release `__.
+The official repo for the MDTF code is hosted at the NOAA-GFDL `GitHub account `__. We recommend that end users download and test the `latest official release `__.

-To install the MDTF package on a local machine, create a directory named ``mdtf``, and unzip the code downloaded from the `release page `__ there. This will create a directory titled ``MDTF-diagnostics-3.0-beta.1`` containing the files listed on the GitHub page. Below we refer to this MDTF-diagnostics directory as ``$CODE_ROOT``. It contains the following subdirectories:
+To install the MDTF framework, create a directory named ``mdtf`` and unzip the code downloaded from the `release page `__ there. This will create a directory titled ``MDTF-diagnostics-3.0-beta.2`` containing the files listed on the GitHub page. Below we refer to this MDTF-diagnostics directory as ``$CODE_ROOT``. It contains the following subdirectories:

- ``diagnostics/``: directory containing source code and documentation of individual PODs.
- ``doc/``: directory containing documentation (a local mirror of the documentation site).

@@ -32,19 +28,21 @@ To install the MDTF package on a local machine, create a directory named ``mdtf`

For advanced users interested in keeping more up-to-date on project development and contributing feedback, the ``main`` branch contains features that haven’t yet been incorporated into an official release, and which may be less stable or less thoroughly tested.

-For POD developers, the ``develop`` branch is the “beta test” version of the framework. POD developers should begin work on this branch as described in :ref:`ref-dev-git`.
+For POD developers, the ``develop`` branch is the “beta test” version of the framework. POD developers should begin by locally cloning the repo and checking out this branch, as described in :ref:`ref-dev-git-intro`.

.. _ref-supporting-data:

 Obtaining supporting data
 ^^^^^^^^^^^^^^^^^^^^^^^^^

-Supporting observational data and sample model data are available via anonymous FTP at ftp://ftp.cgd.ucar.edu/archive/mdtf. The observational data is required for the PODs’ operation, while the sample model data is provided for default test/demonstration purposes. The files most relevant for package installation and default tests are:
+Supporting observational data and sample model data are available via anonymous FTP at ftp://ftp.cgd.ucar.edu/archive/mdtf. The observational data is required for the PODs’ operation, while the sample model data is provided for default test/demonstration purposes. The required files are:

- Digested observational data (159 MB): `MDTF_v2.1.a.obs_data.tar `__.
- NCAR-CESM-CAM sample data (12.3 GB): `model.QBOi.EXP1.AMIP.001.tar `__.
- NOAA-GFDL-CM4 sample data (4.8 GB): `model.GFDL.CM4.c96L32.am4g10r8.tar `__.

+Note that the above paths are symlinks to the most recent versions of the data and will be reported as zero bytes in an FTP client.
+
Download these three files and extract the contents into the following hierarchy under the ``mdtf`` directory:

::

@@ -73,51 +71,48 @@ Download these three files and extract the contents in the following hierarchy u

     ├── (... supporting data for individual PODs )

-The default test case uses the QBOi.EXP1.AMIP.001 sample. The GFDL.CM4.c96L32.am4g10r8 sample is only for testing the MJO Propagation and Amplitude POD.
+The default test case uses the QBOi.EXP1.AMIP.001 data. The GFDL.CM4.c96L32.am4g10r8 data is only for testing the MJO Propagation and Amplitude POD.

-You can put the observational data and model output in different locations (e.g., for space reasons) by changing the values of ``OBS_DATA_ROOT`` and ``MODEL_DATA_ROOT`` as described below in :numref:`ref-configure`.
+You can put the observational data and model output in different locations (e.g., for space reasons) by changing the values of ``OBS_DATA_ROOT`` and ``MODEL_DATA_ROOT`` as described in :numref:`ref-configure`.

.. _ref-install:

-Install the necessary programming languages and modules
---------------------------------------------------------
-
-*For users unfamiliar with Conda, :numref:`ref-conda-install` can be skipped if Conda has been installed, but :numref:`ref-conda-env-install` CANNOT be skipped regardless.*
+Install the conda package manager, if needed
+--------------------------------------------

-The MDTF framework code is written in Python 2.7, but supports running PODs written in a variety of scripting languages and combinations of libraries. We use `Conda `__, a free, open-source package manager to install and manage these dependencies. Conda is one component of the `Miniconda `__ and `Anaconda `__ python distribution, so having Miniconda/Anaconda is sufficient but not necessary.
+The MDTF framework code is written in Python 3, but supports running PODs written in a variety of scripting languages and combinations of libraries. We use `conda `__, a free, open-source package manager, to install and manage these dependencies. Conda is one component of the `Miniconda `__ and `Anaconda `__ Python distributions, so having Miniconda or Anaconda is sufficient but not required.

-For maximum portability and ease of installation, we recommend that all users manage dependencies through Conda using the provided script ``src/conda/conda_env_setup.sh``, even if they have independent installations of the required languages. A complete installation of all dependencies will take roughly 5 Gb, less if you've already installed some of the dependencies through Conda. The location of this installation can be changed with the ``$CONDA_ENV_DIR`` setting described below.
+For maximum portability and ease of installation, we recommend that all users manage dependencies through conda, even if they have pre-existing installations of the required languages. A complete installation of all dependencies requires roughly 5 GB, and the location of this installation can be set with the ``$CONDA_ENV_DIR`` setting described below. Note that conda does not create duplicates of dependencies that are already installed (instead using hard links by default).

If these space requirements are prohibitive, we provide an alternate method of operation which makes no use of conda and relies on the user to install external dependencies, at the expense of portability. This is documented in a :doc:`separate section `.

-.. _ref-conda-install:
-
 Conda installation
 ^^^^^^^^^^^^^^^^^^

-Here we are checking that the Conda command is available on your system. We recommend doing this via Miniconda or Anaconda installation. You can proceed directly to section 2.2 if Conda is already installed.
+
+Users with an existing conda installation should skip this section and proceed to :numref:`ref-conda-env-install`.

- To determine if conda is installed, run ``% conda --version`` as the user who will be using the framework. The framework has been tested against versions of conda >= 4.7.5.

-- If the command doesn't return anything, i.e., you do not have a pre-existing Conda on your system, we recommend using the Miniconda installer available `here `__. Any version of Miniconda/Anaconda (2 or 3) released after June 2019 will work. Installation instructions `here `__.
+  .. warning::
+     Do not install a new copy of Miniconda/Anaconda if it's already installed for the user who will be running the framework: the installer will break the existing installation (if it's not managed with, e.g., environment modules). The framework’s environments are designed to coexist with an existing Miniconda/Anaconda installation.

-- Toward the end of the installation process, enter “yes” at the “Do you wish the installer to initialize Miniconda2 by running conda init?” (or similar) prompt. This will allow the installer to add the Conda path to the user's shell login script (e.g., ``~/.bashrc`` or ``~/.cshrc``).
+- If you do not have a pre-existing conda installation, we recommend installing Miniconda 3.x, available `here `__. This version is not required: any version of Miniconda/Anaconda (2 or 3) released after June 2019 will work equally well.

-- Restart the terminal to reload the updated shell login script.
+
+  Follow the `installation instructions `__ appropriate for your system. Toward the end of the installation process, enter “yes” at the “Do you wish the installer to initialize Miniconda3 by running conda init?” (or similar) prompt. This will allow the installer to add the conda path to the user's shell startup script (e.g., ``~/.bashrc`` or ``~/.cshrc``).

-- Mac OS users may encounter a benign Java warning pop-up: *To use the "java" command-line tool you need to install a JDK.* It's safe to ignore it.
+
+  Restart the terminal to reload the updated shell startup script.

-The framework’s environments will co-exist with an existing Miniconda/Anaconda installation. *Do not* reinstall Miniconda/Anaconda if it's already installed for the user who will be running the framework: the installer will break the existing installation (if it's not managed with, e.g., environment modules.)
+
+  Mac OS users may encounter a message directing them to install the Java JDK. This can be ignored.

-.. _ref-conda-env-install:
-Framework-specific environment installation
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+.. _ref-conda-env-install:

-Here we set up the necessary environments needed for running the framework and individual PODs via the provided script. These are sometimes referred to as "Conda environments" conventionally.
+Install framework dependencies with conda
+-----------------------------------------

-After making sure that Conda is available, run ``% conda info --base`` as the user who will be using the framework to determine the location of your Conda installation. This path will be referred to as ``$CONDA_ROOT`` below.
+As described above, all software dependencies for the framework and PODs are managed through conda environments.

-- If this path points to ``/usr/`` or a subdirectory therein, we recomnend having a separate Miniconda/Anaconda installation of your own following :ref:`ref-conda-install`.
+Run ``% conda info --base`` as the user who will be using the framework to determine the location of your conda installation. This path will be referred to as ``$CONDA_ROOT`` below. If you don't have write access to this location (e.g., on a multi-user system), you'll need to tell conda to install files in a non-default location ``$CONDA_ENV_DIR``, as described below.

Next, run

::

@@ -125,55 +120,58 @@

% cd $CODE_ROOT
% ./src/conda/conda_env_setup.sh --all --conda_root $CONDA_ROOT --env_dir $CONDA_ENV_DIR

-to install all necessary environments (and create an executable; :ref:`ref-location-execute`), which takes ~10 min (depending on machine and internet connection). The names of all framework-created environments begin with “_MDTF”, so as not to conflict with any other environments.
+to install all dependencies, which takes ~10 min (depending on machine and internet connection). The names of all framework-created environments begin with “_MDTF”, so as not to conflict with user-created environments in a pre-existing conda installation.

- Substitute the actual paths for ``$CODE_ROOT``, ``$CONDA_ROOT``, and ``$CONDA_ENV_DIR``.

-- The ``--env_dir`` flag allows you to put the program files in a designated location ``$CONDA_ENV_DIR`` (for space reasons, or if you don’t have write access). You can omit this flag, and the environments will be installed within ``$CONDA_ROOT/envs/`` by default.
+- The optional ``--env_dir`` flag directs conda to install framework dependencies in ``$CONDA_ENV_DIR`` (for space reasons, or if you don’t have write access). If this flag is omitted, the environments will be installed in ``$CONDA_ROOT/envs/`` by default.

-- The ``--all`` flag makes the script install all environments prescribed by the YAML (.yml) files under ``src/conda/`` (one YAML for one environment). You can install the environments selectively by using the ``--env`` flag instead. For instance, ``% ./src/conda/conda_env_setup.sh --env base --conda_root $CONDA_ROOT --env_dir $CONDA_ENV_DIR`` will install the "_MDTF_base" environment prescribed by ``env_base.yml``, and so on. With ``--env``, the current script can install one environment at a time. Repeat the command for multiple environments.
+- The ``--all`` flag makes the script install all dependencies for all PODs. To selectively update individual conda environments after installation, use the ``--env`` flag instead. For instance, ``% ./src/conda/conda_env_setup.sh --env base --conda_root $CONDA_ROOT --env_dir $CONDA_ENV_DIR`` will update the environment named "_MDTF_base" defined in ``src/conda/env_base.yml``, and so on.

-- Note that _MDTF_base is mandatory for the framework's operation, and the other environments are optional, see :ref:`ref-interaction-conda-env`.
+.. note::
+   After installing the framework-specific conda environments, you shouldn't manually alter them (e.g., never run ``conda update`` on them). To update the environments after updating the framework code, re-run the above commands. These environments can be uninstalled by simply deleting the "_MDTF" directories under ``$CONDA_ENV_DIR`` (or ``$CONDA_ROOT/envs/`` by default).

-After installing the framework-specific Conda environments, you shouldn't manually alter them (i.e., never run ``conda update`` on them). To update the environments after updating the framework code, re-run the above commands. These environments can be uninstalled by simply deleting "_MDTF" directories under ``$CONDA_ENV_DIR`` (or ``$CONDA_ROOT/envs/`` for default setting).

.. _ref-configure:

-Configure package paths
------------------------
+Configure framework paths
+-------------------------

-``src/default_tests.jsonc`` is a template/example for configuration options that will be passed to the executable as an input. Open it in an editor (we recommend working on a copy). The following adjustments are necessary before running the framework:
+The MDTF framework supports setting configuration options in a file as well as on the command line. An example of the configuration file format is provided at `src/default_tests.jsonc `__. We recommend configuring the following settings by editing a copy of this file.
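As a sketch of what the relevant entries in such a copy might look like (the values below are illustrative placeholders; the actual defaults and the full set of options are in the file itself):

::

   "OBS_DATA_ROOT": "../inputdata/obs_data",  // digested observational data
   "MODEL_DATA_ROOT": "../inputdata/model",   // sample or user model data
   "OUTPUT_DIR": "../wkdir",                  // where output reports are written
   "conda_root": "/home/$USER/miniconda3",    // value of $CONDA_ROOT from above
   "conda_env_root": ""                       // blank if you used the default location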
-- If you've saved the supporting data in the directory structure described in :ref:`ref-supporting-data`, the default values for ``OBS_DATA_ROOT`` and ``MODEL_DATA_ROOT`` pointing to ``mdtf/inputdata/obs_data/`` and ``mdtf/inputdata/model/`` will be correct. If you put the data in a different location, these values should be changed accordingly.
+Relative paths in the configuration file will be interpreted relative to ``$CODE_ROOT``. The following settings need to be configured before running the framework:

-- ``OUTPUT_DIR`` should be set to the location you want the output files to be written to (default: ``mdtf/wkdir/``; will be created by the framework). The output of each run of the framework will be saved in a different subdirectory in this location.
+- If you've saved the supporting data in the directory structure described in :ref:`ref-supporting-data`, the default values for ``OBS_DATA_ROOT`` and ``MODEL_DATA_ROOT`` given in ``src/default_tests.jsonc`` (``../inputdata/obs_data`` and ``../inputdata/model``, respectively) will be correct. If you put the data in a different location, these paths should be changed accordingly.

-- ``conda_root`` should be set to the value of ``$CONDA_ROOT`` used above in :ref:`ref-conda-env-install`.
+- ``OUTPUT_DIR`` should be set to the desired location for output files. The output of each run of the framework will be saved in a different subdirectory in this location.

-- If you specified a custom environment location with ``$CONDA_ENV_DIR``, set ``conda_env_root`` to that value; otherwise, leave it blank.
+- ``conda_root`` should be set to the value of ``$CONDA_ROOT`` used above in :ref:`ref-conda-env-install`.

-We recommend using absolute paths in ``default_tests.jsonc``, but relative paths are also allowed and should be relative to ``$CODE_ROOT``.
+- If you specified a non-default conda environment location with ``$CONDA_ENV_DIR``, set ``conda_env_root`` to that value; otherwise, leave it blank.

.. _ref-execute:

-Run the MDTF package with default test settings
------------------------------------------------
-
-.. _ref-location-execute:
+Run the MDTF framework on sample data
+-------------------------------------

 Location of the MDTF executable
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-The setup script (:ref:`ref-conda-env-install`) will have created an executable at ``$CODE_ROOT/mdtf`` which sets the correct Conda environments before running the framework and individual PODs. To test the installation, ``% $CODE_ROOT/mdtf --help`` will print help text on the command-line options. Note that, if your current working directory is ``$CODE_ROOT``, you will need to run ``% ./mdtf --help``.
+The MDTF framework is run via a wrapper script at ``$CODE_ROOT/mdtf``.

-For interested users, the ``mdtf`` executable is also a script, which calls ``src/conda/conda_init.sh`` and ``src/mdtf.py``.
+This is created by the conda environment setup script used in :numref:`ref-conda-env-install`. The wrapper script activates the framework's conda environment before calling the framework's code (and individual PODs). To verify that the framework and environments were installed successfully, run

-.. _ref-framework-sample:
+::
+
+% cd $CODE_ROOT
+% ./mdtf --version
+
+This should print the current version of the framework.
Run the framework on sample data
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-If you've installed the Conda environments using the ``--all`` flag (:ref:`ref-conda-env-install`), you can now run the framework on the CESM sample model data:
+If you've downloaded the NCAR-CESM-CAM sample data (described in :ref:`ref-supporting-data` above), you can now perform a trial run of the framework:

::

@@ -182,34 +180,10 @@ If you've installed the Conda environments using the ``--all`` flag (:ref:`ref-c

Run time may be 10-20 minutes, depending on your system.

-- If you edited/renamed ``default_tests.jsonc``, pass that file instead.
-
-- The output files for this test case will be written to ``$OUTPUT_DIR/QBOi.EXP1.AMIP.001_1977_1981``. When the framework is finished, open ``$OUTPUT_DIR/QBOi.EXP1.AMIP.001_1977_1981/index.html`` in a web browser to view the output report.
-
-- The above command will execute PODs included in ``pod_list`` of ``default_tests.jsonc``. Skipping/adding certain PODs by uncommenting/commenting out the POD names (i.e., deleting/adding ``//``). Note that entries in the list must be separated by ``,`` properly. Check for missing or surplus ``,`` if you encounter an error (e.g., "ValueError: No closing quotation").
-
-- Currently the framework only analyzes data from one model run at a time. To run the MJO_prop_amp POD on the GFDL.CM4.c96L32.am4g10r8 sample data, delete or comment out the section for QBOi.EXP1.AMIP.001 in "caselist" of ``default_tests.jsonc``, and uncomment the section for GFDL.CM4.c96L32.am4g10r8.

-.. _ref-interaction-conda-env:
-
-Framework interaction with Conda environments
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-As just described in :ref:`ref-framework-sample`, when you run the ``mdtf`` executable, among other things, it reads ``pod_list`` in ``default_tests.jsonc`` and executes POD codes accordingly. For a POD included in the list (referred to as $POD_NAME):
-
-1. The framework will first try to determine whether there is a Conda environment named ``_MDTF_$POD_NAME`` under ``$CONDA_ENV_DIR``. If yes, the framework will switch to this environment and run the POD.
-
-2. If not, the framework will then look into the POD's ``settings.jsonc`` file in ``$CODE_ROOT/diagnostics/$POD_NAME``. ``runtime_requirements`` in ``settings.jsonc`` specifies the programming language(s) adopted by the POD:
-
-   a). If purely Python, the framework will switch to ``_MDTF_python3_base`` and run the POD (`_MDTF_python2_base` for ealier PODs developed in Python 2.7).
-
-   b). If NCL is used, then ``_MDTF_NCL_base``.
-
-If you choose to selectively install Conda environments using the ``--env`` flag (:ref:`ref-conda-env-install`), remember to install all the environments needed for the PODs you're interested in, and that ``_MDTF_base`` is mandatory for the framework's operation.
+- If you edited or renamed ``src/default_tests.jsonc``, as recommended in the previous section, pass the path to that configuration file instead.

-- For instance, the minimal installation for running the ``EOF_500hPa`` and ``convective_transition_diag PODs`` requres ``_MDTF_base`` (mandatory), ``_MDTF_NCL_base`` (because of b), and ``_MDTF_convective_transition_diag`` (because of 1). These can be installed by passing ``base``, ``NCL_base``, and ``convective_transition_diag`` to the ``--env`` flag one at a time (:ref:`ref-conda-env-install`).
+- The output files for this test case will be written to ``$OUTPUT_DIR/MDTF_QBOi.EXP1.AMIP.001_1977_1981``. When the framework is finished, open ``$OUTPUT_DIR/MDTF_QBOi.EXP1.AMIP.001_1977_1981/index.html`` in a web browser to view the output report.
-Next steps
-----------
+- The framework defaults to running all available PODs; this can be overridden with the ``pod_list`` option in the ``src/default_tests.jsonc`` configuration file. Individual PODs can be specified as a comma-delimited list of POD names.

-Consult the :doc:`next section ` for how to run the framework on your own data and configure general settings.
+- Currently the framework only analyzes data from one model run at a time. To run the MJO_prop_amp POD on the GFDL.CM4.c96L32.am4g10r8 sample data, delete or comment out the section for QBOi.EXP1.AMIP.001 in the ``caselist`` section of the configuration file, and uncomment the section for GFDL.CM4.c96L32.am4g10r8.

diff --git a/doc/sphinx/start_nonconda.rst b/doc/sphinx/start_nonconda.rst
index cbf394cb5..b9ceaefdd 100644
--- a/doc/sphinx/start_nonconda.rst
+++ b/doc/sphinx/start_nonconda.rst
@@ -23,3 +23,15 @@ Configuring this mode of operation requires adding additional settings to the ``

- Any values for ``conda_root`` and ``conda_env_root`` will be ignored.
- The framework will use ``pip`` to install required python modules in new virtualenvs, which will be installed in the default location for your system's python. To put the files in a different location, create a new setting ``"venv_root": ``.
- Likewise, to install packages needed by R in a location other than your system default, create a new setting ``"r_lib_root": ``.
+
+Known issues with standalone NCL installation
+---------------------------------------------
+
+Many Linux distributions (Ubuntu, Mint, etc.) have offered a way of installing `NCL `__ through their system package manager (apt, yum, etc.). This method of installation is not recommended: users may encounter errors when running the example PODs provided by NCAR, even if the environment variables and search path have been added.
+
+The recommended method to install standalone NCL is to download the pre-compiled binaries from https://www.ncl.ucar.edu/Download/install_from_binary.shtml. Choose a download option according to your Linux distribution and hardware, unzip the file (this produces three folders: ``bin``, ``include``, ``lib``), create a folder ``ncl`` under ``/usr/local`` (requires permissions) and move the three unzipped folders into ``/usr/local/ncl``. Then add the following lines to the ``.bashrc`` script (under the user’s home directory; the file may differ for shells other than bash, e.g., ``.cshrc`` for csh):
+
+::
+
+   export NCARG_ROOT=/usr/local/ncl
+   export PATH=$NCARG_ROOT/bin:$PATH

diff --git a/doc/sphinx/start_overview.rst b/doc/sphinx/start_overview.rst
index 0493bc076..1496ac145 100644
--- a/doc/sphinx/start_overview.rst
+++ b/doc/sphinx/start_overview.rst
@@ -1,7 +1,7 @@
 Overview
 ========

-Welcome! In this section we'll describe what the MDTF diagnostics framework is, how it works, and how you can contribute your own diagnostic scripts.
+Welcome! In this section we'll describe what the Model Diagnostics Task Force (MDTF) framework is, how it works, and how you can contribute your own diagnostic scripts.

 Purpose
 -------

@@ -19,32 +19,30 @@ The design goal of the MDTF framework is to provide a portable and adaptable mea

 :align: center
 :width: 100 %

-The MDTF Diagnostic Framework consists of multiple Process-Oriented Diagnostic (POD) modules, each of which is developed by an individual research group. For clarity, the framework is the structure provided by the Model Diagnostics Task Force, and the PODs (or modules) are developed by individual groups (or developers). PODs are developed and run independently of each other. Each POD takes as input (1) requested variables from the model run, along with (2) any required observational or supporting data, performs an analysis, and produces (3) a set of figures which are presented to the user in a series of .html files. (We do not include or require a mechanism for publishing these webpages on the internet; html is merely used as a convenient way to present a multimedia report to the user.)
+As shown in the figure above, the MDTF framework itself performs common data management and support tasks (gray boxes) before and after the individual POD scripts are run. The PODs (colored boxes) are developed by different research groups and run independently of one another. Each POD takes as input
+
+1. requested variables from the model run, along with
+2. any required observational or supporting data, performs an analysis, and produces
+3. a set of figures which are presented to the user in a series of .html files.
+
+We do not include or require a mechanism for publishing these webpages on the internet; html is merely used as a convenient way to present a multimedia report to the user.

 Getting started for users
 -------------------------

The rest of the documentation in this section describes next steps for end users of the framework:

-- We provide instructions on how to :doc:`download and install ` the framework and run it on some sample data.
-- We describe the most common :doc:`configuration options ` for running the framework on your own model data.
-- See also the list of :doc:`command-line options `.
-- Known :doc:`troubleshooting issues `; also see the GitHub `issue tracker `__.
+- We provide instructions on how to :doc:`download and install ` the framework and run it on sample model data.
+- We describe the most common :doc:`configuration options ` for running the framework on your own model data. Also see the full list of :doc:`command-line options `.
+- If you encounter a bug, check the GitHub `issue tracker `__.
+
+Getting started for POD developers
+----------------------------------

-Getting started for developers
-------------------------------
+Information for researchers wishing to contribute a POD to the framework is provided in the :doc:`Developer ` section; consult the :doc:`quickstart guide ` for an overview and the :doc:`checklist ` of items needed for submitting your POD.

-As summarized in the figure above, the changes needed to convert an existing analysis script for use in the framework are:
+The framework is designed to require minimal changes to existing analysis scripts. We recommend that developers of new PODs write them independently of the framework, adapting them for the framework's use once the code is fully debugged. As summarized in the figure above, the changes needed to convert an existing analysis script for use in the framework are:

- Provide a settings file which tells the framework what it needs to do: what languages and libraries your code needs to run, and what model data your code takes as input (a minimal sketch is given after this list).
- Adapt your code to load data files from locations set in Unix shell environment variables (we use this as a language-independent way for the framework to communicate information to the POD).
- Provide a template web page which links to, and briefly describes, the plots generated by the script.
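As a purely hypothetical sketch of what such a settings file might contain (the ``runtime_requirements``, ``pod_env_vars`` and ``varlist`` sections are the ones discussed in this documentation; all names and values below are placeholders, so consult the settings file documentation for the authoritative schema):

::

   // Hypothetical settings.jsonc for a POD named "my_pod".
   {
     "settings": {
       "driver": "my_pod.py",            // assumed name of the POD's top-level script
       "runtime_requirements": {
         "python3": ["matplotlib", "xarray", "netCDF4"]
       }
     },
     "pod_env_vars": {
       "MY_POD_OPTION": "default_value"  // optional, user-configurable setting
     },
     "varlist": {
       "pr": {
         "standard_name": "precipitation_flux",
         "frequency": "day"
       }
     }
   }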
-
-Each of these are described in more detail in the developer-specific sections:
-
-- We provide instructions on :doc:`working with git ` for people who haven't used it before.
-- :doc:`Instructions ` and framework policies to keep in mind when developing your POD.
-- Description of the :doc:`settings file ` needed by the framework to process your POD's requirements.
-- A more detailed :doc:`walkthrough ` that elaborates on the flowchart above and describes the steps taken by the framework in order to run your POD.
-- A :doc:`checklist ` of items needed for submitting your POD for inclusion in the framework.
-- A collection of :doc:`links ` to relevant tutorials and resources.
\ No newline at end of file

diff --git a/doc/sphinx/start_toc.rst b/doc/sphinx/start_toc.rst
index 4de38403f..c4b29956e 100644
--- a/doc/sphinx/start_toc.rst
+++ b/doc/sphinx/start_toc.rst
@@ -8,4 +8,3 @@ Getting started
    start_overview
    start_install
    start_config
-   start_troubleshoot

diff --git a/doc/sphinx/start_troubleshoot.rst b/doc/sphinx/start_troubleshoot.rst
deleted file mode 100644
index 63055ce7f..000000000
--- a/doc/sphinx/start_troubleshoot.rst
+++ /dev/null
@@ -1,33 +0,0 @@
-Troubleshooting
-===============
-
-Here we provide a short list of problems the MDTF team had previously encountered.
-
-The error message "convert: not authorized ..." shows up
---------------------------------------------------------
-
-The MDTF package generates figures in the PostScript (PS) format, and then uses the ``convert`` command (from the `ImageMagick `__ software suite) to convert the PS files to PNG files. The convert error can occur after recent updates and can be solved as follows (requires permission):
-
-In the file ``/etc/ImageMagick/policy.xml``, change the ```` to ````.
-
-The folder name ``ImageMagick`` may depend on its version, e.g., ``ImageMagick-6``.
-
-Issues with standalone NCL installation
----------------------------------------
-
-Many Linux distributions (Ubuntu, Mint, etc.) have offered a way of installing `NCL `__ through their system package manager (apt, yum, etc.) This method of installation is not recommended: users may encounter errors when running the example PODs provided by NCAR, even if the environment variables and search path have been added.
-
-The recommended method to install standalone NCL is by downloading the pre-compiled binaries from https://www.ncl.ucar.edu/Download/install_from_binary.shtml. Choose a download option according to the Linux distribution and hardware, unzip the file (results in 3 folders: ``bin``, ``include``, ``lib``), create a folder ncl under the directory ``/usr/local`` (requires permission) and move the 3 unzipped folders into ``/usr/local/ncl``. Then add the following lines to the ``.bashrc`` script (under the user’s home directory; may be different if using shells other than bash, e.g., ``.cshrc`` for csh):
-
-::
-
-   export NCARG_ROOT=/usr/local/ncl
-   export PATH:$NCARG_ROOT/bin:$PATH
-
-Issues with the convective transition POD
-------------------------------------------
-
-The plotting scripts of this POD may not produce the desired figures with the latest version of matplotlib (because of the default size adjustment settings). The matplotlib version comes with the Anaconda 2 installer, version 5.0.1 has been tested. The readers can switch to this older version.
-
-Depending on the platform and Linux distribution/version, a related error may occur with the error message "... ImportError: libcrypto.so.1.0.0: cannot open shared object file: No such file or directory". One can find the missing object file ``libcrypto.so.1.0.0`` in the subdirectory ``~/anaconda2/pkgs/openssl-1.0.2l-h077ae2c_5/lib/``, where ``~/anaconda2/`` is where Anaconda 2 is installed. The precise names of the object file and openssl-folder may vary. Manually copying the object file to ``~/anaconda2/lib/`` should solve the error.