
Investigation: How best to use MIP Convert in CMEW

mo-tgeddes edited this page Mar 23, 2023 · 17 revisions

Introduction

Note

  • This has been tested using a list of 10 variables required for the radiation budget recipe.

How much information is required by MIP Convert to process data from u-bg466? What does the MIP Convert user configuration file require now?

  • To run MIP Convert with raw suite output, the whole CDDS package has to be run.
  • MIP Convert can be run on its own, but it needs the .pp files to be in the input directory already and a configuration file to be set up. Overall, CDDS takes less configuration and generates a config file for MIP Convert from the input specifications.
  • CDDS requires a request.json file and a text file listing the required variables (we need the text file for our use case, but it is not technically necessary to run the package).
    • The text file contains a list of the variables we want to convert.
    • The request.json file is generated on the command line and has default values that can be overridden, for example with --end_date 1860-01-01.
  • A directory structure has to be created in the working directory.
  • The package is then run with a command line instruction.

How long does it take MIP Convert to process 10, 50 and 100 years of data from u-bg466? (/usr/bin/time might be useful here!)

  • /usr/bin/time could not be used because CDDS runs as a Cylc workflow.
  • 10 years took approximately 3 minutes.
  • 50 years took approximately 56 minutes, 53 of which were spent downloading data from MASS; everything else took 3 minutes.
    • There is a script, path_reformatter, that allows the CDDS package to run with data already downloaded from MASS. It is not well documented, so further discussion with Piotr from the CDDS team would be required to implement it (if needed).
  • 100 years took 2 hours and 11 minutes.
  • Running only MIP Convert on 50 years of data took approximately 4 minutes.
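
Because the stand-alone MIP Convert run is an ordinary command rather than a Cylc workflow, it can be timed directly. The following is a minimal sketch, assuming a POSIX shell; time_step is a hypothetical helper and not part of CDDS.

```shell
# Hypothetical helper (not part of CDDS): run a command and report how long it took.
time_step() {
    label="$1"
    shift
    start=$(date +%s)              # wall-clock seconds before the command
    "$@"
    status=$?
    end=$(date +%s)
    # Report on stderr so the command's own stdout is left untouched.
    echo "${label}: $((end - start)) s (exit status ${status})" >&2
    return "$status"
}

# Example, assuming an active CDDS environment and an edited config file:
# time_step "mip_convert, 50 years" mip_convert edited_config_file.cfg
```

The helper returns the exit status of the wrapped command, so it can be dropped into an existing script without changing its error handling.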

The CDDS Rose suite breaks data down into chunks for MIP Convert; how long does this take?

  • Cylc review shows that, for 50 years, the extract task took approximately 53 minutes and all remaining tasks took a total of approximately 3 minutes.

Is it possible to run a Cylc 7 Rose suite from a Cylc 8 workflow?

  • Given that I have been running this from the command line, I think it would be best run from a bash script with variables passed in (suite, start and end dates, variable text file).
  • rose suite-run -- --no-detach keeps the Cylc 7 workflow attached to the calling process, so the larger Cylc workflow that calls it will know whether it is still running, has failed or has succeeded.
  • Further discussion with the MISS team would be required before implementing this.
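
As an illustration of the bash-script approach, the sketch below wraps rose suite-run -- --no-detach so that the calling task inherits the suite's exit status. It assumes the suite has already been checked out or generated in a local directory; run_suite_blocking and the DRY_RUN switch are hypothetical names introduced here, and this has not been tested from a Cylc 8 workflow.

```shell
# Hypothetical wrapper: run a Rose suite in the foreground from its directory.
# Usage: run_suite_blocking SUITE_DIR
run_suite_blocking() {
    suite_dir="$1"
    if [ ! -d "$suite_dir" ]; then
        echo "suite directory not found: $suite_dir" >&2
        return 2
    fi
    # With DRY_RUN set, the command is printed instead of executed.
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$suite_dir" && ${DRY_RUN:+echo} rose suite-run -- --no-detach )
}

# Example (dry run): DRY_RUN=1 run_suite_blocking /path/to/suite
```

A Cylc 8 task could call the function in its script section; the task would then succeed or fail according to the return status.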

How much of the CDDS package is needed to run MIP Convert, and can pieces be run on their own?

  • The individual components can be run on the command line. Running the instructions for the whole CDDS package with a few modifications to the directory means that the output from the extract task can be used by MIP_Convert.
  • Run the instructions for the CDDS package given below, excluding the last step.
  • Run cdds_extract request.json --root_proc_dir /your/decided/process/dir --root_data_dir /your/output/for/extract --stream ap5 --skip_extract_validation
  • Then configure MIP Convert by downloading and editing the default configuration file on trac.
  • At a minimum, the stream, required variables, suite ID, start and end dates, and the input and output directories should be modified.
  • The files must be copied into a directory labelled input/u-bg466/ap5 - you can't just configure the extract task to stick them there as it will automatically produce a different directory structure.
  • Then run mip_convert edited_config_file.cfg
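
The copy step above can be sketched as follows. This is only an illustration: stage_pp_files is a hypothetical helper, the location of the extract output is whatever was passed to --root_data_dir, and the assumption that input/u-bg466/ap5 sits under a root directory named in the MIP Convert config file should be checked against the edited configuration.

```shell
# Hypothetical helper: copy extracted .pp files into input/<suite>/<stream>.
# Usage: stage_pp_files EXTRACT_DIR MIP_CONVERT_ROOT SUITE_ID STREAM
stage_pp_files() {
    src="$1"
    dest="$2/input/$3/$4"          # e.g. <root>/input/u-bg466/ap5
    mkdir -p "$dest" || return 1
    # Search recursively, since the extract task writes its own nested layout.
    find "$src" -type f -name '*.pp' -exec cp {} "$dest"/ \;
    echo "$dest"                   # print the directory that was populated
}

# Example:
# stage_pp_files /your/output/for/extract /your/mip_convert/root u-bg466 ap5
```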

Pros and cons of using the CDDS package as a whole vs using the minimum and MIP Convert

  • CDDS Package
    • Pros
      • Fully supported
      • All metadata produced by MIP_Convert will be correct as it will have been generated by the CDDS package.
      • No configuration of MIP Convert is needed as it is done by CDDS.
      • All commands to configure and run the suite are given on the command line.
    • Cons
      • Automatically downloads the data from MASS, and while there is a script to change this, it is currently undocumented.
      • Currently a Cylc 7 suite, which might be a bit difficult to run from a Cylc 8 suite.
  • MIP_Convert
    • Pros
      • Not a Cylc suite so can be run from the command line and easier to implement in a Cylc 8 suite.
      • Data only has to be downloaded once for it to work.
    • Cons
      • Not supported for the way we intend to use it: the CDDS package is the recommended way to run MIP Convert on suite output.
      • There is a chance of incorrect metadata if a mistake is made in the config file for MIP Convert.
      • The config file has to be configured manually (or generated through Cylc or similar); this can't be done automatically from the command line.
      • The files from the extract task have to be copied into the input directory of MIP Convert, as it cannot read the directory layout produced by the extract task alone.

Can CDDS or MIP Convert be shared with outside parties? And will CDDS be updated to Cylc 8?

  • The whole CDDS package has been licensed under BSD v3, so it could be shared. However, it would need to be looked over and approved by the CDDS team first.
  • The CDDS package is currently being upgraded to Cylc 8.

Instructions for running CDDS


Create a working directory

mkdir cdds_bg466_processing
cd cdds_bg466_processing
mkdir proc data

Activate CDDS

source ~cdds/bin/setup_env_for_cdds 2.4.1

Create request.json. Start and end dates can be specified here (see write_rose_suite_request_json --help); otherwise they default to what is in the suite.

write_rose_suite_request_json u-bg466 cdds 120306 round-1 ap5

Create directory structure locally using -c and -t options

create_cdds_directory_structure request.json -c proc -t data

Create variables file. Can add more variables if needed.

echo Amon/tas > variables.txt
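
To list several variables, one option is to write them one per line. This is a sketch that assumes the file format is one MIP table/variable pair per line, matching the single-entry example above; the names other than Amon/tas are illustrative CMIP6 radiation variables and are not necessarily the ten used in this investigation.

```shell
# Assumed format: one <MIP table>/<variable> entry per line.
# Amon/rsut and Amon/rlut are illustrative additions, not taken from the test list.
printf '%s\n' Amon/tas Amon/rsut Amon/rlut > variables.txt
```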

Bypass the data request and inventory by using the -r option, so that only the variables in the variables.txt file we just made are produced.

prepare_generate_variable_list request.json -p -c proc -t data -r variables.txt

Run cdds_convert, which essentially configures the u-ak283 suite (https://code.metoffice.gov.uk/trac/roses-u/browser/a/k/2/8/3/trunk).

cdds_convert request.json -c proc -t data --skip_transfer

The above instructions can be found in the original gist.

