Investigation: How best to use MIP Convert in CMEW
## Note
- This has been tested using a list of 10 variables required for the radiation budget recipe.
How much information is required by MIP Convert to process data from u-bg466? What does the MIP Convert user configuration file require now?
- To run MIP Convert with raw suite output, the whole CDDS package has to be run.
- MIP Convert can be run on its own, but the .pp files must already be in the input directory and a configuration file must be set up. Overall, CDDS needs less configuration and generates a config file for MIP Convert from the input specifications.
- CDDS requires a request.json file and a text file listing the required variables (the text file is not strictly necessary to run the package, but we need it for our use case).
- The text file contains a list of the variables we want to convert.
- The .json is generated on the command line and has default values that can be overridden, for example `--end_date 1860-01-01`.
- A directory structure has to be created in the working directory
- The package is then run with a command line instruction
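The inputs listed above could be checked before launching CDDS with something like the following. This is a minimal sketch, not part of CDDS: the function name `check_cdds_inputs` is hypothetical, and the file and directory names (`request.json`, `variables.txt`, `proc`, `data`) are assumed to match the worked example in these notes.

```shell
# Sketch: verify that a working directory holds the inputs CDDS needs.
# check_cdds_inputs is a hypothetical helper, not a CDDS command.
# Assumed names: request.json, variables.txt, proc/ and data/.
check_cdds_inputs() {
    local workdir="${1:-.}"
    local missing=0
    local item

    # Files CDDS reads: the request and the list of variables to convert.
    for item in request.json variables.txt; do
        if [ ! -s "${workdir}/${item}" ]; then
            echo "missing or empty file: ${workdir}/${item}" >&2
            missing=1
        fi
    done

    # Directories used for processing files and output data.
    for item in proc data; do
        if [ ! -d "${workdir}/${item}" ]; then
            echo "missing directory: ${workdir}/${item}" >&2
            missing=1
        fi
    done

    # Returns 0 when everything is present, 1 otherwise.
    return "${missing}"
}
```

For example, `check_cdds_inputs cdds_bg466_processing && echo "ready"` would print `ready` only when all four items are in place.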
How long does it take MIP Convert to process 10, 50 and 100 years of data from u-bg466? (/usr/bin/time might be useful here!)
- Couldn't use /usr/bin/time since it's a Cylc workflow.
- 10 years took approximately 3 minutes.
- 50 years took approximately 56 minutes, 53 of which were spent downloading data from MASS; everything else took 3 minutes.
- There is a script, path_reformatter, written to allow the CDDS package to run with data already downloaded from MASS, although it is not well documented and further discussion with Piotr from the CDDS team would be required to implement it (if needed).
- 100 years took 2 hours and 11 minutes.
- Running only MIP Convert on 50 years of data took approximately 4 minutes.
- Cylc review shows that the extract task took approximately 53 minutes, and all remaining tasks then took a total of approximately 3 minutes for 50 years.
- Given that I have been running this from the command line, I feel like it would be best run from a bash script with variables passed in (suite, different start and end dates, variable text file).
- `rose suite-run -- --no-detach` will allow a Cylc workflow to run so that the bigger Cylc workflow it is called from will know whether it is running, failed, succeeded etc.
- Further discussion with the MISS team would be required before implementing this.
- The individual components can be run on the command line. Running the instructions for the whole CDDS package with a few modifications to the directory means that the output from the extract task can be used by MIP_Convert.
- Run the instructions for the CDDS package as written below, excluding the last step.
- Run `cdds_extract request.json --root_proc_dir /your/decided/process/dir --root_data_dir /your/output/for/extract --stream ap5 --skip_extract_validation`
- Then configure MIP Convert by downloading and editing the default configuration file on trac.
- At a minimum, the stream, the required variables, the suite id, the start and end dates, and the input and output directories should be modified.
- The files must be copied into a directory labelled `input/u-bg466/ap5`; you can't just configure the extract task to put them there, as it will automatically produce a different directory structure.
- Then run `mip_convert edited_config_file.cfg`
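The copy step described above could be sketched as below. This is an illustration under assumptions rather than a tested recipe: `stage_pp_files` is a hypothetical helper, the extracted files are assumed to end in `.pp` somewhere beneath the extract output directory, and the `input/<suite>/<stream>` directory is assumed to sit under whatever root the MIP Convert configuration file points at.

```shell
# Sketch: copy extracted .pp files into the input/<suite>/<stream> layout.
# stage_pp_files is a hypothetical helper, not a CDDS or MIP Convert command.
# Assumption: extracted files end in .pp and live somewhere below extract_dir.
stage_pp_files() {
    local extract_dir="$1"   # where the extract task wrote its output
    local mip_root="$2"      # root directory used by the MIP Convert config
    local suite="$3"         # e.g. u-bg466
    local stream="$4"        # e.g. ap5
    local target="${mip_root}/input/${suite}/${stream}"

    mkdir -p "${target}" || return 1

    # Flatten whatever tree the extract task produced into the target directory.
    find "${extract_dir}" -type f -name '*.pp' -exec cp {} "${target}/" \;

    # Print the directory so the caller can reference it in the config file.
    echo "${target}"
}
```

A call such as `stage_pp_files /your/output/for/extract /your/mip_convert/dir u-bg466 ap5` would print the populated directory; the second path here is a placeholder.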
- CDDS Package
- Pros
- Fully supported
- All metadata produced by MIP_Convert will be correct as it will have been generated by the CDDS package.
- No configuration of MIP Convert is needed as it is done by CDDS.
- All commands to configure and run the suite are given on the command line.
- Cons
- Automatically downloads the data from MASS, and while there is a script to change this, it is currently undocumented.
- Currently a Cylc 7 suite, which might be a bit difficult to run from a Cylc 8 suite.
- MIP_Convert
- Pros
- Not a Cylc suite so can be run from the command line and easier to implement in a Cylc 8 suite.
- Data only has to be downloaded once for it to work.
- Cons
- It is not supported for the way we intend to use it, i.e. the CDDS package is the recommended way to use MIP Convert with suite runs.
- There is a chance of the metadata being wrong if a mistake is made in the config file for MIP Convert.
- The config file has to be configured manually (or generated through Cylc or similar); it can't be done automatically from the command line.
- The files from the extract task have to be copied into the input directory of MIP Convert as it doesn't like the directory layout produced by just the extract task.
- The whole CDDS package has been licensed under the BSD 3-Clause licence, so it could be shared; however, it would need to be looked over and approved by the CDDS team first.
- The CDDS package is currently being upgraded to Cylc 8.
Create a working directory
mkdir cdds_bg466_processing
cd cdds_bg466_processing
mkdir proc data
Activate CDDS
source ~cdds/bin/setup_env_for_cdds 2.4.1
Create request.json. Start and end dates can be specified here (see write_rose_suite_request_json --help); otherwise they default to what is in the suite.
write_rose_suite_request_json u-bg466 cdds 120306 round-1 ap5
Create directory structure locally using -c and -t options
create_cdds_directory_structure request.json -c proc -t data
Create variables file. Can add more variables if needed.
echo Amon/tas > variables.txt
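To convert more than one variable, the file could be written in one go, as in the sketch below. It assumes one `<MIP table>/<variable>` entry per line, following the `Amon/tas` example; the extra names are illustrative CMIP6 Amon radiation fields, not the list of 10 variables used in this investigation.

```shell
# Sketch: write several variables to variables.txt, one per line.
# Assumption: each line has the form <MIP table>/<variable>, as in Amon/tas.
# The variables below are illustrative only.
cat > variables.txt <<'EOF'
Amon/tas
Amon/rsdt
Amon/rsut
Amon/rlut
EOF

# Sanity check: print (with line numbers) any line not shaped like table/variable.
if grep -vnE '^[A-Za-z0-9]+/[A-Za-z0-9]+$' variables.txt; then
    echo "variables.txt contains malformed lines" >&2
fi
```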
Bypass the data request and inventory by using the -r option to produce only the variables in the variables.txt we just made.
prepare_generate_variable_list request.json -p -c proc -t data -r variables.txt
Run cdds_convert, which essentially configures the u-ak283 suite https://code.metoffice.gov.uk/trac/roses-u/browser/a/k/2/8/3/trunk
cdds_convert request.json -c proc -t data --skip_transfer
The above instructions can be found in the original gist.
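The steps above could be wrapped in a single function with the suite, dates and variables file passed in, as suggested earlier in these notes. The following is a rough sketch only. The names `run` and `cdds_process` are hypothetical; `--start_date` is assumed to mirror the `--end_date` option (check `write_rose_suite_request_json --help`); and the arguments `cdds 120306 round-1 ap5` are copied unchanged from the u-bg466 example, so they would likely need changing for another suite. The script defaults to a dry run that only prints the commands.

```shell
# Sketch: parameterised wrapper around the CDDS steps listed above.
# Assumes setup_env_for_cdds has been sourced and the current directory
# is the working directory. DRY_RUN=1 (the default) prints commands only.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: suite id, start date, end date, variables file.
cdds_process() {
    local suite="$1"
    local start_date="$2"
    local end_date="$3"
    local variables_file="$4"

    run mkdir -p proc data
    # "cdds 120306 round-1 ap5" is copied from the u-bg466 example.
    # --start_date is an assumption; --end_date appears earlier in these notes.
    run write_rose_suite_request_json "${suite}" cdds 120306 round-1 ap5 \
        --start_date "${start_date}" --end_date "${end_date}"
    run create_cdds_directory_structure request.json -c proc -t data
    run prepare_generate_variable_list request.json -p -c proc -t data -r "${variables_file}"
    run cdds_convert request.json -c proc -t data --skip_transfer
}
```

Calling `cdds_process u-bg466 1850-01-01 1860-01-01 variables.txt` (the start date is illustrative) prints the five commands; setting `DRY_RUN=0` before the call would execute them instead.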