Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Elaborate on DPS output folder usage #455

Open
wildintellect opened this issue Oct 29, 2024 · 4 comments
Open

Elaborate on DPS output folder usage #455

wildintellect opened this issue Oct 29, 2024 · 4 comments
Assignees
Labels
documentation Improvements or additions to documentation enhancement New feature or request

Comments

@wildintellect
Copy link
Collaborator

Turns out passing the output directory location to scripts is trickier than outlined in
https://docs.maap-project.org/en/latest/technical_tutorials/dps_tutorial/dps_tutorial_demo.html#Output-folder
Q: why do users need to make the output folder, if it's always needed as the correct location for outputs, should it exist already? @sujen1412

Specifically, this does not show how to pass the correct path to scripts, e.g. python and R.
Also is the case of R, R is very sensitive to working directories when importing other R files, so the script can't always be run from the default path DPS starts at, which means the output folder is actually up one level from where the script executes.

Here's an R example (We're going to need R examples on par with the python dps_tutorial)

#!/bin/bash
basedir=$( cd "$(dirname "$0")" ; pwd -P )

mkdir -p output
outdir=${PWD}/ouput

country=${1}

#Rscript global_process_part1_2024_MAAP_step123.R $country
cd ${basedir}
conda run --no-capture-output --name gedi_pa_env Rscript global_process_part3_2024_MAAP_step5.r $country outdir
@wildintellect wildintellect added documentation Improvements or additions to documentation enhancement New feature or request labels Oct 29, 2024
@chuckwondo
Copy link

chuckwondo commented Oct 29, 2024

@wildintellect, are you sure the code snippet is correct? I think there are at least 2 problems: (1) mkdir created an output dir before the outdir var is defined, and (2) the last line uses outdir which simply translates as the literal string "outdir", so it should change to $outdir (or ${outdir}).

I think this is what you want:

#!/bin/bash
basedir=$( cd "$(dirname "$0")" ; pwd -P )

outdir=${PWD}/output
mkdir -p "${outdir}"

country=${1}

#Rscript global_process_part1_2024_MAAP_step123.R $country
cd ${basedir}
conda run --no-capture-output --name gedi_pa_env Rscript global_process_part3_2024_MAAP_step5.r $country "${outdir}"

I've also surrounded ${outdir} in double quotes in both spots, just to avoid potential problems with space in the path.

@wildintellect
Copy link
Collaborator Author

(1) works either way, you're right that setting the var 1st is probably best practices
(2) I don't know how that typo crept in , I didn't give it to the user that way

So yes @chuckwondo 's example is technically cleaner

@chuckwondo
Copy link

chuckwondo commented Oct 29, 2024

(1) works either way, you're right that setting the var 1st is probably best practices

It doesn't work when PWD is not the directory in which the script is sitting.

In other words, these 2 lines are not equivalent when PWD is not the directory containing the script containing these lines:

mkdir -p output
mkdir -p "${PWD}/output"

@chuckwondo
Copy link

In the context of DPS jobs, we should always use "${PWD}/output" and "${PWD}/input" as the locations of the output and input directories that the DPS system expects us to use.

There is also a typo in the original code in the first comment of this issue (ouput should be output), so here's what should be a fully corrected example, with the addition of an input dir variable, for reference:

#!/bin/bash
basedir=$( cd "$(dirname "$0")" ; pwd -P )

# If your `algorithm_config.yaml` file has any inputs configured as type `file` (rather
# than `positional`), the DPS will automatically download such inputs (which must be
# URLs) and place the downloaded files into the input directory.
indir=${PWD}/input

# All outputs must be written to the output directory. We must create the full path to
# the output directory (including parent directories using the `-p` flag).
outdir=${PWD}/output
mkdir -p "${outdir}"

# Collect positional inputs
country=${1}

#Rscript global_process_part1_2024_MAAP_step123.R ${country}
cd "${basedir}"
conda run --no-capture-output --name gedi_pa_env Rscript global_process_part3_2024_MAAP_step5.r ${country} "${outdir}"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants