env parameter in DDPJobDefinition doesn't pass env variables to Ray #408

Open
sutaakar opened this issue Nov 21, 2023 · 3 comments

@sutaakar
Contributor

Describe the Bug

I want to submit a Ray job with environment variables specified; however, the provided environment variables aren't passed to Ray.

The SDK docs specify that DDPJobDefinition has an env property. I tried to pass environment variables through it:

jobdef = DDPJobDefinition(
    name="mnisttest",
    script="mnist.py",
    scheduler_args={"requirements": "requirements.txt"},
    env={"PIP_INDEX_URL": "http://some-hostname/root/pypi/+simple/",
         "PIP_TRUSTED_HOST": "some-hostname"}
)
job = jobdef.submit(cluster)

However, the submitted job didn't contain the environment variables I passed.

Is this the correct way to pass environment variables using the SDK?

Codeflare Stack Component Versions

Please specify the component versions in which you have encountered this bug.

Codeflare SDK: 0.12.1
Ray image: quay.io/project-codeflare/ray:latest-py39-cu118

Steps to Reproduce the Bug

  1. Start ODH with the default science notebook.
  2. Import the SDK Git repo into the notebook.
  3. Open 2_basic_jobs.ipynb.
  4. Add an env entry to the job definition:
jobdef = DDPJobDefinition(
    name="mnisttest",
    script="mnist.py",
    # script="mnist_disconnected.py", # training script for disconnected environment
    scheduler_args={"requirements": "requirements.txt"},
    env={"PIP_INDEX_URL": "http://some-hostname/root/pypi/+simple/",
         "PIP_TRUSTED_HOST": "some-hostname"}
)
job = jobdef.submit(cluster)
  5. Run the notebook up to the point where the job is submitted.
  6. Query the Ray REST API to get the submitted job definition, e.g. curl -X GET -i 'http://<dashboard_hostname>/api/jobs/'
  7. Check the response: the env variables are missing from the submitted job.
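
The check in the last two steps can also be scripted. The following is a minimal sketch, assuming the dashboard endpoint from the curl command returns a JSON list of job objects shaped like the example under Expected Behavior; `fetch_jobs` and `missing_env_vars` are hypothetical helper names, not part of the SDK or Ray.

```python
import json
from urllib.request import urlopen


def fetch_jobs(dashboard_url: str) -> list:
    """GET <dashboard_url>/api/jobs/ and return the parsed JSON body."""
    with urlopen(f"{dashboard_url.rstrip('/')}/api/jobs/") as response:
        return json.load(response)


def missing_env_vars(job: dict, expected: dict) -> dict:
    """Return the expected variables that are absent from, or differ in,
    the job's runtime_env.env_vars."""
    # runtime_env or env_vars may be missing or null in the response.
    runtime_env = job.get("runtime_env") or {}
    actual = runtime_env.get("env_vars") or {}
    return {key: value for key, value in expected.items() if actual.get(key) != value}
```

With a reachable dashboard, something like `[missing_env_vars(job, expected) for job in fetch_jobs("http://<dashboard_hostname>")]` would report, per job, which variables did not make it into the submission. An empty dict means all expected variables are present.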

What Have You Already Tried to Debug the Issue?

N/A

Expected Behavior

The submitted job contains the environment variables, for example:

{
  "type": "SUBMISSION",
  "job_id": null,
  "submission_id": "raysubmit_qtYVHfiyC7VhAPN7",
  "driver_info": null,
  "status": "FAILED",
  "entrypoint": "python /home/ray/jobs/mnist.py",
  "message": "Job entrypoint command failed with exit code 2, last available logs (truncated to 20,000 chars):\npython: can't open file '/home/ray/jobs/mnist.py': [Errno 2] No such file or directory\n",
  "error_type": null,
  "start_time": 1700576474095,
  "end_time": 1700576476706,
  "metadata": null,
  "runtime_env": {
    "pip": {
      "packages": ["pytorch_lightning==1.5.10", "ray_lightning", "torchmetrics==0.9.1", "torchvision==0.12.0"],
      "pip_check": false
    },
    "env_vars": {
      "PIP_INDEX_URL": "http://some-hostname/root/pypi/+simple/",
      "PIP_TRUSTED_HOST": "some-hostname"
    }
  },
  "driver_agent_http_address": "http://10.129.3.14:52365",
  "driver_node_id": "c3af4445c3cabfdc2291fb2fd6393da5850717eb3fd2aaeda3abe5f8"
}

Screenshots, Console Output, Logs, etc.

Affected Releases

SDK 0.12.1

Additional Context

Add as applicable and when known:

  • OS: 1) MacOS, 2) Linux, 3) Windows: [1 - 3]
  • OS Version: [e.g. RedHat Linux X.Y.Z, MacOS Monterey, ...]
  • Browser (UI issues): 1) Chrome, 2) Safari, 3) Firefox, 4) Other (describe): [1 - 4 + description?]
  • Browser Version (UI issues): [e.g. Firefox 97.0]
  • Cloud: 1) AWS, 2) IBM Cloud, 3) Other (describe), or 4) on-premise: [1 - 4 + description?]
  • Kubernetes: 1) OpenShift, 2) Other K8s [1 - 2 + description]
  • OpenShift or K8s version: [e.g. 1.23.1]
  • Other relevant info

Add any other information you think might be useful here.

@KPostOffice
Collaborator

KPostOffice commented Nov 21, 2023

That env is passed directly to the ddp function in torchx.components. runtime_env is a Ray-specific option; torchx populates it here, and that code does not populate the env field. Is it possible that these env variables are available during the job but not tracked by the Ray API, because they are part of the torch job definition rather than part of the runtime_env in the Ray job? Or are you seeing other bugs that indicate the env variables are not available?
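
For comparison, here is a minimal sketch of where env_vars sits in a runtime_env when a job is submitted directly through Ray's job submission client, bypassing DDPJobDefinition and torchx entirely. `build_runtime_env` and `submit_with_env` are hypothetical helper names. Whether pip itself sees these variables while installing packages may depend on the Ray version, so treat that part as an assumption to verify.

```python
def build_runtime_env(pip_packages, env_vars):
    """Assemble a Ray runtime_env dict with pip packages and environment variables."""
    return {"pip": list(pip_packages), "env_vars": dict(env_vars)}


def submit_with_env(dashboard_url, entrypoint, runtime_env):
    """Submit a job through Ray's job submission client and return its submission ID."""
    # Imported lazily so build_runtime_env can be used without Ray installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(dashboard_url)
    return client.submit_job(entrypoint=entrypoint, runtime_env=runtime_env)
```

A job submitted this way should show the variables under runtime_env.env_vars in the /api/jobs/ response, which is the shape described under Expected Behavior.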

@sutaakar
Contributor Author

My use case is this:
Submit a job that installs the dependencies defined in requirements.txt using pip, and then runs the mnist.py script. Pip should use a dedicated index location provided through the env variables PIP_INDEX_URL and PIP_TRUSTED_HOST.

With the DDPJobDefinition shown above I couldn't achieve this, because pip didn't pick up the env variables and used the default index location.

How can I submit a job while providing the env variables PIP_INDEX_URL and PIP_TRUSTED_HOST to pip?

@KPostOffice
Collaborator

KPostOffice commented Nov 22, 2023

This might be a bug in torchx. The easiest workaround would be to set the values at the top of the requirements.txt file:

--trusted-host doubly.so
--index-url https://doubly.so/pub/py/simple
<packageA>
<packageB>
...
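
If the requirements file is generated or shared between environments, the two option lines can be added programmatically. This is a minimal sketch of that workaround; `with_index_options` is a hypothetical helper name, and it assumes the options are written in their long form, one per line.

```python
def with_index_options(requirements: str, index_url: str, trusted_host: str) -> str:
    """Return the requirements text with --trusted-host and --index-url at the top."""
    header = [f"--trusted-host {trusted_host}", f"--index-url {index_url}"]
    # Drop existing long-form options so repeated calls do not stack duplicates.
    body = [
        line
        for line in requirements.splitlines()
        if not line.startswith(("--trusted-host", "--index-url"))
    ]
    return "\n".join(header + body) + "\n"
```

The result can be written back to requirements.txt before the job definition is created, so the same scheduler_args={"requirements": "requirements.txt"} entry keeps working.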
