GPU DPS build #5
@nemo794 can you ask the ISCE3 team if it safely falls back on CPU when no GPU is available, so we only have to worry about building one image/algorithm?
(Referencing line 6 in abac269.)
Looks like this is handled; the correct deps are included:
Now I just need to write a test DPS algorithm that checks the GPU status, based on this private ticket https://github.com/NASA-IMPACT/veda-analytics/issues/130, which uses TensorFlow for the test since the ISCE3 test is not good: https://github.com/isce-framework/isce3/blob/release-v0.23/tests/python/packages/isce3/core/gpu_check.py. @chuckwondo, if you have ideas for another way to test, let me know.
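For reference, a TensorFlow-based probe along the lines of the VEDA ticket might look like the sketch below. This is an illustration, not the ticket's actual code; it only assumes the standard `tf.config.list_physical_devices` API and degrades gracefully when TensorFlow isn't installed:

```python
# Sketch: count visible GPUs via TensorFlow, treating a missing TF install
# as "no GPU" so the probe never raises.
def tf_gpu_count() -> int:
    try:
        import tensorflow as tf  # assumed available per the VEDA ticket
    except ImportError:
        return 0  # TensorFlow not installed; treat as no GPU for this sketch
    return len(tf.config.list_physical_devices("GPU"))

print(tf_gpu_count())
```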
@wildintellect, in VEDA, I launched a TF instance and created a custom conda env with only isce3 installed w/ CUDA.
Doing the same in a non-NVIDIA instance produced this error, as expected:
In the MAAP ADE, I followed the same steps and also produced the RuntimeError. I think in DPS, an algorithm run bash script with the following line should suffice for determining the availability of a GPU:

```bash
# Adjust conda env name, if necessary
conda run -n isce3 python -c "import isce3; print(isce3.cuda.core.get_device_count())"
```

If the job fails, then no GPU is available. If it succeeds, you have at least 1 GPU available.
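The same probe can be wrapped so callers get a boolean rather than an exception. This is a sketch, not ISCE3's own test: it assumes the `isce3.cuda.core.get_device_count()` call quoted above and the RuntimeError behavior observed in the ADE:

```python
# Sketch: wrap the isce3 GPU probe so it returns a boolean instead of raising.
# isce3 is only importable inside the isce3 conda env, and get_device_count()
# raises RuntimeError when no CUDA device is present (per the thread above).
def gpu_available() -> bool:
    try:
        import isce3
        return isce3.cuda.core.get_device_count() > 0
    except (ImportError, RuntimeError):
        return False

print(gpu_available())
```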
@chuckwondo Since this is an ISCE3-specific way, should it go in this repo as an alternate algorithm?
We could make the script succeed no matter what, if we want to avoid failing the job. All we need to do is append a fallback that forces a zero exit status.

That should ensure that we always get an exit code of 0, but if there's no GPU, we'll see the error message captured in the job's output.

Of course, there are alternatives, including your suggestion of exposing a boolean input. We could even do as you said and wrap things in a try/except block, avoid generating a traceback altogether, and simply output the result.

Up to you, really. What are you wanting to achieve more generally? Do you want users to be able to use this as a means of checking whether a particular queue does or does not launch GPU instances?
This is isce3-specific only because that's how this all began; if we want to move away from isce3, perhaps we can find a more general library for testing GPU availability. Either way, I'd lean toward not adding this as an alternate algorithm in this repo, particularly if you want to use it more generally as a GPU test. The logic isn't tied to this repo in any way, other than that it currently uses isce3 to perform the check, and that's not a necessity.
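The "always exit 0, but keep the error for the logs" variant can be sketched by running the probe in a subprocess. The command below is illustrative; in DPS it would be the `conda run -n isce3 ...` line from earlier in the thread:

```python
# Sketch: run the GPU probe as a subprocess, never raise, and report either
# the device count or the captured error text for the job logs.
import subprocess
import sys

def gpu_status() -> str:
    # Hypothetical probe command; swap in `conda run -n isce3 ...` under DPS.
    cmd = [sys.executable, "-c",
           "import isce3; print(isce3.cuda.core.get_device_count())"]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode == 0:
        return f"GPU count: {result.stdout.strip()}"
    # Probe failed (no isce3 or no CUDA device); surface the message, exit 0.
    return "No GPU available"

print(gpu_status())
```

Because the wrapper itself always returns normally, the DPS job exits 0 whether or not a GPU was found.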
Building a GPU-enabled image on the regular MAAP build infrastructure (CPU) should be possible.
We might need to insert some env variables to ensure conda solves to the GPU versions of packages (see https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-virtual.html#overriding-detected-packages), e.g. the `CONDA_OVERRIDE_CUDA=12` env variable. Alternatively, we could specify the CUDA build of a conda package with `packagename=*=*cuda*`, noting isce3 could be one of the following, which might avoid the need to set the env variable above:
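As a sketch of the build-time idea above, the override can be set for a child conda invocation. The environment file name and CUDA version here are assumptions, and the actual `conda env create` call is left commented out since it only makes sense on the build host:

```python
# Sketch: prepare a `conda env create` invocation with CONDA_OVERRIDE_CUDA set,
# so the solver behaves as if CUDA 12 were present on a CPU-only build host.
import os

def build_cmd(env_file: str = "environment.yml"):
    env = dict(os.environ, CONDA_OVERRIDE_CUDA="12")
    cmd = ["conda", "env", "create", "-f", env_file]
    return cmd, env

cmd, env = build_cmd()
print(" ".join(cmd))
# On the build host: subprocess.run(cmd, env=env, check=True)
```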
Note: maap-py is only in the dev yml right now; does that need to be moved to the regular one, @nemo794?