Hello,
I am trying to run these models to evaluate the results, but I cannot because of errors at runtime.
The best "result" I could get is with this Dockerfile (at the root of the project):
FROM nvidia/cuda:11.4.3-cudnn8-devel-ubuntu18.04
ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=Etc/UTC
ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8
# Install system dependencies
RUN apt-get update && \
    apt-get install -y \
        git \
        wget \
        python3-pip \
        python3-dev \
        python3-opencv \
        python3-six
# Upgrade pip
RUN python3 -m pip install --upgrade pip
RUN pip3 install setuptools openmim
# Install PyTorch and torchvision
RUN pip3 install torch torchvision torchaudio -f https://download.pytorch.org/whl/cu111/torch_stable.html
RUN python3 -m pip install h5py albumentations tensorboardX gdown scipy
RUN python3 -m mim install mmcv
# Download the NYU Depth v2 data and the GLPDepth extraction script
WORKDIR /
RUN wget http://horatio.cs.nyu.edu/mit/silberman/nyu_depth_v2/nyu_depth_v2_labeled.mat -O nyu_depth_v2_labeled.mat
RUN git clone https://github.com/vinvino02/GLPDepth.git --depth 1
RUN mv GLPDepth/code/utils/logging.py GLPDepth/code/utils/glp_depth_logging.py
# Set the working directory
WORKDIR /app
RUN python3 ../GLPDepth/code/utils/extract_official_train_test_set_from_mat.py ../nyu_depth_v2_labeled.mat ../GLPDepth/datasets/splits.mat ./data/nyu_depth_v2/official_splits/
# RUN ln -s data ait/data
COPY requirements.txt requirements.txt
RUN python3 -m pip install -r requirements.txt
COPY . .
RUN rm -rf .git
I built the image with:
sudo docker build -t mde . -f Dockerfile
and ran it with:
sudo docker run --name mde-test --gpus all --ipc=host -it --rm -v $(pwd):/app mde
Finally, I ran the evaluation command, for example:
cd ait
python3 -m torch.distributed.launch --nproc_per_node=1 code/train.py configs/swinv2b_480reso_parallel_depthonly.py --cfg-options model.task_heads.depth.vae_cfg.pretrained=../models/vqvae_depth_2bp.pt --eval ../models/ait_depth_swinv2b_parallel.pth
The inference process starts and runs through all the samples, but it eventually fails with an uninformative error:
eval task depth
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 654/654, 2.5 task/s, elapsed: 262s, ETA: 0s
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 34) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 193, in <module>
main()
File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/usr/local/lib/python3.6/dist-packages/torch/distributed/run.py", line 713, in run
)(*cmd_args)
File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launcher/api.py", line 131, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launcher/api.py", line 261, in launch_agent
failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
===================================================
code/train.py FAILED
---------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
---------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-08-26_03:01:18
host : f50427e7ad50
rank : 0 (local_rank: 0)
exitcode : -9 (pid: 34)
error_file: <N/A>
traceback : Signal 9 (SIGKILL) received by PID 34
===================================================
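For anyone reading the log above: a negative exit code reported for a child process is the negated number of the signal that terminated it, so `-9` corresponds to signal 9 (SIGKILL), as the traceback line also says. The sketch below decodes such a code with the standard `signal` module; `describe_exit_code` is a hypothetical helper name, not part of PyTorch. On Linux, a SIGKILL from outside the process is often sent by the kernel out-of-memory killer, but nothing in this log confirms that this is the cause here.

```python
import signal


def describe_exit_code(exitcode: int) -> str:
    """Turn a subprocess-style exit code into a short description.

    Hypothetical helper for illustration only.
    """
    if exitcode < 0:
        # Negative codes mean the process was terminated by a signal.
        signum = -exitcode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"unknown signal {signum}"
        return f"killed by {name}"
    if exitcode == 0:
        return "exited normally"
    return f"exited with status {exitcode}"


if __name__ == "__main__":
    # The value reported in the log above.
    print(describe_exit_code(-9))
```

If memory pressure is suspected, the kernel log on the host (for example via `dmesg`) may show whether the out-of-memory killer was involved.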
Could the authors provide the versions of all the software they are using? In particular:
- Linux distribution and version
- CUDA version
- Python version
- Package versions (some are missing from the requirements file)
- Any other relevant information
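To make this kind of report easy to produce, here is a minimal sketch of a script that prints the operating system, Python version, and installed package versions. It assumes Python 3.8 or newer (for `importlib.metadata`; the container above uses Python 3.6, where the `importlib_metadata` backport would be needed instead). `collect_versions` is a hypothetical helper, and the package list is only a guess based on the Dockerfile above.

```python
import platform
import sys
from importlib import metadata  # standard library from Python 3.8 onwards


def collect_versions(packages):
    """Return a dict with OS, Python, and package versions.

    Hypothetical helper; packages that are not installed are reported
    as "not installed" instead of raising an error.
    """
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    # Package names taken from the Dockerfile above (an assumption about
    # what matters for reproducing the results).
    wanted = ["torch", "torchvision", "mmcv", "h5py", "albumentations", "scipy"]
    for key, value in collect_versions(wanted).items():
        print(f"{key}: {value}")
```

The CUDA version that PyTorch was built against is available as `torch.version.cuda`, and `python3 -m torch.utils.collect_env` prints a fuller environment summary.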
Thanks.