Deal with PermissionError when setting up the working_dir #21690
base: dev
    try:
        self._setup_working_directory(job=job)
    except PermissionError:
        log.warning("Could not setup the working directory")
That has to fail though, we cannot silently ignore this.
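To illustrate "fail instead of silently ignoring", here is a minimal sketch of logging the error and re-raising it. `setup_or_fail` and `WorkingDirectoryError` are hypothetical names used only for this example; they are not part of Galaxy, and the real fix may look different.

```python
import logging

log = logging.getLogger(__name__)


class WorkingDirectoryError(RuntimeError):
    """Hypothetical error type for this sketch; not part of Galaxy."""


def setup_or_fail(setup, job_id):
    """Run `setup()`; on PermissionError, log the details and re-raise.

    `setup` stands in for whatever creates the working directory.
    """
    try:
        setup()
    except PermissionError as exc:
        # Log with context, then propagate so the job is marked as failed
        # rather than continuing without a usable working directory.
        log.error("Could not set up the working directory for job %s: %s", job_id, exc)
        raise WorkingDirectoryError(
            f"job {job_id}: working directory is not accessible"
        ) from exc
```

Chaining with `from exc` keeps the original `PermissionError` (and the offending path) in the traceback.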
Note also that for the traceback you provided:
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: galaxy.jobs.handler ERROR 2025-12-19 15:30:18,228 [pN:handler_0,p:895201,tN:MainThread] Error while recovering job 171 during application startup.
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: Traceback (most recent call last):
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/jobs/handler.py", line 312, in __check_jobs_at_startup
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: self._check_job_at_startup(job)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/jobs/handler.py", line 359, in _check_job_at_startup
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: job_wrapper = self.__recover_job_wrapper(job)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/jobs/handler.py", line 365, in __recover_job_wrapper
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: job_wrapper = self.job_wrapper(job)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/jobs/handler.py", line 297, in job_wrapper
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: return JobWrapper(job, self, use_persisted_destination=use_persisted_destination)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/jobs/__init__.py", line 2776, in __init__
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: super().__init__(
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/jobs/__init__.py", line 1042, in __init__
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: self._setup_working_directory(job=job)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/jobs/__init__.py", line 1340, in _setup_working_directory
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: safe_makedirs(self.tool_working_directory)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/util/path/__init__.py", line 137, in safe_makedirs
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: makedirs(path)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: File "/usr/lib64/python3.9/os.py", line 225, in makedirs
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: mkdir(name, mode)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: PermissionError: [Errno 13] Permission denied: '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/171/working'
safe_makedirs first checks whether the directory exists, so either the directory no longer exists, which is bad one way or the other, or the galaxy user can't see the directory, in which case I think all jobs should fail?
I guess my question then is: can your galaxy user see the directory?
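For reference, a minimal sketch of the check-then-create pattern described above. This is an approximation written for this discussion, not a copy of Galaxy's `safe_makedirs`; the name `safe_makedirs_sketch` is made up. The point is that `os.path.exists()` returns False both when the path is missing and when it cannot be inspected, so the code falls through to `os.makedirs()`, which then raises `PermissionError`.

```python
import errno
import os
import tempfile


def safe_makedirs_sketch(path):
    """Create `path` unless it already appears to exist (check-then-create)."""
    # os.path.exists() swallows OSError, so it returns False both for a
    # missing path and for one whose parent directory cannot be traversed.
    if not os.path.exists(path):
        try:
            os.makedirs(path)
        except OSError as exc:
            # Tolerate a concurrent creation; anything else, including
            # PermissionError (errno 13), propagates to the caller.
            if exc.errno != errno.EEXIST:
                raise


# Usage on a scratch directory: the second call is a no-op.
with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "000", "171", "working")
    safe_makedirs_sketch(target)
    safe_makedirs_sketch(target)
```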
My galaxy user is usr_m_galaxy1, which currently does not belong to the group unige. When I run the job:
$ ls -alh /srv/beegfs/scratch/shares/galaxy/common/jobs/000/
total 1.5K
drwxr-xr-- 3 usr_m_galaxy1 hpc_users 1 Jan 30 08:36 .
drwxr-xr-- 3 usr_m_galaxy1 hpc_users 1 Jan 30 08:35 ..
drwxr-xr-- 12 delislel unige 21 Jan 30 08:36 188
$ ls -alh /srv/beegfs/scratch/shares/galaxy/common/jobs/000/188
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/galaxy_188.o': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/outputs': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/__instrument_core_container': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/working': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/tmp': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/__instrument_core_galaxy_memory_mb': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/188.jt_json': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/metadata': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/_outputs': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/.': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/..': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/galaxy_188.e': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/__instrument_core_epoch_start': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/__instrument_core_galaxy_slots': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/home': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/inputs': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/galaxy_188.sh': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/memory_statement.log': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/_working': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/_configs': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/tool_script.sh': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/configs': Permission denied
total 0
d????????? ? ? ? ? ? .
d????????? ? ? ? ? ? ..
-????????? ? ? ? ? ? 188.jt_json
d????????? ? ? ? ? ? _configs
d????????? ? ? ? ? ? configs
-????????? ? ? ? ? ? galaxy_188.e
-????????? ? ? ? ? ? galaxy_188.o
-????????? ? ? ? ? ? galaxy_188.sh
d????????? ? ? ? ? ? home
d????????? ? ? ? ? ? inputs
-????????? ? ? ? ? ? __instrument_core_container
-????????? ? ? ? ? ? __instrument_core_epoch_start
-????????? ? ? ? ? ? __instrument_core_galaxy_memory_mb
-????????? ? ? ? ? ? __instrument_core_galaxy_slots
-????????? ? ? ? ? ? memory_statement.log
d????????? ? ? ? ? ? metadata
d????????? ? ? ? ? ? _outputs
d????????? ? ? ? ? ? outputs
d????????? ? ? ? ? ? tmp
-????????? ? ? ? ? ? tool_script.sh
d????????? ? ? ? ? ? _working
d????????? ? ? ? ? ? working
And indeed os.path.exists is False...
I guess the best option would be to create the job dir with drwx--x--x instead of drwxr-xr--, no?
I guess this is controlled by the umask of gravity; I should set it to 022 to make sure the galaxy user (other) can always check whether the working directory exists, right?
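To make the mode bits concrete, here is a small sketch that decodes the modes mentioned above and shows the umask arithmetic. It assumes the directory is requested with the default mode 0o777 (the default of `os.makedirs`); `describe` and `mode_after_umask` are helper names invented for this example, and whether Gravity's umask is really what determines the final mode here is not verified.

```python
import stat


def describe(mode):
    """Return the ls-style string and what 'other' may do on a directory."""
    return {
        "ls": stat.filemode(stat.S_IFDIR | mode),
        # r on a directory: entry names can be listed
        "other_can_list": bool(mode & stat.S_IROTH),
        # x on a directory: entries can be stat()ed and entered
        "other_can_traverse": bool(mode & stat.S_IXOTH),
    }


def mode_after_umask(umask, requested=0o777):
    """Mode a new directory gets when `requested` is filtered by `umask`."""
    return requested & ~umask


print(describe(0o754))                # current job dir: list but not traverse
print(describe(0o711))                # proposed: traverse but not list
print(oct(mode_after_umask(0o022)))   # umask 022
print(oct(mode_after_umask(0o023)))   # a umask that would yield the current mode
```

With read but no execute permission for "other", `ls` can print the entry names but cannot stat them, which matches the `d?????????` output above and explains why `os.path.exists` returns False.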
Or I could modify the script external_chown_script.py to use a group that both users belong to.
See #21496