

@lldelisle
Contributor

See #21496

github-actions bot added this to the 26.1 milestone on Jan 29, 2026
try:
    self._setup_working_directory(job=job)
except PermissionError:
    log.warning("Could not setup the working directory")
Member


That has to fail, though; we cannot silently ignore this.
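To make the distinction concrete, here is a minimal sketch contrasting the warn-and-continue handler in the diff with a fail-loudly alternative. The names `setup_warn_only`, `setup_or_fail`, `JobSetupError`, and the `denied` stand-in are hypothetical and are not part of Galaxy's API; the callable argument stands in for `_setup_working_directory`.

```python
import logging

log = logging.getLogger(__name__)


class JobSetupError(RuntimeError):
    """Hypothetical error type signalling that a job's working directory is unusable."""


def setup_warn_only(setup):
    # Pattern from the diff: the PermissionError is logged and swallowed,
    # so the caller carries on with a job whose working directory is unusable.
    try:
        setup()
    except PermissionError:
        log.warning("Could not set up the working directory")
        return False
    return True


def setup_or_fail(setup, job_id):
    # Fail-loudly alternative: re-raise with context (and the original error
    # chained as __cause__) so the caller sees the failure instead of continuing.
    try:
        setup()
    except PermissionError as exc:
        raise JobSetupError(f"Job {job_id}: cannot access working directory") from exc


def denied():
    # Hypothetical stand-in for _setup_working_directory hitting EACCES.
    raise PermissionError(13, "Permission denied", "/path/to/jobs/000/171/working")
```

How the caller then handles `JobSetupError` (failing the job, stopping recovery) is a separate design decision not modelled here.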
Note also that for the traceback you provided:

Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: galaxy.jobs.handler ERROR 2025-12-19 15:30:18,228 [pN:handler_0,p:895201,tN:MainThread] Error while recovering job 171 during application startup.
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: Traceback (most recent call last):
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]:   File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/jobs/handler.py", line 312, in __check_jobs_at_startup
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]:     self._check_job_at_startup(job)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]:   File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/jobs/handler.py", line 359, in _check_job_at_startup
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]:     job_wrapper = self.__recover_job_wrapper(job)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]:   File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/jobs/handler.py", line 365, in __recover_job_wrapper
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]:     job_wrapper = self.job_wrapper(job)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]:   File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/jobs/handler.py", line 297, in job_wrapper
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]:     return JobWrapper(job, self, use_persisted_destination=use_persisted_destination)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]:   File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/jobs/__init__.py", line 2776, in __init__
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]:     super().__init__(
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]:   File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/jobs/__init__.py", line 1042, in __init__
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]:     self._setup_working_directory(job=job)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]:   File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/jobs/__init__.py", line 1340, in _setup_working_directory
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]:     safe_makedirs(self.tool_working_directory)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]:   File "/home/shares/galaxy/common/galaxy_root/server/lib/galaxy/util/path/__init__.py", line 137, in safe_makedirs
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]:     makedirs(path)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]:   File "/usr/lib64/python3.9/os.py", line 225, in makedirs
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]:     mkdir(name, mode)
Dec 19 15:30:18 app11.bamboo galaxyctl[895201]: PermissionError: [Errno 13] Permission denied: '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/171/working'

safe_makedirs first checks whether the directory exists, so either the directory no longer exists (which is bad one way or the other), or the galaxy user can't see the directory, in which case I think all jobs should fail?
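The second case can be illustrated with a simplified check-then-create helper. This is a sketch of the pattern only, not Galaxy's actual `safe_makedirs`, and `check_then_create` is a hypothetical name. The key point is that `os.path.exists` returns False not only for a missing path but also when `stat()` fails because a parent directory lacks the search (x) bit for the caller; `os.makedirs` then attempts the `mkdir` and gets `PermissionError`, matching the last frames of the traceback.

```python
import os
import tempfile


def check_then_create(path):
    # Simplified sketch of a check-then-create helper; NOT Galaxy's safe_makedirs.
    # os.path.exists() returns False when the path is missing, but also when
    # stat() fails with EACCES because a parent directory lacks the search (x)
    # bit for this user. In that second case os.makedirs() below raises
    # PermissionError instead of creating anything.
    if not os.path.exists(path):
        try:
            os.makedirs(path)
        except FileExistsError:
            # Another process created it between the check and the mkdir.
            pass
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "jobs", "000", "171", "working")
        check_then_create(target)
        check_then_create(target)  # second call is a no-op
        print(os.path.isdir(target))
```

The demo only covers the normal path; reproducing the hidden-directory case needs a second, non-root user who is in the "other" class for the job directory.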

I guess my question, then, is: can your galaxy user see the directory?

Contributor Author


My galaxy user is usr_m_galaxy1, and it currently does not belong to the group unige. When I run the job, I get:

$ ls -alh /srv/beegfs/scratch/shares/galaxy/common/jobs/000/
total 1.5K
drwxr-xr--  3 usr_m_galaxy1 hpc_users  1 Jan 30 08:36 .
drwxr-xr--  3 usr_m_galaxy1 hpc_users  1 Jan 30 08:35 ..
drwxr-xr-- 12 delislel      unige     21 Jan 30 08:36 188
$ ls -alh /srv/beegfs/scratch/shares/galaxy/common/jobs/000/188
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/galaxy_188.o': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/outputs': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/__instrument_core_container': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/working': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/tmp': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/__instrument_core_galaxy_memory_mb': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/188.jt_json': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/metadata': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/_outputs': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/.': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/..': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/galaxy_188.e': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/__instrument_core_epoch_start': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/__instrument_core_galaxy_slots': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/home': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/inputs': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/galaxy_188.sh': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/memory_statement.log': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/_working': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/_configs': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/tool_script.sh': Permission denied
ls: cannot access '/srv/beegfs/scratch/shares/galaxy/common/jobs/000/188/configs': Permission denied
total 0
d????????? ? ? ? ?            ? .
d????????? ? ? ? ?            ? ..
-????????? ? ? ? ?            ? 188.jt_json
d????????? ? ? ? ?            ? _configs
d????????? ? ? ? ?            ? configs
-????????? ? ? ? ?            ? galaxy_188.e
-????????? ? ? ? ?            ? galaxy_188.o
-????????? ? ? ? ?            ? galaxy_188.sh
d????????? ? ? ? ?            ? home
d????????? ? ? ? ?            ? inputs
-????????? ? ? ? ?            ? __instrument_core_container
-????????? ? ? ? ?            ? __instrument_core_epoch_start
-????????? ? ? ? ?            ? __instrument_core_galaxy_memory_mb
-????????? ? ? ? ?            ? __instrument_core_galaxy_slots
-????????? ? ? ? ?            ? memory_statement.log
d????????? ? ? ? ?            ? metadata
d????????? ? ? ? ?            ? _outputs
d????????? ? ? ? ?            ? outputs
d????????? ? ? ? ?            ? tmp
-????????? ? ? ? ?            ? tool_script.sh
d????????? ? ? ? ?            ? _working
d????????? ? ? ? ?            ? working

And indeed, os.path.exists returns False...
I guess the best option would be to create the job directory with drwx--x--x instead of drwxr-xr--, no?
I guess this is controlled by Gravity's umask; I should set it to 022 to make sure the galaxy user (as "other") can always check whether the working directory exists, right?
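To check the mode strings discussed here, the following sketch uses Python's `stat` module to compute the directory mode a given umask yields from `mkdir`'s default 0o777 request, and what the "other" class (usr_m_galaxy1 relative to a directory owned by delislel:unige) is allowed to do. Listing entry names needs r on the directory, while `stat()`-ing an entry such as `working` needs x, which is why `ls` shows the names but prints `?` for everything else. The helper names are hypothetical, and the sketch assumes no default ACLs (which would override the umask); whether Gravity's umask is the one in effect when the job directory is created is not verified here.

```python
import stat


def dir_mode(umask):
    # mkdir requests 0o777; without default ACLs the kernel clears the umask bits.
    return 0o777 & ~umask


def other_bits(mode):
    # What the "other" class may do on a directory with this mode.
    return {
        "list_names": bool(mode & stat.S_IROTH),    # r: readdir() works
        "stat_entries": bool(mode & stat.S_IXOTH),  # x: lookup/stat of children works
    }


def show(mode):
    # Render a directory mode the way `ls -l` does, e.g. 'drwxr-xr--'.
    return stat.filemode(stat.S_IFDIR | mode)


if __name__ == "__main__":
    for label, mode in [("current", 0o754), ("proposed", 0o711), ("umask 022", dir_mode(0o022))]:
        print(f"{label:10} {show(mode)} {other_bits(mode)}")
```

Both drwx--x--x and the drwxr-xr-x produced by umask 022 grant "other" the x bit, which is what os.path.exists needs; drwx--x--x additionally hides the entry names.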

Contributor Author


Alternatively, I could modify external_chown_script.py to use a group that both users belong to.
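The shared-group option works with the existing drwxr-xr-- mode because a permission check selects exactly one class: the owner bits if the caller owns the file, otherwise the group bits if any of the caller's groups matches the file's group, and only then the "other" bits. Below is a minimal sketch of that selection with made-up uids/gids. It models the classic POSIX check only (no ACLs, no root override), the function names are hypothetical, and it says nothing about how external_chown_script.py is actually implemented.

```python
import stat


def effective_bits(mode, uid, gids, file_uid, file_gid):
    """Return the rwx triad (0-7) that applies to a caller.

    Classic POSIX class selection only: no ACLs, no root override.
    """
    if uid == file_uid:
        return (mode & stat.S_IRWXU) >> 6
    if file_gid in gids:
        return (mode & stat.S_IRWXG) >> 3
    return mode & stat.S_IRWXO


def can_stat_children(mode, uid, gids, file_uid, file_gid):
    # Looking up an entry inside a directory needs x in the applicable class.
    return bool(effective_bits(mode, uid, gids, file_uid, file_gid) & 0o1)


if __name__ == "__main__":
    # Hypothetical IDs: job user 1001, galaxy user 2001, "unige" 500, shared group 600.
    print(can_stat_children(0o754, 2001, {100}, 1001, 500))       # galaxy user is "other"
    print(can_stat_children(0o754, 2001, {100, 600}, 1001, 600))  # galaxy user matches the group
```

With the directory group switched to one that contains the galaxy user, the group triad r-x applies, so the galaxy user can stat `working` without loosening the "other" bits.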
