Description
Hello,
we ran into a strange bug with sshfs when it is used as an fsspec filesystem by dbt. Everything works fine until our main.py terminates or calls sys.exit(). At that point the Python runtime never finishes, and we see two stopped Python threads: the main thread (because the Python runtime is shutting down) and the fsspecIO thread (because only daemon threads are left).
The underlying issue is twofold:
- The Python GC cleans up the SSHFileSystem object and calls its _finalize() method. This method eventually goes through the sync wrapper, which queues the call onto the fsspecIO thread via the event loop and waits for it to finish, stalling the Python runtime.
- The fsspecIO thread is a daemon thread, meaning it will be stopped at some point during Python runtime cleanup. I didn't check whether the fsspecIO thread gets stopped before or after the GC runs, or whether the order is even random.
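To illustrate the mechanism, here is a minimal sketch that uses only asyncio and threading, not fsspec or sshfs. `close_connection` and `demo_stall` are hypothetical names, and stopping the loop by hand only approximates what happens to a daemon thread at interpreter shutdown. It shows that a blocking hand-off to a loop thread that no longer runs is queued but never completes:

```python
import asyncio
import concurrent.futures
import threading


async def close_connection():
    # Hypothetical stand-in for the async cleanup a filesystem would run.
    return "closed"


def demo_stall(timeout=0.5):
    loop = asyncio.new_event_loop()
    io_thread = threading.Thread(target=loop.run_forever, daemon=True)
    io_thread.start()

    # While the loop thread is running, the blocking hand-off completes.
    first = asyncio.run_coroutine_threadsafe(close_connection(), loop).result(timeout=5)

    # Assumption: stopping the loop simulates a daemon thread that no longer
    # services the event loop during interpreter shutdown.
    loop.call_soon_threadsafe(loop.stop)
    io_thread.join(timeout=5)

    # The same hand-off is now queued but never executed. Without the
    # timeout, .result() would block forever.
    coro = close_connection()
    future = asyncio.run_coroutine_threadsafe(coro, loop)
    try:
        future.result(timeout=timeout)
        stalled = False
    except concurrent.futures.TimeoutError:
        stalled = True
    finally:
        coro.close()  # the coroutine was never scheduled; close it explicitly
        loop.close()
    return first, stalled
```

In this sketch the second call only returns because of the timeout; a wait without one is the kind of stall described above.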
We saw this issue pop up in Python 3.12.12. It happens often, but not always! Sometimes everything shuts down fine.
Python 3.10.12 always seems to work, meaning this race condition doesn't appear to happen there.
I believe this is why https://github.com/fsspec/sshfs/actions/runs/18882297037/job/53888458798 failed after stalling for 6h.
The code location:
https://github.com/fsspec/sshfs/blob/main/sshfs/spec.py#L104
Note: Our bug also happened before PR #62, meaning PR #62 didn't cause it. I believe that PR just made it more likely to happen somehow.
https://github.com/fsspec/filesystem_spec/blob/master/fsspec/asyn.py#L135
This daemon thread is supposed to process the _finalize() and close() calls, which happen during the garbage collection phase after sys.exit().
To be honest, I'm not sure how to fix it. Maybe _finalize() and close() could check whether the fsspecIO thread has already stopped and, if so, skip any sync-wrapped calls?
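A rough sketch of what such a guard could look like, again with plain asyncio and threading rather than the real fsspec or sshfs code. `start_io_loop` and `guarded_sync` are hypothetical helpers, not existing APIs. Since a daemon thread frozen during shutdown may still report itself as alive, the sketch also checks `sys.is_finalizing()` and bounds the wait with a timeout, which is an assumption on my side about what would be needed in practice:

```python
import asyncio
import concurrent.futures
import sys
import threading


def start_io_loop():
    """Start an event loop in a daemon thread (stand-in for the fsspecIO thread)."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, name="io-loop", daemon=True)
    thread.start()
    return loop, thread


def guarded_sync(loop, thread, coro_func, *args, timeout=5.0):
    """Run coro_func(*args) on the loop thread, or skip it if that looks unsafe.

    Returns None when the call is skipped or times out.
    """
    # Skip the hand-off when the interpreter is shutting down or the loop
    # thread can no longer serve requests.
    if sys.is_finalizing() or loop.is_closed() or not thread.is_alive():
        return None
    future = asyncio.run_coroutine_threadsafe(coro_func(*args), loop)
    try:
        # Bound the wait so cleanup cannot stall forever.
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        future.cancel()
        return None
```

Returning None for a skipped call is only reasonable for best-effort cleanup like _finalize(), where the process is exiting anyway. Regular calls would probably want to raise instead.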
