Description
I have a GitHub repo where I define some models and a hubconf.py file so they can be loaded via the torch.hub.load() API. The models work fine without multiprocessing (i.e. num_workers=0 for the DataLoader), but fail with a pickling-related ModuleNotFoundError when two conditions both hold: (1) num_workers>0 and (2) the module that defines the model class uses an absolute import of another module in my repo.
For example, with the structure:
my_repo/
    utils.py
    a/
        model_a.py # contains class ModelA
    ...
If model_a.py contains from my_repo import utils, and we import ModelA and then use a DataLoader with num_workers>0, the error looks like this:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File ".../python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File ".../python3.10/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
ModuleNotFoundError: No module named 'my_repo'
This makes some sense, because my_repo was never installed as a package when ModelA was loaded. It seems that multiprocessing tries to re-import things in the worker process and cannot find my_repo there.
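A minimal sketch of the suspected mechanism, using only the standard library and no torch at all. It assumes the root cause is that pickle stores an object's class by module name, so whoever unpickles it has to be able to import that module. The package name fake_repo and the helper demo() are hypothetical, made up for this illustration. Removing the path entry and clearing sys.modules is only a stand-in for a freshly spawned worker, not what multiprocessing literally does.

```python
import importlib
import os
import pickle
import sys
import tempfile


def demo():
    # Build a throwaway package on disk; "fake_repo" is a hypothetical name.
    root = tempfile.mkdtemp()
    pkg = os.path.join(root, "fake_repo")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("class ModelA:\n    pass\n")

    # Make the package importable only through a sys.path entry,
    # i.e. without installing it.
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    module = importlib.import_module("fake_repo")

    # The pickle holds the module and class *names*, not the class itself.
    payload = pickle.dumps(module.ModelA())

    # Stand-in for a fresh process: path entry gone, module not cached.
    sys.path.remove(root)
    del sys.modules["fake_repo"]
    importlib.invalidate_caches()
    try:
        pickle.loads(payload)
        outcome = "loaded"
    except ModuleNotFoundError as exc:
        outcome = f"ModuleNotFoundError: {exc}"

    # With the path entry restored, the same payload loads again.
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    restored = pickle.loads(payload)

    # Clean up the interpreter state touched by the demo.
    sys.path.remove(root)
    del sys.modules["fake_repo"]
    return payload, outcome, type(restored).__name__


if __name__ == "__main__":
    _, outcome, restored_name = demo()
    print(outcome)
    print(restored_name)
```

If this is indeed what happens in the DataLoader workers, then keeping the repo's parent directory importable in the worker processes (for example via PYTHONPATH, or by installing the repo as a package) might avoid the error. That is an untested assumption for the torch.hub case.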
Edit: I thought relative imports might be a workaround, but they don't fix the issue.
Is there a solution to this? Thanks!