Description
Jeff, you may not have succeeded in reproducing Josh's failure mode, but you are getting the same message that I see when bespoke fails for me on HPC systems with busy networks, namely: "[ERROR] failed to connect to the bespoke executor - please make sure one is running and your connection settings are correct." This error appeared suddenly for me as our IT organization has been shutting down an old HPC cluster and migrating more users to the one I am using. My failures are of the same sort: the executor cannot be reached through the gateway interface from a node other than the one where the executor is running. It behaves as though the gateway has been shut down. The same bespoke application runs fine for me on less busy clusters. The way you triggered this error should give a clue as to how the settings could be adjusted for more robust behavior on clusters with busy networks. Based on your understanding of the code, could you suggest settings I might try to see whether they fix my problem?
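To help narrow this down, a check along the following lines, run from a compute node other than the one hosting the executor, would show whether the gateway port can be reached at all and whether it only responds after a few retries. This is a minimal sketch using only the Python standard library. `probe_gateway` is a hypothetical helper and not part of bespoke, and the host name and port in the comments are placeholders rather than bespoke defaults.

```python
import socket
import time


def probe_gateway(host, port, attempts=5, timeout=5.0, backoff=2.0):
    """Try to open a TCP connection to host:port, retrying with a growing delay.

    Returns the 1-based number of the attempt that succeeded,
    or None if every attempt failed.
    """
    delay = 1.0
    for attempt in range(1, attempts + 1):
        try:
            # create_connection raises OSError (including timeouts and
            # name-resolution failures) if the connection cannot be made.
            with socket.create_connection((host, port), timeout=timeout):
                return attempt
        except OSError as exc:
            print(f"attempt {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:
                time.sleep(delay)
                delay *= backoff  # wait longer before the next try
    return None


# Example call (placeholder host and port; substitute the node and port
# where the executor's gateway is actually listening):
#   probe_gateway("executor-node.example", 8000)
```

If the probe succeeds only on a later attempt, that would point toward a timeout or retry setting being too tight for a busy network. If it never succeeds, the port is probably blocked or the gateway is not listening on an interface visible from other nodes.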
Originally posted by @BillSwope in #414 (comment)