Can not run OpenMM simulations with the plugin on CUDA platform #12
Comments
Perhaps your GPU is set to exclusive mode? If so, it will only allow one context to be created on it at a time. You can check and set the compute mode with nvidia-smi.
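For reference, the compute mode can also be inspected programmatically. Here is a minimal sketch, assuming the pynvml package (nvidia-ml-py) is available; the same information is reported by `nvidia-smi -q -d COMPUTE`, and `nvidia-smi -c 0` switches the GPU back to the Default mode (usually requiring root).

```python
import pynvml  # assumption: the nvidia-ml-py / pynvml package is installed

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # first GPU
mode = pynvml.nvmlDeviceGetComputeMode(handle)

# NVML_COMPUTEMODE_DEFAULT (0) allows many contexts at once;
# the exclusive modes restrict the GPU to a single context/process.
if mode == pynvml.NVML_COMPUTEMODE_DEFAULT:
    print("Compute mode: Default")
else:
    print("Compute mode is not Default (value %d)" % mode)

pynvml.nvmlShutdown()
```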
Good point, I will try to change the compute mode when I get the chance.
The compute mode has been changed to "Default". Running simulations with this plugin on the CPU and OpenCL platforms now works, but another error occurs if I run a simulation with the plugin on the CUDA platform. The output from stderr reads:
The last line appears during the creation of the Simulation object. If the platform is CPU or OpenCL, the last line won't appear; instead, there will be these two lines:
Running normal simulations (without the plugin) on the CUDA platform does not lead to any problem, so I guess there's still something wrong.
I haven't seen that one before. I'm trying to puzzle out what it means. Here's my best guess. The error description it provides is "incompatible driver context". I assume that means a CUDA function returned the error code cudaErrorIncompatibleDriverContext.
CUDA has two different APIs: the Driver API and the Runtime API. OpenMM uses the Driver API. I gather that TensorFlow must use the Runtime API. The two APIs can interoperate as described at https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__DRIVER.html#group__CUDART__DRIVER. According to that documentation, if you've already created a context with the Driver API (which OpenMM will have done) and then use the Runtime API, it will automatically use that existing context. But apparently it's finding that the context is incompatible in some way. The error description above lists three reasons the context might be incompatible. I suspect the first one is probably the case: "the Driver context was created using an older version of the API." That is (perhaps), OpenMM and TensorFlow were compiled against different versions of CUDA. We provide versions of OpenMM that were compiled against various versions of CUDA. Which one are you using? And what version of CUDA do you actually have installed on your computer?
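As a rough way to look for such a mismatch, here is a minimal sketch that asks the two APIs which CUDA version they correspond to from within the same Python process. It is only a sanity check, and the shared-library file names are assumptions that may need adjusting for a particular installation.

```python
import ctypes

# Driver API: reports the newest CUDA version the installed driver supports.
libcuda = ctypes.CDLL("libcuda.so.1")
drv = ctypes.c_int()
libcuda.cuDriverGetVersion(ctypes.byref(drv))

# Runtime API: reports the version of whichever libcudart the loader picks up
# (e.g. the one TensorFlow was built against). The file name may differ,
# e.g. libcudart.so.9.0 on a CUDA 9.0 installation.
libcudart = ctypes.CDLL("libcudart.so")
rt = ctypes.c_int()
libcudart.cudaRuntimeGetVersion(ctypes.byref(rt))

# Versions are encoded as 1000*major + 10*minor, so 9000 means CUDA 9.0.
print("Driver supports up to CUDA %d.%d" % (drv.value // 1000, (drv.value % 100) // 10))
print("Loaded runtime is CUDA %d.%d" % (rt.value // 1000, (rt.value % 100) // 10))
```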
Thanks for replying! I built OpenMM 7.3.1, this plugin, and TensorFlow 1.14 all with CUDA 9.0.176 and gcc 4.9.2. I also tried to build and run the plugin against the official OpenMM 7.3.1 package from omnia/cuda90; unfortunately the same error occurred. I cannot test building with the official TF C API, as it is provided only for CUDA 10.1, which is not compatible with the current setup of my workstation. I have to note that the plugin was built with the CMake option
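Another quick diagnostic (a sketch, Linux-only) is to list which CUDA shared libraries are actually mapped into the Python process after importing both OpenMM and TensorFlow; two different libcudart versions showing up at the same time would support the version-mismatch hypothesis above.

```python
import simtk.openmm   # loads the OpenMM platform plugins, including CUDA if available
import tensorflow     # loads the CUDA runtime that TensorFlow was built against

# Linux-only: inspect the process memory map for CUDA-related shared objects.
with open("/proc/self/maps") as f:
    cuda_libs = sorted({line.split()[-1] for line in f
                        if ".so" in line and "cuda" in line.lower()})
print("\n".join(cuda_libs))
```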
I've successfully compiled and installed your plugin, and it works out fine when I select the platform as CPU for the simulation. For this kind of setup, the network force calculation is performed on the GPU, while the main simulation routine runs on the CPU.

The problem is that I would like to run the simulation on the GPU as well, to accelerate the whole procedure. But I got an Exception at the line where the openmm.app.Simulation object is initialized, when the system contains a network force and the platform is 'CUDA'. Here is the traceback:

I also tried the 'OpenCL' platform of OpenMM. It failed with the same traceback. From my perspective, it may be because the simulation and the plugin cannot run on the same graphics card.
I did not find any explicit note about platform choices in the README. I would like to know whether this is something wrong with my compilation/simulation setup, or whether it is simply a limitation of the current plugin.
For your information, my workstation has 8 CPU cores and 1x GTX 1080, and the CUDA version is 9.0. If I run my simulation on the 'CPU' platform, nvidia-smi shows that the GPU usage is around 15%.
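For context, here is a rough sketch of the kind of script being described in this report. The NeuralNetworkForce class, the openmmnn module name, and the input file names are placeholders for whatever the plugin and the actual system use, not the exact code from this issue.

```python
from simtk import unit
from simtk.openmm import Platform, LangevinIntegrator
from simtk.openmm import app
from openmmnn import NeuralNetworkForce   # placeholder: actual module/class depends on the plugin

pdb = app.PDBFile("input.pdb")            # hypothetical input structure
forcefield = app.ForceField("amber99sb.xml", "tip3p.xml")
system = forcefield.createSystem(pdb.topology, nonbondedMethod=app.PME)

# Add the TensorFlow-based force from a frozen graph (file name is hypothetical).
system.addForce(NeuralNetworkForce("model.pb"))

integrator = LangevinIntegrator(300*unit.kelvin, 1/unit.picosecond,
                                0.002*unit.picoseconds)

# Selecting 'CPU' here works; with 'CUDA' (or 'OpenCL') the next line raises the exception.
platform = Platform.getPlatformByName("CUDA")
simulation = app.Simulation(pdb.topology, system, integrator, platform)
```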