CUDA toolkit now provided via anaconda? #813

Open
jchodera opened this issue Sep 16, 2017 · 29 comments

@jchodera
Member

I just noticed that anaconda now appears to provide packages for different CUDA toolkit versions.

@jchodera
Member Author

The numba version of cudatoolkit appears to be more up-to-date.

@peastman
Contributor

Keep in mind that the toolkit isn't much use without the driver. You can't even compile, since libcuda.so comes with the driver, not the toolkit.

@jchodera
Member Author

I was under the impression we could use multiple toolkit versions if a recent driver (capable of supporting the most recent toolkit version) was installed. The big complaint we've had from users is that we might have compiled/released OpenMM conda packages against a CUDA version that is not available on their systems.

Presuming we used a docker build environment with a recent CUDA driver, would it be possible to build several versions of the OpenMM conda package using these toolkit versions (7.5, 8.0, 9.0) and make packages for all three (with appropriate toolkit package dependencies) available that would run on a range of drivers? Presumably, only kernels that support the CUDA toolkit we used to compile would work, but I'm not sure if there is more precise matching of CUDA toolkit x driver x compiled code that needs to happen for this to work.

@peastman
Contributor

I believe the requirement is that driver version >= toolkit version. So if someone had CUDA 9.0 (both driver and toolkit) installed, but we had compiled against CUDA 8.0, this would let it automatically install the older toolkit. But it wouldn't help the more common situation where they have an older CUDA version installed. For example, if they're running on a cluster that has CUDA 7.5, that wouldn't work because the 8.0 toolkit isn't compatible with the 7.5 driver.

That's separate from the question of providing several different OpenMM packages compiled against different CUDA versions. We could do that either way. But as far as I know, conda can't detect what driver they have installed, so the user would have to manually specify which version they wanted.
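
The rule of thumb stated above (the driver's CUDA version must be at least the toolkit version) can be sketched in a few lines of Python. This is only an illustration of that rule as described in this thread, not an official compatibility check; the function names are hypothetical, and versions are compared as plain (major, minor) tuples.

```python
def parse_cuda_version(version):
    """Turn a string such as '8.0' into a comparable tuple such as (8, 0)."""
    return tuple(int(part) for part in version.split("."))


def toolkit_runs_on_driver(toolkit_version, driver_cuda_version):
    """Apply the rule of thumb from this thread: driver version >= toolkit version.

    Both arguments are CUDA version strings, e.g. '7.5' or '9.0'.
    """
    return parse_cuda_version(driver_cuda_version) >= parse_cuda_version(toolkit_version)


if __name__ == "__main__":
    # The two cases discussed above.
    print(toolkit_runs_on_driver("8.0", "9.0"))  # newer driver, older toolkit
    print(toolkit_runs_on_driver("8.0", "7.5"))  # older driver, newer toolkit
```

Comparing tuples rather than strings matters once two-digit major versions appear, since "10.0" sorts before "9.2" as text.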

@emigmo

emigmo commented Dec 16, 2017

How do I install the CUDA toolkit in anaconda?

@jchodera
Member Author

@emigmo : According to this page, you could install the CUDA 8.0 toolkit with

conda install -c anaconda cudatoolkit==8.0

@Sharuk06

Try this; I guess it will detect your PC specs and install the required toolkit!

conda install -c anaconda cudatoolkit

https://anaconda.org/anaconda/cudatoolkit

@fastlater

@Sharuk06 @jchodera I installed the cuda and cudnn packages in my environment, but I get ImportError: Could not find 'cudart64_80.dll'. Since I used conda and not the nvidia website, how do I set the required paths, or where can I find this missing DLL file?

@jchodera
Member Author

We don't officially support installing CUDA via conda, so this should be considered experimental. But I'd love to see if we could get this to work.

Can you do a "conda list" and paste all the packages you have installed here? And which platform are you on? Sounds like Windows?

@fastlater

fastlater commented Feb 26, 2018

I am using Windows 7. I created this environment to run some tensorflow code. I installed tf 1.4, cuda 8.0 and cudnn 6.0 using anaconda.
The list of installed packages is below:

# packages in environment at C:\ProgramData\Anaconda3\envs\tfenv:
#
# Name Version Build Channel
bleach 1.5.0
bokeh 0.12.14 py35_0
ca-certificates 2017.08.26 h94faf87_0 anaconda
certifi 2018.1.18 py35_0 anaconda
click 6.7 py35h10df73f_0
cloudpickle 0.5.2 py35_1
cudatoolkit 8.0 3 anaconda
cudnn 6.0 0 anaconda

cycler 0.10.0 py35hcc71164_0
cython 0.27.3 py35h82876f0_0
dask 0.17.1 py35_0
dask-core 0.17.1 py35_0
decorator 4.2.1 py35_0
distributed 1.21.1 py35_0
enum34 1.1.6
freetype 2.8 vc14h17c9bdf_0 [vc14] anaconda
h5py 2.7.1 py35hb2c3add_0
hdf5 1.10.1 vc14hb361328_0 [vc14] anaconda
heapdict 1.0.0 py35_2
html5lib 0.9999999
icc_rt 2017.0.4 h97af966_0
icu 58.2 vc14hc45fdbb_0 [vc14] anaconda
imageio 2.2.0 py35hcd4b9a4_0
intel-openmp 2018.0.0 hd92c6cd_8
jinja2 2.10 py35hdf652bb_0
jpeg 9b vc14h4d7706e_1 [vc14] anaconda
libpng 1.6.32 vc14h5163883_3 [vc14] anaconda
libtiff 4.0.8 vc14h04e2a1e_10 [vc14] anaconda
locket 0.2.0 py35h0dfcdd0_1
Markdown 2.6.11
markupsafe 1.0 py35hc253e08_1
matplotlib 2.1.2 py35h016c42a_0
mkl 2018.0.1 h2108138_4
msgpack-python 0.5.5 py35he980bc4_0
networkx 2.1 py35_0
numpy 1.14.1
numpy 1.14.1 py35h4a99626_1
olefile 0.45.1 py35_0
openssl 1.0.2n h74b6da3_0 anaconda
packaging 16.8 py35h5fb721f_1
pandas 0.22.0 py35h6538335_0
partd 0.3.8 py35h894d1e4_0
pillow 4.2.1 py35hd7da350_0 anaconda
pip 9.0.1 py35h691316f_4
protobuf 3.5.1
psutil 5.4.3 py35hfa6e2cd_0
pycocotools 2.0
pyparsing 2.2.0 py35hcabcaab_1
pyqt 5.6.0 py35hd46907b_5
python 3.5.4 h1357f44_23
python-dateutil 2.6.1 py35h6b299a3_1
pytz 2018.3 py35_0
pywavelets 0.5.2 py35h7c47ace_0
pyyaml 3.12 py35h4bf9689_1
qt 5.6.2 vc14h6f8c307_12 [vc14] anaconda
scikit-image 0.13.1 py35hfa6e2cd_1
scipy 1.0.0 py35h75710e8_0
setuptools 38.5.1
setuptools 38.5.1 py35_0
sip 4.18.1 py35h01cbaa7_2
six 1.11.0 py35hc1da2df_1
six 1.11.0
sortedcontainers 1.5.9 py35_0
sqlite 3.20.1 vc14h7ce8c62_1 [vc14] anaconda
tblib 1.3.2 py35hd2cf7e1_0
tensorflow-gpu 1.4.0
tensorflow-tensorboard 0.4.0

tk 8.6.7 vc14hb68737d_1 [vc14] anaconda
toolz 0.9.0 py35_0
tornado 4.5.3 py35_0
vc 14 h0510ff6_3
vs2015_runtime 14.0.25123 3
Werkzeug 0.14.1
wheel 0.30.0 py35h38a90bc_1
wheel 0.30.0
wincertstore 0.2 py35hfebbdb8_0
yaml 0.1.7 vc14hb31d195_1 [vc14] anaconda
zict 0.1.3 py35hf5542e0_0
zlib 1.2.11 vc14h1cdd9ab_1 [vc14] anaconda

@fastlater

fastlater commented Feb 26, 2018

The error is shown below. I know that I need to set the paths, but one of the reasons I tried to install cuda using anaconda was that I was getting an error when I tried to install cuda 8.0 from the exe file downloaded from nvidia. Since cudnn64_6.dll is included with 'cudnn-8.0-windows7-x64-v6.0', I guess this DLL should be somewhere inside the anaconda folder after I installed the package, right?

If I keep the conda cuda and cudnn packages installed and also install cuda and cudnn from the nvidia files on my computer, which one will be used by the environment when running the script?

(tfenv) C:\Program Files (x86)\Python 3.5.2\tensorflow\Master - Mask_RCNN>python run.py
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\tfenv\lib\site-packages\tensorflow\python\platform\self_check.py", line 75, in preload_check
    ctypes.WinDLL(build_info.cudart_dll_name)
  File "C:\ProgramData\Anaconda3\envs\tfenv\lib\ctypes\__init__.py", line 351, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] The specified module could not be found.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run.py", line 10, in <module>
    import coco
  File "C:\Program Files (x86)\Python 3.5.2\tensorflow\Master - Mask_RCNN\coco.py", line 49, in <module>
    import utils
  File "C:\Program Files (x86)\Python 3.5.2\tensorflow\Master - Mask_RCNN\utils.py", line 15, in <module>
    import tensorflow as tf
  File "C:\ProgramData\Anaconda3\envs\tfenv\lib\site-packages\tensorflow\__init__.py", line 24, in <module>
    from tensorflow.python import *
  File "C:\ProgramData\Anaconda3\envs\tfenv\lib\site-packages\tensorflow\python\__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "C:\ProgramData\Anaconda3\envs\tfenv\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 30, in <module>
    self_check.preload_check()
  File "C:\ProgramData\Anaconda3\envs\tfenv\lib\site-packages\tensorflow\python\platform\self_check.py", line 82, in preload_check
    % (build_info.cudart_dll_name, build_info.cuda_version_number))
ImportError: Could not find 'cudart64_80.dll'. TensorFlow requires that this DLL be installed in a directory that is named in your %PATH% environment variable. Download and install CUDA 8.0 from this URL: https://developer.nvidia.com/cuda-toolkit

@jchodera
Member Author

OK, a few things here:

  • This is the issue tracker for the omnia channel, which does not support either the cudatoolkit package (on the anaconda channel) or tensorflow. Are you asking for help with OpenMM or some other package from omnia, or just tensorflow?
  • The error message you are receiving from tensorflow suggests that the directory containing cudart64_80.dll has to be in your %PATH%:
ImportError: Could not find 'cudart64_80.dll'. TensorFlow requires that this DLL
be installed in a directory that is named in your %PATH% environment variable.

I think the cudatoolkit package from the anaconda channel installs that DLL, but its directory may not be ending up in your %PATH%, which tensorflow seems to require. Maybe you can search for where conda installed it and try adding that directory to your path?

Good luck!
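
The search-and-add-to-path suggestion above could look roughly like the following in Python. This is a minimal sketch under some assumptions: the active environment is taken from the CONDA_PREFIX variable (falling back to sys.prefix), the helper names are hypothetical, and where the cudatoolkit package actually places the DLL is not confirmed here, so the whole environment tree is searched.

```python
import os
import sys
from pathlib import Path


def find_library_dirs(root, filename):
    """Return every directory under `root` that contains a file named `filename`."""
    return sorted({str(path.parent) for path in Path(root).rglob(filename)})


def prepend_to_path(directory, env=None):
    """Put `directory` at the front of PATH in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    current = env.get("PATH", "")
    env["PATH"] = directory + os.pathsep + current if current else directory
    return env["PATH"]


if __name__ == "__main__":
    # Assumption: the active environment is given by CONDA_PREFIX, else sys.prefix.
    env_root = os.environ.get("CONDA_PREFIX", sys.prefix)
    matches = find_library_dirs(env_root, "cudart64_80.dll")
    if matches:
        prepend_to_path(matches[0])
        print("Added to PATH for this process:", matches[0])
    else:
        print("cudart64_80.dll not found under", env_root)
```

Changing os.environ only affects the current process, so a script would need to do this before `import tensorflow`; a permanent fix would mean editing %PATH% itself.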

@aisri

aisri commented Apr 10, 2018

To add to the issues with the conda installation of cudatoolkit: I don't see the nvcc binary (or cuda-gdb, nvvp, etc.) installed when using conda.
I hope they fix this.
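
A quick way to confirm which of these tools are visible from the current environment is to look them up on the PATH. The sketch below uses only the standard library; `locate_tools` is a hypothetical helper name, and a result of "not found" only means the executable is not on the PATH of the process running the script.

```python
import shutil


def locate_tools(names):
    """Map each tool name to its full path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for tool, path in locate_tools(["nvcc", "cuda-gdb", "nvvp"]).items():
        print(f"{tool}: {path or 'not found on PATH'}")
```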

@jchodera
Member Author

I don't think there's a supported mechanism for installing the CUDA toolkit via conda at this time. The best approach I've seen seems to be pytorch, which uses dummy packages to indicate which version of CUDA you've installed: https://anaconda.org/pytorch/repo

@nospotfer

nospotfer commented Nov 6, 2018

Summary of steps I've done to install tensorflow-gpu in Ubuntu 18.04 (should work in any Debian-based distro).

1. Install Nvidia driver on a machine with supported Nvidia card.

$ sudo apt update
$ sudo ubuntu-drivers autoinstall
$ sudo reboot

2. You might need to switch to Nvidia GPU from Intel GPU by using

$ sudo prime-select nvidia
$ sudo reboot

3. Install Anaconda with python 3.6.

https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh (Don’t use python 3.7.)

4. Install tensorflow-gpu

$ conda install tensorflow-gpu==1.11 cudatoolkit==9.0 cudnn==7.1.2 h5py

5. You’re done.

Source: https://medium.com/datadriveninvestor/install-tensorflow-gpu-to-use-nvidia-gpu-on-ubuntu-18-04-do-ai-71b0ce64ebc5

@BladeSun

@nospotfer this works! Thanks very much!
By the way, to specify the cuda version you must reinstall tensorflow-gpu with cudatoolkit==x.x cudnn==x.x instead of just running conda install -c anaconda cudatoolkit=x.x after tensorflow is installed.
You can consult this link to determine which cuda version you want.

@deama

deama commented Jan 2, 2019

Guys, I don't get it: how do I get the nvcc command to work on anaconda? It just says it's unrecognised.
I'm on Win 7, btw.

@jchodera
Member Author

jchodera commented Jan 2, 2019

@deama : We haven't yet migrated to installation of the CUDA toolkit via conda. At this point, you still need to install the CUDA toolkit following instructions from NVIDIA.

@jchodera
Member Author

jchodera commented Jan 2, 2019

Since we last checked, more cudatoolkit packages have become available on the anaconda channel which might also be of use to us, though the set still seems incomplete. If we only need to support the latest versions (9.2 and 10.0, which sounds eminently reasonable!), we can go with these packages, which are complete for those versions. The PPC version is missing (important for Summit), though we could help contribute here.

@peastman: For building nightly development packages of OpenMM, do you think CUDA 9.2 and 10.0 would be sufficient?

@peastman
Contributor

peastman commented Jan 2, 2019

For building nightly development packages of OpenMM, do you think CUDA 9.2 and 10.0 would be sufficient?

That seems fine.

@deama

deama commented Jan 3, 2019

Would following that nvidia guide work for me if I'm on windows using anaconda?

@jchodera
Member Author

jchodera commented Jan 3, 2019

Would following that nvidia guide work for me if I'm on windows using anaconda?

@deama : Yes, that is how we intend it to work for now. Note that the NVIDIA installer will install the CUDA Toolkit at the system level, rather than in anaconda, but the OpenMM package you install via conda should find the system CUDA compiler.

@zeneofa

zeneofa commented Jan 4, 2019

Hi, found my way here via googling. I work on summit/titan where there are cuda or cudatoolkit modules. I am wondering how the cudatoolkit conda package interacts with these. Is the package just a wrapper around the existing libraries, so that path/environment resolution is easier?

Earlier it was mentioned that the package can be used to get tensorflow up and running on an ubuntu system. How would this translate to an hpc environment with pytorch? Here we do not have sudo, and the GPUs are not accessible at install time (often the gpu nodes are behind a firewall). Also, the install paths are non-standard and environment variables are often renamed, so how does the package actually find the system CUDA compiler?

My main reason for asking is 'nccl': titan does not seem to have nccl installed, and getting it installed on a cray system is proving challenging. Sometimes getting anything compiled on a cray/power system is challenging, so conda has been a great help.

Sorry for the long post and all the questions.

@jchodera
Member Author

jchodera commented Jan 4, 2019

@zeneofa: We don't yet have access to Summit, but on Titan, you can simply load a CUDA toolkit module (7.5 or 9.2) and then install the corresponding version of OpenMM, where we have built a separate package for each CUDA version (7.5 through 10.0) that can be selected via a conda channel label, e.g.:

conda install -c conda-forge -c omnia -c omnia/label/cudaXY openmm

where cudaXY is for CUDA X.Y.
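
The cudaXY naming convention above can be expressed as a small helper that builds the install command for a given CUDA version. This is a sketch only: the function names are hypothetical, the label is derived purely from the "cudaXY is for CUDA X.Y" pattern described here, and it does not check that a package actually exists under that label.

```python
def cuda_channel_label(cuda_version):
    """Turn a CUDA version such as '9.2' into a label such as 'cuda92'."""
    major, minor = cuda_version.split(".")
    return f"cuda{int(major)}{int(minor)}"


def openmm_install_command(cuda_version):
    """Build the conda command shown above for the given CUDA version."""
    label = cuda_channel_label(cuda_version)
    return ["conda", "install", "-c", "conda-forge", "-c", "omnia",
            "-c", f"omnia/label/{label}", "openmm"]


if __name__ == "__main__":
    # Print the command rather than running it, so nothing is installed by accident.
    print(" ".join(openmm_install_command("9.2")))
```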

For tensorflow, this comment refers to this blog post, where the tensorflow-gpu metapackage installs the CUDA toolkit via the cudatoolkit package. You can see what the cudatoolkit package contains by downloading one of the tarballs here and unpacking it.

It looks like it doesn't include nvcc, but nccl is available through a separate conda install

conda install nccl

though this is only built for a few CUDA versions. Some other versions have been compiled and pushed to anaconda cloud:
https://anaconda.org/search?q=nccl

My main reason for asking is 'nccl': titan does not seem to have nccl installed, and getting it installed on a cray system is proving challenging. Sometimes getting anything compiled on a cray/power system is challenging, so conda has been a great help.

Amen!

@zeneofa

zeneofa commented Jan 4, 2019

Thanks for the info, I will look into that. One of my main problems with using conda packages on titan/rhea/eos and other hpc environments is the GLIBC incompatibility error that often pops up. So far the only way around this that I have found is to recompile from scratch, hence my current issues with getting pytorch working on Titan.

I will dig around the recipes, thanks again.
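
When a GLIBC incompatibility error appears, it can help to compare the C library version on the machine with the version named in the error message. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the version strings in the example are purely illustrative, not requirements of any particular package.

```python
import platform


def libc_version():
    """Return (name, version) of the C library the Python executable links against.

    On glibc-based Linux this is typically ('glibc', '2.xx'); on other
    platforms the fields may be empty strings.
    """
    return platform.libc_ver()


def version_tuple(version):
    """Convert '2.17' to (2, 17) so that versions compare numerically."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def glibc_at_least(required, found):
    """True if the `found` glibc version is at least the `required` one."""
    return version_tuple(found) >= version_tuple(required)


if __name__ == "__main__":
    name, version = libc_version()
    print("C library:", name or "unknown", version or "unknown")
    # Illustrative comparison only; substitute the version from the error message.
    print(glibc_at_least("2.17", "2.12"))
```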

@1961ned

1961ned commented Mar 6, 2019

To follow up on the error posted by fastlater: I had to install tensorflow outside anaconda and run my programs with Visual Studio, which ran fine. I'm trying to resolve the problem in anaconda now (Windows 7).

@whungt

whungt commented Apr 18, 2019

@1961ned Is there any update about this problem in anaconda on win7?

@1961ned

1961ned commented Apr 21, 2019

nothing yet

@Astlaan

Astlaan commented Nov 28, 2019

I installed cudatoolkit (10.0.130) in my computer via anaconda.
However, when I now try to install pycuda (via pip), I still get the following error:

 c:\users\me\appdata\local\temp\pip-install-o2935gt0\pycuda\src\cpp\cuda.hpp(14): fatal error C1083: Cannot open include file: 'cuda.h': No such file or directory
    error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\BuildTools\\VC\\Tools\\MSVC\\14.16.27023\\bin\\HostX86\\x64\\cl.exe' failed with exit status 

What appears to be wrong?
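
The compiler error says that cuda.h is not on the include path. One way to narrow this down is to check whether the header exists in the places a build might look. The sketch below is an assumption-laden illustration: CUDA_PATH and CUDA_HOME are environment variable names commonly used to point at a system CUDA install, the include and Library\include subdirectories of CONDA_PREFIX are guesses at where a conda package might put headers, and the function names are hypothetical. Given that the cudatoolkit package was noted earlier in this thread not to include nvcc, the header may simply be absent from the conda environment.

```python
import os
from pathlib import Path


def candidate_include_dirs(env=None):
    """List include directories worth checking, based on common environment variables."""
    env = os.environ if env is None else env
    dirs = []
    # Assumption: a system CUDA install is advertised through one of these variables.
    for var in ("CUDA_PATH", "CUDA_HOME"):
        if env.get(var):
            dirs.append(Path(env[var]) / "include")
    # Assumption: a conda environment may hold headers in one of these locations.
    if env.get("CONDA_PREFIX"):
        prefix = Path(env["CONDA_PREFIX"])
        dirs.append(prefix / "include")
        dirs.append(prefix / "Library" / "include")
    return dirs


def find_cuda_header(env=None, header="cuda.h"):
    """Return the first existing path to `header`, or None if it is not found."""
    for directory in candidate_include_dirs(env):
        candidate = directory / header
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_cuda_header()
    print(found if found else "cuda.h not found in the checked locations")
```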
