Enabling/disabling GPU during inference #7622
-
With the introduction of the transformer models, I am wondering how to control whether or not the GPU is used for inference. There are cases where I do not want to tie up the GPUs on our server and am fine with slower CPU inference, so I would like to have that control. When searching the documentation, I only find GPU references for training, not inference. Does that mean inference is always CPU-bound and never done on the GPU? Or, if it can run on the GPU, can we control whether or not the GPU is used?
-
Use `spacy.require_gpu()` and `spacy.require_cpu()` to switch back and forth. A model is loaded on the device specified in the current context, so after switching you also have to reload the model to move it. A plain thinc model stays on the device it was loaded on and keeps working even if you switch the context, but models that use torch do not work if you switch between CPU and GPU after loading them.
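For reference, a minimal sketch of the reload-after-switching pattern described above (the pipeline name `en_core_web_trf` is just an example; substitute whichever transformer pipeline you actually use):

```python
import spacy

# Switch the context to GPU (raises an error if no GPU is available),
# then load the model so it is placed on the GPU.
spacy.require_gpu()
nlp_gpu = spacy.load("en_core_web_trf")
doc = nlp_gpu("This text is processed on the GPU.")

# Switch the context back to CPU. The torch-backed model loaded above
# will not follow the switch, so reload to get a CPU copy.
spacy.require_cpu()
nlp_cpu = spacy.load("en_core_web_trf")
doc = nlp_cpu("This text is processed on the CPU.")
```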
-
Is there a way to do something similar to enable …