Enabling/disabling GPU during inference #7622
-
With the introduction of the transformer models, I am wondering how to control whether or not the GPU is used for inference. There are cases where I do not want to tie up the GPUs on our server and am fine with slower CPU inference, so I would like to have that control. When searching the documentation, I only find GPU references for training, not inference. Does that mean inference is always CPU-bound and never done on the GPU? Or, if it can run on the GPU, can we control whether or not the GPU is used?
-
Use `spacy.require_gpu()` and `spacy.require_cpu()` to switch back and forth. A model is loaded on the device specified in the current context, so after switching you also have to reload the model to move it. A plain thinc model stays on the device it was loaded on and keeps working even if you switch the context, but models that use torch do not work if you switch between CPU and GPU after loading them.
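For reference, a minimal sketch of the reload-after-switching pattern described above (the pipeline name `en_core_web_trf` is just an example; substitute whichever transformer pipeline you actually use):

```python
import spacy

# Switch the context to GPU (raises an error if no GPU is available),
# then load the model so it is placed on the GPU.
spacy.require_gpu()
nlp_gpu = spacy.load("en_core_web_trf")
doc = nlp_gpu("This text is processed on the GPU.")

# Switch the context back to CPU. The torch-backed model loaded above
# will not follow the switch, so reload to get a CPU copy.
spacy.require_cpu()
nlp_cpu = spacy.load("en_core_web_trf")
doc = nlp_cpu("This text is processed on the CPU.")
```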
-
Is there a way to do something similar to enable …