
Performance issues on a large (120724) location dataset #375

Open
Dillon214 opened this issue Jul 13, 2024 · 4 comments
Labels
question Further information is requested

Comments

@Dillon214

Hello cell2location devs,

I am working with a large Visium dataset (12,240 features, 120,724 locations) and am running into issues when training the cell2location.models.Cell2location model. The specific issue is very simple: it's just too slow. I was seeing a speed of about 10 seconds per iteration, and across 30,000 iterations the estimated completion time was about 80 hours. I am running in GPU mode, and throwing additional A40 GPUs at the problem didn't seem to improve speed; I used 8 in my latest run. I also tried batching the data so each batch was about 30,000 cells, which also didn't improve speed.

Do you have any advice for me? I would greatly appreciate it, as I have used this package before on a smaller dataset to great success, and training finished in only a few hours.

Sincerely,
Dillon Brownell

@Dillon214 Dillon214 added the question Further information is requested label Jul 13, 2024
@vitkl
Contributor

vitkl commented Jul 13, 2024

Hi @Dillon214

Exciting data you have. Please have a look at issue #356 for practical suggestions on working with large data.

We find the best performance when training cell2location with batch size equal to the full dataset. This means batch size is limited by GPU memory: roughly 18k genes × 60k locations fits on an 80 GB A100.

Are you using batch_size=None?

getting a speed of about 10 seconds per iteration

This is very slow. Could you confirm that the GPU is used?

throwing additional a40 cores at the issue didn't seem to improve speed, I was using 8 with my latest run

Cell2location doesn't support using multiple GPUs - only one GPU was used.

it was done training in only a few hours

This sounds expected.
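For reference, the full-batch setup recommended above might look roughly like this (a sketch based on the cell2location tutorial, not code from this thread; `adata_vis` and `inf_aver` are assumed to be the prepared Visium AnnData and per-cell-type reference signatures, and the exact `train` arguments can vary between versions):

```python
import cell2location

# Sketch of full-batch training, assuming adata_vis (Visium AnnData) and
# inf_aver (reference signatures) are prepared as in the tutorial.
cell2location.models.Cell2location.setup_anndata(adata=adata_vis, batch_key="sample")

mod = cell2location.models.Cell2location(
    adata_vis, cell_state_df=inf_aver,
    # tutorial defaults; tune to your tissue
    N_cells_per_location=30, detection_alpha=20,
)

# batch_size=None loads the whole dataset onto the GPU once;
# train_size=1 uses every observation for training.
mod.train(max_epochs=30000, batch_size=None, train_size=1)
```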

@Dillon214
Author

Hi vitkl,

Thanks for the speedy reply. No, I was not using batch_size=None; I was adjusting the batch_size argument to avoid out-of-memory errors. I'm no expert, but I'm guessing the size of the dataset exceeds the memory capacity of a single GPU. And yes, I can confirm that a GPU is being used; see the attached image.

Based on the suggestions in the issue you linked, it seems like splitting the dataset into chunks, perhaps stratified by batch and other relevant variables, is a good way to avoid the slowdown. Do you still recommend this? I tested it briefly, and it sped up processing dramatically. I suppose the results can be re-merged afterwards.

-Dillon

[screenshot: GPU utilization during training]
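For concreteness, the chunking discussed above can be sketched as plain index bookkeeping (NumPy only; the ~30,000-location chunk size comes from this thread, while the random partitioning is a stand-in for any batch-aware stratification):

```python
import numpy as np

def split_into_chunks(n_obs, max_chunk=30_000, seed=0):
    """Randomly partition n_obs observation indices into chunks of at most
    max_chunk locations, so each chunk can be trained with batch_size=None."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_obs)
    n_chunks = int(np.ceil(n_obs / max_chunk))
    return np.array_split(order, n_chunks)

# 120,724 locations (the dataset in this thread) -> 5 chunks of ~24k each;
# each chunk would then be trained separately, e.g. on adata[chunk_idx].copy().
chunks = split_into_chunks(120_724)
print([len(c) for c in chunks])  # -> [24145, 24145, 24145, 24145, 24144]
```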

@vitkl
Contributor

vitkl commented Jul 14, 2024

Using a batch_size other than batch_size=None is indeed expected to take anywhere from many days to more than a week (10 seconds per epoch is actually pretty fast). More importantly, training with batch size equal to the full data gives higher accuracy, so we don't recommend minibatch training.

it seems like splitting the dataset into chunks, perhaps stratified by batch and other relevant variables, is a good way to avoid the slowdown. Do you still recommend this?

Yes, but you also need to set batch_size=None.

With batch_size=None, the entire dataset is loaded into GPU memory once; with minibatch training (batch_size=X), X observations are loaded into GPU memory at every training step.
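A quick back-of-envelope check of why full-batch training is memory-limited (my arithmetic, not from the thread; note the raw counts matrix itself is small, so under this assumption most of the GPU memory goes to model parameters, gradients, and posterior computations):

```python
# Dense float32 matrix at the recommended maximum: 18k genes x 60k locations.
genes, locations, bytes_per_float32 = 18_000, 60_000, 4
data_gib = genes * locations * bytes_per_float32 / 2**30
print(f"raw matrix alone: {data_gib:.1f} GiB of the 80 GiB A100")  # ~4.0 GiB
```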

@Dillon214
Author

Hi vitkl,

I followed the code you posted in the other thread for subsetting and individually processing multiple objects, and saw much better training times.
One final question: say I now want to compute expected expression per cell type, as done in the tutorial section pictured below and at the following link: https://cell2location.readthedocs.io/en/latest/notebooks/cell2location_tutorial.html#Estimate-cell-type-specific-expression-of-every-gene-in-the-spatial-data-(needed-for-NCEM).

Would it be fine to perform this process for each individually processed object, then merge the results, or do you think a different approach should be taken?

-Dillon

[screenshot: tutorial section on estimating cell-type-specific expression]
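On the merging question, a minimal sketch of what re-merging per-chunk outputs could look like (pandas; `results_per_chunk` is a hypothetical list of cell-type-abundance tables, one per trained chunk, indexed by spot names — nothing here is from the thread itself):

```python
import pandas as pd

# Hypothetical per-chunk outputs: one abundance table per trained model,
# indexed by that chunk's observation (spot) names.
results_per_chunk = [
    pd.DataFrame({"Tcell": [0.1, 0.2]}, index=["s1", "s2"]),
    pd.DataFrame({"Tcell": [0.3]}, index=["s3"]),
]

# Chunks partition the observations, so a row-wise concat (reindexed to the
# original spot order) restores one table covering the full dataset.
merged = pd.concat(results_per_chunk).loc[["s1", "s2", "s3"]]
print(merged.shape)  # (3, 1)
```

Whether per-chunk posterior estimates are directly comparable across independently trained models is a modelling question the maintainers would need to confirm; the concat only handles the bookkeeping.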
