Unable to allocate enough memory #32

Open · jcgrenier opened this issue May 12, 2025 · 8 comments

jcgrenier commented May 12, 2025

Hello! I've been trying to run neural-admixture in train mode on a big dataset containing almost 500,000 samples and 161k markers, but I am not able to make it run in GPU mode. It looks like it tries to send everything to GPU memory at once. Do you have any idea how to handle such cases?

Quick note: I was able to generate the PCA with CPUs only, using more than 1.3 TB of RAM to do so.

I also tried reducing the batch size, but I'm still having the same issue.

Here's the trace:
neural-admixture train --num_cpus 12 --num_gpus 1 --k 2 --name neuralAdmixture --data_path dataset.bed --save_dir neural_admixture_gpus --pca_path neural_admixture_gpus/neuralAdmixture_pca.pt --batch_size 400


    Input format is BED.
Mapping files:   0%|          | 0/3 [00:00<?, ?it/s]
~/neural-admixture/nadmenv/lib/python3.11/site-packages/neural_admixture/src/snp_reader.py:61: FutureWarning: The 'delim_whitespace' keyword in pd.read_csv is deprecated and will be removed in a future version. Use ``sep='\s+'`` instead
  _, _, G = read_plink(str(Path(file).with_suffix("")))
Mapping files:  33%|███▎      | 1/3 [00:00<00:01,  1.24it/s]
~/neural-admixture/nadmenv/lib/python3.11/site-packages/neural_admixture/src/snp_reader.py:61: FutureWarning: The 'delim_whitespace' keyword in pd.read_csv is deprecated and will be removed in a future version. Use ``sep='\s+'`` instead
  _, _, G = read_plink(str(Path(file).with_suffix("")))
Mapping files: 100%|██████████| 3/3 [00:18<00:00,  6.15s/it]
    Data contains missing values. Will perform mean-imputation.
    Data contains 487929 samples and 161240 SNPs.
    Bringing data into memory...


    Unexpected error
Traceback (most recent call last):
  File "~/neural-admixture/nadmenv/bin/neural-admixture", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "~/neural-admixture/nadmenv/lib/python3.11/site-packages/neural_admixture/entry.py", line 64, in main
    sys.exit(train.main(0, arg_list[2:], num_gpus))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/neural-admixture/nadmenv/lib/python3.11/site-packages/neural_admixture/src/train.py", line 139, in main
    raise e
  File "~/neural-admixture/nadmenv/lib/python3.11/site-packages/neural_admixture/src/train.py", line 114, in main
    fit_model(args, trX, device, num_gpus, tr_pops, master)
  File "~/neural-admixture/nadmenv/lib/python3.11/site-packages/neural_admixture/src/train.py", line 30, in fit_model
    data, y = utils.initialize_data(master, trX, tr_pops)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/neural-admixture/nadmenv/lib/python3.11/site-packages/neural_admixture/src/utils.py", line 113, in initialize_data
    data = trX.compute()
           ^^^^^^^^^^^^^
  File "~/neural-admixture/nadmenv/lib/python3.11/site-packages/dask/base.py", line 379, in compute
    (result,) = compute(self, traverse=False, **kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/neural-admixture/nadmenv/lib/python3.11/site-packages/dask/base.py", line 667, in compute
    return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/neural-admixture/nadmenv/lib/python3.11/site-packages/dask/base.py", line 667, in <listcomp>
    return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
                   ^^^^^^^^
  File "~/neural-admixture/nadmenv/lib/python3.11/site-packages/dask/array/core.py", line 1282, in finalize
    return concatenate3(results)
           ^^^^^^^^^^^^^^^^^^^^^
  File "~/neural-admixture/nadmenv/lib/python3.11/site-packages/dask/array/core.py", line 5313, in concatenate3
    result = np.empty(shape=shape, dtype=dtype(deepfirst(arrays)))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 293. GiB for an array with shape (487929, 161240) and data type float32
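
For scale, that request is exactly the full genotype matrix held as float32 (4 bytes per genotype), i.e. the whole dataset brought into host memory at once:

python -c "print(487929 * 161240 * 4 / 2**30)"   # ~293.08, matching the 293 GiB in the error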

Thanks for your help!

JC

joansaurina self-assigned this May 13, 2025

joansaurina (Collaborator) commented

Hi JC,

Thank you for the detailed message and the traceback — this isn’t an error on your side.

Neural-Admixture is indeed built to handle large-scale datasets, including biobank-level data. So your dataset — nearly 500,000 samples and 161k markers — is well within the expected range.

That said, the memory issue you’re encountering stems from a known bug in the data loading pipeline.

The good news: we’re releasing an update later this week that resolves this issue. The new version significantly reduces GPU memory usage during training, especially with large datasets like yours.

I’ll follow up here as soon as the update is live!

Best regards,
Joan

jcgrenier (Author) commented

Hello @joansaurina,

That's very good news! I will wait for the update and hope it resolves our issue.
Thanks a lot for getting back to me so quickly!

Best,

JC

jcgrenier (Author) commented

Hello @joansaurina, any updates on the new release?

Thanks a lot again!
JC

joansaurina (Collaborator) commented May 22, 2025

It's ready and will come out soon, this week or early next week.

Joan

joansaurina (Collaborator) commented May 27, 2025

Hey @jcgrenier — the new version v1.6.1 is now available!

Make sure to reinstall, and let us know how it goes. :)
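
For example, assuming you installed from PyPI:

pip install --upgrade neural-admixture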

Joan

jcgrenier (Author) commented

Thanks for letting me know! Are there new requirements for this new version? Do we need another Python version?
Because we are working on an HPC, we are required to use virtual environments instead of conda. I was not able to find the new version with my previous environment, and I tried to create a new one, but without success.

Furthermore, when I try to install it from the git repo, I have multiple issues with some dependencies, particularly with numpy: version 2.2.5 seems to be needed, but later on during the installation some other dependency requires an earlier version. Is that normal?

  Downloading scikit-learn-1.4.2.tar.gz (7.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.8/7.8 MB 18.0 MB/s eta 0:00:00
  Installing build dependencies ... error
  error: subprocess-exited-with-error

  × pip subprocess to install build dependencies did not run successfully.
  │ exit code: 1
  ╰─> [10 lines of output]
      Looking in links: /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo2023/x86-64-v3, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo2023/generic, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/generic, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo2023/x86-64-v3, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/gentoo2023/generic, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/generic
      Collecting setuptools
        Obtaining dependency information for setuptools from https://files.pythonhosted.org/packages/a3/dc/17031897dae0efacfea57dfd3a82fdd2a2aeb58e0ff71b77b87e44edc772/setuptools-80.9.0-py3-none-any.whl.metadata
        Using cached setuptools-80.9.0-py3-none-any.whl.metadata (6.6 kB)
      Processing /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/generic/wheel-0.45.1+computecanada-py3-none-any.whl
      Collecting Cython>=3.0.8
        Obtaining dependency information for Cython>=3.0.8 from https://files.pythonhosted.org/packages/a7/97/8e8637e67afc09f1b51a617b15a0d1caf0b5159b0f79d47ab101e620e491/cython-3.1.1-py3-none-any.whl.metadata
        Using cached cython-3.1.1-py3-none-any.whl.metadata (3.2 kB)
      ERROR: Could not find a version that satisfies the requirement numpy==2.0.0rc1 (from versions: 1.23.2+computecanada, 1.24.4+computecanada, 1.25.2+computecanada, 1.26.4+computecanada, 2.1.1+computecanada, 2.2.2+computecanada)
      ERROR: No matching distribution found for numpy==2.0.0rc1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

Thanks for your help!

joansaurina (Collaborator) commented

Hey @jcgrenier,

That's strange; we tested with virtualenv and were able to install the new version v1.6.3 successfully.

Could you try again with a fresh Python 3.12 environment?
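
Something along these lines should work (a sketch assuming python3.12 is on your PATH and you are installing the PyPI release; adjust names and paths to your cluster):

python3.12 -m venv nadmenv          # fresh virtual environment
source nadmenv/bin/activate
pip install --upgrade pip
pip install neural-admixture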

Feel free to reach out at [[email protected]] to schedule a call if you're still having trouble.

Joan

jcgrenier (Author) commented

Hello @joansaurina,
I was finally able to install it, but I needed to clone the repo and change some requirement versions.
I also had some limitations in the wheels available on my system for Python 3.12, so I tried it with Python 3.11.5.

It started with an error regarding numpy while trying to install it from the git repo:

ERROR: Could not find a version that satisfies the requirement numpy>=2.2.5 (from neural-admixture) 
ERROR: No matching distribution found for numpy>=2.2.5

So I changed the setup.cfg file so it could work with numpy>=1.21.0,<2.0.0.
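
Concretely, I did something like the following in the cloned repo before installing (the sed pattern is an approximation of the manual edit):

sed -i 's/numpy>=2.2.5/numpy>=1.21.0,<2.0.0/' setup.cfg   # relax the numpy pin
pip install .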

But then torch also had an issue while running the training:

AttributeError: module 'torch.nn' has no attribute 'RMSNorm'

So I extended the requirements so it can take torch 2.4.1 (because 2.4.0 was not available in the wheels on our system).
It looks like it works now.
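
For reference, torch.nn.RMSNorm was only added in PyTorch 2.4, so any older build raises that AttributeError. A quick way to check a given environment:

python -c "import torch; print(torch.__version__, hasattr(torch.nn, 'RMSNorm'))"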

Hope these changes won't create any issues, though.

Thanks.
JC
