[BUG] SCDL concatenation does not use raw.X for large file #1222

@camirr-nv

Description

BioNeMo Framework Version

bfe1a33

Bug Description

When SCDL concatenates these 12 h5ad files, the resulting SCDL dataset is expected to use raw.X whenever it is available. raw.X contains integer values, while .X contains floats. For the files below, SCDL behaved as expected except for the largest file, where .X was used instead of raw.X even though raw.X is available.
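The expected behavior can be sketched as a small selection rule. This is only an illustration of "prefer raw.X when available", not SCDL's actual implementation; `select_counts` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for AnnData objects, which expose the same `.raw` and `.X` attributes.

```python
from types import SimpleNamespace


def select_counts(adata):
    """Return (matrix, source_name), preferring raw.X over X when raw is present.

    Hypothetical helper for illustration; not part of SCDL.
    """
    raw = getattr(adata, "raw", None)
    if raw is not None and getattr(raw, "X", None) is not None:
        return raw.X, "raw.X"
    return adata.X, "X"


# Stand-ins for AnnData objects (assumption: only .raw and .X are needed here).
with_raw = SimpleNamespace(X=[[0.5, 1.2]], raw=SimpleNamespace(X=[[1, 3]]))
without_raw = SimpleNamespace(X=[[0.5, 1.2]], raw=None)

print(select_counts(with_raw)[1])     # raw.X
print(select_counts(without_raw)[1])  # X
```

Under this rule, every input file that carries raw.X should contribute integer counts to the concatenated dataset, regardless of file size.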

The concatenated dataset contains one contiguous block of float values, from cell 818,598 to cell 2,061,907 (1,243,310 cells, 53.82% of the dataset). This block corresponds to a single h5ad file, the largest of the 12 inputs (23.2 GB): 3faad104-2ab8-4434-816d-474d8d2641db.h5ad
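For anyone wanting to reproduce the check, a block like the one described above could be located with a scan along these lines. This is a minimal sketch, not the script used for this report: `float_blocks` is a hypothetical helper, and it assumes each cell's values can be iterated as a row of numbers.

```python
def float_blocks(rows):
    """Yield (first_index, last_index, length) for each contiguous run of rows
    that contain at least one non-integer value."""
    start = None
    last = -1
    for i, row in enumerate(rows):
        last = i
        has_fraction = any(not float(v).is_integer() for v in row)
        if has_fraction and start is None:
            start = i  # a float block begins here
        elif not has_fraction and start is not None:
            yield start, i - 1, i - start  # block ended on the previous row
            start = None
    if start is not None:
        yield start, last, last - start + 1  # block runs to the final row


# Toy data: rows 1 and 2 hold fractional values, rows 0 and 3 hold integer counts.
toy = [[1, 0], [2.5, 0], [0.1, 3], [4, 4]]
blocks = list(float_blocks(toy))
print(blocks)

# The inclusive range reported in this issue has the stated length.
reported_length = 2_061_907 - 818_598 + 1
print(reported_length)
```

A single block whose boundaries line up with one input file's cell range points to that file being read from .X rather than raw.X.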

total 46G
-rw-r--r-- 1 ubuntu ubuntu 8.0G Jul 24 20:02 218acb0f-9f2f-4f76-b90b-15a4b7c7f629.h5ad
-rw-r--r-- 1 ubuntu ubuntu 1.9G Jul 24 20:02 21d3e683-80a4-4d9b-bc89-ebb2df513dde.h5ad
-rw-r--r-- 1 ubuntu ubuntu 4.1G Jul 24 20:02 2a498ace-872a-4935-984b-1afa70fd9886.h5ad
-rw-r--r-- 1 ubuntu ubuntu 438M Jul 24 20:02 2adb1f8a-a6b1-4909-8ee8-484814e2d4bf.h5ad
-rw-r--r-- 1 ubuntu ubuntu 655M Jul 24 20:03 30cd5311-6c09-46c9-94f1-71fe4b91813c.h5ad
-rw-r--r-- 1 ubuntu ubuntu 757M Jul 24 20:03 3c75a463-6a87-4132-83a8-c3002624394d.h5ad
-rw-r--r-- 1 ubuntu ubuntu 24G Aug 14 06:23 3faad104-2ab8-4434-816d-474d8d2641db.h5ad
-rw-r--r-- 1 ubuntu ubuntu 541M Jul 24 20:04 59b69042-47c2-47fd-ad03-d21beb99818f.h5ad
-rw-r--r-- 1 ubuntu ubuntu 1.3G Jul 24 20:04 5af90777-6760-4003-9dba-8f945fec6fdf.h5ad
-rw-r--r-- 1 ubuntu ubuntu 270M Jul 24 20:04 5bc42b88-bb76-4954-927b-8bb7369adc64.h5ad
-rw-r--r-- 1 ubuntu ubuntu 120M Jul 24 20:04 8c42cfd0-0b0a-46d5-910c-fc833d83c45e.h5ad
-rw-r--r-- 1 ubuntu ubuntu 4.1G Jul 24 20:05 9dbab10c-118d-496b-966a-67f1763a6b7d.h5ad

Steps to Reproduce

convert_h5ad_to_scdl.py --data-path ./

Error Messages and Logs

Docker Image

No response

Additional Context

No response
