
ValueError: could not convert integer scalar when running dyn.tl.dynamics on large harmonized dataset (~4M cells) #732

@superlsd

Description


Dear developers,

Thank you for the amazing work you have put into Dynamo. I am encountering an issue when running Dynamo on a large harmonized dataset (~4M cells) that was preprocessed and transformed externally across several datasets. Similar analyses with smaller datasets (up to ~100k cells) ran smoothly, but this larger dataset fails at the following step:

dyn.tl.dynamics(
    harmonized_adata,
    assumption_mRNA='ss',
    model='deterministic',
    est_method='ols',
    re_smooth=False,
    del_2nd_moments=True,
    cores=1,
)

which raises the following error:

    adata.layers["velocity_S"][cur_cells_ind, valid_ind_] = vel_S
    ret = csr_sample_offsets(M, N, self.indptr, self.indices, n_samples,
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: could not convert integer scalar
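
One thing I wondered about: scipy sparse matrices use 32-bit index arrays by default, and 4,000,000 × 2,500 is ~10^10 potential entries, well past the int32 limit, so the error might be an index-overflow symptom of the assignment into velocity_S. I am not sure this is actually the cause, but a quick check along these lines might help narrow it down (the upcast at the end is only a guess on my side, not something I have validated at this scale):

import numpy as np
import scipy.sparse as sp

# Inspect the layer that dynamo assigns into (velocity_S in the traceback).
if "velocity_S" in harmonized_adata.layers and sp.issparse(harmonized_adata.layers["velocity_S"]):
    layer = harmonized_adata.layers["velocity_S"]
    print("format:", layer.format)
    print("indptr dtype:", layer.indptr.dtype, "| indices dtype:", layer.indices.dtype)
    print("nnz:", layer.nnz, "| int32 max:", np.iinfo(np.int32).max)

    # Possible workaround (untested at this scale): upcast the index arrays
    # to int64 before the assignment step runs again.
    layer = layer.copy()
    layer.indptr = layer.indptr.astype(np.int64)
    layer.indices = layer.indices.astype(np.int64)
    harmonized_adata.layers["velocity_S"] = layer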

After some debugging, I implemented a custom patch (safe_set_velocity) ensuring that all relevant layers are stored as sparse matrices and that assignments are shape-consistent:

import numpy as np
import scipy.sparse as sp

def safe_set_velocity(
    adata,
    vel_U,
    vel_S,
    vel_N,
    vel_T,
    vel_P,
    _group,
    cur_grp,
    cur_cells_bools,
    valid_bools_,
    ind_for_proteins,
):
    """
    Safe replacement for Dynamo's set_velocity with shape checks and auto-init.
    """
    print("✅ safe_set_velocity CALLED!")

    cells = np.where(cur_cells_bools)[0]
    genes = np.where(valid_bools_)[0]

    n_cells, n_genes = adata.n_obs, adata.n_vars

    # --- ensure the target layer exists ---
    if "velocity_S" not in adata.layers.keys():
        print(f"[safe_set_velocity] Creating adata.layers['velocity_S'] with shape ({n_cells}, {n_genes})")
        adata.layers["velocity_S"] = sp.csr_matrix((n_cells, n_genes), dtype=float)

    # --- get a dense working copy for assignment ---
    layer = adata.layers["velocity_S"]
    if sp.issparse(layer):
        dense = layer.toarray()
    else:
        # already dense; keep the existing values instead of resetting to zeros
        dense = np.asarray(layer, dtype=float)

    # --- prepare vel_S ---
    if sp.issparse(vel_S):
        vel_S = vel_S.toarray()
    vel_S = np.asarray(vel_S)
    vel_S = np.squeeze(vel_S)

    # --- fix shape if necessary ---
    if vel_S.shape != (len(cells), len(genes)):
        if vel_S.T.shape == (len(cells), len(genes)):
            print("[safe_set_velocity] Transposing vel_S to match target slice")
            vel_S = vel_S.T
        elif vel_S.ndim == 1 and vel_S.shape[0] == len(genes):
            vel_S = np.tile(vel_S, (len(cells), 1))
        elif vel_S.ndim == 1 and vel_S.shape[0] == len(cells):
            vel_S = np.tile(vel_S[:, None], (1, len(genes)))
        elif vel_S.size == 1:
            # vel_S was already converted to a dense ndarray above, so .item()
            # is enough to extract the scalar value
            scalar_value = vel_S.item()
            vel_S = np.full((len(cells), len(genes)), scalar_value)
        else:
            raise ValueError(
                f"[safe_set_velocity] Cannot reshape vel_S {vel_S.shape} → ({len(cells)}, {len(genes)})"
            )

    # --- perform safe assignment ---
    try:
        dense[np.ix_(cells, genes)] = vel_S
        adata.layers["velocity_S"] = sp.csr_matrix(dense)
        print(f"[safe_set_velocity] ✅ Assigned block {vel_S.shape} to velocity_S[{len(cells)}x{len(genes)}]")
    except Exception as e:
        print(f"[safe_set_velocity] ❌ Assignment failed: {e}")
        raise

    return adata
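
For completeness, this is roughly how a patch like this can be wired in. I am not certain of the exact import location across Dynamo versions; the sketch below assumes dyn.tl.dynamics resolves set_velocity through dynamo.tools.dynamics, so that is the name being rebound:

import dynamo as dyn
import dynamo.tools.dynamics as dyn_dynamics

# Rebind the name looked up inside the dynamics module (assumed to be
# `set_velocity`; adjust if your Dynamo version imports it differently).
dyn_dynamics.set_velocity = safe_set_velocity

dyn.tl.dynamics(
    harmonized_adata,
    assumption_mRNA='ss',
    model='deterministic',
    est_method='ols',
    re_smooth=False,
    del_2nd_moments=True,
    cores=1,
)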

This allowed me to bypass the original error, but at the next step (dyn.tl.cell_velocities) I ran into another issue during the PCA calculation: ValueError: Input X contains NaN.

I verified that no NaNs are present in adata.X, adata.layers["velocity"], or X_plus_V, which makes me think that my patch may not be the ideal solution.
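
For reference, a check along these lines (a sketch; it also counts infinities, since inf values can turn into NaN after arithmetic such as X + V, and the layer names may need adjusting):

import numpy as np
import scipy.sparse as sp

def report_nonfinite(name, mat):
    """Print how many NaN / inf entries a dense or sparse matrix holds."""
    data = mat.data if sp.issparse(mat) else np.asarray(mat)
    print(f"{name}: NaN = {np.isnan(data).sum()}, inf = {np.isinf(data).sum()}")

report_nonfinite("X", harmonized_adata.X)
report_nonfinite("velocity_S", harmonized_adata.layers["velocity_S"])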

💭 Questions
1. Do you have any recommendations for resolving the ValueError: could not convert integer scalar in large harmonized datasets?
2. Would it be acceptable to run dyn.tl.dynamics separately on individual subsets (e.g. per dataset, before harmonization) and then merge the results (as sketched below), or would that bias the downstream velocity estimation?
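
For question 2, the mechanics I have in mind are roughly the following (a sketch; the "dataset" column in .obs is an assumption on my side, and the per-gene fit parameters written to .var would differ between subsets, which is part of what I am unsure about):

import anndata as ad
import dynamo as dyn

pieces = []
for name in harmonized_adata.obs["dataset"].unique():
    sub = harmonized_adata[harmonized_adata.obs["dataset"] == name].copy()
    dyn.tl.dynamics(
        sub,
        assumption_mRNA='ss',
        model='deterministic',
        est_method='ols',
        re_smooth=False,
        del_2nd_moments=True,
        cores=1,
    )
    pieces.append(sub)

# Stack the subsets back together; layers such as velocity_S are concatenated
# along cells, while per-gene results in .var are only kept where identical.
merged = ad.concat(pieces, join="inner", merge="same")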

📊 Additional Info
• adata.X: dense float32 array
• All layers (spliced, unspliced, etc.) are dense float32 arrays
• adata dimensions: 4,000,000 × 2,500
• No NaNs detected in any layer or .X
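
One back-of-the-envelope note on size, in case it matters: at 4,000,000 × 2,500, each dense float32 layer is roughly 40 GB, so storing the raw layers as sparse beforehand may also be worth considering (a sketch; I do not know whether Dynamo prefers a particular format here):

import scipy.sparse as sp

# 4,000,000 cells x 2,500 genes x 4 bytes (float32) ≈ 40 GB per dense layer.
print(4_000_000 * 2_500 * 4 / 1e9, "GB per dense float32 layer")

# Convert raw count layers to CSR if they are mostly zeros (an option to
# consider, not a documented Dynamo requirement as far as I know).
for key in ("spliced", "unspliced"):
    if key in harmonized_adata.layers and not sp.issparse(harmonized_adata.layers[key]):
        harmonized_adata.layers[key] = sp.csr_matrix(harmonized_adata.layers[key])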

Thank you very much for your time and for maintaining such a powerful package.
Any guidance on how to proceed would be greatly appreciated!

Best,
Salvo D. Lombardo
