-
Notifications
You must be signed in to change notification settings - Fork 63
Description
Dear developers,
thank you for the amazing work you have put into Dynamo. I am encountering an issue when running dynamo on a large harmonized dataset (~4M cells) that was preprocessed and transformed externally across several datasets. Similar analyses with smaller datasets (up to ~100k cells) ran smoothly, but this larger dataset raises the following error at this step:
dyn.tl.dynamics(harmonized_adata,
assumption_mRNA='ss',
model='deterministic',
est_method='ols',
re_smooth=False,
del_2nd_moments=True,
cores=1)
where I obtain the following error:
adata.layers["velocity_S"][cur_cells_ind, valid_ind_] = vel_S
ret = csr_sample_offsets(M, N, self.indptr, self.indices, n_samples,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: could not convert integer scalar
After some debugging, I implemented a custom patch (safe_set_velocity) ensuring that all relevant layers are stored as sparse matrices and that assignments are shape-consistent:
import numpy as np
import scipy.sparse as sp
def safe_set_velocity(
adata,
vel_U,
vel_S,
vel_N,
vel_T,
vel_P,
_group,
cur_grp,
cur_cells_bools,
valid_bools_,
ind_for_proteins,
):
"""
Safe replacement for Dynamo's set_velocity with shape checks and auto-init.
"""
print("✅ safe_set_velocity CALLED!")
cells = np.where(cur_cells_bools)[0]
genes = np.where(valid_bools_)[0]
n_cells, n_genes = adata.n_obs, adata.n_vars
# --- ensure the target layer exists ---
if "velocity_S" not in adata.layers.keys():
print(f"[safe_set_velocity] Creating adata.layers['velocity_S'] with shape ({n_cells}, {n_genes})")
adata.layers["velocity_S"] = sp.csr_matrix((n_cells, n_genes), dtype=float)
# --- ensure dense matrix for assignment ---
try:
dense = adata.layers["velocity_S"].toarray()
except Exception as e:
print(f"[safe_set_velocity] Failed to read velocity_S as array: {e}")
dense = np.zeros((n_cells, n_genes), dtype=float)
# --- prepare vel_S ---
if sp.issparse(vel_S):
vel_S = vel_S.toarray()
vel_S = np.asarray(vel_S)
vel_S = np.squeeze(vel_S)
# --- fix shape if necessary ---
if vel_S.shape != (len(cells), len(genes)):
if vel_S.T.shape == (len(cells), len(genes)):
print("[safe_set_velocity] Transposing vel_S to match target slice")
vel_S = vel_S.T
elif vel_S.ndim == 1 and vel_S.shape[0] == len(genes):
vel_S = np.tile(vel_S, (len(cells), 1))
elif vel_S.ndim == 1 and vel_S.shape[0] == len(cells):
vel_S = np.tile(vel_S[:, None], (1, len(genes)))
elif vel_S.size == 1:
# Handle 1x1 sparse or array cases gracefully
if sp.issparse(vel_S):
scalar_value = vel_S.toarray().item()
elif isinstance(vel_S, np.ndarray):
scalar_value = vel_S.item()
else:
scalar_value = float(vel_S)
vel_S = np.full((len(cells), len(genes)), scalar_value)
else:
raise ValueError(
f"[safe_set_velocity] Cannot reshape vel_S {vel_S.shape} → ({len(cells)}, {len(genes)})"
)
# --- perform safe assignment ---
try:
dense[np.ix_(cells, genes)] = vel_S
adata.layers["velocity_S"] = sp.csr_matrix(dense)
print(f"[safe_set_velocity] ✅ Assigned block {vel_S.shape} to velocity_S[{len(cells)}x{len(genes)}]")
except Exception as e:
print(f"[safe_set_velocity] ❌ Assignment failed: {e}")
raise
return adata
This allowed me to bypass the original error, but at the next step (dyn.tl.cell_velocities), I encountered another issue, with the calculation of the PCA: ValueError: Input X contains NaN.
I verified that no NaN are present in adata.X, adata.layers["velocity"], or X_plus_V. This makes me think that my patch may not be the ideal solution.
💭 Questions
1. Do you have any recommendations for resolving the ValueError: could not convert integer scalar in large harmonized datasets?
2. Would it be acceptable to run dyn.tl.dynamics separately on individual subsets (e.g. per dataset before harmonization) and then merge the results, or would that bias the downstream velocity estimation?
📊 Additional Info
• adata.X: dense float32 array
• All layers (spliced, unspliced, etc.) are dense float32 arrays
• adata dimensions: 4,000,000 × 2,500
• No NaNs detected in any layer or .X
Thank you very much for your time and for maintaining such a powerful package.
Any guidance on how to proceed would be greatly appreciated!
Best,
Salvo D. Lombardo