Default nan-handling policy is a memory hog

Because single-cell datasets are very sparse, missing values need to be handled in a way that doesn't consume too much memory. Currently, CellSNP labels missing entries with ".:.:.:.:.:." (11 characters, so 11 bytes per entry at best). I would strongly suggest using an empty string instead of that stub. I have been processing the output of CellSNP, and when I manually replaced all occurrences of ".:.:.:.:.:." with an empty string, the file size dropped **from 25.6 GB to 2.5 GB**, roughly a tenfold reduction. This choice of fill value not only wastes memory, it also makes the file harder to process with some convenient tools in Python/R.
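For anyone who wants to reproduce the manual replacement, here is a minimal sketch of how it could be done in a streaming fashion, so the whole file never has to sit in memory. The function name `strip_placeholders` and the tiny demo table are made up for illustration and are not part of CellSNP. The sketch assumes tab-separated columns and header lines starting with `#`, and it only blanks fields that match the stub exactly. Note that tools expecting strict VCF may not accept empty fields.

```python
import io

# Missing-entry stub quoted in this issue (11 characters).
PLACEHOLDER = ".:.:.:.:.:."


def strip_placeholders(src, dst, placeholder=PLACEHOLDER):
    """Copy src to dst line by line, blanking fields equal to placeholder.

    Hypothetical helper, not a CellSNP function. Returns the number of
    fields that were blanked.
    """
    blanked = 0
    for line in src:
        if line.startswith("#"):
            # Header lines are copied through unchanged.
            dst.write(line)
            continue
        fields = line.rstrip("\n").split("\t")
        for i, field in enumerate(fields):
            # Exact match only, so real values are never touched.
            if field == placeholder:
                fields[i] = ""
                blanked += 1
        dst.write("\t".join(fields) + "\n")
    return blanked


# Made-up two-cell example; real CellSNP output has more columns.
demo_in = io.StringIO(
    "#CHROM\tPOS\tcell1\tcell2\n"
    "1\t100\t.:.:.:.:.:.\t0/1:3:1:2:0:0\n"
)
demo_out = io.StringIO()
n_blanked = strip_placeholders(demo_in, demo_out)
```

On a real file, the same function could be called with two handles from `open(...)` (or `gzip.open(..., "rt")` / `gzip.open(..., "wt")` for compressed input and output) in place of the in-memory buffers.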

Default nan-handling policy is a memory hog #5
