
Meta issue on improving cubeviz performance #3795

@astrofrog

Description


I have been investigating the performance of cubeviz with large cubes, and it is very poor. Fundamentally, the issue is that in a number of places the whole dataset is loaded into memory and copied, which means that running cubeviz requires several times more memory than the size of the cube itself.

As an example, I took a 3.4GB cube and simply loaded it in cubeviz:

from jdaviz import Cubeviz
cubeviz = Cubeviz()
cubeviz.load_data('cube.fits')
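
For reference, here is a minimal sketch of one way to record peak memory and wall time for such a run from within Python, using only the standard library (the numbers quoted below came from separate profiling, not necessarily this exact snippet; note that the resource module is Unix-only):

import resource
import time

from jdaviz import Cubeviz

start = time.perf_counter()
cubeviz = Cubeviz()
cubeviz.load_data('cube.fits')
elapsed = time.perf_counter() - start

# On Linux, ru_maxrss is the peak resident set size in kilobytes
# (on macOS it is reported in bytes).
peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"elapsed: {elapsed:.1f}s, peak RSS: {peak_kb / 1024**2:.1f} GB")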

The peak memory usage was 15.2GB and this took 19 seconds to execute. I then opened the following PRs:

With all these PRs applied, the peak memory usage is now 9.2GB and the code above takes 7 seconds to run.

This is better, but it still will not solve the problem if someone tries to load a 25GB cube (as @camipacifici did originally), since one apparently still needs roughly 3x the cube size in memory.

The remaining large memory allocations appear to come mainly from flux unit conversions: the entire cube is converted into different units, several times over. This may be unnecessary, because the glue image viewer supports on-the-fly conversion of the flux unit (I believe this is used in imviz), so in principle the unit conversion could be done only on extracted spectra, as sketched below; even that could be avoided if we implemented on-the-fly unit conversion in the profile and scatter viewers.
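
To illustrate the per-spectrum approach (a hypothetical sketch using astropy.units, not existing jdaviz code): converting an extracted 1D spectrum instead of the full cube means the copy is O(n_wavelengths) rather than O(n_x * n_y * n_wavelengths):

import numpy as np
from astropy import units as u

# Hypothetical spectrum extracted from a single spaxel; in practice this
# would be read from the (ideally memory-mapped) cube on demand.
flux = np.random.random(4000) * u.MJy / u.sr

# Only this 1D array is copied during the conversion, not the whole cube.
flux_jy = flux.to(u.Jy / u.sr)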

I guess the question going forward is: should cubeviz be able to handle larger-than-memory data? glue-core and glue-jupyter are designed to handle arbitrarily large cubes, at least from the memory perspective (some computations will of course be slow), and they should never load the whole dataset from disk, since they are designed to take advantage of memory-mapped arrays. Is supporting large datasets a priority, or out of scope? If we do want to push forward with this, the PRs above are mainly low-hanging fruit; getting to the point where we can handle larger-than-memory data without ever loading it fully into memory will require a bit more work and coordination, and we should probably have a dedicated meeting.
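
For reference, the memory-mapped access pattern this relies on looks roughly like the following (a sketch assuming an unscaled, uncompressed cube in the first FITS extension; astropy.io.fits memory-maps by default):

from astropy.io import fits

# memmap=True (the default) maps the file instead of reading it all in.
with fits.open('cube.fits', memmap=True) as hdul:
    data = hdul[1].data  # assumed: the cube lives in the first extension
    # Only the parts actually accessed are read from disk, e.g. a single
    # wavelength plane or a single spaxel's spectrum:
    plane = data[100, :, :]
    spectrum = data[:, 50, 50]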
