Description
At the moment, when we read / write data to layers, obsm, varm, obsp, varp, and uns, the whole named list is loaded into memory in order to fetch / write a single object.
How about we create something like a LazyNamedList:
#' @title Lazy Named List
#'
#' @description A lazy named list that loads elements on demand to avoid
#' materializing large objects unnecessarily. Used internally for efficient
#' access to the layers, obsm, varm, obsp, varp and uns slots.
#'
#' @keywords internal
LazyNamedList <- R6::R6Class(
  "LazyNamedList",
  public = list(
    #' @description Create a new LazyNamedList
    #' @param get_keys_fn Function that returns all available keys: function() -> list of strings
    #' @param set_keys_fn Function to set all keys: function(keys) -> invisible()
    #' @param get_value_fn Function to get element by key: function(key) -> object
    #' @param get_values_fn Function to get all elements: function() -> named list
    #' @param set_value_fn Function to set element by key: function(key, value) -> invisible()
    #' @param set_values_fn Function to set multiple elements: function(named_list) -> invisible()
    #' @param remove_entry_fn Function to remove an element by key: function(key) -> invisible()
    #' @param get_rownames_fn An optional function to get the rownames the values should be aligned to: function() -> list of strings
    #' @param get_colnames_fn An optional function to get the colnames the values should be aligned to: function() -> list of strings
    initialize = function(
      get_keys_fn,
      set_keys_fn, # <- strictly speaking we don't need this
      get_value_fn,
      get_values_fn, # <- strictly speaking we don't need this
      set_value_fn,
      set_values_fn, # either this
      remove_entry_fn, # or that
      get_rownames_fn = NULL,
      get_colnames_fn = NULL
      # OR if we don't want to pass rownames/colnames, we'll need to pass the
      # original adata_obj and the slot type instead:
      #   adata_obj = ...,
      #   type = c("obsm", "varm", "obsp", "varp", ...)
    ) {
      # ...
    }
  )
  # ...
)

So rather than having a way of reading / writing the entire obsm in an HDF5AnnData, the backend would read / write individual entries by key.
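A minimal sketch of the per-key access idea, written in base R with S3 methods instead of R6 so that it runs without extra packages. The names `new_lazy_named_list`, `LazyNamedListSketch` and the toy `backend` are made up for illustration, and only three of the proposed callbacks are wired up:

```r
# Hypothetical base-R stand-in for the proposed R6 LazyNamedList.
# It only stores callbacks; no values are held by the proxy itself.
new_lazy_named_list <- function(get_keys_fn, get_value_fn, set_value_fn) {
  structure(
    list(
      get_keys_fn = get_keys_fn,
      get_value_fn = get_value_fn,
      set_value_fn = set_value_fn
    ),
    class = "LazyNamedListSketch"
  )
}

# names(proxy) asks the backend for its keys.
names.LazyNamedListSketch <- function(x) unclass(x)$get_keys_fn()

# proxy[["key"]] fetches a single element from the backend.
`[[.LazyNamedListSketch` <- function(x, i, ...) {
  fns <- unclass(x)
  if (!i %in% fns$get_keys_fn()) stop("Unknown key: ", i)
  fns$get_value_fn(i)
}

# proxy[["key"]] <- value writes a single element to the backend.
`[[<-.LazyNamedListSketch` <- function(x, i, value) {
  unclass(x)$set_value_fn(i, value)
  x
}

# Toy backend that records which keys were actually read.
backend <- new.env()
backend$store <- list(X_pca = matrix(0, 3, 2), X_umap = matrix(1, 3, 2))
backend$reads <- character(0)

obsm <- new_lazy_named_list(
  get_keys_fn = function() names(backend$store),
  get_value_fn = function(key) {
    backend$reads <- c(backend$reads, key)
    backend$store[[key]]
  },
  set_value_fn = function(key, value) {
    backend$store[[key]] <- value
    invisible()
  }
)

pca <- obsm[["X_pca"]]               # only X_pca is read from the backend
obsm[["X_tsne"]] <- matrix(2, 3, 2)  # only X_tsne is written
```

After this runs, `backend$reads` contains just `"X_pca"`, so `X_umap` was never touched. A real implementation would presumably also validate dimensions against the rownames / colnames callbacks.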
The HDF5AnnData, InMemoryAnnData, ReticulateAnnData and AnnDataView will then need to implement the get_keys_fn, set_keys_fn, get_value_fn, etc... functions. In AbstractAnnData, the slots are then updated to something like:
#' @field obsm See [AnnData-usage]
obsm = function(value) {
  proxy <- LazyNamedList$new(
    get_keys_fn = function() self$obsm_keys(),
    set_keys_fn = function(keys) private$.set_obsm_keys(keys),
    get_value_fn = function(name) private$.get_obsm_value(name),
    set_value_fn = function(name, value) private$.set_obsm_value(name, value),
    set_values_fn = function(named_list) private$.set_obsm_values(named_list),
    get_rownames_fn = function() self$obs_names
  )
  if (missing(value)) {
    # If there is no value, the user is accessing adata$obsm and thus
    # the proxy object should be returned. It might be that they are
    # then subsetting the obsm afterwards, e.g. `adata$obsm[["X_pca"]]`
    # or `adata$obsm[["X_pca"]] <- ...some matrix...`.
    proxy
  } else {
    # The user is setting the obsm with a new named list. Note that
    # `adata$obsm[["X_pca"]] <- ...` also ends up here, because R assigns
    # the result of `[[<-` back to `adata$obsm`, so `value` may be the
    # proxy itself rather than a plain named list.
    # `set_values()` is a placeholder name for the proxy method that
    # wraps `set_values_fn`.
    proxy$set_values(value)
  }
}

I wonder whether values fetched from the underlying AnnData backend should be cached. For InMemoryAnnData this wouldn't make sense, but for HDF5AnnData it might, though I'm worried it might cause more issues than it solves.
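To make the caching trade-off a bit more concrete, here is a rough base R sketch of a cache wrapped around the per-key getter and setter. `with_cache` and the toy `disk` environment are hypothetical names, not part of any existing API. It shows both the benefit (repeated reads skip the backend) and the kind of issue to worry about (a write that bypasses the wrapper leaves a stale value in the cache):

```r
# Hypothetical helper: wraps per-key getter/setter callbacks with a cache.
with_cache <- function(get_value_fn, set_value_fn) {
  cache <- new.env(parent = emptyenv())
  list(
    get_value_fn = function(key) {
      if (!exists(key, envir = cache, inherits = FALSE)) {
        assign(key, get_value_fn(key), envir = cache)
      }
      get(key, envir = cache, inherits = FALSE)
    },
    set_value_fn = function(key, value) {
      set_value_fn(key, value)
      # Drop the stale entry so the next read goes back to the backend.
      if (exists(key, envir = cache, inherits = FALSE)) {
        rm(list = key, envir = cache)
      }
      invisible()
    }
  )
}

# Toy stand-in for an on-disk backend that counts how often it is read.
disk <- new.env()
disk$store <- list(X_pca = 1:3)
disk$reads <- 0L

cached <- with_cache(
  get_value_fn = function(key) {
    disk$reads <- disk$reads + 1L
    disk$store[[key]]
  },
  set_value_fn = function(key, value) {
    disk$store[[key]] <- value
    invisible()
  }
)

first <- cached$get_value_fn("X_pca")
second <- cached$get_value_fn("X_pca")  # served from the cache
reads_after_two_gets <- disk$reads

disk$store[["X_pca"]] <- 4:6            # write that bypasses the wrapper
stale <- cached$get_value_fn("X_pca")   # still the old value

cached$set_value_fn("X_pca", 7:9)       # write through the wrapper
fresh <- cached$get_value_fn("X_pca")   # re-read from the backend
```

Here `stale` is still `1:3` even though the backend holds `4:6`, which is the situation that could arise if the same file is modified through another handle or process.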
@lazappi @LouiseDck Wdyt?