Skip to content

Init spatial experiment #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 37 commits into from
Jan 28, 2025
Merged

Init spatial experiment #1

merged 37 commits into from
Jan 28, 2025

Conversation

keviny2
Copy link
Collaborator

@keviny2 keviny2 commented Jan 21, 2025

No description provided.

@keviny2 keviny2 requested a review from jkanche January 21, 2025 23:09
Copy link
Member

@jkanche jkanche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Awesome work!

@jkanche
Copy link
Member

jkanche commented Jan 22, 2025

Just remembered, slicing needs to be updated so it also slices the image and spatial coords props. Technically only need to implement this function similar to SingleCellExperiment here

@jkanche jkanche linked an issue Jan 23, 2025 that may be closed by this pull request
@keviny2
Copy link
Collaborator Author

keviny2 commented Jan 25, 2025

Recording behavior regarding img_data and column_data.

Constructor

  • All sample_ids in img_data must be present in column_data['sample_id'], but not all sample_ids in column_data need to be present in img_data['sample_id'] (i.e. there can be sample_ids in column_data['sample_id'] that are not in img_data['sample_id']

column_data setter

  • Bioconductor only checks that the number of unique sample_ids in column_data remains constant, irrespective of the actual values of the new sample_ids
  • It is possible to set a new column_data with a completely different set of sample_ids (doesn't even have to include the sample_id in img_data); as long as the number of unique sample_ids in the new column_data match the old column_data. Confusingly, doing this this will actually modify the sample_id in the img_data...

Here's an example

library(SpatialExperiment)
dir <- system.file(
   file.path("extdata", "10xVisium", "section1", "outs"),
   package = "SpatialExperiment")

# read in counts
fnm <- file.path(dir, "raw_feature_bc_matrix")
sce <- DropletUtils::read10xCounts(fnm)

# read in image data
img <- readImgData(
    path = file.path(dir, "spatial"),
    sample_id = "foo")

# read in spatial coordinates
fnm <- file.path(dir, "spatial", "tissue_positions_list.csv")
xyz <- read.csv(fnm, header = FALSE,
    col.names = c(
        "barcode", "in_tissue", "array_row", "array_col",
        "pxl_row_in_fullres", "pxl_col_in_fullres"))
xyz$sample_id <- c(rep("bar",25),rep("foo",25))

# construct observation & feature metadata
rd <- S4Vectors::DataFrame(
    symbol = rowData(sce)$Symbol)

# construct 'SpatialExperiment'
spe <- SpatialExperiment(
    assays = list(counts = assay(sce)),
    rowData = rd, 
    colData = DataFrame(xyz), 
    spatialCoordsNames = c("pxl_col_in_fullres", "pxl_row_in_fullres"),
    imgData = img
)

xyz_new <- DataFrame(xyz)
xyz_new$sample_id <- c(rep("bar",25),rep("baz",25))
colData(spe) <- xyz_new

# no more `foo` in `sample_id`
colData(spe)

# `sample_id` changes to `baz`
imgData(spe)

@jkanche How do you think we should handle this?

@jkanche
Copy link
Member

jkanche commented Jan 25, 2025

I can understand why this might be the case (if the image is huge or someone doesn't care too much about the image itself). So I propose In addition to the current check you already have in the constructor, we

Rule (1): check if all sample_ids in column_data are accounted for in imgdata["sample_id"] and if they do not, we don't raise an exception but warn the user. We may in the future after talking to the SPE folks can decide to change this to an Exception rather than a warning.

Constructor

  • All sample_ids in img_data must be present in column_data['sample_id'], but not all sample_ids in column_data need to be present in img_data['sample_id'] (i.e. there can be sample_ids in column_data['sample_id'] that are not in img_data['sample_id']

I don't understand the checking for unique number of sample_ids. So to be consistent, what makes sense to me right now is we do the same as what we do in the constructor: all sample_ids in img_data need to be in column_data's sample_id column and apply Rule (1).

column_data setter

* Bioconductor only checks that the number of unique `sample_id`s in `column_data` remains constant, irrespective of the actual values of the new `sample_id`s

* It is possible to set a new `column_data` with a completely different set of `sample_id`s (doesn't even have to include the `sample_id` in `img_data`); as long as the number of unique `sample_id`s in the new `column_data` match the old `column_data`. Confusingly, doing this this will actually modify the `sample_id` in the `img_data`...

Does that make sense?

edit 1: If you also agree, lets document this behavior in the constructor and the column data setter methods.

@keviny2
Copy link
Collaborator Author

keviny2 commented Jan 27, 2025

@jkanche I think the reason why the sample_ids don't have to match in the column_data setter is so that you can 'rename' samples. See this test case: https://github.com/drighelli/SpatialExperiment/blob/devel/tests/testthat/test_SpatialExperiment-colData.R#L20

See #8

@keviny2 keviny2 self-assigned this Jan 28, 2025
@jkanche jkanche merged commit 673a092 into main Jan 28, 2025
7 checks passed
@jkanche jkanche deleted the init-spatial-experiment branch January 28, 2025 05:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

SpatialExperiment class
2 participants