map.average_precision killed when using large sample size #90

@AMCalejandro

Hi,

I was using copairs to run a phenotypic activity assessment based on mAP.

I ran this on cpg0014 extracted features averaged per well. The shape of the data is shown below:

>>> feats_meta[1].shape
(9216, 23)

I believe the issue could be solved by adding more resources to my VM, but you might want to handle the scenario where compute resources are limited and the job still needs to complete.

Memory available

               total        used        free      shared  buff/cache   available
Mem:            14Gi       1.7Gi        12Gi       1.0Mi       300Mi        12Gi
Swap:             0B          0B          0B
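
To see how close a run gets to the memory limit shown above, peak resident memory can be logged around the expensive call. The snippet below is only a minimal sketch using the standard-library `resource` module; `peak_rss_mib` is a hypothetical helper name, and the `bytearray` allocation is a placeholder for the real computation.

```python
import resource
import sys


def peak_rss_mib():
    """Return the peak resident set size of this process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


before = peak_rss_mib()
# Placeholder workload; in practice this would be the call that gets killed.
buffer = bytearray(50 * 1024 * 1024)
after = peak_rss_mib()

print(f"peak RSS before: {before:.1f} MiB, after: {after:.1f} MiB")
```

Because the value is a high-water mark, it never decreases during the life of the process, so a reading taken just before the kill is the most informative one.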

CPU info

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU @ 2.30GHz
    CPU family:          6
    Model:               63
    Thread(s) per core:  2
    Core(s) per socket:  2
    Socket(s):           1
    Stepping:            0
    BogoMIPS:            4599.99

The code below gets killed when map.average_precision runs on the whole dataset. Subsampling solves the issue and the job completes.


from copairs import map
from copairs.matching import assign_reference_index

df_metadata = feats_meta[1]
feats = feats_meta[0]

reference_col = "Metadata_reference_index"
df_metadata_activity = assign_reference_index(
    df_metadata,
    "Metadata_broad_id == 'None'",  # condition to get reference profiles (neg controls)
    reference_col=reference_col,
    default_value=-1,
)

# positive pairs are replicates of the same treatment
pos_sameby = ["Metadata_broad_id", reference_col]
pos_diffby = []
neg_sameby = []
# negative pairs are replicates of different treatments
neg_diffby = ["Metadata_broad_id", reference_col]


metadata = df_metadata_activity
profiles = feats.values

activity_ap = map.average_precision(
    metadata, profiles, pos_sameby, pos_diffby, neg_sameby, neg_diffby
)

activity_ap = activity_ap.query("Metadata_broad_id != 'None'")  # remove DMSO
activity_ap.to_csv("output/mAP/mAP.csv", index=False)

activity_map = map.mean_average_precision(
    activity_ap, pos_sameby, null_size=1000000, threshold=0.05, seed=0
)
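
For reference, the subsampling workaround can be sketched roughly as follows. This is an illustration under assumptions, not the exact code used: `subsample_rows` is a hypothetical helper, the row count is arbitrary, and the toy frames stand in for `df_metadata` and `feats`. The key point is that the same row positions are kept in both tables so metadata and features stay aligned.

```python
import numpy as np
import pandas as pd


def subsample_rows(metadata, features, n_rows, seed=0):
    """Randomly keep n_rows rows, using the same positions in both tables."""
    rng = np.random.default_rng(seed)
    n_rows = min(n_rows, len(metadata))
    # Sorted positions preserve the original row order.
    keep = np.sort(rng.choice(len(metadata), size=n_rows, replace=False))
    return (
        metadata.iloc[keep].reset_index(drop=True),
        features.iloc[keep].reset_index(drop=True),
    )


# Toy stand-ins for the real metadata and feature tables (hypothetical values).
toy_meta = pd.DataFrame({"Metadata_broad_id": ["None", "A", "A", "B", "B", "None"]})
toy_feats = pd.DataFrame(np.arange(12, dtype=float).reshape(6, 2))

sub_meta, sub_feats = subsample_rows(toy_meta, toy_feats, n_rows=4)
print(sub_meta.shape, sub_feats.shape)
```

Uniform row sampling can leave some treatments with a single replicate, so a per-treatment cap may be a better fit depending on the analysis.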

Labels: bug