map.average_precision killed when using large sample size

Hi,

I was using copairs to run a phenotypic activity assessment based on mAP.

I ran this on cpg0014 extracted features, averaged per well. The shape of the data is shown below:
```
>>> feats_meta[1].shape
(9216, 23)
```

I believe the issue I am experiencing could be solved by adding more resources to my VM, but you might want to handle the scenario where compute resources are limited and the job still needs to complete.


Memory available
```
               total        used        free      shared  buff/cache   available
Mem:            14Gi       1.7Gi        12Gi       1.0Mi       300Mi        12Gi
Swap:             0B          0B          0B
```
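
To check whether the kill is memory-related, peak resident memory could be logged around the expensive step while running on progressively larger inputs. The following is only a sketch using the standard-library `resource` module (Unix only); `peak_rss_mib` is a hypothetical helper name, and the `bytearray` allocation is a stand-in for the real workload, not part of copairs.

```python
import resource
import sys


def peak_rss_mib():
    """Hypothetical helper: peak resident set size of this process, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in KiB on Linux and in bytes on macOS
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


before = peak_rss_mib()
buffer = bytearray(50 * 1024 * 1024)  # stand-in for a memory-hungry step
after = peak_rss_mib()
print(f"peak RSS: {before:.0f} MiB -> {after:.0f} MiB")
```

Because the value is a process-wide high-water mark, it never decreases, so it is most informative when the input size is increased step by step.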

CPU info
```
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU @ 2.30GHz
    CPU family:          6
    Model:               63
    Thread(s) per core:  2
    Core(s) per socket:  2
    Socket(s):           1
    Stepping:            0
    BogoMIPS:            4599.99
```

When I run the code below, the process gets killed while calling map.average_precision on the whole dataset. Subsampling solves the issue and the job completes.

```

from copairs import map
from copairs.matching import assign_reference_index

df_metadata = feats_meta[1]
feats = feats_meta[0]

reference_col = "Metadata_reference_index"
df_metadata_activity = assign_reference_index(
    df_metadata,
    "Metadata_broad_id == 'None'",  # condition to get reference profiles (neg controls)
    reference_col=reference_col,
    default_value=-1,
)

# positive pairs are replicates of the same treatment
pos_sameby = ["Metadata_broad_id", reference_col]
pos_diffby = []
neg_sameby = []
# negative pairs are replicates of different treatments
neg_diffby = ["Metadata_broad_id", reference_col]


metadata = df_metadata_activity
profiles = feats.values

activity_ap = map.average_precision(
    metadata, profiles, pos_sameby, pos_diffby, neg_sameby, neg_diffby
)

activity_ap = activity_ap.query("Metadata_broad_id != 'None'")  # remove DMSO
activity_ap.to_csv("output/mAP/mAP.csv", index=False)

activity_map = map.mean_average_precision(
    activity_ap, pos_sameby, null_size=1000000, threshold=0.05, seed=0
)
```
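
For reference, the subsampling workaround mentioned above could look roughly like the sketch below. It assumes the metadata is a pandas DataFrame and the profiles a NumPy array with rows in the same order; `subsample_profiles` is a hypothetical helper, not a copairs function, and the toy data only stands in for `df_metadata_activity` and `feats.values`.

```python
import numpy as np
import pandas as pd


def subsample_profiles(metadata, profiles, n_rows, seed=0):
    """Hypothetical helper: draw the same random rows from metadata and profiles."""
    rng = np.random.default_rng(seed)
    n_rows = min(n_rows, len(metadata))
    # Sorted positional indices keep the original row order
    idx = np.sort(rng.choice(len(metadata), size=n_rows, replace=False))
    sub_meta = metadata.iloc[idx].reset_index(drop=True)
    sub_profiles = profiles[idx]
    return sub_meta, sub_profiles


# Toy stand-ins for df_metadata_activity and feats.values (made-up values)
toy_meta = pd.DataFrame(
    {
        "row": range(100),
        "Metadata_broad_id": [f"treatment_{i % 5}" for i in range(100)],
    }
)
toy_profiles = np.arange(100 * 3, dtype=np.float32).reshape(100, 3)

sub_meta, sub_profiles = subsample_profiles(toy_meta, toy_profiles, n_rows=20)
print(sub_meta.shape, sub_profiles.shape)  # (20, 2) (20, 3)
```

Sampling rows at random can leave a treatment with a single replicate and therefore no positive pairs, so sampling whole treatments (or whole plates) may be a better fit depending on the analysis.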





map.average_precision killed when using large sample size #90
