🧫 Single-Cell Omics Arena

Single-Cell Omics Arena (SOAR) is a comprehensive benchmark framework designed to evaluate and improve the performance of instruction-tuned large language models (LLMs) in automated cell type annotation from single-cell omics data.

Updates

  • [2025-08-03] 🎉 We are excited to provide a command line interface and the pre-built soar_benchmark package for easier use.
  • [2025-05-12] 🎉 We are excited to announce that Single-Cell Omics Arena (SOAR) is now open-source! We welcome contributions from the community to help advance automated cell type annotation using LLMs.

Installation

  1. Create an environment with Python >= 3.11
  2. Clone the repo via git clone git@github.com:jhliu17/SOAR.git
  3. Install soar_benchmark via pip install -e .

API Key Setup

To run LLMs provided by OpenAI or hosted with Hugging Face Transformers, set up an env file (env.toml) in the project folder. A template is provided as env_sample.toml.
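
For illustration only, the env file could look something like the sketch below; the actual field names are defined by env_sample.toml, so copy them from the template rather than from here (the keys shown are assumptions):

# hypothetical env.toml -- use the real field names from env_sample.toml
openai_api_key = "sk-..."   # needed for GPT-4o / GPT-4o-mini
hf_token = "hf_..."         # needed for gated Hugging Face models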

Cell Type Annotation with LLMs

Annotation Interface

To list all supported LLMs and annotation options, run:

soar annotate -h

soar currently supports the following LLMs for cell type annotation:

  • Qwen2 series (1.5B, 7B, 72B)
  • Meta Llama-3 70B
  • Mixtral-8x7B
  • GPT-4o
  • GPT-4o-mini

All of these models support both zero-shot prompting and zero-shot chain-of-thought prompting for cell type annotation.
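
To make the two prompting modes concrete, here is an illustrative Python sketch (these are not SOAR's actual prompt templates, which live in the package configs; the marker genes and wording are made up). Zero-shot prompting asks the model for the cell type directly, while zero-shot chain-of-thought additionally elicits intermediate reasoning with a cue such as "Let's think step by step":

# Illustrative only; SOAR's real prompt templates are defined in the package.
marker_genes = ["CD3D", "CD3E", "CD2", "IL7R"]
question = (
    "Identify the cell type of a human cell expressing the following "
    "marker genes: " + ", ".join(marker_genes) + "."
)

# Zero-shot: ask for the answer directly
zero_shot_prompt = question + " Answer with the cell type name only."

# Zero-shot chain-of-thought: append a reasoning cue (Kojima et al., 2022)
zero_shot_cot_prompt = question + " Let's think step by step."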

Each model has specific hardware requirements and configurations:

  • Smaller models (1.5B-7B): Single GPU with 16GB VRAM
  • Larger models (70B-72B): Multi-GPU setup with 4-8 GPUs
  • Mixtral-8x7B: 4 GPUs recommended for optimal performance

For example, to reproduce the SOAR-RNA benchmark result on GPT-4o using zero-shot prompting, one can run

soar annotate soar_rna_with_gpt4_o_zero_shot

Custom Dataset

To annotate your own dataset with one of the provided LLM annotation configurations, run

soar annotate soar_rna_with_gpt4_o_zero_shot --config.dataset.json-path YOUR_DATASET_PATH

The custom dataset should follow the same structure as soar_benchmark/datasets/soar_rna.json.

You can further tune a preset configuration by overriding individual arguments, for example raising the maximum number of new tokens to 2048:

soar annotate soar_rna_with_gpt4_o_zero_shot --config.generation.max-new-tokens 2048

# To see more tunable options
soar annotate soar_rna_with_gpt4_o_zero_shot -h

Custom LLM Configuration

If you would like to implement a custom annotation configuration, please refer to the detailed configuration settings, including batch sizes, memory requirements, and hardware specifications, in:

  • Model configs: soar_benchmark/configs/cell_type_annotation/experiment_soar_rna.py

Once you have implemented a custom configuration, you can run it with the built-in annotation function:

from soar_benchmark import start_annotation_task

# Build your custom configuration; CellTypeAnnotationTaskConfig is the task
# config class (see soar_benchmark/configs/cell_type_annotation/
# experiment_soar_rna.py for where it is defined and how it is used)
custom_configuration = CellTypeAnnotationTaskConfig(...)

# Start the annotation task
start_annotation_task(custom_configuration)

Evaluations

To run evaluations on annotated results, use the squad_eval module:

python -m analysis.cell_type_annotation.squad_eval \
    --chat_results_path outputs/.../qwen2-72b-instruct.json \
    --squad_eval_results_path outputs/.../few_shot_squad_eval_inflect.json

Metrics

To evaluate the free-format cell type annotations generated by LLMs, we employ widely used metrics from natural language processing and question answering:

  • ROUGE (R-1, R-2, R-L): n-gram and longest-sequence overlap
  • METEOR: semantic similarity via surface forms, stems, and synonyms
  • BLEU (BLEU-1, BLEU-2, and their geometric average): n-gram overlap, particularly suited to short phrases
  • Exact Match (EM) and F1: token-level precision and recall, which keep the evaluation fair despite label variability and synonym usage
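
As an illustration only (this is not the repository's evaluation code, which lives in analysis.cell_type_annotation.squad_eval), all of these metrics are available off the shelf; a minimal sketch using the Hugging Face evaluate library:

import evaluate  # pip install evaluate

predictions = ["natural killer cell"]
references = ["NK cell"]

# ROUGE-1/2/L: n-gram and longest-common-subsequence overlap
rouge = evaluate.load("rouge")
print(rouge.compute(predictions=predictions, references=references))

# METEOR: unigram matching with stemming and synonym support
meteor = evaluate.load("meteor")
print(meteor.compute(predictions=predictions, references=references))

# BLEU-2: n-gram precision up to bigrams with a brevity penalty
bleu = evaluate.load("bleu")
print(bleu.compute(predictions=predictions,
                   references=[[r] for r in references], max_order=2))

# SQuAD-style Exact Match and token-level F1
squad = evaluate.load("squad")
print(squad.compute(
    predictions=[{"id": "0", "prediction_text": predictions[0]}],
    references=[{"id": "0",
                 "answers": {"text": [references[0]], "answer_start": [0]}}],
))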

SOAR-RNA Benchmark

Zero-shot Cell Type Annotation

| Model | R-1 | R-2 | R-L | METEOR | B-1 | B-2 | BLEU |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CellMarker2.0 | 31.88 | 13.36 | 31.76 | 23.83 | 41.07 | 18.05 | 27.23 |
| SingleR | 16.51 | 5.98 | 16.49 | 2.96 | 24.41 | 0.00 | 0.00 |
| ScType | 12.37 | 3.44 | 12.24 | 20.18 | 21.47 | 6.73 | 10.77 |
| DeepSeek-LLM-67B | 33.13 | 13.47 | 32.74 | 24.27 | 28.27 | 10.07 | 16.87 |
| Qwen2-72B | 32.39 | 14.76 | 32.05 | 29.96 | 18.59 | 6.67 | 11.13 |
| Llama-3-70B | 30.16 | 13.45 | 29.83 | 27.35 | 22.31 | 8.85 | 14.33 |
| Mixtral-8×7B | 20.95 | 13.40 | 20.78 | 16.94 | 17.61 | 7.16 | 10.23 |
| Mixtral-8×22B | 39.85 | 18.19 | 39.95 | 28.60 | 42.06 | 19.40 | 29.18 |
| Cell2Sentence | 26.87 | 11.48 | 26.76 | 19.45 | 25.24 | 11.79 | 17.25 |
| GPT-4o mini | 52.63 | 27.45 | 52.26 | 41.08 | 45.74 | 23.29 | 32.64 |
| GPT-4o | 58.45 | 32.07 | 58.12 | 45.39 | 62.85 | 42.68 | 51.79 |

Zero-shot Chain-of-thought Cell Type Annotation

| Model | R-1 | R-2 | R-L | METEOR | B-1 | B-2 | BLEU |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CellMarker2.0 | - | - | - | - | - | - | - |
| SingleR | - | - | - | - | - | - | - |
| ScType | - | - | - | - | - | - | - |
| DeepSeek-LLM-67B | 40.79 | 17.50 | 40.47 | 31.13 | 33.72 | 13.10 | 21.02 |
| Qwen2-72B | 46.56 | 23.93 | 46.34 | 37.09 | 36.85 | 17.92 | 25.69 |
| Llama-3-70B | 42.25 | 21.24 | 42.02 | 34.09 | 25.94 | 11.64 | 17.38 |
| Mixtral-8×7B | 42.37 | 21.37 | 41.82 | 35.45 | 31.57 | 13.83 | 20.90 |
| Mixtral-8×22B | 51.65 | 26.73 | 51.26 | 41.97 | 40.96 | 19.40 | 28.19 |
| Cell2Sentence | - | - | - | - | - | - | - |
| GPT-4o mini | 51.63 | 26.60 | 51.17 | 40.84 | 50.29 | 27.89 | 37.45 |
| GPT-4o | 57.67 | 31.55 | 57.34 | 45.36 | 55.27 | 32.15 | 42.15 |

SOAR-MultiOmics Benchmark

RNA-seq

| Model | R-1 | R-2 | R-L | METEOR | B-1 | B-2 | BLEU |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen2-72B | 21.83 | 6.02 | 20.57 | 11.54 | 20.77 | 5.32 | 10.51 |
| Llama-3-70B | 27.41 | 11.71 | 27.55 | 17.73 | 27.41 | 10.10 | 16.64 |
| Mixtral-8×7B | 33.41 | 18.28 | 33.65 | 26.11 | 33.09 | 18.45 | 24.71 |
| Mixtral-8×22B | 27.66 | 11.29 | 27.67 | 16.27 | 30.63 | 8.06 | 12.90 |
| Cell2Sentence | 28.03 | 18.29 | 28.42 | 20.63 | 41.35 | 35.29 | 38.20 |
| GPT-4o mini | 39.63 | 21.28 | 39.26 | 29.92 | 37.40 | 21.05 | 28.06 |
| GPT-4o | 41.00 | 23.10 | 41.37 | 30.20 | 43.75 | 26.32 | 33.93 |

ATAC-seq

| Model | R-1 | R-2 | R-L | METEOR | B-1 | B-2 | BLEU |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Qwen2-72B | 19.55 | 8.64 | 18.43 | 11.80 | 16.07 | 3.79 | 7.80 |
| Llama-3-70B | 31.76 | 13.01 | 31.83 | 19.39 | 29.84 | 10.23 | 17.47 |
| Mixtral-8×7B | 29.71 | 14.95 | 29.36 | 24.21 | 23.08 | 9.77 | 15.02 |
| Mixtral-8×22B | 30.38 | 13.13 | 30.22 | 19.39 | 24.08 | 8.33 | 12.99 |
| Cell2Sentence | 26.54 | 10.07 | 26.45 | 17.29 | 36.04 | 22.67 | 28.58 |
| GPT-4o mini | 38.15 | 17.01 | 37.06 | 28.08 | 35.46 | 17.14 | 24.66 |
| GPT-4o | 38.47 | 16.88 | 38.31 | 25.23 | 38.84 | 21.18 | 28.68 |
